public inbox for pve-devel@lists.proxmox.com
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager 2/3] fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
Date: Mon,  9 Mar 2026 22:57:09 +0100
Message-ID: <20260309220128.973793-3-t.lamprecht@proxmox.com>
In-Reply-To: <20260309220128.973793-1-t.lamprecht@proxmox.com>

Certain cluster maintenance tasks, such as reconfiguring the network
or the cluster communication stack, can cause temporary quorum loss or
network partitions. Normally, HA would trigger self-fencing in such
situations, disrupting services unnecessarily.

Add a disarm/arm mechanism that releases all watchdogs cluster-wide,
allowing such work to be done safely.

A 'resource-mode' parameter controls how HA-managed resources are
handled while disarmed (the current state of the resources themselves
is not affected):
- 'freeze': new commands and state changes are not applied, just like
  what is done automatically when an LRM restarts.
- 'ignore': resources are removed from HA tracking and can be managed
  as if they were not HA managed.

After disarm is requested, the CRM freezes or removes services and
waits for each LRM to finish active workers and release its agent
lock and watchdog. Once all LRMs are idle, the CRM releases its own
watchdog too. The CRM keeps the manager lock throughout so it can
process arm-ha to reverse the process.

Disarm is deferred while any services are being fenced or recovered.
The disarm state is preserved across CRM master changes. Maintenance
mode takes priority over disarm in the LRM.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---

More test ideas are welcome, but testing on real clusters would
actually be even better ;-)
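For reviewers: the intended CRM-side handshake can be summarized with this
small Python model. This is illustrative only, not the Perl implementation
below - names and data shapes are simplified, and the LRM side (worker
draining, lock/watchdog release) is reduced to a per-node mode string:

```python
# Illustrative model of handle_disarm() semantics: defer while any service
# is in a transient state, apply the resource-mode, and only report the
# stack as fully disarmed once every online LRM has entered disarm mode.
TRANSIENT = {"fence", "recovery", "migrate", "relocate"}
FREEZABLE = {
    "started", "stopped", "request_stop", "request_start",
    "request_start_balance", "error",
}

def handle_disarm(disarm, services, lrm_modes, online_nodes):
    # defer disarm while the state machine still needs to resolve something
    for state in services.values():
        if state in TRANSIENT:
            return False  # let manage() continue so fence/recovery/migration finish

    if disarm["mode"] == "freeze":
        for sid, state in list(services.items()):
            if state in FREEZABLE:
                services[sid] = "freeze"
    elif disarm["mode"] == "ignore":
        services.clear()  # resources drop out of HA tracking entirely

    # fully disarmed once every online LRM has gone idle in disarm mode;
    # the CRM may then release its own watchdog
    if all(lrm_modes.get(n) == "disarm" for n in online_nodes):
        disarm["state"] = "disarmed"
    return True
```

So a `disarm-ha freeze` with a service still in 'fence' returns False (the
CRM logs the deferral and runs the normal state machine), while a quiet
cluster freezes everything and flips the state to 'disarmed' in one pass.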

 src/PVE/HA/CRM.pm                             |  33 ++++-
 src/PVE/HA/Config.pm                          |   5 +
 src/PVE/HA/LRM.pm                             |  31 ++++-
 src/PVE/HA/Manager.pm                         | 124 +++++++++++++++++-
 src/PVE/HA/Sim/Hardware.pm                    |   4 +
 src/test/test-disarm-crm-stop1/README         |  13 ++
 src/test/test-disarm-crm-stop1/cmdlist        |   6 +
 .../test-disarm-crm-stop1/hardware_status     |   5 +
 src/test/test-disarm-crm-stop1/log.expect     |  66 ++++++++++
 src/test/test-disarm-crm-stop1/manager_status |   1 +
 src/test/test-disarm-crm-stop1/service_config |   5 +
 src/test/test-disarm-fence1/cmdlist           |   9 ++
 src/test/test-disarm-fence1/hardware_status   |   5 +
 src/test/test-disarm-fence1/log.expect        |  78 +++++++++++
 src/test/test-disarm-fence1/manager_status    |   1 +
 src/test/test-disarm-fence1/service_config    |   5 +
 src/test/test-disarm-frozen1/README           |  10 ++
 src/test/test-disarm-frozen1/cmdlist          |   5 +
 src/test/test-disarm-frozen1/hardware_status  |   5 +
 src/test/test-disarm-frozen1/log.expect       |  59 +++++++++
 src/test/test-disarm-frozen1/manager_status   |   1 +
 src/test/test-disarm-frozen1/service_config   |   5 +
 src/test/test-disarm-ignored1/README          |  10 ++
 src/test/test-disarm-ignored1/cmdlist         |   5 +
 src/test/test-disarm-ignored1/hardware_status |   5 +
 src/test/test-disarm-ignored1/log.expect      |  60 +++++++++
 src/test/test-disarm-ignored1/manager_status  |   1 +
 src/test/test-disarm-ignored1/service_config  |   5 +
 src/test/test-disarm-maintenance1/cmdlist     |   7 +
 .../test-disarm-maintenance1/hardware_status  |   5 +
 src/test/test-disarm-maintenance1/log.expect  |  79 +++++++++++
 .../test-disarm-maintenance1/manager_status   |   1 +
 .../test-disarm-maintenance1/service_config   |   5 +
 src/test/test-disarm-relocate1/README         |   3 +
 src/test/test-disarm-relocate1/cmdlist        |   7 +
 .../test-disarm-relocate1/hardware_status     |   5 +
 src/test/test-disarm-relocate1/log.expect     |  51 +++++++
 src/test/test-disarm-relocate1/manager_status |   1 +
 src/test/test-disarm-relocate1/service_config |   4 +
 39 files changed, 723 insertions(+), 7 deletions(-)
 create mode 100644 src/test/test-disarm-crm-stop1/README
 create mode 100644 src/test/test-disarm-crm-stop1/cmdlist
 create mode 100644 src/test/test-disarm-crm-stop1/hardware_status
 create mode 100644 src/test/test-disarm-crm-stop1/log.expect
 create mode 100644 src/test/test-disarm-crm-stop1/manager_status
 create mode 100644 src/test/test-disarm-crm-stop1/service_config
 create mode 100644 src/test/test-disarm-fence1/cmdlist
 create mode 100644 src/test/test-disarm-fence1/hardware_status
 create mode 100644 src/test/test-disarm-fence1/log.expect
 create mode 100644 src/test/test-disarm-fence1/manager_status
 create mode 100644 src/test/test-disarm-fence1/service_config
 create mode 100644 src/test/test-disarm-frozen1/README
 create mode 100644 src/test/test-disarm-frozen1/cmdlist
 create mode 100644 src/test/test-disarm-frozen1/hardware_status
 create mode 100644 src/test/test-disarm-frozen1/log.expect
 create mode 100644 src/test/test-disarm-frozen1/manager_status
 create mode 100644 src/test/test-disarm-frozen1/service_config
 create mode 100644 src/test/test-disarm-ignored1/README
 create mode 100644 src/test/test-disarm-ignored1/cmdlist
 create mode 100644 src/test/test-disarm-ignored1/hardware_status
 create mode 100644 src/test/test-disarm-ignored1/log.expect
 create mode 100644 src/test/test-disarm-ignored1/manager_status
 create mode 100644 src/test/test-disarm-ignored1/service_config
 create mode 100644 src/test/test-disarm-maintenance1/cmdlist
 create mode 100644 src/test/test-disarm-maintenance1/hardware_status
 create mode 100644 src/test/test-disarm-maintenance1/log.expect
 create mode 100644 src/test/test-disarm-maintenance1/manager_status
 create mode 100644 src/test/test-disarm-maintenance1/service_config
 create mode 100644 src/test/test-disarm-relocate1/README
 create mode 100644 src/test/test-disarm-relocate1/cmdlist
 create mode 100644 src/test/test-disarm-relocate1/hardware_status
 create mode 100644 src/test/test-disarm-relocate1/log.expect
 create mode 100644 src/test/test-disarm-relocate1/manager_status
 create mode 100644 src/test/test-disarm-relocate1/service_config

diff --git a/src/PVE/HA/CRM.pm b/src/PVE/HA/CRM.pm
index 2739763..a76cf67 100644
--- a/src/PVE/HA/CRM.pm
+++ b/src/PVE/HA/CRM.pm
@@ -104,9 +104,17 @@ sub get_protected_ha_manager_lock {
         if ($haenv->get_ha_manager_lock()) {
             if ($self->{ha_manager_wd}) {
                 $haenv->watchdog_update($self->{ha_manager_wd});
-            } else {
-                my $wfh = $haenv->watchdog_open();
-                $self->{ha_manager_wd} = $wfh;
+            } elsif (!$self->{disarmed}) {
+                # check on-disk disarm state to avoid briefly opening a watchdog when taking
+                # over as new master while the stack is already fully disarmed
+                my $ms = eval { $haenv->read_manager_status() };
+                if ($ms && $ms->{disarm} && $ms->{disarm}->{state} eq 'disarmed') {
+                    $haenv->log('info', "taking over as disarmed master, skipping watchdog");
+                    $self->{disarmed} = 1;
+                } else {
+                    my $wfh = $haenv->watchdog_open();
+                    $self->{ha_manager_wd} = $wfh;
+                }
             }
             return 1;
         }
@@ -211,6 +219,10 @@ sub can_get_active {
                 if (scalar($ss->%*)) {
                     return 1; # need to get active to clean up stale service status entries
                 }
+
+                if ($manager_status->{disarm}) {
+                    return 1; # stay active while HA stack is disarmed
+                }
             }
             return 0; # no services, no node in maintenance mode, and no crm cmds -> can stay idle
         }
@@ -232,6 +244,9 @@ sub allowed_to_get_idle {
     my $manager_status = get_manager_status_guarded($haenv);
     return 0 if !$self->is_cluster_and_ha_healthy($manager_status);
 
+    # don't go idle while HA stack is disarmed - need to stay active to process arm-ha
+    return 0 if $manager_status->{disarm};
+
     my $conf = eval { $haenv->read_service_config() };
     if (my $err = $@) {
         $haenv->log('err', "could not read service config: $err");
@@ -379,6 +394,18 @@ sub work {
                 }
 
                 $manager->manage();
+
+                if ($manager->is_fully_disarmed()) {
+                    if (!$self->{disarmed}) {
+                        $haenv->log('info', "HA stack fully disarmed, releasing CRM watchdog");
+                        give_up_watchdog_protection($self);
+                        $self->{disarmed} = 1;
+                    }
+                } elsif ($self->{disarmed}) {
+                    $haenv->log('info', "re-arming HA stack");
+                    $self->{disarmed} = 0;
+                    # watchdog will be re-opened by get_protected_ha_manager_lock next iteration
+                }
             }
         };
         if (my $err = $@) {
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 19eec2a..ad7f8a4 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -362,6 +362,11 @@ my $service_check_ha_state = sub {
         if (!defined($has_state)) {
             # ignored service behave as if they were not managed by HA
             return 0 if defined($d->{state}) && $d->{state} eq 'ignored';
+            # cluster-wide disarm with ignore mode - resources can be managed directly
+            my $ms = cfs_read_file($manager_status_filename);
+            if (my $disarm = $ms->{disarm}) {
+                return 0 if $disarm->{mode} eq 'ignore';
+            }
             return 1;
         }
 
diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index e4b0ec8..762c941 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -37,7 +37,7 @@ sub new {
         restart_tries => {},
         shutdown_request => 0,
         shutdown_errors => 0,
-        # mode can be: active, reboot, shutdown, restart, maintenance
+        # mode can be: active, reboot, shutdown, restart, maintenance, disarm
         mode => 'active',
         cluster_state_update => 0,
         active_idle_rounds => 0,
@@ -212,7 +212,9 @@ sub update_service_status {
             my $request = $ms->{node_request}->{$nodename} // {};
             if ($request->{maintenance}) {
                 $self->{mode} = 'maintenance';
-            } elsif ($self->{mode} eq 'maintenance') {
+            } elsif ($ms->{disarm}) {
+                $self->{mode} = 'disarm';
+            } elsif ($self->{mode} eq 'maintenance' || $self->{mode} eq 'disarm') {
                 $self->{mode} = 'active';
             }
         }
@@ -359,7 +361,9 @@ sub work {
 
         my $service_count = $self->active_service_count();
 
-        if (!$fence_request && $service_count && $haenv->quorate()) {
+        if ($self->{mode} eq 'disarm') {
+            # stay idle while disarmed, don't acquire lock
+        } elsif (!$fence_request && $service_count && $haenv->quorate()) {
             if ($self->get_protected_ha_agent_lock()) {
                 $self->set_local_status({ state => 'active' });
             }
@@ -382,6 +386,13 @@ sub work {
             $self->set_local_status({ state => 'lost_agent_lock' });
         } elsif ($self->is_maintenance_requested()) {
             $self->set_local_status({ state => 'maintenance' });
+        } elsif ($self->{mode} eq 'disarm' && !$self->run_workers()) {
+            $haenv->log('info', "HA disarm requested, releasing agent lock and watchdog");
+            # safety: disarming requested, no fence request (handled in earlier if-branch) and no
+            # running workers anymore, so safe to go idle.
+            $haenv->release_ha_agent_lock();
+            give_up_watchdog_protection($self);
+            $self->set_local_status({ state => 'wait_for_agent_lock' });
         } else {
             if (!$self->has_configured_service_on_local_node() && !$self->run_workers()) {
                 # no active service configured for this node and all (old) workers are done
@@ -409,6 +420,20 @@ sub work {
                 "node need to be fenced during maintenance mode - releasing agent_lock\n",
             );
             $self->set_local_status({ state => 'lost_agent_lock' });
+        } elsif (
+            $self->{mode} eq 'disarm'
+            && !$self->active_service_count()
+            && !$self->run_workers()
+        ) {
+            # disarm takes priority - release lock and watchdog, go idle
+            $haenv->log(
+                'info',
+                "HA disarm requested during maintenance, releasing agent lock and watchdog",
+            );
+            # safety: no active services and no running workers, so safe to go idle.
+            $haenv->release_ha_agent_lock();
+            give_up_watchdog_protection($self);
+            $self->set_local_status({ state => 'wait_for_agent_lock' });
         } elsif ($self->active_service_count() || $self->run_workers()) {
             # keep the lock and watchdog as long as not all services cleared the node
             if (!$self->get_protected_ha_agent_lock()) {
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index b1dbe6a..f579e81 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -75,6 +75,9 @@ sub new {
     # on change of active master.
     $self->{ms}->{node_request} = $old_ms->{node_request} if defined($old_ms->{node_request});
 
+    # preserve disarm state across CRM master changes
+    $self->{ms}->{disarm} = $old_ms->{disarm} if defined($old_ms->{disarm});
+
     $self->update_crs_scheduler_mode(); # initial set, we update it once every loop
 
     return $self;
@@ -472,7 +475,12 @@ sub update_crm_commands {
             my $node = $1;
 
             my $state = $ns->get_node_state($node);
-            if ($state eq 'online') {
+            if ($ms->{disarm}) {
+                $haenv->log(
+                    'warn',
+                    "ignoring maintenance command for node $node - HA stack is disarmed",
+                );
+            } elsif ($state eq 'online') {
                 $ms->{node_request}->{$node}->{maintenance} = 1;
             } elsif ($state eq 'maintenance') {
                 $haenv->log(
@@ -493,6 +501,25 @@ sub update_crm_commands {
                 );
             }
             delete $ms->{node_request}->{$node}->{maintenance}; # gets flushed out at the end of the CRM loop
+        } elsif ($cmd =~ m/^disarm-ha\s+(freeze|ignore)$/) {
+            my $mode = $1;
+
+            if ($ms->{disarm}) {
+                $haenv->log(
+                    'warn',
+                    "ignoring disarm-ha command - already in disarm state ($ms->{disarm}->{state})",
+                );
+            } else {
+                $haenv->log('info', "got crm command: disarm-ha $mode");
+                $ms->{disarm} = { mode => $mode, state => 'disarming' };
+            }
+        } elsif ($cmd =~ m/^arm-ha$/) {
+            if ($ms->{disarm}) {
+                $haenv->log('info', "got crm command: arm-ha");
+                delete $ms->{disarm};
+            } else {
+                $haenv->log('info', "ignoring arm-ha command - HA stack is not disarmed");
+            }
         } else {
             $haenv->log('err', "unable to parse crm command: $cmd");
         }
@@ -631,6 +658,87 @@ sub try_persistent_group_migration {
     }
 }
 
+sub handle_disarm {
+    my ($self, $disarm, $ss, $lrm_modes) = @_;
+
+    my $haenv = $self->{haenv};
+    my $ns = $self->{ns};
+
+    # defer disarm if any services are in a transient state that needs the state machine to resolve
+    for my $sid (sort keys %$ss) {
+        my $state = $ss->{$sid}->{state};
+        if ($state eq 'fence' || $state eq 'recovery') {
+            $haenv->log(
+                'warn', "deferring disarm - service '$sid' is in '$state' state",
+            );
+            return 0; # let manage() continue so fence/recovery can progress
+        }
+        if ($state eq 'migrate' || $state eq 'relocate') {
+            $haenv->log(
+                'info', "deferring disarm - service '$sid' is in '$state' state",
+            );
+            return 0; # let manage() continue so migration can complete
+        }
+    }
+
+    my $mode = $disarm->{mode};
+
+    if ($mode eq 'freeze') {
+        for my $sid (sort keys %$ss) {
+            my $state = $ss->{$sid}->{state};
+            next if $state eq 'freeze'; # already frozen
+            if (
+                $state eq 'started'
+                || $state eq 'stopped'
+                || $state eq 'request_stop'
+                || $state eq 'request_start'
+                || $state eq 'request_start_balance'
+                || $state eq 'error'
+            ) {
+                $haenv->log('info', "disarm: freezing service '$sid' (was '$state')");
+                $ss->{$sid}->{state} = 'freeze';
+                $ss->{$sid}->{uid} = compute_new_uuid('freeze');
+            }
+        }
+    } elsif ($mode eq 'ignore') {
+        for my $sid (sort keys %$ss) {
+            $haenv->log('info', "disarm: removing service '$sid' from tracking");
+        }
+        $self->{ss} = {};
+        $ss = $self->{ss};
+    }
+
+    # check if all online LRMs have entered disarm mode
+    my $all_disarmed = 1;
+    my $online_nodes = $ns->list_online_nodes();
+
+    for my $node (@$online_nodes) {
+        my $lrm_mode = $lrm_modes->{$node} // 'unknown';
+        if ($lrm_mode ne 'disarm') {
+            $all_disarmed = 0;
+            last;
+        }
+    }
+
+    if ($all_disarmed && $disarm->{state} ne 'disarmed') {
+        $haenv->log('info', "all LRMs disarmed, HA stack is now fully disarmed");
+        $disarm->{state} = 'disarmed';
+    }
+
+    # once disarmed, stay disarmed - a returning node's LRM will catch up within one cycle
+    $self->{all_lrms_disarmed} = $disarm->{state} eq 'disarmed';
+
+    $self->flush_master_status();
+
+    return 1;
+}
+
+sub is_fully_disarmed {
+    my ($self) = @_;
+
+    return $self->{all_lrms_disarmed};
+}
+
 sub manage {
     my ($self) = @_;
 
@@ -657,8 +765,12 @@ sub manage {
 
     # compute new service status
 
+    # skip service add/remove when disarmed - handle_disarm manages service status
+    my $is_disarmed = $ms->{disarm};
+
     # add new service
     foreach my $sid (sort keys %$sc) {
+        next if $is_disarmed;
         next if $ss->{$sid}; # already there
         my $cd = $sc->{$sid};
         next if $cd->{state} eq 'ignored';
@@ -675,6 +787,7 @@ sub manage {
 
     # remove stale or ignored services from manager state
     foreach my $sid (keys %$ss) {
+        next if $is_disarmed;
         next if $sc->{$sid} && $sc->{$sid}->{state} ne 'ignored';
 
         my $reason = defined($sc->{$sid}) ? 'ignored state requested' : 'no config';
@@ -713,6 +826,15 @@ sub manage {
 
     $self->update_crm_commands();
 
+    if (my $disarm = $ms->{disarm}) {
+        if ($self->handle_disarm($disarm, $ss, $lrm_modes)) {
+            return; # disarm active and progressing, skip normal service state machine
+        }
+        # disarm deferred (e.g. due to active fencing) - fall through to let it complete
+    }
+
+    $self->{all_lrms_disarmed} = 0;
+
     for (;;) {
         my $repeat = 0;
 
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 8cbf48d..b4f1d88 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -835,6 +835,10 @@ sub sim_hardware_cmd {
                 || $action eq 'disable-node-maintenance'
             ) {
                 $self->queue_crm_commands_nolock("$action $node");
+            } elsif ($action eq 'disarm-ha') {
+                $self->queue_crm_commands_nolock("disarm-ha $params[0]");
+            } elsif ($action eq 'arm-ha') {
+                $self->queue_crm_commands_nolock("arm-ha");
             } else {
                 die "sim_hardware_cmd: unknown action '$action'";
             }
diff --git a/src/test/test-disarm-crm-stop1/README b/src/test/test-disarm-crm-stop1/README
new file mode 100644
index 0000000..5f81497
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/README
@@ -0,0 +1,13 @@
+Test CRM master takeover while HA stack is fully disarmed.
+
+Verify that when the CRM master is stopped during full disarm, a slave
+takes over cleanly without briefly opening a watchdog, and that arming
+via the new master works correctly.
+
+1. Start 3 nodes with services
+2. Disarm HA with freeze resource mode
+3. Wait for full disarm (all LRMs disarmed, CRM watchdog released)
+4. Stop CRM on master node (node1)
+5. Slave on node2 takes over as new master, preserving disarm state
+6. Arm HA via new master
+7. Services unfreeze and resume normal operation
diff --git a/src/test/test-disarm-crm-stop1/cmdlist b/src/test/test-disarm-crm-stop1/cmdlist
new file mode 100644
index 0000000..01e9cd9
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/cmdlist
@@ -0,0 +1,6 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node1 disarm-ha freeze" ],
+    [ "crm node1 stop" ],
+    [ "crm node2 arm-ha" ]
+]
\ No newline at end of file
diff --git a/src/test/test-disarm-crm-stop1/hardware_status b/src/test/test-disarm-crm-stop1/hardware_status
new file mode 100644
index 0000000..4990fd0
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
\ No newline at end of file
diff --git a/src/test/test-disarm-crm-stop1/log.expect b/src/test/test-disarm-crm-stop1/log.expect
new file mode 100644
index 0000000..880008f
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/log.expect
@@ -0,0 +1,66 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     40    node1/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info     40    node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info    120      cmdlist: execute crm node1 disarm-ha freeze
+info    120    node1/crm: got crm command: disarm-ha freeze
+info    120    node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info    120    node1/crm: disarm: freezing service 'vm:102' (was 'stopped')
+info    120    node1/crm: disarm: freezing service 'vm:103' (was 'stopped')
+info    121    node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info    121    node1/lrm: status change active => wait_for_agent_lock
+info    123    node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info    123    node2/lrm: status change active => wait_for_agent_lock
+info    125    node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info    125    node3/lrm: status change active => wait_for_agent_lock
+info    140    node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info    140    node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info    220      cmdlist: execute crm node1 stop
+info    220    node1/crm: server received shutdown request
+info    220    node1/crm: voluntary release CRM lock
+info    221    node1/crm: exit (loop end)
+info    222    node2/crm: got lock 'ha_manager_lock'
+info    222    node2/crm: taking over as disarmed master, skipping watchdog
+info    222    node2/crm: status change slave => master
+info    320      cmdlist: execute crm node2 arm-ha
+info    321    node2/crm: got crm command: arm-ha
+info    321    node2/crm: re-arming HA stack
+info    341    node2/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info    341    node2/crm: service 'vm:102': state changed from 'freeze' to 'request_stop'
+info    341    node2/crm: service 'vm:103': state changed from 'freeze' to 'request_stop'
+info    342    node2/lrm: got lock 'ha_agent_node2_lock'
+info    342    node2/lrm: status change wait_for_agent_lock => active
+info    344    node3/lrm: got lock 'ha_agent_node3_lock'
+info    344    node3/lrm: status change wait_for_agent_lock => active
+info    360    node1/lrm: got lock 'ha_agent_node1_lock'
+info    360    node1/lrm: status change wait_for_agent_lock => active
+info    361    node2/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info    361    node2/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info    920     hardware: exit simulation - done
diff --git a/src/test/test-disarm-crm-stop1/manager_status b/src/test/test-disarm-crm-stop1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-crm-stop1/service_config b/src/test/test-disarm-crm-stop1/service_config
new file mode 100644
index 0000000..c2ddbce
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "stopped" },
+    "vm:103": { "node": "node3", "state": "disabled" }
+}
\ No newline at end of file
diff --git a/src/test/test-disarm-fence1/cmdlist b/src/test/test-disarm-fence1/cmdlist
new file mode 100644
index 0000000..7473615
--- /dev/null
+++ b/src/test/test-disarm-fence1/cmdlist
@@ -0,0 +1,9 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "network node2 off" ],
+    [ "crm node1 disarm-ha freeze" ],
+    [],
+    [],
+    [],
+    [ "crm node1 arm-ha" ]
+]
diff --git a/src/test/test-disarm-fence1/hardware_status b/src/test/test-disarm-fence1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-fence1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-fence1/log.expect b/src/test/test-disarm-fence1/log.expect
new file mode 100644
index 0000000..9a56c5d
--- /dev/null
+++ b/src/test/test-disarm-fence1/log.expect
@@ -0,0 +1,78 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node2)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:102
+info     23    node2/lrm: service status vm:102 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     40    node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info    120      cmdlist: execute network node2 off
+info    120    node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info    122    node2/crm: status change slave => wait_for_quorum
+info    123    node2/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node2'
+info    164     watchdog: execute power node2 off
+info    163    node2/crm: killed by poweroff
+info    164    node2/lrm: killed by poweroff
+info    164     hardware: server 'node2' stopped by poweroff (watchdog)
+info    220      cmdlist: execute crm node1 disarm-ha freeze
+info    220    node1/crm: got crm command: disarm-ha freeze
+warn    220    node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info    221    node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info    221    node1/lrm: status change active => wait_for_agent_lock
+info    223    node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info    223    node3/lrm: status change active => wait_for_agent_lock
+warn    240    node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info    240    node1/crm: got lock 'ha_agent_node2_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info    240    node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info    240    node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'vm:102' from fenced node 'node2' to node 'node3'
+info    240    node1/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node3)
+info    260    node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info    260    node1/crm: disarm: freezing service 'vm:102' (was 'started')
+info    260    node1/crm: disarm: freezing service 'vm:103' (was 'stopped')
+info    260    node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info    260    node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info    620      cmdlist: execute crm node1 arm-ha
+info    620    node1/crm: got crm command: arm-ha
+info    620    node1/crm: re-arming HA stack
+info    640    node1/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info    640    node1/crm: service 'vm:102': state changed from 'freeze' to 'started'
+info    640    node1/crm: service 'vm:103': state changed from 'freeze' to 'request_stop'
+info    641    node1/lrm: got lock 'ha_agent_node1_lock'
+info    641    node1/lrm: status change wait_for_agent_lock => active
+info    643    node3/lrm: got lock 'ha_agent_node3_lock'
+info    643    node3/lrm: status change wait_for_agent_lock => active
+info    643    node3/lrm: starting service vm:102
+info    643    node3/lrm: service status vm:102 started
+info    660    node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info   1220     hardware: exit simulation - done
diff --git a/src/test/test-disarm-fence1/manager_status b/src/test/test-disarm-fence1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-fence1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-fence1/service_config b/src/test/test-disarm-fence1/service_config
new file mode 100644
index 0000000..0487834
--- /dev/null
+++ b/src/test/test-disarm-fence1/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "started" },
+    "vm:103": { "node": "node3", "state": "stopped" }
+}
diff --git a/src/test/test-disarm-frozen1/README b/src/test/test-disarm-frozen1/README
new file mode 100644
index 0000000..e68ea2c
--- /dev/null
+++ b/src/test/test-disarm-frozen1/README
@@ -0,0 +1,10 @@
+Test disarm-ha with freeze resource mode.
+
+Verify the full disarm cycle:
+1. Start 3 nodes with services
+2. Disarm HA with freeze resource mode
+3. All services should transition to freeze state
+4. LRMs should release locks and watchdogs (disarm mode)
+5. CRM should release watchdog once all LRMs disarmed
+6. Arm HA again
+7. Services should unfreeze and resume normal operation
diff --git a/src/test/test-disarm-frozen1/cmdlist b/src/test/test-disarm-frozen1/cmdlist
new file mode 100644
index 0000000..e6fc192
--- /dev/null
+++ b/src/test/test-disarm-frozen1/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node1 disarm-ha freeze" ],
+    [ "crm node1 arm-ha" ]
+]
\ No newline at end of file
diff --git a/src/test/test-disarm-frozen1/hardware_status b/src/test/test-disarm-frozen1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-frozen1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-frozen1/log.expect b/src/test/test-disarm-frozen1/log.expect
new file mode 100644
index 0000000..206f14e
--- /dev/null
+++ b/src/test/test-disarm-frozen1/log.expect
@@ -0,0 +1,59 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     40    node1/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info     40    node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info    120      cmdlist: execute crm node1 disarm-ha freeze
+info    120    node1/crm: got crm command: disarm-ha freeze
+info    120    node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info    120    node1/crm: disarm: freezing service 'vm:102' (was 'stopped')
+info    120    node1/crm: disarm: freezing service 'vm:103' (was 'stopped')
+info    121    node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info    121    node1/lrm: status change active => wait_for_agent_lock
+info    123    node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info    123    node2/lrm: status change active => wait_for_agent_lock
+info    125    node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info    125    node3/lrm: status change active => wait_for_agent_lock
+info    140    node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info    140    node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info    220      cmdlist: execute crm node1 arm-ha
+info    220    node1/crm: got crm command: arm-ha
+info    220    node1/crm: re-arming HA stack
+info    240    node1/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info    240    node1/crm: service 'vm:102': state changed from 'freeze' to 'request_stop'
+info    240    node1/crm: service 'vm:103': state changed from 'freeze' to 'request_stop'
+info    241    node1/lrm: got lock 'ha_agent_node1_lock'
+info    241    node1/lrm: status change wait_for_agent_lock => active
+info    243    node2/lrm: got lock 'ha_agent_node2_lock'
+info    243    node2/lrm: status change wait_for_agent_lock => active
+info    245    node3/lrm: got lock 'ha_agent_node3_lock'
+info    245    node3/lrm: status change wait_for_agent_lock => active
+info    260    node1/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info    260    node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-disarm-frozen1/manager_status b/src/test/test-disarm-frozen1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-frozen1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-frozen1/service_config b/src/test/test-disarm-frozen1/service_config
new file mode 100644
index 0000000..c2ddbce
--- /dev/null
+++ b/src/test/test-disarm-frozen1/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "stopped" },
+    "vm:103": { "node": "node3", "state": "disabled" }
+}
\ No newline at end of file
diff --git a/src/test/test-disarm-ignored1/README b/src/test/test-disarm-ignored1/README
new file mode 100644
index 0000000..bb55e63
--- /dev/null
+++ b/src/test/test-disarm-ignored1/README
@@ -0,0 +1,10 @@
+Test disarm-ha with ignore resource mode.
+
+Verify the full disarm cycle with ignore resource mode:
+1. Start 3 nodes with services
+2. Disarm HA with ignore resource mode
+3. All services should be removed from tracking
+4. LRMs should release locks and watchdogs (disarm mode)
+5. CRM should release watchdog once all LRMs disarmed
+6. Arm HA again
+7. Services should be re-discovered from config and started
diff --git a/src/test/test-disarm-ignored1/cmdlist b/src/test/test-disarm-ignored1/cmdlist
new file mode 100644
index 0000000..b8a0c04
--- /dev/null
+++ b/src/test/test-disarm-ignored1/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node1 disarm-ha ignore" ],
+    [ "crm node1 arm-ha" ]
+]
\ No newline at end of file
diff --git a/src/test/test-disarm-ignored1/hardware_status b/src/test/test-disarm-ignored1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-ignored1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-ignored1/log.expect b/src/test/test-disarm-ignored1/log.expect
new file mode 100644
index 0000000..dc7e29a
--- /dev/null
+++ b/src/test/test-disarm-ignored1/log.expect
@@ -0,0 +1,60 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     40    node1/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info     40    node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info    120      cmdlist: execute crm node1 disarm-ha ignore
+info    120    node1/crm: got crm command: disarm-ha ignore
+info    120    node1/crm: disarm: removing service 'vm:101' from tracking
+info    120    node1/crm: disarm: removing service 'vm:102' from tracking
+info    120    node1/crm: disarm: removing service 'vm:103' from tracking
+info    121    node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info    121    node1/lrm: status change active => wait_for_agent_lock
+info    123    node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info    123    node2/lrm: status change active => wait_for_agent_lock
+info    125    node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info    125    node3/lrm: status change active => wait_for_agent_lock
+info    140    node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info    140    node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info    220      cmdlist: execute crm node1 arm-ha
+info    220    node1/crm: got crm command: arm-ha
+info    220    node1/crm: re-arming HA stack
+info    240    node1/crm: adding new service 'vm:101' on node 'node1'
+info    240    node1/crm: adding new service 'vm:102' on node 'node2'
+info    240    node1/crm: adding new service 'vm:103' on node 'node3'
+info    240    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info    241    node1/lrm: got lock 'ha_agent_node1_lock'
+info    241    node1/lrm: status change wait_for_agent_lock => active
+info    243    node2/lrm: got lock 'ha_agent_node2_lock'
+info    243    node2/lrm: status change wait_for_agent_lock => active
+info    245    node3/lrm: got lock 'ha_agent_node3_lock'
+info    245    node3/lrm: status change wait_for_agent_lock => active
+info    260    node1/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info    260    node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-disarm-ignored1/manager_status b/src/test/test-disarm-ignored1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-ignored1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-ignored1/service_config b/src/test/test-disarm-ignored1/service_config
new file mode 100644
index 0000000..c2ddbce
--- /dev/null
+++ b/src/test/test-disarm-ignored1/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "stopped" },
+    "vm:103": { "node": "node3", "state": "disabled" }
+}
\ No newline at end of file
diff --git a/src/test/test-disarm-maintenance1/cmdlist b/src/test/test-disarm-maintenance1/cmdlist
new file mode 100644
index 0000000..6f8a8ea
--- /dev/null
+++ b/src/test/test-disarm-maintenance1/cmdlist
@@ -0,0 +1,7 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node1 disarm-ha freeze" ],
+    [ "crm node1 arm-ha" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-disarm-maintenance1/hardware_status b/src/test/test-disarm-maintenance1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-maintenance1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-maintenance1/log.expect b/src/test/test-disarm-maintenance1/log.expect
new file mode 100644
index 0000000..b5e0e5b
--- /dev/null
+++ b/src/test/test-disarm-maintenance1/log.expect
@@ -0,0 +1,79 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:102
+info     23    node2/lrm: service status vm:102 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:103
+info     25    node3/lrm: service status vm:103 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+info    140    node1/crm: migrate service 'vm:103' to node 'node1' (running)
+info    140    node1/crm: service 'vm:103': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    145    node3/lrm: service vm:103 - start migrate to node 'node1'
+info    145    node3/lrm: service vm:103 - end migrate to node 'node1'
+info    160    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node1)
+info    161    node1/lrm: starting service vm:103
+info    161    node1/lrm: service status vm:103 started
+info    220      cmdlist: execute crm node1 disarm-ha freeze
+info    220    node1/crm: got crm command: disarm-ha freeze
+info    220    node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info    220    node1/crm: disarm: freezing service 'vm:102' (was 'started')
+info    220    node1/crm: disarm: freezing service 'vm:103' (was 'started')
+info    221    node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info    221    node1/lrm: status change active => wait_for_agent_lock
+info    223    node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info    223    node2/lrm: status change active => wait_for_agent_lock
+info    240    node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info    240    node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info    320      cmdlist: execute crm node1 arm-ha
+info    320    node1/crm: got crm command: arm-ha
+info    320    node1/crm: re-arming HA stack
+info    340    node1/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info    340    node1/crm: service 'vm:102': state changed from 'freeze' to 'started'
+info    340    node1/crm: service 'vm:103': state changed from 'freeze' to 'started'
+info    341    node1/lrm: got lock 'ha_agent_node1_lock'
+info    341    node1/lrm: status change wait_for_agent_lock => active
+info    343    node2/lrm: got lock 'ha_agent_node2_lock'
+info    343    node2/lrm: status change wait_for_agent_lock => active
+info    420      cmdlist: execute crm node3 disable-node-maintenance
+info    425    node3/lrm: got lock 'ha_agent_node3_lock'
+info    425    node3/lrm: status change maintenance => active
+info    440    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    440    node1/crm: moving service 'vm:103' back to 'node3', node came back from maintenance.
+info    440    node1/crm: migrate service 'vm:103' to node 'node3' (running)
+info    440    node1/crm: service 'vm:103': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    441    node1/lrm: service vm:103 - start migrate to node 'node3'
+info    441    node1/lrm: service vm:103 - end migrate to node 'node3'
+info    460    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node3)
+info    465    node3/lrm: starting service vm:103
+info    465    node3/lrm: service status vm:103 started
+info   1020     hardware: exit simulation - done
diff --git a/src/test/test-disarm-maintenance1/manager_status b/src/test/test-disarm-maintenance1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-maintenance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-maintenance1/service_config b/src/test/test-disarm-maintenance1/service_config
new file mode 100644
index 0000000..4b26f6b
--- /dev/null
+++ b/src/test/test-disarm-maintenance1/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "started" },
+    "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-disarm-relocate1/README b/src/test/test-disarm-relocate1/README
new file mode 100644
index 0000000..a5b6324
--- /dev/null
+++ b/src/test/test-disarm-relocate1/README
@@ -0,0 +1,3 @@
+Test disarm-ha freeze when a relocate command arrives in the same CRM cycle.
+The disarm takes priority: the relocate command is pre-empted and the service
+is frozen directly. After arm-ha, both services resume normally.
diff --git a/src/test/test-disarm-relocate1/cmdlist b/src/test/test-disarm-relocate1/cmdlist
new file mode 100644
index 0000000..99f2916
--- /dev/null
+++ b/src/test/test-disarm-relocate1/cmdlist
@@ -0,0 +1,7 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "service vm:101 relocate node2", "crm node1 disarm-ha freeze" ],
+    [],
+    [],
+    [ "crm node1 arm-ha" ]
+]
diff --git a/src/test/test-disarm-relocate1/hardware_status b/src/test/test-disarm-relocate1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-relocate1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-relocate1/log.expect b/src/test/test-disarm-relocate1/log.expect
new file mode 100644
index 0000000..b051cac
--- /dev/null
+++ b/src/test/test-disarm-relocate1/log.expect
@@ -0,0 +1,51 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node2'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node2)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:102
+info     23    node2/lrm: service status vm:102 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info    120      cmdlist: execute service vm:101 relocate node2
+info    120      cmdlist: execute crm node1 disarm-ha freeze
+info    120    node1/crm: got crm command: relocate vm:101 node2
+info    120    node1/crm: got crm command: disarm-ha freeze
+info    120    node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info    120    node1/crm: disarm: freezing service 'vm:102' (was 'started')
+info    121    node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info    121    node1/lrm: status change active => wait_for_agent_lock
+info    123    node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info    123    node2/lrm: status change active => wait_for_agent_lock
+info    140    node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info    140    node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info    420      cmdlist: execute crm node1 arm-ha
+info    420    node1/crm: got crm command: arm-ha
+info    420    node1/crm: re-arming HA stack
+info    440    node1/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info    440    node1/crm: service 'vm:102': state changed from 'freeze' to 'started'
+info    441    node1/lrm: got lock 'ha_agent_node1_lock'
+info    441    node1/lrm: status change wait_for_agent_lock => active
+info    443    node2/lrm: got lock 'ha_agent_node2_lock'
+info    443    node2/lrm: status change wait_for_agent_lock => active
+info   1020     hardware: exit simulation - done
diff --git a/src/test/test-disarm-relocate1/manager_status b/src/test/test-disarm-relocate1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-relocate1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-relocate1/service_config b/src/test/test-disarm-relocate1/service_config
new file mode 100644
index 0000000..0336d09
--- /dev/null
+++ b/src/test/test-disarm-relocate1/service_config
@@ -0,0 +1,4 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "started" }
+}
-- 
2.47.3
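The new test directories above all share the same fixture layout: a `cmdlist` that is a JSON array of rounds (each a list of simulator command strings) plus JSON maps for `hardware_status` and `service_config`. As a quick illustration of that layout — not part of the patch, and the helper name is purely hypothetical — a small sanity checker could look like:

```python
import json

def validate_fixture(cmdlist_json: str, service_config_json: str) -> bool:
    """Hypothetical checker for the fixture layout used in these tests.

    cmdlist is a JSON array of "rounds", each a list of command strings
    (e.g. "crm node1 disarm-ha freeze"); service_config maps "type:id"
    service names to objects with at least "node" and "state".
    """
    cmdlist = json.loads(cmdlist_json)
    services = json.loads(service_config_json)

    assert isinstance(cmdlist, list), "cmdlist must be a JSON array of rounds"
    for round_ in cmdlist:
        assert isinstance(round_, list), "each round is a list of commands"
        for cmd in round_:
            assert isinstance(cmd, str) and cmd, "commands are non-empty strings"

    for sid, cfg in services.items():
        # service ids look like 'vm:101' in the fixtures above
        assert ":" in sid, f"service id {sid!r} should look like 'vm:101'"
        assert "node" in cfg, f"{sid}: missing 'node'"
        # the request states that appear in the new service_config files
        assert cfg.get("state") in ("started", "stopped", "disabled"), (
            f"{sid}: unexpected state {cfg.get('state')!r}"
        )
    return True
```

This only mirrors the shape of the fixtures shown in this patch; the real harness in `src/test/` is the authority on what the simulator accepts.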





Thread overview: 4+ messages
2026-03-09 21:57 [PATCH ha-manager 0/3] fix #2751: implement disarm/arm HA for safer " Thomas Lamprecht
2026-03-09 21:57 ` [PATCH ha-manager 1/3] api: status: add fencing status entry with armed/standby state Thomas Lamprecht
2026-03-09 21:57 ` Thomas Lamprecht [this message]
2026-03-09 21:57 ` [PATCH ha-manager 3/3] api: status: add disarm-ha and arm-ha endpoints and CLI wiring Thomas Lamprecht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260309220128.973793-3-t.lamprecht@proxmox.com \
    --to=t.lamprecht@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal