* [PATCH ha-manager v2 3/4] fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
From: Thomas Lamprecht @ 2026-03-21 23:42 UTC
To: pve-devel
Certain cluster maintenance tasks, such as reconfiguring the network
or the cluster communication stack, can cause temporary quorum loss or
network partitions. Normally, HA would trigger self-fencing in such
situations, disrupting services unnecessarily.
Add a disarm/arm mechanism that releases all watchdogs cluster-wide,
allowing such work to be done safely.
A 'resource-mode' parameter controls how HA-managed resources are
handled while disarmed (the current state of resources is not
affected):
- 'freeze': new commands and state changes are not applied, just as
is done automatically when an LRM restarts.
- 'ignore': HA tracking of the resources is suspended, so they can be
managed as if they were not HA-managed.
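A minimal Perl sketch of what the two resource modes do to the per-service
manager status, condensed from the pruning loop in handle_disarm() in the
Manager.pm hunk of this patch. The helper name apply_resource_mode and the
sample service data are hypothetical, and the fresh uid the patch assigns to
each newly frozen service is left out for brevity.

```perl
use strict;
use warnings;

# Runtime keys outside this set are pruned in both modes (mirrors the patch).
my %keep_keys = map { $_ => 1 } qw(state node uid maintenance_node);

# Service states that 'freeze' mode converts to 'freeze' (mirrors the patch).
my %freezable = map { $_ => 1 }
    qw(started stopped request_stop request_start request_start_balance error);

# Hypothetical helper: the patch does this inside handle_disarm() and also
# assigns a new uid to each frozen service, which is omitted here.
sub apply_resource_mode {
    my ($mode, $ss) = @_;

    die "unknown resource mode '$mode'\n" if $mode !~ m/^(?:freeze|ignore)$/;

    for my $sid (sort keys %$ss) {
        my $sd = $ss->{$sid};
        # in freeze mode, services in other states (e.g. already frozen) are left alone
        next if $mode eq 'freeze' && !$freezable{ $sd->{state} };

        # drop transient runtime data such as failed_nodes, cmd or target
        delete $sd->{$_} for grep { !$keep_keys{$_} } keys %$sd;

        # in ignore mode the entry keeps its state; the disarm flag in the
        # manager status makes the service loops skip it until arm-ha
        $sd->{state} = 'freeze' if $mode eq 'freeze';
    }
    return;
}

my $ss = {
    'vm:101' => { state => 'started', node => 'node1', uid => 'u1', failed_nodes => ['node2'] },
    'vm:102' => { state => 'stopped', node => 'node2', uid => 'u2', maintenance_node => 'node3' },
};
apply_resource_mode('freeze', $ss);
# vm:101 is now { state => 'freeze', node => 'node1', uid => 'u1' }
```

Keeping maintenance_node is what allows a service to return to its original
node once maintenance ends, while everything else the state machine would
recompute anyway is discarded.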
After disarm is requested, the CRM freezes the services (or, in
'ignore' mode, suspends their HA tracking) and waits for each LRM to
finish its active workers and release its agent lock and watchdog.
Once all online LRMs are disarmed, the CRM releases its own watchdog
too. The CRM keeps the manager lock throughout, so it can process
arm-ha to reverse the process.
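The completion check on the CRM side can be sketched in Perl as follows,
assuming, as handle_disarm() in this patch does, that the CRM sees each
online node's LRM mode and treats the stack as fully disarmed only once all
of them report 'disarm'. The helper name update_disarm_state and the node
data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper condensing the LRM check in handle_disarm(): nodes
# without a reported mode count as not yet disarmed.
sub update_disarm_state {
    my ($disarm, $online_nodes, $lrm_modes) = @_;

    my $all_disarmed = 1;
    for my $node (@$online_nodes) {
        if (($lrm_modes->{$node} // 'unknown') ne 'disarm') {
            $all_disarmed = 0;
            last;
        }
    }

    # once 'disarmed', stay there: a node that comes back online catches up
    $disarm->{state} = 'disarmed' if $all_disarmed;

    # true means the CRM may release its own watchdog (it keeps the manager lock)
    return $disarm->{state} eq 'disarmed';
}

my $disarm = { mode => 'freeze', state => 'disarming' };
my $online = [qw(node1 node2 node3)];

# node3's LRM has not reported disarm mode yet: the CRM keeps its watchdog
my $first = update_disarm_state(
    $disarm, $online, { node1 => 'disarm', node2 => 'disarm', node3 => 'active' },
);

# every online LRM is disarmed: the CRM may drop its watchdog as well
my $second = update_disarm_state(
    $disarm, $online, { node1 => 'disarm', node2 => 'disarm', node3 => 'disarm' },
);
```

Making 'disarmed' sticky avoids re-opening the CRM watchdog just because a
node rejoins; its LRM switches to disarm mode within one cycle.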
Disarm is deferred while any services are being fenced, recovered,
migrated or relocated. The disarm state is preserved across CRM master
changes. Maintenance mode takes priority over disarm in the LRM.
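The deferral rule and the LRM mode priority can be sketched in Perl as two
small helpers. Both names, disarm_blocker and next_lrm_mode, are
hypothetical; they restate the state check at the top of handle_disarm() and
the if/elsif chain this patch adds to the LRM's update_service_status(). The
preservation across master changes needs no logic of its own: the new
manager simply copies the disarm entry from the previous manager status.

```perl
use strict;
use warnings;

# Transient service states for which the patch defers disarming, so the
# regular state machine can finish fencing, recovery or migration first.
my %blocking = map { $_ => 1 } qw(fence recovery migrate relocate);

# Hypothetical helper: returns "sid (state)" for the first blocking service,
# or undef if disarming may proceed.
sub disarm_blocker {
    my ($ss) = @_;
    for my $sid (sort keys %$ss) {
        my $state = $ss->{$sid}->{state};
        return "$sid ($state)" if $blocking{$state};
    }
    return;
}

# Hypothetical helper: a maintenance request wins over disarm, and once
# neither applies the LRM goes back from either mode to 'active'.
sub next_lrm_mode {
    my ($current, $maintenance_requested, $disarm) = @_;
    return 'maintenance' if $maintenance_requested;
    return 'disarm' if $disarm;
    return 'active' if $current eq 'maintenance' || $current eq 'disarm';
    return $current; # any other mode is left untouched by this chain
}
```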
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
changes v1 -> v2:
- ignore mode: preserve relevant parts of service status instead of
fully clearing it, so runtime state like maintenance_node survives;
recheck node info from service config on arm-ha for manual
migrations during disarm. Similar for freeze.
- wrap add/remove service loops in single disarm guard [Dominik]
- validate resource mode in simulator command handler [Dominik]
- use 'notice' log level for a duplicate disarm (when already disarmed)
or arm (when not disarmed); previously one used 'warn' and the other 'info'.
- add tests: ignore with manual-migrate and node recheck,
maintenance+ignore, maintenance+ignore+manual-migrate, double
disarm/arm idempotency, error state service during disarm
src/PVE/HA/CRM.pm | 33 ++-
src/PVE/HA/Config.pm | 5 +
src/PVE/HA/LRM.pm | 31 ++-
src/PVE/HA/Manager.pm | 197 ++++++++++++++++--
src/PVE/HA/Sim/Hardware.pm | 24 ++-
src/test/test-disarm-crm-stop1/README | 13 ++
src/test/test-disarm-crm-stop1/cmdlist | 6 +
.../test-disarm-crm-stop1/hardware_status | 5 +
src/test/test-disarm-crm-stop1/log.expect | 66 ++++++
src/test/test-disarm-crm-stop1/manager_status | 1 +
src/test/test-disarm-crm-stop1/service_config | 5 +
src/test/test-disarm-double1/cmdlist | 7 +
src/test/test-disarm-double1/hardware_status | 5 +
src/test/test-disarm-double1/log.expect | 53 +++++
src/test/test-disarm-double1/manager_status | 1 +
src/test/test-disarm-double1/service_config | 4 +
src/test/test-disarm-failing-service1/cmdlist | 6 +
.../hardware_status | 5 +
.../test-disarm-failing-service1/log.expect | 125 +++++++++++
.../manager_status | 1 +
.../service_config | 4 +
src/test/test-disarm-fence1/cmdlist | 9 +
src/test/test-disarm-fence1/hardware_status | 5 +
src/test/test-disarm-fence1/log.expect | 78 +++++++
src/test/test-disarm-fence1/manager_status | 1 +
src/test/test-disarm-fence1/service_config | 5 +
src/test/test-disarm-frozen1/README | 10 +
src/test/test-disarm-frozen1/cmdlist | 5 +
src/test/test-disarm-frozen1/hardware_status | 5 +
src/test/test-disarm-frozen1/log.expect | 59 ++++++
src/test/test-disarm-frozen1/manager_status | 1 +
src/test/test-disarm-frozen1/service_config | 5 +
src/test/test-disarm-ignored1/README | 10 +
src/test/test-disarm-ignored1/cmdlist | 5 +
src/test/test-disarm-ignored1/hardware_status | 5 +
src/test/test-disarm-ignored1/log.expect | 50 +++++
src/test/test-disarm-ignored1/manager_status | 1 +
src/test/test-disarm-ignored1/service_config | 5 +
src/test/test-disarm-ignored2/cmdlist | 6 +
src/test/test-disarm-ignored2/hardware_status | 5 +
src/test/test-disarm-ignored2/log.expect | 60 ++++++
src/test/test-disarm-ignored2/manager_status | 1 +
src/test/test-disarm-ignored2/service_config | 5 +
src/test/test-disarm-maintenance1/cmdlist | 7 +
.../test-disarm-maintenance1/hardware_status | 5 +
src/test/test-disarm-maintenance1/log.expect | 79 +++++++
.../test-disarm-maintenance1/manager_status | 1 +
.../test-disarm-maintenance1/service_config | 5 +
src/test/test-disarm-maintenance2/cmdlist | 7 +
.../test-disarm-maintenance2/hardware_status | 5 +
src/test/test-disarm-maintenance2/log.expect | 78 +++++++
.../test-disarm-maintenance2/manager_status | 1 +
.../test-disarm-maintenance2/service_config | 5 +
src/test/test-disarm-maintenance3/cmdlist | 8 +
.../test-disarm-maintenance3/hardware_status | 5 +
src/test/test-disarm-maintenance3/log.expect | 80 +++++++
.../test-disarm-maintenance3/manager_status | 1 +
.../test-disarm-maintenance3/service_config | 5 +
src/test/test-disarm-relocate1/README | 3 +
src/test/test-disarm-relocate1/cmdlist | 7 +
.../test-disarm-relocate1/hardware_status | 5 +
src/test/test-disarm-relocate1/log.expect | 51 +++++
src/test/test-disarm-relocate1/manager_status | 1 +
src/test/test-disarm-relocate1/service_config | 4 +
64 files changed, 1264 insertions(+), 32 deletions(-)
create mode 100644 src/test/test-disarm-crm-stop1/README
create mode 100644 src/test/test-disarm-crm-stop1/cmdlist
create mode 100644 src/test/test-disarm-crm-stop1/hardware_status
create mode 100644 src/test/test-disarm-crm-stop1/log.expect
create mode 100644 src/test/test-disarm-crm-stop1/manager_status
create mode 100644 src/test/test-disarm-crm-stop1/service_config
create mode 100644 src/test/test-disarm-double1/cmdlist
create mode 100644 src/test/test-disarm-double1/hardware_status
create mode 100644 src/test/test-disarm-double1/log.expect
create mode 100644 src/test/test-disarm-double1/manager_status
create mode 100644 src/test/test-disarm-double1/service_config
create mode 100644 src/test/test-disarm-failing-service1/cmdlist
create mode 100644 src/test/test-disarm-failing-service1/hardware_status
create mode 100644 src/test/test-disarm-failing-service1/log.expect
create mode 100644 src/test/test-disarm-failing-service1/manager_status
create mode 100644 src/test/test-disarm-failing-service1/service_config
create mode 100644 src/test/test-disarm-fence1/cmdlist
create mode 100644 src/test/test-disarm-fence1/hardware_status
create mode 100644 src/test/test-disarm-fence1/log.expect
create mode 100644 src/test/test-disarm-fence1/manager_status
create mode 100644 src/test/test-disarm-fence1/service_config
create mode 100644 src/test/test-disarm-frozen1/README
create mode 100644 src/test/test-disarm-frozen1/cmdlist
create mode 100644 src/test/test-disarm-frozen1/hardware_status
create mode 100644 src/test/test-disarm-frozen1/log.expect
create mode 100644 src/test/test-disarm-frozen1/manager_status
create mode 100644 src/test/test-disarm-frozen1/service_config
create mode 100644 src/test/test-disarm-ignored1/README
create mode 100644 src/test/test-disarm-ignored1/cmdlist
create mode 100644 src/test/test-disarm-ignored1/hardware_status
create mode 100644 src/test/test-disarm-ignored1/log.expect
create mode 100644 src/test/test-disarm-ignored1/manager_status
create mode 100644 src/test/test-disarm-ignored1/service_config
create mode 100644 src/test/test-disarm-ignored2/cmdlist
create mode 100644 src/test/test-disarm-ignored2/hardware_status
create mode 100644 src/test/test-disarm-ignored2/log.expect
create mode 100644 src/test/test-disarm-ignored2/manager_status
create mode 100644 src/test/test-disarm-ignored2/service_config
create mode 100644 src/test/test-disarm-maintenance1/cmdlist
create mode 100644 src/test/test-disarm-maintenance1/hardware_status
create mode 100644 src/test/test-disarm-maintenance1/log.expect
create mode 100644 src/test/test-disarm-maintenance1/manager_status
create mode 100644 src/test/test-disarm-maintenance1/service_config
create mode 100644 src/test/test-disarm-maintenance2/cmdlist
create mode 100644 src/test/test-disarm-maintenance2/hardware_status
create mode 100644 src/test/test-disarm-maintenance2/log.expect
create mode 100644 src/test/test-disarm-maintenance2/manager_status
create mode 100644 src/test/test-disarm-maintenance2/service_config
create mode 100644 src/test/test-disarm-maintenance3/cmdlist
create mode 100644 src/test/test-disarm-maintenance3/hardware_status
create mode 100644 src/test/test-disarm-maintenance3/log.expect
create mode 100644 src/test/test-disarm-maintenance3/manager_status
create mode 100644 src/test/test-disarm-maintenance3/service_config
create mode 100644 src/test/test-disarm-relocate1/README
create mode 100644 src/test/test-disarm-relocate1/cmdlist
create mode 100644 src/test/test-disarm-relocate1/hardware_status
create mode 100644 src/test/test-disarm-relocate1/log.expect
create mode 100644 src/test/test-disarm-relocate1/manager_status
create mode 100644 src/test/test-disarm-relocate1/service_config
diff --git a/src/PVE/HA/CRM.pm b/src/PVE/HA/CRM.pm
index 2739763..a76cf67 100644
--- a/src/PVE/HA/CRM.pm
+++ b/src/PVE/HA/CRM.pm
@@ -104,9 +104,17 @@ sub get_protected_ha_manager_lock {
if ($haenv->get_ha_manager_lock()) {
if ($self->{ha_manager_wd}) {
$haenv->watchdog_update($self->{ha_manager_wd});
- } else {
- my $wfh = $haenv->watchdog_open();
- $self->{ha_manager_wd} = $wfh;
+ } elsif (!$self->{disarmed}) {
+ # check on-disk disarm state to avoid briefly opening a watchdog when taking
+ # over as new master while the stack is already fully disarmed
+ my $ms = eval { $haenv->read_manager_status() };
+ if ($ms && $ms->{disarm} && $ms->{disarm}->{state} eq 'disarmed') {
+ $haenv->log('info', "taking over as disarmed master, skipping watchdog");
+ $self->{disarmed} = 1;
+ } else {
+ my $wfh = $haenv->watchdog_open();
+ $self->{ha_manager_wd} = $wfh;
+ }
}
return 1;
}
@@ -211,6 +219,10 @@ sub can_get_active {
if (scalar($ss->%*)) {
return 1; # need to get active to clean up stale service status entries
}
+
+ if ($manager_status->{disarm}) {
+ return 1; # stay active while HA stack is disarmed
+ }
}
return 0; # no services, no node in maintenance mode, and no crm cmds -> can stay idle
}
@@ -232,6 +244,9 @@ sub allowed_to_get_idle {
my $manager_status = get_manager_status_guarded($haenv);
return 0 if !$self->is_cluster_and_ha_healthy($manager_status);
+ # don't go idle while HA stack is disarmed - need to stay active to process arm-ha
+ return 0 if $manager_status->{disarm};
+
my $conf = eval { $haenv->read_service_config() };
if (my $err = $@) {
$haenv->log('err', "could not read service config: $err");
@@ -379,6 +394,18 @@ sub work {
}
$manager->manage();
+
+ if ($manager->is_fully_disarmed()) {
+ if (!$self->{disarmed}) {
+ $haenv->log('info', "HA stack fully disarmed, releasing CRM watchdog");
+ give_up_watchdog_protection($self);
+ $self->{disarmed} = 1;
+ }
+ } elsif ($self->{disarmed}) {
+ $haenv->log('info', "re-arming HA stack");
+ $self->{disarmed} = 0;
+ # watchdog will be re-opened by get_protected_ha_manager_lock next iteration
+ }
}
};
if (my $err = $@) {
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 19eec2a..ad7f8a4 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -362,6 +362,11 @@ my $service_check_ha_state = sub {
if (!defined($has_state)) {
# ignored service behave as if they were not managed by HA
return 0 if defined($d->{state}) && $d->{state} eq 'ignored';
+ # cluster-wide disarm with ignore mode - resources can be managed directly
+ my $ms = cfs_read_file($manager_status_filename);
+ if (my $disarm = $ms->{disarm}) {
+ return 0 if $disarm->{mode} eq 'ignore';
+ }
return 1;
}
diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index 09a965c..9545018 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -37,7 +37,7 @@ sub new {
restart_tries => {},
shutdown_request => 0,
shutdown_errors => 0,
- # mode can be: active, reboot, shutdown, restart, maintenance
+ # mode can be: active, reboot, shutdown, restart, maintenance, disarm
mode => 'active',
cluster_state_update => 0,
active_idle_rounds => 0,
@@ -212,7 +212,9 @@ sub update_service_status {
my $request = $ms->{node_request}->{$nodename} // {};
if ($request->{maintenance}) {
$self->{mode} = 'maintenance';
- } elsif ($self->{mode} eq 'maintenance') {
+ } elsif ($ms->{disarm}) {
+ $self->{mode} = 'disarm';
+ } elsif ($self->{mode} eq 'maintenance' || $self->{mode} eq 'disarm') {
$self->{mode} = 'active';
}
}
@@ -359,7 +361,9 @@ sub work {
my $service_count = $self->active_service_count();
- if (!$fence_request && $service_count && $haenv->quorate()) {
+ if ($self->{mode} eq 'disarm') {
+ # stay idle while disarmed, don't acquire lock
+ } elsif (!$fence_request && $service_count && $haenv->quorate()) {
if ($self->get_protected_ha_agent_lock()) {
$self->set_local_status({ state => 'active' });
}
@@ -382,6 +386,13 @@ sub work {
$self->set_local_status({ state => 'lost_agent_lock' });
} elsif ($self->is_maintenance_requested()) {
$self->set_local_status({ state => 'maintenance' });
+ } elsif ($self->{mode} eq 'disarm' && !$self->run_workers()) {
+ $haenv->log('info', "HA disarm requested, releasing agent lock and watchdog");
+ # safety: disarming requested, no fence request (handled in earlier if-branch) and no
+ # running workers anymore, so safe to go idle.
+ $haenv->release_ha_agent_lock();
+ give_up_watchdog_protection($self);
+ $self->set_local_status({ state => 'wait_for_agent_lock' });
} else {
if (!$self->has_configured_service_on_local_node() && !$self->run_workers()) {
# no active service configured for this node and all (old) workers are done
@@ -409,6 +420,20 @@ sub work {
"node needs to be fenced during maintenance mode - releasing agent_lock\n",
);
$self->set_local_status({ state => 'lost_agent_lock' });
+ } elsif (
+ $self->{mode} eq 'disarm'
+ && !$self->active_service_count()
+ && !$self->run_workers()
+ ) {
+ # disarm takes priority - release lock and watchdog, go idle
+ $haenv->log(
+ 'info',
+ "HA disarm requested during maintenance, releasing agent lock and watchdog",
+ );
+ # safety: no active services and no running workers, so safe to go idle.
+ $haenv->release_ha_agent_lock();
+ give_up_watchdog_protection($self);
+ $self->set_local_status({ state => 'wait_for_agent_lock' });
} elsif ($self->active_service_count() || $self->run_workers()) {
# keep the lock and watchdog as long as not all services cleared the node
if (!$self->get_protected_ha_agent_lock()) {
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index b1dbe6a..aa29858 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -75,6 +75,9 @@ sub new {
# on change of active master.
$self->{ms}->{node_request} = $old_ms->{node_request} if defined($old_ms->{node_request});
+ # preserve disarm state across CRM master changes
+ $self->{ms}->{disarm} = $old_ms->{disarm} if defined($old_ms->{disarm});
+
$self->update_crs_scheduler_mode(); # initial set, we update it once every loop
return $self;
@@ -472,7 +475,12 @@ sub update_crm_commands {
my $node = $1;
my $state = $ns->get_node_state($node);
- if ($state eq 'online') {
+ if ($ms->{disarm}) {
+ $haenv->log(
+ 'warn',
+ "ignoring maintenance command for node $node - HA stack is disarmed",
+ );
+ } elsif ($state eq 'online') {
$ms->{node_request}->{$node}->{maintenance} = 1;
} elsif ($state eq 'maintenance') {
$haenv->log(
@@ -493,6 +501,51 @@ sub update_crm_commands {
);
}
delete $ms->{node_request}->{$node}->{maintenance}; # gets flushed out at the end of the CRM loop
+ } elsif ($cmd =~ m/^disarm-ha\s+(freeze|ignore)$/) {
+ my $mode = $1;
+
+ if ($ms->{disarm}) {
+ $haenv->log(
+ 'notice',
+ "ignoring disarm-ha command - already in disarm state ($ms->{disarm}->{state})",
+ );
+ } else {
+ $haenv->log('info', "got crm command: disarm-ha $mode");
+ if ($mode eq 'ignore') {
+ for my $sid (sort keys %$ss) {
+ $haenv->log(
+ 'info', "disarm: suspending HA tracking for service '$sid'",
+ );
+ }
+ }
+ $ms->{disarm} = { mode => $mode, state => 'disarming' };
+ }
+ } elsif ($cmd =~ m/^arm-ha$/) {
+ if ($ms->{disarm}) {
+ $haenv->log('info', "got crm command: arm-ha");
+
+ # recheck node info after ignore mode, as services may have been manually
+ # migrated while HA tracking was suspended
+ if ($ms->{disarm}->{mode} eq 'ignore') {
+ my $sc = $haenv->read_service_config();
+ for my $sid (sort keys %$ss) {
+ my $cd = $sc->{$sid};
+ next if !$cd;
+ next if $cd->{node} eq $ss->{$sid}->{node};
+ $haenv->log(
+ 'info',
+ "service '$sid': updating node"
+ . " '$ss->{$sid}->{node}' => '$cd->{node}'"
+ . " (changed while disarmed)",
+ );
+ $ss->{$sid}->{node} = $cd->{node};
+ }
+ }
+
+ delete $ms->{disarm};
+ } else {
+ $haenv->log('notice', "ignoring arm-ha command - HA stack is not disarmed");
+ }
} else {
$haenv->log('err', "unable to parse crm command: $cmd");
}
@@ -631,6 +684,94 @@ sub try_persistent_group_migration {
}
}
+sub handle_disarm {
+ my ($self, $disarm, $ss, $lrm_modes) = @_;
+
+ my $haenv = $self->{haenv};
+ my $ns = $self->{ns};
+
+ # defer disarm if any services are in a transient state that needs the state machine to resolve
+ for my $sid (sort keys %$ss) {
+ my $state = $ss->{$sid}->{state};
+ if ($state eq 'fence' || $state eq 'recovery') {
+ $haenv->log(
+ 'warn', "deferring disarm - service '$sid' is in '$state' state",
+ );
+ return 0; # let manage() continue so fence/recovery can progress
+ }
+ if ($state eq 'migrate' || $state eq 'relocate') {
+ $haenv->log(
+ 'info', "deferring disarm - service '$sid' is in '$state' state",
+ );
+ return 0; # let manage() continue so migration can complete
+ }
+ }
+
+ my $mode = $disarm->{mode};
+
+ # prune stale runtime data (failed_nodes, cmd, target, ...) so the state machine starts
+ # fresh on re-arm; preserve maintenance_node for correct return behavior
+ my %keep_keys = map { $_ => 1 } qw(state node uid maintenance_node);
+
+ if ($mode eq 'freeze') {
+ for my $sid (sort keys %$ss) {
+ my $sd = $ss->{$sid};
+ my $state = $sd->{state};
+ next if $state eq 'freeze'; # already frozen
+ if (
+ $state eq 'started'
+ || $state eq 'stopped'
+ || $state eq 'request_stop'
+ || $state eq 'request_start'
+ || $state eq 'request_start_balance'
+ || $state eq 'error'
+ ) {
+ $haenv->log('info', "disarm: freezing service '$sid' (was '$state')");
+ delete $sd->{$_} for grep { !$keep_keys{$_} } keys %$sd;
+ $sd->{state} = 'freeze';
+ $sd->{uid} = compute_new_uuid('freeze');
+ }
+ }
+ } elsif ($mode eq 'ignore') {
+ # keep $ss intact; the disarm flag in $ms causes service loops and vm_is_ha_managed()
+ # to skip these services while disarmed
+ for my $sid (sort keys %$ss) {
+ my $sd = $ss->{$sid};
+ delete $sd->{$_} for grep { !$keep_keys{$_} } keys %$sd;
+ }
+ }
+
+ # check if all online LRMs have entered disarm mode
+ my $all_disarmed = 1;
+ my $online_nodes = $ns->list_online_nodes();
+
+ for my $node (@$online_nodes) {
+ my $lrm_mode = $lrm_modes->{$node} // 'unknown';
+ if ($lrm_mode ne 'disarm') {
+ $all_disarmed = 0;
+ last;
+ }
+ }
+
+ if ($all_disarmed && $disarm->{state} ne 'disarmed') {
+ $haenv->log('info', "all LRMs disarmed, HA stack is now fully disarmed");
+ $disarm->{state} = 'disarmed';
+ }
+
+ # once disarmed, stay disarmed - a returning node's LRM will catch up within one cycle
+ $self->{all_lrms_disarmed} = $disarm->{state} eq 'disarmed';
+
+ $self->flush_master_status();
+
+ return 1;
+}
+
+sub is_fully_disarmed {
+ my ($self) = @_;
+
+ return $self->{all_lrms_disarmed};
+}
+
sub manage {
my ($self) = @_;
@@ -657,31 +798,34 @@ sub manage {
# compute new service status
- # add new service
- foreach my $sid (sort keys %$sc) {
- next if $ss->{$sid}; # already there
- my $cd = $sc->{$sid};
- next if $cd->{state} eq 'ignored';
+ # skip service add/remove when disarmed - handle_disarm manages service status
+ if (!$ms->{disarm}) {
+ # add new service
+ foreach my $sid (sort keys %$sc) {
+ next if $ss->{$sid}; # already there
+ my $cd = $sc->{$sid};
+ next if $cd->{state} eq 'ignored';
- $haenv->log('info', "adding new service '$sid' on node '$cd->{node}'");
- # assume we are running to avoid relocate running service at add
- my $state = ($cd->{state} eq 'started') ? 'request_start' : 'request_stop';
- $ss->{$sid} = {
- state => $state,
- node => $cd->{node},
- uid => compute_new_uuid('started'),
- };
- }
+ $haenv->log('info', "adding new service '$sid' on node '$cd->{node}'");
+ # assume we are running to avoid relocate running service at add
+ my $state = ($cd->{state} eq 'started') ? 'request_start' : 'request_stop';
+ $ss->{$sid} = {
+ state => $state,
+ node => $cd->{node},
+ uid => compute_new_uuid('started'),
+ };
+ }
- # remove stale or ignored services from manager state
- foreach my $sid (keys %$ss) {
- next if $sc->{$sid} && $sc->{$sid}->{state} ne 'ignored';
+ # remove stale or ignored services from manager state
+ foreach my $sid (keys %$ss) {
+ next if $sc->{$sid} && $sc->{$sid}->{state} ne 'ignored';
- my $reason = defined($sc->{$sid}) ? 'ignored state requested' : 'no config';
- $haenv->log('info', "removing stale service '$sid' ($reason)");
+ my $reason = defined($sc->{$sid}) ? 'ignored state requested' : 'no config';
+ $haenv->log('info', "removing stale service '$sid' ($reason)");
- # remove all service related state information
- delete $ss->{$sid};
+ # remove all service related state information
+ delete $ss->{$sid};
+ }
}
$self->recompute_online_node_usage();
@@ -713,6 +857,15 @@ sub manage {
$self->update_crm_commands();
+ if (my $disarm = $ms->{disarm}) {
+ if ($self->handle_disarm($disarm, $ss, $lrm_modes)) {
+ return; # disarm active and progressing, skip normal service state machine
+ }
+ # disarm deferred (e.g. due to active fencing) - fall through to let it complete
+ }
+
+ $self->{all_lrms_disarmed} = 0;
+
for (;;) {
my $repeat = 0;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 301c391..f6857bd 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -835,6 +835,15 @@ sub sim_hardware_cmd {
|| $action eq 'disable-node-maintenance'
) {
$self->queue_crm_commands_nolock("$action $node");
+ } elsif ($action eq 'disarm-ha') {
+ my $mode = $params[0];
+
+ die "sim_hardware_cmd: unknown resource mode '$mode'\n"
+ if $mode !~ m/^(freeze|ignore)$/;
+
+ $self->queue_crm_commands_nolock("disarm-ha $mode");
+ } elsif ($action eq 'arm-ha') {
+ $self->queue_crm_commands_nolock("arm-ha");
} else {
die "sim_hardware_cmd: unknown action '$action'";
}
@@ -892,10 +901,17 @@ sub sim_hardware_cmd {
my $current_node = $conf->{$sid}->{node}
|| die "sim_hardware_cmd: service '$sid' has no node\n";
- die "sim_hardware_cmd: manual-migrate requires service"
- . " in 'ignored' state\n"
- if !defined($conf->{$sid}->{state})
- || $conf->{$sid}->{state} ne 'ignored';
+ my $svc_ignored = defined($conf->{$sid}->{state})
+ && $conf->{$sid}->{state} eq 'ignored';
+
+ my $ms = PVE::HA::Tools::read_json_from_file(
+ "$self->{statusdir}/manager_status", {},
+ );
+ my $disarm_ignored = $ms->{disarm} && $ms->{disarm}->{mode} eq 'ignore';
+
+ die "sim_hardware_cmd: manual-migrate requires service in"
+ . " 'ignored' state or disarm-ha ignore mode\n"
+ if !$svc_ignored && !$disarm_ignored;
$self->change_service_location($sid, $current_node, $param);
diff --git a/src/test/test-disarm-crm-stop1/README b/src/test/test-disarm-crm-stop1/README
new file mode 100644
index 0000000..5f81497
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/README
@@ -0,0 +1,13 @@
+Test CRM master takeover while HA stack is fully disarmed.
+
+Verify that when the CRM master is stopped during full disarm, a slave
+takes over cleanly without briefly opening a watchdog, and that arming
+via the new master works correctly.
+
+1. Start 3 nodes with services
+2. Disarm HA with freeze resource mode
+3. Wait for full disarm (all LRMs disarmed, CRM watchdog released)
+4. Stop CRM on master node (node1)
+5. Slave on node2 takes over as new master, preserving disarm state
+6. Arm HA via new master
+7. Services unfreeze and resume normal operation
diff --git a/src/test/test-disarm-crm-stop1/cmdlist b/src/test/test-disarm-crm-stop1/cmdlist
new file mode 100644
index 0000000..01e9cd9
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/cmdlist
@@ -0,0 +1,6 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "crm node1 disarm-ha freeze" ],
+ [ "crm node1 stop" ],
+ [ "crm node2 arm-ha" ]
+]
\ No newline at end of file
diff --git a/src/test/test-disarm-crm-stop1/hardware_status b/src/test/test-disarm-crm-stop1/hardware_status
new file mode 100644
index 0000000..4990fd0
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
\ No newline at end of file
diff --git a/src/test/test-disarm-crm-stop1/log.expect b/src/test/test-disarm-crm-stop1/log.expect
new file mode 100644
index 0000000..880008f
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 40 node1/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info 40 node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info 120 cmdlist: execute crm node1 disarm-ha freeze
+info 120 node1/crm: got crm command: disarm-ha freeze
+info 120 node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info 120 node1/crm: disarm: freezing service 'vm:102' (was 'stopped')
+info 120 node1/crm: disarm: freezing service 'vm:103' (was 'stopped')
+info 121 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 121 node1/lrm: status change active => wait_for_agent_lock
+info 123 node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info 123 node2/lrm: status change active => wait_for_agent_lock
+info 125 node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info 125 node3/lrm: status change active => wait_for_agent_lock
+info 140 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 140 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 220 cmdlist: execute crm node1 stop
+info 220 node1/crm: server received shutdown request
+info 220 node1/crm: voluntary release CRM lock
+info 221 node1/crm: exit (loop end)
+info 222 node2/crm: got lock 'ha_manager_lock'
+info 222 node2/crm: taking over as disarmed master, skipping watchdog
+info 222 node2/crm: status change slave => master
+info 320 cmdlist: execute crm node2 arm-ha
+info 321 node2/crm: got crm command: arm-ha
+info 321 node2/crm: re-arming HA stack
+info 341 node2/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info 341 node2/crm: service 'vm:102': state changed from 'freeze' to 'request_stop'
+info 341 node2/crm: service 'vm:103': state changed from 'freeze' to 'request_stop'
+info 342 node2/lrm: got lock 'ha_agent_node2_lock'
+info 342 node2/lrm: status change wait_for_agent_lock => active
+info 344 node3/lrm: got lock 'ha_agent_node3_lock'
+info 344 node3/lrm: status change wait_for_agent_lock => active
+info 360 node1/lrm: got lock 'ha_agent_node1_lock'
+info 360 node1/lrm: status change wait_for_agent_lock => active
+info 361 node2/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info 361 node2/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info 920 hardware: exit simulation - done
diff --git a/src/test/test-disarm-crm-stop1/manager_status b/src/test/test-disarm-crm-stop1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-crm-stop1/service_config b/src/test/test-disarm-crm-stop1/service_config
new file mode 100644
index 0000000..c2ddbce
--- /dev/null
+++ b/src/test/test-disarm-crm-stop1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "stopped" },
+ "vm:103": { "node": "node3", "state": "disabled" }
+}
\ No newline at end of file
diff --git a/src/test/test-disarm-double1/cmdlist b/src/test/test-disarm-double1/cmdlist
new file mode 100644
index 0000000..36ecb69
--- /dev/null
+++ b/src/test/test-disarm-double1/cmdlist
@@ -0,0 +1,7 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "crm node1 disarm-ha freeze" ],
+ [ "crm node1 disarm-ha ignore" ],
+ [ "crm node1 arm-ha" ],
+ [ "crm node1 arm-ha" ]
+]
diff --git a/src/test/test-disarm-double1/hardware_status b/src/test/test-disarm-double1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-double1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-double1/log.expect b/src/test/test-disarm-double1/log.expect
new file mode 100644
index 0000000..c45f586
--- /dev/null
+++ b/src/test/test-disarm-double1/log.expect
@@ -0,0 +1,53 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute crm node1 disarm-ha freeze
+info 120 node1/crm: got crm command: disarm-ha freeze
+info 120 node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info 120 node1/crm: disarm: freezing service 'vm:102' (was 'started')
+info 121 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 121 node1/lrm: status change active => wait_for_agent_lock
+info 123 node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info 123 node2/lrm: status change active => wait_for_agent_lock
+info 140 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 140 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 220 cmdlist: execute crm node1 disarm-ha ignore
+noti 220 node1/crm: ignoring disarm-ha command - already in disarm state (disarmed)
+info 320 cmdlist: execute crm node1 arm-ha
+info 320 node1/crm: got crm command: arm-ha
+info 320 node1/crm: re-arming HA stack
+info 340 node1/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info 340 node1/crm: service 'vm:102': state changed from 'freeze' to 'started'
+info 341 node1/lrm: got lock 'ha_agent_node1_lock'
+info 341 node1/lrm: status change wait_for_agent_lock => active
+info 343 node2/lrm: got lock 'ha_agent_node2_lock'
+info 343 node2/lrm: status change wait_for_agent_lock => active
+info 420 cmdlist: execute crm node1 arm-ha
+noti 420 node1/crm: ignoring arm-ha command - HA stack is not disarmed
+info 1020 hardware: exit simulation - done
diff --git a/src/test/test-disarm-double1/manager_status b/src/test/test-disarm-double1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-double1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-double1/service_config b/src/test/test-disarm-double1/service_config
new file mode 100644
index 0000000..0336d09
--- /dev/null
+++ b/src/test/test-disarm-double1/service_config
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-disarm-failing-service1/cmdlist b/src/test/test-disarm-failing-service1/cmdlist
new file mode 100644
index 0000000..97ae018
--- /dev/null
+++ b/src/test/test-disarm-failing-service1/cmdlist
@@ -0,0 +1,6 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service fa:1001 disabled" ],
+ [ "crm node1 disarm-ha freeze" ],
+ [ "crm node1 arm-ha" ]
+]
diff --git a/src/test/test-disarm-failing-service1/hardware_status b/src/test/test-disarm-failing-service1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-failing-service1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-failing-service1/log.expect b/src/test/test-disarm-failing-service1/log.expect
new file mode 100644
index 0000000..eddf8fe
--- /dev/null
+++ b/src/test/test-disarm-failing-service1/log.expect
@@ -0,0 +1,125 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:1001' on node 'node2'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'fa:1001': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service fa:1001
+info 23 node2/lrm: service status fa:1001 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service fa:1001 disabled
+info 120 node1/crm: service 'fa:1001': state changed from 'started' to 'request_stop'
+info 123 node2/lrm: stopping service fa:1001
+info 123 node2/lrm: unable to stop service fa:1001 (still running)
+err 140 node1/crm: service 'fa:1001' stop failed (exit code 1)
+info 140 node1/crm: service 'fa:1001': state changed from 'request_stop' to 'error'
+info 140 node1/crm: service 'fa:1001': state changed from 'error' to 'stopped'
+info 143 node2/lrm: stopping service fa:1001
+info 143 node2/lrm: unable to stop service fa:1001 (still running)
+info 163 node2/lrm: stopping service fa:1001
+info 163 node2/lrm: unable to stop service fa:1001 (still running)
+info 183 node2/lrm: stopping service fa:1001
+info 183 node2/lrm: unable to stop service fa:1001 (still running)
+info 203 node2/lrm: stopping service fa:1001
+info 203 node2/lrm: unable to stop service fa:1001 (still running)
+info 220 cmdlist: execute crm node1 disarm-ha freeze
+info 220 node1/crm: got crm command: disarm-ha freeze
+info 220 node1/crm: disarm: freezing service 'fa:1001' (was 'stopped')
+info 220 node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info 221 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 221 node1/lrm: status change active => wait_for_agent_lock
+info 223 node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info 223 node2/lrm: status change active => wait_for_agent_lock
+info 240 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 240 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 320 cmdlist: execute crm node1 arm-ha
+info 320 node1/crm: got crm command: arm-ha
+info 320 node1/crm: re-arming HA stack
+info 340 node1/crm: service 'fa:1001': state changed from 'freeze' to 'request_stop'
+info 340 node1/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info 341 node1/lrm: got lock 'ha_agent_node1_lock'
+info 341 node1/lrm: status change wait_for_agent_lock => active
+info 343 node2/lrm: got lock 'ha_agent_node2_lock'
+info 343 node2/lrm: status change wait_for_agent_lock => active
+info 343 node2/lrm: stopping service fa:1001
+info 343 node2/lrm: unable to stop service fa:1001 (still running)
+err 360 node1/crm: service 'fa:1001' stop failed (exit code 1)
+info 360 node1/crm: service 'fa:1001': state changed from 'request_stop' to 'error'
+info 360 node1/crm: service 'fa:1001': state changed from 'error' to 'stopped'
+info 363 node2/lrm: stopping service fa:1001
+info 363 node2/lrm: unable to stop service fa:1001 (still running)
+info 383 node2/lrm: stopping service fa:1001
+info 383 node2/lrm: unable to stop service fa:1001 (still running)
+info 403 node2/lrm: stopping service fa:1001
+info 403 node2/lrm: unable to stop service fa:1001 (still running)
+info 423 node2/lrm: stopping service fa:1001
+info 423 node2/lrm: unable to stop service fa:1001 (still running)
+info 443 node2/lrm: stopping service fa:1001
+info 443 node2/lrm: unable to stop service fa:1001 (still running)
+info 463 node2/lrm: stopping service fa:1001
+info 463 node2/lrm: unable to stop service fa:1001 (still running)
+info 483 node2/lrm: stopping service fa:1001
+info 483 node2/lrm: unable to stop service fa:1001 (still running)
+info 503 node2/lrm: stopping service fa:1001
+info 503 node2/lrm: unable to stop service fa:1001 (still running)
+info 523 node2/lrm: stopping service fa:1001
+info 523 node2/lrm: unable to stop service fa:1001 (still running)
+info 543 node2/lrm: stopping service fa:1001
+info 543 node2/lrm: unable to stop service fa:1001 (still running)
+info 563 node2/lrm: stopping service fa:1001
+info 563 node2/lrm: unable to stop service fa:1001 (still running)
+info 583 node2/lrm: stopping service fa:1001
+info 583 node2/lrm: unable to stop service fa:1001 (still running)
+info 603 node2/lrm: stopping service fa:1001
+info 603 node2/lrm: unable to stop service fa:1001 (still running)
+info 623 node2/lrm: stopping service fa:1001
+info 623 node2/lrm: unable to stop service fa:1001 (still running)
+info 643 node2/lrm: stopping service fa:1001
+info 643 node2/lrm: unable to stop service fa:1001 (still running)
+info 663 node2/lrm: stopping service fa:1001
+info 663 node2/lrm: unable to stop service fa:1001 (still running)
+info 683 node2/lrm: stopping service fa:1001
+info 683 node2/lrm: unable to stop service fa:1001 (still running)
+info 703 node2/lrm: stopping service fa:1001
+info 703 node2/lrm: unable to stop service fa:1001 (still running)
+info 723 node2/lrm: stopping service fa:1001
+info 723 node2/lrm: unable to stop service fa:1001 (still running)
+info 743 node2/lrm: stopping service fa:1001
+info 743 node2/lrm: unable to stop service fa:1001 (still running)
+info 763 node2/lrm: stopping service fa:1001
+info 763 node2/lrm: unable to stop service fa:1001 (still running)
+info 783 node2/lrm: stopping service fa:1001
+info 783 node2/lrm: unable to stop service fa:1001 (still running)
+info 803 node2/lrm: stopping service fa:1001
+info 803 node2/lrm: unable to stop service fa:1001 (still running)
+info 823 node2/lrm: stopping service fa:1001
+info 823 node2/lrm: unable to stop service fa:1001 (still running)
+info 843 node2/lrm: stopping service fa:1001
+info 843 node2/lrm: unable to stop service fa:1001 (still running)
+info 863 node2/lrm: stopping service fa:1001
+info 863 node2/lrm: unable to stop service fa:1001 (still running)
+info 883 node2/lrm: stopping service fa:1001
+info 883 node2/lrm: unable to stop service fa:1001 (still running)
+info 903 node2/lrm: stopping service fa:1001
+info 903 node2/lrm: unable to stop service fa:1001 (still running)
+info 920 hardware: exit simulation - done
diff --git a/src/test/test-disarm-failing-service1/manager_status b/src/test/test-disarm-failing-service1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-failing-service1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-failing-service1/service_config b/src/test/test-disarm-failing-service1/service_config
new file mode 100644
index 0000000..f4f00e0
--- /dev/null
+++ b/src/test/test-disarm-failing-service1/service_config
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "fa:1001": { "node": "node2", "state": "enabled" }
+}
diff --git a/src/test/test-disarm-fence1/cmdlist b/src/test/test-disarm-fence1/cmdlist
new file mode 100644
index 0000000..7473615
--- /dev/null
+++ b/src/test/test-disarm-fence1/cmdlist
@@ -0,0 +1,9 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off" ],
+ [ "crm node1 disarm-ha freeze" ],
+ [],
+ [],
+ [],
+ [ "crm node1 arm-ha" ]
+]
diff --git a/src/test/test-disarm-fence1/hardware_status b/src/test/test-disarm-fence1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-fence1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-fence1/log.expect b/src/test/test-disarm-fence1/log.expect
new file mode 100644
index 0000000..9a56c5d
--- /dev/null
+++ b/src/test/test-disarm-fence1/log.expect
@@ -0,0 +1,78 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 40 node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute crm node1 disarm-ha freeze
+info 220 node1/crm: got crm command: disarm-ha freeze
+warn 220 node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info 221 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 221 node1/lrm: status change active => wait_for_agent_lock
+info 223 node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info 223 node3/lrm: status change active => wait_for_agent_lock
+warn 240 node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node3)
+info 260 node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info 260 node1/crm: disarm: freezing service 'vm:102' (was 'started')
+info 260 node1/crm: disarm: freezing service 'vm:103' (was 'stopped')
+info 260 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 260 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 620 cmdlist: execute crm node1 arm-ha
+info 620 node1/crm: got crm command: arm-ha
+info 620 node1/crm: re-arming HA stack
+info 640 node1/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info 640 node1/crm: service 'vm:102': state changed from 'freeze' to 'started'
+info 640 node1/crm: service 'vm:103': state changed from 'freeze' to 'request_stop'
+info 641 node1/lrm: got lock 'ha_agent_node1_lock'
+info 641 node1/lrm: status change wait_for_agent_lock => active
+info 643 node3/lrm: got lock 'ha_agent_node3_lock'
+info 643 node3/lrm: status change wait_for_agent_lock => active
+info 643 node3/lrm: starting service vm:102
+info 643 node3/lrm: service status vm:102 started
+info 660 node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info 1220 hardware: exit simulation - done
diff --git a/src/test/test-disarm-fence1/manager_status b/src/test/test-disarm-fence1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-fence1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-fence1/service_config b/src/test/test-disarm-fence1/service_config
new file mode 100644
index 0000000..0487834
--- /dev/null
+++ b/src/test/test-disarm-fence1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "stopped" }
+}
diff --git a/src/test/test-disarm-frozen1/README b/src/test/test-disarm-frozen1/README
new file mode 100644
index 0000000..e68ea2c
--- /dev/null
+++ b/src/test/test-disarm-frozen1/README
@@ -0,0 +1,10 @@
+Test disarm-ha with freeze resource mode.
+
+Verify the full disarm cycle:
+1. Start 3 nodes with services
+2. Disarm HA with freeze resource mode
+3. All services should transition to freeze state
+4. LRMs should release locks and watchdogs (disarm mode)
+5. CRM should release watchdog once all LRMs disarmed
+6. Arm HA again
+7. Services should unfreeze and resume normal operation
diff --git a/src/test/test-disarm-frozen1/cmdlist b/src/test/test-disarm-frozen1/cmdlist
new file mode 100644
index 0000000..e6fc192
--- /dev/null
+++ b/src/test/test-disarm-frozen1/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "crm node1 disarm-ha freeze" ],
+ [ "crm node1 arm-ha" ]
+]
\ No newline at end of file
diff --git a/src/test/test-disarm-frozen1/hardware_status b/src/test/test-disarm-frozen1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-frozen1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-frozen1/log.expect b/src/test/test-disarm-frozen1/log.expect
new file mode 100644
index 0000000..206f14e
--- /dev/null
+++ b/src/test/test-disarm-frozen1/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 40 node1/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info 40 node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info 120 cmdlist: execute crm node1 disarm-ha freeze
+info 120 node1/crm: got crm command: disarm-ha freeze
+info 120 node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info 120 node1/crm: disarm: freezing service 'vm:102' (was 'stopped')
+info 120 node1/crm: disarm: freezing service 'vm:103' (was 'stopped')
+info 121 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 121 node1/lrm: status change active => wait_for_agent_lock
+info 123 node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info 123 node2/lrm: status change active => wait_for_agent_lock
+info 125 node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info 125 node3/lrm: status change active => wait_for_agent_lock
+info 140 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 140 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 220 cmdlist: execute crm node1 arm-ha
+info 220 node1/crm: got crm command: arm-ha
+info 220 node1/crm: re-arming HA stack
+info 240 node1/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info 240 node1/crm: service 'vm:102': state changed from 'freeze' to 'request_stop'
+info 240 node1/crm: service 'vm:103': state changed from 'freeze' to 'request_stop'
+info 241 node1/lrm: got lock 'ha_agent_node1_lock'
+info 241 node1/lrm: status change wait_for_agent_lock => active
+info 243 node2/lrm: got lock 'ha_agent_node2_lock'
+info 243 node2/lrm: status change wait_for_agent_lock => active
+info 245 node3/lrm: got lock 'ha_agent_node3_lock'
+info 245 node3/lrm: status change wait_for_agent_lock => active
+info 260 node1/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info 260 node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-disarm-frozen1/manager_status b/src/test/test-disarm-frozen1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-frozen1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-frozen1/service_config b/src/test/test-disarm-frozen1/service_config
new file mode 100644
index 0000000..c2ddbce
--- /dev/null
+++ b/src/test/test-disarm-frozen1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "stopped" },
+ "vm:103": { "node": "node3", "state": "disabled" }
+}
\ No newline at end of file
diff --git a/src/test/test-disarm-ignored1/README b/src/test/test-disarm-ignored1/README
new file mode 100644
index 0000000..12bed55
--- /dev/null
+++ b/src/test/test-disarm-ignored1/README
@@ -0,0 +1,10 @@
+Test disarm-ha with ignore resource mode.
+
+Verify the full disarm cycle with ignore resource mode:
+1. Start 3 nodes with services
+2. Disarm HA with ignore resource mode
+3. Services are suspended from HA tracking but kept in service status
+4. LRMs should release locks and watchdogs (disarm mode)
+5. CRM should release watchdog once all LRMs disarmed
+6. Arm HA again
+7. Services resume normal HA tracking with their preserved states
diff --git a/src/test/test-disarm-ignored1/cmdlist b/src/test/test-disarm-ignored1/cmdlist
new file mode 100644
index 0000000..b8a0c04
--- /dev/null
+++ b/src/test/test-disarm-ignored1/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "crm node1 disarm-ha ignore" ],
+ [ "crm node1 arm-ha" ]
+]
\ No newline at end of file
diff --git a/src/test/test-disarm-ignored1/hardware_status b/src/test/test-disarm-ignored1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-ignored1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-ignored1/log.expect b/src/test/test-disarm-ignored1/log.expect
new file mode 100644
index 0000000..2577dff
--- /dev/null
+++ b/src/test/test-disarm-ignored1/log.expect
@@ -0,0 +1,50 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 40 node1/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info 40 node1/crm: service 'vm:103': state changed from 'request_stop' to 'stopped'
+info 120 cmdlist: execute crm node1 disarm-ha ignore
+info 120 node1/crm: got crm command: disarm-ha ignore
+info 120 node1/crm: disarm: suspending HA tracking for service 'vm:101'
+info 120 node1/crm: disarm: suspending HA tracking for service 'vm:102'
+info 120 node1/crm: disarm: suspending HA tracking for service 'vm:103'
+info 121 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 121 node1/lrm: status change active => wait_for_agent_lock
+info 123 node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info 123 node2/lrm: status change active => wait_for_agent_lock
+info 125 node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info 125 node3/lrm: status change active => wait_for_agent_lock
+info 140 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 140 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 220 cmdlist: execute crm node1 arm-ha
+info 220 node1/crm: got crm command: arm-ha
+info 220 node1/crm: re-arming HA stack
+info 221 node1/lrm: got lock 'ha_agent_node1_lock'
+info 221 node1/lrm: status change wait_for_agent_lock => active
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-disarm-ignored1/manager_status b/src/test/test-disarm-ignored1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-ignored1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-ignored1/service_config b/src/test/test-disarm-ignored1/service_config
new file mode 100644
index 0000000..c2ddbce
--- /dev/null
+++ b/src/test/test-disarm-ignored1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "stopped" },
+ "vm:103": { "node": "node3", "state": "disabled" }
+}
\ No newline at end of file
diff --git a/src/test/test-disarm-ignored2/cmdlist b/src/test/test-disarm-ignored2/cmdlist
new file mode 100644
index 0000000..eecd37d
--- /dev/null
+++ b/src/test/test-disarm-ignored2/cmdlist
@@ -0,0 +1,6 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "crm node1 disarm-ha ignore" ],
+ [ "service vm:103 manual-migrate node1" ],
+ [ "crm node1 arm-ha" ]
+]
diff --git a/src/test/test-disarm-ignored2/hardware_status b/src/test/test-disarm-ignored2/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-ignored2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-ignored2/log.expect b/src/test/test-disarm-ignored2/log.expect
new file mode 100644
index 0000000..5f37869
--- /dev/null
+++ b/src/test/test-disarm-ignored2/log.expect
@@ -0,0 +1,60 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute crm node1 disarm-ha ignore
+info 120 node1/crm: got crm command: disarm-ha ignore
+info 120 node1/crm: disarm: suspending HA tracking for service 'vm:101'
+info 120 node1/crm: disarm: suspending HA tracking for service 'vm:102'
+info 120 node1/crm: disarm: suspending HA tracking for service 'vm:103'
+info 121 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 121 node1/lrm: status change active => wait_for_agent_lock
+info 123 node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info 123 node2/lrm: status change active => wait_for_agent_lock
+info 125 node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info 125 node3/lrm: status change active => wait_for_agent_lock
+info 140 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 140 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 220 cmdlist: execute service vm:103 manual-migrate node1
+info 320 cmdlist: execute crm node1 arm-ha
+info 320 node1/crm: got crm command: arm-ha
+info 320 node1/crm: service 'vm:103': updating node 'node3' => 'node1' (changed while disarmed)
+info 320 node1/crm: re-arming HA stack
+info 321 node1/lrm: got lock 'ha_agent_node1_lock'
+info 321 node1/lrm: status change wait_for_agent_lock => active
+info 321 node1/lrm: starting service vm:103
+info 321 node1/lrm: service status vm:103 started
+info 323 node2/lrm: got lock 'ha_agent_node2_lock'
+info 323 node2/lrm: status change wait_for_agent_lock => active
+info 920 hardware: exit simulation - done
diff --git a/src/test/test-disarm-ignored2/manager_status b/src/test/test-disarm-ignored2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-ignored2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-ignored2/service_config b/src/test/test-disarm-ignored2/service_config
new file mode 100644
index 0000000..4b26f6b
--- /dev/null
+++ b/src/test/test-disarm-ignored2/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-disarm-maintenance1/cmdlist b/src/test/test-disarm-maintenance1/cmdlist
new file mode 100644
index 0000000..6f8a8ea
--- /dev/null
+++ b/src/test/test-disarm-maintenance1/cmdlist
@@ -0,0 +1,7 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "crm node3 enable-node-maintenance" ],
+ [ "crm node1 disarm-ha freeze" ],
+ [ "crm node1 arm-ha" ],
+ [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-disarm-maintenance1/hardware_status b/src/test/test-disarm-maintenance1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-maintenance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-maintenance1/log.expect b/src/test/test-disarm-maintenance1/log.expect
new file mode 100644
index 0000000..b5e0e5b
--- /dev/null
+++ b/src/test/test-disarm-maintenance1/log.expect
@@ -0,0 +1,79 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute crm node3 enable-node-maintenance
+info 125 node3/lrm: status change active => maintenance
+info 140 node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+info 140 node1/crm: migrate service 'vm:103' to node 'node1' (running)
+info 140 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 145 node3/lrm: service vm:103 - start migrate to node 'node1'
+info 145 node3/lrm: service vm:103 - end migrate to node 'node1'
+info 160 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 161 node1/lrm: starting service vm:103
+info 161 node1/lrm: service status vm:103 started
+info 220 cmdlist: execute crm node1 disarm-ha freeze
+info 220 node1/crm: got crm command: disarm-ha freeze
+info 220 node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info 220 node1/crm: disarm: freezing service 'vm:102' (was 'started')
+info 220 node1/crm: disarm: freezing service 'vm:103' (was 'started')
+info 221 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 221 node1/lrm: status change active => wait_for_agent_lock
+info 223 node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info 223 node2/lrm: status change active => wait_for_agent_lock
+info 240 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 240 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 320 cmdlist: execute crm node1 arm-ha
+info 320 node1/crm: got crm command: arm-ha
+info 320 node1/crm: re-arming HA stack
+info 340 node1/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info 340 node1/crm: service 'vm:102': state changed from 'freeze' to 'started'
+info 340 node1/crm: service 'vm:103': state changed from 'freeze' to 'started'
+info 341 node1/lrm: got lock 'ha_agent_node1_lock'
+info 341 node1/lrm: status change wait_for_agent_lock => active
+info 343 node2/lrm: got lock 'ha_agent_node2_lock'
+info 343 node2/lrm: status change wait_for_agent_lock => active
+info 420 cmdlist: execute crm node3 disable-node-maintenance
+info 425 node3/lrm: got lock 'ha_agent_node3_lock'
+info 425 node3/lrm: status change maintenance => active
+info 440 node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info 440 node1/crm: moving service 'vm:103' back to 'node3', node came back from maintenance.
+info 440 node1/crm: migrate service 'vm:103' to node 'node3' (running)
+info 440 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 441 node1/lrm: service vm:103 - start migrate to node 'node3'
+info 441 node1/lrm: service vm:103 - end migrate to node 'node3'
+info 460 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 465 node3/lrm: starting service vm:103
+info 465 node3/lrm: service status vm:103 started
+info 1020 hardware: exit simulation - done
diff --git a/src/test/test-disarm-maintenance1/manager_status b/src/test/test-disarm-maintenance1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-maintenance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-maintenance1/service_config b/src/test/test-disarm-maintenance1/service_config
new file mode 100644
index 0000000..4b26f6b
--- /dev/null
+++ b/src/test/test-disarm-maintenance1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-disarm-maintenance2/cmdlist b/src/test/test-disarm-maintenance2/cmdlist
new file mode 100644
index 0000000..2c85b4d
--- /dev/null
+++ b/src/test/test-disarm-maintenance2/cmdlist
@@ -0,0 +1,7 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "crm node3 enable-node-maintenance" ],
+ [ "crm node1 disarm-ha ignore" ],
+ [ "crm node3 disable-node-maintenance" ],
+ [ "crm node1 arm-ha" ]
+]
diff --git a/src/test/test-disarm-maintenance2/hardware_status b/src/test/test-disarm-maintenance2/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-maintenance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-maintenance2/log.expect b/src/test/test-disarm-maintenance2/log.expect
new file mode 100644
index 0000000..b21b72f
--- /dev/null
+++ b/src/test/test-disarm-maintenance2/log.expect
@@ -0,0 +1,78 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute crm node3 enable-node-maintenance
+info 125 node3/lrm: status change active => maintenance
+info 140 node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+info 140 node1/crm: migrate service 'vm:103' to node 'node1' (running)
+info 140 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 145 node3/lrm: service vm:103 - start migrate to node 'node1'
+info 145 node3/lrm: service vm:103 - end migrate to node 'node1'
+info 160 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 161 node1/lrm: starting service vm:103
+info 161 node1/lrm: service status vm:103 started
+info 220 cmdlist: execute crm node1 disarm-ha ignore
+info 220 node1/crm: got crm command: disarm-ha ignore
+info 220 node1/crm: disarm: suspending HA tracking for service 'vm:101'
+info 220 node1/crm: disarm: suspending HA tracking for service 'vm:102'
+info 220 node1/crm: disarm: suspending HA tracking for service 'vm:103'
+info 221 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 221 node1/lrm: status change active => wait_for_agent_lock
+info 223 node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info 223 node2/lrm: status change active => wait_for_agent_lock
+info 240 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 240 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 320 cmdlist: execute crm node3 disable-node-maintenance
+info 325 node3/lrm: HA disarm requested during maintenance, releasing agent lock and watchdog
+info 325 node3/lrm: status change maintenance => wait_for_agent_lock
+info 340 node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info 420 cmdlist: execute crm node1 arm-ha
+info 420 node1/crm: got crm command: arm-ha
+info 420 node1/crm: moving service 'vm:103' back to 'node3', node came back from maintenance.
+info 420 node1/crm: migrate service 'vm:103' to node 'node3' (running)
+info 420 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 420 node1/crm: re-arming HA stack
+info 421 node1/lrm: got lock 'ha_agent_node1_lock'
+info 421 node1/lrm: status change wait_for_agent_lock => active
+info 421 node1/lrm: service vm:103 - start migrate to node 'node3'
+info 421 node1/lrm: service vm:103 - end migrate to node 'node3'
+info 423 node2/lrm: got lock 'ha_agent_node2_lock'
+info 423 node2/lrm: status change wait_for_agent_lock => active
+info 425 node3/lrm: got lock 'ha_agent_node3_lock'
+info 425 node3/lrm: status change wait_for_agent_lock => active
+info 440 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 445 node3/lrm: starting service vm:103
+info 445 node3/lrm: service status vm:103 started
+info 1020 hardware: exit simulation - done
diff --git a/src/test/test-disarm-maintenance2/manager_status b/src/test/test-disarm-maintenance2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-maintenance2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-maintenance2/service_config b/src/test/test-disarm-maintenance2/service_config
new file mode 100644
index 0000000..4b26f6b
--- /dev/null
+++ b/src/test/test-disarm-maintenance2/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-disarm-maintenance3/cmdlist b/src/test/test-disarm-maintenance3/cmdlist
new file mode 100644
index 0000000..d49095c
--- /dev/null
+++ b/src/test/test-disarm-maintenance3/cmdlist
@@ -0,0 +1,8 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "crm node3 enable-node-maintenance" ],
+ [ "crm node1 disarm-ha ignore" ],
+ [ "service vm:103 manual-migrate node2" ],
+ [ "crm node3 disable-node-maintenance" ],
+ [ "crm node1 arm-ha" ]
+]
diff --git a/src/test/test-disarm-maintenance3/hardware_status b/src/test/test-disarm-maintenance3/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-maintenance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-maintenance3/log.expect b/src/test/test-disarm-maintenance3/log.expect
new file mode 100644
index 0000000..b26f8b8
--- /dev/null
+++ b/src/test/test-disarm-maintenance3/log.expect
@@ -0,0 +1,80 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute crm node3 enable-node-maintenance
+info 125 node3/lrm: status change active => maintenance
+info 140 node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+info 140 node1/crm: migrate service 'vm:103' to node 'node1' (running)
+info 140 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 145 node3/lrm: service vm:103 - start migrate to node 'node1'
+info 145 node3/lrm: service vm:103 - end migrate to node 'node1'
+info 160 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 161 node1/lrm: starting service vm:103
+info 161 node1/lrm: service status vm:103 started
+info 220 cmdlist: execute crm node1 disarm-ha ignore
+info 220 node1/crm: got crm command: disarm-ha ignore
+info 220 node1/crm: disarm: suspending HA tracking for service 'vm:101'
+info 220 node1/crm: disarm: suspending HA tracking for service 'vm:102'
+info 220 node1/crm: disarm: suspending HA tracking for service 'vm:103'
+info 221 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 221 node1/lrm: status change active => wait_for_agent_lock
+info 223 node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info 223 node2/lrm: status change active => wait_for_agent_lock
+info 240 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 240 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 320 cmdlist: execute service vm:103 manual-migrate node2
+info 420 cmdlist: execute crm node3 disable-node-maintenance
+info 425 node3/lrm: HA disarm requested during maintenance, releasing agent lock and watchdog
+info 425 node3/lrm: status change maintenance => wait_for_agent_lock
+info 440 node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info 520 cmdlist: execute crm node1 arm-ha
+info 520 node1/crm: got crm command: arm-ha
+info 520 node1/crm: service 'vm:103': updating node 'node1' => 'node2' (changed while disarmed)
+info 520 node1/crm: moving service 'vm:103' back to 'node3', node came back from maintenance.
+info 520 node1/crm: migrate service 'vm:103' to node 'node3' (running)
+info 520 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 520 node1/crm: re-arming HA stack
+info 521 node1/lrm: got lock 'ha_agent_node1_lock'
+info 521 node1/lrm: status change wait_for_agent_lock => active
+info 523 node2/lrm: got lock 'ha_agent_node2_lock'
+info 523 node2/lrm: status change wait_for_agent_lock => active
+info 523 node2/lrm: service vm:103 - start migrate to node 'node3'
+info 523 node2/lrm: service vm:103 - end migrate to node 'node3'
+info 525 node3/lrm: got lock 'ha_agent_node3_lock'
+info 525 node3/lrm: status change wait_for_agent_lock => active
+info 540 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 545 node3/lrm: starting service vm:103
+info 545 node3/lrm: service status vm:103 started
+info 1120 hardware: exit simulation - done
diff --git a/src/test/test-disarm-maintenance3/manager_status b/src/test/test-disarm-maintenance3/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-maintenance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-maintenance3/service_config b/src/test/test-disarm-maintenance3/service_config
new file mode 100644
index 0000000..4b26f6b
--- /dev/null
+++ b/src/test/test-disarm-maintenance3/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-disarm-relocate1/README b/src/test/test-disarm-relocate1/README
new file mode 100644
index 0000000..a5b6324
--- /dev/null
+++ b/src/test/test-disarm-relocate1/README
@@ -0,0 +1,3 @@
+Test disarm-ha freeze when a relocate command arrives in the same CRM cycle.
+The disarm takes priority: the relocate command is pre-empted and the service
+is frozen directly. After arm-ha, both services resume normally.
diff --git a/src/test/test-disarm-relocate1/cmdlist b/src/test/test-disarm-relocate1/cmdlist
new file mode 100644
index 0000000..99f2916
--- /dev/null
+++ b/src/test/test-disarm-relocate1/cmdlist
@@ -0,0 +1,7 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 relocate node2", "crm node1 disarm-ha freeze" ],
+ [],
+ [],
+ [ "crm node1 arm-ha" ]
+]
diff --git a/src/test/test-disarm-relocate1/hardware_status b/src/test/test-disarm-relocate1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-disarm-relocate1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-relocate1/log.expect b/src/test/test-disarm-relocate1/log.expect
new file mode 100644
index 0000000..b051cac
--- /dev/null
+++ b/src/test/test-disarm-relocate1/log.expect
@@ -0,0 +1,51 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:101 relocate node2
+info 120 cmdlist: execute crm node1 disarm-ha freeze
+info 120 node1/crm: got crm command: relocate vm:101 node2
+info 120 node1/crm: got crm command: disarm-ha freeze
+info 120 node1/crm: disarm: freezing service 'vm:101' (was 'started')
+info 120 node1/crm: disarm: freezing service 'vm:102' (was 'started')
+info 121 node1/lrm: HA disarm requested, releasing agent lock and watchdog
+info 121 node1/lrm: status change active => wait_for_agent_lock
+info 123 node2/lrm: HA disarm requested, releasing agent lock and watchdog
+info 123 node2/lrm: status change active => wait_for_agent_lock
+info 140 node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info 140 node1/crm: HA stack fully disarmed, releasing CRM watchdog
+info 420 cmdlist: execute crm node1 arm-ha
+info 420 node1/crm: got crm command: arm-ha
+info 420 node1/crm: re-arming HA stack
+info 440 node1/crm: service 'vm:101': state changed from 'freeze' to 'started'
+info 440 node1/crm: service 'vm:102': state changed from 'freeze' to 'started'
+info 441 node1/lrm: got lock 'ha_agent_node1_lock'
+info 441 node1/lrm: status change wait_for_agent_lock => active
+info 443 node2/lrm: got lock 'ha_agent_node2_lock'
+info 443 node2/lrm: status change wait_for_agent_lock => active
+info 1020 hardware: exit simulation - done
diff --git a/src/test/test-disarm-relocate1/manager_status b/src/test/test-disarm-relocate1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-disarm-relocate1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-disarm-relocate1/service_config b/src/test/test-disarm-relocate1/service_config
new file mode 100644
index 0000000..0336d09
--- /dev/null
+++ b/src/test/test-disarm-relocate1/service_config
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" }
+}
--
2.47.3
* [PATCH ha-manager v2 4/4] api: status: add disarm-ha and arm-ha endpoints and CLI wiring
2026-03-21 23:42 [PATCH ha-manager v2 0/4] fix #2751: implement disarm/arm HA for safer cluster maintenance Thomas Lamprecht
` (2 preceding siblings ...)
2026-03-21 23:42 ` [PATCH ha-manager v2 3/4] fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance Thomas Lamprecht
@ 2026-03-21 23:42 ` Thomas Lamprecht
2026-03-23 13:05 ` [PATCH ha-manager v2 0/4] fix #2751: implement disarm/arm HA for safer cluster maintenance Dominik Rusovac
2026-03-25 12:06 ` applied: " Thomas Lamprecht
From: Thomas Lamprecht @ 2026-03-21 23:42 UTC (permalink / raw)
To: pve-devel
Expose the disarm/arm mechanism as two separate POST endpoints under
/cluster/ha/status/ and wire them into the crm-command CLI namespace.
Extend the fencing status entry with disarming/disarmed states and
the active resource mode. Each LRM entry shows 'watchdog released'
once in disarm mode. The master and service status lines include the
disarm state when applicable.
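To make the new fencing-entry rules easier to review, here is a minimal Perl sketch of the same decision written as a standalone function. It is not code from this patch: the name `fencing_entry` is hypothetical, and it assumes, as the Status.pm hunk does, that the manager status may carry a `disarm` hash with `state` and `mode` keys. An active disarm request overrides the plain armed/standby derivation, and the CRM watchdog is only reported as released once the state reaches 'disarmed'.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the fencing-entry logic of the Status.pm hunk.
# Assumption: $status->{disarm}, if present, is { state => ..., mode => ... }.
sub fencing_entry {
    my ($status, $crm_active, $nodename) = @_;

    my $entry = {
        id => 'fencing',
        type => 'fencing',
        node => $status->{master_node} // $nodename,
    };

    if (my $disarm = $status->{disarm}) {
        my $mode = $disarm->{mode} // 'unknown';
        my $disarm_state = $disarm->{state} // 'unknown';
        # The CRM keeps its watchdog until every LRM has disarmed.
        my $crm_wd = $disarm_state eq 'disarmed'
            ? "CRM watchdog released"
            : "CRM watchdog active";
        $entry->{status} = "$disarm_state, resource mode: $mode ($crm_wd)";
        $entry->{'armed-state'} = $disarm_state;
        $entry->{resource_mode} = $mode;
    } else {
        # No disarm request: fall back to the armed/standby view.
        my $armed_state = $crm_active ? 'armed' : 'standby';
        my $crm_wd = $crm_active ? "CRM watchdog active" : "CRM watchdog standby";
        $entry->{status} = "$armed_state ($crm_wd)";
        $entry->{'armed-state'} = $armed_state;
    }

    return $entry;
}

# Example: the CRM on node1 has finished disarming in 'freeze' mode.
my $entry = fencing_entry(
    { master_node => 'node1', disarm => { state => 'disarmed', mode => 'freeze' } },
    1, 'node2',
);
print "$entry->{status}\n"; # disarmed, resource mode: freeze (CRM watchdog released)
```

Without a `disarm` hash the function reports 'armed' or 'standby' depending on whether the CRM is active, which matches the behavior before this patch.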
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
changes v1 -> v2:
- mark both endpoints as protected, otherwise only CLI worked [Dominik]
- show 'ignore' as service state during disarm-ignore mode for all
services now that we preserve the service status in general when
disarmed.
src/PVE/API2/HA/Status.pm | 142 ++++++++++++++++++++++++++++++++------
src/PVE/CLI/ha_manager.pm | 2 +
2 files changed, 123 insertions(+), 21 deletions(-)
diff --git a/src/PVE/API2/HA/Status.pm b/src/PVE/API2/HA/Status.pm
index a6b00b9..59a93b6 100644
--- a/src/PVE/API2/HA/Status.pm
+++ b/src/PVE/API2/HA/Status.pm
@@ -52,7 +52,10 @@ __PACKAGE__->register_method({
my ($param) = @_;
my $result = [
- { name => 'current' }, { name => 'manager_status' },
+ { name => 'current' },
+ { name => 'manager_status' },
+ { name => 'disarm-ha' },
+ { name => 'arm-ha' },
];
return $result;
@@ -144,10 +147,17 @@ __PACKAGE__->register_method({
optional => 1,
},
'armed-state' => {
- description => "For type 'fencing'. Whether HA fencing is armed"
- . " or on standby.",
+ description => "For type 'fencing'. Whether HA is armed, on standby,"
+ . " disarming or disarmed.",
type => "string",
- enum => ['armed', 'standby'],
+ enum => ['armed', 'standby', 'disarming', 'disarmed'],
+ optional => 1,
+ },
+ resource_mode => {
+ description =>
+ "For type 'fencing'. How resources are handled while disarmed.",
+ type => "string",
+ enum => ['freeze', 'ignore'],
optional => 1,
},
},
@@ -184,9 +194,13 @@ __PACKAGE__->register_method({
my $extra_status = '';
+ if (my $disarm = $status->{disarm}) {
+ $extra_status .= " - $disarm->{state}, resource mode: $disarm->{mode}";
+ }
my $datacenter_config = eval { cfs_read_file('datacenter.cfg') } // {};
if (my $crs = $datacenter_config->{crs}) {
- $extra_status = " - $crs->{ha} load CRS" if $crs->{ha} && $crs->{ha} ne 'basic';
+ $extra_status .= " - $crs->{ha} load CRS"
+ if $crs->{ha} && $crs->{ha} ne 'basic';
}
my $time_str = localtime($status->{timestamp});
my $status_text = "$master ($status_str, $time_str)$extra_status";
@@ -206,16 +220,32 @@ __PACKAGE__->register_method({
&& defined($status->{timestamp})
&& $timestamp_to_status->($ctime, $status->{timestamp}) eq 'active';
- my $armed_state = $crm_active ? 'armed' : 'standby';
- my $crm_wd = $crm_active ? "CRM watchdog active" : "CRM watchdog standby";
- push @$res,
- {
- id => 'fencing',
- type => 'fencing',
- node => $status->{master_node} // $nodename,
- status => "$armed_state ($crm_wd)",
- 'armed-state' => $armed_state,
- };
+ if (my $disarm = $status->{disarm}) {
+ my $mode = $disarm->{mode} // 'unknown';
+ my $disarm_state = $disarm->{state} // 'unknown';
+ my $wd_released = $disarm_state eq 'disarmed';
+ my $crm_wd = $wd_released ? "CRM watchdog released" : "CRM watchdog active";
+ push @$res,
+ {
+ id => 'fencing',
+ type => 'fencing',
+ node => $status->{master_node} // $nodename,
+ status => "$disarm_state, resource mode: $mode ($crm_wd)",
+ 'armed-state' => $disarm_state,
+ resource_mode => $mode,
+ };
+ } else {
+ my $armed_state = $crm_active ? 'armed' : 'standby';
+ my $crm_wd = $crm_active ? "CRM watchdog active" : "CRM watchdog standby";
+ push @$res,
+ {
+ id => 'fencing',
+ type => 'fencing',
+ node => $status->{master_node} // $nodename,
+ status => "$armed_state ($crm_wd)",
+ 'armed-state' => $armed_state,
+ };
+ }
foreach my $node (sort keys %{ $status->{node_status} }) {
my $active_count =
@@ -236,11 +266,17 @@ __PACKAGE__->register_method({
my $lrm_state = $lrm_status->{state} || 'unknown';
# LRM holds its watchdog while it has the agent lock
- my $lrm_wd =
- ($status_str eq 'active'
- && ($lrm_state eq 'active' || $lrm_state eq 'maintenance'))
- ? 'watchdog active'
- : 'watchdog standby';
+ my $lrm_wd;
+ if (
+ $status_str eq 'active'
+ && ($lrm_state eq 'active' || $lrm_state eq 'maintenance')
+ ) {
+ $lrm_wd = 'watchdog active';
+ } elsif ($lrm_mode && $lrm_mode eq 'disarm') {
+ $lrm_wd = 'watchdog released';
+ } else {
+ $lrm_wd = 'watchdog standby';
+ }
if ($status_str eq 'active') {
$lrm_mode ||= 'active';
@@ -253,7 +289,7 @@ __PACKAGE__->register_method({
$status_str = $lrm_state;
}
}
- } elsif ($lrm_mode && $lrm_mode eq 'maintenance') {
+ } elsif ($lrm_mode && ($lrm_mode eq 'maintenance' || $lrm_mode eq 'disarm')) {
$status_str = "$lrm_mode mode";
}
@@ -284,6 +320,14 @@ __PACKAGE__->register_method({
my $node = $data->{node} // '---'; # to be safe against manual tinkering
$data->{state} = PVE::HA::Tools::get_verbose_service_state($ss, $sc);
+
+ # show disarm resource mode instead of internal service state
+ if (my $disarm = $status->{disarm}) {
+ if ($disarm->{mode} eq 'ignore') {
+ $data->{state} = 'ignore';
+ }
+ }
+
$data->{status} = "$sid ($node, $data->{state})"; # backward compat. and CLI
# also return common resource attributes
@@ -348,4 +392,60 @@ __PACKAGE__->register_method({
},
});
+__PACKAGE__->register_method({
+ name => 'disarm-ha',
+ path => 'disarm-ha',
+ method => 'POST',
+ protected => 1,
+ description => "Request disarming the HA stack, releasing all watchdogs cluster-wide.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ parameters => {
+ additionalProperties => 0,
+ properties => {
+ 'resource-mode' => {
+ description => "Controls how HA managed resources are handled while disarmed."
+ . " The current state of resources is not affected."
+ . " 'freeze': new commands and state changes are not applied."
+ . " 'ignore': resources are removed from HA tracking and can be"
+ . " managed as if they were not HA managed.",
+ type => 'string',
+ enum => ['freeze', 'ignore'],
+ },
+ },
+ },
+ returns => { type => 'null' },
+ code => sub {
+ my ($param) = @_;
+
+ PVE::HA::Config::queue_crm_commands("disarm-ha $param->{'resource-mode'}");
+
+ return undef;
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'arm-ha',
+ path => 'arm-ha',
+ method => 'POST',
+ protected => 1,
+ description => "Request re-arming the HA stack after it was disarmed.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ parameters => {
+ additionalProperties => 0,
+ properties => {},
+ },
+ returns => { type => 'null' },
+ code => sub {
+ my ($param) = @_;
+
+ PVE::HA::Config::queue_crm_commands("arm-ha");
+
+ return undef;
+ },
+});
+
1;
diff --git a/src/PVE/CLI/ha_manager.pm b/src/PVE/CLI/ha_manager.pm
index be6978c..f257c01 100644
--- a/src/PVE/CLI/ha_manager.pm
+++ b/src/PVE/CLI/ha_manager.pm
@@ -298,6 +298,8 @@ our $cmddef = {
enable => [__PACKAGE__, 'node-maintenance-set', ['node'], { disable => 0 }],
disable => [__PACKAGE__, 'node-maintenance-set', ['node'], { disable => 1 }],
},
+ 'disarm-ha' => ['PVE::API2::HA::Status', 'disarm-ha', ['resource-mode']],
+ 'arm-ha' => ['PVE::API2::HA::Status', 'arm-ha', []],
},
};
--
2.47.3