From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager 1/2] test: add disarm test cases for idle lrms with transient ha resources
Date: Tue, 19 May 2026 16:38:35 +0200
Message-ID: <20260519143842.382324-2-d.kral@proxmox.com>
In-Reply-To: <20260519143842.382324-1-d.kral@proxmox.com>

These test cases document how the HA stack currently behaves when it is
disarmed while HA resources are in disarm-deferring states (fence,
recovery, migrate, relocate) and their responsible LRMs are already
idle, which makes those LRMs unresponsive to the HA Manager while they
are in disarm mode.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
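The deferral behaviour these test cases document can be illustrated with a
minimal Python sketch. This is a toy model, not the ha-manager (Perl)
implementation: the function names, the `lrm_active` mapping and the fixed
round count are assumptions made for illustration. Only the list of
disarm-deferring states and the example service states are taken from this
patch.

```python
# States that defer disarming, as listed in the commit message.
DEFERRING_STATES = {"fence", "recovery", "migrate", "relocate"}


def deferring_services(service_status):
    """Return the sorted service IDs whose state still defers disarming."""
    return sorted(
        sid for sid, sd in service_status.items()
        if sd["state"] in DEFERRING_STATES
    )


def simulate(service_status, lrm_active, rounds=5):
    """Toy model of the manager loop (hypothetical, for illustration only).

    Fenced services are resolved by the manager itself. Moving services
    only leave their state once the LRM on their node reports a result.
    Returns True if disarming could complete within `rounds` iterations.
    """
    # Copy each entry so the caller's data is not modified.
    status = {sid: dict(sd) for sid, sd in service_status.items()}
    for _ in range(rounds):
        for sd in status.values():
            if sd["state"] in ("fence", "recovery"):
                # Manager-side recovery needs no LRM response.
                sd["state"] = "started"
            elif sd["state"] in ("migrate", "relocate"):
                if lrm_active.get(sd["node"], False):
                    # An active LRM reports the move as finished.
                    sd["node"] = sd.pop("target")
                    sd["state"] = "started"
        if not deferring_services(status):
            return True
    return False


# Service states mirroring the manager_status fixtures in this patch.
status = {
    "vm:101": {"node": "node1", "state": "started"},
    "vm:102": {"node": "node2", "state": "fence"},
    "vm:103": {"node": "node3", "state": "migrate", "target": "node2"},
}
```

With an idle LRM on node3, `simulate(status, {"node3": False})` never gets
past the deferral for vm:103, which corresponds to the repeated "deferring
disarm" lines in the log.expect files below. With an active LRM on node3 the
same model lets disarming complete.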
src/test/test-disarm-idle-lrm1/README | 8 +++
src/test/test-disarm-idle-lrm1/cmdlist | 3 +
.../test-disarm-idle-lrm1/hardware_status | 5 ++
src/test/test-disarm-idle-lrm1/log.expect | 59 +++++++++++++++++++
src/test/test-disarm-idle-lrm1/manager_status | 26 ++++++++
src/test/test-disarm-idle-lrm1/service_config | 5 ++
src/test/test-disarm-idle-lrm2/README | 8 +++
src/test/test-disarm-idle-lrm2/cmdlist | 3 +
.../test-disarm-idle-lrm2/hardware_status | 5 ++
src/test/test-disarm-idle-lrm2/log.expect | 56 ++++++++++++++++++
src/test/test-disarm-idle-lrm2/manager_status | 26 ++++++++
src/test/test-disarm-idle-lrm2/service_config | 5 ++
12 files changed, 209 insertions(+)
create mode 100644 src/test/test-disarm-idle-lrm1/README
create mode 100644 src/test/test-disarm-idle-lrm1/cmdlist
create mode 100644 src/test/test-disarm-idle-lrm1/hardware_status
create mode 100644 src/test/test-disarm-idle-lrm1/log.expect
create mode 100644 src/test/test-disarm-idle-lrm1/manager_status
create mode 100644 src/test/test-disarm-idle-lrm1/service_config
create mode 100644 src/test/test-disarm-idle-lrm2/README
create mode 100644 src/test/test-disarm-idle-lrm2/cmdlist
create mode 100644 src/test/test-disarm-idle-lrm2/hardware_status
create mode 100644 src/test/test-disarm-idle-lrm2/log.expect
create mode 100644 src/test/test-disarm-idle-lrm2/manager_status
create mode 100644 src/test/test-disarm-idle-lrm2/service_config
diff --git a/src/test/test-disarm-idle-lrm1/README b/src/test/test-disarm-idle-lrm1/README
new file mode 100644
index 00000000..6d5124cd
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/README
@@ -0,0 +1,8 @@
+Disarm the HA stack in 'ignore' mode (keep HA resources in their previous
+state) while there is an HA resource in a transient state on a node whose LRM
+is idle.
+
+Fenced HA resources are already handled by the HA Manager itself, so this works
+as expected. However, as the LRM for the moving HA resource is idle, the HA
+Manager never gets an LRM response for the HA resource that defers the
+disarming, and therefore the HA Manager is stuck in a loop.
diff --git a/src/test/test-disarm-idle-lrm1/cmdlist b/src/test/test-disarm-idle-lrm1/cmdlist
new file mode 100644
index 00000000..a29cf8e3
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "crm node1 disarm-ha ignore" ]
+]
diff --git a/src/test/test-disarm-idle-lrm1/hardware_status b/src/test/test-disarm-idle-lrm1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-idle-lrm1/log.expect b/src/test/test-disarm-idle-lrm1/log.expect
new file mode 100644
index 00000000..1b7f4ece
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute crm node1 disarm-ha ignore
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: got crm command: disarm-ha ignore
+info 20 node1/crm: disarm: suspending HA tracking for service 'vm:101'
+info 20 node1/crm: disarm: suspending HA tracking for service 'vm:102'
+info 20 node1/crm: disarm: suspending HA tracking for service 'vm:103'
+warn 20 node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info 20 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 20 node1/crm: got lock 'ha_agent_node2_lock'
+info 20 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 20 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 20 node1/crm: recover service 'vm:102' from fenced node 'node2' to node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 40 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 60 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 80 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 100 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 120 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 140 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 160 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 180 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 200 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 220 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 240 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 260 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 280 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 300 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 320 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 340 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 360 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 380 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 400 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 420 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 440 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 460 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 480 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 500 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 520 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 540 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 560 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 580 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 600 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-disarm-idle-lrm1/manager_status b/src/test/test-disarm-idle-lrm1/manager_status
new file mode 100644
index 00000000..ba7448a0
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/manager_status
@@ -0,0 +1,26 @@
+{
+ "master_node": "node1",
+ "node_status": {
+ "node1":"online",
+ "node2":"fence",
+ "node3":"online"
+ },
+ "service_status": {
+ "vm:101": {
+ "node": "node1",
+ "state": "started",
+ "uid": "lavE3c7vLnotUBGT9whswg"
+ },
+ "vm:102": {
+ "node": "node2",
+ "state": "fence",
+ "uid": "lavE3c7vLnotUBGT9whswh"
+ },
+ "vm:103": {
+ "node": "node3",
+ "state": "migrate",
+ "target": "node2",
+ "uid": "lavE3c7vLnotUBGT9whswj"
+ }
+ }
+}
diff --git a/src/test/test-disarm-idle-lrm1/service_config b/src/test/test-disarm-idle-lrm1/service_config
new file mode 100644
index 00000000..4b26f6b4
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-disarm-idle-lrm2/README b/src/test/test-disarm-idle-lrm2/README
new file mode 100644
index 00000000..d1731578
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/README
@@ -0,0 +1,8 @@
+Disarm the HA stack in 'freeze' mode (keep HA resources frozen while disarmed)
+while there is an HA resource in a transient state on a node whose LRM is
+idle.
+
+Fenced HA resources are already handled by the HA Manager itself, so this works
+as expected. However, as the LRM for the moving HA resource is idle, the HA
+Manager never gets an LRM response for the HA resource that defers the
+disarming, and therefore the HA Manager is stuck in a loop.
diff --git a/src/test/test-disarm-idle-lrm2/cmdlist b/src/test/test-disarm-idle-lrm2/cmdlist
new file mode 100644
index 00000000..5a46a662
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "crm node1 disarm-ha freeze" ]
+]
diff --git a/src/test/test-disarm-idle-lrm2/hardware_status b/src/test/test-disarm-idle-lrm2/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-idle-lrm2/log.expect b/src/test/test-disarm-idle-lrm2/log.expect
new file mode 100644
index 00000000..d0ba96ff
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/log.expect
@@ -0,0 +1,56 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute crm node1 disarm-ha freeze
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: got crm command: disarm-ha freeze
+warn 20 node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info 20 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 20 node1/crm: got lock 'ha_agent_node2_lock'
+info 20 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 20 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 20 node1/crm: recover service 'vm:102' from fenced node 'node2' to node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 40 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 60 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 80 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 100 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 120 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 140 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 160 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 180 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 200 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 220 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 240 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 260 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 280 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 300 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 320 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 340 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 360 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 380 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 400 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 420 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 440 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 460 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 480 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 500 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 520 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 540 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 560 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 580 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 600 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-disarm-idle-lrm2/manager_status b/src/test/test-disarm-idle-lrm2/manager_status
new file mode 100644
index 00000000..c28f4ffd
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/manager_status
@@ -0,0 +1,26 @@
+{
+ "master_node": "node1",
+ "node_status": {
+ "node1":"online",
+ "node2":"fence",
+ "node3":"online"
+ },
+ "service_status": {
+ "vm:101": {
+ "node": "node1",
+ "state": "online",
+ "uid": "lavE3c7vLnotUBGT9whswg"
+ },
+ "vm:102": {
+ "node": "node2",
+ "state": "fence",
+ "uid": "lavE3c7vLnotUBGT9whswh"
+ },
+ "vm:103": {
+ "node": "node3",
+ "state": "migrate",
+ "target": "node2",
+ "uid": "lavE3c7vLnotUBGT9whswj"
+ }
+ }
+}
diff --git a/src/test/test-disarm-idle-lrm2/service_config b/src/test/test-disarm-idle-lrm2/service_config
new file mode 100644
index 00000000..4b26f6b4
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
--
2.47.3
Thread overview:
2026-05-19 14:38 [PATCH-SERIES ha-manager 0/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 14:38 ` Daniel Kral [this message]
2026-05-19 14:38 ` [PATCH ha-manager 2/2] " Daniel Kral
2026-05-19 16:00 ` Fiona Ebner
2026-05-19 14:47 ` [PATCH-SERIES ha-manager 0/2] " Daniel Kral
2026-05-19 16:00 ` Fiona Ebner