From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager 1/2] test: add disarm test cases for idle lrms with transient ha resources
Date: Tue, 19 May 2026 16:38:35 +0200
Message-ID: <20260519143842.382324-2-d.kral@proxmox.com>
In-Reply-To: <20260519143842.382324-1-d.kral@proxmox.com>
These test cases document how the HA stack currently behaves when it is
disarmed while HA resources are in disarm-deferring states (fence,
recovery, migrate, relocate) and their responsible LRMs are already
idle. Idle LRMs do not respond to the HA Manager while the stack is in
disarm mode, so the deferred disarm never completes.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
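Not part of the commit: a minimal Python sketch of the deferral
behaviour that both log.expect files record. It is a toy model under
stated assumptions, not the ha-manager implementation. The names
initial_services, crm_round and simulate and the parameters active_lrms
and recovery_node are hypothetical, and fencing plus recovery are
collapsed into a single step.

```python
# Toy model only: all names below are hypothetical, not ha-manager APIs.

# States that defer disarming, as listed in the commit message.
DEFERRING_STATES = {"fence", "recovery", "migrate", "relocate"}


def initial_services():
    """Service states mirroring the manager_status fixture of the tests."""
    return {
        "vm:101": {"node": "node1", "state": "started"},
        "vm:102": {"node": "node2", "state": "fence"},
        "vm:103": {"node": "node3", "state": "migrate", "target": "node2"},
    }


def crm_round(services, active_lrms, recovery_node="node1"):
    """Run one simplified manager round.

    Returns the services that defer disarming at the start of the round,
    then advances the state of those that can make progress.
    """
    deferred = sorted(
        sid for sid, svc in services.items() if svc["state"] in DEFERRING_STATES
    )
    for svc in services.values():
        if svc["state"] == "fence":
            # Assumption: the manager resolves fencing and recovery itself,
            # so no LRM response is needed (collapsed into one step here).
            svc["node"] = recovery_node
            svc["state"] = "started"
        elif svc["state"] in ("migrate", "relocate") and svc["node"] in active_lrms:
            # Only an active LRM reports the result of a move back.
            svc["node"] = svc.pop("target")
            svc["state"] = "started"
    return deferred


def simulate(active_lrms, rounds=4):
    """Return the list of deferring services for each round."""
    services = initial_services()
    return [crm_round(services, active_lrms) for _ in range(rounds)]


if __name__ == "__main__":
    print(simulate(active_lrms=set()))       # all LRMs idle
    print(simulate(active_lrms={"node3"}))   # LRM of node3 active
```

With no active LRM, vm:103 is reported in every round, which mirrors
the repeated 'deferring disarm' lines in log.expect. With the LRM on
node3 active, the list is empty after the first round.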
src/test/test-disarm-idle-lrm1/README | 8 +++
src/test/test-disarm-idle-lrm1/cmdlist | 3 +
.../test-disarm-idle-lrm1/hardware_status | 5 ++
src/test/test-disarm-idle-lrm1/log.expect | 59 +++++++++++++++++++
src/test/test-disarm-idle-lrm1/manager_status | 26 ++++++++
src/test/test-disarm-idle-lrm1/service_config | 5 ++
src/test/test-disarm-idle-lrm2/README | 8 +++
src/test/test-disarm-idle-lrm2/cmdlist | 3 +
.../test-disarm-idle-lrm2/hardware_status | 5 ++
src/test/test-disarm-idle-lrm2/log.expect | 56 ++++++++++++++++++
src/test/test-disarm-idle-lrm2/manager_status | 26 ++++++++
src/test/test-disarm-idle-lrm2/service_config | 5 ++
12 files changed, 209 insertions(+)
create mode 100644 src/test/test-disarm-idle-lrm1/README
create mode 100644 src/test/test-disarm-idle-lrm1/cmdlist
create mode 100644 src/test/test-disarm-idle-lrm1/hardware_status
create mode 100644 src/test/test-disarm-idle-lrm1/log.expect
create mode 100644 src/test/test-disarm-idle-lrm1/manager_status
create mode 100644 src/test/test-disarm-idle-lrm1/service_config
create mode 100644 src/test/test-disarm-idle-lrm2/README
create mode 100644 src/test/test-disarm-idle-lrm2/cmdlist
create mode 100644 src/test/test-disarm-idle-lrm2/hardware_status
create mode 100644 src/test/test-disarm-idle-lrm2/log.expect
create mode 100644 src/test/test-disarm-idle-lrm2/manager_status
create mode 100644 src/test/test-disarm-idle-lrm2/service_config
diff --git a/src/test/test-disarm-idle-lrm1/README b/src/test/test-disarm-idle-lrm1/README
new file mode 100644
index 00000000..6d5124cd
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/README
@@ -0,0 +1,8 @@
+Disarm the HA stack in 'ignore' mode (keep HA resources in their previous
+state) while there is an HA resource in a transient state on a node, whose LRM
+is idle.
+
+Fenced HA resources are already handled by the HA Manager itself, so this works
+as expected. Though as the LRM for the moving HA resource is idle, the HA
+Manager doesn't get any LRM response for the HA resource, for which the
+disarming is deferred, and therefore the HA Manager is stuck in a loop.
diff --git a/src/test/test-disarm-idle-lrm1/cmdlist b/src/test/test-disarm-idle-lrm1/cmdlist
new file mode 100644
index 00000000..a29cf8e3
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "crm node1 disarm-ha ignore" ]
+]
diff --git a/src/test/test-disarm-idle-lrm1/hardware_status b/src/test/test-disarm-idle-lrm1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-idle-lrm1/log.expect b/src/test/test-disarm-idle-lrm1/log.expect
new file mode 100644
index 00000000..1b7f4ece
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute crm node1 disarm-ha ignore
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: got crm command: disarm-ha ignore
+info 20 node1/crm: disarm: suspending HA tracking for service 'vm:101'
+info 20 node1/crm: disarm: suspending HA tracking for service 'vm:102'
+info 20 node1/crm: disarm: suspending HA tracking for service 'vm:103'
+warn 20 node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info 20 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 20 node1/crm: got lock 'ha_agent_node2_lock'
+info 20 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 20 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 20 node1/crm: recover service 'vm:102' from fenced node 'node2' to node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 40 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 60 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 80 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 100 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 120 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 140 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 160 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 180 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 200 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 220 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 240 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 260 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 280 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 300 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 320 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 340 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 360 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 380 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 400 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 420 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 440 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 460 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 480 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 500 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 520 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 540 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 560 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 580 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 600 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-disarm-idle-lrm1/manager_status b/src/test/test-disarm-idle-lrm1/manager_status
new file mode 100644
index 00000000..ba7448a0
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/manager_status
@@ -0,0 +1,26 @@
+{
+ "master_node": "node1",
+ "node_status": {
+ "node1":"online",
+ "node2":"fence",
+ "node3":"online"
+ },
+ "service_status": {
+ "vm:101": {
+ "node": "node1",
+ "state": "started",
+ "uid": "lavE3c7vLnotUBGT9whswg"
+ },
+ "vm:102": {
+ "node": "node2",
+ "state": "fence",
+ "uid": "lavE3c7vLnotUBGT9whswh"
+ },
+ "vm:103": {
+ "node": "node3",
+ "state": "migrate",
+ "target": "node2",
+ "uid": "lavE3c7vLnotUBGT9whswj"
+ }
+ }
+}
diff --git a/src/test/test-disarm-idle-lrm1/service_config b/src/test/test-disarm-idle-lrm1/service_config
new file mode 100644
index 00000000..4b26f6b4
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-disarm-idle-lrm2/README b/src/test/test-disarm-idle-lrm2/README
new file mode 100644
index 00000000..d1731578
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/README
@@ -0,0 +1,8 @@
+Disarm the HA stack in 'freeze' mode (keep HA resources frozen while disarmed)
+while there is an HA resource in a transient state on a node, whose LRM is
+idle.
+
+Fenced HA resources are already handled by the HA Manager itself, so this works
+as expected. Though as the LRM for the moving HA resource is idle, the HA
+Manager doesn't get any LRM response for the HA resource, for which the
+disarming is deferred, and therefore the HA Manager is stuck in a loop.
diff --git a/src/test/test-disarm-idle-lrm2/cmdlist b/src/test/test-disarm-idle-lrm2/cmdlist
new file mode 100644
index 00000000..5a46a662
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "crm node1 disarm-ha freeze" ]
+]
diff --git a/src/test/test-disarm-idle-lrm2/hardware_status b/src/test/test-disarm-idle-lrm2/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-idle-lrm2/log.expect b/src/test/test-disarm-idle-lrm2/log.expect
new file mode 100644
index 00000000..d0ba96ff
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/log.expect
@@ -0,0 +1,56 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute crm node1 disarm-ha freeze
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: got crm command: disarm-ha freeze
+warn 20 node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info 20 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 20 node1/crm: got lock 'ha_agent_node2_lock'
+info 20 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 20 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 20 node1/crm: recover service 'vm:102' from fenced node 'node2' to node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 40 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 60 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 80 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 100 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 120 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 140 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 160 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 180 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 200 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 220 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 240 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 260 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 280 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 300 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 320 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 340 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 360 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 380 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 400 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 420 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 440 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 460 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 480 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 500 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 520 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 540 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 560 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 580 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 600 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-disarm-idle-lrm2/manager_status b/src/test/test-disarm-idle-lrm2/manager_status
new file mode 100644
index 00000000..c28f4ffd
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/manager_status
@@ -0,0 +1,26 @@
+{
+ "master_node": "node1",
+ "node_status": {
+ "node1":"online",
+ "node2":"fence",
+ "node3":"online"
+ },
+ "service_status": {
+ "vm:101": {
+ "node": "node1",
+ "state": "online",
+ "uid": "lavE3c7vLnotUBGT9whswg"
+ },
+ "vm:102": {
+ "node": "node2",
+ "state": "fence",
+ "uid": "lavE3c7vLnotUBGT9whswh"
+ },
+ "vm:103": {
+ "node": "node3",
+ "state": "migrate",
+ "target": "node2",
+ "uid": "lavE3c7vLnotUBGT9whswj"
+ }
+ }
+}
diff --git a/src/test/test-disarm-idle-lrm2/service_config b/src/test/test-disarm-idle-lrm2/service_config
new file mode 100644
index 00000000..4b26f6b4
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
--
2.47.3
Thread overview: 7+ messages
2026-05-19 14:38 [PATCH-SERIES ha-manager 0/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 14:38 ` Daniel Kral [this message]
2026-05-19 14:38 ` [PATCH ha-manager 2/2] " Daniel Kral
2026-05-19 16:00 ` Fiona Ebner
2026-05-19 14:47 ` [PATCH-SERIES ha-manager 0/2] " Daniel Kral
2026-05-19 16:00 ` Fiona Ebner
2026-05-19 20:11 ` applied: " Thomas Lamprecht