From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager 1/2] test: add disarm test cases for idle lrms with transient ha resources
Date: Tue, 19 May 2026 16:38:35 +0200
Message-ID: <20260519143842.382324-2-d.kral@proxmox.com>
In-Reply-To: <20260519143842.382324-1-d.kral@proxmox.com>

These test cases document how the HA stack currently behaves when it is
disarmed while there are HA resources in disarm-deferring states (fence,
recovery, migrate, relocate) and their responsible LRMs are already in
idle state. Idle LRMs are unresponsive to the HA Manager while they are
in disarm mode.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
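For readers unfamiliar with the fixtures, here is a minimal sketch of which
services in a manager_status file fall into the disarm-deferring states named
above. It is an illustration only, not the ha-manager implementation: the
function name deferring_services and the constant DISARM_DEFERRING_STATES are
hypothetical, and the JSON is a trimmed copy of the test-disarm-idle-lrm1
fixture.

```python
import json

# States in which, per the commit message, disarming is deferred.
DISARM_DEFERRING_STATES = {"fence", "recovery", "migrate", "relocate"}


def deferring_services(manager_status):
    """Return (sid, state) pairs, sorted by sid, that would defer a disarm."""
    services = manager_status.get("service_status", {})
    return sorted(
        (sid, entry["state"])
        for sid, entry in services.items()
        if entry.get("state") in DISARM_DEFERRING_STATES
    )


# Trimmed copy of the manager_status fixture from test-disarm-idle-lrm1.
example = json.loads("""
{
    "master_node": "node1",
    "service_status": {
        "vm:101": { "node": "node1", "state": "started" },
        "vm:102": { "node": "node2", "state": "fence" },
        "vm:103": { "node": "node3", "state": "migrate", "target": "node2" }
    }
}
""")

for sid, state in deferring_services(example):
    print(f"deferring disarm - service '{sid}' is in '{state}' state")
```

In the expected logs, vm:102 leaves this set once fencing is acknowledged,
while vm:103 stays in 'migrate' because its LRM is idle.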
 src/test/test-disarm-idle-lrm1/README         |  8 +++
 src/test/test-disarm-idle-lrm1/cmdlist        |  3 +
 .../test-disarm-idle-lrm1/hardware_status     |  5 ++
 src/test/test-disarm-idle-lrm1/log.expect     | 59 +++++++++++++++++++
 src/test/test-disarm-idle-lrm1/manager_status | 26 ++++++++
 src/test/test-disarm-idle-lrm1/service_config |  5 ++
 src/test/test-disarm-idle-lrm2/README         |  8 +++
 src/test/test-disarm-idle-lrm2/cmdlist        |  3 +
 .../test-disarm-idle-lrm2/hardware_status     |  5 ++
 src/test/test-disarm-idle-lrm2/log.expect     | 56 ++++++++++++++++++
 src/test/test-disarm-idle-lrm2/manager_status | 26 ++++++++
 src/test/test-disarm-idle-lrm2/service_config |  5 ++
 12 files changed, 209 insertions(+)
 create mode 100644 src/test/test-disarm-idle-lrm1/README
 create mode 100644 src/test/test-disarm-idle-lrm1/cmdlist
 create mode 100644 src/test/test-disarm-idle-lrm1/hardware_status
 create mode 100644 src/test/test-disarm-idle-lrm1/log.expect
 create mode 100644 src/test/test-disarm-idle-lrm1/manager_status
 create mode 100644 src/test/test-disarm-idle-lrm1/service_config
 create mode 100644 src/test/test-disarm-idle-lrm2/README
 create mode 100644 src/test/test-disarm-idle-lrm2/cmdlist
 create mode 100644 src/test/test-disarm-idle-lrm2/hardware_status
 create mode 100644 src/test/test-disarm-idle-lrm2/log.expect
 create mode 100644 src/test/test-disarm-idle-lrm2/manager_status
 create mode 100644 src/test/test-disarm-idle-lrm2/service_config

diff --git a/src/test/test-disarm-idle-lrm1/README b/src/test/test-disarm-idle-lrm1/README
new file mode 100644
index 00000000..6d5124cd
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/README
@@ -0,0 +1,8 @@
+Disarm the HA stack in 'ignore' mode (keep HA resources in their previous
+state) while there is an HA resource in a transient state on a node whose LRM
+is idle.
+
+Fenced HA resources are already handled by the HA Manager itself, so this works
+as expected. However, as the LRM for the moving HA resource is idle, the HA
+Manager never gets an LRM response for that HA resource. Disarming therefore
+stays deferred and the HA Manager is stuck in a loop.
diff --git a/src/test/test-disarm-idle-lrm1/cmdlist b/src/test/test-disarm-idle-lrm1/cmdlist
new file mode 100644
index 00000000..a29cf8e3
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on", "crm node1 disarm-ha ignore" ]
+]
diff --git a/src/test/test-disarm-idle-lrm1/hardware_status b/src/test/test-disarm-idle-lrm1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-idle-lrm1/log.expect b/src/test/test-disarm-idle-lrm1/log.expect
new file mode 100644
index 00000000..1b7f4ece
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/log.expect
@@ -0,0 +1,59 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute crm node1 disarm-ha ignore
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: got crm command: disarm-ha ignore
+info     20    node1/crm: disarm: suspending HA tracking for service 'vm:101'
+info     20    node1/crm: disarm: suspending HA tracking for service 'vm:102'
+info     20    node1/crm: disarm: suspending HA tracking for service 'vm:103'
+warn     20    node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info     20    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info     20    node1/crm: got lock 'ha_agent_node2_lock'
+info     20    node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info     20    node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai     20    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info     20    node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info     20    node1/crm: recover service 'vm:102' from fenced node 'node2' to node 'node1'
+info     20    node1/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node1)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     40    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     40    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info     60    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info     80    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    100    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    120    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    140    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    160    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    180    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    200    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    220    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    240    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    260    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    280    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    300    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    320    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    340    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    360    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    380    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    400    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    420    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    440    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    460    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    480    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    500    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    520    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    540    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    560    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    580    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    600    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-disarm-idle-lrm1/manager_status b/src/test/test-disarm-idle-lrm1/manager_status
new file mode 100644
index 00000000..ba7448a0
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/manager_status
@@ -0,0 +1,26 @@
+{
+    "master_node": "node1",
+    "node_status": {
+	"node1":"online",
+	"node2":"fence",
+	"node3":"online"
+    },
+    "service_status": {
+	"vm:101": {
+	    "node": "node1",
+	    "state": "started",
+	    "uid": "lavE3c7vLnotUBGT9whswg"
+	},
+	"vm:102": {
+	    "node": "node2",
+	    "state": "fence",
+	    "uid": "lavE3c7vLnotUBGT9whswh"
+	},
+	"vm:103": {
+	    "node": "node3",
+	    "state": "migrate",
+	    "target": "node2",
+	    "uid": "lavE3c7vLnotUBGT9whswj"
+	}
+    }
+}
diff --git a/src/test/test-disarm-idle-lrm1/service_config b/src/test/test-disarm-idle-lrm1/service_config
new file mode 100644
index 00000000..4b26f6b4
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "started" },
+    "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-disarm-idle-lrm2/README b/src/test/test-disarm-idle-lrm2/README
new file mode 100644
index 00000000..d1731578
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/README
@@ -0,0 +1,8 @@
+Disarm the HA stack in 'freeze' mode (keep HA resources frozen while disarmed)
+while there is an HA resource in a transient state on a node whose LRM is
+idle.
+
+Fenced HA resources are already handled by the HA Manager itself, so this works
+as expected. However, as the LRM for the moving HA resource is idle, the HA
+Manager never gets an LRM response for that HA resource. Disarming therefore
+stays deferred and the HA Manager is stuck in a loop.
diff --git a/src/test/test-disarm-idle-lrm2/cmdlist b/src/test/test-disarm-idle-lrm2/cmdlist
new file mode 100644
index 00000000..5a46a662
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on", "crm node1 disarm-ha freeze" ]
+]
diff --git a/src/test/test-disarm-idle-lrm2/hardware_status b/src/test/test-disarm-idle-lrm2/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-idle-lrm2/log.expect b/src/test/test-disarm-idle-lrm2/log.expect
new file mode 100644
index 00000000..d0ba96ff
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/log.expect
@@ -0,0 +1,56 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute crm node1 disarm-ha freeze
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: got crm command: disarm-ha freeze
+warn     20    node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info     20    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info     20    node1/crm: got lock 'ha_agent_node2_lock'
+info     20    node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info     20    node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai     20    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info     20    node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info     20    node1/crm: recover service 'vm:102' from fenced node 'node2' to node 'node1'
+info     20    node1/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node1)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     40    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     40    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info     60    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info     80    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    100    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    120    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    140    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    160    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    180    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    200    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    220    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    240    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    260    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    280    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    300    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    320    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    340    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    360    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    380    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    400    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    420    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    440    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    460    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    480    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    500    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    520    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    540    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    560    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    580    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    600    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-disarm-idle-lrm2/manager_status b/src/test/test-disarm-idle-lrm2/manager_status
new file mode 100644
index 00000000..c28f4ffd
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/manager_status
@@ -0,0 +1,26 @@
+{
+    "master_node": "node1",
+    "node_status": {
+	"node1":"online",
+	"node2":"fence",
+	"node3":"online"
+    },
+    "service_status": {
+	"vm:101": {
+	    "node": "node1",
+	    "state": "online",
+	    "uid": "lavE3c7vLnotUBGT9whswg"
+	},
+	"vm:102": {
+	    "node": "node2",
+	    "state": "fence",
+	    "uid": "lavE3c7vLnotUBGT9whswh"
+	},
+	"vm:103": {
+	    "node": "node3",
+	    "state": "migrate",
+	    "target": "node2",
+	    "uid": "lavE3c7vLnotUBGT9whswj"
+	}
+    }
+}
diff --git a/src/test/test-disarm-idle-lrm2/service_config b/src/test/test-disarm-idle-lrm2/service_config
new file mode 100644
index 00000000..4b26f6b4
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "started" },
+    "vm:103": { "node": "node3", "state": "started" }
+}
-- 
2.47.3





Thread overview: 7+ messages
2026-05-19 14:38 [PATCH-SERIES ha-manager 0/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 14:38 ` Daniel Kral [this message]
2026-05-19 14:38 ` [PATCH ha-manager 2/2] " Daniel Kral
2026-05-19 16:00   ` Fiona Ebner
2026-05-19 14:47 ` [PATCH-SERIES ha-manager 0/2] " Daniel Kral
2026-05-19 16:00   ` Fiona Ebner
2026-05-19 20:11 ` applied: " Thomas Lamprecht
