From: Daniel Kral
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager 1/2] test: add disarm test cases for idle lrms with transient ha resources
Date: Tue, 19 May 2026 16:38:35 +0200
Message-ID: <20260519143842.382324-2-d.kral@proxmox.com>
In-Reply-To: <20260519143842.382324-1-d.kral@proxmox.com>
References: <20260519143842.382324-1-d.kral@proxmox.com>
List-Id: Proxmox VE development discussion

These test cases document how the HA stack currently behaves if the HA stack
is disarmed while there are HA resources in disarm-deferring states
(fence, recovery, migrate, relocate) and their responsible LRMs are already
in idle state, which makes them unresponsive to the HA Manager while they are
in disarm mode.

Signed-off-by: Daniel Kral
---
 src/test/test-disarm-idle-lrm1/README         |  8 +++
 src/test/test-disarm-idle-lrm1/cmdlist        |  3 +
 .../test-disarm-idle-lrm1/hardware_status     |  5 ++
 src/test/test-disarm-idle-lrm1/log.expect     | 59 +++++++++++++++++++
 src/test/test-disarm-idle-lrm1/manager_status | 26 ++++++++
 src/test/test-disarm-idle-lrm1/service_config |  5 ++
 src/test/test-disarm-idle-lrm2/README         |  8 +++
 src/test/test-disarm-idle-lrm2/cmdlist        |  3 +
 .../test-disarm-idle-lrm2/hardware_status     |  5 ++
 src/test/test-disarm-idle-lrm2/log.expect     | 56 ++++++++++++++++++
 src/test/test-disarm-idle-lrm2/manager_status | 26 ++++++++
 src/test/test-disarm-idle-lrm2/service_config |  5 ++
 12 files changed, 209 insertions(+)
 create mode 100644 src/test/test-disarm-idle-lrm1/README
 create mode 100644 src/test/test-disarm-idle-lrm1/cmdlist
 create mode 100644 src/test/test-disarm-idle-lrm1/hardware_status
 create mode 100644 src/test/test-disarm-idle-lrm1/log.expect
 create mode 100644 src/test/test-disarm-idle-lrm1/manager_status
 create mode 100644 src/test/test-disarm-idle-lrm1/service_config
 create mode 100644 src/test/test-disarm-idle-lrm2/README
 create mode 100644 src/test/test-disarm-idle-lrm2/cmdlist
 create mode 100644 src/test/test-disarm-idle-lrm2/hardware_status
 create mode 100644 src/test/test-disarm-idle-lrm2/log.expect
 create mode 100644 src/test/test-disarm-idle-lrm2/manager_status
 create mode 100644 src/test/test-disarm-idle-lrm2/service_config

diff --git a/src/test/test-disarm-idle-lrm1/README b/src/test/test-disarm-idle-lrm1/README
new file mode 100644
index 00000000..6d5124cd
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/README
@@ -0,0 +1,8 @@
+Disarm the HA stack in 'ignore' mode (keep HA resources in their previous
+state) while there is an HA resource in a transient state on a node, whose LRM
+is idle.
+
+Fenced HA resources are already handled by the HA Manager itself, so this works
+as expected. Though as the LRM for the moving HA resource is idle, the HA
+Manager doesn't get any LRM response for the HA resource, for which the
+disarming is deferred, and therefore the HA Manager is stuck in a loop.
diff --git a/src/test/test-disarm-idle-lrm1/cmdlist b/src/test/test-disarm-idle-lrm1/cmdlist
new file mode 100644
index 00000000..a29cf8e3
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on", "crm node1 disarm-ha ignore" ]
+]
diff --git a/src/test/test-disarm-idle-lrm1/hardware_status b/src/test/test-disarm-idle-lrm1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-idle-lrm1/log.expect b/src/test/test-disarm-idle-lrm1/log.expect
new file mode 100644
index 00000000..1b7f4ece
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute crm node1 disarm-ha ignore
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: got crm command: disarm-ha ignore
+info 20 node1/crm: disarm: suspending HA tracking for service 'vm:101'
+info 20 node1/crm: disarm: suspending HA tracking for service 'vm:102'
+info 20 node1/crm: disarm: suspending HA tracking for service 'vm:103'
+warn 20 node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info 20 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 20 node1/crm: got lock 'ha_agent_node2_lock'
+info 20 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 20 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 20 node1/crm: recover service 'vm:102' from fenced node 'node2' to node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 40 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 60 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 80 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 100 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 120 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 140 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 160 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 180 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 200 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 220 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 240 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 260 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 280 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 300 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 320 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 340 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 360 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 380 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 400 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 420 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 440 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 460 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 480 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 500 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 520 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 540 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 560 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 580 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 600 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-disarm-idle-lrm1/manager_status b/src/test/test-disarm-idle-lrm1/manager_status
new file mode 100644
index 00000000..ba7448a0
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/manager_status
@@ -0,0 +1,26 @@
+{
+    "master_node": "node1",
+    "node_status": {
+        "node1":"online",
+        "node2":"fence",
+        "node3":"online"
+    },
+    "service_status": {
+        "vm:101": {
+            "node": "node1",
+            "state": "started",
+            "uid": "lavE3c7vLnotUBGT9whswg"
+        },
+        "vm:102": {
+            "node": "node2",
+            "state": "fence",
+            "uid": "lavE3c7vLnotUBGT9whswh"
+        },
+        "vm:103": {
+            "node": "node3",
+            "state": "migrate",
+            "target": "node2",
+            "uid": "lavE3c7vLnotUBGT9whswj"
+        }
+    }
+}
diff --git a/src/test/test-disarm-idle-lrm1/service_config b/src/test/test-disarm-idle-lrm1/service_config
new file mode 100644
index 00000000..4b26f6b4
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm1/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "started" },
+    "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-disarm-idle-lrm2/README b/src/test/test-disarm-idle-lrm2/README
new file mode 100644
index 00000000..d1731578
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/README
@@ -0,0 +1,8 @@
+Disarm the HA stack in 'freeze' mode (keep HA resources frozen while disarmed)
+while there is an HA resource in a transient state on a node, whose LRM is
+idle.
+
+Fenced HA resources are already handled by the HA Manager itself, so this works
+as expected. Though as the LRM for the moving HA resource is idle, the HA
+Manager doesn't get any LRM response for the HA resource, for which the
+disarming is deferred, and therefore the HA Manager is stuck in a loop.
diff --git a/src/test/test-disarm-idle-lrm2/cmdlist b/src/test/test-disarm-idle-lrm2/cmdlist
new file mode 100644
index 00000000..5a46a662
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on", "crm node1 disarm-ha freeze" ]
+]
diff --git a/src/test/test-disarm-idle-lrm2/hardware_status b/src/test/test-disarm-idle-lrm2/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-disarm-idle-lrm2/log.expect b/src/test/test-disarm-idle-lrm2/log.expect
new file mode 100644
index 00000000..d0ba96ff
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/log.expect
@@ -0,0 +1,56 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute crm node1 disarm-ha freeze
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: got crm command: disarm-ha freeze
+warn 20 node1/crm: deferring disarm - service 'vm:102' is in 'fence' state
+info 20 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 20 node1/crm: got lock 'ha_agent_node2_lock'
+info 20 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 20 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 20 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 20 node1/crm: recover service 'vm:102' from fenced node 'node2' to node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 40 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 60 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 80 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 100 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 120 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 140 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 160 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 180 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 200 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 220 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 240 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 260 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 280 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 300 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 320 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 340 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 360 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 380 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 400 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 420 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 440 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 460 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 480 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 500 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 520 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 540 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 560 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 580 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 600 node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-disarm-idle-lrm2/manager_status b/src/test/test-disarm-idle-lrm2/manager_status
new file mode 100644
index 00000000..c28f4ffd
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/manager_status
@@ -0,0 +1,26 @@
+{
+    "master_node": "node1",
+    "node_status": {
+        "node1":"online",
+        "node2":"fence",
+        "node3":"online"
+    },
+    "service_status": {
+        "vm:101": {
+            "node": "node1",
+            "state": "online",
+            "uid": "lavE3c7vLnotUBGT9whswg"
+        },
+        "vm:102": {
+            "node": "node2",
+            "state": "fence",
+            "uid": "lavE3c7vLnotUBGT9whswh"
+        },
+        "vm:103": {
+            "node": "node3",
+            "state": "migrate",
+            "target": "node2",
+            "uid": "lavE3c7vLnotUBGT9whswj"
+        }
+    }
+}
diff --git a/src/test/test-disarm-idle-lrm2/service_config b/src/test/test-disarm-idle-lrm2/service_config
new file mode 100644
index 00000000..4b26f6b4
--- /dev/null
+++ b/src/test/test-disarm-idle-lrm2/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "started" },
+    "vm:103": { "node": "node3", "state": "started" }
+}
--
2.47.3