From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager v2 32/40] test: add dynamic usage scheduler test cases
Date: Tue, 24 Mar 2026 19:30:16 +0100
Message-ID: <20260324183029.1274972-33-d.kral@proxmox.com>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20260324183029.1274972-1-d.kral@proxmox.com>
References: <20260324183029.1274972-1-d.kral@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: Proxmox VE development discussion

These test cases document the basic behavior of the scheduler when using
the dynamic usage information of the HA resources, with
rebalance-on-start cleared and set respectively.

As the mechanisms of the scheduler with static and dynamic usage
information are mostly the same, these test cases verify only the
essential parts, which are:

- dynamic usage information is used correctly (for both test cases), and
- repeatedly scheduling resources with score_nodes_to_start_service(...)
  correctly simulates that the previously scheduled HA resources are
  already started

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!

 src/test/test-crs-dynamic-rebalance1/README   |  3 +
 src/test/test-crs-dynamic-rebalance1/cmdlist  |  4 +
 .../datacenter.cfg                            |  7 ++
 .../dynamic_service_stats                     |  7 ++
 .../hardware_status                           |  5 ++
 .../test-crs-dynamic-rebalance1/log.expect    | 88 +++++++++++++++++++
 .../manager_status                            |  1 +
 .../service_config                            |  7 ++
 .../static_service_stats                      |  7 ++
 src/test/test-crs-dynamic1/README             |  4 +
 src/test/test-crs-dynamic1/cmdlist            |  4 +
 src/test/test-crs-dynamic1/datacenter.cfg     |  6 ++
 .../test-crs-dynamic1/dynamic_service_stats   |  3 +
 src/test/test-crs-dynamic1/hardware_status    |  5 ++
 src/test/test-crs-dynamic1/log.expect         | 51 +++++++++++
 src/test/test-crs-dynamic1/manager_status     |  1 +
 src/test/test-crs-dynamic1/service_config     |  3 +
 .../test-crs-dynamic1/static_service_stats    |  3 +
 18 files changed, 209 insertions(+)
 create mode 100644 src/test/test-crs-dynamic-rebalance1/README
 create mode 100644 src/test/test-crs-dynamic-rebalance1/cmdlist
 create mode 100644 src/test/test-crs-dynamic-rebalance1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-rebalance1/hardware_status
 create mode 100644 src/test/test-crs-dynamic-rebalance1/log.expect
 create mode 100644 src/test/test-crs-dynamic-rebalance1/manager_status
 create mode 100644 src/test/test-crs-dynamic-rebalance1/service_config
 create mode 100644 src/test/test-crs-dynamic-rebalance1/static_service_stats
 create mode 100644 src/test/test-crs-dynamic1/README
 create mode 100644 src/test/test-crs-dynamic1/cmdlist
 create mode 100644 src/test/test-crs-dynamic1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic1/hardware_status
 create mode 100644 src/test/test-crs-dynamic1/log.expect
 create mode 100644 src/test/test-crs-dynamic1/manager_status
 create mode 100644 src/test/test-crs-dynamic1/service_config
 create mode 100644 src/test/test-crs-dynamic1/static_service_stats

diff --git a/src/test/test-crs-dynamic-rebalance1/README b/src/test/test-crs-dynamic-rebalance1/README
new file mode 100644
index 00000000..df0ba0a8
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/README
@@ -0,0 +1,3 @@
+Test rebalancing on start and how after a failed node the recovery gets
+balanced out for a small batch of HA resources with the dynamic usage
+information.
diff --git a/src/test/test-crs-dynamic-rebalance1/cmdlist b/src/test/test-crs-dynamic-rebalance1/cmdlist
new file mode 100644
index 00000000..eee0e40e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "network node3 off" ]
+]
diff --git a/src/test/test-crs-dynamic-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..0f76d24e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-rebalance-on-start": 1
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..5ef75ae0
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
@@ -0,0 +1,7 @@
+{
+    "vm:101": { "cpu": 1.3, "mem": 1073741824 },
+    "vm:102": { "cpu": 5.6, "mem": 3221225472 },
+    "vm:103": { "cpu": 0.5, "mem": 4000000000 },
+    "vm:104": { "cpu": 7.9, "mem": 2147483648 },
+    "vm:105": { "cpu": 3.2, "mem": 2684354560 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/hardware_status b/src/test/test-crs-dynamic-rebalance1/hardware_status
new file mode 100644
index 00000000..bfdbbf7b
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/log.expect b/src/test/test-crs-dynamic-rebalance1/log.expect
new file mode 100644
index 00000000..4017f7be
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/log.expect
@@ -0,0 +1,88 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: service vm:101: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:102: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 20 node1/crm: service vm:103: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service vm:104: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service vm:105: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:101 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:102 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:102 - end relocate to node 'node2'
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 25 node3/lrm: starting service vm:104
+info 25 node3/lrm: service status vm:104 started
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 40 node1/crm: service 'vm:101': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:102': state changed from 'request_start_balance' to 'started' (node = node2)
+info 41 node1/lrm: starting service vm:101
+info 41 node1/lrm: service status vm:101 started
+info 43 node2/lrm: starting service vm:102
+info 43 node2/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:104': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:104': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:105': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:104' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:104': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:105' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:105': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:103
+info 241 node1/lrm: service status vm:103 started
+info 241 node1/lrm: starting service vm:104
+info 241 node1/lrm: service status vm:104 started
+info 241 node1/lrm: starting service vm:105
+info 241 node1/lrm: service status vm:105 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-rebalance1/manager_status b/src/test/test-crs-dynamic-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-rebalance1/service_config b/src/test/test-crs-dynamic-rebalance1/service_config
new file mode 100644
index 00000000..3071f480
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/service_config
@@ -0,0 +1,7 @@
+{
+    "vm:101": { "node": "node3", "state": "started" },
+    "vm:102": { "node": "node3", "state": "started" },
+    "vm:103": { "node": "node3", "state": "started" },
+    "vm:104": { "node": "node3", "state": "started" },
+    "vm:105": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/static_service_stats b/src/test/test-crs-dynamic-rebalance1/static_service_stats
new file mode 100644
index 00000000..a9e810d7
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/static_service_stats
@@ -0,0 +1,7 @@
+{
+    "vm:101": { "maxcpu": 8, "maxmem": 4294967296 },
+    "vm:102": { "maxcpu": 8, "maxmem": 4294967296 },
+    "vm:103": { "maxcpu": 8, "maxmem": 4294967296 },
+    "vm:104": { "maxcpu": 8, "maxmem": 4294967296 },
+    "vm:105": { "maxcpu": 8, "maxmem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic1/README b/src/test/test-crs-dynamic1/README
new file mode 100644
index 00000000..e6382130
--- /dev/null
+++ b/src/test/test-crs-dynamic1/README
@@ -0,0 +1,4 @@
+Test how service recovery works with dynamic usage information.
+
+Expect that the single service gets recovered to the node with the most
+available resources.
diff --git a/src/test/test-crs-dynamic1/cmdlist b/src/test/test-crs-dynamic1/cmdlist
new file mode 100644
index 00000000..8684073c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "network node1 off" ]
+]
diff --git a/src/test/test-crs-dynamic1/datacenter.cfg b/src/test/test-crs-dynamic1/datacenter.cfg
new file mode 100644
index 00000000..6a7fbc48
--- /dev/null
+++ b/src/test/test-crs-dynamic1/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+    "crs": {
+        "ha": "dynamic"
+    }
+}
+
diff --git a/src/test/test-crs-dynamic1/dynamic_service_stats b/src/test/test-crs-dynamic1/dynamic_service_stats
new file mode 100644
index 00000000..922ae9a6
--- /dev/null
+++ b/src/test/test-crs-dynamic1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:102": { "cpu": 5.9, "mem": 2744123392 }
+}
diff --git a/src/test/test-crs-dynamic1/hardware_status b/src/test/test-crs-dynamic1/hardware_status
new file mode 100644
index 00000000..bbe44a96
--- /dev/null
+++ b/src/test/test-crs-dynamic1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 100000000000 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 200000000000 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 300000000000 }
+}
diff --git a/src/test/test-crs-dynamic1/log.expect b/src/test/test-crs-dynamic1/log.expect
new file mode 100644
index 00000000..b7e298e1
--- /dev/null
+++ b/src/test/test-crs-dynamic1/log.expect
@@ -0,0 +1,51 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node1 off
+info 120 node1/crm: status change master => lost_manager_lock
+info 120 node1/crm: status change lost_manager_lock => wait_for_quorum
+info 121 node1/lrm: status change active => lost_agent_lock
+info 162 watchdog: execute power node1 off
+info 161 node1/crm: killed by poweroff
+info 162 node1/lrm: killed by poweroff
+info 162 hardware: server 'node1' stopped by poweroff (watchdog)
+info 222 node3/crm: got lock 'ha_manager_lock'
+info 222 node3/crm: status change slave => master
+info 222 node3/crm: using scheduler mode 'dynamic'
+info 222 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 282 node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 282 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 282 node3/crm: FENCE: Try to fence node 'node1'
+info 282 node3/crm: got lock 'ha_agent_node1_lock'
+info 282 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 282 node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node3'
+info 282 node3/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node3)
+info 283 node3/lrm: got lock 'ha_agent_node3_lock'
+info 283 node3/lrm: status change wait_for_agent_lock => active
+info 283 node3/lrm: starting service vm:102
+info 283 node3/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic1/manager_status b/src/test/test-crs-dynamic1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic1/service_config b/src/test/test-crs-dynamic1/service_config
new file mode 100644
index 00000000..9c124471
--- /dev/null
+++ b/src/test/test-crs-dynamic1/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:102": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-dynamic1/static_service_stats b/src/test/test-crs-dynamic1/static_service_stats
new file mode 100644
index 00000000..1819d24c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/static_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:102": { "maxcpu": 8, "maxmem": 4294967296 }
+}
-- 
2.47.3