From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose colocation rules
Date: Tue, 25 Mar 2025 16:12:52 +0100
Message-ID: <20250325151254.193177-15-d.kral@proxmox.com>
In-Reply-To: <20250325151254.193177-1-d.kral@proxmox.com>
Add test cases for loose positive and negative colocation rules, i.e.
where services should be kept on the same node or on separate nodes,
but only as a preference. These are copies of their strict counterpart
tests and verify the behavior when the colocation rule cannot be met:
the services are then placed without adhering to the rule. The test
scenarios are:
- 2 neg. colocated services in a 3 node cluster; 1 node failing
- 2 neg. colocated services in a 3 node cluster; 1 node failing, but the
recovery node cannot start the service
- 2 pos. colocated services in a 3 node cluster; 1 node failing
- 3 pos. colocated services in a 3 node cluster; 1 node failing, but the
recovery node cannot start one of the services
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
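For reviewers: the rules_config files below are identical to the ones in
the strict colocation tests except for the 'strict' flag; a minimal
sketch of the difference (assuming the strict counterparts set the flag
to 1):

    colocation: lonely-should-vms-be
            services vm:101,vm:102
            affinity separate
            strict 0    # loose: prefer separation, may be violated
                        # strict tests presumably use 'strict 1' here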
.../test-colocation-loose-separate1/README | 13 +++
.../test-colocation-loose-separate1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 60 ++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 ++
.../test-colocation-loose-separate4/README | 17 ++++
.../test-colocation-loose-separate4/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 73 +++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 ++
.../test-colocation-loose-together1/README | 11 +++
.../test-colocation-loose-together1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 66 +++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 ++
.../test-colocation-loose-together3/README | 16 ++++
.../test-colocation-loose-together3/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 93 +++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 8 ++
28 files changed, 431 insertions(+)
create mode 100644 src/test/test-colocation-loose-separate1/README
create mode 100644 src/test/test-colocation-loose-separate1/cmdlist
create mode 100644 src/test/test-colocation-loose-separate1/hardware_status
create mode 100644 src/test/test-colocation-loose-separate1/log.expect
create mode 100644 src/test/test-colocation-loose-separate1/manager_status
create mode 100644 src/test/test-colocation-loose-separate1/rules_config
create mode 100644 src/test/test-colocation-loose-separate1/service_config
create mode 100644 src/test/test-colocation-loose-separate4/README
create mode 100644 src/test/test-colocation-loose-separate4/cmdlist
create mode 100644 src/test/test-colocation-loose-separate4/hardware_status
create mode 100644 src/test/test-colocation-loose-separate4/log.expect
create mode 100644 src/test/test-colocation-loose-separate4/manager_status
create mode 100644 src/test/test-colocation-loose-separate4/rules_config
create mode 100644 src/test/test-colocation-loose-separate4/service_config
create mode 100644 src/test/test-colocation-loose-together1/README
create mode 100644 src/test/test-colocation-loose-together1/cmdlist
create mode 100644 src/test/test-colocation-loose-together1/hardware_status
create mode 100644 src/test/test-colocation-loose-together1/log.expect
create mode 100644 src/test/test-colocation-loose-together1/manager_status
create mode 100644 src/test/test-colocation-loose-together1/rules_config
create mode 100644 src/test/test-colocation-loose-together1/service_config
create mode 100644 src/test/test-colocation-loose-together3/README
create mode 100644 src/test/test-colocation-loose-together3/cmdlist
create mode 100644 src/test/test-colocation-loose-together3/hardware_status
create mode 100644 src/test/test-colocation-loose-together3/log.expect
create mode 100644 src/test/test-colocation-loose-together3/manager_status
create mode 100644 src/test/test-colocation-loose-together3/rules_config
create mode 100644 src/test/test-colocation-loose-together3/service_config
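As a conceptual sketch of the loose semantics exercised below (for
illustration only; this is not the actual select_service_node code, and
the subroutine name is made up for this sketch):

    # A strict rule may leave a service without a valid node, while a
    # loose rule falls back to ignoring the rule entirely.
    sub apply_colocation_filter {
        my ($candidates, $allowed_nodes, $strict) = @_;

        # keep only the candidate nodes that satisfy the colocation rule
        my @satisfying = grep { $allowed_nodes->{$_} } @$candidates;

        return \@satisfying if @satisfying; # rule can be met
        return [] if $strict;               # strict: no node is acceptable
        return $candidates;                 # loose: drop the rule
    }

This mirrors the expected logs: in separate1 and together1 the rules can
be met and the outcome matches the strict tests, while in separate4 and
together3 the relocation after a failed start proceeds even though it
conflicts with the rule.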
diff --git a/src/test/test-colocation-loose-separate1/README b/src/test/test-colocation-loose-separate1/README
new file mode 100644
index 0000000..ac7c395
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/README
@@ -0,0 +1,13 @@
+Test whether a loose negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other in case of a
+failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 should be kept separate
+- vm:101 and vm:102 are currently running on node2 and node3 respectively
+- node1 has a higher service count than node2 to test that the rule is applied
+ even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, vm:102 is migrated to node1; even though node1 is already
+  the more utilized node, the services are still kept separate
diff --git a/src/test/test-colocation-loose-separate1/cmdlist b/src/test/test-colocation-loose-separate1/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-separate1/hardware_status b/src/test/test-colocation-loose-separate1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-separate1/log.expect b/src/test/test-colocation-loose-separate1/log.expect
new file mode 100644
index 0000000..475db39
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/log.expect
@@ -0,0 +1,60 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:102
+info 241 node1/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-separate1/manager_status b/src/test/test-colocation-loose-separate1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-loose-separate1/rules_config b/src/test/test-colocation-loose-separate1/rules_config
new file mode 100644
index 0000000..5227309
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-should-vms-be
+ services vm:101,vm:102
+ affinity separate
+ strict 0
diff --git a/src/test/test-colocation-loose-separate1/service_config b/src/test/test-colocation-loose-separate1/service_config
new file mode 100644
index 0000000..6582e8c
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-loose-separate4/README b/src/test/test-colocation-loose-separate4/README
new file mode 100644
index 0000000..5b68cde
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/README
@@ -0,0 +1,17 @@
+Test whether a loose negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other service in
+case of a failover of the service's previously assigned node. As the service
+fails to start on the recovery node (e.g. insufficient resources), it is
+relocated to another node, even if the services end up on the same node.
+
+The test scenario is:
+- vm:101 and fa:120001 should be kept separate
+- vm:101 and fa:120001 are on node2 and node3 respectively
+- fa:120001 will fail to start on node1
+- node1 has a higher service count than node2 to test the colocation rule is
+ applied even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, fa:120001 is migrated to node1
+- fa:120001 is relocated to node2, since it couldn't start on its initial
+  recovery node; as the rule is loose, sharing node2 with vm:101 is tolerated
diff --git a/src/test/test-colocation-loose-separate4/cmdlist b/src/test/test-colocation-loose-separate4/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-separate4/hardware_status b/src/test/test-colocation-loose-separate4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-separate4/log.expect b/src/test/test-colocation-loose-separate4/log.expect
new file mode 100644
index 0000000..bf70aca
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/log.expect
@@ -0,0 +1,73 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:120001' on node 'node3'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'fa:120001': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service fa:120001
+info 25 node3/lrm: service status fa:120001 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'fa:120001': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'fa:120001': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'fa:120001' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'fa:120001': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service fa:120001
+warn 241 node1/lrm: unable to start service fa:120001
+warn 241 node1/lrm: restart policy: retry number 1 for service 'fa:120001'
+info 261 node1/lrm: starting service fa:120001
+warn 261 node1/lrm: unable to start service fa:120001
+err 261 node1/lrm: unable to start service fa:120001 on local node after 1 retries
+warn 280 node1/crm: starting service fa:120001 on node 'node1' failed, relocating service.
+info 280 node1/crm: relocate service 'fa:120001' to node 'node2'
+info 280 node1/crm: service 'fa:120001': state changed from 'started' to 'relocate' (node = node1, target = node2)
+info 281 node1/lrm: service fa:120001 - start relocate to node 'node2'
+info 281 node1/lrm: service fa:120001 - end relocate to node 'node2'
+info 300 node1/crm: service 'fa:120001': state changed from 'relocate' to 'started' (node = node2)
+info 303 node2/lrm: starting service fa:120001
+info 303 node2/lrm: service status fa:120001 started
+info 320 node1/crm: relocation policy successful for 'fa:120001' on node 'node2', failed nodes: node1
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-separate4/manager_status b/src/test/test-colocation-loose-separate4/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-loose-separate4/rules_config b/src/test/test-colocation-loose-separate4/rules_config
new file mode 100644
index 0000000..8a4b869
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-should-vms-be
+ services vm:101,fa:120001
+ affinity separate
+ strict 0
diff --git a/src/test/test-colocation-loose-separate4/service_config b/src/test/test-colocation-loose-separate4/service_config
new file mode 100644
index 0000000..f53c2bc
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "fa:120001": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-loose-together1/README b/src/test/test-colocation-loose-together1/README
new file mode 100644
index 0000000..2f5aeec
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/README
@@ -0,0 +1,11 @@
+Test whether a loose positive colocation rule makes two services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 should be kept together
+- vm:101 and vm:102 are both currently running on node3
+- node1 and node2 have the same service count to test that the rule is applied
+  even though the services would usually be balanced across both remaining nodes
+
+Therefore, the expected outcome is:
+- As node3 fails, both services are migrated to node1
diff --git a/src/test/test-colocation-loose-together1/cmdlist b/src/test/test-colocation-loose-together1/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-together1/hardware_status b/src/test/test-colocation-loose-together1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-together1/log.expect b/src/test/test-colocation-loose-together1/log.expect
new file mode 100644
index 0000000..7d43314
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:101
+info 241 node1/lrm: service status vm:101 started
+info 241 node1/lrm: starting service vm:102
+info 241 node1/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-together1/manager_status b/src/test/test-colocation-loose-together1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-loose-together1/rules_config b/src/test/test-colocation-loose-together1/rules_config
new file mode 100644
index 0000000..37f6aab
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/rules_config
@@ -0,0 +1,4 @@
+colocation: vms-might-stick-together
+ services vm:101,vm:102
+ affinity together
+ strict 0
diff --git a/src/test/test-colocation-loose-together1/service_config b/src/test/test-colocation-loose-together1/service_config
new file mode 100644
index 0000000..9fb091d
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-loose-together3/README b/src/test/test-colocation-loose-together3/README
new file mode 100644
index 0000000..c2aebcf
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/README
@@ -0,0 +1,16 @@
+Test whether a loose positive colocation rule makes three services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+If one of those fails to start on the recovery node (e.g. insufficient
+resources), the failed service will be relocated to another node.
+
+The test scenario is:
+- vm:101, vm:102, and fa:120002 should be kept together
+- vm:101, vm:102, and fa:120002 are all currently running on node3
+- fa:120002 will fail to start on node2
+- node1 has a higher service count than node2 to test that the rule is applied
+  even though the scheduler would usually spread the services across both nodes
+
+Therefore, the expected outcome is:
+- As node3 fails, all services are migrated to node2
+- Two of those services will start successfully, but fa:120002 will be
+  relocated to node1, since it couldn't start on the shared recovery node
diff --git a/src/test/test-colocation-loose-together3/cmdlist b/src/test/test-colocation-loose-together3/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-together3/hardware_status b/src/test/test-colocation-loose-together3/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-together3/log.expect b/src/test/test-colocation-loose-together3/log.expect
new file mode 100644
index 0000000..6ca8053
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/log.expect
@@ -0,0 +1,93 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:120002' on node 'node3'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node2'
+info 20 node1/crm: service 'fa:120002': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:106
+info 23 node2/lrm: service status vm:106 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service fa:120002
+info 25 node3/lrm: service status fa:120002 started
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'fa:120002': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'fa:120002': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'fa:120002' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'fa:120002': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service fa:120002
+warn 243 node2/lrm: unable to start service fa:120002
+warn 243 node2/lrm: restart policy: retry number 1 for service 'fa:120002'
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 243 node2/lrm: starting service vm:102
+info 243 node2/lrm: service status vm:102 started
+info 263 node2/lrm: starting service fa:120002
+warn 263 node2/lrm: unable to start service fa:120002
+err 263 node2/lrm: unable to start service fa:120002 on local node after 1 retries
+warn 280 node1/crm: starting service fa:120002 on node 'node2' failed, relocating service.
+info 280 node1/crm: relocate service 'fa:120002' to node 'node1'
+info 280 node1/crm: service 'fa:120002': state changed from 'started' to 'relocate' (node = node2, target = node1)
+info 283 node2/lrm: service fa:120002 - start relocate to node 'node1'
+info 283 node2/lrm: service fa:120002 - end relocate to node 'node1'
+info 300 node1/crm: service 'fa:120002': state changed from 'relocate' to 'started' (node = node1)
+info 301 node1/lrm: starting service fa:120002
+info 301 node1/lrm: service status fa:120002 started
+info 320 node1/crm: relocation policy successful for 'fa:120002' on node 'node1', failed nodes: node2
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-together3/manager_status b/src/test/test-colocation-loose-together3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-colocation-loose-together3/rules_config b/src/test/test-colocation-loose-together3/rules_config
new file mode 100644
index 0000000..b43c087
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/rules_config
@@ -0,0 +1,4 @@
+colocation: vms-might-stick-together
+ services vm:101,vm:102,fa:120002
+ affinity together
+ strict 0
diff --git a/src/test/test-colocation-loose-together3/service_config b/src/test/test-colocation-loose-together3/service_config
new file mode 100644
index 0000000..3ce5f27
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/service_config
@@ -0,0 +1,8 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "fa:120002": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node2", "state": "started" }
+}
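(For local testing: the new directories follow the layout of the existing
cases in src/test, so the harness there should pick them up; the exact
invocation below is assumed, check src/test/Makefile if it differs:

    cd src/test
    make test                                      # whole suite
    ./ha-tester.pl test-colocation-loose-separate1 # hypothetical single run
)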
--
2.39.5