From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Tue, 25 Mar 2025 16:12:50 +0100
Message-Id: <20250325151254.193177-13-d.kral@proxmox.com>
In-Reply-To: <20250325151254.193177-1-d.kral@proxmox.com>
References: <20250325151254.193177-1-d.kral@proxmox.com>
Subject: [pve-devel] [PATCH ha-manager 11/15] test: ha tester: add test cases for strict negative colocation rules

Add test cases for strict negative colocation rules, i.e. where services
must be kept on separate nodes. These verify the behavior of the
services in strict negative colocation rules in case of a failover of
the node of one or more of these services in the following scenarios:

- 2 neg. colocated services in a 3 node cluster; 1 node failing
- 3 neg. colocated services in a 5 node cluster; 1 node failing
- 3 neg. colocated services in a 5 node cluster; 2 nodes failing
- 2 neg. colocated services in a 3 node cluster; 1 node failing, but the
  recovery node cannot start the service
- Pair of 2 neg. colocated services (with one common service in both) in
  a 3 node cluster; 1 node failing

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 .../test-colocation-strict-separate1/README | 13 +++
 .../test-colocation-strict-separate1/cmdlist | 4 +
 .../hardware_status | 5 +
 .../log.expect | 60 ++++++++++
 .../manager_status | 1 +
 .../rules_config | 4 +
 .../service_config | 6 +
 .../test-colocation-strict-separate2/README | 15 +++
 .../test-colocation-strict-separate2/cmdlist | 4 +
 .../hardware_status | 7 ++
 .../log.expect | 90 ++++++++++++++
 .../manager_status | 1 +
 .../rules_config | 4 +
 .../service_config | 10 ++
 .../test-colocation-strict-separate3/README | 16 +++
 .../test-colocation-strict-separate3/cmdlist | 4 +
 .../hardware_status | 7 ++
 .../log.expect | 110 ++++++++++++++++++
 .../manager_status | 1 +
 .../rules_config | 4 +
 .../service_config | 10 ++
 .../test-colocation-strict-separate4/README | 17 +++
 .../test-colocation-strict-separate4/cmdlist | 4 +
 .../hardware_status | 5 +
 .../log.expect | 69 +++++++++++
 .../manager_status | 1 +
 .../rules_config | 4 +
 .../service_config | 6 +
 .../test-colocation-strict-separate5/README | 11 ++
 .../test-colocation-strict-separate5/cmdlist | 4 +
 .../hardware_status | 5 +
 .../log.expect |
56 +++++++++ .../manager_status | 1 + .../rules_config | 9 ++ .../service_config | 5 + 35 files changed, 573 insertions(+) create mode 100644 src/test/test-colocation-strict-separate1/README create mode 100644 src/test/test-colocation-strict-separate1/cmdlist create mode 100644 src/test/test-colocation-strict-separate1/hardware_status create mode 100644 src/test/test-colocation-strict-separate1/log.expect create mode 100644 src/test/test-colocation-strict-separate1/manager_status create mode 100644 src/test/test-colocation-strict-separate1/rules_config create mode 100644 src/test/test-colocation-strict-separate1/service_config create mode 100644 src/test/test-colocation-strict-separate2/README create mode 100644 src/test/test-colocation-strict-separate2/cmdlist create mode 100644 src/test/test-colocation-strict-separate2/hardware_status create mode 100644 src/test/test-colocation-strict-separate2/log.expect create mode 100644 src/test/test-colocation-strict-separate2/manager_status create mode 100644 src/test/test-colocation-strict-separate2/rules_config create mode 100644 src/test/test-colocation-strict-separate2/service_config create mode 100644 src/test/test-colocation-strict-separate3/README create mode 100644 src/test/test-colocation-strict-separate3/cmdlist create mode 100644 src/test/test-colocation-strict-separate3/hardware_status create mode 100644 src/test/test-colocation-strict-separate3/log.expect create mode 100644 src/test/test-colocation-strict-separate3/manager_status create mode 100644 src/test/test-colocation-strict-separate3/rules_config create mode 100644 src/test/test-colocation-strict-separate3/service_config create mode 100644 src/test/test-colocation-strict-separate4/README create mode 100644 src/test/test-colocation-strict-separate4/cmdlist create mode 100644 src/test/test-colocation-strict-separate4/hardware_status create mode 100644 src/test/test-colocation-strict-separate4/log.expect create mode 100644 
src/test/test-colocation-strict-separate4/manager_status create mode 100644 src/test/test-colocation-strict-separate4/rules_config create mode 100644 src/test/test-colocation-strict-separate4/service_config create mode 100644 src/test/test-colocation-strict-separate5/README create mode 100644 src/test/test-colocation-strict-separate5/cmdlist create mode 100644 src/test/test-colocation-strict-separate5/hardware_status create mode 100644 src/test/test-colocation-strict-separate5/log.expect create mode 100644 src/test/test-colocation-strict-separate5/manager_status create mode 100644 src/test/test-colocation-strict-separate5/rules_config create mode 100644 src/test/test-colocation-strict-separate5/service_config diff --git a/src/test/test-colocation-strict-separate1/README b/src/test/test-colocation-strict-separate1/README new file mode 100644 index 0000000..5a03d99 --- /dev/null +++ b/src/test/test-colocation-strict-separate1/README @@ -0,0 +1,13 @@ +Test whether a strict negative colocation rule among two services makes one of +the services migrate to a different recovery node than the other in case of a +failover of their previously assigned node. 
+ +The test scenario is: +- vm:101 and vm:102 must be kept separate +- vm:101 and vm:102 are currently running on node2 and node3 respectively +- node1 has a higher service count than node2 to test the colocation rule is + applied even though the scheduler would prefer the less utilized node + +Therefore, the expected outcome is: +- As node3 fails, vm:102 is migrated to node1; even though the utilization of + node1 is high already, the services must be kept separate diff --git a/src/test/test-colocation-strict-separate1/cmdlist b/src/test/test-colocation-strict-separate1/cmdlist new file mode 100644 index 0000000..c0a4daa --- /dev/null +++ b/src/test/test-colocation-strict-separate1/cmdlist @@ -0,0 +1,4 @@ +[ + [ "power node1 on", "power node2 on", "power node3 on" ], + [ "network node3 off" ] +] diff --git a/src/test/test-colocation-strict-separate1/hardware_status b/src/test/test-colocation-strict-separate1/hardware_status new file mode 100644 index 0000000..451beb1 --- /dev/null +++ b/src/test/test-colocation-strict-separate1/hardware_status @@ -0,0 +1,5 @@ +{ + "node1": { "power": "off", "network": "off" }, + "node2": { "power": "off", "network": "off" }, + "node3": { "power": "off", "network": "off" } +} diff --git a/src/test/test-colocation-strict-separate1/log.expect b/src/test/test-colocation-strict-separate1/log.expect new file mode 100644 index 0000000..475db39 --- /dev/null +++ b/src/test/test-colocation-strict-separate1/log.expect @@ -0,0 +1,60 @@ +info 0 hardware: starting simulation +info 20 cmdlist: execute power node1 on +info 20 node1/crm: status change startup => wait_for_quorum +info 20 node1/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node2 on +info 20 node2/crm: status change startup => wait_for_quorum +info 20 node2/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node3 on +info 20 node3/crm: status change startup => wait_for_quorum +info 20 node3/lrm: status change startup 
=> wait_for_agent_lock +info 20 node1/crm: got lock 'ha_manager_lock' +info 20 node1/crm: status change wait_for_quorum => master +info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online' +info 20 node1/crm: adding new service 'vm:101' on node 'node2' +info 20 node1/crm: adding new service 'vm:102' on node 'node3' +info 20 node1/crm: adding new service 'vm:103' on node 'node1' +info 20 node1/crm: adding new service 'vm:104' on node 'node1' +info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2) +info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3) +info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1) +info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1) +info 21 node1/lrm: got lock 'ha_agent_node1_lock' +info 21 node1/lrm: status change wait_for_agent_lock => active +info 21 node1/lrm: starting service vm:103 +info 21 node1/lrm: service status vm:103 started +info 21 node1/lrm: starting service vm:104 +info 21 node1/lrm: service status vm:104 started +info 22 node2/crm: status change wait_for_quorum => slave +info 23 node2/lrm: got lock 'ha_agent_node2_lock' +info 23 node2/lrm: status change wait_for_agent_lock => active +info 23 node2/lrm: starting service vm:101 +info 23 node2/lrm: service status vm:101 started +info 24 node3/crm: status change wait_for_quorum => slave +info 25 node3/lrm: got lock 'ha_agent_node3_lock' +info 25 node3/lrm: status change wait_for_agent_lock => active +info 25 node3/lrm: starting service vm:102 +info 25 node3/lrm: service status vm:102 started +info 120 cmdlist: execute network node3 off +info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown' +info 124 node3/crm: 
status change slave => wait_for_quorum +info 125 node3/lrm: status change active => lost_agent_lock +info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence' +info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence' +emai 160 node1/crm: FENCE: Try to fence node 'node3' +info 166 watchdog: execute power node3 off +info 165 node3/crm: killed by poweroff +info 166 node3/lrm: killed by poweroff +info 166 hardware: server 'node3' stopped by poweroff (watchdog) +info 240 node1/crm: got lock 'ha_agent_node3_lock' +info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3' +info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown' +emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3' +info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery' +info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1' +info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1) +info 241 node1/lrm: starting service vm:102 +info 241 node1/lrm: service status vm:102 started +info 720 hardware: exit simulation - done diff --git a/src/test/test-colocation-strict-separate1/manager_status b/src/test/test-colocation-strict-separate1/manager_status new file mode 100644 index 0000000..0967ef4 --- /dev/null +++ b/src/test/test-colocation-strict-separate1/manager_status @@ -0,0 +1 @@ +{} diff --git a/src/test/test-colocation-strict-separate1/rules_config b/src/test/test-colocation-strict-separate1/rules_config new file mode 100644 index 0000000..21c5608 --- /dev/null +++ b/src/test/test-colocation-strict-separate1/rules_config @@ -0,0 +1,4 @@ +colocation: lonely-must-vms-be + services vm:101,vm:102 + affinity separate + strict 1 diff --git a/src/test/test-colocation-strict-separate1/service_config b/src/test/test-colocation-strict-separate1/service_config new file mode 100644 index 0000000..6582e8c --- /dev/null +++ 
b/src/test/test-colocation-strict-separate1/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node2", "state": "started" },
+    "vm:102": { "node": "node3", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate2/README b/src/test/test-colocation-strict-separate2/README
new file mode 100644
index 0000000..f494d2b
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/README
@@ -0,0 +1,15 @@
+Test whether a strict negative colocation rule among three services makes one
+of the services migrate to a different node than the other services in case of
+a failover of the service's previously assigned node.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept separate
+- vm:101, vm:102, and vm:103 are on node3, node4, and node5 respectively
+- node1 and node2 each have higher service counts than node3, node4, and
+  node5 to test the rule is applied even though the scheduler would prefer the
+  less utilized nodes node3, node4, or node5
+
+Therefore, the expected outcome is:
+- As node5 fails, vm:103 is migrated to node2; even though the utilization of
+  node2 is high already, the services must be kept separate; node2 is chosen
+  since node1 has one more service running on it
diff --git a/src/test/test-colocation-strict-separate2/cmdlist b/src/test/test-colocation-strict-separate2/cmdlist
new file mode 100644
index 0000000..89d09c9
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on", "power node4 on", "power node5 on" ],
+    [ "network node5 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate2/hardware_status b/src/test/test-colocation-strict-separate2/hardware_status
new file mode 100644
index 0000000..7b8e961
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/hardware_status
@@ -0,0 +1,7 @@
+{
+
"node1": { "power": "off", "network": "off" }, + "node2": { "power": "off", "network": "off" }, + "node3": { "power": "off", "network": "off" }, + "node4": { "power": "off", "network": "off" }, + "node5": { "power": "off", "network": "off" } +} diff --git a/src/test/test-colocation-strict-separate2/log.expect b/src/test/test-colocation-strict-separate2/log.expect new file mode 100644 index 0000000..858d3c9 --- /dev/null +++ b/src/test/test-colocation-strict-separate2/log.expect @@ -0,0 +1,90 @@ +info 0 hardware: starting simulation +info 20 cmdlist: execute power node1 on +info 20 node1/crm: status change startup => wait_for_quorum +info 20 node1/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node2 on +info 20 node2/crm: status change startup => wait_for_quorum +info 20 node2/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node3 on +info 20 node3/crm: status change startup => wait_for_quorum +info 20 node3/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node4 on +info 20 node4/crm: status change startup => wait_for_quorum +info 20 node4/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node5 on +info 20 node5/crm: status change startup => wait_for_quorum +info 20 node5/lrm: status change startup => wait_for_agent_lock +info 20 node1/crm: got lock 'ha_manager_lock' +info 20 node1/crm: status change wait_for_quorum => master +info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node4': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node5': state changed from 'unknown' => 'online' +info 20 node1/crm: adding new service 'vm:101' on node 'node3' +info 20 node1/crm: adding new service 'vm:102' on node 'node4' +info 20 
node1/crm: adding new service 'vm:103' on node 'node5' +info 20 node1/crm: adding new service 'vm:104' on node 'node1' +info 20 node1/crm: adding new service 'vm:105' on node 'node1' +info 20 node1/crm: adding new service 'vm:106' on node 'node1' +info 20 node1/crm: adding new service 'vm:107' on node 'node2' +info 20 node1/crm: adding new service 'vm:108' on node 'node2' +info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3) +info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node4) +info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node5) +info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1) +info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1) +info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node1) +info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node2) +info 20 node1/crm: service 'vm:108': state changed from 'request_start' to 'started' (node = node2) +info 21 node1/lrm: got lock 'ha_agent_node1_lock' +info 21 node1/lrm: status change wait_for_agent_lock => active +info 21 node1/lrm: starting service vm:104 +info 21 node1/lrm: service status vm:104 started +info 21 node1/lrm: starting service vm:105 +info 21 node1/lrm: service status vm:105 started +info 21 node1/lrm: starting service vm:106 +info 21 node1/lrm: service status vm:106 started +info 22 node2/crm: status change wait_for_quorum => slave +info 23 node2/lrm: got lock 'ha_agent_node2_lock' +info 23 node2/lrm: status change wait_for_agent_lock => active +info 23 node2/lrm: starting service vm:107 +info 23 node2/lrm: service status vm:107 started +info 23 node2/lrm: starting service vm:108 +info 23 node2/lrm: service status vm:108 started +info 24 node3/crm: status change wait_for_quorum => 
slave +info 25 node3/lrm: got lock 'ha_agent_node3_lock' +info 25 node3/lrm: status change wait_for_agent_lock => active +info 25 node3/lrm: starting service vm:101 +info 25 node3/lrm: service status vm:101 started +info 26 node4/crm: status change wait_for_quorum => slave +info 27 node4/lrm: got lock 'ha_agent_node4_lock' +info 27 node4/lrm: status change wait_for_agent_lock => active +info 27 node4/lrm: starting service vm:102 +info 27 node4/lrm: service status vm:102 started +info 28 node5/crm: status change wait_for_quorum => slave +info 29 node5/lrm: got lock 'ha_agent_node5_lock' +info 29 node5/lrm: status change wait_for_agent_lock => active +info 29 node5/lrm: starting service vm:103 +info 29 node5/lrm: service status vm:103 started +info 120 cmdlist: execute network node5 off +info 120 node1/crm: node 'node5': state changed from 'online' => 'unknown' +info 128 node5/crm: status change slave => wait_for_quorum +info 129 node5/lrm: status change active => lost_agent_lock +info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence' +info 160 node1/crm: node 'node5': state changed from 'unknown' => 'fence' +emai 160 node1/crm: FENCE: Try to fence node 'node5' +info 170 watchdog: execute power node5 off +info 169 node5/crm: killed by poweroff +info 170 node5/lrm: killed by poweroff +info 170 hardware: server 'node5' stopped by poweroff (watchdog) +info 240 node1/crm: got lock 'ha_agent_node5_lock' +info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node5' +info 240 node1/crm: node 'node5': state changed from 'fence' => 'unknown' +emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node5' +info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery' +info 240 node1/crm: recover service 'vm:103' from fenced node 'node5' to node 'node2' +info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node2) +info 243 node2/lrm: starting service vm:103 +info 243 
node2/lrm: service status vm:103 started +info 720 hardware: exit simulation - done diff --git a/src/test/test-colocation-strict-separate2/manager_status b/src/test/test-colocation-strict-separate2/manager_status new file mode 100644 index 0000000..9e26dfe --- /dev/null +++ b/src/test/test-colocation-strict-separate2/manager_status @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/src/test/test-colocation-strict-separate2/rules_config b/src/test/test-colocation-strict-separate2/rules_config new file mode 100644 index 0000000..4167bab --- /dev/null +++ b/src/test/test-colocation-strict-separate2/rules_config @@ -0,0 +1,4 @@ +colocation: lonely-must-vms-be + services vm:101,vm:102,vm:103 + affinity separate + strict 1 diff --git a/src/test/test-colocation-strict-separate2/service_config b/src/test/test-colocation-strict-separate2/service_config new file mode 100644 index 0000000..2c27816 --- /dev/null +++ b/src/test/test-colocation-strict-separate2/service_config @@ -0,0 +1,10 @@ +{ + "vm:101": { "node": "node3", "state": "started" }, + "vm:102": { "node": "node4", "state": "started" }, + "vm:103": { "node": "node5", "state": "started" }, + "vm:104": { "node": "node1", "state": "started" }, + "vm:105": { "node": "node1", "state": "started" }, + "vm:106": { "node": "node1", "state": "started" }, + "vm:107": { "node": "node2", "state": "started" }, + "vm:108": { "node": "node2", "state": "started" } +} diff --git a/src/test/test-colocation-strict-separate3/README b/src/test/test-colocation-strict-separate3/README new file mode 100644 index 0000000..44d88ef --- /dev/null +++ b/src/test/test-colocation-strict-separate3/README @@ -0,0 +1,16 @@ +Test whether a strict negative colocation rule among three services makes two +of the services migrate to two different recovery nodes than the node of the +third service in case of a failover of their two previously assigned nodes. 
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept separate
+- vm:101, vm:102, and vm:103 are on node3, node4, and node5 respectively
+- node1 and node2 both have higher service counts than node3, node4, and node5
+  to test the colocation rule is enforced even though the scheduler would
+  prefer the less utilized node3, node4, and node5
+
+Therefore, the expected outcome is:
+- As node4 and node5 fail, vm:102 and vm:103 are migrated to node2 and node1
+  respectively; even though the utilization of node1 and node2 is high
+  already, the services must be kept separate; node2 is chosen first since
+  node1 has one more service running on it
diff --git a/src/test/test-colocation-strict-separate3/cmdlist b/src/test/test-colocation-strict-separate3/cmdlist
new file mode 100644
index 0000000..1934596
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on", "power node4 on", "power node5 on" ],
+    [ "network node4 off", "network node5 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate3/hardware_status b/src/test/test-colocation-strict-separate3/hardware_status
new file mode 100644
index 0000000..7b8e961
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/hardware_status
@@ -0,0 +1,7 @@
+{
+    "node1": { "power": "off", "network": "off" },
+    "node2": { "power": "off", "network": "off" },
+    "node3": { "power": "off", "network": "off" },
+    "node4": { "power": "off", "network": "off" },
+    "node5": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate3/log.expect b/src/test/test-colocation-strict-separate3/log.expect
new file mode 100644
index 0000000..4acdcec
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/log.expect
@@ -0,0 +1,110 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change
startup => wait_for_agent_lock +info 20 cmdlist: execute power node2 on +info 20 node2/crm: status change startup => wait_for_quorum +info 20 node2/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node3 on +info 20 node3/crm: status change startup => wait_for_quorum +info 20 node3/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node4 on +info 20 node4/crm: status change startup => wait_for_quorum +info 20 node4/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node5 on +info 20 node5/crm: status change startup => wait_for_quorum +info 20 node5/lrm: status change startup => wait_for_agent_lock +info 20 node1/crm: got lock 'ha_manager_lock' +info 20 node1/crm: status change wait_for_quorum => master +info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node4': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node5': state changed from 'unknown' => 'online' +info 20 node1/crm: adding new service 'vm:101' on node 'node3' +info 20 node1/crm: adding new service 'vm:102' on node 'node4' +info 20 node1/crm: adding new service 'vm:103' on node 'node5' +info 20 node1/crm: adding new service 'vm:104' on node 'node1' +info 20 node1/crm: adding new service 'vm:105' on node 'node1' +info 20 node1/crm: adding new service 'vm:106' on node 'node1' +info 20 node1/crm: adding new service 'vm:107' on node 'node2' +info 20 node1/crm: adding new service 'vm:108' on node 'node2' +info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3) +info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node4) +info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node5) 
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1) +info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1) +info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node1) +info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node2) +info 20 node1/crm: service 'vm:108': state changed from 'request_start' to 'started' (node = node2) +info 21 node1/lrm: got lock 'ha_agent_node1_lock' +info 21 node1/lrm: status change wait_for_agent_lock => active +info 21 node1/lrm: starting service vm:104 +info 21 node1/lrm: service status vm:104 started +info 21 node1/lrm: starting service vm:105 +info 21 node1/lrm: service status vm:105 started +info 21 node1/lrm: starting service vm:106 +info 21 node1/lrm: service status vm:106 started +info 22 node2/crm: status change wait_for_quorum => slave +info 23 node2/lrm: got lock 'ha_agent_node2_lock' +info 23 node2/lrm: status change wait_for_agent_lock => active +info 23 node2/lrm: starting service vm:107 +info 23 node2/lrm: service status vm:107 started +info 23 node2/lrm: starting service vm:108 +info 23 node2/lrm: service status vm:108 started +info 24 node3/crm: status change wait_for_quorum => slave +info 25 node3/lrm: got lock 'ha_agent_node3_lock' +info 25 node3/lrm: status change wait_for_agent_lock => active +info 25 node3/lrm: starting service vm:101 +info 25 node3/lrm: service status vm:101 started +info 26 node4/crm: status change wait_for_quorum => slave +info 27 node4/lrm: got lock 'ha_agent_node4_lock' +info 27 node4/lrm: status change wait_for_agent_lock => active +info 27 node4/lrm: starting service vm:102 +info 27 node4/lrm: service status vm:102 started +info 28 node5/crm: status change wait_for_quorum => slave +info 29 node5/lrm: got lock 'ha_agent_node5_lock' +info 29 node5/lrm: status change wait_for_agent_lock => active +info 29 node5/lrm: 
starting service vm:103 +info 29 node5/lrm: service status vm:103 started +info 120 cmdlist: execute network node4 off +info 120 cmdlist: execute network node5 off +info 120 node1/crm: node 'node4': state changed from 'online' => 'unknown' +info 120 node1/crm: node 'node5': state changed from 'online' => 'unknown' +info 126 node4/crm: status change slave => wait_for_quorum +info 127 node4/lrm: status change active => lost_agent_lock +info 128 node5/crm: status change slave => wait_for_quorum +info 129 node5/lrm: status change active => lost_agent_lock +info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence' +info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence' +info 160 node1/crm: node 'node4': state changed from 'unknown' => 'fence' +emai 160 node1/crm: FENCE: Try to fence node 'node4' +info 160 node1/crm: node 'node5': state changed from 'unknown' => 'fence' +emai 160 node1/crm: FENCE: Try to fence node 'node5' +info 168 watchdog: execute power node4 off +info 167 node4/crm: killed by poweroff +info 168 node4/lrm: killed by poweroff +info 168 hardware: server 'node4' stopped by poweroff (watchdog) +info 170 watchdog: execute power node5 off +info 169 node5/crm: killed by poweroff +info 170 node5/lrm: killed by poweroff +info 170 hardware: server 'node5' stopped by poweroff (watchdog) +info 240 node1/crm: got lock 'ha_agent_node4_lock' +info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node4' +info 240 node1/crm: node 'node4': state changed from 'fence' => 'unknown' +emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node4' +info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery' +info 240 node1/crm: got lock 'ha_agent_node5_lock' +info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node5' +info 240 node1/crm: node 'node5': state changed from 'fence' => 'unknown' +emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock 
for node 'node5'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node4' to node 'node2'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node5' to node 'node1'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:103
+info 241 node1/lrm: service status vm:103 started
+info 243 node2/lrm: starting service vm:102
+info 243 node2/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate3/manager_status b/src/test/test-colocation-strict-separate3/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate3/rules_config b/src/test/test-colocation-strict-separate3/rules_config
new file mode 100644
index 0000000..4167bab
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-must-vms-be
+    services vm:101,vm:102,vm:103
+    affinity separate
+    strict 1
diff --git a/src/test/test-colocation-strict-separate3/service_config b/src/test/test-colocation-strict-separate3/service_config
new file mode 100644
index 0000000..2c27816
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/service_config
@@ -0,0 +1,10 @@
+{
+    "vm:101": { "node": "node3", "state": "started" },
+    "vm:102": { "node": "node4", "state": "started" },
+    "vm:103": { "node": "node5", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" },
+    "vm:105": { "node": "node1", "state": "started" },
+    "vm:106": { "node": "node1", "state": "started" },
+    "vm:107": { "node": "node2", "state": "started" },
+    "vm:108": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate4/README b/src/test/test-colocation-strict-separate4/README
new file mode 100644
index 0000000..31f127d
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/README
@@ -0,0 +1,17 @@
+Test whether a strict negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other service in
+case of a failover of the service's previously assigned node. As the service
+fails to start on the recovery node (e.g. insufficient resources), the failing
+service is kept on the recovery node.
+
+The test scenario is:
+- vm:101 and fa:120001 must be kept separate
+- vm:101 and fa:120001 are on node2 and node3 respectively
+- fa:120001 will fail to start on node1
+- node1 has a higher service count than node2 to test the colocation rule is
+  applied even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, fa:120001 is migrated to node1
+- fa:120001 will stay in recovery, since it cannot be started on node1, but
+  cannot be relocated to another one either due to the strict colocation rule
diff --git a/src/test/test-colocation-strict-separate4/cmdlist b/src/test/test-colocation-strict-separate4/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate4/hardware_status b/src/test/test-colocation-strict-separate4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/hardware_status
@@ -0,0 +1,5 @@
+{
+    "node1": { "power": "off", "network": "off" },
+    "node2": { "power": "off", "network": "off" },
+    "node3": { "power": "off", "network": "off" }
+}
diff --git
a/src/test/test-colocation-strict-separate4/log.expect b/src/test/test-colocation-strict-separate4/log.expect new file mode 100644 index 0000000..f772ea8 --- /dev/null +++ b/src/test/test-colocation-strict-separate4/log.expect @@ -0,0 +1,69 @@ +info 0 hardware: starting simulation +info 20 cmdlist: execute power node1 on +info 20 node1/crm: status change startup => wait_for_quorum +info 20 node1/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node2 on +info 20 node2/crm: status change startup => wait_for_quorum +info 20 node2/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node3 on +info 20 node3/crm: status change startup => wait_for_quorum +info 20 node3/lrm: status change startup => wait_for_agent_lock +info 20 node1/crm: got lock 'ha_manager_lock' +info 20 node1/crm: status change wait_for_quorum => master +info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online' +info 20 node1/crm: adding new service 'fa:120001' on node 'node3' +info 20 node1/crm: adding new service 'vm:101' on node 'node2' +info 20 node1/crm: adding new service 'vm:103' on node 'node1' +info 20 node1/crm: adding new service 'vm:104' on node 'node1' +info 20 node1/crm: service 'fa:120001': state changed from 'request_start' to 'started' (node = node3) +info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2) +info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1) +info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1) +info 21 node1/lrm: got lock 'ha_agent_node1_lock' +info 21 node1/lrm: status change wait_for_agent_lock => active +info 21 node1/lrm: starting service vm:103 +info 21 node1/lrm: service status vm:103 started +info 21 
node1/lrm: starting service vm:104 +info 21 node1/lrm: service status vm:104 started +info 22 node2/crm: status change wait_for_quorum => slave +info 23 node2/lrm: got lock 'ha_agent_node2_lock' +info 23 node2/lrm: status change wait_for_agent_lock => active +info 23 node2/lrm: starting service vm:101 +info 23 node2/lrm: service status vm:101 started +info 24 node3/crm: status change wait_for_quorum => slave +info 25 node3/lrm: got lock 'ha_agent_node3_lock' +info 25 node3/lrm: status change wait_for_agent_lock => active +info 25 node3/lrm: starting service fa:120001 +info 25 node3/lrm: service status fa:120001 started +info 120 cmdlist: execute network node3 off +info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown' +info 124 node3/crm: status change slave => wait_for_quorum +info 125 node3/lrm: status change active => lost_agent_lock +info 160 node1/crm: service 'fa:120001': state changed from 'started' to 'fence' +info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence' +emai 160 node1/crm: FENCE: Try to fence node 'node3' +info 166 watchdog: execute power node3 off +info 165 node3/crm: killed by poweroff +info 166 node3/lrm: killed by poweroff +info 166 hardware: server 'node3' stopped by poweroff (watchdog) +info 240 node1/crm: got lock 'ha_agent_node3_lock' +info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3' +info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown' +emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3' +info 240 node1/crm: service 'fa:120001': state changed from 'fence' to 'recovery' +info 240 node1/crm: recover service 'fa:120001' from fenced node 'node3' to node 'node1' +info 240 node1/crm: service 'fa:120001': state changed from 'recovery' to 'started' (node = node1) +info 241 node1/lrm: starting service fa:120001 +warn 241 node1/lrm: unable to start service fa:120001 +warn 241 node1/lrm: restart policy: retry number 1 for service 
'fa:120001' +info 261 node1/lrm: starting service fa:120001 +warn 261 node1/lrm: unable to start service fa:120001 +err 261 node1/lrm: unable to start service fa:120001 on local node after 1 retries +warn 280 node1/crm: starting service fa:120001 on node 'node1' failed, relocating service. +warn 280 node1/crm: Start Error Recovery: Tried all available nodes for service 'fa:120001', retry start on current node. Tried nodes: node1 +info 281 node1/lrm: starting service fa:120001 +info 281 node1/lrm: service status fa:120001 started +info 300 node1/crm: relocation policy successful for 'fa:120001' on node 'node1', failed nodes: node1 +info 720 hardware: exit simulation - done diff --git a/src/test/test-colocation-strict-separate4/manager_status b/src/test/test-colocation-strict-separate4/manager_status new file mode 100644 index 0000000..9e26dfe --- /dev/null +++ b/src/test/test-colocation-strict-separate4/manager_status @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/src/test/test-colocation-strict-separate4/rules_config b/src/test/test-colocation-strict-separate4/rules_config new file mode 100644 index 0000000..3db0056 --- /dev/null +++ b/src/test/test-colocation-strict-separate4/rules_config @@ -0,0 +1,4 @@ +colocation: lonely-must-vms-be + services vm:101,fa:120001 + affinity separate + strict 1 diff --git a/src/test/test-colocation-strict-separate4/service_config b/src/test/test-colocation-strict-separate4/service_config new file mode 100644 index 0000000..f53c2bc --- /dev/null +++ b/src/test/test-colocation-strict-separate4/service_config @@ -0,0 +1,6 @@ +{ + "vm:101": { "node": "node2", "state": "started" }, + "fa:120001": { "node": "node3", "state": "started" }, + "vm:103": { "node": "node1", "state": "started" }, + "vm:104": { "node": "node1", "state": "started" } +} diff --git a/src/test/test-colocation-strict-separate5/README b/src/test/test-colocation-strict-separate5/README new file mode 100644 index 0000000..4cdcbf5 --- /dev/null +++ 
b/src/test/test-colocation-strict-separate5/README @@ -0,0 +1,11 @@ +Test whether two pair-wise strict negative colocation rules, i.e. where one +service is in two separate non-colocation relationships with two other services, +make one of the outer services migrate to the same node as the other outer +service in case of a failover of its previously assigned node. + +The test scenario is: +- vm:101 and vm:102, and vm:101 and vm:103 must each be kept separate +- vm:101, vm:102, and vm:103 are respectively on node1, node2, and node3 + +Therefore, the expected outcome is: +- As node3 fails, vm:103 is migrated to node2 - the same node as vm:102 diff --git a/src/test/test-colocation-strict-separate5/cmdlist b/src/test/test-colocation-strict-separate5/cmdlist new file mode 100644 index 0000000..c0a4daa --- /dev/null +++ b/src/test/test-colocation-strict-separate5/cmdlist @@ -0,0 +1,4 @@ +[ + [ "power node1 on", "power node2 on", "power node3 on" ], + [ "network node3 off" ] +] diff --git a/src/test/test-colocation-strict-separate5/hardware_status b/src/test/test-colocation-strict-separate5/hardware_status new file mode 100644 index 0000000..451beb1 --- /dev/null +++ b/src/test/test-colocation-strict-separate5/hardware_status @@ -0,0 +1,5 @@ +{ + "node1": { "power": "off", "network": "off" }, + "node2": { "power": "off", "network": "off" }, + "node3": { "power": "off", "network": "off" } +} diff --git a/src/test/test-colocation-strict-separate5/log.expect b/src/test/test-colocation-strict-separate5/log.expect new file mode 100644 index 0000000..16156ad --- /dev/null +++ b/src/test/test-colocation-strict-separate5/log.expect @@ -0,0 +1,56 @@ +info 0 hardware: starting simulation +info 20 cmdlist: execute power node1 on +info 20 node1/crm: status change startup => wait_for_quorum +info 20 node1/lrm: status change startup => wait_for_agent_lock +info 20 cmdlist: execute power node2 on +info 20 node2/crm: status change startup => wait_for_quorum +info 20 node2/lrm: status change
startup => wait_for_agent_lock +info 20 cmdlist: execute power node3 on +info 20 node3/crm: status change startup => wait_for_quorum +info 20 node3/lrm: status change startup => wait_for_agent_lock +info 20 node1/crm: got lock 'ha_manager_lock' +info 20 node1/crm: status change wait_for_quorum => master +info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online' +info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online' +info 20 node1/crm: adding new service 'vm:101' on node 'node1' +info 20 node1/crm: adding new service 'vm:102' on node 'node2' +info 20 node1/crm: adding new service 'vm:103' on node 'node3' +info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1) +info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2) +info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3) +info 21 node1/lrm: got lock 'ha_agent_node1_lock' +info 21 node1/lrm: status change wait_for_agent_lock => active +info 21 node1/lrm: starting service vm:101 +info 21 node1/lrm: service status vm:101 started +info 22 node2/crm: status change wait_for_quorum => slave +info 23 node2/lrm: got lock 'ha_agent_node2_lock' +info 23 node2/lrm: status change wait_for_agent_lock => active +info 23 node2/lrm: starting service vm:102 +info 23 node2/lrm: service status vm:102 started +info 24 node3/crm: status change wait_for_quorum => slave +info 25 node3/lrm: got lock 'ha_agent_node3_lock' +info 25 node3/lrm: status change wait_for_agent_lock => active +info 25 node3/lrm: starting service vm:103 +info 25 node3/lrm: service status vm:103 started +info 120 cmdlist: execute network node3 off +info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown' +info 124 node3/crm: status change slave => wait_for_quorum +info 125 node3/lrm: status change active => 
lost_agent_lock +info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence' +info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence' +emai 160 node1/crm: FENCE: Try to fence node 'node3' +info 166 watchdog: execute power node3 off +info 165 node3/crm: killed by poweroff +info 166 node3/lrm: killed by poweroff +info 166 hardware: server 'node3' stopped by poweroff (watchdog) +info 240 node1/crm: got lock 'ha_agent_node3_lock' +info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3' +info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown' +emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3' +info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery' +info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node2' +info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node2) +info 243 node2/lrm: starting service vm:103 +info 243 node2/lrm: service status vm:103 started +info 720 hardware: exit simulation - done diff --git a/src/test/test-colocation-strict-separate5/manager_status b/src/test/test-colocation-strict-separate5/manager_status new file mode 100644 index 0000000..9e26dfe --- /dev/null +++ b/src/test/test-colocation-strict-separate5/manager_status @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/src/test/test-colocation-strict-separate5/rules_config b/src/test/test-colocation-strict-separate5/rules_config new file mode 100644 index 0000000..f72fc66 --- /dev/null +++ b/src/test/test-colocation-strict-separate5/rules_config @@ -0,0 +1,9 @@ +colocation: lonely-must-some-vms-be1 + services vm:101,vm:102 + affinity separate + strict 1 + +colocation: lonely-must-some-vms-be2 + services vm:101,vm:103 + affinity separate + strict 1 diff --git a/src/test/test-colocation-strict-separate5/service_config b/src/test/test-colocation-strict-separate5/service_config new file mode 
100644 index 0000000..4b26f6b --- /dev/null +++ b/src/test/test-colocation-strict-separate5/service_config @@ -0,0 +1,5 @@ +{ + "vm:101": { "node": "node1", "state": "started" }, + "vm:102": { "node": "node2", "state": "started" }, + "vm:103": { "node": "node3", "state": "started" } +} -- 2.39.5
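
To reason about the recovery targets expected by the test cases above, the following is a minimal sketch in Python. It is not the ha-manager implementation (which is written in Perl); the function name allowed_recovery_nodes and its data layout are assumptions made only for illustration. It excludes every online node that currently hosts a service which a strict "separate" rule forbids next to the recovering service.

```python
# Hypothetical sketch, not ha-manager code: list the nodes on which a service
# may be recovered without violating a strict "separate" colocation rule.

def allowed_recovery_nodes(service, placement, rules, online_nodes):
    """Return the sorted online nodes where `service` may be recovered.

    placement:    dict mapping service id -> node of the still-running services
    rules:        list of sets; the services in one set must be kept apart
    online_nodes: iterable of node names that are still online
    """
    blocked = set()
    for group in rules:
        if service not in group:
            continue
        for other in group:
            # Every node hosting another member of the rule is off limits.
            if other != service and other in placement:
                blocked.add(placement[other])
    return sorted(node for node in online_nodes if node not in blocked)


# test-colocation-strict-separate5: node3 failed, vm:103 needs a new node.
rules5 = [{"vm:101", "vm:102"}, {"vm:101", "vm:103"}]
placement5 = {"vm:101": "node1", "vm:102": "node2"}
print(allowed_recovery_nodes("vm:103", placement5, rules5, ["node1", "node2"]))

# test-colocation-strict-separate4: node3 failed, fa:120001 needs a new node.
rules4 = [{"vm:101", "fa:120001"}]
placement4 = {"vm:101": "node2", "vm:103": "node1", "vm:104": "node1"}
print(allowed_recovery_nodes("fa:120001", placement4, rules4, ["node1", "node2"]))
```

In the separate5 scenario only node2 remains, because vm:103 is restricted by vm:101 on node1 but has no rule in common with vm:102. In the separate4 scenario only node1 remains, which is why fa:120001 cannot be relocated after its failed start there.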