* [pve-devel] [PATCH ha-manager 1/2] test: add delayed positive resource affinity migration test case
From: Daniel Kral @ 2025-11-03 15:17 UTC
  To: pve-devel
Add a test case covering two HA resources in a positive resource
affinity rule, where one of them has already arrived on the common
target node while the other is still migrating.
The current behavior is incorrect: the already migrated HA resource is
moved back to the source node instead of staying on the common target
node. This is fixed by the next patch.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 .../README                                    |  5 ++
 .../cmdlist                                   |  3 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 46 +++++++++++++++++++
 .../manager_status                            | 21 +++++++++
 .../rules_config                              |  3 ++
 .../service_config                            |  4 ++
 7 files changed, 87 insertions(+)
 create mode 100644 src/test/test-resource-affinity-strict-positive6/README
 create mode 100644 src/test/test-resource-affinity-strict-positive6/cmdlist
 create mode 100644 src/test/test-resource-affinity-strict-positive6/hardware_status
 create mode 100644 src/test/test-resource-affinity-strict-positive6/log.expect
 create mode 100644 src/test/test-resource-affinity-strict-positive6/manager_status
 create mode 100644 src/test/test-resource-affinity-strict-positive6/rules_config
 create mode 100644 src/test/test-resource-affinity-strict-positive6/service_config
diff --git a/src/test/test-resource-affinity-strict-positive6/README b/src/test/test-resource-affinity-strict-positive6/README
new file mode 100644
index 00000000..a6affda3
--- /dev/null
+++ b/src/test/test-resource-affinity-strict-positive6/README
@@ -0,0 +1,5 @@
+Test whether two HA resources in positive resource affinity will migrate to the
+same target node when one of them finishes earlier than the other.
+
+The current behavior is not correct, because the already migrated HA resource
+will be migrated back to the source node.
diff --git a/src/test/test-resource-affinity-strict-positive6/cmdlist b/src/test/test-resource-affinity-strict-positive6/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-resource-affinity-strict-positive6/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-resource-affinity-strict-positive6/hardware_status b/src/test/test-resource-affinity-strict-positive6/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-resource-affinity-strict-positive6/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-resource-affinity-strict-positive6/log.expect b/src/test/test-resource-affinity-strict-positive6/log.expect
new file mode 100644
index 00000000..69f8d867
--- /dev/null
+++ b/src/test/test-resource-affinity-strict-positive6/log.expect
@@ -0,0 +1,46 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info     20    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: service vm:102 - start migrate to node 'node3'
+info     21    node1/lrm: service vm:102 - end migrate to node 'node3'
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: service vm:101 - start migrate to node 'node1'
+info     25    node3/lrm: service vm:101 - end migrate to node 'node1'
+info     40    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
+info     40    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node3)
+info     40    node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info     40    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info     40    node1/crm: migrate service 'vm:102' to node 'node1' (running)
+info     40    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info     41    node1/lrm: service vm:101 - start migrate to node 'node3'
+info     41    node1/lrm: service vm:101 - end migrate to node 'node3'
+info     45    node3/lrm: service vm:102 - start migrate to node 'node1'
+info     45    node3/lrm: service vm:102 - end migrate to node 'node1'
+info     60    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node3)
+info     60    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
+info     60    node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info     60    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info     61    node1/lrm: starting service vm:102
+info     61    node1/lrm: service status vm:102 started
+info     65    node3/lrm: service vm:101 - start migrate to node 'node1'
+info     65    node3/lrm: service vm:101 - end migrate to node 'node1'
+info     80    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
+info     81    node1/lrm: starting service vm:101
+info     81    node1/lrm: service status vm:101 started
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-strict-positive6/manager_status b/src/test/test-resource-affinity-strict-positive6/manager_status
new file mode 100644
index 00000000..9e7cdf21
--- /dev/null
+++ b/src/test/test-resource-affinity-strict-positive6/manager_status
@@ -0,0 +1,21 @@
+{
+    "master_node": "node1",
+    "node_status": {
+	"node1":"online",
+	"node2":"online",
+	"node3":"online"
+    },
+    "service_status": {
+	"vm:101": {
+	    "node": "node3",
+	    "state": "started",
+	    "uid": "RoPGTlvNYq/oZFokv9fgWw"
+	},
+	"vm:102": {
+	    "node": "node1",
+	    "state": "migrate",
+	    "target": "node3",
+	    "uid": "JVDARwmsXoVTF8Zd0BY2Mg"
+	}
+    }
+}
diff --git a/src/test/test-resource-affinity-strict-positive6/rules_config b/src/test/test-resource-affinity-strict-positive6/rules_config
new file mode 100644
index 00000000..9789d7cc
--- /dev/null
+++ b/src/test/test-resource-affinity-strict-positive6/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-must-stick-together
+	resources vm:101,vm:102
+	affinity positive
diff --git a/src/test/test-resource-affinity-strict-positive6/service_config b/src/test/test-resource-affinity-strict-positive6/service_config
new file mode 100644
index 00000000..e71594d9
--- /dev/null
+++ b/src/test/test-resource-affinity-strict-positive6/service_config
@@ -0,0 +1,4 @@
+{
+    "vm:101": { "node": "node3", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" }
+}
-- 
2.47.3
* [pve-devel] [PATCH ha-manager 2/2] fix #6801: only consider target node during positive resource affinity migration
From: Daniel Kral @ 2025-11-03 15:17 UTC
  To: pve-devel
When an HA resource with positive affinity to other HA resources is
moved to another node, the other HA resources in the affinity rule are
automatically moved to the same target node as well.
If the HA resources differ significantly in migration time (by more
than the average HA Manager round of ~10 seconds), the already migrated
HA resources in 'started' state will look for better node placements
while the other(s) are still migrating.
This search checks whether the positive resource affinity rules are
upheld and queries where the other HA resources are. While an HA
resource is still migrating, it is reported on both the source and the
target node. That is correct from an accounting standpoint, but it adds
equal weight to both nodes and can cause the already started HA
resource to be migrated back to the source node.
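For illustration, here is a minimal Perl sketch of the old accounting
described above; the variable and hash names mirror
get_resource_affinity(), but the concrete node values are hypothetical:

    use strict;
    use warnings;

    # Old behavior: while a resource migrates from node1 to node3, it is
    # counted on *both* nodes, so its positive-affinity partner sees an
    # equal weight on node1 and node3 (node names are made up)
    my %together;
    my ($current_node, $target_node) = ('node1', 'node3');
    $together{$current_node}++ if defined($current_node);
    $together{$target_node}++ if defined($target_node);
    # %together is now ( node1 => 1, node3 => 1 ), i.e. a tie, which can
    # pull the already migrated partner back to the source node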
Therefore, only consider the target node for positive affinity during
migration or relocation to prevent this from happening.
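A corresponding sketch of the fixed selection, using the same
defined-or expression as the hunk below (node values again
hypothetical):

    use strict;
    use warnings;

    # Fixed behavior: prefer the target node; fall back to the current
    # node only when no migration is in flight
    my %together;
    my ($current_node, $target_node) = ('node1', 'node3'); # still migrating
    my $node = $target_node // $current_node;              # 'node3'
    $together{$node}++ if defined($node);

    ($current_node, $target_node) = ('node3', undef);      # migration done
    $node = $target_node // $current_node;                 # 'node3'
    $together{$node}++ if defined($node);
    # %together is now ( node3 => 2 ): only the common target node
    # accumulates positive-affinity weight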
As a side effect, two existing test cases for positive resource
affinity rules converge slightly faster to a steady state, as the
resources now learn about the common target node sooner.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 src/PVE/HA/Rules/ResourceAffinity.pm          |  6 ++--
 .../log.expect                                | 25 +++--------------
 .../log.expect                                | 28 +++++++++----------
 .../README                                    |  3 --
 .../log.expect                                | 28 +++----------------
 5 files changed, 26 insertions(+), 64 deletions(-)
diff --git a/src/PVE/HA/Rules/ResourceAffinity.pm b/src/PVE/HA/Rules/ResourceAffinity.pm
index 4f5ffca5..9303bafd 100644
--- a/src/PVE/HA/Rules/ResourceAffinity.pm
+++ b/src/PVE/HA/Rules/ResourceAffinity.pm
@@ -517,8 +517,10 @@ sub get_resource_affinity {
     for my $csid (keys $positive->%*) {
         my ($current_node, $target_node) = $get_used_service_nodes->($csid);
 
-        $together->{$current_node}++ if defined($current_node);
-        $together->{$target_node}++ if defined($target_node);
+        # consider only the target node for positive affinity to prevent
+        # already moved HA resources from moving back to the source node (see #6801)
+        my $node = $target_node // $current_node;
+        $together->{$node}++ if defined($node);
     }
 
     for my $csid (keys $negative->%*) {
diff --git a/src/test/test-resource-affinity-strict-mixed3/log.expect b/src/test/test-resource-affinity-strict-mixed3/log.expect
index b3de104f..ee6412a1 100644
--- a/src/test/test-resource-affinity-strict-mixed3/log.expect
+++ b/src/test/test-resource-affinity-strict-mixed3/log.expect
@@ -58,17 +58,11 @@ info     40    node1/crm: service 'vm:102': state changed from 'migrate' to 'sta
 info     40    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node3)
 info     40    node1/crm: migrate service 'vm:201' to node 'node2' (running)
 info     40    node1/crm: service 'vm:201': state changed from 'started' to 'migrate'  (node = node1, target = node2)
-info     40    node1/crm: migrate service 'vm:202' to node 'node1' (running)
-info     40    node1/crm: service 'vm:202': state changed from 'started' to 'migrate'  (node = node2, target = node1)
 info     40    node1/crm: service 'vm:203': state changed from 'migrate' to 'started'  (node = node2)
-info     40    node1/crm: migrate service 'vm:203' to node 'node1' (running)
-info     40    node1/crm: service 'vm:203': state changed from 'started' to 'migrate'  (node = node2, target = node1)
 info     41    node1/lrm: service vm:201 - start migrate to node 'node2'
 info     41    node1/lrm: service vm:201 - end migrate to node 'node2'
-info     43    node2/lrm: service vm:202 - start migrate to node 'node1'
-info     43    node2/lrm: service vm:202 - end migrate to node 'node1'
-info     43    node2/lrm: service vm:203 - start migrate to node 'node1'
-info     43    node2/lrm: service vm:203 - end migrate to node 'node1'
+info     43    node2/lrm: starting service vm:203
+info     43    node2/lrm: service status vm:203 started
 info     45    node3/lrm: starting service vm:101
 info     45    node3/lrm: service status vm:101 started
 info     45    node3/lrm: starting service vm:102
@@ -76,17 +70,6 @@ info     45    node3/lrm: service status vm:102 started
 info     45    node3/lrm: starting service vm:103
 info     45    node3/lrm: service status vm:103 started
 info     60    node1/crm: service 'vm:201': state changed from 'migrate' to 'started'  (node = node2)
-info     60    node1/crm: service 'vm:202': state changed from 'migrate' to 'started'  (node = node1)
-info     60    node1/crm: service 'vm:203': state changed from 'migrate' to 'started'  (node = node1)
-info     60    node1/crm: migrate service 'vm:201' to node 'node1' (running)
-info     60    node1/crm: service 'vm:201': state changed from 'started' to 'migrate'  (node = node2, target = node1)
-info     61    node1/lrm: starting service vm:202
-info     61    node1/lrm: service status vm:202 started
-info     61    node1/lrm: starting service vm:203
-info     61    node1/lrm: service status vm:203 started
-info     63    node2/lrm: service vm:201 - start migrate to node 'node1'
-info     63    node2/lrm: service vm:201 - end migrate to node 'node1'
-info     80    node1/crm: service 'vm:201': state changed from 'migrate' to 'started'  (node = node1)
-info     81    node1/lrm: starting service vm:201
-info     81    node1/lrm: service status vm:201 started
+info     63    node2/lrm: starting service vm:201
+info     63    node2/lrm: service status vm:201 started
 info    620     hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-strict-positive3/log.expect b/src/test/test-resource-affinity-strict-positive3/log.expect
index b5d7018f..5f4e6531 100644
--- a/src/test/test-resource-affinity-strict-positive3/log.expect
+++ b/src/test/test-resource-affinity-strict-positive3/log.expect
@@ -84,24 +84,24 @@ err     263    node2/lrm: unable to start service fa:120002 on local node after
 warn    280    node1/crm: starting service fa:120002 on node 'node2' failed, relocating service.
 info    280    node1/crm: relocate service 'fa:120002' to node 'node1'
 info    280    node1/crm: service 'fa:120002': state changed from 'started' to 'relocate'  (node = node2, target = node1)
+info    280    node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info    280    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node2, target = node1)
+info    280    node1/crm: migrate service 'vm:102' to node 'node1' (running)
+info    280    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node2, target = node1)
 info    283    node2/lrm: service fa:120002 - start relocate to node 'node1'
 info    283    node2/lrm: service fa:120002 - end relocate to node 'node1'
+info    283    node2/lrm: service vm:101 - start migrate to node 'node1'
+info    283    node2/lrm: service vm:101 - end migrate to node 'node1'
+info    283    node2/lrm: service vm:102 - start migrate to node 'node1'
+info    283    node2/lrm: service vm:102 - end migrate to node 'node1'
 info    300    node1/crm: service 'fa:120002': state changed from 'relocate' to 'started'  (node = node1)
-info    300    node1/crm: migrate service 'vm:101' to node 'node1' (running)
-info    300    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node2, target = node1)
-info    300    node1/crm: migrate service 'vm:102' to node 'node1' (running)
-info    300    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node2, target = node1)
+info    300    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
+info    300    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
 info    301    node1/lrm: starting service fa:120002
 info    301    node1/lrm: service status fa:120002 started
-info    303    node2/lrm: service vm:101 - start migrate to node 'node1'
-info    303    node2/lrm: service vm:101 - end migrate to node 'node1'
-info    303    node2/lrm: service vm:102 - start migrate to node 'node1'
-info    303    node2/lrm: service vm:102 - end migrate to node 'node1'
+info    301    node1/lrm: starting service vm:101
+info    301    node1/lrm: service status vm:101 started
+info    301    node1/lrm: starting service vm:102
+info    301    node1/lrm: service status vm:102 started
 info    320    node1/crm: relocation policy successful for 'fa:120002' on node 'node1', failed nodes: node2
-info    320    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
-info    320    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
-info    321    node1/lrm: starting service vm:101
-info    321    node1/lrm: service status vm:101 started
-info    321    node1/lrm: starting service vm:102
-info    321    node1/lrm: service status vm:102 started
 info    720     hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-strict-positive6/README b/src/test/test-resource-affinity-strict-positive6/README
index a6affda3..e174e458 100644
--- a/src/test/test-resource-affinity-strict-positive6/README
+++ b/src/test/test-resource-affinity-strict-positive6/README
@@ -1,5 +1,2 @@
 Test whether two HA resources in positive resource affinity will migrate to the
 same target node when one of them finishes earlier than the other.
-
-The current behavior is not correct, because the already migrated HA resource
-will be migrated back to the source node.
diff --git a/src/test/test-resource-affinity-strict-positive6/log.expect b/src/test/test-resource-affinity-strict-positive6/log.expect
index 69f8d867..cbc63a1e 100644
--- a/src/test/test-resource-affinity-strict-positive6/log.expect
+++ b/src/test/test-resource-affinity-strict-positive6/log.expect
@@ -10,8 +10,6 @@ info     20    node3/crm: status change startup => wait_for_quorum
 info     20    node3/lrm: status change startup => wait_for_agent_lock
 info     20    node1/crm: got lock 'ha_manager_lock'
 info     20    node1/crm: status change wait_for_quorum => master
-info     20    node1/crm: migrate service 'vm:101' to node 'node1' (running)
-info     20    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node3, target = node1)
 info     21    node1/lrm: got lock 'ha_agent_node1_lock'
 info     21    node1/lrm: status change wait_for_agent_lock => active
 info     21    node1/lrm: service vm:102 - start migrate to node 'node3'
@@ -20,27 +18,9 @@ info     22    node2/crm: status change wait_for_quorum => slave
 info     24    node3/crm: status change wait_for_quorum => slave
 info     25    node3/lrm: got lock 'ha_agent_node3_lock'
 info     25    node3/lrm: status change wait_for_agent_lock => active
-info     25    node3/lrm: service vm:101 - start migrate to node 'node1'
-info     25    node3/lrm: service vm:101 - end migrate to node 'node1'
-info     40    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
 info     40    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node3)
-info     40    node1/crm: migrate service 'vm:101' to node 'node3' (running)
-info     40    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node3)
-info     40    node1/crm: migrate service 'vm:102' to node 'node1' (running)
-info     40    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node3, target = node1)
-info     41    node1/lrm: service vm:101 - start migrate to node 'node3'
-info     41    node1/lrm: service vm:101 - end migrate to node 'node3'
-info     45    node3/lrm: service vm:102 - start migrate to node 'node1'
-info     45    node3/lrm: service vm:102 - end migrate to node 'node1'
-info     60    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node3)
-info     60    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
-info     60    node1/crm: migrate service 'vm:101' to node 'node1' (running)
-info     60    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node3, target = node1)
-info     61    node1/lrm: starting service vm:102
-info     61    node1/lrm: service status vm:102 started
-info     65    node3/lrm: service vm:101 - start migrate to node 'node1'
-info     65    node3/lrm: service vm:101 - end migrate to node 'node1'
-info     80    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
-info     81    node1/lrm: starting service vm:101
-info     81    node1/lrm: service status vm:101 started
+info     45    node3/lrm: starting service vm:102
+info     45    node3/lrm: service status vm:102 started
 info    620     hardware: exit simulation - done
-- 
2.47.3