From: Daniel Kral
To: pve-devel@lists.proxmox.com
Date: Mon, 3 Nov 2025 16:17:12 +0100
Message-ID: <20251103151823.387984-3-d.kral@proxmox.com>
In-Reply-To: <20251103151823.387984-1-d.kral@proxmox.com>
References: <20251103151823.387984-1-d.kral@proxmox.com>
Subject: [pve-devel] [PATCH ha-manager 2/2] fix #6801: only consider target node during positive resource affinity migration

When a HA resource with positive affinity to other HA resources is moved
to another node, the other HA resources in positive affinity are
automatically moved to the same target node as well.

If the HA resources have significant differences in migration time (more
than the average HA Manager round of ~10 seconds), the already migrated
HA resources in 'started' state will check for better node placements
while the other(s) are still migrating.

This search checks whether the positive resource affinity rules hold and
queries where the other HA resources are located. While HA resources are
still migrating, they are reported as being on both the source and the
target node. This is correct from an accounting standpoint, but it adds
equal weights to both nodes and might result in the already started HA
resource being migrated back to the source node.

Therefore, only consider the target node for positive affinity during
migration or relocation to prevent this from happening.

As a side effect, two test cases for positive resource affinity rules
converge to a steady state slightly faster, as they now get the
information about the common target node sooner.

Signed-off-by: Daniel Kral
---
 src/PVE/HA/Rules/ResourceAffinity.pm | 6 ++--
 .../log.expect | 25 +++-------------
 .../log.expect | 28 +++++++++---------
 .../README | 3 --
 .../log.expect | 28 +++---------------
 5 files changed, 26 insertions(+), 64 deletions(-)

diff --git a/src/PVE/HA/Rules/ResourceAffinity.pm b/src/PVE/HA/Rules/ResourceAffinity.pm
index 4f5ffca5..9303bafd 100644
--- a/src/PVE/HA/Rules/ResourceAffinity.pm
+++ b/src/PVE/HA/Rules/ResourceAffinity.pm
@@ -517,8 +517,10 @@ sub get_resource_affinity {
     for my $csid (keys $positive->%*) {
         my ($current_node, $target_node) = $get_used_service_nodes->($csid);
 
-        $together->{$current_node}++ if defined($current_node);
-        $together->{$target_node}++ if defined($target_node);
+        # consider only the target node for positive affinity to prevent already
+        # moved HA resources to move back to the source node (see #6801)
+        my $node = $target_node // $current_node;
+        $together->{$node}++ if defined($node);
     }
 
     for my $csid (keys $negative->%*) {
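To make the tally concrete, here is a minimal Perl sketch of the `$together` counting in `get_resource_affinity` before and after this change. It is not code from pve-ha-manager: `%positive_nodes`, `tally_old`, `tally_new` and `show` are hypothetical names, and the snapshot assumes that `vm:102` is still migrating from `node1` to `node3` while the HA resource being placed already runs on `node3`. The loop bodies mirror the removed and added lines of the hunk; like the patch, the sketch uses postfix dereferencing and therefore assumes Perl 5.24 or newer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical snapshot for illustration only: each HA resource in positive
# affinity with the resource being placed maps to [current node, target node].
# 'vm:102' is still migrating from node1 to node3.
my %positive_nodes = (
    'vm:102' => [ 'node1', 'node3' ],
);

# Tally before the fix: both the current and the target node get a vote.
sub tally_old {
    my ($nodes) = @_;
    my $together = {};
    for my $csid (keys $nodes->%*) {
        my ($current_node, $target_node) = $nodes->{$csid}->@*;
        $together->{$current_node}++ if defined($current_node);
        $together->{$target_node}++ if defined($target_node);
    }
    return $together;
}

# Tally after the fix: use the target node if there is one, otherwise the
# current node, so a migrating resource only votes for where it is going.
sub tally_new {
    my ($nodes) = @_;
    my $together = {};
    for my $csid (keys $nodes->%*) {
        my ($current_node, $target_node) = $nodes->{$csid}->@*;
        my $node = $target_node // $current_node;
        $together->{$node}++ if defined($node);
    }
    return $together;
}

# Print the votes per node in a stable (sorted) order.
sub show {
    my ($label, $together) = @_;
    print "$label: ", join(', ', map { "$_=$together->{$_}" } sort keys $together->%*), "\n";
}

show('old', tally_old(\%positive_nodes));
show('new', tally_new(\%positive_nodes));
```

Running it prints `old: node1=1, node3=1` and `new: node3=1`. With the old tally both nodes get an equal vote, which is the tie the commit message describes; with the new one the migrating resource only counts for its target node, while a resource that is not moving (target `undef`) still counts for its current node through the `//` fallback.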
diff --git a/src/test/test-resource-affinity-strict-mixed3/log.expect b/src/test/test-resource-affinity-strict-mixed3/log.expect
index b3de104f..ee6412a1 100644
--- a/src/test/test-resource-affinity-strict-mixed3/log.expect
+++ b/src/test/test-resource-affinity-strict-mixed3/log.expect
@@ -58,17 +58,11 @@ info 40 node1/crm: service 'vm:102': state changed from 'migrate' to 'sta
 info 40 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
 info 40 node1/crm: migrate service 'vm:201' to node 'node2' (running)
 info 40 node1/crm: service 'vm:201': state changed from 'started' to 'migrate' (node = node1, target = node2)
-info 40 node1/crm: migrate service 'vm:202' to node 'node1' (running)
-info 40 node1/crm: service 'vm:202': state changed from 'started' to 'migrate' (node = node2, target = node1)
 info 40 node1/crm: service 'vm:203': state changed from 'migrate' to 'started' (node = node2)
-info 40 node1/crm: migrate service 'vm:203' to node 'node1' (running)
-info 40 node1/crm: service 'vm:203': state changed from 'started' to 'migrate' (node = node2, target = node1)
 info 41 node1/lrm: service vm:201 - start migrate to node 'node2'
 info 41 node1/lrm: service vm:201 - end migrate to node 'node2'
-info 43 node2/lrm: service vm:202 - start migrate to node 'node1'
-info 43 node2/lrm: service vm:202 - end migrate to node 'node1'
-info 43 node2/lrm: service vm:203 - start migrate to node 'node1'
-info 43 node2/lrm: service vm:203 - end migrate to node 'node1'
+info 43 node2/lrm: starting service vm:203
+info 43 node2/lrm: service status vm:203 started
 info 45 node3/lrm: starting service vm:101
 info 45 node3/lrm: service status vm:101 started
 info 45 node3/lrm: starting service vm:102
@@ -76,17 +70,6 @@ info 45 node3/lrm: service status vm:102 started
 info 45 node3/lrm: starting service vm:103
 info 45 node3/lrm: service status vm:103 started
 info 60 node1/crm: service 'vm:201': state changed from 'migrate' to 'started' (node = node2)
-info 60 node1/crm: service 'vm:202': state changed from 'migrate' to 'started' (node = node1)
-info 60 node1/crm: service 'vm:203': state changed from 'migrate' to 'started' (node = node1)
-info 60 node1/crm: migrate service 'vm:201' to node 'node1' (running)
-info 60 node1/crm: service 'vm:201': state changed from 'started' to 'migrate' (node = node2, target = node1)
-info 61 node1/lrm: starting service vm:202
-info 61 node1/lrm: service status vm:202 started
-info 61 node1/lrm: starting service vm:203
-info 61 node1/lrm: service status vm:203 started
-info 63 node2/lrm: service vm:201 - start migrate to node 'node1'
-info 63 node2/lrm: service vm:201 - end migrate to node 'node1'
-info 80 node1/crm: service 'vm:201': state changed from 'migrate' to 'started' (node = node1)
-info 81 node1/lrm: starting service vm:201
-info 81 node1/lrm: service status vm:201 started
+info 63 node2/lrm: starting service vm:201
+info 63 node2/lrm: service status vm:201 started
 info 620 hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-strict-positive3/log.expect b/src/test/test-resource-affinity-strict-positive3/log.expect
index b5d7018f..5f4e6531 100644
--- a/src/test/test-resource-affinity-strict-positive3/log.expect
+++ b/src/test/test-resource-affinity-strict-positive3/log.expect
@@ -84,24 +84,24 @@ err 263 node2/lrm: unable to start service fa:120002 on local node after
 warn 280 node1/crm: starting service fa:120002 on node 'node2' failed, relocating service.
 info 280 node1/crm: relocate service 'fa:120002' to node 'node1'
 info 280 node1/crm: service 'fa:120002': state changed from 'started' to 'relocate' (node = node2, target = node1)
+info 280 node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info 280 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 280 node1/crm: migrate service 'vm:102' to node 'node1' (running)
+info 280 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node2, target = node1)
 info 283 node2/lrm: service fa:120002 - start relocate to node 'node1'
 info 283 node2/lrm: service fa:120002 - end relocate to node 'node1'
+info 283 node2/lrm: service vm:101 - start migrate to node 'node1'
+info 283 node2/lrm: service vm:101 - end migrate to node 'node1'
+info 283 node2/lrm: service vm:102 - start migrate to node 'node1'
+info 283 node2/lrm: service vm:102 - end migrate to node 'node1'
 info 300 node1/crm: service 'fa:120002': state changed from 'relocate' to 'started' (node = node1)
-info 300 node1/crm: migrate service 'vm:101' to node 'node1' (running)
-info 300 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node1)
-info 300 node1/crm: migrate service 'vm:102' to node 'node1' (running)
-info 300 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 300 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 300 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
 info 301 node1/lrm: starting service fa:120002
 info 301 node1/lrm: service status fa:120002 started
-info 303 node2/lrm: service vm:101 - start migrate to node 'node1'
-info 303 node2/lrm: service vm:101 - end migrate to node 'node1'
-info 303 node2/lrm: service vm:102 - start migrate to node 'node1'
-info 303 node2/lrm: service vm:102 - end migrate to node 'node1'
+info 301 node1/lrm: starting service vm:101
+info 301 node1/lrm: service status vm:101 started
+info 301 node1/lrm: starting service vm:102
+info 301 node1/lrm: service status vm:102 started
 info 320 node1/crm: relocation policy successful for 'fa:120002' on node 'node1', failed nodes: node2
-info 320 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
-info 320 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
-info 321 node1/lrm: starting service vm:101
-info 321 node1/lrm: service status vm:101 started
-info 321 node1/lrm: starting service vm:102
-info 321 node1/lrm: service status vm:102 started
 info 720 hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-strict-positive6/README b/src/test/test-resource-affinity-strict-positive6/README
index a6affda3..e174e458 100644
--- a/src/test/test-resource-affinity-strict-positive6/README
+++ b/src/test/test-resource-affinity-strict-positive6/README
@@ -1,5 +1,2 @@
 Test whether two HA resources in positive resource affinity will migrate to the
 same target node when one of them finishes earlier than the other.
-
-The current behavior is not correct, because the already migrated HA resource
-will be migrated back to the source node.
diff --git a/src/test/test-resource-affinity-strict-positive6/log.expect b/src/test/test-resource-affinity-strict-positive6/log.expect
index 69f8d867..cbc63a1e 100644
--- a/src/test/test-resource-affinity-strict-positive6/log.expect
+++ b/src/test/test-resource-affinity-strict-positive6/log.expect
@@ -10,8 +10,6 @@ info 20 node3/crm: status change startup => wait_for_quorum
 info 20 node3/lrm: status change startup => wait_for_agent_lock
 info 20 node1/crm: got lock 'ha_manager_lock'
 info 20 node1/crm: status change wait_for_quorum => master
-info 20 node1/crm: migrate service 'vm:101' to node 'node1' (running)
-info 20 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node1)
 info 21 node1/lrm: got lock 'ha_agent_node1_lock'
 info 21 node1/lrm: status change wait_for_agent_lock => active
 info 21 node1/lrm: service vm:102 - start migrate to node 'node3'
@@ -20,27 +18,9 @@ info 22 node2/crm: status change wait_for_quorum => slave
 info 24 node3/crm: status change wait_for_quorum => slave
 info 25 node3/lrm: got lock 'ha_agent_node3_lock'
 info 25 node3/lrm: status change wait_for_agent_lock => active
-info 25 node3/lrm: service vm:101 - start migrate to node 'node1'
-info 25 node3/lrm: service vm:101 - end migrate to node 'node1'
-info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
 info 40 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
-info 40 node1/crm: migrate service 'vm:101' to node 'node3' (running)
-info 40 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
-info 40 node1/crm: migrate service 'vm:102' to node 'node1' (running)
-info 40 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node3, target = node1)
-info 41 node1/lrm: service vm:101 - start migrate to node 'node3'
-info 41 node1/lrm: service vm:101 - end migrate to node 'node3'
-info 45 node3/lrm: service vm:102 - start migrate to node 'node1'
-info 45 node3/lrm: service vm:102 - end migrate to node 'node1'
-info 60 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
-info 60 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
-info 60 node1/crm: migrate service 'vm:101' to node 'node1' (running)
-info 60 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node1)
-info 61 node1/lrm: starting service vm:102
-info 61 node1/lrm: service status vm:102 started
-info 65 node3/lrm: service vm:101 - start migrate to node 'node1'
-info 65 node3/lrm: service vm:101 - end migrate to node 'node1'
-info 80 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
-info 81 node1/lrm: starting service vm:101
-info 81 node1/lrm: service status vm:101 started
+info 45 node3/lrm: starting service vm:102
+info 45 node3/lrm: service status vm:102 started
 info 620 hardware: exit simulation - done
-- 
2.47.3

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel