From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager 5/7] manager: make HA resource bundles move back to maintenance node
Date: Wed, 22 Apr 2026 12:00:23 +0200
Message-ID: <20260422100035.232716-6-d.kral@proxmox.com>
In-Reply-To: <20260422100035.232716-1-d.kral@proxmox.com>

HA resources in positive resource affinity rules (HA resource bundles)
always prefer their current, common node as soon as at least one of
their HA resources is already actively assigned to a node.

This logic is implemented in apply_positive_resource_affinity(), which
reduces the node set to only their current, common node.

The maintenance node differs from the HA resources' current node (unless
no replacement node could be found). So once
apply_positive_resource_affinity() has run, the maintenance node is no
longer in the node set, and the fallback to it can never trigger.
Therefore, select_service_node() must check whether the HA resources can
move back to their maintenance node before it calls
apply_positive_resource_affinity().

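The ordering problem can be illustrated with a toy Python model. This is not the actual Perl code in PVE::HA::Manager. The names restrict_to_common_node() and select_node() are hypothetical. Node sets are modelled as plain Python sets, and the score-based node selection is replaced by a trivial stand-in. The scenario mirrors the first test case: both bundle members run on the replacement node1, and the maintenance node node3 is online again.

```python
from typing import Optional, Set


def restrict_to_common_node(nodes: Set[str], common_node: Optional[str]) -> Set[str]:
    """Hypothetical stand-in for apply_positive_resource_affinity(): once a
    bundle member is already assigned to a node, only that node remains."""
    if common_node is None:
        return set(nodes)
    return nodes & {common_node}


def select_node(
    pri_nodes: Set[str],
    current_node: str,
    common_node: Optional[str],
    maintenance_fallback: Optional[str],
    check_fallback_first: bool,
) -> Optional[str]:
    """Toy model of the ordering question in select_service_node()."""
    nodes = set(pri_nodes)
    if not check_fallback_first:
        # old order: positive affinity narrows the set before the fallback check
        nodes = restrict_to_common_node(nodes, common_node)
    if maintenance_fallback is not None and maintenance_fallback in nodes:
        return maintenance_fallback
    if check_fallback_first:
        # new order: positive affinity is only applied if no fallback happened
        nodes = restrict_to_common_node(nodes, common_node)
    # trivial stand-in for the scoring step: stay on the current node if allowed
    if current_node in nodes:
        return current_node
    return min(nodes) if nodes else None


online = {"node1", "node2", "node3"}
print(select_node(online, "node1", "node1", "node3", check_fallback_first=False))
print(select_node(online, "node1", "node1", "node3", check_fallback_first=True))
```

With the old order, node3 is filtered out before the fallback check, so the resource stays on node1. With the new order, the fallback to node3 is taken first.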
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Manager.pm | 11 ++++++++-
.../README | 3 ++-
.../log.expect | 16 +++++++++++++
.../README | 3 ++-
.../log.expect | 23 +++++++++++++++++++
5 files changed, 53 insertions(+), 3 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 795b98c1..ce5d69a4 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -352,11 +352,20 @@ sub select_service_node {
}
apply_negative_resource_affinity($separate, $pri_nodes);
- apply_positive_resource_affinity($together, $pri_nodes);
+ # fall back to the previous maintenance node if it is available again.
+ #
+ # if the HA resource is in a resource bundle with at least one member already
+ # running, then apply_positive_resource_affinity() will reduce the node set to
+ # only their current, common node.
+ # therefore, fall back here already, as $pri_nodes already has all other
+ # affinity rules applied and the HA resources in the resource bundle share
+ # the same maintenance node.
return $maintenance_fallback
if defined($maintenance_fallback) && $pri_nodes->{$maintenance_fallback};
+ apply_positive_resource_affinity($together, $pri_nodes);
+
return $current_node if $node_preference eq 'none' && $pri_nodes->{$current_node};
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/README b/src/test/test-resource-affinity-maintenance-strict-positive1/README
index 4b62e578..ab293cc5 100644
--- a/src/test/test-resource-affinity-maintenance-strict-positive1/README
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/README
@@ -1,3 +1,4 @@
Tests whether a strict positive resource affinity rule among two HA resources
makes both HA resources move to the same replacement node in case their
-current, common node is put in maintenance mode.
+current, common node is put in maintenance mode and moves them back once the
+previous maintenance node is available again.
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect b/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
index 5f91b877..91637279 100644
--- a/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
@@ -48,4 +48,20 @@ info 220 cmdlist: execute crm node3 disable-node-maintenance
info 225 node3/lrm: got lock 'ha_agent_node3_lock'
info 225 node3/lrm: status change maintenance => active
info 240 node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info 240 node1/crm: moving service 'vm:101' back to 'node3', node came back from maintenance.
+info 240 node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info 240 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 240 node1/crm: moving service 'vm:102' back to 'node3', node came back from maintenance.
+info 240 node1/crm: migrate service 'vm:102' to node 'node3' (running)
+info 240 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 241 node1/lrm: service vm:101 - start migrate to node 'node3'
+info 241 node1/lrm: service vm:101 - end migrate to node 'node3'
+info 241 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 241 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 260 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 260 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 265 node3/lrm: starting service vm:101
+info 265 node3/lrm: service status vm:101 started
+info 265 node3/lrm: starting service vm:102
+info 265 node3/lrm: service status vm:102 started
info 820 hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/README b/src/test/test-resource-affinity-maintenance-strict-positive2/README
index 32f0942b..dcc4c81d 100644
--- a/src/test/test-resource-affinity-maintenance-strict-positive2/README
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/README
@@ -2,7 +2,8 @@ Tests whether a strict positive resource affinity rule among three HA
resources, where two of them are already on a common node but the other HA
resource is still on another node, makes the former two HA resources move to
the node of the other HA resource as their current common node is put in
-maintenance mode.
+maintenance mode and moves them back as soon as the previous maintenance node
+is available again.
The "skip-round crm 1" command ensures that the HA Manager will not move the
dislocated, third HA resource to the common node, but make the LRM acknowledge
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect b/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
index ef63c8ca..9da6d968 100644
--- a/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
@@ -43,4 +43,27 @@ info 120 cmdlist: execute crm node1 disable-node-maintenance
info 121 node1/lrm: got lock 'ha_agent_node1_lock'
info 121 node1/lrm: status change maintenance => active
info 140 node1/crm: node 'node1': state changed from 'maintenance' => 'online'
+info 140 node1/crm: moving service 'vm:101' back to 'node1', node came back from maintenance.
+info 140 node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 140 node1/crm: moving service 'vm:102' back to 'node1', node came back from maintenance.
+info 140 node1/crm: migrate service 'vm:102' to node 'node1' (running)
+info 140 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 140 node1/crm: migrate service 'vm:103' to node 'node1' (running)
+info 140 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 145 node3/lrm: service vm:101 - start migrate to node 'node1'
+info 145 node3/lrm: service vm:101 - end migrate to node 'node1'
+info 145 node3/lrm: service vm:102 - start migrate to node 'node1'
+info 145 node3/lrm: service vm:102 - end migrate to node 'node1'
+info 145 node3/lrm: service vm:103 - start migrate to node 'node1'
+info 145 node3/lrm: service vm:103 - end migrate to node 'node1'
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 160 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
+info 160 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 161 node1/lrm: starting service vm:101
+info 161 node1/lrm: service status vm:101 started
+info 161 node1/lrm: starting service vm:102
+info 161 node1/lrm: service status vm:102 started
+info 161 node1/lrm: starting service vm:103
+info 161 node1/lrm: service status vm:103 started
info 720 hardware: exit simulation - done
--
2.47.3
Thread overview: 8+ messages
2026-04-22 10:00 [PATCH-SERIES ha-manager 0/7] improve handling of maintenance nodes Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 1/7] manager: warn if HA resources cannot be moved away from maintenance node Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 2/7] test: add test casses for node affinity rules with maintenance mode Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 3/7] test: add test cases for resource " Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 4/7] manager: make HA resources without failback move back to maintenance node Daniel Kral
2026-04-22 10:00 ` Daniel Kral [this message]
2026-04-22 10:00 ` [PATCH ha-manager 6/7] make get_node_affinity return all priority classes sorted in descending order Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 7/7] manager: try multiple priority classes when applying negative resource affinity Daniel Kral