public inbox for pve-devel@lists.proxmox.com
* [PATCH-SERIES ha-manager 0/7] improve handling of maintenance nodes
@ 2026-04-22 10:00 Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 1/7] manager: warn if HA resources cannot be moved away from maintenance node Daniel Kral
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Daniel Kral @ 2026-04-22 10:00 UTC (permalink / raw)
  To: pve-devel

As reported in a recent Proxmox forum post [0], there are some cases
where HA resources are either not moved away from maintenance nodes or
not moved back to them after these nodes are taken out of maintenance
again.

Even though we cannot resolve all situations (for example, when
affinity rules constrain an HA resource so that it cannot be moved
anywhere but the maintenance node; an example of such a rule is shown
after the list below), this patch series improves the handling as
follows:

- log warnings if HA resources cannot be moved to a replacement node
- make HA resources without failback move back to their previous
  maintenance node
- make HA resource bundles move back to their previous maintenance node
- try all available, effective priority classes for an HA resource while
  applying the negative resource affinity rules
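
For a concrete picture of the most constrained case mentioned above,
here is one of the rule configurations used by the new test cases
(taken from the test-node-affinity-maintenance-strict1 fixtures below);
once node3 enters maintenance, there is no valid replacement node for
vm:101:

    node-affinity: vm101-must-be-on-node3
        nodes node3
        resources vm:101
        strict 1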

The last change makes the system more consistent overall, but might
introduce some unintended node placements in highly constrained
scenarios, because the HA Manager currently resolves these node
placements individually per HA resource. This should be improved in a
future patch series (this Bugzilla entry [1] might also be relevant).

[0] https://forum.proxmox.com/threads/182890/
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7475

Daniel Kral (7):
  manager: warn if HA resources cannot be moved away from maintenance
    node
  test: add test cases for node affinity rules with maintenance mode
  test: add test cases for resource affinity rules with maintenance mode
  manager: make HA resources without failback move back to maintenance
    node
  manager: make HA resource bundles move back to maintenance node
  make get_node_affinity return all priority classes sorted in
    descending order
  manager: try multiple priority classes when applying negative resource
    affinity

 src/PVE/HA/Manager.pm                         | 60 +++++++++++++---
 src/PVE/HA/Rules/NodeAffinity.pm              | 24 ++++---
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 48 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  3 +
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 48 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  3 +
 .../README                                    |  3 +
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 35 ++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 +
 .../README                                    |  3 +
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 35 ++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 +
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 48 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 +
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 48 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 +
 .../README                                    |  5 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 54 +++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  4 ++
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 47 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  5 ++
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 67 ++++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  4 ++
 .../README                                    | 10 +++
 .../cmdlist                                   |  4 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 69 +++++++++++++++++++
 .../manager_status                            | 34 +++++++++
 .../rules_config                              |  3 +
 .../service_config                            |  5 ++
 .../README                                    |  9 +++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 54 +++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  7 ++
 .../service_config                            |  4 ++
 .../README                                    |  7 +-
 .../log.expect                                | 16 +++--
 .../test-stale-maintenance-node/log.expect    |  3 +
 82 files changed, 922 insertions(+), 29 deletions(-)
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/README
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/README
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/service_config
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/README
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/cmdlist
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/hardware_status
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/log.expect
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/manager_status
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/rules_config
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/service_config

-- 
2.47.3


* [PATCH ha-manager 1/7] manager: warn if HA resources cannot be moved away from maintenance node
  2026-04-22 10:00 [PATCH-SERIES ha-manager 0/7] improve handling of maintenance nodes Daniel Kral
@ 2026-04-22 10:00 ` Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 2/7] test: add test cases for node affinity rules with maintenance mode Daniel Kral
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Daniel Kral @ 2026-04-22 10:00 UTC (permalink / raw)
  To: pve-devel

There are scenarios where an HA resource cannot be moved away from its
current node while that node is in maintenance mode.

Previously, this could only happen in an edge case where the whole
cluster was shut down at the same time while the 'migrate' policy was
configured, but with affinity rules it is much easier to run into such
a scenario.

While some of these affinity-related scenarios need to be resolved in
a better way, admins should always be warned about such situations.
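
With this change, the CRM periodically logs a warning like the
following (taken verbatim from the adjusted
test-stale-maintenance-node/log.expect below):

    warn    120    node1/crm: service 'vm:103': cannot find a replacement node while its current node is in maintenance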

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 src/PVE/HA/Manager.pm                         | 20 +++++++++++++++----
 .../test-stale-maintenance-node/log.expect    |  3 +++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index b69a6bba..684244e1 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -1562,16 +1562,28 @@ sub next_state_started {
                     my $node_state = $ns->get_node_state($sd->{node});
                     if ($node_state eq 'online') {
                         # Having the maintenance node set here means that the service was never
-                        # started on a different node since it was set. This can happen in the edge
-                        # case that the whole cluster is shut down at the same time while the
-                        # 'migrate' policy was configured. Node is not in maintenance mode anymore
-                        # and service is started on this node, so it's fine to clear the setting.
+                        # started on a different node since it was set.
+                        #
+                        # This can happen if:
+                        # - select_service_node(...) could not find any replacement node for the
+                        #   service while its current node was in maintenance mode, or
+                        # - the whole cluster was shut down at the same time while the 'migrate'
+                        #   policy was configured.
+                        #
+                        # Node is not in maintenance mode anymore and service is started on this
+                        # node, so it's fine to clear the setting.
                         $haenv->log(
                             'info',
                             "service '$sid': clearing stale maintenance node "
                                 . "'$sd->{maintenance_node}' setting (is current node)",
                         );
                         delete $sd->{maintenance_node};
+                    } else {
+                        $haenv->log(
+                            'warning',
+                            "service '$sid': cannot find a replacement node while"
+                                . " its current node is in maintenance",
+                        );
                     }
                 }
 
diff --git a/src/test/test-stale-maintenance-node/log.expect b/src/test/test-stale-maintenance-node/log.expect
index 092db8be..fce96fd4 100644
--- a/src/test/test-stale-maintenance-node/log.expect
+++ b/src/test/test-stale-maintenance-node/log.expect
@@ -33,6 +33,7 @@ info    120    node3/lrm: shutdown LRM, doing maintenance, removing this node fr
 info    120    node1/crm: node 'node1': state changed from 'online' => 'maintenance'
 info    120    node1/crm: node 'node2': state changed from 'online' => 'maintenance'
 info    120    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+warn    120    node1/crm: service 'vm:103': cannot find a replacement node while its current node is in maintenance
 info    121    node1/lrm: status change active => maintenance
 info    124    node2/lrm: exit (loop end)
 info    124     shutdown: execute crm node2 stop
@@ -40,6 +41,7 @@ info    123    node2/crm: server received shutdown request
 info    126    node3/lrm: exit (loop end)
 info    126     shutdown: execute crm node3 stop
 info    125    node3/crm: server received shutdown request
+warn    140    node1/crm: service 'vm:103': cannot find a replacement node while its current node is in maintenance
 info    143    node2/crm: exit (loop end)
 info    143     shutdown: execute power node2 off
 info    144    node3/crm: exit (loop end)
@@ -64,6 +66,7 @@ info    220      cmdlist: execute power node3 on
 info    220    node3/crm: status change startup => wait_for_quorum
 info    220    node3/lrm: status change startup => wait_for_agent_lock
 info    220    node1/crm: status change wait_for_quorum => master
+warn    220    node1/crm: service 'vm:103': cannot find a replacement node while its current node is in maintenance
 info    221    node1/lrm: status change wait_for_agent_lock => active
 info    221    node1/lrm: starting service vm:103
 info    221    node1/lrm: service status vm:103 started
-- 
2.47.3


* [PATCH ha-manager 2/7] test: add test cases for node affinity rules with maintenance mode
  2026-04-22 10:00 [PATCH-SERIES ha-manager 0/7] improve handling of maintenance nodes Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 1/7] manager: warn if HA resources cannot be moved away from maintenance node Daniel Kral
@ 2026-04-22 10:00 ` Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 3/7] test: add test cases for resource " Daniel Kral
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Daniel Kral @ 2026-04-22 10:00 UTC (permalink / raw)
  To: pve-devel

These test cases document how the HA Manager currently behaves for
node affinity rules with HA resources that have failback enabled or
disabled, when their current node is put into maintenance mode and
becomes available again afterwards.

The non-strict node affinity rules only need test cases with a single
node member, since these are effectively already multi-priority node
affinity rules: the non-member nodes are added with priority -1.
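
To make the priority argument concrete, here is a minimal Perl sketch
(a hypothetical helper for illustration only, not the actual
PVE::HA::Rules::NodeAffinity code) of how a non-strict rule implies a
lower-priority class for its non-member nodes:

    # Hypothetical illustration: group nodes by effective priority for a
    # non-strict node affinity rule. Member nodes keep their configured
    # priority (default 0); all other nodes get priority -1, forming the
    # implicit fallback class used when the member nodes are unavailable.
    sub effective_priority_groups {
        my ($rule_nodes, $all_nodes) = @_;
        # $rule_nodes: e.g. { node3 => { priority => 0 } }
        # $all_nodes:  e.g. [qw(node1 node2 node3)]
        my $groups = {};
        for my $node (@$all_nodes) {
            my $prio =
                exists $rule_nodes->{$node}
                ? ($rule_nodes->{$node}->{priority} // 0)
                : -1;
            push @{ $groups->{$prio} }, $node;
        }
        return $groups; # e.g. { 0 => ['node3'], -1 => ['node1', 'node2'] }
    }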

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 48 +++++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 ++
 .../service_config                            |  3 ++
 .../README                                    |  3 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 40 ++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 ++
 .../service_config                            |  3 ++
 .../README                                    |  3 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 35 ++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 ++
 .../README                                    |  3 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 35 ++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 ++
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 48 +++++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 ++
 .../README                                    |  3 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 40 ++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 ++
 42 files changed, 372 insertions(+)
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/README
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/README
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/service_config

diff --git a/src/test/test-node-affinity-maintenance-nonstrict1/README b/src/test/test-node-affinity-maintenance-nonstrict1/README
new file mode 100644
index 00000000..715e8876
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict1/README
@@ -0,0 +1,4 @@
+Test whether an HA resource with failback enabled in a non-strict node affinity
+rule with a single node member will move to a replacement node if its current
+node is in maintenance mode and moves back to the previous maintenance node as
+soon as it's available again.
diff --git a/src/test/test-node-affinity-maintenance-nonstrict1/cmdlist b/src/test/test-node-affinity-maintenance-nonstrict1/cmdlist
new file mode 100644
index 00000000..7e577b68
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict1/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-node-affinity-maintenance-nonstrict1/hardware_status b/src/test/test-node-affinity-maintenance-nonstrict1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-maintenance-nonstrict1/log.expect b/src/test/test-node-affinity-maintenance-nonstrict1/log.expect
new file mode 100644
index 00000000..339ce3ab
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict1/log.expect
@@ -0,0 +1,48 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+info    140    node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info    140    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    141    node1/lrm: got lock 'ha_agent_node1_lock'
+info    141    node1/lrm: status change wait_for_agent_lock => active
+info    145    node3/lrm: service vm:101 - start migrate to node 'node1'
+info    145    node3/lrm: service vm:101 - end migrate to node 'node1'
+info    160    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
+info    161    node1/lrm: starting service vm:101
+info    161    node1/lrm: service status vm:101 started
+info    220      cmdlist: execute crm node3 disable-node-maintenance
+info    225    node3/lrm: got lock 'ha_agent_node3_lock'
+info    225    node3/lrm: status change maintenance => active
+info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    240    node1/crm: moving service 'vm:101' back to 'node3', node came back from maintenance.
+info    240    node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info    240    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    241    node1/lrm: service vm:101 - start migrate to node 'node3'
+info    241    node1/lrm: service vm:101 - end migrate to node 'node3'
+info    260    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node3)
+info    265    node3/lrm: starting service vm:101
+info    265    node3/lrm: service status vm:101 started
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-maintenance-nonstrict1/manager_status b/src/test/test-node-affinity-maintenance-nonstrict1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-node-affinity-maintenance-nonstrict1/rules_config b/src/test/test-node-affinity-maintenance-nonstrict1/rules_config
new file mode 100644
index 00000000..f758b512
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict1/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm101-should-be-on-node3
+	nodes node3
+	resources vm:101
diff --git a/src/test/test-node-affinity-maintenance-nonstrict1/service_config b/src/test/test-node-affinity-maintenance-nonstrict1/service_config
new file mode 100644
index 00000000..7f0b1bf9
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict1/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-maintenance-nonstrict2/README b/src/test/test-node-affinity-maintenance-nonstrict2/README
new file mode 100644
index 00000000..9af43c11
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict2/README
@@ -0,0 +1,3 @@
+Test whether an HA resource with failback disabled in a non-strict node
+affinity rule with a single node member will move to a replacement node if its
+current node is in maintenance mode.
diff --git a/src/test/test-node-affinity-maintenance-nonstrict2/cmdlist b/src/test/test-node-affinity-maintenance-nonstrict2/cmdlist
new file mode 100644
index 00000000..7e577b68
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict2/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-node-affinity-maintenance-nonstrict2/hardware_status b/src/test/test-node-affinity-maintenance-nonstrict2/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-maintenance-nonstrict2/log.expect b/src/test/test-node-affinity-maintenance-nonstrict2/log.expect
new file mode 100644
index 00000000..05a77a24
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict2/log.expect
@@ -0,0 +1,40 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+info    140    node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info    140    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    141    node1/lrm: got lock 'ha_agent_node1_lock'
+info    141    node1/lrm: status change wait_for_agent_lock => active
+info    145    node3/lrm: service vm:101 - start migrate to node 'node1'
+info    145    node3/lrm: service vm:101 - end migrate to node 'node1'
+info    160    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
+info    161    node1/lrm: starting service vm:101
+info    161    node1/lrm: service status vm:101 started
+info    220      cmdlist: execute crm node3 disable-node-maintenance
+info    225    node3/lrm: got lock 'ha_agent_node3_lock'
+info    225    node3/lrm: status change maintenance => active
+info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-maintenance-nonstrict2/manager_status b/src/test/test-node-affinity-maintenance-nonstrict2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-node-affinity-maintenance-nonstrict2/rules_config b/src/test/test-node-affinity-maintenance-nonstrict2/rules_config
new file mode 100644
index 00000000..f758b512
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict2/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm101-should-be-on-node3
+	nodes node3
+	resources vm:101
diff --git a/src/test/test-node-affinity-maintenance-nonstrict2/service_config b/src/test/test-node-affinity-maintenance-nonstrict2/service_config
new file mode 100644
index 00000000..c7266eec
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-nonstrict2/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "node": "node3", "state": "started", "failback": 0 }
+}
diff --git a/src/test/test-node-affinity-maintenance-strict1/README b/src/test/test-node-affinity-maintenance-strict1/README
new file mode 100644
index 00000000..a31be5db
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict1/README
@@ -0,0 +1,3 @@
+Test whether an HA resource with failback enabled in a strict node affinity
+rule with a single node member will stay on the current node even though it is
+in maintenance mode, because it cannot find any replacement node.
diff --git a/src/test/test-node-affinity-maintenance-strict1/cmdlist b/src/test/test-node-affinity-maintenance-strict1/cmdlist
new file mode 100644
index 00000000..7e577b68
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict1/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-node-affinity-maintenance-strict1/hardware_status b/src/test/test-node-affinity-maintenance-strict1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-maintenance-strict1/log.expect b/src/test/test-node-affinity-maintenance-strict1/log.expect
new file mode 100644
index 00000000..4bdc9122
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict1/log.expect
@@ -0,0 +1,35 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+warn    140    node1/crm: service 'vm:101': cannot find a replacement node while its current node is in maintenance
+warn    160    node1/crm: service 'vm:101': cannot find a replacement node while its current node is in maintenance
+warn    180    node1/crm: service 'vm:101': cannot find a replacement node while its current node is in maintenance
+warn    200    node1/crm: service 'vm:101': cannot find a replacement node while its current node is in maintenance
+info    220      cmdlist: execute crm node3 disable-node-maintenance
+warn    220    node1/crm: service 'vm:101': cannot find a replacement node while its current node is in maintenance
+info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    240    node1/crm: service 'vm:101': clearing stale maintenance node 'node3' setting (is current node)
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-maintenance-strict1/manager_status b/src/test/test-node-affinity-maintenance-strict1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-node-affinity-maintenance-strict1/rules_config b/src/test/test-node-affinity-maintenance-strict1/rules_config
new file mode 100644
index 00000000..25aa655f
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict1/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-must-be-on-node3
+	nodes node3
+	resources vm:101
+	strict 1
diff --git a/src/test/test-node-affinity-maintenance-strict1/service_config b/src/test/test-node-affinity-maintenance-strict1/service_config
new file mode 100644
index 00000000..7f0b1bf9
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict1/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-maintenance-strict2/README b/src/test/test-node-affinity-maintenance-strict2/README
new file mode 100644
index 00000000..8a7f768d
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict2/README
@@ -0,0 +1,3 @@
+Test whether an HA resource with failback disabled in a strict node affinity
+rule with a single node member will stay on the current node even though it is
+in maintenance mode, because it cannot find any replacement node.
diff --git a/src/test/test-node-affinity-maintenance-strict2/cmdlist b/src/test/test-node-affinity-maintenance-strict2/cmdlist
new file mode 100644
index 00000000..7e577b68
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict2/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-node-affinity-maintenance-strict2/hardware_status b/src/test/test-node-affinity-maintenance-strict2/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-maintenance-strict2/log.expect b/src/test/test-node-affinity-maintenance-strict2/log.expect
new file mode 100644
index 00000000..4bdc9122
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict2/log.expect
@@ -0,0 +1,35 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+warn    140    node1/crm: service 'vm:101': cannot find a replacement node while its current node is in maintenance
+warn    160    node1/crm: service 'vm:101': cannot find a replacement node while its current node is in maintenance
+warn    180    node1/crm: service 'vm:101': cannot find a replacement node while its current node is in maintenance
+warn    200    node1/crm: service 'vm:101': cannot find a replacement node while its current node is in maintenance
+info    220      cmdlist: execute crm node3 disable-node-maintenance
+warn    220    node1/crm: service 'vm:101': cannot find a replacement node while its current node is in maintenance
+info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    240    node1/crm: service 'vm:101': clearing stale maintenance node 'node3' setting (is current node)
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-maintenance-strict2/manager_status b/src/test/test-node-affinity-maintenance-strict2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-node-affinity-maintenance-strict2/rules_config b/src/test/test-node-affinity-maintenance-strict2/rules_config
new file mode 100644
index 00000000..25aa655f
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict2/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-must-be-on-node3
+	nodes node3
+	resources vm:101
+	strict 1
diff --git a/src/test/test-node-affinity-maintenance-strict2/service_config b/src/test/test-node-affinity-maintenance-strict2/service_config
new file mode 100644
index 00000000..c7266eec
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict2/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "node": "node3", "state": "started", "failback": 0 }
+}
diff --git a/src/test/test-node-affinity-maintenance-strict3/README b/src/test/test-node-affinity-maintenance-strict3/README
new file mode 100644
index 00000000..b5f5dfbb
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict3/README
@@ -0,0 +1,4 @@
+Test whether an HA resource with failback enabled in a strict node affinity
+rule with two differently prioritized node members will move to the
+lower-priority node if its current node is in maintenance mode and moves back
+to the previous maintenance node as soon as it's available again.
diff --git a/src/test/test-node-affinity-maintenance-strict3/cmdlist b/src/test/test-node-affinity-maintenance-strict3/cmdlist
new file mode 100644
index 00000000..7e577b68
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict3/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-node-affinity-maintenance-strict3/hardware_status b/src/test/test-node-affinity-maintenance-strict3/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict3/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-maintenance-strict3/log.expect b/src/test/test-node-affinity-maintenance-strict3/log.expect
new file mode 100644
index 00000000..0bdf4fa0
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict3/log.expect
@@ -0,0 +1,48 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+info    140    node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info    140    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node3, target = node2)
+info    143    node2/lrm: got lock 'ha_agent_node2_lock'
+info    143    node2/lrm: status change wait_for_agent_lock => active
+info    145    node3/lrm: service vm:101 - start migrate to node 'node2'
+info    145    node3/lrm: service vm:101 - end migrate to node 'node2'
+info    160    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node2)
+info    163    node2/lrm: starting service vm:101
+info    163    node2/lrm: service status vm:101 started
+info    220      cmdlist: execute crm node3 disable-node-maintenance
+info    225    node3/lrm: got lock 'ha_agent_node3_lock'
+info    225    node3/lrm: status change maintenance => active
+info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    240    node1/crm: moving service 'vm:101' back to 'node3', node came back from maintenance.
+info    240    node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info    240    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node2, target = node3)
+info    243    node2/lrm: service vm:101 - start migrate to node 'node3'
+info    243    node2/lrm: service vm:101 - end migrate to node 'node3'
+info    260    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node3)
+info    265    node3/lrm: starting service vm:101
+info    265    node3/lrm: service status vm:101 started
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-maintenance-strict3/manager_status b/src/test/test-node-affinity-maintenance-strict3/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-node-affinity-maintenance-strict3/rules_config b/src/test/test-node-affinity-maintenance-strict3/rules_config
new file mode 100644
index 00000000..12539b76
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict3/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-must-be-on-node3-or-node2
+	nodes node2:1,node3:2
+	resources vm:101
+	strict 1
diff --git a/src/test/test-node-affinity-maintenance-strict3/service_config b/src/test/test-node-affinity-maintenance-strict3/service_config
new file mode 100644
index 00000000..7f0b1bf9
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict3/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-maintenance-strict4/README b/src/test/test-node-affinity-maintenance-strict4/README
new file mode 100644
index 00000000..43c68463
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict4/README
@@ -0,0 +1,3 @@
+Test whether an HA resource with failback disabled in a strict node affinity
+rule with two differently prioritized node members will move to the
+lower-priority node if its current node is in maintenance mode.
diff --git a/src/test/test-node-affinity-maintenance-strict4/cmdlist b/src/test/test-node-affinity-maintenance-strict4/cmdlist
new file mode 100644
index 00000000..7e577b68
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict4/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-node-affinity-maintenance-strict4/hardware_status b/src/test/test-node-affinity-maintenance-strict4/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict4/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-maintenance-strict4/log.expect b/src/test/test-node-affinity-maintenance-strict4/log.expect
new file mode 100644
index 00000000..6f19258c
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict4/log.expect
@@ -0,0 +1,40 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+info    140    node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info    140    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node3, target = node2)
+info    143    node2/lrm: got lock 'ha_agent_node2_lock'
+info    143    node2/lrm: status change wait_for_agent_lock => active
+info    145    node3/lrm: service vm:101 - start migrate to node 'node2'
+info    145    node3/lrm: service vm:101 - end migrate to node 'node2'
+info    160    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node2)
+info    163    node2/lrm: starting service vm:101
+info    163    node2/lrm: service status vm:101 started
+info    220      cmdlist: execute crm node3 disable-node-maintenance
+info    225    node3/lrm: got lock 'ha_agent_node3_lock'
+info    225    node3/lrm: status change maintenance => active
+info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-maintenance-strict4/manager_status b/src/test/test-node-affinity-maintenance-strict4/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict4/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-node-affinity-maintenance-strict4/rules_config b/src/test/test-node-affinity-maintenance-strict4/rules_config
new file mode 100644
index 00000000..12539b76
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict4/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-must-be-on-node3-or-node2
+	nodes node2:1,node3:2
+	resources vm:101
+	strict 1
diff --git a/src/test/test-node-affinity-maintenance-strict4/service_config b/src/test/test-node-affinity-maintenance-strict4/service_config
new file mode 100644
index 00000000..c7266eec
--- /dev/null
+++ b/src/test/test-node-affinity-maintenance-strict4/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "node": "node3", "state": "started", "failback": 0 }
+}
-- 
2.47.3





^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH ha-manager 3/7] test: add test cases for resource affinity rules with maintenance mode
  2026-04-22 10:00 [PATCH-SERIES ha-manager 0/7] improve handling of maintenance nodes Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 1/7] manager: warn if HA resources cannot be moved away from maintenance node Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 2/7] test: add test cases for node affinity rules with maintenance mode Daniel Kral
@ 2026-04-22 10:00 ` Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 4/7] manager: make HA resources without failback move back to maintenance node Daniel Kral
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Daniel Kral @ 2026-04-22 10:00 UTC (permalink / raw)
  To: pve-devel

These test cases document how the HA Manager currently behaves for
positive and negative resource affinity rules, as well as for resource
affinity rules mixed with node affinity rules, when the relevant nodes
are put into maintenance mode and become available again afterwards.
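
For illustration, the mixed case combines a node affinity rule with a
negative resource affinity rule for the same HA resources; a minimal
example, mirroring the rules_config fixtures added below:

    node-affinity: vm101-vm102-should-be-on-node2-node3
        resources vm:101,vm:102
        nodes node2,node3

    resource-affinity: lonely-must-vms-be
        resources vm:101,vm:102
        affinity negative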

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 .../README                                    |  5 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 54 +++++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 ++
 .../service_config                            |  4 ++
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 47 ++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 ++
 .../service_config                            |  5 ++
 .../README                                    |  3 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 51 ++++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 ++
 .../service_config                            |  4 ++
 .../README                                    |  9 ++++
 .../cmdlist                                   |  4 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 46 ++++++++++++++++
 .../manager_status                            | 34 ++++++++++++
 .../rules_config                              |  3 ++
 .../service_config                            |  5 ++
 .../README                                    |  8 +++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 41 ++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  7 +++
 .../service_config                            |  4 ++
 35 files changed, 396 insertions(+)
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/service_config
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/README
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/cmdlist
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/hardware_status
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/log.expect
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/manager_status
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/rules_config
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/service_config

diff --git a/src/test/test-resource-affinity-maintenance-strict-negative1/README b/src/test/test-resource-affinity-maintenance-strict-negative1/README
new file mode 100644
index 00000000..5365ebce
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative1/README
@@ -0,0 +1,5 @@
+Tests whether a strict negative resource affinity rule between two HA resources
+makes the HA resource whose current node is in maintenance mode move to a
+replacement node (a different node than the other HA resource's node) and moves
+the HA resource back to its previous maintenance node as soon as it's available
+again.
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative1/cmdlist b/src/test/test-resource-affinity-maintenance-strict-negative1/cmdlist
new file mode 100644
index 00000000..7e577b68
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative1/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative1/hardware_status b/src/test/test-resource-affinity-maintenance-strict-negative1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative1/log.expect b/src/test/test-resource-affinity-maintenance-strict-negative1/log.expect
new file mode 100644
index 00000000..1fc25206
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative1/log.expect
@@ -0,0 +1,54 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node2'
+info     20    node1/crm: adding new service 'vm:102' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node3)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:101
+info     23    node2/lrm: service status vm:101 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:102
+info     25    node3/lrm: service status vm:102 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+info    140    node1/crm: migrate service 'vm:102' to node 'node1' (running)
+info    140    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    141    node1/lrm: got lock 'ha_agent_node1_lock'
+info    141    node1/lrm: status change wait_for_agent_lock => active
+info    145    node3/lrm: service vm:102 - start migrate to node 'node1'
+info    145    node3/lrm: service vm:102 - end migrate to node 'node1'
+info    160    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
+info    161    node1/lrm: starting service vm:102
+info    161    node1/lrm: service status vm:102 started
+info    220      cmdlist: execute crm node3 disable-node-maintenance
+info    225    node3/lrm: got lock 'ha_agent_node3_lock'
+info    225    node3/lrm: status change maintenance => active
+info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    240    node1/crm: moving service 'vm:102' back to 'node3', node came back from maintenance.
+info    240    node1/crm: migrate service 'vm:102' to node 'node3' (running)
+info    240    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    241    node1/lrm: service vm:102 - start migrate to node 'node3'
+info    241    node1/lrm: service vm:102 - end migrate to node 'node3'
+info    260    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node3)
+info    265    node3/lrm: starting service vm:102
+info    265    node3/lrm: service status vm:102 started
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative1/manager_status b/src/test/test-resource-affinity-maintenance-strict-negative1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative1/rules_config b/src/test/test-resource-affinity-maintenance-strict-negative1/rules_config
new file mode 100644
index 00000000..20747760
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative1/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: lonely-must-vms-be
+	resources vm:101,vm:102
+	affinity negative
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative1/service_config b/src/test/test-resource-affinity-maintenance-strict-negative1/service_config
new file mode 100644
index 00000000..e42e5c79
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative1/service_config
@@ -0,0 +1,4 @@
+{
+    "vm:101": { "node": "node2", "state": "started" },
+    "vm:102": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative2/README b/src/test/test-resource-affinity-maintenance-strict-negative2/README
new file mode 100644
index 00000000..a2102c2f
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative2/README
@@ -0,0 +1,4 @@
+Tests whether a strict negative resource affinity rule among three HA resources
+makes the HA resource whose current node is in maintenance mode stay on its
+current node, even though it is in maintenance mode, because it cannot find any
+replacement node.
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative2/cmdlist b/src/test/test-resource-affinity-maintenance-strict-negative2/cmdlist
new file mode 100644
index 00000000..7e577b68
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative2/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative2/hardware_status b/src/test/test-resource-affinity-maintenance-strict-negative2/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative2/log.expect b/src/test/test-resource-affinity-maintenance-strict-negative2/log.expect
new file mode 100644
index 00000000..505702f7
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative2/log.expect
@@ -0,0 +1,47 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:102
+info     23    node2/lrm: service status vm:102 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:103
+info     25    node3/lrm: service status vm:103 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+warn    140    node1/crm: service 'vm:103': cannot find a replacement node while its current node is in maintenance
+warn    160    node1/crm: service 'vm:103': cannot find a replacement node while its current node is in maintenance
+warn    180    node1/crm: service 'vm:103': cannot find a replacement node while its current node is in maintenance
+warn    200    node1/crm: service 'vm:103': cannot find a replacement node while its current node is in maintenance
+info    220      cmdlist: execute crm node3 disable-node-maintenance
+warn    220    node1/crm: service 'vm:103': cannot find a replacement node while its current node is in maintenance
+info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    240    node1/crm: service 'vm:103': clearing stale maintenance node 'node3' setting (is current node)
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative2/manager_status b/src/test/test-resource-affinity-maintenance-strict-negative2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative2/rules_config b/src/test/test-resource-affinity-maintenance-strict-negative2/rules_config
new file mode 100644
index 00000000..44e6a02e
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative2/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: lonely-must-vms-be
+	resources vm:101,vm:102,vm:103
+	affinity negative
diff --git a/src/test/test-resource-affinity-maintenance-strict-negative2/service_config b/src/test/test-resource-affinity-maintenance-strict-negative2/service_config
new file mode 100644
index 00000000..4b26f6b4
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-negative2/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "started" },
+    "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/README b/src/test/test-resource-affinity-maintenance-strict-positive1/README
new file mode 100644
index 00000000..4b62e578
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/README
@@ -0,0 +1,3 @@
+Tests whether a strict positive resource affinity rule among two HA resources
+makes both HA resources move to the same replacement node in case their
+current, common node is put in maintenance mode.
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/cmdlist b/src/test/test-resource-affinity-maintenance-strict-positive1/cmdlist
new file mode 100644
index 00000000..97fbc1ef
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/hardware_status b/src/test/test-resource-affinity-maintenance-strict-positive1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect b/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
new file mode 100644
index 00000000..5f91b877
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
@@ -0,0 +1,51 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: adding new service 'vm:102' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node3)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info     25    node3/lrm: starting service vm:102
+info     25    node3/lrm: service status vm:102 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+info    140    node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info    140    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    140    node1/crm: migrate service 'vm:102' to node 'node1' (running)
+info    140    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    141    node1/lrm: got lock 'ha_agent_node1_lock'
+info    141    node1/lrm: status change wait_for_agent_lock => active
+info    145    node3/lrm: service vm:101 - start migrate to node 'node1'
+info    145    node3/lrm: service vm:101 - end migrate to node 'node1'
+info    145    node3/lrm: service vm:102 - start migrate to node 'node1'
+info    145    node3/lrm: service vm:102 - end migrate to node 'node1'
+info    160    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
+info    160    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
+info    161    node1/lrm: starting service vm:101
+info    161    node1/lrm: service status vm:101 started
+info    161    node1/lrm: starting service vm:102
+info    161    node1/lrm: service status vm:102 started
+info    220      cmdlist: execute crm node3 disable-node-maintenance
+info    225    node3/lrm: got lock 'ha_agent_node3_lock'
+info    225    node3/lrm: status change maintenance => active
+info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/manager_status b/src/test/test-resource-affinity-maintenance-strict-positive1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/rules_config b/src/test/test-resource-affinity-maintenance-strict-positive1/rules_config
new file mode 100644
index 00000000..9789d7cc
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-must-stick-together
+	resources vm:101,vm:102
+	affinity positive
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/service_config b/src/test/test-resource-affinity-maintenance-strict-positive1/service_config
new file mode 100644
index 00000000..50ef1caa
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/service_config
@@ -0,0 +1,4 @@
+{
+    "vm:101": { "node": "node3", "state": "started" },
+    "vm:102": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/README b/src/test/test-resource-affinity-maintenance-strict-positive2/README
new file mode 100644
index 00000000..32f0942b
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/README
@@ -0,0 +1,9 @@
+Tests whether a strict positive resource affinity rule among three HA
+resources, where two of them are already on a common node but the other HA
+resource is still on another node, makes the former two HA resources move to
+the node of the other HA resource as their current common node is put in
+maintenance mode.
+
+The "skip-round crm 1" command ensures that the HA Manager does not move the
+dislocated, third HA resource to the common node, but instead lets the LRM
+acknowledge its maintenance mode request first.
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/cmdlist b/src/test/test-resource-affinity-maintenance-strict-positive2/cmdlist
new file mode 100644
index 00000000..2185ee6e
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on", "skip-round crm 1" ],
+    [ "crm node1 disable-node-maintenance" ]
+]
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/hardware_status b/src/test/test-resource-affinity-maintenance-strict-positive2/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect b/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
new file mode 100644
index 00000000..ef63c8ca
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
@@ -0,0 +1,46 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute skip-round crm 1
+info     20     run-loop: skipping CRM round
+info     20    node1/lrm: got lock 'ha_agent_node1_lock'
+info     20    node1/lrm: status change wait_for_agent_lock => active
+info     20    node1/lrm: starting service vm:101
+info     20    node1/lrm: service status vm:101 started
+info     20    node1/lrm: starting service vm:102
+info     20    node1/lrm: service status vm:102 started
+info     22    node3/lrm: got lock 'ha_agent_node3_lock'
+info     22    node3/lrm: status change wait_for_agent_lock => active
+info     22    node3/lrm: starting service vm:103
+info     22    node3/lrm: service status vm:103 started
+info     40    node1/crm: got lock 'ha_manager_lock'
+info     40    node1/crm: status change wait_for_quorum => master
+info     40    node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info     40    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info     40    node1/crm: migrate service 'vm:102' to node 'node3' (running)
+info     40    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info     41    node1/lrm: status change active => maintenance
+info     41    node1/lrm: service vm:101 - start migrate to node 'node3'
+info     41    node1/lrm: service vm:101 - end migrate to node 'node3'
+info     41    node1/lrm: service vm:102 - start migrate to node 'node3'
+info     41    node1/lrm: service vm:102 - end migrate to node 'node3'
+info     42    node2/crm: status change wait_for_quorum => slave
+info     44    node3/crm: status change wait_for_quorum => slave
+info     60    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node3)
+info     60    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node3)
+info     65    node3/lrm: starting service vm:101
+info     65    node3/lrm: service status vm:101 started
+info     65    node3/lrm: starting service vm:102
+info     65    node3/lrm: service status vm:102 started
+info    120      cmdlist: execute crm node1 disable-node-maintenance
+info    121    node1/lrm: got lock 'ha_agent_node1_lock'
+info    121    node1/lrm: status change maintenance => active
+info    140    node1/crm: node 'node1': state changed from 'maintenance' => 'online'
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/manager_status b/src/test/test-resource-affinity-maintenance-strict-positive2/manager_status
new file mode 100644
index 00000000..135a1d6f
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/manager_status
@@ -0,0 +1,34 @@
+{
+  "master_node": "node1",
+  "node_request": {
+    "node1": {
+      "maintenance": 1
+    }
+  },
+  "node_status": {
+    "node1": "maintenance",
+    "node2": "online",
+    "node3": "online"
+  },
+  "service_status": {
+    "vm:101": {
+      "running": 1,
+      "node": "node1",
+      "state": "started",
+      "uid": "Xi3T+eaBD4iaN01s65D5/g"
+    },
+    "vm:102": {
+      "running": 1,
+      "node": "node1",
+      "state": "started",
+      "uid": "F2xctkwVsaF2KY9gJYsz6g"
+    },
+    "vm:103": {
+      "running": 1,
+      "node": "node3",
+      "state": "started",
+      "uid": "c5yeDFKYkhMe3Nv+XzmN0A"
+    }
+  },
+  "timestamp": 40
+}
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/rules_config b/src/test/test-resource-affinity-maintenance-strict-positive2/rules_config
new file mode 100644
index 00000000..12da6e67
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-must-stick-together
+	resources vm:101,vm:102,vm:103
+	affinity positive
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/service_config b/src/test/test-resource-affinity-maintenance-strict-positive2/service_config
new file mode 100644
index 00000000..32e61c84
--- /dev/null
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/README b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/README
new file mode 100644
index 00000000..c6a11cec
--- /dev/null
+++ b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/README
@@ -0,0 +1,8 @@
+2 HA resources on a 3-node cluster, which are:
+
+- on node2 and node3 respectively,
+- in a non-strict node affinity rule to node2 and node3 (equal priority), and
+- in a strict negative resource affinity rule with each other.
+
+Tests whether the HA resource on node3 will stay there, even though node3 is
+put in maintenance mode, because it cannot find any replacement node.
diff --git a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/cmdlist b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/cmdlist
new file mode 100644
index 00000000..97fbc1ef
--- /dev/null
+++ b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/cmdlist
@@ -0,0 +1,5 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "crm node3 enable-node-maintenance" ],
+    [ "crm node3 disable-node-maintenance" ]
+]
diff --git a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/hardware_status b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/log.expect b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/log.expect
new file mode 100644
index 00000000..8899f782
--- /dev/null
+++ b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/log.expect
@@ -0,0 +1,41 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node2'
+info     20    node1/crm: adding new service 'vm:102' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node3)
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:101
+info     23    node2/lrm: service status vm:101 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:102
+info     25    node3/lrm: service status vm:102 started
+info    120      cmdlist: execute crm node3 enable-node-maintenance
+info    125    node3/lrm: status change active => maintenance
+info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
+warn    140    node1/crm: service 'vm:102': cannot find a replacement node while its current node is in maintenance
+warn    160    node1/crm: service 'vm:102': cannot find a replacement node while its current node is in maintenance
+warn    180    node1/crm: service 'vm:102': cannot find a replacement node while its current node is in maintenance
+warn    200    node1/crm: service 'vm:102': cannot find a replacement node while its current node is in maintenance
+info    220      cmdlist: execute crm node3 disable-node-maintenance
+warn    220    node1/crm: service 'vm:102': cannot find a replacement node while its current node is in maintenance
+info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    240    node1/crm: service 'vm:102': clearing stale maintenance node 'node3' setting (is current node)
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/manager_status b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/rules_config b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/rules_config
new file mode 100644
index 00000000..e5bf3e47
--- /dev/null
+++ b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/rules_config
@@ -0,0 +1,7 @@
+node-affinity: vm101-vm102-should-be-on-node2-node3
+	resources vm:101,vm:102
+	nodes node2,node3
+
+resource-affinity: lonely-must-vms-be
+	resources vm:101,vm:102
+	affinity negative
diff --git a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/service_config b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/service_config
new file mode 100644
index 00000000..e42e5c79
--- /dev/null
+++ b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/service_config
@@ -0,0 +1,4 @@
+{
+    "vm:101": { "node": "node2", "state": "started" },
+    "vm:102": { "node": "node3", "state": "started" }
+}
-- 
2.47.3





^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH ha-manager 4/7] manager: make HA resources without failback move back to maintenance node
  2026-04-22 10:00 [PATCH-SERIES ha-manager 0/7] improve handling of maintenance nodes Daniel Kral
                   ` (2 preceding siblings ...)
  2026-04-22 10:00 ` [PATCH ha-manager 3/7] test: add test cases for resource " Daniel Kral
@ 2026-04-22 10:00 ` Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 5/7] manager: make HA resource bundles " Daniel Kral
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Daniel Kral @ 2026-04-22 10:00 UTC (permalink / raw)
  To: pve-devel

If an HA resource has failback disabled and its current node is put in
maintenance mode, the HA resource will correctly move to a replacement
node.

However, when the previous node is taken out of maintenance mode again,
the HA resource stays on the new node. As HA resources should move back
to their previous maintenance node, do not keep them on the current node
if they are not yet back on the maintenance node.
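
Condensed, the added guard in select_service_node() only lets the HA
resource stay put if no maintenance fallback is pending, or if the
fallback already points at the current node (a simplified sketch of the
condition changed below, not the full check):

    # keep the current node only while no other maintenance node is
    # pending as the fallback target
    my $stay_on_current = !defined($maintenance_fallback)
        || $maintenance_fallback eq $current_node;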

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 src/PVE/HA/Manager.pm                                     | 1 +
 src/test/test-node-affinity-maintenance-nonstrict2/README | 3 ++-
 .../test-node-affinity-maintenance-nonstrict2/log.expect  | 8 ++++++++
 src/test/test-node-affinity-maintenance-strict4/README    | 3 ++-
 .../test-node-affinity-maintenance-strict4/log.expect     | 8 ++++++++
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 684244e1..795b98c1 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -336,6 +336,7 @@ sub select_service_node {
         $node_preference eq 'none'
         && !$service_conf->{failback}
         && $allowed_nodes->{$current_node}
+        && (!defined($maintenance_fallback) || $maintenance_fallback eq $current_node)
         && PVE::HA::Rules::ResourceAffinity::is_allowed_on_node(
             $together, $separate, $current_node,
         )
diff --git a/src/test/test-node-affinity-maintenance-nonstrict2/README b/src/test/test-node-affinity-maintenance-nonstrict2/README
index 9af43c11..056a882d 100644
--- a/src/test/test-node-affinity-maintenance-nonstrict2/README
+++ b/src/test/test-node-affinity-maintenance-nonstrict2/README
@@ -1,3 +1,4 @@
 Test whether an HA resource with failback disabled in a non-strict node
 affinity rule with a single node member will move to a replacement node if its
-current node is in maintenance mode.
+current node is in maintenance mode and move back to the previous maintenance
+node as soon as it's available again.
diff --git a/src/test/test-node-affinity-maintenance-nonstrict2/log.expect b/src/test/test-node-affinity-maintenance-nonstrict2/log.expect
index 05a77a24..339ce3ab 100644
--- a/src/test/test-node-affinity-maintenance-nonstrict2/log.expect
+++ b/src/test/test-node-affinity-maintenance-nonstrict2/log.expect
@@ -37,4 +37,12 @@ info    220      cmdlist: execute crm node3 disable-node-maintenance
 info    225    node3/lrm: got lock 'ha_agent_node3_lock'
 info    225    node3/lrm: status change maintenance => active
 info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    240    node1/crm: moving service 'vm:101' back to 'node3', node came back from maintenance.
+info    240    node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info    240    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    241    node1/lrm: service vm:101 - start migrate to node 'node3'
+info    241    node1/lrm: service vm:101 - end migrate to node 'node3'
+info    260    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node3)
+info    265    node3/lrm: starting service vm:101
+info    265    node3/lrm: service status vm:101 started
 info    820     hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-maintenance-strict4/README b/src/test/test-node-affinity-maintenance-strict4/README
index 43c68463..e6ad5c7e 100644
--- a/src/test/test-node-affinity-maintenance-strict4/README
+++ b/src/test/test-node-affinity-maintenance-strict4/README
@@ -1,3 +1,4 @@
 Test whether an HA resource with failback disabled in a strict node affinity
 rule with two differently prioritized node members will move to the
-lower-priority node if its current node is in maintenance mode.
+lower-priority node if its current node is in maintenance mode and move back
+to the previous maintenance node as soon as it's available again.
diff --git a/src/test/test-node-affinity-maintenance-strict4/log.expect b/src/test/test-node-affinity-maintenance-strict4/log.expect
index 6f19258c..0bdf4fa0 100644
--- a/src/test/test-node-affinity-maintenance-strict4/log.expect
+++ b/src/test/test-node-affinity-maintenance-strict4/log.expect
@@ -37,4 +37,12 @@ info    220      cmdlist: execute crm node3 disable-node-maintenance
 info    225    node3/lrm: got lock 'ha_agent_node3_lock'
 info    225    node3/lrm: status change maintenance => active
 info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    240    node1/crm: moving service 'vm:101' back to 'node3', node came back from maintenance.
+info    240    node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info    240    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node2, target = node3)
+info    243    node2/lrm: service vm:101 - start migrate to node 'node3'
+info    243    node2/lrm: service vm:101 - end migrate to node 'node3'
+info    260    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node3)
+info    265    node3/lrm: starting service vm:101
+info    265    node3/lrm: service status vm:101 started
 info    820     hardware: exit simulation - done
-- 
2.47.3





^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH ha-manager 5/7] manager: make HA resource bundles move back to maintenance node
  2026-04-22 10:00 [PATCH-SERIES ha-manager 0/7] improve handling of maintenance nodes Daniel Kral
                   ` (3 preceding siblings ...)
  2026-04-22 10:00 ` [PATCH ha-manager 4/7] manager: make HA resources without failback move back to maintenance node Daniel Kral
@ 2026-04-22 10:00 ` Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 6/7] make get_node_affinity return all priority classes sorted in descending order Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 7/7] manager: try multiple priority classes when applying negative resource affinity Daniel Kral
  6 siblings, 0 replies; 8+ messages in thread
From: Daniel Kral @ 2026-04-22 10:00 UTC (permalink / raw)
  To: pve-devel

HA resources in positive resource affinity rules (HA resource bundles)
always prefer their current, common node as soon as at least one of them
is already actively assigned to a node.

This logic is implemented in apply_positive_resource_affinity(), which
will reduce the node set to only their current, common node.

As the maintenance node differs from the HA resources' current node
(unless no replacement node could be found for some reason),
select_service_node() should make the HA resources move back to the
maintenance node before apply_positive_resource_affinity() is applied.
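
In short, the maintenance fallback check has to run after the negative,
but before the positive resource affinity rules are applied; condensed
from the reordering in the diff below:

    apply_negative_resource_affinity($separate, $pri_nodes);

    # check the fallback before positive affinity narrows $pri_nodes
    # down to the bundle's current, common node
    return $maintenance_fallback
        if defined($maintenance_fallback) && $pri_nodes->{$maintenance_fallback};

    apply_positive_resource_affinity($together, $pri_nodes);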

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 src/PVE/HA/Manager.pm                         | 11 ++++++++-
 .../README                                    |  3 ++-
 .../log.expect                                | 16 +++++++++++++
 .../README                                    |  3 ++-
 .../log.expect                                | 23 +++++++++++++++++++
 5 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 795b98c1..ce5d69a4 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -352,11 +352,20 @@ sub select_service_node {
     }
 
     apply_negative_resource_affinity($separate, $pri_nodes);
-    apply_positive_resource_affinity($together, $pri_nodes);
 
+    # fall back to the previous maintenance node if it is available again.
+    #
+    # if the HA resource is in a resource bundle with at least one member
+    # already running, apply_positive_resource_affinity() will reduce the
+    # node set to only their current, common node.
+    # therefore, fall back here already, as $pri_nodes already has all other
+    # affinity rules applied and the HA resources in the resource bundle share
+    # the same maintenance node.
     return $maintenance_fallback
         if defined($maintenance_fallback) && $pri_nodes->{$maintenance_fallback};
 
+    apply_positive_resource_affinity($together, $pri_nodes);
+
     return $current_node if $node_preference eq 'none' && $pri_nodes->{$current_node};
 
     my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/README b/src/test/test-resource-affinity-maintenance-strict-positive1/README
index 4b62e578..ab293cc5 100644
--- a/src/test/test-resource-affinity-maintenance-strict-positive1/README
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/README
@@ -1,3 +1,4 @@
 Tests whether a strict positive resource affinity rule among two HA resources
 makes both HA resources move to the same replacement node in case their
-current, common node is put in maintenance mode.
+current, common node is put in maintenance mode and moves them back once the
+previous maintenance node is available again.
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect b/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
index 5f91b877..91637279 100644
--- a/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
+++ b/src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
@@ -48,4 +48,20 @@ info    220      cmdlist: execute crm node3 disable-node-maintenance
 info    225    node3/lrm: got lock 'ha_agent_node3_lock'
 info    225    node3/lrm: status change maintenance => active
 info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
+info    240    node1/crm: moving service 'vm:101' back to 'node3', node came back from maintenance.
+info    240    node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info    240    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    240    node1/crm: moving service 'vm:102' back to 'node3', node came back from maintenance.
+info    240    node1/crm: migrate service 'vm:102' to node 'node3' (running)
+info    240    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    241    node1/lrm: service vm:101 - start migrate to node 'node3'
+info    241    node1/lrm: service vm:101 - end migrate to node 'node3'
+info    241    node1/lrm: service vm:102 - start migrate to node 'node3'
+info    241    node1/lrm: service vm:102 - end migrate to node 'node3'
+info    260    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node3)
+info    260    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node3)
+info    265    node3/lrm: starting service vm:101
+info    265    node3/lrm: service status vm:101 started
+info    265    node3/lrm: starting service vm:102
+info    265    node3/lrm: service status vm:102 started
 info    820     hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/README b/src/test/test-resource-affinity-maintenance-strict-positive2/README
index 32f0942b..dcc4c81d 100644
--- a/src/test/test-resource-affinity-maintenance-strict-positive2/README
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/README
@@ -2,7 +2,8 @@ Tests whether a strict positive resource affinity rule among three HA
 resources, where two of them are already on a common node but the other HA
 resource is still on another node, makes the former two HA resources move to
 the node of the other HA resource as their current common node is put in
-maintenance mode.
+maintenance mode and moves them back as soon as the previous maintenance node
+is available again.
 
 The "skip-round crm 1" command ensures that the HA Manager does not move the
 dislocated, third HA resource to the common node, but instead lets the LRM
diff --git a/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect b/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
index ef63c8ca..9da6d968 100644
--- a/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
+++ b/src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
@@ -43,4 +43,27 @@ info    120      cmdlist: execute crm node1 disable-node-maintenance
 info    121    node1/lrm: got lock 'ha_agent_node1_lock'
 info    121    node1/lrm: status change maintenance => active
 info    140    node1/crm: node 'node1': state changed from 'maintenance' => 'online'
+info    140    node1/crm: moving service 'vm:101' back to 'node1', node came back from maintenance.
+info    140    node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info    140    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    140    node1/crm: moving service 'vm:102' back to 'node1', node came back from maintenance.
+info    140    node1/crm: migrate service 'vm:102' to node 'node1' (running)
+info    140    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    140    node1/crm: migrate service 'vm:103' to node 'node1' (running)
+info    140    node1/crm: service 'vm:103': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    145    node3/lrm: service vm:101 - start migrate to node 'node1'
+info    145    node3/lrm: service vm:101 - end migrate to node 'node1'
+info    145    node3/lrm: service vm:102 - start migrate to node 'node1'
+info    145    node3/lrm: service vm:102 - end migrate to node 'node1'
+info    145    node3/lrm: service vm:103 - start migrate to node 'node1'
+info    145    node3/lrm: service vm:103 - end migrate to node 'node1'
+info    160    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
+info    160    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
+info    160    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node1)
+info    161    node1/lrm: starting service vm:101
+info    161    node1/lrm: service status vm:101 started
+info    161    node1/lrm: starting service vm:102
+info    161    node1/lrm: service status vm:102 started
+info    161    node1/lrm: starting service vm:103
+info    161    node1/lrm: service status vm:103 started
 info    720     hardware: exit simulation - done
-- 
2.47.3

* [PATCH ha-manager 6/7] make get_node_affinity return all priority classes sorted in descending order
  2026-04-22 10:00 [PATCH-SERIES ha-manager 0/7] improve handling of maintenance nodes Daniel Kral
                   ` (4 preceding siblings ...)
  2026-04-22 10:00 ` [PATCH ha-manager 5/7] manager: make HA resource bundles " Daniel Kral
@ 2026-04-22 10:00 ` Daniel Kral
  2026-04-22 10:00 ` [PATCH ha-manager 7/7] manager: try multiple priority classes when applying negative resource affinity Daniel Kral
  6 siblings, 0 replies; 8+ messages in thread
From: Daniel Kral @ 2026-04-22 10:00 UTC (permalink / raw)
  To: pve-devel

This is in preparation for the next patch, which needs to iterate
through the priority classes from the highest to the lowest priority.
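
For illustration, a minimal sketch (not part of the patch) of the new
return shape, assuming a hypothetical rule set in $node_affinity where
'vm:101' prefers node1 (priority 2) over node2 and node3 (priority 1):

    my $online_nodes = { node1 => 1, node2 => 1, node3 => 1 };
    my ($allowed_nodes, $priority_classes) =
        get_node_affinity($node_affinity, 'vm:101', $online_nodes);

    # $allowed_nodes is the hash set of nodes 'vm:101' may run on, e.g.
    # all three online nodes for a non-strict rule
    # $priority_classes is [ { node1 => 1 }, { node2 => 1, node3 => 1 } ],
    # sorted from the highest to the lowest priority class, so a caller
    # can shift off classes until one yields a usable node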

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 src/PVE/HA/Manager.pm            |  7 +++++--
 src/PVE/HA/Rules/NodeAffinity.pm | 24 ++++++++++++++----------
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index ce5d69a4..0d7a2f59 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -194,7 +194,9 @@ sub get_resource_migration_candidates {
         my $current_leader_node = $ss->{$leader_sid}->{node};
         my $online_nodes = { map { $_ => 1 } $online_node_usage->list_nodes() };
 
-        my (undef, $target_nodes) = get_node_affinity($node_affinity, $leader_sid, $online_nodes);
+        my (undef, $priority_classes) =
+            get_node_affinity($node_affinity, $leader_sid, $online_nodes);
+        my $target_nodes = shift @$priority_classes // {};
         my ($together, $separate) =
             get_resource_affinity($resource_affinity, $leader_sid, $ss, $online_nodes);
         apply_negative_resource_affinity($separate, $target_nodes);
@@ -325,7 +327,8 @@ sub select_service_node {
         $compiled_rules->@{qw(node-affinity resource-affinity)};
 
     my $online_nodes = { map { $_ => 1 } $online_node_usage->list_nodes() };
-    my ($allowed_nodes, $pri_nodes) = get_node_affinity($node_affinity, $sid, $online_nodes);
+    my ($allowed_nodes, $priority_classes) = get_node_affinity($node_affinity, $sid, $online_nodes);
+    my $pri_nodes = shift @$priority_classes // {};
 
     return undef if !%$pri_nodes;
 
diff --git a/src/PVE/HA/Rules/NodeAffinity.pm b/src/PVE/HA/Rules/NodeAffinity.pm
index 3fa1fdb4..9b4edbfd 100644
--- a/src/PVE/HA/Rules/NodeAffinity.pm
+++ b/src/PVE/HA/Rules/NodeAffinity.pm
@@ -253,22 +253,26 @@ __PACKAGE__->register_check(
 
 =head3 get_node_affinity($node_affinity, $sid, $online_nodes)
 
-Returns a list of two hashes representing the node affinity of C<$sid>
-according to the node affinity C<$node_affinity> and the available nodes in
-the C<$online_nodes> hash.
+Returns a two-element list: a hash set of the available nodes and a list of
+the priority classes of C<$sid>, according to the node affinity rules in
+C<$node_affinity> and the online nodes in the C<$online_nodes> hash.
 
 The first hash is a hash set of available nodes, i.e. nodes where the
-resource C<$sid> is allowed to be assigned to, and the second hash is a hash set
-of preferred nodes, i.e. nodes where the resource C<$sid> should be assigned to.
+resource C<$sid> is allowed to be assigned to.
 
-If there are no available nodes at all, returns C<undef>.
+The second item is a list of hash sets of priority classes sorted from highest
+to lowest priority, where each priority class contains the nodes that the
+resource C<$sid> can be assigned to. This list does not contain effectively
+empty priority classes.
+
+If there are no available nodes at all, returns C<({}, [])>.
 
 =cut
 
 sub get_node_affinity {
     my ($node_affinity, $sid, $online_nodes) = @_;
 
-    return ($online_nodes, $online_nodes) if !defined($node_affinity->{$sid});
+    return ($online_nodes, [$online_nodes]) if !defined($node_affinity->{$sid});
 
     my $allowed_nodes = {};
     my $prioritized_nodes = {};
@@ -283,10 +287,10 @@ sub get_node_affinity {
     }
 
     my $preferred_nodes = {};
-    my $highest_priority = (sort { $b <=> $a } keys %$prioritized_nodes)[0];
-    $preferred_nodes = $prioritized_nodes->{$highest_priority} if defined($highest_priority);
+    my @priority_class_numbers = sort { $b <=> $a } keys %$prioritized_nodes;
+    my $priority_classes = [map { $prioritized_nodes->{$_} } @priority_class_numbers];
 
-    return ($allowed_nodes, $preferred_nodes);
+    return ($allowed_nodes, $priority_classes);
 }
 
 1;
-- 
2.47.3

* [PATCH ha-manager 7/7] manager: try multiple priority classes when applying negative resource affinity
  2026-04-22 10:00 [PATCH-SERIES ha-manager 0/7] improve handling of maintenance nodes Daniel Kral
                   ` (5 preceding siblings ...)
  2026-04-22 10:00 ` [PATCH ha-manager 6/7] make get_node_affinity return all priority classes sorted in descending order Daniel Kral
@ 2026-04-22 10:00 ` Daniel Kral
  6 siblings, 0 replies; 8+ messages in thread
From: Daniel Kral @ 2026-04-22 10:00 UTC (permalink / raw)
  To: pve-devel

select_service_node() considers only the nodes from the highest priority
class of a node affinity rule that has at least one available node.
If an HA resource does not have any node affinity rule, the highest
priority class is the set of all online nodes.

get_node_affinity() already removes nodes that are not considered
online, i.e., currently offline nodes or nodes in maintenance mode.

Negative resource affinity rules introduced a new reason why a node can
become unavailable to a specific HA resource: another HA resource that
must not share a node with it is already running there.

Therefore, try the succeeding priority classes from highest to lowest
priority until one of them results in a non-empty node set, or fall back
to an empty node set if there are no priority classes left.
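
A minimal sketch of this retry behavior (with hypothetical node sets and
a plain delete loop standing in for apply_negative_resource_affinity()):

    # two priority classes, highest first; node2 is blocked because an
    # HA resource with a negative resource affinity rule already runs
    # there
    my $priority_classes = [{ node2 => 1 }, { node1 => 1, node3 => 1 }];
    my $separate = { node2 => 1 };

    my $pri_nodes = shift @$priority_classes // {};
    do {
        delete $pri_nodes->{$_} for keys %$separate;
    } while (keys %$pri_nodes < 1 && ($pri_nodes = shift @$priority_classes));
    $pri_nodes = {} if !defined($pri_nodes);

    # the highest class { node2 => 1 } is emptied, so the loop falls
    # through and $pri_nodes ends up as { node1 => 1, node3 => 1 }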

This reduces the number of cases where select_service_node() returns no
node at all and the HA Manager therefore makes no change to the HA
resources' node placement, even though a change is warranted.

This change is also applied when generating migration candidates for the
load balancer, which might allow it to find better balancing migrations
in certain highly constrained scenarios.

As seen in "test-resource-affinity-with-node-affinity-strict-negative3",
this can also lead the HA Manager to make more abrupt decisions in
certain highly constrained scenarios, though the end state is still
valid under the semantics of non-strict node affinity rules. Nonetheless,
handling negative affinity rules in these scenarios should be improved
in the future.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 src/PVE/HA/Manager.pm                         | 21 ++++++++++++++--
 .../README                                    |  5 ++--
 .../log.expect                                | 25 ++++++++++++++-----
 .../README                                    |  7 +++---
 .../log.expect                                | 16 ++++++------
 5 files changed, 54 insertions(+), 20 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 0d7a2f59..5e0439f3 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -199,7 +199,16 @@ sub get_resource_migration_candidates {
         my $target_nodes = shift @$priority_classes // {};
         my ($together, $separate) =
             get_resource_affinity($resource_affinity, $leader_sid, $ss, $online_nodes);
-        apply_negative_resource_affinity($separate, $target_nodes);
+
+        # do not consider nodes that HA resources from a possible negative
+        # resource affinity rule are running on.
+        # as such a negative resource affinity could empty the current priority
+        # class, try the succeeding priority classes until one results in a
+        # non-empty node set, or else end up with an empty set.
+        do {
+            apply_negative_resource_affinity($separate, $target_nodes);
+        } while (keys %$target_nodes < 1 && ($target_nodes = shift @$priority_classes));
+        $target_nodes = {} if !defined($target_nodes);
 
         delete $target_nodes->{$current_leader_node};
 
@@ -354,7 +363,15 @@ sub select_service_node {
         }
     }
 
-    apply_negative_resource_affinity($separate, $pri_nodes);
+    # do not consider nodes that HA resources from a possible negative
+    # resource affinity rule are running on.
+    # as such a negative resource affinity could empty the current priority
+    # class, try the succeeding priority classes until one results in a
+    # non-empty node set, or else end up with an empty set.
+    do {
+        apply_negative_resource_affinity($separate, $pri_nodes);
+    } while (keys %$pri_nodes < 1 && ($pri_nodes = shift @$priority_classes));
+    $pri_nodes = {} if !defined($pri_nodes);
 
     # fallback to the previous maintenance node if it is available again.
     #
diff --git a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/README b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/README
index c6a11cec..e1fc0d04 100644
--- a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/README
+++ b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/README
@@ -4,5 +4,6 @@
 - in a non-strict node affinity rule to node2 and node3 (equal priority), and
 - in a strict negative resource affinity rule with each other.
 
-Tests whether the HA resource on node3 will stay there, even though node3 is
-put in maintenance mode, because it cannot find any replacement node.
+Tests whether the HA resource on node3 will correctly move to a replacement
+node, which is different from the node of the other HA resource (node2), and
+move back to its previous maintenance node as soon as it is available again.
diff --git a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/log.expect b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/log.expect
index 8899f782..1fc25206 100644
--- a/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/log.expect
+++ b/src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/log.expect
@@ -30,12 +30,25 @@ info     25    node3/lrm: service status vm:102 started
 info    120      cmdlist: execute crm node3 enable-node-maintenance
 info    125    node3/lrm: status change active => maintenance
 info    140    node1/crm: node 'node3': state changed from 'online' => 'maintenance'
-warn    140    node1/crm: service 'vm:102': cannot find a replacement node while its current node is in maintenance
-warn    160    node1/crm: service 'vm:102': cannot find a replacement node while its current node is in maintenance
-warn    180    node1/crm: service 'vm:102': cannot find a replacement node while its current node is in maintenance
-warn    200    node1/crm: service 'vm:102': cannot find a replacement node while its current node is in maintenance
+info    140    node1/crm: migrate service 'vm:102' to node 'node1' (running)
+info    140    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    141    node1/lrm: got lock 'ha_agent_node1_lock'
+info    141    node1/lrm: status change wait_for_agent_lock => active
+info    145    node3/lrm: service vm:102 - start migrate to node 'node1'
+info    145    node3/lrm: service vm:102 - end migrate to node 'node1'
+info    160    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
+info    161    node1/lrm: starting service vm:102
+info    161    node1/lrm: service status vm:102 started
 info    220      cmdlist: execute crm node3 disable-node-maintenance
-warn    220    node1/crm: service 'vm:102': cannot find a replacement node while its current node is in maintenance
+info    225    node3/lrm: got lock 'ha_agent_node3_lock'
+info    225    node3/lrm: status change maintenance => active
 info    240    node1/crm: node 'node3': state changed from 'maintenance' => 'online'
-info    240    node1/crm: service 'vm:102': clearing stale maintenance node 'node3' setting (is current node)
+info    240    node1/crm: moving service 'vm:102' back to 'node3', node came back from maintenance.
+info    240    node1/crm: migrate service 'vm:102' to node 'node3' (running)
+info    240    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    241    node1/lrm: service vm:102 - start migrate to node 'node3'
+info    241    node1/lrm: service vm:102 - end migrate to node 'node3'
+info    260    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node3)
+info    265    node3/lrm: starting service vm:102
+info    265    node3/lrm: service status vm:102 started
 info    820     hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-with-node-affinity-strict-negative3/README b/src/test/test-resource-affinity-with-node-affinity-strict-negative3/README
index 062fc665..581b0e9a 100644
--- a/src/test/test-resource-affinity-with-node-affinity-strict-negative3/README
+++ b/src/test/test-resource-affinity-with-node-affinity-strict-negative3/README
@@ -1,7 +1,8 @@
 Test whether a strict negative resource affinity rule among three resources,
-where two resources are restricted each to nodes they are not yet on, can be
-exchanged to the nodes described by their node affinity rules, if one of the
-resources is stopped.
+where all resources are each restricted to nodes they are not yet on, can be
+exchanged to the nodes described by their node affinity rules or fall back to
+another valid configuration within the semantics of non-strict node affinity
+rules, if one of the resources is stopped.
 
 The test scenario is:
 - vm:101, vm:102, and vm:103 should be on node2, node3 or node1 respectively
diff --git a/src/test/test-resource-affinity-with-node-affinity-strict-negative3/log.expect b/src/test/test-resource-affinity-with-node-affinity-strict-negative3/log.expect
index 1ed34c36..66974583 100644
--- a/src/test/test-resource-affinity-with-node-affinity-strict-negative3/log.expect
+++ b/src/test/test-resource-affinity-with-node-affinity-strict-negative3/log.expect
@@ -57,11 +57,13 @@ info    285    node3/lrm: service status vm:102 started
 info    320      cmdlist: execute service vm:101 started
 info    320    node1/crm: service 'vm:101': state changed from 'stopped' to 'request_start'  (node = node1)
 info    320    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
-info    320    node1/crm: migrate service 'vm:101' to node 'node2' (running)
-info    320    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node2)
-info    321    node1/lrm: service vm:101 - start migrate to node 'node2'
-info    321    node1/lrm: service vm:101 - end migrate to node 'node2'
-info    340    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node2)
-info    343    node2/lrm: starting service vm:101
-info    343    node2/lrm: service status vm:101 started
+info    320    node1/crm: migrate service 'vm:103' to node 'node2' (running)
+info    320    node1/crm: service 'vm:103': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info    321    node1/lrm: starting service vm:101
+info    321    node1/lrm: service status vm:101 started
+info    321    node1/lrm: service vm:103 - start migrate to node 'node2'
+info    321    node1/lrm: service vm:103 - end migrate to node 'node2'
+info    340    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node2)
+info    343    node2/lrm: starting service vm:103
+info    343    node2/lrm: service status vm:103 started
 info    920     hardware: exit simulation - done
-- 
2.47.3