* [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules
@ 2025-07-04 18:16 Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH cluster v3 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
` (19 more replies)
0 siblings, 20 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
RFC v1: https://lore.proxmox.com/pve-devel/20250325151254.193177-1-d.kral@proxmox.com/
RFC v2: https://lore.proxmox.com/pve-devel/20250620143148.218469-1-d.kral@proxmox.com/
In this patch series, I've separated the core HA Rules module from the
transformation of HA groups into HA Node Affinity rules (formerly known
as HA Location rules) to reduce the overhead for reviewers and to keep a
cleaner version history, as changing two things at a time is rather
confusing.
The main things that have changed since the last version (v2):
- split up the patch series (of course)
- rebased on the newest available master
- renamed "HA Location Rule" to "HA Node Affinity Rule"
- renamed all references to 'HA service' to 'HA resource' (e.g. the rules
property 'services' is now 'resources')
- converted the tri-state property 'state' to a binary 'disable' flag on HA
rules and exposed the 'contradictory' state with an 'errors' hash
- removed the "use-location-rules" feature flag and implemented a more
straightforward HA groups migration (more on that below)
- removed all references to HA groups from the web interface
As before, HA groups are migrated to HA node affinity rules in each HA
Manager round where something has changed in the HA groups / HA resources
config files, but this is now done unconditionally as soon as an HA
Manager runs with that version. It will also try to persistently migrate
these, but that will only succeed once all other nodes are upgraded
(i.e. every node runs at least the HA Manager version that can
successfully parse and apply the HA rules).
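For illustration, the mapping roughly looks as follows (a hypothetical
example; the rule id and which resources end up in a rule depend on the
actual groups and resources config, and the exact migration output may
differ slightly). A group like

    group: prefer-node1
            nodes node1:2,node2
            restricted 1

that is referenced by, say, 'vm:100' in the resources config becomes a
node affinity rule along the lines of

    node-affinity: prefer-node1
            resources vm:100
            nodes node1:2,node2
            strict 1

while the group's 'nofailback' option is superseded by the new 'failback'
property on the HA resource itself (introduced later in the ha-manager
patches).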
There are still some things left to do, which I didn't get around to
for this revision:
- Testing, testing, testing
- I've run out of time on the persistent HA groups migration part, which
has at least the two TODOs mentioned in the patch itself, and I haven't
tested it on any real PVE upgrade yet; it's more of a draft of how the
migration could work
- Also, the last patch for the persistent HA groups migration part will
fail all tests except the two that have been added, because of the way
the other tests are designed; that should be abstracted away in the HA
environment, e.g. with a routine "have_groups_been_migrated" for PVE2/Sim.
- There might be a few too many in-memory group migrations on the HA
Rules API side now, but better safe than sorry; maybe they can be
removed later. However, these shouldn't overwrite the rules that come
from the config; I haven't verified that yet
- Should the HA Groups API (and the 'group' property in the HA
Resources API) be removed now? Or should these stay, with uses of them
triggering automatic migrations to the HA Rules?
As in the previous revisions, I've run a
git rebase master --exec 'make clean && make deb'
on the series, so the tests should work for every patch.
cluster:
Daniel Kral (1):
cfs: add 'ha/rules.cfg' to observed files
src/PVE/Cluster.pm | 1 +
src/pmxcfs/status.c | 1 +
2 files changed, 2 insertions(+)
base-commit: 60e36c87b0fffe6dbdd5b1be72a9273b6f7cec2b
prerequisite-patch-id: 50b1021d35ecf86562d33dc6068c90e219557ab7
prerequisite-patch-id: 0374f409a039eebe9dd7587d6c018ef71ac2c67d
prerequisite-patch-id: d17849368da2aa61fcab9e08235f8673a2d0258e
ha-manager:
Daniel Kral (15):
tree-wide: make arguments for select_service_node explicit
manager: improve signature of select_service_node
introduce rules base plugin
rules: introduce node affinity rule plugin
config, env, hw: add rules read and parse methods
config: delete services from rules if services are deleted from config
manager: read and update rules config
test: ha tester: add test cases for future node affinity rules
resources: introduce failback property in ha resource config
manager: migrate ha groups to node affinity rules in-memory
manager: apply node affinity rules when selecting service nodes
test: add test cases for rules config
api: introduce ha rules api endpoints
cli: expose ha rules api endpoints to ha-manager cli
manager: persistently migrate ha groups to ha rules
.gitignore | 1 +
debian/pve-ha-manager.install | 3 +
src/PVE/API2/HA/Makefile | 2 +-
src/PVE/API2/HA/Resources.pm | 9 +
src/PVE/API2/HA/Rules.pm | 391 +++++++++++++++
src/PVE/API2/HA/Status.pm | 11 +-
src/PVE/CLI/ha_manager.pm | 32 ++
src/PVE/HA/Config.pm | 58 ++-
src/PVE/HA/Env.pm | 30 ++
src/PVE/HA/Env/PVE2.pm | 40 ++
src/PVE/HA/Groups.pm | 48 ++
src/PVE/HA/Makefile | 3 +-
src/PVE/HA/Manager.pm | 259 ++++++----
src/PVE/HA/Resources.pm | 9 +
src/PVE/HA/Resources/PVECT.pm | 1 +
src/PVE/HA/Resources/PVEVM.pm | 1 +
src/PVE/HA/Rules.pm | 455 ++++++++++++++++++
src/PVE/HA/Rules/Makefile | 6 +
src/PVE/HA/Rules/NodeAffinity.pm | 296 ++++++++++++
src/PVE/HA/Sim/Env.pm | 44 ++
src/PVE/HA/Sim/Hardware.pm | 44 ++
src/PVE/HA/Tools.pm | 46 ++
src/test/Makefile | 4 +-
.../defaults-for-node-affinity-rules.cfg | 22 +
...efaults-for-node-affinity-rules.cfg.expect | 60 +++
...e-resource-refs-in-node-affinity-rules.cfg | 31 ++
...rce-refs-in-node-affinity-rules.cfg.expect | 63 +++
src/test/test-group-migrate1/README | 10 +
src/test/test-group-migrate1/cmdlist | 3 +
src/test/test-group-migrate1/groups | 7 +
src/test/test-group-migrate1/hardware_status | 5 +
src/test/test-group-migrate1/log.expect | 306 ++++++++++++
src/test/test-group-migrate1/manager_status | 1 +
src/test/test-group-migrate1/service_config | 5 +
src/test/test-group-migrate2/README | 10 +
src/test/test-group-migrate2/cmdlist | 3 +
src/test/test-group-migrate2/groups | 7 +
src/test/test-group-migrate2/hardware_status | 5 +
src/test/test-group-migrate2/log.expect | 47 ++
src/test/test-group-migrate2/manager_status | 1 +
src/test/test-group-migrate2/service_config | 5 +
src/test/test-node-affinity-nonstrict1/README | 10 +
.../test-node-affinity-nonstrict1/cmdlist | 4 +
src/test/test-node-affinity-nonstrict1/groups | 2 +
.../hardware_status | 5 +
.../test-node-affinity-nonstrict1/log.expect | 40 ++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-nonstrict2/README | 12 +
.../test-node-affinity-nonstrict2/cmdlist | 4 +
src/test/test-node-affinity-nonstrict2/groups | 3 +
.../hardware_status | 5 +
.../test-node-affinity-nonstrict2/log.expect | 35 ++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-nonstrict3/README | 10 +
.../test-node-affinity-nonstrict3/cmdlist | 4 +
src/test/test-node-affinity-nonstrict3/groups | 2 +
.../hardware_status | 5 +
.../test-node-affinity-nonstrict3/log.expect | 56 +++
.../manager_status | 1 +
.../service_config | 5 +
src/test/test-node-affinity-nonstrict4/README | 14 +
.../test-node-affinity-nonstrict4/cmdlist | 4 +
src/test/test-node-affinity-nonstrict4/groups | 2 +
.../hardware_status | 5 +
.../test-node-affinity-nonstrict4/log.expect | 54 +++
.../manager_status | 1 +
.../service_config | 5 +
src/test/test-node-affinity-nonstrict5/README | 16 +
.../test-node-affinity-nonstrict5/cmdlist | 5 +
src/test/test-node-affinity-nonstrict5/groups | 2 +
.../hardware_status | 5 +
.../test-node-affinity-nonstrict5/log.expect | 66 +++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-nonstrict6/README | 14 +
.../test-node-affinity-nonstrict6/cmdlist | 5 +
src/test/test-node-affinity-nonstrict6/groups | 3 +
.../hardware_status | 5 +
.../test-node-affinity-nonstrict6/log.expect | 52 ++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-strict1/README | 10 +
src/test/test-node-affinity-strict1/cmdlist | 4 +
src/test/test-node-affinity-strict1/groups | 3 +
.../hardware_status | 5 +
.../test-node-affinity-strict1/log.expect | 40 ++
.../test-node-affinity-strict1/manager_status | 1 +
.../test-node-affinity-strict1/service_config | 3 +
src/test/test-node-affinity-strict2/README | 11 +
src/test/test-node-affinity-strict2/cmdlist | 4 +
src/test/test-node-affinity-strict2/groups | 4 +
.../hardware_status | 5 +
.../test-node-affinity-strict2/log.expect | 40 ++
.../test-node-affinity-strict2/manager_status | 1 +
.../test-node-affinity-strict2/service_config | 3 +
src/test/test-node-affinity-strict3/README | 10 +
src/test/test-node-affinity-strict3/cmdlist | 4 +
src/test/test-node-affinity-strict3/groups | 3 +
.../hardware_status | 5 +
.../test-node-affinity-strict3/log.expect | 74 +++
.../test-node-affinity-strict3/manager_status | 1 +
.../test-node-affinity-strict3/service_config | 5 +
src/test/test-node-affinity-strict4/README | 14 +
src/test/test-node-affinity-strict4/cmdlist | 4 +
src/test/test-node-affinity-strict4/groups | 3 +
.../hardware_status | 5 +
.../test-node-affinity-strict4/log.expect | 54 +++
.../test-node-affinity-strict4/manager_status | 1 +
.../test-node-affinity-strict4/service_config | 5 +
src/test/test-node-affinity-strict5/README | 16 +
src/test/test-node-affinity-strict5/cmdlist | 5 +
src/test/test-node-affinity-strict5/groups | 3 +
.../hardware_status | 5 +
.../test-node-affinity-strict5/log.expect | 66 +++
.../test-node-affinity-strict5/manager_status | 1 +
.../test-node-affinity-strict5/service_config | 3 +
src/test/test-node-affinity-strict6/README | 14 +
src/test/test-node-affinity-strict6/cmdlist | 5 +
src/test/test-node-affinity-strict6/groups | 4 +
.../hardware_status | 5 +
.../test-node-affinity-strict6/log.expect | 52 ++
.../test-node-affinity-strict6/manager_status | 1 +
.../test-node-affinity-strict6/service_config | 3 +
src/test/test_failover1.pl | 27 +-
src/test/test_rules_config.pl | 100 ++++
127 files changed, 3398 insertions(+), 95 deletions(-)
create mode 100644 src/PVE/API2/HA/Rules.pm
create mode 100644 src/PVE/HA/Rules.pm
create mode 100644 src/PVE/HA/Rules/Makefile
create mode 100644 src/PVE/HA/Rules/NodeAffinity.pm
create mode 100644 src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg
create mode 100644 src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg.expect
create mode 100644 src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg
create mode 100644 src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg.expect
create mode 100644 src/test/test-group-migrate1/README
create mode 100644 src/test/test-group-migrate1/cmdlist
create mode 100644 src/test/test-group-migrate1/groups
create mode 100644 src/test/test-group-migrate1/hardware_status
create mode 100644 src/test/test-group-migrate1/log.expect
create mode 100644 src/test/test-group-migrate1/manager_status
create mode 100644 src/test/test-group-migrate1/service_config
create mode 100644 src/test/test-group-migrate2/README
create mode 100644 src/test/test-group-migrate2/cmdlist
create mode 100644 src/test/test-group-migrate2/groups
create mode 100644 src/test/test-group-migrate2/hardware_status
create mode 100644 src/test/test-group-migrate2/log.expect
create mode 100644 src/test/test-group-migrate2/manager_status
create mode 100644 src/test/test-group-migrate2/service_config
create mode 100644 src/test/test-node-affinity-nonstrict1/README
create mode 100644 src/test/test-node-affinity-nonstrict1/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict1/groups
create mode 100644 src/test/test-node-affinity-nonstrict1/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict1/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict1/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict1/service_config
create mode 100644 src/test/test-node-affinity-nonstrict2/README
create mode 100644 src/test/test-node-affinity-nonstrict2/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict2/groups
create mode 100644 src/test/test-node-affinity-nonstrict2/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict2/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict2/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict2/service_config
create mode 100644 src/test/test-node-affinity-nonstrict3/README
create mode 100644 src/test/test-node-affinity-nonstrict3/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict3/groups
create mode 100644 src/test/test-node-affinity-nonstrict3/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict3/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict3/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict3/service_config
create mode 100644 src/test/test-node-affinity-nonstrict4/README
create mode 100644 src/test/test-node-affinity-nonstrict4/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict4/groups
create mode 100644 src/test/test-node-affinity-nonstrict4/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict4/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict4/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict4/service_config
create mode 100644 src/test/test-node-affinity-nonstrict5/README
create mode 100644 src/test/test-node-affinity-nonstrict5/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict5/groups
create mode 100644 src/test/test-node-affinity-nonstrict5/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict5/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict5/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict5/service_config
create mode 100644 src/test/test-node-affinity-nonstrict6/README
create mode 100644 src/test/test-node-affinity-nonstrict6/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict6/groups
create mode 100644 src/test/test-node-affinity-nonstrict6/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict6/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict6/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict6/service_config
create mode 100644 src/test/test-node-affinity-strict1/README
create mode 100644 src/test/test-node-affinity-strict1/cmdlist
create mode 100644 src/test/test-node-affinity-strict1/groups
create mode 100644 src/test/test-node-affinity-strict1/hardware_status
create mode 100644 src/test/test-node-affinity-strict1/log.expect
create mode 100644 src/test/test-node-affinity-strict1/manager_status
create mode 100644 src/test/test-node-affinity-strict1/service_config
create mode 100644 src/test/test-node-affinity-strict2/README
create mode 100644 src/test/test-node-affinity-strict2/cmdlist
create mode 100644 src/test/test-node-affinity-strict2/groups
create mode 100644 src/test/test-node-affinity-strict2/hardware_status
create mode 100644 src/test/test-node-affinity-strict2/log.expect
create mode 100644 src/test/test-node-affinity-strict2/manager_status
create mode 100644 src/test/test-node-affinity-strict2/service_config
create mode 100644 src/test/test-node-affinity-strict3/README
create mode 100644 src/test/test-node-affinity-strict3/cmdlist
create mode 100644 src/test/test-node-affinity-strict3/groups
create mode 100644 src/test/test-node-affinity-strict3/hardware_status
create mode 100644 src/test/test-node-affinity-strict3/log.expect
create mode 100644 src/test/test-node-affinity-strict3/manager_status
create mode 100644 src/test/test-node-affinity-strict3/service_config
create mode 100644 src/test/test-node-affinity-strict4/README
create mode 100644 src/test/test-node-affinity-strict4/cmdlist
create mode 100644 src/test/test-node-affinity-strict4/groups
create mode 100644 src/test/test-node-affinity-strict4/hardware_status
create mode 100644 src/test/test-node-affinity-strict4/log.expect
create mode 100644 src/test/test-node-affinity-strict4/manager_status
create mode 100644 src/test/test-node-affinity-strict4/service_config
create mode 100644 src/test/test-node-affinity-strict5/README
create mode 100644 src/test/test-node-affinity-strict5/cmdlist
create mode 100644 src/test/test-node-affinity-strict5/groups
create mode 100644 src/test/test-node-affinity-strict5/hardware_status
create mode 100644 src/test/test-node-affinity-strict5/log.expect
create mode 100644 src/test/test-node-affinity-strict5/manager_status
create mode 100644 src/test/test-node-affinity-strict5/service_config
create mode 100644 src/test/test-node-affinity-strict6/README
create mode 100644 src/test/test-node-affinity-strict6/cmdlist
create mode 100644 src/test/test-node-affinity-strict6/groups
create mode 100644 src/test/test-node-affinity-strict6/hardware_status
create mode 100644 src/test/test-node-affinity-strict6/log.expect
create mode 100644 src/test/test-node-affinity-strict6/manager_status
create mode 100644 src/test/test-node-affinity-strict6/service_config
create mode 100755 src/test/test_rules_config.pl
base-commit: 264dc2c58d145394219f82f25d41f4fc438c4dc4
prerequisite-patch-id: 530b875c25a6bded1cc2294960cf465d5c2bcbca
docs:
Daniel Kral (1):
ha: add documentation about ha rules and ha node affinity rules
Makefile | 2 +
gen-ha-rules-node-affinity-opts.pl | 20 ++++++
gen-ha-rules-opts.pl | 17 +++++
ha-manager.adoc | 103 +++++++++++++++++++++++++++++
ha-rules-node-affinity-opts.adoc | 18 +++++
ha-rules-opts.adoc | 12 ++++
pmxcfs.adoc | 1 +
7 files changed, 173 insertions(+)
create mode 100755 gen-ha-rules-node-affinity-opts.pl
create mode 100755 gen-ha-rules-opts.pl
create mode 100644 ha-rules-node-affinity-opts.adoc
create mode 100644 ha-rules-opts.adoc
base-commit: 7cc17ee5950a53bbd5b5ad81270352ccdb1c541c
prerequisite-patch-id: 92556cd6c1edfb88b397ae244d7dcd56876cd8fb
manager:
Daniel Kral (3):
api: ha: add ha rules api endpoints
ui: ha: remove ha groups from ha resource components
ui: ha: show failback flag in resources status view
PVE/API2/HAConfig.pm | 8 +++++++-
www/manager6/ha/ResourceEdit.js | 16 ++++++++++++----
www/manager6/ha/Resources.js | 17 +++--------------
www/manager6/ha/StatusView.js | 5 ++++-
4 files changed, 26 insertions(+), 20 deletions(-)
base-commit: c0cbe76ee90e7110934c50414bc22371cf13c01a
prerequisite-patch-id: ec6a39936719cfe38787fccb1a80af6378980723
Summary over all repositories:
140 files changed, 3599 insertions(+), 115 deletions(-)
--
Generated by git-murpp 0.8.0
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* [pve-devel] [PATCH cluster v3 1/1] cfs: add 'ha/rules.cfg' to observed files
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 01/15] tree-wide: make arguments for select_service_node explicit Daniel Kral
` (18 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/Cluster.pm | 1 +
src/pmxcfs/status.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
index 3b1de57..9ec4f66 100644
--- a/src/PVE/Cluster.pm
+++ b/src/PVE/Cluster.pm
@@ -69,6 +69,7 @@ my $observed = {
'ha/crm_commands' => 1,
'ha/manager_status' => 1,
'ha/resources.cfg' => 1,
+ 'ha/rules.cfg' => 1,
'ha/groups.cfg' => 1,
'ha/fence.cfg' => 1,
'status.cfg' => 1,
diff --git a/src/pmxcfs/status.c b/src/pmxcfs/status.c
index 0895e53..38316b4 100644
--- a/src/pmxcfs/status.c
+++ b/src/pmxcfs/status.c
@@ -97,6 +97,7 @@ static memdb_change_t memdb_change_array[] = {
{.path = "ha/crm_commands"},
{.path = "ha/manager_status"},
{.path = "ha/resources.cfg"},
+ {.path = "ha/rules.cfg"},
{.path = "ha/groups.cfg"},
{.path = "ha/fence.cfg"},
{.path = "status.cfg"},
--
2.39.5
* [pve-devel] [PATCH ha-manager v3 01/15] tree-wide: make arguments for select_service_node explicit
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH cluster v3 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 02/15] manager: improve signature of select_service_node Daniel Kral
` (17 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Explicitly state all the parameters at all call sites of
select_service_node(...) to clarify which values are passed in each state.
The call site in next_state_recovery(...) sets $best_scored to 1, as it
should find the next best node when recovering from the failed node
$current_node. All references to $best_scored in select_service_node()
are there to check whether $current_node can be selected, but since
$current_node is not available anyway, this change should not alter the
result of select_service_node(...).
Otherwise, $sd->{failed_nodes} and $sd->{maintenance_node} should
contain only the failed $current_node in next_state_recovery(...), so
both can be passed as-is, as these should be impossible states here
anyway. A cleaner way would be to explicitly remove them beforehand or
to add extra checks in select_service_node(...).
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Manager.pm | 11 ++++++++++-
src/test/test_failover1.pl | 15 ++++++++++++++-
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 12292e6..85f2b1a 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -971,6 +971,7 @@ sub next_state_started {
$try_next,
$sd->{failed_nodes},
$sd->{maintenance_node},
+ 0, # best_score
);
if ($node && ($sd->{node} ne $node)) {
@@ -1083,7 +1084,15 @@ sub next_state_recovery {
$self->recompute_online_node_usage(); # we want the most current node state
my $recovery_node = select_service_node(
- $self->{groups}, $self->{online_node_usage}, $sid, $cd, $sd->{node},
+ $self->{groups},
+ $self->{online_node_usage},
+ $sid,
+ $cd,
+ $sd->{node},
+ 0, # try_next
+ $sd->{failed_nodes},
+ $sd->{maintenance_node},
+ 1, # best_score
);
if ($recovery_node) {
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 371bdcf..2478b2b 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -24,13 +24,26 @@ my $service_conf = {
group => 'prefer_node1',
};
+my $sd = {
+ failed_nodes => undef,
+ maintenance_node => undef,
+};
+
my $current_node = $service_conf->{node};
sub test {
my ($expected_node, $try_next) = @_;
my $node = PVE::HA::Manager::select_service_node(
- $groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next,
+ $groups,
+ $online_node_usage,
+ "vm:111",
+ $service_conf,
+ $current_node,
+ $try_next,
+ $sd->{failed_nodes},
+ $sd->{maintenance_node},
+ 0, # best_score
);
my (undef, undef, $line) = caller();
--
2.39.5
* [pve-devel] [PATCH ha-manager v3 02/15] manager: improve signature of select_service_node
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH cluster v3 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 01/15] tree-wide: make arguments for select_service_node explicit Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 03/15] introduce rules base plugin Daniel Kral
` (16 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
As the signature of select_service_node(...) has become rather long
already, make it more compact by retrieving service- and
affinity-related data directly from the service state in $sd, and by
introducing a $node_preference parameter to distinguish the behaviors
of $try_next and $best_scored, which were already mutually exclusive
before.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Manager.pm | 79 +++++++++++++++++++++-----------------
src/test/test_failover1.pl | 17 +++-----
2 files changed, 49 insertions(+), 47 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 85f2b1a..c57a280 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -149,18 +149,37 @@ sub get_node_priority_groups {
return ($pri_groups, $group_members);
}
+=head3 select_service_node(...)
+
+=head3 select_service_node($groups, $online_node_usage, $sid, $service_conf, $sd, $node_preference)
+
+Used to select the best fitting node for the service C<$sid>, with the
+configuration C<$service_conf> and state C<$sd>, according to the groups defined
+in C<$groups>, available node utilization in C<$online_node_usage>, and the
+given C<$node_preference>.
+
+The C<$node_preference> can be set to:
+
+=over
+
+=item C<'none'>: Try to stay on the current node as much as possible.
+
+=item C<'best-score'>: Try to select the best-scored node.
+
+=item C<'try-next'>: Try to select the best-scored node, which is not in C<< $sd->{failed_nodes} >>.
+
+=back
+
+=cut
+
sub select_service_node {
- my (
- $groups,
- $online_node_usage,
- $sid,
- $service_conf,
- $current_node,
- $try_next,
- $tried_nodes,
- $maintenance_fallback,
- $best_scored,
- ) = @_;
+ my ($groups, $online_node_usage, $sid, $service_conf, $sd, $node_preference) = @_;
+
+ die "'$node_preference' is not a valid node_preference for select_service_node\n"
+ if $node_preference !~ m/(none|best-score|try-next)/;
+
+ my ($current_node, $tried_nodes, $maintenance_fallback) =
+ $sd->@{qw(node failed_nodes maintenance_node)};
my $group = get_service_group($groups, $online_node_usage, $service_conf);
@@ -171,7 +190,7 @@ sub select_service_node {
# stay on current node if possible (avoids random migrations)
if (
- (!$try_next && !$best_scored)
+ $node_preference eq 'none'
&& $group->{nofailback}
&& defined($group_members->{$current_node})
) {
@@ -183,7 +202,7 @@ sub select_service_node {
my $top_pri = $pri_list[0];
# try to avoid nodes where the service failed already if we want to relocate
- if ($try_next) {
+ if ($node_preference eq 'try-next') {
foreach my $node (@$tried_nodes) {
delete $pri_groups->{$top_pri}->{$node};
}
@@ -192,8 +211,7 @@ sub select_service_node {
return $maintenance_fallback
if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
- return $current_node
- if (!$try_next && !$best_scored) && $pri_groups->{$top_pri}->{$current_node};
+ return $current_node if $node_preference eq 'none' && $pri_groups->{$top_pri}->{$current_node};
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
@@ -208,8 +226,8 @@ sub select_service_node {
}
}
- if ($try_next) {
- if (!$best_scored && defined($found) && ($found < (scalar(@nodes) - 1))) {
+ if ($node_preference eq 'try-next') {
+ if (defined($found) && ($found < (scalar(@nodes) - 1))) {
return $nodes[$found + 1];
} else {
return $nodes[0];
@@ -797,11 +815,8 @@ sub next_state_request_start {
$self->{online_node_usage},
$sid,
$cd,
- $sd->{node},
- 0, # try_next
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 1, # best_score
+ $sd,
+ 'best-score',
);
my $select_text = $selected_node ne $current_node ? 'new' : 'current';
$haenv->log(
@@ -901,7 +916,7 @@ sub next_state_started {
} else {
- my $try_next = 0;
+ my $select_node_preference = 'none';
if ($lrm_res) {
@@ -932,7 +947,7 @@ sub next_state_started {
if (scalar(@{ $sd->{failed_nodes} }) <= $cd->{max_relocate}) {
# tell select_service_node to relocate if possible
- $try_next = 1;
+ $select_node_preference = 'try-next';
$haenv->log(
'warning',
@@ -967,11 +982,8 @@ sub next_state_started {
$self->{online_node_usage},
$sid,
$cd,
- $sd->{node},
- $try_next,
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 0, # best_score
+ $sd,
+ $select_node_preference,
);
if ($node && ($sd->{node} ne $node)) {
@@ -1009,7 +1021,7 @@ sub next_state_started {
);
}
} else {
- if ($try_next && !defined($node)) {
+ if ($select_node_preference eq 'try-next' && !defined($node)) {
$haenv->log(
'warning',
"Start Error Recovery: Tried all available nodes for service '$sid', retry"
@@ -1088,11 +1100,8 @@ sub next_state_recovery {
$self->{online_node_usage},
$sid,
$cd,
- $sd->{node},
- 0, # try_next
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 1, # best_score
+ $sd,
+ 'best-score',
);
if ($recovery_node) {
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 2478b2b..29b56c6 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -25,32 +25,25 @@ my $service_conf = {
};
my $sd = {
+ node => $service_conf->{node},
failed_nodes => undef,
maintenance_node => undef,
};
-my $current_node = $service_conf->{node};
-
sub test {
my ($expected_node, $try_next) = @_;
+ my $select_node_preference = $try_next ? 'try-next' : 'none';
+
my $node = PVE::HA::Manager::select_service_node(
- $groups,
- $online_node_usage,
- "vm:111",
- $service_conf,
- $current_node,
- $try_next,
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 0, # best_score
+ $groups, $online_node_usage, "vm:111", $service_conf, $sd, $select_node_preference,
);
my (undef, undef, $line) = caller();
die "unexpected result: $node != ${expected_node} at line $line\n"
if $node ne $expected_node;
- $current_node = $node;
+ $sd->{node} = $node;
}
test('node1');
--
2.39.5
* [pve-devel] [PATCH ha-manager v3 03/15] introduce rules base plugin
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (2 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 02/15] manager: improve signature of select_service_node Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 04/15] rules: introduce node affinity rule plugin Daniel Kral
` (15 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Add a rules base plugin to allow users to specify different kinds of HA
rules in a single configuration file, which put constraints on the HA
Manager's behavior.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
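For reviewers who want a quick picture of the interface: a minimal,
entirely hypothetical rule plugin on top of this base could look roughly
like the sketch below. Only the type()/properties()/options() structure
follows the SectionConfig-style interface introduced here; the plugin
name and its 'foo' property are made up for illustration.

    package PVE::HA::Rules::Example;

    use strict;
    use warnings;

    use base qw(PVE::HA::Rules);

    sub type {
        return 'example';
    }

    sub properties {
        return {
            # made-up plugin-specific property, for illustration only
            foo => {
                description => "Example property.",
                type => 'string',
                optional => 1,
            },
        };
    }

    sub options {
        return {
            foo => { optional => 1 },
            disable => { optional => 1 },
            comment => { optional => 1 },
        };
    }

    1;

Such a plugin would then be registered and initialized by the
environment, analogous to how the node affinity plugin is wired up in a
later patch, i.e. PVE::HA::Rules::Example->register() before
PVE::HA::Rules->init().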
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Makefile | 2 +-
src/PVE/HA/Rules.pm | 430 ++++++++++++++++++++++++++++++++++
src/PVE/HA/Tools.pm | 22 ++
4 files changed, 454 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Rules.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 0ffbd8d..9bbd375 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -32,6 +32,7 @@
/usr/share/perl5/PVE/HA/Resources.pm
/usr/share/perl5/PVE/HA/Resources/PVECT.pm
/usr/share/perl5/PVE/HA/Resources/PVEVM.pm
+/usr/share/perl5/PVE/HA/Rules.pm
/usr/share/perl5/PVE/HA/Tools.pm
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index 8c91b97..489cbc0 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -1,4 +1,4 @@
-SIM_SOURCES=CRM.pm Env.pm Groups.pm Resources.pm LRM.pm Manager.pm \
+SIM_SOURCES=CRM.pm Env.pm Groups.pm Rules.pm Resources.pm LRM.pm Manager.pm \
NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm Usage.pm
SOURCES=${SIM_SOURCES} Config.pm
diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
new file mode 100644
index 0000000..d786669
--- /dev/null
+++ b/src/PVE/HA/Rules.pm
@@ -0,0 +1,430 @@
+package PVE::HA::Rules;
+
+use strict;
+use warnings;
+
+use PVE::JSONSchema qw(get_standard_option);
+use PVE::Tools;
+
+use PVE::HA::Tools;
+
+use base qw(PVE::SectionConfig);
+
+=head1 NAME
+
+PVE::HA::Rules - Base Plugin for HA Rules
+
+=head1 SYNOPSIS
+
+ use base qw(PVE::HA::Rules);
+
+=head1 DESCRIPTION
+
+This package provides the capability to have different types of rules in the
+same config file, which put constraints or other rules on how the HA Manager
+handles HA resources.
+
+Since rules can interfere with each other, i.e., rules can make other rules
+invalid or infeasible, this package also provides the capability to check for
+the feasibility between rules of the same type and between rules of
+different types, and prune the rule set in such a way that it becomes feasible
+again, while minimizing the number of rules that need to be pruned.
+
+This package inherits its config-related methods from C<L<PVE::SectionConfig>>
+and therefore rule plugins need to implement methods from there as well.
+
+=head1 USAGE
+
+Each I<rule plugin> is required to implement the methods C<L<type()>>,
+C<L<properties()>>, and C<L<options>> from the C<L<PVE::SectionConfig>> to
+extend the properties of this I<base plugin> with plugin-specific properties.
+
+=head2 REGISTERING CHECKS
+
+In order to C<L<< register checks|/$class->register_check(...) >>> for a rule
+plugin, the plugin can override the
+C<L<< get_plugin_check_arguments()|/$class->get_plugin_check_arguments(...) >>>
+method, which allows the plugin's checkers to pass plugin-specific data, usually
+subsets of specific rules, which are relevant to the checks.
+
+The following example shows a plugin's implementation of its
+C<L<< get_plugin_check_arguments()|/$class->get_plugin_check_arguments(...) >>>
+and a trivial check, which renders all rules defining a comment erroneous
+and blames these errors on the I<comment> property:
+
+ sub get_plugin_check_arguments {
+ my ($class, $rules) = @_;
+
+ my @ruleids = sort {
+ $rules->{order}->{$a} <=> $rules->{order}->{$b}
+ } keys %{$rules->{ids}};
+
+ my $result = {
+ custom_rules => {},
+ };
+
+ for my $ruleid (@ruleids) {
+ my $rule = $rules->{ids}->{$ruleid};
+
+ $result->{custom_rules}->{$ruleid} = $rule if defined($rule->{comment});
+ }
+
+ return $result;
+ }
+
+ __PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return [ sort keys $args->{custom_rules}->%* ];
+ },
+ sub {
+ my ($ruleids, $errors) = @_;
+
+ for my $ruleid (@$ruleids) {
+ push @{$errors->{$ruleid}->{comment}},
+ "rule is ineffective, because I said so.";
+ }
+ }
+ );
+
+=head1 METHODS
+
+=cut
+
+my $defaultData = {
+ propertyList => {
+ type => {
+ description => "HA rule type.",
+ },
+ rule => get_standard_option(
+ 'pve-ha-rule-id',
+ {
+ completion => \&PVE::HA::Tools::complete_rule,
+ optional => 0,
+ },
+ ),
+ disable => {
+ description => 'Whether the HA rule is disabled.',
+ type => 'boolean',
+ optional => 1,
+ },
+ comment => {
+ description => "HA rule description.",
+ type => 'string',
+ maxLength => 4096,
+ optional => 1,
+ },
+ },
+};
+
+sub private {
+ return $defaultData;
+}
+
+=head3 $class->decode_plugin_value(...)
+
+=head3 $class->decode_plugin_value($type, $key, $value)
+
+B<OPTIONAL:> Can be implemented in a I<rule plugin>.
+
+Called during base plugin's C<decode_value(...)> in order to extend the
+deserialization for plugin-specific values which need it (e.g. lists).
+
+If it is not overridden by the I<rule plugin>, then it does nothing to
+C<$value> by default.
+
+=cut
+
+sub decode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ return $value;
+}
+
+sub decode_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'comment') {
+ return PVE::Tools::decode_text($value);
+ }
+
+ my $plugin = $class->lookup($type);
+ return $plugin->decode_plugin_value($type, $key, $value);
+}
+
+=head3 $class->encode_plugin_value(...)
+
+=head3 $class->encode_plugin_value($type, $key, $value)
+
+B<OPTIONAL:> Can be implemented in a I<rule plugin>.
+
+Called during base plugin's C<encode_value(...)> in order to extend the
+serialization for plugin-specific values which need it (e.g. lists).
+
+If it is not overridden by the I<rule plugin>, then it does nothing to
+C<$value> by default.
+
+=cut
+
+sub encode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ return $value;
+}
+
+sub encode_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'comment') {
+ return PVE::Tools::encode_text($value);
+ }
+
+ my $plugin = $class->lookup($type);
+ return $plugin->encode_plugin_value($type, $key, $value);
+}
+
+sub parse_section_header {
+ my ($class, $line) = @_;
+
+ if ($line =~ m/^(\S+):\s*(\S+)\s*$/) {
+ my ($type, $ruleid) = (lc($1), $2);
+ my $errmsg = undef; # set if you want to skip whole section
+ eval { PVE::JSONSchema::pve_verify_configid($ruleid); };
+ $errmsg = $@ if $@;
+ my $config = {}; # to return additional attributes
+ return ($type, $ruleid, $errmsg, $config);
+ }
+ return undef;
+}
+
+# General rule helpers
+
+=head3 $class->set_rule_defaults($rule)
+
+Sets the optional properties in the C<$rule>, which have default values, but
+haven't been explicitly set yet.
+
+=cut
+
+sub set_rule_defaults : prototype($$) {
+ my ($class, $rule) = @_;
+
+ if (my $plugin = $class->lookup($rule->{type})) {
+ my $properties = $plugin->properties();
+
+ for my $prop (keys %$properties) {
+ next if defined($rule->{$prop});
+ next if !$properties->{$prop}->{default};
+ next if !$properties->{$prop}->{optional};
+
+ $rule->{$prop} = $properties->{$prop}->{default};
+ }
+ }
+}
+
+# Rule checks definition and methods
+
+my $types = [];
+my $checkdef;
+
+sub register {
+ my ($class) = @_;
+
+ $class->SUPER::register($class);
+
+ # store order in which plugin types are registered
+ push @$types, $class->type();
+}
+
+=head3 $class->register_check(...)
+
+=head3 $class->register_check($check_func, $collect_errors_func)
+
+Used to register rule checks for a rule plugin.
+
+=cut
+
+sub register_check : prototype($$$) {
+ my ($class, $check_func, $collect_errors_func) = @_;
+
+ my $type = eval { $class->type() };
+ $type = 'global' if $@; # check registered here in the base plugin
+
+ push @{ $checkdef->{$type} }, [
+ $check_func, $collect_errors_func,
+ ];
+}
+
+=head3 $class->get_plugin_check_arguments(...)
+
+=head3 $class->get_plugin_check_arguments($rules)
+
+B<OPTIONAL:> Can be implemented in the I<rule plugin>.
+
+Returns a hash, usually subsets of rules relevant to the plugin, which are
+passed to the plugin's C<L<< registered checks|/$class->register_check(...) >>>
+so that the creation of these can be shared inbetween rule check
+implementations.
+
+=cut
+
+sub get_plugin_check_arguments : prototype($$) {
+ my ($class, $rules) = @_;
+
+ return {};
+}
+
+=head3 $class->get_check_arguments(...)
+
+=head3 $class->get_check_arguments($rules)
+
+Returns the union of the plugin's check argument hashes, which are passed to the
+plugin's C<L<< registered checks|/$class->register_check(...) >>> so that the
+creation of these can be shared between rule check implementations.
+
+=cut
+
+sub get_check_arguments : prototype($$) {
+ my ($class, $rules) = @_;
+
+ my $global_args = {};
+
+ for my $type (@$types) {
+ my $plugin = $class->lookup($type);
+ my $plugin_args = eval { $plugin->get_plugin_check_arguments($rules) };
+ next if $@; # plugin doesn't implement get_plugin_check_arguments(...)
+
+ $global_args = { $global_args->%*, $plugin_args->%* };
+ }
+
+ return $global_args;
+}
+
+=head3 $class->check_feasibility($rules)
+
+Checks whether the given C<$rules> are feasible by running all checks, which
+were registered with C<L<< register_check()|/$class->register_check(...) >>>,
+and returns a hash map of erroneous rules.
+
+The checks are run in the order in which the rule plugins were registered,
+while global checks, i.e. checks between different rule types, are run at the
+very last.
+
+=cut
+
+sub check_feasibility : prototype($$) {
+ my ($class, $rules) = @_;
+
+ my $global_errors = {};
+ my $removable_ruleids = [];
+
+ my $global_args = $class->get_check_arguments($rules);
+
+ for my $type (@$types, 'global') {
+ for my $entry (@{ $checkdef->{$type} }) {
+ my ($check, $collect_errors) = @$entry;
+
+ my $errors = $check->($global_args);
+ $collect_errors->($errors, $global_errors);
+ }
+ }
+
+ return $global_errors;
+}
+
+=head3 $class->canonicalize($rules)
+
+Modifies C<$rules> to contain only feasible rules.
+
+This is done by running all checks, which were registered with
+C<L<< register_check()|/$class->register_check(...) >>> and removing any
+rule, which makes the rule set infeasible.
+
+Returns a list of messages with the reasons why rules were removed.
+
+=cut
+
+sub canonicalize : prototype($$) {
+ my ($class, $rules) = @_;
+
+ my $messages = [];
+ my $global_errors = $class->check_feasibility($rules);
+
+ for my $ruleid (keys %$global_errors) {
+ delete $rules->{ids}->{$ruleid};
+ delete $rules->{order}->{$ruleid};
+ }
+
+ for my $ruleid (sort keys %$global_errors) {
+ for my $opt (sort keys %{ $global_errors->{$ruleid} }) {
+ for my $message (@{ $global_errors->{$ruleid}->{$opt} }) {
+ push @$messages, "Drop rule '$ruleid', because $message.\n";
+ }
+ }
+ }
+
+ return $messages;
+}
+
+=head1 FUNCTIONS
+
+=cut
+
+=head3 foreach_rule(...)
+
+=head3 foreach_rule($rules, $func [, $opts])
+
+Filters the given C<$rules> according to the C<$opts> and loops over the
+resulting rules in the order as defined in the section config and executes
+C<$func> with the parameters C<L<< ($rule, $ruleid) >>>.
+
+The filter properties for C<$opts> are:
+
+=over
+
+=item C<$type>: Limits C<$rules> to those which are of rule type C<$type>.
+
+=item C<$exclude_disabled_rules>: Limits C<$rules> to those which are enabled.
+
+=back
+
+=cut
+
+sub foreach_rule : prototype($$;$) {
+ my ($rules, $func, $opts) = @_;
+
+ my $type = $opts->{type};
+ my $exclude_disabled_rules = $opts->{exclude_disabled_rules};
+
+ my @ruleids = sort {
+ $rules->{order}->{$a} <=> $rules->{order}->{$b}
+ } keys %{ $rules->{ids} };
+
+ for my $ruleid (@ruleids) {
+ my $rule = $rules->{ids}->{$ruleid};
+
+ next if !$rule; # skip invalid rules
+ next if defined($type) && $rule->{type} ne $type;
+ next if $exclude_disabled_rules && exists($rule->{disable});
+
+ $func->($rule, $ruleid);
+ }
+}
+
+=head3 get_next_ordinal($rules)
+
+Returns the next available ordinal number in the C<$rules> order hash that can
+be used by a newly introduced rule afterwards.
+
+=cut
+
+sub get_next_ordinal : prototype($) {
+ my ($rules) = @_;
+
+    my $current_order = (sort { $b <=> $a } values %{ $rules->{order} })[0] || 0;
+
+ return $current_order + 1;
+}
+
+1;
diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
index a01ac38..767659f 100644
--- a/src/PVE/HA/Tools.pm
+++ b/src/PVE/HA/Tools.pm
@@ -112,6 +112,15 @@ PVE::JSONSchema::register_standard_option(
},
);
+PVE::JSONSchema::register_standard_option(
+ 'pve-ha-rule-id',
+ {
+ description => "HA rule identifier.",
+ type => 'string',
+ format => 'pve-configid',
+ },
+);
+
sub read_json_from_file {
my ($filename, $default) = @_;
@@ -292,4 +301,17 @@ sub complete_group {
return $res;
}
+sub complete_rule {
+ my ($cmd, $pname, $cur) = @_;
+
+ my $cfg = PVE::HA::Config::read_rules_config();
+
+ my $res = [];
+ foreach my $rule (keys %{ $cfg->{ids} }) {
+ push @$res, $rule;
+ }
+
+ return $res;
+}
+
1;
--
2.39.5
* [pve-devel] [PATCH ha-manager v3 04/15] rules: introduce node affinity rule plugin
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (3 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 03/15] introduce rules base plugin Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 05/15] config, env, hw: add rules read and parse methods Daniel Kral
` (14 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Introduce the node affinity rule plugin to allow users to specify node
affinity constraints for independent HA resources.
Node affinity rules must specify one or more HA resources, one or more
nodes with optional priorities (the default is 0), and a strictness,
which is either
* 0 (non-strict): HA resources SHOULD be on one of the rules' nodes, or
* 1 (strict): HA resources MUST be on one of the rules' nodes.
The initial implementation restricts each HA resource to be referenced
by at most one node affinity rule across all node affinity rules;
otherwise, the conflicting node affinity rules will not be applied.
This makes node affinity rules structurally equivalent to HA groups with
the exception of the "failback" option, which will be moved to the HA
resource config in an upcoming patch.
The HA resources property is added to the rules base plugin as it is
also planned to be used by other rule plugins, e.g., the resource
affinity rule plugin.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
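For reference, a node affinity rule in ha/rules.cfg as handled by this
plugin would look roughly like the following (the rule id and the
resource/node names are made up):

    node-affinity: keep-on-node1
            resources vm:100,ct:101
            nodes node1:2,node2
            strict 0

i.e. vm:100 and ct:101 should preferably run on node1 (priority 2), then
node2 (default priority 0); since the rule is non-strict, they may still
run on other nodes if neither is available, and per the restriction
above, neither resource may appear in any other node affinity rule.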
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Makefile | 1 +
src/PVE/HA/Rules.pm | 29 ++++-
src/PVE/HA/Rules/Makefile | 6 +
src/PVE/HA/Rules/NodeAffinity.pm | 213 +++++++++++++++++++++++++++++++
src/PVE/HA/Tools.pm | 24 ++++
6 files changed, 272 insertions(+), 2 deletions(-)
create mode 100644 src/PVE/HA/Rules/Makefile
create mode 100644 src/PVE/HA/Rules/NodeAffinity.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 9bbd375..7462663 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -33,6 +33,7 @@
/usr/share/perl5/PVE/HA/Resources/PVECT.pm
/usr/share/perl5/PVE/HA/Resources/PVEVM.pm
/usr/share/perl5/PVE/HA/Rules.pm
+/usr/share/perl5/PVE/HA/Rules/NodeAffinity.pm
/usr/share/perl5/PVE/HA/Tools.pm
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index 489cbc0..e386cbf 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -8,6 +8,7 @@ install:
install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA
for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/$$i; done
make -C Resources install
+ make -C Rules install
make -C Usage install
make -C Env install
diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
index d786669..bda0b5d 100644
--- a/src/PVE/HA/Rules.pm
+++ b/src/PVE/HA/Rules.pm
@@ -109,6 +109,13 @@ my $defaultData = {
type => 'boolean',
optional => 1,
},
+ resources => get_standard_option(
+ 'pve-ha-resource-id-list',
+ {
+ completion => \&PVE::HA::Tools::complete_sid,
+ optional => 0,
+ },
+ ),
comment => {
description => "HA rule description.",
type => 'string',
@@ -145,7 +152,17 @@ sub decode_plugin_value {
sub decode_value {
my ($class, $type, $key, $value) = @_;
- if ($key eq 'comment') {
+ if ($key eq 'resources') {
+ my $res = {};
+
+ for my $sid (PVE::Tools::split_list($value)) {
+ if (PVE::HA::Tools::pve_verify_ha_resource_id($sid)) {
+ $res->{$sid} = 1;
+ }
+ }
+
+ return $res;
+ } elsif ($key eq 'comment') {
return PVE::Tools::decode_text($value);
}
@@ -176,7 +193,11 @@ sub encode_plugin_value {
sub encode_value {
my ($class, $type, $key, $value) = @_;
- if ($key eq 'comment') {
+ if ($key eq 'resources') {
+ PVE::HA::Tools::pve_verify_ha_resource_id($_) for keys %$value;
+
+ return join(',', sort keys %$value);
+ } elsif ($key eq 'comment') {
return PVE::Tools::encode_text($value);
}
@@ -383,6 +404,8 @@ The filter properties for C<$opts> are:
=over
+=item C<$sid>: Limits C<$rules> to those which contain the given resource C<$sid>.
+
=item C<$type>: Limits C<$rules> to those which are of rule type C<$type>.
=item C<$exclude_disabled_rules>: Limits C<$rules> to those which are enabled.
@@ -394,6 +417,7 @@ The filter properties for C<$opts> are:
sub foreach_rule : prototype($$;$) {
my ($rules, $func, $opts) = @_;
+ my $sid = $opts->{sid};
my $type = $opts->{type};
my $exclude_disabled_rules = $opts->{exclude_disabled_rules};
@@ -405,6 +429,7 @@ sub foreach_rule : prototype($$;$) {
my $rule = $rules->{ids}->{$ruleid};
next if !$rule; # skip invalid rules
+ next if defined($sid) && !defined($rule->{resources}->{$sid});
next if defined($type) && $rule->{type} ne $type;
next if $exclude_disabled_rules && exists($rule->{disable});
diff --git a/src/PVE/HA/Rules/Makefile b/src/PVE/HA/Rules/Makefile
new file mode 100644
index 0000000..dfef257
--- /dev/null
+++ b/src/PVE/HA/Rules/Makefile
@@ -0,0 +1,6 @@
+SOURCES=NodeAffinity.pm
+
+.PHONY: install
+install:
+ install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA/Rules
+ for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/Rules/$$i; done
diff --git a/src/PVE/HA/Rules/NodeAffinity.pm b/src/PVE/HA/Rules/NodeAffinity.pm
new file mode 100644
index 0000000..2b3d739
--- /dev/null
+++ b/src/PVE/HA/Rules/NodeAffinity.pm
@@ -0,0 +1,213 @@
+package PVE::HA::Rules::NodeAffinity;
+
+use strict;
+use warnings;
+
+use Storable qw(dclone);
+
+use PVE::Cluster;
+use PVE::JSONSchema qw(get_standard_option);
+use PVE::Tools;
+
+use PVE::HA::Rules;
+use PVE::HA::Tools;
+
+use base qw(PVE::HA::Rules);
+
+=head1 NAME
+
+PVE::HA::Rules::NodeAffinity
+
+=head1 DESCRIPTION
+
+This package provides the capability to specify and apply rules, which put
+affinity constraints between a set of HA resources and a set of nodes.
+
+HA Node Affinity rules can be either C<'non-strict'> or C<'strict'>:
+
+=over
+
+=item C<'non-strict'>
+
+Non-strict node affinity rules SHOULD be applied if possible.
+
+That is, HA resources SHOULD prefer to be on the defined nodes, but may fall
+back to other nodes, if none of the defined nodes are available.
+
+=item C<'strict'>
+
+Strict node affinity rules MUST be applied.
+
+That is, HA resources MUST be on the defined nodes. In other words,
+these HA resources are restricted to the defined nodes and may not run on any
+other node.
+
+=back
+
+=cut
+
+sub type {
+ return 'node-affinity';
+}
+
+sub properties {
+ return {
+ nodes => get_standard_option(
+ 'pve-ha-group-node-list',
+ {
+ completion => \&PVE::Cluster::get_nodelist,
+ optional => 0,
+ },
+ ),
+ strict => {
+ description => "Describes whether the node affinity rule is strict or non-strict.",
+ verbose_description => <<EODESC,
+Describes whether the node affinity rule is strict or non-strict.
+
+A non-strict node affinity rule makes resources prefer to be on the defined nodes.
+If none of the defined nodes are available, the resource may run on any other node.
+
+A strict node affinity rule makes resources be restricted to the defined nodes. If
+none of the defined nodes are available, the resource will be stopped.
+EODESC
+ type => 'boolean',
+ optional => 1,
+ default => 0,
+ },
+ };
+}
+
+sub options {
+ return {
+ resources => { optional => 0 },
+ nodes => { optional => 0 },
+ strict => { optional => 1 },
+ disable => { optional => 1 },
+ comment => { optional => 1 },
+ };
+}
+
+sub decode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'nodes') {
+ my $res = {};
+
+ for my $node (PVE::Tools::split_list($value)) {
+ if (my ($node, $priority) = PVE::HA::Tools::parse_node_priority($node, 1)) {
+ $res->{$node} = {
+ priority => $priority,
+ };
+ }
+ }
+
+ return $res;
+ }
+
+ return $value;
+}
+
+sub encode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'nodes') {
+ my $res = [];
+
+ for my $node (sort keys %$value) {
+ my $priority = $value->{$node}->{priority};
+
+ if ($priority) {
+ push @$res, "$node:$priority";
+ } else {
+ push @$res, "$node";
+ }
+ }
+
+ return join(',', @$res);
+ }
+
+ return $value;
+}
+
+sub get_plugin_check_arguments {
+ my ($self, $rules) = @_;
+
+ my $result = {
+ node_affinity_rules => {},
+ };
+
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule, $ruleid) = @_;
+
+ $result->{node_affinity_rules}->{$ruleid} = $rule;
+ },
+ {
+ type => 'node-affinity',
+ exclude_disabled_rules => 1,
+ },
+ );
+
+ return $result;
+}
+
+=head1 NODE AFFINITY RULE CHECKERS
+
+=cut
+
+=head3 check_single_resource_reference($node_affinity_rules)
+
+Returns all conflicts in C<$node_affinity_rules> as a list of pairs, each
+consisting of a node affinity rule id and a resource id, for every resource
+that is referenced by more than one node affinity rule.
+
+If there are none, the returned list is empty.
+
+=cut
+
+sub check_single_resource_reference {
+ my ($node_affinity_rules) = @_;
+
+ my @conflicts = ();
+ my $resource_ruleids = {};
+
+ while (my ($ruleid, $rule) = each %$node_affinity_rules) {
+ for my $sid (keys %{ $rule->{resources} }) {
+ push @{ $resource_ruleids->{$sid} }, $ruleid;
+ }
+ }
+
+ for my $sid (keys %$resource_ruleids) {
+ my $ruleids = $resource_ruleids->{$sid};
+
+ next if @$ruleids < 2;
+
+ for my $ruleid (@$ruleids) {
+ push @conflicts, [$ruleid, $sid];
+ }
+ }
+
+ @conflicts = sort { $a->[0] cmp $b->[0] } @conflicts;
+ return \@conflicts;
+}
+
+__PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return check_single_resource_reference($args->{node_affinity_rules});
+ },
+ sub {
+ my ($conflicts, $errors) = @_;
+
+ for my $conflict (@$conflicts) {
+ my ($ruleid, $sid) = @$conflict;
+
+ push @{ $errors->{$ruleid}->{resources} },
+ "resource '$sid' is already used in another node affinity rule";
+ }
+ },
+);
+
+1;
diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
index 767659f..549cbe1 100644
--- a/src/PVE/HA/Tools.pm
+++ b/src/PVE/HA/Tools.pm
@@ -51,6 +51,18 @@ PVE::JSONSchema::register_standard_option(
},
);
+PVE::JSONSchema::register_standard_option(
+ 'pve-ha-resource-id-list',
+ {
+ description =>
+ "List of HA resource IDs. This consists of a list of resource types followed"
+ . " by a resource specific name separated with a colon (example: vm:100,ct:101).",
+ typetext => "<type>:<name>{,<type>:<name>}*",
+ type => 'string',
+ format => 'pve-ha-resource-id-list',
+ },
+);
+
PVE::JSONSchema::register_format('pve-ha-resource-or-vm-id', \&pve_verify_ha_resource_or_vm_id);
sub pve_verify_ha_resource_or_vm_id {
@@ -103,6 +115,18 @@ PVE::JSONSchema::register_standard_option(
},
);
+sub parse_node_priority {
+ my ($value, $noerr) = @_;
+
+ if ($value =~ m/^([a-zA-Z0-9]([a-zA-Z0-9\-]*[a-zA-Z0-9])?)(:(\d+))?$/) {
+ # node without priority set defaults to priority 0
+ return ($1, int($4 // 0));
+ }
+
+ return undef if $noerr;
+ die "unable to parse HA node entry '$value'\n";
+}
+
PVE::JSONSchema::register_standard_option(
'pve-ha-group-id',
{
--
2.39.5
* [pve-devel] [PATCH ha-manager v3 05/15] config, env, hw: add rules read and parse methods
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (4 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 04/15] rules: introduce node affinity rule plugin Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 06/15] config: delete services from rules if services are deleted from config Daniel Kral
` (13 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Adds methods to the HA environment to read and write the rules
configuration file for the different environment implementations.
The HA Rules are initialized with property isolation since it is
expected that other rule types will use similar property names with
different semantic meanings and/or possible values.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
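A rough sketch of how these methods can be consumed (a later patch wires
this up in the manager via the HA environment); the 'vm:100' id is just
an example:

    use PVE::HA::Config;
    use PVE::HA::Rules;

    # read ha/rules.cfg and fill in defaults for optional rule properties
    my $rules = PVE::HA::Config::read_and_check_rules_config();

    # iterate over all enabled node affinity rules referencing vm:100
    PVE::HA::Rules::foreach_rule(
        $rules,
        sub {
            my ($rule, $ruleid) = @_;

            my @nodes = sort keys %{ $rule->{nodes} };
            print "rule '$ruleid' constrains vm:100 to: @nodes\n";
        },
        {
            sid => 'vm:100',
            type => 'node-affinity',
            exclude_disabled_rules => 1,
        },
    );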
src/PVE/HA/Config.pm | 30 ++++++++++++++++++++++++++++++
src/PVE/HA/Env.pm | 6 ++++++
src/PVE/HA/Env/PVE2.pm | 12 ++++++++++++
src/PVE/HA/Sim/Env.pm | 14 ++++++++++++++
src/PVE/HA/Sim/Hardware.pm | 21 +++++++++++++++++++++
5 files changed, 83 insertions(+)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index ec9360e..012ae16 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -7,12 +7,14 @@ use JSON;
use PVE::HA::Tools;
use PVE::HA::Groups;
+use PVE::HA::Rules;
use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
use PVE::HA::Resources;
my $manager_status_filename = "ha/manager_status";
my $ha_groups_config = "ha/groups.cfg";
my $ha_resources_config = "ha/resources.cfg";
+my $ha_rules_config = "ha/rules.cfg";
my $crm_commands_filename = "ha/crm_commands";
my $ha_fence_config = "ha/fence.cfg";
@@ -31,6 +33,11 @@ cfs_register_file(
sub { PVE::HA::Resources->parse_config(@_); },
sub { PVE::HA::Resources->write_config(@_); },
);
+cfs_register_file(
+ $ha_rules_config,
+ sub { PVE::HA::Rules->parse_config(@_); },
+ sub { PVE::HA::Rules->write_config(@_); },
+);
cfs_register_file($manager_status_filename, \&json_reader, \&json_writer);
cfs_register_file(
$ha_fence_config,
@@ -197,6 +204,29 @@ sub parse_sid {
return wantarray ? ($sid, $type, $name) : $sid;
}
+sub read_rules_config {
+
+ return cfs_read_file($ha_rules_config);
+}
+
+sub read_and_check_rules_config {
+
+ my $rules = cfs_read_file($ha_rules_config);
+
+ # set optional rule parameter's default values
+ for my $rule (values %{ $rules->{ids} }) {
+ PVE::HA::Rules->set_rule_defaults($rule);
+ }
+
+ return $rules;
+}
+
+sub write_rules_config {
+ my ($cfg) = @_;
+
+ cfs_write_file($ha_rules_config, $cfg);
+}
+
sub read_group_config {
return cfs_read_file($ha_groups_config);
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 285e440..5cee7b3 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -131,6 +131,12 @@ sub steal_service {
return $self->{plug}->steal_service($sid, $current_node, $new_node);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ return $self->{plug}->read_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index b709f30..58fd36e 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -22,12 +22,18 @@ use PVE::HA::FenceConfig;
use PVE::HA::Resources;
use PVE::HA::Resources::PVEVM;
use PVE::HA::Resources::PVECT;
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
PVE::HA::Resources::PVEVM->register();
PVE::HA::Resources::PVECT->register();
PVE::HA::Resources->init();
+PVE::HA::Rules::NodeAffinity->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
my $lockdir = "/etc/pve/priv/lock";
sub new {
@@ -189,6 +195,12 @@ sub steal_service {
$self->cluster_state_update();
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ return PVE::HA::Config::read_and_check_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index d892a00..bb76b7f 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -10,6 +10,8 @@ use Fcntl qw(:DEFAULT :flock);
use PVE::HA::Tools;
use PVE::HA::Env;
use PVE::HA::Resources;
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
use PVE::HA::Sim::Resources::VirtVM;
use PVE::HA::Sim::Resources::VirtCT;
use PVE::HA::Sim::Resources::VirtFail;
@@ -20,6 +22,10 @@ PVE::HA::Sim::Resources::VirtFail->register();
PVE::HA::Resources->init();
+PVE::HA::Rules::NodeAffinity->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
sub new {
my ($this, $nodename, $hardware, $log_id) = @_;
@@ -245,6 +251,14 @@ sub exec_fence_agent {
return $self->{hardware}->exec_fence_agent($agent, $node, @param);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ $assert_cfs_can_rw->($self);
+
+ return $self->{hardware}->read_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 576527d..89dbdfa 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -28,6 +28,7 @@ my $watchdog_timeout = 60;
# $testdir/cmdlist Command list for simulation
# $testdir/hardware_status Hardware description (number of nodes, ...)
# $testdir/manager_status CRM status (start with {})
+# $testdir/rules_config Constraints / Rules configuration
# $testdir/service_config Service configuration
# $testdir/static_service_stats Static service usage information (cpu, memory)
# $testdir/groups HA groups configuration
@@ -319,6 +320,22 @@ sub read_crm_commands {
return $self->global_lock($code);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/rules_config";
+ my $raw = '';
+ $raw = PVE::Tools::file_get_contents($filename) if -f $filename;
+ my $rules = PVE::HA::Rules->parse_config($filename, $raw);
+
+ # set optional rule parameter's default values
+ for my $rule (values %{ $rules->{ids} }) {
+ PVE::HA::Rules->set_rule_defaults($rule);
+ }
+
+ return $rules;
+}
+
sub read_group_config {
my ($self) = @_;
@@ -391,6 +408,10 @@ sub new {
# copy initial configuration
copy("$testdir/manager_status", "$statusdir/manager_status"); # optional
+ if (-f "$testdir/rules_config") {
+ copy("$testdir/rules_config", "$statusdir/rules_config");
+ }
+
if (-f "$testdir/groups") {
copy("$testdir/groups", "$statusdir/groups");
} else {
--
2.39.5
* [pve-devel] [PATCH ha-manager v3 06/15] config: delete services from rules if services are deleted from config
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (5 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 05/15] config, env, hw: add rules read and parse methods Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 07/15] manager: read and update rules config Daniel Kral
` (12 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Remove a HA resource from all rules it is used in when it is removed by
delete_service_from_config(...), which is called by the HA resources'
delete API endpoint and possibly by external callers, e.g. if the HA
resource is removed externally.
If all of a rule's HA resources have been removed, the rule itself must
be removed as well, since an empty rule would result in an erroneous
rules config, which would become user-visible at the next read and
parse of the rules config.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
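A minimal sketch of the pruning behavior on an in-memory rules hash, assuming
illustrative rule IDs and resource entries (the 'ids' hash and the per-rule
'resources' hash are taken from the hunk below):

    my $rules = {
        ids => {
            'rule-a' => { resources => { 'vm:100' => 1, 'vm:101' => 1 } },
            'rule-b' => { resources => { 'vm:102' => 1 } },
        },
    };

    my $sid = 'vm:102'; # the HA resource being deleted

    for my $ruleid (keys %{ $rules->{ids} }) {
        my $rule_resources = $rules->{ids}->{$ruleid}->{resources} // {};

        delete $rule_resources->{$sid};

        # a rule with no resources left is dropped entirely
        delete $rules->{ids}->{$ruleid} if !%$rule_resources;
    }

    # 'rule-a' keeps vm:100 and vm:101, 'rule-b' has been removed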
---
src/PVE/HA/Config.pm | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 012ae16..2e520aa 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -360,6 +360,25 @@ sub delete_service_from_config {
"delete resource failed",
);
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = read_rules_config();
+
+ return if !defined($rules->{ids});
+
+ for my $ruleid (keys %{ $rules->{ids} }) {
+ my $rule_resources = $rules->{ids}->{$ruleid}->{resources} // {};
+
+ delete $rule_resources->{$sid};
+
+ delete $rules->{ids}->{$ruleid} if !%$rule_resources;
+ }
+
+ write_rules_config($rules);
+ },
+ "delete resource from rules failed",
+ );
+
return !!$res;
}
--
2.39.5
* [pve-devel] [PATCH ha-manager v3 07/15] manager: read and update rules config
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (6 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 06/15] config: delete services from rules if services are deleted from config Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 08/15] test: ha tester: add test cases for future node affinity rules Daniel Kral
` (11 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Read the rules configuration in each round and update the canonicalized
rules configuration only if anything has changed since the last round,
to reduce how often the rule set needs to be verified.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
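In short, the manager caches the last seen digest and only re-canonicalizes
the rule set when it changes; a minimal sketch of that pattern
(read_rules_config() and canonicalize() are from this series, the 'digest'
field is provided by the parsed config as used in the hunk below):

    my $new_rules = $haenv->read_rules_config();

    # skip verification if the rules config is unchanged since the last round
    if ($new_rules->{digest} ne $self->{last_rules_digest}) {
        my $messages = PVE::HA::Rules->canonicalize($new_rules);
        $haenv->log('info', $_) for @$messages;

        $self->{rules} = $new_rules;
        $self->{last_rules_digest} = $new_rules->{digest};
    }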
---
src/PVE/HA/Manager.pm | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index c57a280..88ff4a6 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -8,6 +8,8 @@ use Digest::MD5 qw(md5_base64);
use PVE::Tools;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
use PVE::HA::Usage::Basic;
use PVE::HA::Usage::Static;
@@ -41,7 +43,11 @@ sub new {
my $class = ref($this) || $this;
- my $self = bless { haenv => $haenv, crs => {} }, $class;
+ my $self = bless {
+ haenv => $haenv,
+ crs => {},
+ last_rules_digest => '',
+ }, $class;
my $old_ms = $haenv->read_manager_status();
@@ -556,6 +562,18 @@ sub manage {
delete $ss->{$sid};
}
+ my $new_rules = $haenv->read_rules_config();
+
+ if ($new_rules->{digest} ne $self->{last_rules_digest}) {
+
+ my $messages = PVE::HA::Rules->canonicalize($new_rules);
+ $haenv->log('info', $_) for @$messages;
+
+ $self->{rules} = $new_rules;
+
+ $self->{last_rules_digest} = $self->{rules}->{digest};
+ }
+
$self->update_crm_commands();
for (;;) {
--
2.39.5
* [pve-devel] [PATCH ha-manager v3 08/15] test: ha tester: add test cases for future node affinity rules
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (7 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 07/15] manager: read and update rules config Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 09/15] resources: introduce failback property in ha resource config Daniel Kral
` (10 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Add test cases to verify that the node affinity rules, which will be
added in a following patch, are functionally equivalent to the
existing HA groups.
These test cases verify the following scenarios for (a) unrestricted and
(b) restricted groups (i.e. non-strict and strict node affinity rules):
1. If a service is manually migrated to a non-member node and failback
is enabled, then (a)(b) migrate the service back to a member node.
2. If a service is manually migrated to a non-member node and failback
is disabled, then (a) do nothing, as the service may stay on the
non-member node, or (b) migrate the service back to a member node.
3. If a service's node fails, where the failed node is the only
available group member left, (a) migrate the service to a non-member
node, or (b) stay in recovery.
4. If a service's node fails, but there is another available group
member left, (a)(b) migrate the service to the other member node.
5. If a service's group has failback enabled and the service's node,
which is the node with the highest priority in the group, fails and
comes back later, (a)(b) migrate it to the second-highest prioritized
node and automatically migrate it back to the highest priority node
as soon as it is available again.
6. If a service's group has failback disabled and the service's node,
which is the node with the highest priority in the group, fails and
comes back later, (a)(b) migrate it to the second-highest prioritized
node, but do not migrate it back to the highest priority node if it
becomes available again.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
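As a concrete instance of scenario 5, the test-node-affinity-nonstrict5
fixture below combines a prioritized group with a node2 outage and recovery
(both snippets are quoted from the fixtures in the diff for convenience):

    group: should_stay_here
        nodes node2:2,node3:1

    [
        [ "power node1 on", "power node2 on", "power node3 on"],
        [ "network node2 off" ],
        [ "power node2 on", "network node2 on" ]
    ]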
---
src/test/test-node-affinity-nonstrict1/README | 10 +++
.../test-node-affinity-nonstrict1/cmdlist | 4 +
src/test/test-node-affinity-nonstrict1/groups | 2 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict1/log.expect | 40 ++++++++++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-nonstrict2/README | 12 +++
.../test-node-affinity-nonstrict2/cmdlist | 4 +
src/test/test-node-affinity-nonstrict2/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict2/log.expect | 35 +++++++++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-nonstrict3/README | 10 +++
.../test-node-affinity-nonstrict3/cmdlist | 4 +
src/test/test-node-affinity-nonstrict3/groups | 2 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict3/log.expect | 56 ++++++++++++++
.../manager_status | 1 +
.../service_config | 5 ++
src/test/test-node-affinity-nonstrict4/README | 14 ++++
.../test-node-affinity-nonstrict4/cmdlist | 4 +
src/test/test-node-affinity-nonstrict4/groups | 2 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict4/log.expect | 54 ++++++++++++++
.../manager_status | 1 +
.../service_config | 5 ++
src/test/test-node-affinity-nonstrict5/README | 16 ++++
.../test-node-affinity-nonstrict5/cmdlist | 5 ++
src/test/test-node-affinity-nonstrict5/groups | 2 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict5/log.expect | 66 +++++++++++++++++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-nonstrict6/README | 14 ++++
.../test-node-affinity-nonstrict6/cmdlist | 5 ++
src/test/test-node-affinity-nonstrict6/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict6/log.expect | 52 +++++++++++++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-strict1/README | 10 +++
src/test/test-node-affinity-strict1/cmdlist | 4 +
src/test/test-node-affinity-strict1/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-strict1/log.expect | 40 ++++++++++
.../test-node-affinity-strict1/manager_status | 1 +
.../test-node-affinity-strict1/service_config | 3 +
src/test/test-node-affinity-strict2/README | 11 +++
src/test/test-node-affinity-strict2/cmdlist | 4 +
src/test/test-node-affinity-strict2/groups | 4 +
.../hardware_status | 5 ++
.../test-node-affinity-strict2/log.expect | 40 ++++++++++
.../test-node-affinity-strict2/manager_status | 1 +
.../test-node-affinity-strict2/service_config | 3 +
src/test/test-node-affinity-strict3/README | 10 +++
src/test/test-node-affinity-strict3/cmdlist | 4 +
src/test/test-node-affinity-strict3/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-strict3/log.expect | 74 +++++++++++++++++++
.../test-node-affinity-strict3/manager_status | 1 +
.../test-node-affinity-strict3/service_config | 5 ++
src/test/test-node-affinity-strict4/README | 14 ++++
src/test/test-node-affinity-strict4/cmdlist | 4 +
src/test/test-node-affinity-strict4/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-strict4/log.expect | 54 ++++++++++++++
.../test-node-affinity-strict4/manager_status | 1 +
.../test-node-affinity-strict4/service_config | 5 ++
src/test/test-node-affinity-strict5/README | 16 ++++
src/test/test-node-affinity-strict5/cmdlist | 5 ++
src/test/test-node-affinity-strict5/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-strict5/log.expect | 66 +++++++++++++++++
.../test-node-affinity-strict5/manager_status | 1 +
.../test-node-affinity-strict5/service_config | 3 +
src/test/test-node-affinity-strict6/README | 14 ++++
src/test/test-node-affinity-strict6/cmdlist | 5 ++
src/test/test-node-affinity-strict6/groups | 4 +
.../hardware_status | 5 ++
.../test-node-affinity-strict6/log.expect | 52 +++++++++++++
.../test-node-affinity-strict6/manager_status | 1 +
.../test-node-affinity-strict6/service_config | 3 +
84 files changed, 982 insertions(+)
create mode 100644 src/test/test-node-affinity-nonstrict1/README
create mode 100644 src/test/test-node-affinity-nonstrict1/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict1/groups
create mode 100644 src/test/test-node-affinity-nonstrict1/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict1/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict1/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict1/service_config
create mode 100644 src/test/test-node-affinity-nonstrict2/README
create mode 100644 src/test/test-node-affinity-nonstrict2/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict2/groups
create mode 100644 src/test/test-node-affinity-nonstrict2/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict2/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict2/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict2/service_config
create mode 100644 src/test/test-node-affinity-nonstrict3/README
create mode 100644 src/test/test-node-affinity-nonstrict3/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict3/groups
create mode 100644 src/test/test-node-affinity-nonstrict3/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict3/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict3/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict3/service_config
create mode 100644 src/test/test-node-affinity-nonstrict4/README
create mode 100644 src/test/test-node-affinity-nonstrict4/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict4/groups
create mode 100644 src/test/test-node-affinity-nonstrict4/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict4/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict4/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict4/service_config
create mode 100644 src/test/test-node-affinity-nonstrict5/README
create mode 100644 src/test/test-node-affinity-nonstrict5/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict5/groups
create mode 100644 src/test/test-node-affinity-nonstrict5/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict5/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict5/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict5/service_config
create mode 100644 src/test/test-node-affinity-nonstrict6/README
create mode 100644 src/test/test-node-affinity-nonstrict6/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict6/groups
create mode 100644 src/test/test-node-affinity-nonstrict6/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict6/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict6/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict6/service_config
create mode 100644 src/test/test-node-affinity-strict1/README
create mode 100644 src/test/test-node-affinity-strict1/cmdlist
create mode 100644 src/test/test-node-affinity-strict1/groups
create mode 100644 src/test/test-node-affinity-strict1/hardware_status
create mode 100644 src/test/test-node-affinity-strict1/log.expect
create mode 100644 src/test/test-node-affinity-strict1/manager_status
create mode 100644 src/test/test-node-affinity-strict1/service_config
create mode 100644 src/test/test-node-affinity-strict2/README
create mode 100644 src/test/test-node-affinity-strict2/cmdlist
create mode 100644 src/test/test-node-affinity-strict2/groups
create mode 100644 src/test/test-node-affinity-strict2/hardware_status
create mode 100644 src/test/test-node-affinity-strict2/log.expect
create mode 100644 src/test/test-node-affinity-strict2/manager_status
create mode 100644 src/test/test-node-affinity-strict2/service_config
create mode 100644 src/test/test-node-affinity-strict3/README
create mode 100644 src/test/test-node-affinity-strict3/cmdlist
create mode 100644 src/test/test-node-affinity-strict3/groups
create mode 100644 src/test/test-node-affinity-strict3/hardware_status
create mode 100644 src/test/test-node-affinity-strict3/log.expect
create mode 100644 src/test/test-node-affinity-strict3/manager_status
create mode 100644 src/test/test-node-affinity-strict3/service_config
create mode 100644 src/test/test-node-affinity-strict4/README
create mode 100644 src/test/test-node-affinity-strict4/cmdlist
create mode 100644 src/test/test-node-affinity-strict4/groups
create mode 100644 src/test/test-node-affinity-strict4/hardware_status
create mode 100644 src/test/test-node-affinity-strict4/log.expect
create mode 100644 src/test/test-node-affinity-strict4/manager_status
create mode 100644 src/test/test-node-affinity-strict4/service_config
create mode 100644 src/test/test-node-affinity-strict5/README
create mode 100644 src/test/test-node-affinity-strict5/cmdlist
create mode 100644 src/test/test-node-affinity-strict5/groups
create mode 100644 src/test/test-node-affinity-strict5/hardware_status
create mode 100644 src/test/test-node-affinity-strict5/log.expect
create mode 100644 src/test/test-node-affinity-strict5/manager_status
create mode 100644 src/test/test-node-affinity-strict5/service_config
create mode 100644 src/test/test-node-affinity-strict6/README
create mode 100644 src/test/test-node-affinity-strict6/cmdlist
create mode 100644 src/test/test-node-affinity-strict6/groups
create mode 100644 src/test/test-node-affinity-strict6/hardware_status
create mode 100644 src/test/test-node-affinity-strict6/log.expect
create mode 100644 src/test/test-node-affinity-strict6/manager_status
create mode 100644 src/test/test-node-affinity-strict6/service_config
diff --git a/src/test/test-node-affinity-nonstrict1/README b/src/test/test-node-affinity-nonstrict1/README
new file mode 100644
index 0000000..8775b6c
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/README
@@ -0,0 +1,10 @@
+Test whether a service in an unrestricted group will automatically migrate back
+to a node member in case of a manual migration to a non-member node.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is a group member and has higher priority than the other nodes
diff --git a/src/test/test-node-affinity-nonstrict1/cmdlist b/src/test/test-node-affinity-nonstrict1/cmdlist
new file mode 100644
index 0000000..a63e4fd
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict1/groups b/src/test/test-node-affinity-nonstrict1/groups
new file mode 100644
index 0000000..50c9a2d
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node3
diff --git a/src/test/test-node-affinity-nonstrict1/hardware_status b/src/test/test-node-affinity-nonstrict1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict1/log.expect b/src/test/test-node-affinity-nonstrict1/log.expect
new file mode 100644
index 0000000..e0f4d46
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/log.expect
@@ -0,0 +1,40 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: service vm:101 - start migrate to node 'node3'
+info 143 node2/lrm: service vm:101 - end migrate to node 'node3'
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 165 node3/lrm: starting service vm:101
+info 165 node3/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict1/manager_status b/src/test/test-node-affinity-nonstrict1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-nonstrict1/service_config b/src/test/test-node-affinity-nonstrict1/service_config
new file mode 100644
index 0000000..5f55843
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-node-affinity-nonstrict2/README b/src/test/test-node-affinity-nonstrict2/README
new file mode 100644
index 0000000..f27414b
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/README
@@ -0,0 +1,12 @@
+Test whether a service in an unrestricted group with nofailback enabled will
+stay on the manual migration target node, even though the target node is not a
+member of the unrestricted group.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, vm:101 stays on node2; even though
+ node2 is not a group member, the nofailback flag prevents vm:101 from being
+ migrated back to a group member
diff --git a/src/test/test-node-affinity-nonstrict2/cmdlist b/src/test/test-node-affinity-nonstrict2/cmdlist
new file mode 100644
index 0000000..a63e4fd
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict2/groups b/src/test/test-node-affinity-nonstrict2/groups
new file mode 100644
index 0000000..59192fa
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/groups
@@ -0,0 +1,3 @@
+group: should_stay_here
+ nodes node3
+ nofailback 1
diff --git a/src/test/test-node-affinity-nonstrict2/hardware_status b/src/test/test-node-affinity-nonstrict2/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict2/log.expect b/src/test/test-node-affinity-nonstrict2/log.expect
new file mode 100644
index 0000000..35e2470
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/log.expect
@@ -0,0 +1,35 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: starting service vm:101
+info 143 node2/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict2/manager_status b/src/test/test-node-affinity-nonstrict2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-nonstrict2/service_config b/src/test/test-node-affinity-nonstrict2/service_config
new file mode 100644
index 0000000..5f55843
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-node-affinity-nonstrict3/README b/src/test/test-node-affinity-nonstrict3/README
new file mode 100644
index 0000000..c4ddfab
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/README
@@ -0,0 +1,10 @@
+Test whether a service in an unrestricted group with only one node member will
+be migrated to a non-member node in case of a failover of its previously
+assigned node.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As node3 fails, vm:101 is migrated to node1
diff --git a/src/test/test-node-affinity-nonstrict3/cmdlist b/src/test/test-node-affinity-nonstrict3/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict3/groups b/src/test/test-node-affinity-nonstrict3/groups
new file mode 100644
index 0000000..50c9a2d
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node3
diff --git a/src/test/test-node-affinity-nonstrict3/hardware_status b/src/test/test-node-affinity-nonstrict3/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict3/log.expect b/src/test/test-node-affinity-nonstrict3/log.expect
new file mode 100644
index 0000000..752300b
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/log.expect
@@ -0,0 +1,56 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: got lock 'ha_agent_node1_lock'
+info 241 node1/lrm: status change wait_for_agent_lock => active
+info 241 node1/lrm: starting service vm:101
+info 241 node1/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict3/manager_status b/src/test/test-node-affinity-nonstrict3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-node-affinity-nonstrict3/service_config b/src/test/test-node-affinity-nonstrict3/service_config
new file mode 100644
index 0000000..777b2a7
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-nonstrict4/README b/src/test/test-node-affinity-nonstrict4/README
new file mode 100644
index 0000000..a08f0e1
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/README
@@ -0,0 +1,14 @@
+Test whether a service in an unrestricted group with two node members will stay
+assigned to one of the node members in case of a failover of its previously
+assigned node.
+
+The test scenario is:
+- vm:101 should be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher service count than node1 to test whether the restriction
+ to node2 and node3 is applied even though the scheduler would prefer the less
+ utilized node1
+
+The expected outcome is:
+- As node3 fails, vm:101 is migrated to node2, as it's the only available node
+ left in the unrestricted group
diff --git a/src/test/test-node-affinity-nonstrict4/cmdlist b/src/test/test-node-affinity-nonstrict4/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict4/groups b/src/test/test-node-affinity-nonstrict4/groups
new file mode 100644
index 0000000..b1584b5
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node2,node3
diff --git a/src/test/test-node-affinity-nonstrict4/hardware_status b/src/test/test-node-affinity-nonstrict4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict4/log.expect b/src/test/test-node-affinity-nonstrict4/log.expect
new file mode 100644
index 0000000..847e157
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/log.expect
@@ -0,0 +1,54 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict4/manager_status b/src/test/test-node-affinity-nonstrict4/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-nonstrict4/service_config b/src/test/test-node-affinity-nonstrict4/service_config
new file mode 100644
index 0000000..777b2a7
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-nonstrict5/README b/src/test/test-node-affinity-nonstrict5/README
new file mode 100644
index 0000000..0c37044
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/README
@@ -0,0 +1,16 @@
+Test whether a service in an unrestricted group with two differently prioritized
+node members will stay on the node with the highest priority in case of a
+failover or when the service is on a lower-priority node.
+
+The test scenario is:
+- vm:101 should be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As vm:101 runs on node3, it is automatically migrated to node2, as node2 has
+ a higher priority than node3
+- As node2 fails, vm:101 is migrated to node3 as node3 is the next and only
+ available node member left in the unrestricted group
+- As node2 comes back online, vm:101 is migrated back to node2, as node2 has a
+ higher priority than node3
diff --git a/src/test/test-node-affinity-nonstrict5/cmdlist b/src/test/test-node-affinity-nonstrict5/cmdlist
new file mode 100644
index 0000000..6932aa7
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off" ],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict5/groups b/src/test/test-node-affinity-nonstrict5/groups
new file mode 100644
index 0000000..03a0ee9
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node2:2,node3:1
diff --git a/src/test/test-node-affinity-nonstrict5/hardware_status b/src/test/test-node-affinity-nonstrict5/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict5/log.expect b/src/test/test-node-affinity-nonstrict5/log.expect
new file mode 100644
index 0000000..a875e11
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 20 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 25 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 43 node2/lrm: got lock 'ha_agent_node2_lock'
+info 43 node2/lrm: status change wait_for_agent_lock => active
+info 43 node2/lrm: starting service vm:101
+info 43 node2/lrm: service status vm:101 started
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 260 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 260 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 265 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 265 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 280 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 363 node2/lrm: got lock 'ha_agent_node2_lock'
+info 363 node2/lrm: status change wait_for_agent_lock => active
+info 363 node2/lrm: starting service vm:101
+info 363 node2/lrm: service status vm:101 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict5/manager_status b/src/test/test-node-affinity-nonstrict5/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-nonstrict5/service_config b/src/test/test-node-affinity-nonstrict5/service_config
new file mode 100644
index 0000000..5f55843
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-node-affinity-nonstrict6/README b/src/test/test-node-affinity-nonstrict6/README
new file mode 100644
index 0000000..4ab1275
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/README
@@ -0,0 +1,14 @@
+Test whether a service in an unrestricted group with nofailback enabled and two
+differently prioritized node members will stay on the current node without
+migrating back to the highest priority node.
+
+The test scenario is:
+- vm:101 should be kept on node2 or node3
+- vm:101 is currently running on node2
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As node2 fails, vm:101 is migrated to node3 as it is the only available node
+ member left in the unrestricted group
+- As node2 comes back online, vm:101 stays on node3; even though node2 has a
+ higher priority, the nofailback flag prevents vm:101 from migrating back to node2
diff --git a/src/test/test-node-affinity-nonstrict6/cmdlist b/src/test/test-node-affinity-nonstrict6/cmdlist
new file mode 100644
index 0000000..4dd33cc
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off"],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict6/groups b/src/test/test-node-affinity-nonstrict6/groups
new file mode 100644
index 0000000..a7aed17
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/groups
@@ -0,0 +1,3 @@
+group: should_stay_here
+ nodes node2:2,node3:1
+ nofailback 1
diff --git a/src/test/test-node-affinity-nonstrict6/hardware_status b/src/test/test-node-affinity-nonstrict6/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict6/log.expect b/src/test/test-node-affinity-nonstrict6/log.expect
new file mode 100644
index 0000000..bcb472b
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/log.expect
@@ -0,0 +1,52 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: got lock 'ha_agent_node3_lock'
+info 245 node3/lrm: status change wait_for_agent_lock => active
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict6/manager_status b/src/test/test-node-affinity-nonstrict6/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-nonstrict6/service_config b/src/test/test-node-affinity-nonstrict6/service_config
new file mode 100644
index 0000000..c4ece62
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node2", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-node-affinity-strict1/README b/src/test/test-node-affinity-strict1/README
new file mode 100644
index 0000000..c717d58
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/README
@@ -0,0 +1,10 @@
+Test whether a service in a restricted group will automatically migrate back to
+a restricted node member in case of a manual migration to a non-member node.
+
+The test scenario is:
+- vm:101 must be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is the only available node member left in the restricted group
diff --git a/src/test/test-node-affinity-strict1/cmdlist b/src/test/test-node-affinity-strict1/cmdlist
new file mode 100644
index 0000000..a63e4fd
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-node-affinity-strict1/groups b/src/test/test-node-affinity-strict1/groups
new file mode 100644
index 0000000..370865f
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node3
+ restricted 1
diff --git a/src/test/test-node-affinity-strict1/hardware_status b/src/test/test-node-affinity-strict1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict1/log.expect b/src/test/test-node-affinity-strict1/log.expect
new file mode 100644
index 0000000..e0f4d46
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/log.expect
@@ -0,0 +1,40 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: service vm:101 - start migrate to node 'node3'
+info 143 node2/lrm: service vm:101 - end migrate to node 'node3'
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 165 node3/lrm: starting service vm:101
+info 165 node3/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict1/manager_status b/src/test/test-node-affinity-strict1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-strict1/service_config b/src/test/test-node-affinity-strict1/service_config
new file mode 100644
index 0000000..36ea15b
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+}
diff --git a/src/test/test-node-affinity-strict2/README b/src/test/test-node-affinity-strict2/README
new file mode 100644
index 0000000..f4d06a1
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/README
@@ -0,0 +1,11 @@
+Test whether a service in a restricted group with nofailback enabled will
+automatically migrate back to a restricted node member in case of a manual
+migration to a non-member node.
+
+The test scenario is:
+- vm:101 must be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is the only available node member left in the restricted group
diff --git a/src/test/test-node-affinity-strict2/cmdlist b/src/test/test-node-affinity-strict2/cmdlist
new file mode 100644
index 0000000..a63e4fd
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-node-affinity-strict2/groups b/src/test/test-node-affinity-strict2/groups
new file mode 100644
index 0000000..e43eafc
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/groups
@@ -0,0 +1,4 @@
+group: must_stay_here
+ nodes node3
+ restricted 1
+ nofailback 1
diff --git a/src/test/test-node-affinity-strict2/hardware_status b/src/test/test-node-affinity-strict2/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict2/log.expect b/src/test/test-node-affinity-strict2/log.expect
new file mode 100644
index 0000000..e0f4d46
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/log.expect
@@ -0,0 +1,40 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: service vm:101 - start migrate to node 'node3'
+info 143 node2/lrm: service vm:101 - end migrate to node 'node3'
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 165 node3/lrm: starting service vm:101
+info 165 node3/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict2/manager_status b/src/test/test-node-affinity-strict2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-strict2/service_config b/src/test/test-node-affinity-strict2/service_config
new file mode 100644
index 0000000..36ea15b
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+}
diff --git a/src/test/test-node-affinity-strict3/README b/src/test/test-node-affinity-strict3/README
new file mode 100644
index 0000000..5aced39
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/README
@@ -0,0 +1,10 @@
+Test whether a service in a restricted group with only one node member will
+stay in recovery in case of a failover of its previously assigned node.
+
+The test scenario is:
+- vm:101 must be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As node3 fails, vm:101 stays in recovery since there's no available node
+ member left in the restricted group
diff --git a/src/test/test-node-affinity-strict3/cmdlist b/src/test/test-node-affinity-strict3/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-node-affinity-strict3/groups b/src/test/test-node-affinity-strict3/groups
new file mode 100644
index 0000000..370865f
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node3
+ restricted 1
diff --git a/src/test/test-node-affinity-strict3/hardware_status b/src/test/test-node-affinity-strict3/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict3/log.expect b/src/test/test-node-affinity-strict3/log.expect
new file mode 100644
index 0000000..47f9776
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/log.expect
@@ -0,0 +1,74 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+err 240 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 260 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 280 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 300 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 320 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 340 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 360 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 380 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 400 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 420 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 440 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 460 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 480 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 500 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 520 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 540 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 560 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 580 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 600 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 620 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 640 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 660 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 680 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 700 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict3/manager_status b/src/test/test-node-affinity-strict3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-node-affinity-strict3/service_config b/src/test/test-node-affinity-strict3/service_config
new file mode 100644
index 0000000..9adf02c
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-strict4/README b/src/test/test-node-affinity-strict4/README
new file mode 100644
index 0000000..25ded53
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/README
@@ -0,0 +1,14 @@
+Test whether a service in a restricted group with two node members will stay
+assigned to one of the node members in case of a failover of its previously
+assigned node.
+
+The test scenario is:
+- vm:101 must be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher service count than node1 to test whether the restriction
+ to node2 and node3 is applied even though the scheduler would prefer the less
+ utilized node1
+
+The expected outcome is:
+- As node3 fails, vm:101 is migrated to node2, as it's the only available node
+ left in the restricted group
diff --git a/src/test/test-node-affinity-strict4/cmdlist b/src/test/test-node-affinity-strict4/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-node-affinity-strict4/groups b/src/test/test-node-affinity-strict4/groups
new file mode 100644
index 0000000..0ad2abc
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node2,node3
+ restricted 1
diff --git a/src/test/test-node-affinity-strict4/hardware_status b/src/test/test-node-affinity-strict4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict4/log.expect b/src/test/test-node-affinity-strict4/log.expect
new file mode 100644
index 0000000..847e157
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/log.expect
@@ -0,0 +1,54 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict4/manager_status b/src/test/test-node-affinity-strict4/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-strict4/service_config b/src/test/test-node-affinity-strict4/service_config
new file mode 100644
index 0000000..9adf02c
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-strict5/README b/src/test/test-node-affinity-strict5/README
new file mode 100644
index 0000000..a4e67f4
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/README
@@ -0,0 +1,16 @@
+Test whether a service in a restricted group with two differently prioritized
+node members will stay on the node with the highest priority in case of a
+failover or when the service is on a lower-priority node.
+
+The test scenario is:
+- vm:101 must be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As vm:101 runs on node3, it is automatically migrated to node2, as node2 has
+ a higher priority than node3
+- As node2 fails, vm:101 is migrated to node3 as node3 is the next and only
+ available node member left in the restricted group
+- As node2 comes back online, vm:101 is migrated back to node2, as node2 has a
+ higher priority than node3
diff --git a/src/test/test-node-affinity-strict5/cmdlist b/src/test/test-node-affinity-strict5/cmdlist
new file mode 100644
index 0000000..6932aa7
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off" ],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-node-affinity-strict5/groups b/src/test/test-node-affinity-strict5/groups
new file mode 100644
index 0000000..ec3cd79
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node2:2,node3:1
+ restricted 1
diff --git a/src/test/test-node-affinity-strict5/hardware_status b/src/test/test-node-affinity-strict5/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict5/log.expect b/src/test/test-node-affinity-strict5/log.expect
new file mode 100644
index 0000000..a875e11
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 20 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 25 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 43 node2/lrm: got lock 'ha_agent_node2_lock'
+info 43 node2/lrm: status change wait_for_agent_lock => active
+info 43 node2/lrm: starting service vm:101
+info 43 node2/lrm: service status vm:101 started
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 260 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 260 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 265 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 265 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 280 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 363 node2/lrm: got lock 'ha_agent_node2_lock'
+info 363 node2/lrm: status change wait_for_agent_lock => active
+info 363 node2/lrm: starting service vm:101
+info 363 node2/lrm: service status vm:101 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict5/manager_status b/src/test/test-node-affinity-strict5/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-strict5/service_config b/src/test/test-node-affinity-strict5/service_config
new file mode 100644
index 0000000..36ea15b
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+}
diff --git a/src/test/test-node-affinity-strict6/README b/src/test/test-node-affinity-strict6/README
new file mode 100644
index 0000000..c558afd
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/README
@@ -0,0 +1,14 @@
+Test whether a service in a restricted group with nofailback enabled and two
+differently prioritized node members will stay on the current node without
+migrating back to the highest priority node.
+
+The test scenario is:
+- vm:101 must be kept on node2 or node3
+- vm:101 is currently running on node2
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As node2 fails, vm:101 is migrated to node3 as it is the only available node
+ member left in the restricted group
+- As node2 comes back online, vm:101 stays on node3; even though node2 has a
+ higher priority, the nofailback flag prevents vm:101 from migrating back to node2
diff --git a/src/test/test-node-affinity-strict6/cmdlist b/src/test/test-node-affinity-strict6/cmdlist
new file mode 100644
index 0000000..4dd33cc
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off"],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-node-affinity-strict6/groups b/src/test/test-node-affinity-strict6/groups
new file mode 100644
index 0000000..cdd0e50
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/groups
@@ -0,0 +1,4 @@
+group: must_stay_here
+ nodes node2:2,node3:1
+ restricted 1
+ nofailback 1
diff --git a/src/test/test-node-affinity-strict6/hardware_status b/src/test/test-node-affinity-strict6/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict6/log.expect b/src/test/test-node-affinity-strict6/log.expect
new file mode 100644
index 0000000..bcb472b
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/log.expect
@@ -0,0 +1,52 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: got lock 'ha_agent_node3_lock'
+info 245 node3/lrm: status change wait_for_agent_lock => active
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict6/manager_status b/src/test/test-node-affinity-strict6/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-strict6/service_config b/src/test/test-node-affinity-strict6/service_config
new file mode 100644
index 0000000..1d371e1
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node2", "state": "started", "group": "must_stay_here" }
+}
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [pve-devel] [PATCH ha-manager v3 09/15] resources: introduce failback property in ha resource config
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (8 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 08/15] test: ha tester: add test cases for future node affinity rules Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 10/15] manager: migrate ha groups to node affinity rules in-memory Daniel Kral
` (9 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Add the failback property to the HA resources config, which is
functionally equivalent to the negation of the HA group's nofailback
property. It will be used to migrate HA groups to HA node affinity
rules.
The 'failback' flag is enabled by default, since the HA group's
nofailback property was disabled by default.
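As a minimal sketch (the resource ID and group name are only examples),
the new flag would be set in /etc/pve/ha/resources.cfg like any other
optional resource property:
    vm: 101
        group prefer_node1
        state started
        failback 0
Leaving the property out keeps the default of 1, which matches the old
behavior of a HA group without nofailback.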
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/API2/HA/Resources.pm | 9 +++++++++
src/PVE/API2/HA/Status.pm | 11 ++++++++++-
src/PVE/HA/Config.pm | 1 +
src/PVE/HA/Resources.pm | 9 +++++++++
src/PVE/HA/Resources/PVECT.pm | 1 +
src/PVE/HA/Resources/PVEVM.pm | 1 +
src/PVE/HA/Sim/Hardware.pm | 1 +
src/test/test_failover1.pl | 1 +
8 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/src/PVE/API2/HA/Resources.pm b/src/PVE/API2/HA/Resources.pm
index 5916204..e06d202 100644
--- a/src/PVE/API2/HA/Resources.pm
+++ b/src/PVE/API2/HA/Resources.pm
@@ -127,6 +127,15 @@ __PACKAGE__->register_method({
optional => 1,
description => "Requested resource state.",
},
+ failback => {
+ description => "HA resource is automatically migrated to the"
+ . " node with the highest priority according to their node"
+ . " affinity rule, if a node with a higher priority than"
+ . " the current node comes online.",
+ type => 'boolean',
+ optional => 1,
+ default => 1,
+ },
group => get_standard_option('pve-ha-group-id', { optional => 1 }),
max_restart => {
description => "Maximal number of tries to restart the service on"
diff --git a/src/PVE/API2/HA/Status.pm b/src/PVE/API2/HA/Status.pm
index 1547e0e..4038766 100644
--- a/src/PVE/API2/HA/Status.pm
+++ b/src/PVE/API2/HA/Status.pm
@@ -109,6 +109,15 @@ __PACKAGE__->register_method({
type => "string",
optional => 1,
},
+ failback => {
+ description => "HA resource is automatically migrated to"
+ . " the node with the highest priority according to their"
+ . " node affinity rule, if a node with a higher priority"
+ . " than the current node comes online.",
+ type => "boolean",
+ optional => 1,
+ default => 1,
+ },
max_relocate => {
description => "For type 'service'.",
type => "integer",
@@ -260,7 +269,7 @@ __PACKAGE__->register_method({
# also return common resource attributes
if (defined($sc)) {
$data->{request_state} = $sc->{state};
- foreach my $key (qw(group max_restart max_relocate comment)) {
+ foreach my $key (qw(group max_restart max_relocate failback comment)) {
$data->{$key} = $sc->{$key} if defined($sc->{$key});
}
}
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 2e520aa..7d071f3 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -116,6 +116,7 @@ sub read_and_check_resources_config {
my (undef, undef, $name) = parse_sid($sid);
$d->{state} = 'started' if !defined($d->{state});
$d->{state} = 'started' if $d->{state} eq 'enabled'; # backward compatibility
+ $d->{failback} = 1 if !defined($d->{failback});
$d->{max_restart} = 1 if !defined($d->{max_restart});
$d->{max_relocate} = 1 if !defined($d->{max_relocate});
if (PVE::HA::Resources->lookup($d->{type})) {
diff --git a/src/PVE/HA/Resources.pm b/src/PVE/HA/Resources.pm
index 873387e..b6d4a73 100644
--- a/src/PVE/HA/Resources.pm
+++ b/src/PVE/HA/Resources.pm
@@ -62,6 +62,15 @@ EODESC
completion => \&PVE::HA::Tools::complete_group,
},
),
+ failback => {
+ description => "Automatically migrate HA resource to the node with"
+ . " the highest priority according to their node affinity "
+ . " rules, if a node with a higher priority than the current"
+ . " node comes online.",
+ type => 'boolean',
+ optional => 1,
+ default => 1,
+ },
max_restart => {
description => "Maximal number of tries to restart the service on"
. " a node after its start failed.",
diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
index d1ab679..44644d9 100644
--- a/src/PVE/HA/Resources/PVECT.pm
+++ b/src/PVE/HA/Resources/PVECT.pm
@@ -37,6 +37,7 @@ sub options {
state => { optional => 1 },
group => { optional => 1 },
comment => { optional => 1 },
+ failback => { optional => 1 },
max_restart => { optional => 1 },
max_relocate => { optional => 1 },
};
diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
index fe65577..e634fe3 100644
--- a/src/PVE/HA/Resources/PVEVM.pm
+++ b/src/PVE/HA/Resources/PVEVM.pm
@@ -37,6 +37,7 @@ sub options {
state => { optional => 1 },
group => { optional => 1 },
comment => { optional => 1 },
+ failback => { optional => 1 },
max_restart => { optional => 1 },
max_relocate => { optional => 1 },
};
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 89dbdfa..579be2a 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -106,6 +106,7 @@ sub read_service_config {
}
$d->{state} = 'disabled' if !$d->{state};
$d->{state} = 'started' if $d->{state} eq 'enabled'; # backward compatibility
+ $d->{failback} = 1 if !defined($d->{failback});
$d->{max_restart} = 1 if !defined($d->{max_restart});
$d->{max_relocate} = 1 if !defined($d->{max_relocate});
}
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 29b56c6..f6faa38 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -22,6 +22,7 @@ $online_node_usage->add_node("node3");
my $service_conf = {
node => 'node1',
group => 'prefer_node1',
+ failback => 1,
};
my $sd = {
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [pve-devel] [PATCH ha-manager v3 10/15] manager: migrate ha groups to node affinity rules in-memory
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (9 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 09/15] resources: introduce failback property in ha resource config Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 11/15] manager: apply node affinity rules when selecting service nodes Daniel Kral
` (8 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Migrate the currently configured groups to node affinity rules
in-memory, so that they can be applied as such in the next patches and
therefore replace HA groups internally.
HA node affinity rules in their initial implementation are designed to
be as restrictive as HA groups, i.e. only allow a HA resource to be used
in a single node affinity rule, to ease the migration between them.
HA groups map directly to node affinity rules, except that the
'restricted' property is renamed to 'strict' and the 'nofailback'
property is moved to the HA resources config as the inverted 'failback'
flag.
Moving 'nofailback' to the HA resources config lets users set it more
granularly for individual HA resources and keeps the node affinity rules
extensible in the future, e.g. for multiple node affinity rules per HA
resource.
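As a rough sketch of the mapping (resource and group names are made up,
and the rules.cfg syntax for prioritized nodes is assumed to mirror the
group syntax), a group such as
    group: prefer_node2
        nodes node2:2,node3:1
        restricted 1
        nofailback 1
referenced by a resource vm:101 would be turned into an in-memory rule
equivalent to
    node-affinity: ha-group-prefer_node2
        resources vm:101
        nodes node2:2,node3:1
        strict 1
while vm:101 itself gets failback 0 in the resources config.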
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Config.pm | 3 ++-
src/PVE/HA/Groups.pm | 48 +++++++++++++++++++++++++++++++++++++++++++
src/PVE/HA/Manager.pm | 18 ++++++++++++++--
3 files changed, 66 insertions(+), 3 deletions(-)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 7d071f3..424a6e1 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -131,7 +131,8 @@ sub read_and_check_resources_config {
}
}
- return $conf;
+ # TODO PVE 10: Remove digest when HA groups have been fully migrated to rules
+ return wantarray ? ($conf, $res->{digest}) : $conf;
}
sub update_resources_config {
diff --git a/src/PVE/HA/Groups.pm b/src/PVE/HA/Groups.pm
index 821d969..f065732 100644
--- a/src/PVE/HA/Groups.pm
+++ b/src/PVE/HA/Groups.pm
@@ -6,6 +6,7 @@ use warnings;
use PVE::JSONSchema qw(get_standard_option);
use PVE::SectionConfig;
use PVE::HA::Tools;
+use PVE::HA::Rules;
use base qw(PVE::SectionConfig);
@@ -107,4 +108,51 @@ sub parse_section_header {
__PACKAGE__->register();
__PACKAGE__->init();
+# Migrate nofailback flag from $groups to $resources
+sub migrate_groups_to_resources {
+ my ($groups, $resources) = @_;
+
+ for my $sid (keys %$resources) {
+ my $groupid = $resources->{$sid}->{group}
+ or next; # skip resources without groups
+
+ $resources->{$sid}->{failback} = !$groups->{ids}->{$groupid}->{nofailback};
+ }
+}
+
+# Migrate groups from $groups and $resources to node affinity rules in $rules
+sub migrate_groups_to_rules {
+ my ($rules, $groups, $resources) = @_;
+
+ my $group_resources = {};
+
+ for my $sid (keys %$resources) {
+ my $groupid = $resources->{$sid}->{group}
+ or next; # skip resources without groups
+
+ $group_resources->{$groupid}->{$sid} = 1;
+ }
+
+ while (my ($group, $resources) = each %$group_resources) {
+ next if !$groups->{ids}->{$group}; # skip non-existent groups
+
+ my $new_ruleid = "ha-group-$group";
+ my $nodes = {};
+ for my $entry (keys $groups->{ids}->{$group}->{nodes}->%*) {
+ my ($node, $priority) = PVE::HA::Tools::parse_node_priority($entry);
+
+ $nodes->{$node} = { priority => $priority };
+ }
+
+ $rules->{ids}->{$new_ruleid} = {
+ type => 'node-affinity',
+ resources => $resources,
+ nodes => $nodes,
+ strict => $groups->{ids}->{$group}->{restricted},
+ comment => "Generated from HA group '$group'.",
+ };
+ $rules->{order}->{$new_ruleid} = PVE::HA::Rules::get_next_ordinal($rules);
+ }
+}
+
1;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 88ff4a6..148447d 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -6,6 +6,7 @@ use warnings;
use Digest::MD5 qw(md5_base64);
use PVE::Tools;
+use PVE::HA::Groups;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
use PVE::HA::Rules;
@@ -47,6 +48,8 @@ sub new {
haenv => $haenv,
crs => {},
last_rules_digest => '',
+ last_groups_digest => '',
+ last_services_digest => '',
}, $class;
my $old_ms = $haenv->read_manager_status();
@@ -529,7 +532,7 @@ sub manage {
$self->update_crs_scheduler_mode();
- my $sc = $haenv->read_service_config();
+ my ($sc, $services_digest) = $haenv->read_service_config();
$self->{groups} = $haenv->read_group_config(); # update
@@ -564,7 +567,16 @@ sub manage {
my $new_rules = $haenv->read_rules_config();
- if ($new_rules->{digest} ne $self->{last_rules_digest}) {
+ # TODO PVE 10: Remove group migration when HA groups have been fully migrated to rules
+ PVE::HA::Groups::migrate_groups_to_resources($self->{groups}, $sc);
+
+ if (
+ !$self->{rules}
+ || $new_rules->{digest} ne $self->{last_rules_digest}
+ || $self->{groups}->{digest} ne $self->{last_groups_digest}
+ || $services_digest && $services_digest ne $self->{last_services_digest}
+ ) {
+ PVE::HA::Groups::migrate_groups_to_rules($new_rules, $self->{groups}, $sc);
my $messages = PVE::HA::Rules->canonicalize($new_rules);
$haenv->log('info', $_) for @$messages;
@@ -572,6 +584,8 @@ sub manage {
$self->{rules} = $new_rules;
$self->{last_rules_digest} = $self->{rules}->{digest};
+ $self->{last_groups_digest} = $self->{groups}->{digest};
+ $self->{last_services_digest} = $services_digest;
}
$self->update_crm_commands();
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [pve-devel] [PATCH ha-manager v3 11/15] manager: apply node affinity rules when selecting service nodes
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (10 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 10/15] manager: migrate ha groups to node affinity rules in-memory Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 12/15] test: add test cases for rules config Daniel Kral
` (7 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Replace the HA group mechanism with the functionally equivalent
get_node_affinity(...) helper, which enforces the node affinity rules
defined in the rules config.
This allows the $groups parameter to be replaced with the $rules
parameter in select_service_node(...) as all behavior of the HA groups
is now encoded in $service_conf and $rules.
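Condensed to the call shape for reviewers (parameter names taken from
the updated POD below; a sketch, not a full call site), callers now pass
the rules instead of the groups:
    my $node = select_service_node(
        $rules, $online_node_usage, $sid, $service_conf, $sd, $node_preference,
    );
and the new helper is used as
    my ($allowed_nodes, $preferred_nodes) =
        get_node_affinity($rules, $sid, $online_node_usage);
where $allowed_nodes is the hash set of nodes the resource may run on
and $preferred_nodes the subset with the highest configured priority.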
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Manager.pm | 83 ++++++--------------------------
src/PVE/HA/Rules/NodeAffinity.pm | 83 ++++++++++++++++++++++++++++++++
src/test/test_failover1.pl | 16 ++++--
3 files changed, 110 insertions(+), 72 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 148447d..4357253 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -10,7 +10,7 @@ use PVE::HA::Groups;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
use PVE::HA::Rules;
-use PVE::HA::Rules::NodeAffinity;
+use PVE::HA::Rules::NodeAffinity qw(get_node_affinity);
use PVE::HA::Usage::Basic;
use PVE::HA::Usage::Static;
@@ -114,57 +114,13 @@ sub flush_master_status {
$haenv->write_manager_status($ms);
}
-sub get_service_group {
- my ($groups, $online_node_usage, $service_conf) = @_;
-
- my $group = {};
- # add all online nodes to default group to allow try_next when no group set
- $group->{nodes}->{$_} = 1 for $online_node_usage->list_nodes();
-
- # overwrite default if service is bound to a specific group
- if (my $group_id = $service_conf->{group}) {
- $group = $groups->{ids}->{$group_id} if $groups->{ids}->{$group_id};
- }
-
- return $group;
-}
-
-# groups available nodes with their priority as group index
-sub get_node_priority_groups {
- my ($group, $online_node_usage) = @_;
-
- my $pri_groups = {};
- my $group_members = {};
- foreach my $entry (keys %{ $group->{nodes} }) {
- my ($node, $pri) = ($entry, 0);
- if ($entry =~ m/^(\S+):(\d+)$/) {
- ($node, $pri) = ($1, $2);
- }
- next if !$online_node_usage->contains_node($node); # offline
- $pri_groups->{$pri}->{$node} = 1;
- $group_members->{$node} = $pri;
- }
-
- # add non-group members to unrestricted groups (priority -1)
- if (!$group->{restricted}) {
- my $pri = -1;
- for my $node ($online_node_usage->list_nodes()) {
- next if defined($group_members->{$node});
- $pri_groups->{$pri}->{$node} = 1;
- $group_members->{$node} = -1;
- }
- }
-
- return ($pri_groups, $group_members);
-}
-
=head3 select_service_node(...)
-=head3 select_service_node($groups, $online_node_usage, $sid, $service_conf, $sd, $node_preference)
+=head3 select_service_node($rules, $online_node_usage, $sid, $service_conf, $sd, $node_preference)
Used to select the best fitting node for the service C<$sid>, with the
-configuration C<$service_conf> and state C<$sd>, according to the groups defined
-in C<$groups>, available node utilization in C<$online_node_usage>, and the
+configuration C<$service_conf> and state C<$sd>, according to the rules defined
+in C<$rules>, available node utilization in C<$online_node_usage>, and the
given C<$node_preference>.
The C<$node_preference> can be set to:
@@ -182,7 +138,7 @@ The C<$node_preference> can be set to:
=cut
sub select_service_node {
- my ($groups, $online_node_usage, $sid, $service_conf, $sd, $node_preference) = @_;
+ my ($rules, $online_node_usage, $sid, $service_conf, $sd, $node_preference) = @_;
die "'$node_preference' is not a valid node_preference for select_service_node\n"
if $node_preference !~ m/(none|best-score|try-next)/;
@@ -190,42 +146,35 @@ sub select_service_node {
my ($current_node, $tried_nodes, $maintenance_fallback) =
$sd->@{qw(node failed_nodes maintenance_node)};
- my $group = get_service_group($groups, $online_node_usage, $service_conf);
+ my ($allowed_nodes, $pri_nodes) = get_node_affinity($rules, $sid, $online_node_usage);
- my ($pri_groups, $group_members) = get_node_priority_groups($group, $online_node_usage);
-
- my @pri_list = sort { $b <=> $a } keys %$pri_groups;
- return undef if !scalar(@pri_list);
+ return undef if !%$pri_nodes;
# stay on current node if possible (avoids random migrations)
if (
$node_preference eq 'none'
- && $group->{nofailback}
- && defined($group_members->{$current_node})
+ && !$service_conf->{failback}
+ && $allowed_nodes->{$current_node}
) {
return $current_node;
}
- # select node from top priority node list
-
- my $top_pri = $pri_list[0];
-
# try to avoid nodes where the service failed already if we want to relocate
if ($node_preference eq 'try-next') {
foreach my $node (@$tried_nodes) {
- delete $pri_groups->{$top_pri}->{$node};
+ delete $pri_nodes->{$node};
}
}
return $maintenance_fallback
- if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
+ if defined($maintenance_fallback) && $pri_nodes->{$maintenance_fallback};
- return $current_node if $node_preference eq 'none' && $pri_groups->{$top_pri}->{$current_node};
+ return $current_node if $node_preference eq 'none' && $pri_nodes->{$current_node};
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
$scores->{$a} <=> $scores->{$b} || $a cmp $b
- } keys %{ $pri_groups->{$top_pri} };
+ } keys %$pri_nodes;
my $found;
for (my $i = scalar(@nodes) - 1; $i >= 0; $i--) {
@@ -843,7 +792,7 @@ sub next_state_request_start {
if ($self->{crs}->{rebalance_on_request_start}) {
my $selected_node = select_service_node(
- $self->{groups},
+ $self->{rules},
$self->{online_node_usage},
$sid,
$cd,
@@ -1010,7 +959,7 @@ sub next_state_started {
}
my $node = select_service_node(
- $self->{groups},
+ $self->{rules},
$self->{online_node_usage},
$sid,
$cd,
@@ -1128,7 +1077,7 @@ sub next_state_recovery {
$self->recompute_online_node_usage(); # we want the most current node state
my $recovery_node = select_service_node(
- $self->{groups},
+ $self->{rules},
$self->{online_node_usage},
$sid,
$cd,
diff --git a/src/PVE/HA/Rules/NodeAffinity.pm b/src/PVE/HA/Rules/NodeAffinity.pm
index 2b3d739..0331399 100644
--- a/src/PVE/HA/Rules/NodeAffinity.pm
+++ b/src/PVE/HA/Rules/NodeAffinity.pm
@@ -12,8 +12,13 @@ use PVE::Tools;
use PVE::HA::Rules;
use PVE::HA::Tools;
+use base qw(Exporter);
use base qw(PVE::HA::Rules);
+our @EXPORT_OK = qw(
+ get_node_affinity
+);
+
=head1 NAME
PVE::HA::Rules::NodeAffinity
@@ -210,4 +215,82 @@ __PACKAGE__->register_check(
},
);
+=head1 NODE AFFINITY RULE HELPERS
+
+=cut
+
+my $get_resource_node_affinity_rule = sub {
+ my ($rules, $sid) = @_;
+
+ # with the current restriction a resource can only be in one node affinity rule
+ my $node_affinity_rule;
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule) = @_;
+
+ $node_affinity_rule = dclone($rule) if !$node_affinity_rule;
+ },
+ {
+ sid => $sid,
+ type => 'node-affinity',
+ exclude_disabled_rules => 1,
+ },
+ );
+
+ return $node_affinity_rule;
+};
+
+=head3 get_node_affinity($rules, $sid, $online_node_usage)
+
+Returns a list of two hashes representing the node affinity of C<$sid>
+according to the node affinity rules in C<$rules> and the available nodes in
+C<$online_node_usage>.
+
+The first hash is a hash set of available nodes, i.e. nodes where the
+resource C<$sid> is allowed to be assigned to, and the second hash is a hash set
+of preferred nodes, i.e. nodes where the resource C<$sid> should be assigned to.
+
+If there are no available nodes at all, both returned hash sets will be empty.
+
+=cut
+
+sub get_node_affinity : prototype($$$) {
+ my ($rules, $sid, $online_node_usage) = @_;
+
+ my $node_affinity_rule = $get_resource_node_affinity_rule->($rules, $sid);
+
+ # default to a node affinity rule with all available nodes
+ if (!$node_affinity_rule) {
+ for my $node ($online_node_usage->list_nodes()) {
+ $node_affinity_rule->{nodes}->{$node} = { priority => 0 };
+ }
+ }
+
+ # add remaining nodes with low priority for non-strict node affinity rules
+ if (!$node_affinity_rule->{strict}) {
+ for my $node ($online_node_usage->list_nodes()) {
+ next if defined($node_affinity_rule->{nodes}->{$node});
+
+ $node_affinity_rule->{nodes}->{$node} = { priority => -1 };
+ }
+ }
+
+ my $allowed_nodes = {};
+ my $prioritized_nodes = {};
+
+ while (my ($node, $props) = each %{ $node_affinity_rule->{nodes} }) {
+ next if !$online_node_usage->contains_node($node); # node is offline
+
+ $allowed_nodes->{$node} = 1;
+ $prioritized_nodes->{ $props->{priority} }->{$node} = 1;
+ }
+
+ my $preferred_nodes = {};
+ my $highest_priority = (sort { $b <=> $a } keys %$prioritized_nodes)[0];
+ $preferred_nodes = $prioritized_nodes->{$highest_priority} if defined($highest_priority);
+
+ return ($allowed_nodes, $preferred_nodes);
+}
+
1;
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index f6faa38..78a001e 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -4,12 +4,19 @@ use strict;
use warnings;
use lib '..';
-use PVE::HA::Groups;
use PVE::HA::Manager;
use PVE::HA::Usage::Basic;
-my $groups = PVE::HA::Groups->parse_config("groups.tmp", <<EOD);
-group: prefer_node1
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
+
+PVE::HA::Rules::NodeAffinity->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
+my $rules = PVE::HA::Rules->parse_config("rules.tmp", <<EOD);
+node-affinity: prefer_node1
+ resources vm:111
nodes node1
EOD
@@ -21,7 +28,6 @@ $online_node_usage->add_node("node3");
my $service_conf = {
node => 'node1',
- group => 'prefer_node1',
failback => 1,
};
@@ -37,7 +43,7 @@ sub test {
my $select_node_preference = $try_next ? 'try-next' : 'none';
my $node = PVE::HA::Manager::select_service_node(
- $groups, $online_node_usage, "vm:111", $service_conf, $sd, $select_node_preference,
+ $rules, $online_node_usage, "vm:111", $service_conf, $sd, $select_node_preference,
);
my (undef, undef, $line) = caller();
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [pve-devel] [PATCH ha-manager v3 12/15] test: add test cases for rules config
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (11 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 11/15] manager: apply node affinity rules when selecting service nodes Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 13/15] api: introduce ha rules api endpoints Daniel Kral
` (6 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Add test cases to verify that the rule checkers correctly identify and
drop HA rules from the rule set to keep it feasible. For now, there are
only HA Node Affinity rules; the test cases verify that:
- Node Affinity rules retrieve the correct optional default values
- Node Affinity rules that reference the same HA resource more than
  once are dropped from the rule set
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
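For reference, a short sketch of how the new test script is meant to be
used, derived from its option handling and diff logic below:

    # run all rule config tests
    ./test_rules_config.pl
    # check a single config; a missing .cfg.expect file is (re)created from the output
    ./test_rules_config.pl rules_cfgs/defaults-for-node-affinity-rules.cfg
    # only write the .cfg.output file, without diffing it against the .cfg.expect file
    ./test_rules_config.pl rules_cfgs/defaults-for-node-affinity-rules.cfg --nodiff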
.gitignore | 1 +
src/test/Makefile | 4 +-
.../defaults-for-node-affinity-rules.cfg | 22 ++++
...efaults-for-node-affinity-rules.cfg.expect | 60 +++++++++++
...e-resource-refs-in-node-affinity-rules.cfg | 31 ++++++
...rce-refs-in-node-affinity-rules.cfg.expect | 63 +++++++++++
src/test/test_rules_config.pl | 100 ++++++++++++++++++
7 files changed, 280 insertions(+), 1 deletion(-)
create mode 100644 src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg
create mode 100644 src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg.expect
create mode 100644 src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg
create mode 100644 src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg.expect
create mode 100755 src/test/test_rules_config.pl
diff --git a/.gitignore b/.gitignore
index c35280e..35de63f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,4 @@
/src/test/test-*/status/*
/src/test/fence_cfgs/*.cfg.commands
/src/test/fence_cfgs/*.cfg.write
+/src/test/rules_cfgs/*.cfg.output
diff --git a/src/test/Makefile b/src/test/Makefile
index e54959f..6da9e10 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -5,6 +5,7 @@ all:
test:
@echo "-- start regression tests --"
./test_failover1.pl
+ ./test_rules_config.pl
./ha-tester.pl
./test_fence_config.pl
@echo "-- end regression tests (success) --"
@@ -12,4 +13,5 @@ test:
.PHONY: clean
clean:
rm -rf *~ test-*/log test-*/*~ test-*/status \
- fence_cfgs/*.cfg.commands fence_cfgs/*.write
+ fence_cfgs/*.cfg.commands fence_cfgs/*.write \
+ rules_cfgs/*.cfg.output
diff --git a/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg b/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg
new file mode 100644
index 0000000..c8b2f2d
--- /dev/null
+++ b/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg
@@ -0,0 +1,22 @@
+# Case 1: Node Affinity rules are enabled and loose by default, so set it so if it isn't yet.
+node-affinity: node-affinity-defaults
+ resources vm:101
+ nodes node1
+
+# Case 2: Node Affinity rule is disabled, it shouldn't be enabled afterwards.
+node-affinity: node-affinity-disabled
+ resources vm:102
+ nodes node2
+ disable
+
+# Case 3: Node Affinity rule is disabled with explicit 1 set, it shouldn't be enabled afterwards.
+node-affinity: node-affinity-disabled-explicit
+ resources vm:103
+ nodes node2
+ disable 1
+
+# Case 4: Node Affinity rule is set to strict, so it shouldn't be loose afterwards.
+node-affinity: node-affinity-strict
+ resources vm:104
+ nodes node3
+ strict 1
diff --git a/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg.expect b/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg.expect
new file mode 100644
index 0000000..59a2c36
--- /dev/null
+++ b/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg.expect
@@ -0,0 +1,60 @@
+--- Log ---
+--- Config ---
+$VAR1 = {
+ 'digest' => 'c96c9de143221a82e44efa8bb4814b8248a8ea11',
+ 'ids' => {
+ 'node-affinity-defaults' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ }
+ },
+ 'resources' => {
+ 'vm:101' => 1
+ },
+ 'type' => 'node-affinity'
+ },
+ 'node-affinity-disabled' => {
+ 'disable' => 1,
+ 'nodes' => {
+ 'node2' => {
+ 'priority' => 0
+ }
+ },
+ 'resources' => {
+ 'vm:102' => 1
+ },
+ 'type' => 'node-affinity'
+ },
+ 'node-affinity-disabled-explicit' => {
+ 'disable' => 1,
+ 'nodes' => {
+ 'node2' => {
+ 'priority' => 0
+ }
+ },
+ 'resources' => {
+ 'vm:103' => 1
+ },
+ 'type' => 'node-affinity'
+ },
+ 'node-affinity-strict' => {
+ 'nodes' => {
+ 'node3' => {
+ 'priority' => 0
+ }
+ },
+ 'resources' => {
+ 'vm:104' => 1
+ },
+ 'strict' => 1,
+ 'type' => 'node-affinity'
+ }
+ },
+ 'order' => {
+ 'node-affinity-defaults' => 1,
+ 'node-affinity-disabled' => 2,
+ 'node-affinity-disabled-explicit' => 3,
+ 'node-affinity-strict' => 4
+ }
+ };
diff --git a/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg b/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg
new file mode 100644
index 0000000..1e279e7
--- /dev/null
+++ b/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg
@@ -0,0 +1,31 @@
+# Case 1: Do not remove two Node Affinity rules, which do not share resources.
+node-affinity: no-same-resource1
+ resources vm:101,vm:102,vm:103
+ nodes node1,node2:2
+ strict 0
+
+node-affinity: no-same-resource2
+ resources vm:104,vm:105
+ nodes node1,node2:2
+ strict 0
+
+node-affinity: no-same-resource3
+ resources vm:106
+ nodes node1,node2:2
+ strict 1
+
+# Case 2: Remove Node Affinity rules, which share the same resource between them.
+node-affinity: same-resource1
+ resources vm:201
+ nodes node1,node2:2
+ strict 0
+
+node-affinity: same-resource2
+ resources vm:201,vm:202
+ nodes node3
+ strict 1
+
+node-affinity: same-resource3
+ resources vm:201,vm:203,vm:204
+ nodes node1:2,node3:3
+ strict 0
diff --git a/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg.expect b/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg.expect
new file mode 100644
index 0000000..3fd0c9c
--- /dev/null
+++ b/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg.expect
@@ -0,0 +1,63 @@
+--- Log ---
+Drop rule 'same-resource1', because resource 'vm:201' is already used in another node affinity rule.
+Drop rule 'same-resource2', because resource 'vm:201' is already used in another node affinity rule.
+Drop rule 'same-resource3', because resource 'vm:201' is already used in another node affinity rule.
+--- Config ---
+$VAR1 = {
+ 'digest' => '5865d23b1a342e7f8cfa68bd0e1da556ca8d28a6',
+ 'ids' => {
+ 'no-same-resource1' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'resources' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ 'vm:103' => 1
+ },
+ 'strict' => 0,
+ 'type' => 'node-affinity'
+ },
+ 'no-same-resource2' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'resources' => {
+ 'vm:104' => 1,
+ 'vm:105' => 1
+ },
+ 'strict' => 0,
+ 'type' => 'node-affinity'
+ },
+ 'no-same-resource3' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'resources' => {
+ 'vm:106' => 1
+ },
+ 'strict' => 1,
+ 'type' => 'node-affinity'
+ }
+ },
+ 'order' => {
+ 'no-same-resource1' => 1,
+ 'no-same-resource2' => 2,
+ 'no-same-resource3' => 3
+ }
+ };
diff --git a/src/test/test_rules_config.pl b/src/test/test_rules_config.pl
new file mode 100755
index 0000000..824afed
--- /dev/null
+++ b/src/test/test_rules_config.pl
@@ -0,0 +1,100 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+use Getopt::Long;
+
+use lib qw(..);
+
+use Test::More;
+use Test::MockModule;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
+
+PVE::HA::Rules::NodeAffinity->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
+my $opt_nodiff;
+
+if (!GetOptions("nodiff" => \$opt_nodiff)) {
+ print "usage: $0 [test.cfg] [--nodiff]\n";
+ exit -1;
+}
+
+sub _log {
+ my ($fh, $source, $message) = @_;
+
+ chomp $message;
+ $message = "[$source] $message" if $source;
+
+ print "$message\n";
+
+ $fh->print("$message\n");
+ $fh->flush();
+}
+
+sub check_cfg {
+ my ($cfg_fn, $outfile) = @_;
+
+ my $raw = PVE::Tools::file_get_contents($cfg_fn);
+
+ open(my $LOG, '>', "$outfile");
+ select($LOG);
+ $| = 1;
+
+ print "--- Log ---\n";
+ my $cfg = PVE::HA::Rules->parse_config($cfg_fn, $raw);
+ PVE::HA::Rules->set_rule_defaults($_) for values %{ $cfg->{ids} };
+ my $messages = PVE::HA::Rules->canonicalize($cfg);
+ print $_ for @$messages;
+ print "--- Config ---\n";
+ {
+ local $Data::Dumper::Sortkeys = 1;
+ print Dumper($cfg);
+ }
+
+ select(STDOUT);
+}
+
+sub run_test {
+ my ($cfg_fn) = @_;
+
+ print "* check: $cfg_fn\n";
+
+ my $outfile = "$cfg_fn.output";
+ my $expect = "$cfg_fn.expect";
+
+ eval { check_cfg($cfg_fn, $outfile); };
+ if (my $err = $@) {
+ die "Test '$cfg_fn' failed:\n$err\n";
+ }
+
+ return if $opt_nodiff;
+
+ my $res;
+
+ if (-f $expect) {
+ my $cmd = ['diff', '-u', $expect, $outfile];
+ $res = system(@$cmd);
+ die "test '$cfg_fn' failed\n" if $res != 0;
+ } else {
+ $res = system('cp', $outfile, $expect);
+ die "test '$cfg_fn' failed\n" if $res != 0;
+ }
+
+ print "* end rules test: $cfg_fn (success)\n\n";
+}
+
+# exec tests
+
+if (my $testcfg = shift) {
+ run_test($testcfg);
+} else {
+ for my $cfg (<rules_cfgs/*cfg>) {
+ run_test($cfg);
+ }
+}
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [pve-devel] [PATCH ha-manager v3 13/15] api: introduce ha rules api endpoints
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (12 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 12/15] test: add test cases for rules config Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 14/15] cli: expose ha rules api endpoints to ha-manager cli Daniel Kral
` (5 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Add CRUD API endpoints for HA rules, which assert that the given rule
properties are valid and will not make the existing rule set infeasible.
Rejecting changes to the rule set via the API that would make this or
other rules infeasible makes it safer for users of the HA Manager, as
they cannot accidentally disrupt the behavior that other rules already
enforce.
This obviously cannot safeguard against manual changes to the rules
config file itself, but manual changes that result in infeasible rules
will be dropped with a log message on the next canonicalize(...) call by
the HA Manager anyway.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
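As a usage illustration only, since the API mount point is not part of
this patch: assuming the module ends up next to the existing HA
groups/resources handlers (e.g. under /cluster/ha/rules), creating and
inspecting rules could look roughly like:

    # hypothetical paths and option names; the real ones depend on where
    # the module is registered and on the create/update schemas
    pvesh get /cluster/ha/rules
    pvesh get /cluster/ha/rules/prefer-node1
    pvesh create /cluster/ha/rules --type node-affinity --rule prefer-node1 \
        --resources vm:101,vm:102 --nodes node1:2,node2
    pvesh delete /cluster/ha/rules/prefer-node1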
debian/pve-ha-manager.install | 1 +
src/PVE/API2/HA/Makefile | 2 +-
src/PVE/API2/HA/Rules.pm | 391 ++++++++++++++++++++++++++++++++++
3 files changed, 393 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/API2/HA/Rules.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 7462663..b4eff27 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -16,6 +16,7 @@
/usr/share/man/man8/pve-ha-lrm.8.gz
/usr/share/perl5/PVE/API2/HA/Groups.pm
/usr/share/perl5/PVE/API2/HA/Resources.pm
+/usr/share/perl5/PVE/API2/HA/Rules.pm
/usr/share/perl5/PVE/API2/HA/Status.pm
/usr/share/perl5/PVE/CLI/ha_manager.pm
/usr/share/perl5/PVE/HA/CRM.pm
diff --git a/src/PVE/API2/HA/Makefile b/src/PVE/API2/HA/Makefile
index 5686efc..86c1013 100644
--- a/src/PVE/API2/HA/Makefile
+++ b/src/PVE/API2/HA/Makefile
@@ -1,4 +1,4 @@
-SOURCES=Resources.pm Groups.pm Status.pm
+SOURCES=Resources.pm Groups.pm Rules.pm Status.pm
.PHONY: install
install:
diff --git a/src/PVE/API2/HA/Rules.pm b/src/PVE/API2/HA/Rules.pm
new file mode 100644
index 0000000..2e5e382
--- /dev/null
+++ b/src/PVE/API2/HA/Rules.pm
@@ -0,0 +1,391 @@
+package PVE::API2::HA::Rules;
+
+use strict;
+use warnings;
+
+use HTTP::Status qw(:constants);
+
+use Storable qw(dclone);
+
+use PVE::Cluster qw(cfs_read_file);
+use PVE::Exception;
+use PVE::Tools qw(extract_param);
+use PVE::JSONSchema qw(get_standard_option);
+
+use PVE::HA::Config;
+use PVE::HA::Groups;
+use PVE::HA::Rules;
+
+use base qw(PVE::RESTHandler);
+
+my $get_api_ha_rule = sub {
+ my ($rules, $ruleid, $rule_errors) = @_;
+
+ die "no such ha rule '$ruleid'\n" if !$rules->{ids}->{$ruleid};
+
+ my $rule_cfg = dclone($rules->{ids}->{$ruleid});
+
+ $rule_cfg->{rule} = $ruleid;
+ $rule_cfg->{digest} = $rules->{digest};
+ $rule_cfg->{order} = $rules->{order}->{$ruleid};
+
+ # set optional rule parameters' default values
+ PVE::HA::Rules->set_rule_defaults($rule_cfg);
+
+ if ($rule_cfg->{resources}) {
+ $rule_cfg->{resources} =
+ PVE::HA::Rules->encode_value($rule_cfg->{type}, 'resources', $rule_cfg->{resources});
+ }
+
+ if ($rule_cfg->{nodes}) {
+ $rule_cfg->{nodes} =
+ PVE::HA::Rules->encode_value($rule_cfg->{type}, 'nodes', $rule_cfg->{nodes});
+ }
+
+ if ($rule_errors) {
+ $rule_cfg->{errors} = $rule_errors;
+ }
+
+ return $rule_cfg;
+};
+
+my $assert_resources_are_configured = sub {
+ my ($resources) = @_;
+
+ my $unconfigured_resources = [];
+
+ for my $resource (sort keys %$resources) {
+ push @$unconfigured_resources, $resource
+ if !PVE::HA::Config::service_is_configured($resource);
+ }
+
+ die "cannot use unmanaged resource(s) " . join(', ', @$unconfigured_resources) . ".\n"
+ if @$unconfigured_resources;
+};
+
+my $assert_nodes_do_exist = sub {
+ my ($nodes) = @_;
+
+ my $nonexistent_nodes = [];
+
+ for my $node (sort keys %$nodes) {
+ push @$nonexistent_nodes, $node
+ if !PVE::Cluster::check_node_exists($node, 1);
+ }
+
+ die "cannot use non-existant node(s) " . join(', ', @$nonexistant_nodes) . ".\n"
+ if @$nonexistant_nodes;
+};
+
+my $get_full_rules_config = sub {
+ my ($rules) = @_;
+
+ # set optional rule parameters' default values
+ for my $rule (values %{ $rules->{ids} }) {
+ PVE::HA::Rules->set_rule_defaults($rule);
+ }
+
+ # TODO PVE 10: Remove group migration when HA groups have been fully migrated to node affinity rules
+ my $groups = PVE::HA::Config::read_group_config();
+ my $resources = PVE::HA::Config::read_and_check_resources_config();
+
+ PVE::HA::Groups::migrate_groups_to_rules($rules, $groups, $resources);
+
+ return $rules;
+};
+
+my $check_feasibility = sub {
+ my ($rules) = @_;
+
+ $rules = dclone($rules);
+
+ $rules = $get_full_rules_config->($rules);
+
+ return PVE::HA::Rules->check_feasibility($rules);
+};
+
+my $assert_feasibility = sub {
+ my ($rules, $ruleid) = @_;
+
+ my $global_errors = $check_feasibility->($rules);
+ my $rule_errors = $global_errors->{$ruleid};
+
+ return if !$rule_errors;
+
+ # stringify error messages
+ for my $opt (keys %$rule_errors) {
+ $rule_errors->{$opt} = join(', ', @{ $rule_errors->{$opt} });
+ }
+
+ my $param = {
+ code => HTTP_BAD_REQUEST,
+ errors => $rule_errors,
+ };
+
+ my $exc = PVE::Exception->new("Rule '$ruleid' is invalid.\n", %$param);
+
+ my ($pkg, $filename, $line) = caller;
+
+ $exc->{filename} = $filename;
+ $exc->{line} = $line;
+
+ die $exc;
+};
+
+__PACKAGE__->register_method({
+ name => 'index',
+ path => '',
+ method => 'GET',
+ description => "Get HA rules.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Audit']],
+ },
+ parameters => {
+ additionalProperties => 0,
+ properties => {
+ type => {
+ type => 'string',
+ description => "Limit the returned list to the specified rule type.",
+ enum => PVE::HA::Rules->lookup_types(),
+ optional => 1,
+ },
+ resource => {
+ type => 'string',
+ description =>
+ "Limit the returned list to rules affecting the specified resource.",
+ completion => \&PVE::HA::Tools::complete_sid,
+ optional => 1,
+ },
+ },
+ },
+ returns => {
+ type => 'array',
+ items => {
+ type => 'object',
+ properties => {
+ rule => { type => 'string' },
+ },
+ links => [{ rel => 'child', href => '{rule}' }],
+ },
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $type = extract_param($param, 'type');
+ my $state = extract_param($param, 'state');
+ my $resource = extract_param($param, 'resource');
+
+ my $rules = PVE::HA::Config::read_rules_config();
+ $rules = $get_full_rules_config->($rules);
+
+ my $global_errors = $check_feasibility->($rules);
+
+ my $res = [];
+
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule, $ruleid) = @_;
+
+ my $rule_errors = $global_errors->{$ruleid};
+ my $rule_cfg = $get_api_ha_rule->($rules, $ruleid, $rule_errors);
+
+ push @$res, $rule_cfg;
+ },
+ {
+ type => $type,
+ sid => $resource,
+ },
+ );
+
+ return $res;
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'read_rule',
+ method => 'GET',
+ path => '{rule}',
+ description => "Read HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Audit']],
+ },
+ parameters => {
+ additionalProperties => 0,
+ properties => {
+ rule => get_standard_option(
+ 'pve-ha-rule-id',
+ { completion => \&PVE::HA::Tools::complete_rule },
+ ),
+ },
+ },
+ returns => {
+ type => 'object',
+ properties => {
+ rule => get_standard_option('pve-ha-rule-id'),
+ type => {
+ type => 'string',
+ },
+ },
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $ruleid = extract_param($param, 'rule');
+
+ my $rules = PVE::HA::Config::read_rules_config();
+ $rules = $get_full_rules_config->($rules);
+
+ my $global_errors = $check_feasibility->($rules);
+ my $rule_errors = $global_errors->{$ruleid};
+
+ return $get_api_ha_rule->($rules, $ruleid, $rule_errors);
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'create_rule',
+ method => 'POST',
+ path => '',
+ description => "Create HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ protected => 1,
+ parameters => PVE::HA::Rules->createSchema(),
+ returns => {
+ type => 'null',
+ },
+ code => sub {
+ my ($param) = @_;
+
+ PVE::Cluster::check_cfs_quorum();
+ mkdir("/etc/pve/ha");
+
+ my $type = extract_param($param, 'type');
+ my $ruleid = extract_param($param, 'rule');
+
+ my $plugin = PVE::HA::Rules->lookup($type);
+
+ my $opts = $plugin->check_config($ruleid, $param, 1, 1);
+
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ die "HA rule '$ruleid' already defined\n" if $rules->{ids}->{$ruleid};
+
+ $assert_resources_are_configured->($opts->{resources});
+ $assert_nodes_do_exist->($opts->{nodes}) if $opts->{nodes};
+
+ $rules->{order}->{$ruleid} = PVE::HA::Rules::get_next_ordinal($rules);
+ $rules->{ids}->{$ruleid} = $opts;
+
+ $assert_feasibility->($rules, $ruleid);
+
+ PVE::HA::Config::write_rules_config($rules);
+ },
+ "create ha rule failed",
+ );
+
+ return undef;
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'update_rule',
+ method => 'PUT',
+ path => '{rule}',
+ description => "Update HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ protected => 1,
+ parameters => PVE::HA::Rules->updateSchema(),
+ returns => {
+ type => 'null',
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $ruleid = extract_param($param, 'rule');
+ my $digest = extract_param($param, 'digest');
+ my $delete = extract_param($param, 'delete');
+
+ if ($delete) {
+ $delete = [PVE::Tools::split_list($delete)];
+ }
+
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ PVE::SectionConfig::assert_if_modified($rules, $digest);
+
+ my $rule = $rules->{ids}->{$ruleid} || die "HA rule '$ruleid' does not exist\n";
+
+ my $type = $rule->{type};
+ my $plugin = PVE::HA::Rules->lookup($type);
+ my $opts = $plugin->check_config($ruleid, $param, 0, 1);
+
+ $assert_resources_are_configured->($opts->{resources});
+ $assert_nodes_do_exist->($opts->{nodes}) if $opts->{nodes};
+
+ my $options = $plugin->private()->{options}->{$type};
+ PVE::SectionConfig::delete_from_config($rule, $options, $opts, $delete);
+
+ $rule->{$_} = $opts->{$_} for keys $opts->%*;
+
+ $assert_feasibility->($rules, $ruleid);
+
+ PVE::HA::Config::write_rules_config($rules);
+ },
+ "update HA rules failed",
+ );
+
+ return undef;
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'delete_rule',
+ method => 'DELETE',
+ path => '{rule}',
+ description => "Delete HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ protected => 1,
+ parameters => {
+ additionalProperties => 0,
+ properties => {
+ rule => get_standard_option(
+ 'pve-ha-rule-id',
+ { completion => \&PVE::HA::Tools::complete_rule },
+ ),
+ },
+ },
+ returns => {
+ type => 'null',
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $ruleid = extract_param($param, 'rule');
+
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ delete $rules->{ids}->{$ruleid};
+
+ PVE::HA::Config::write_rules_config($rules);
+ },
+ "delete ha rule failed",
+ );
+
+ return undef;
+ },
+});
+
+1;
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [pve-devel] [PATCH ha-manager v3 14/15] cli: expose ha rules api endpoints to ha-manager cli
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (13 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 13/15] api: introduce ha rules api endpoints Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [RFC ha-manager v3 15/15] manager: persistently migrate ha groups to ha rules Daniel Kral
` (4 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Expose the HA rules API endpoints through the CLI in their own subcommand.
The names of the sub-subcommands are chosen to be consistent with the
other commands the ha-manager CLI provides for HA resources and groups,
but are grouped under a 'rules' subcommand.
The properties printed by the 'rules config' command are chosen to
reflect the columns shown for HA rules in the web interface.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
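A rough usage sketch of the new subcommand, based on the cmddef entries
below; the option names are taken from the node affinity rule properties
as used in the rules config (resources, nodes, strict), while the exact
set is defined by the create/update schemas:

    # list all rules / print the configured columns
    ha-manager rules list
    ha-manager rules config

    # add, update and remove a node affinity rule
    ha-manager rules add node-affinity prefer-node1 --resources vm:101,vm:102 --nodes node1:2,node2
    ha-manager rules set node-affinity prefer-node1 --strict 1
    ha-manager rules remove prefer-node1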
src/PVE/CLI/ha_manager.pm | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/src/PVE/CLI/ha_manager.pm b/src/PVE/CLI/ha_manager.pm
index ca230f2..ef936cd 100644
--- a/src/PVE/CLI/ha_manager.pm
+++ b/src/PVE/CLI/ha_manager.pm
@@ -17,6 +17,7 @@ use PVE::HA::Env::PVE2;
use PVE::HA::Tools;
use PVE::API2::HA::Resources;
use PVE::API2::HA::Groups;
+use PVE::API2::HA::Rules;
use PVE::API2::HA::Status;
use base qw(PVE::CLIHandler);
@@ -199,6 +200,37 @@ our $cmddef = {
groupremove => ["PVE::API2::HA::Groups", 'delete', ['group']],
groupset => ["PVE::API2::HA::Groups", 'update', ['group']],
+ rules => {
+ list => [
+ 'PVE::API2::HA::Rules',
+ 'index',
+ [],
+ {},
+ sub {
+ my ($data, $schema, $options) = @_;
+ PVE::CLIFormatter::print_api_result($data, $schema, undef, $options);
+ },
+ $PVE::RESTHandler::standard_output_options,
+ ],
+ config => [
+ 'PVE::API2::HA::Rules',
+ 'index',
+ ['rule'],
+ {},
+ sub {
+ my ($data, $schema, $options) = @_;
+ my $props_to_print = [
+ 'rule', 'type', 'state', 'affinity', 'strict', 'resources', 'nodes',
+ ];
+ PVE::CLIFormatter::print_api_result($data, $schema, $props_to_print, $options);
+ },
+ $PVE::RESTHandler::standard_output_options,
+ ],
+ add => ['PVE::API2::HA::Rules', 'create_rule', ['type', 'rule']],
+ remove => ['PVE::API2::HA::Rules', 'delete_rule', ['rule']],
+ set => ['PVE::API2::HA::Rules', 'update_rule', ['type', 'rule']],
+ },
+
add => ["PVE::API2::HA::Resources", 'create', ['sid']],
remove => ["PVE::API2::HA::Resources", 'delete', ['sid']],
set => ["PVE::API2::HA::Resources", 'update', ['sid']],
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [pve-devel] [RFC ha-manager v3 15/15] manager: persistently migrate ha groups to ha rules
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (14 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 14/15] cli: expose ha rules api endpoints to ha-manager cli Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH docs v3 1/1] ha: add documentation about ha rules and ha node affinity rules Daniel Kral
` (3 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Migrate the HA groups config to the HA resources and HA rules configs
persistently on disk, retrying until it succeeds.
The HA groups config is already migrated in-memory by the HA Manager, but
to use the groups persistently as HA node affinity rules, they must be
migrated to the HA rules config.
As the new 'failback' flag can only be read by newer HA Manager versions,
and the rules config cannot be read by older HA Manager versions at all,
the HA resources config can only be migrated and the HA groups config
only be deleted once all nodes are upgraded to a pve-manager version that
depends on a ha-manager package which can read and apply the HA rules.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
As already mentioned in the cover letter, this is only a rough draft of
how it should work. There are at least two things missing (and I'm
unfortunately sure there will be more edge cases than this happy path
covers), but I hope it is close enough to be a viable solution.
This also raises the question of how we should handle uses of the HA
groups API now, as those haven't been adapted yet. We could remove them
here, or make them (and the HA resources API 'group' property) translate
the HA groups to HA rules on every use...
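To make the upgrade gate explicit, here is a minimal standalone sketch of
the version check (mirroring the $has_node_min_version helper below, with
a hypothetical name): only the leading major.minor part of the pve-manager
version is compared, so pre-release suffixes like '~2' and the patch level
are ignored.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # sketch only: compare the leading major.minor of a node's pve-manager
    # version against a minimum version, as $has_node_min_version does
    sub version_at_least {
        my ($node_version, $min_version) = @_;
        my ($maj, $min) = $node_version =~ m/^(\d+)\.(\d+)/;
        my ($min_maj, $min_min) = $min_version =~ m/^(\d+)\.(\d+)/;
        return 0 if $maj < $min_maj;
        return 0 if $maj == $min_maj && $min < $min_min;
        return 1;
    }

    print version_at_least("9.0.0~2", "9.0.0~2"), "\n"; # 1, migration may proceed
    print version_at_least("8.4.1", "9.0.0~2"), "\n";   # 0, migration is aborted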
src/PVE/HA/Config.pm | 5 +
src/PVE/HA/Env.pm | 24 ++
src/PVE/HA/Env/PVE2.pm | 28 ++
src/PVE/HA/Manager.pm | 94 ++++++
src/PVE/HA/Sim/Env.pm | 30 ++
src/PVE/HA/Sim/Hardware.pm | 22 ++
src/test/test-group-migrate1/README | 10 +
src/test/test-group-migrate1/cmdlist | 3 +
src/test/test-group-migrate1/groups | 7 +
src/test/test-group-migrate1/hardware_status | 5 +
src/test/test-group-migrate1/log.expect | 306 +++++++++++++++++++
src/test/test-group-migrate1/manager_status | 1 +
src/test/test-group-migrate1/service_config | 5 +
src/test/test-group-migrate2/README | 10 +
src/test/test-group-migrate2/cmdlist | 3 +
src/test/test-group-migrate2/groups | 7 +
src/test/test-group-migrate2/hardware_status | 5 +
src/test/test-group-migrate2/log.expect | 47 +++
src/test/test-group-migrate2/manager_status | 1 +
src/test/test-group-migrate2/service_config | 5 +
20 files changed, 618 insertions(+)
create mode 100644 src/test/test-group-migrate1/README
create mode 100644 src/test/test-group-migrate1/cmdlist
create mode 100644 src/test/test-group-migrate1/groups
create mode 100644 src/test/test-group-migrate1/hardware_status
create mode 100644 src/test/test-group-migrate1/log.expect
create mode 100644 src/test/test-group-migrate1/manager_status
create mode 100644 src/test/test-group-migrate1/service_config
create mode 100644 src/test/test-group-migrate2/README
create mode 100644 src/test/test-group-migrate2/cmdlist
create mode 100644 src/test/test-group-migrate2/groups
create mode 100644 src/test/test-group-migrate2/hardware_status
create mode 100644 src/test/test-group-migrate2/log.expect
create mode 100644 src/test/test-group-migrate2/manager_status
create mode 100644 src/test/test-group-migrate2/service_config
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 424a6e1..59bafd7 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -234,6 +234,11 @@ sub read_group_config {
return cfs_read_file($ha_groups_config);
}
+sub delete_group_config {
+
+ unlink $ha_groups_config or die "failed to remove group config: $!\n";
+}
+
sub write_group_config {
my ($cfg) = @_;
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 5cee7b3..1325676 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -100,6 +100,12 @@ sub update_service_config {
return $self->{plug}->update_service_config($sid, $param);
}
+sub write_service_config {
+ my ($self, $conf) = @_;
+
+ $self->{plug}->write_service_config($conf);
+}
+
sub parse_sid {
my ($self, $sid) = @_;
@@ -137,12 +143,24 @@ sub read_rules_config {
return $self->{plug}->read_rules_config();
}
+sub write_rules_config {
+ my ($self, $rules) = @_;
+
+ $self->{plug}->write_rules_config($rules);
+}
+
sub read_group_config {
my ($self) = @_;
return $self->{plug}->read_group_config();
}
+sub delete_group_config {
+ my ($self) = @_;
+
+ $self->{plug}->delete_group_config();
+}
+
# this should return a hash containing info
# what nodes are members and online.
sub get_node_info {
@@ -288,4 +306,10 @@ sub get_static_node_stats {
return $self->{plug}->get_static_node_stats();
}
+sub get_node_version {
+ my ($self, $node) = @_;
+
+ return $self->{plug}->get_node_version($node);
+}
+
1;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 58fd36e..aecffc0 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -141,6 +141,12 @@ sub update_service_config {
return PVE::HA::Config::update_resources_config($sid, $param);
}
+sub write_service_config {
+ my ($self, $conf) = @_;
+
+ return PVE::HA::Config::write_resources_config($conf);
+}
+
sub parse_sid {
my ($self, $sid) = @_;
@@ -201,12 +207,24 @@ sub read_rules_config {
return PVE::HA::Config::read_and_check_rules_config();
}
+sub write_rules_config {
+ my ($self, $rules) = @_;
+
+ PVE::HA::Config::write_rules_config($rules);
+}
+
sub read_group_config {
my ($self) = @_;
return PVE::HA::Config::read_group_config();
}
+sub delete_group_config {
+ my ($self) = @_;
+
+ PVE::HA::Config::delete_group_config();
+}
+
# this should return a hash containing info
# what nodes are members and online.
sub get_node_info {
@@ -489,4 +507,14 @@ sub get_static_node_stats {
return $stats;
}
+sub get_node_version {
+ my ($self, $node) = @_;
+
+ my $version_info = PVE::Cluster::get_node_kv('version-info', $node);
+
+ return undef if !$version_info->{$node};
+
+ return $version_info->{$node}->{version};
+}
+
1;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 4357253..b2fd896 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -464,6 +464,97 @@ sub update_crm_commands {
}
+# TODO This call needs to be abstracted into the environment, as the test
+# environment has 3 default groups and would then evaluate to true for each test
+sub have_groups_been_migrated {
+ my ($haenv) = @_;
+
+ my $groups = $haenv->read_group_config();
+
+ return 1 if !$groups;
+ return keys $groups->{ids}->%* < 1;
+}
+
+my $get_version_parts = sub {
+ my ($node_version) = @_;
+
+ my ($maj, $min) = $node_version =~ m/^(\d+)\.(\d+)/;
+
+ return ($maj, $min);
+};
+
+my $has_node_min_version = sub {
+ my ($node_version, $min_version) = @_;
+
+ my ($min_major, $min_minor) = $get_version_parts->($min_version);
+ my ($maj, $min) = $get_version_parts->($node_version);
+
+ return 0 if $maj < $min_major;
+ return 0 if $maj == $min_major && $min < $min_minor;
+
+ return 1;
+};
+
+# TODO PVE 10: Remove group migration when HA groups have been fully migrated to rules
+sub migrate_groups_persistently {
+ my ($haenv, $ns) = @_;
+
+ return 1 if have_groups_been_migrated($haenv); # groups already migrated
+
+ $haenv->log('notice', "Start migrating HA groups...");
+
+ # NOTE pve-manager has a version dependency on the ha-manager which supports HA rules
+ my $HA_RULES_MINVERSION = "9.0.0~2";
+
+ eval {
+ my $resources = $haenv->read_service_config();
+ my $groups = $haenv->read_group_config();
+ my $rules = $haenv->read_rules_config();
+
+ # write changes to rules config whenever possible to allow users to
+ # already modify migrated rules
+ PVE::HA::Groups::migrate_groups_to_rules($rules, $groups, $resources);
+ $haenv->write_rules_config($rules);
+ $haenv->log('notice', "HA groups to rules config migration successful");
+
+ for my $node ($ns->list_nodes()->@*) {
+ my $node_status = $ns->get_node_state($node);
+ $haenv->log(
+ 'notice',
+ "node '$node' is in state '$node_status' during HA group migration.",
+ );
+ die "node '$node' is not online\n" if $node_status ne 'online';
+
+ my $node_version = $haenv->get_node_version($node);
+ die "could not retrieve version from node '$node'\n" if !$node_version;
+ $haenv->log('notice', "Node '$node' has pve-manager version '$node_version'");
+
+ my $has_min_version = $has_node_min_version->($node_version, $HA_RULES_MINVERSION);
+
+ die "node '$node' needs at least '$HA_RULES_MINVERSION' to migrate HA groups\n"
+ if !$has_min_version;
+ }
+
+ # write changes to resources config only after node checks, because old
+ # nodes cannot read the 'failback' flag yet
+ PVE::HA::Groups::migrate_groups_to_resources($groups, $resources);
+ $haenv->write_service_config($resources);
+ $haenv->log('notice', "HA groups to services config migration successful");
+
+ $haenv->delete_group_config();
+
+ $haenv->log('notice', "HA groups config deletion successful");
+ };
+ if (my $err = $@) {
+ $haenv->log('err', "Abort HA group migration: $err");
+ return 0;
+ }
+
+ $haenv->log('notice', "HA groups migration successful");
+
+ return 1;
+}
+
sub manage {
my ($self) = @_;
@@ -481,6 +572,9 @@ sub manage {
$self->update_crs_scheduler_mode();
+ # TODO Should only be run every couple of manager rounds
+ migrate_groups_persistently($haenv, $ns) if !have_groups_been_migrated($haenv);
+
my ($sc, $services_digest) = $haenv->read_service_config();
$self->{groups} = $haenv->read_group_config(); # update
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index bb76b7f..446071d 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -215,6 +215,14 @@ sub update_service_config {
return $self->{hardware}->update_service_config($sid, $param);
}
+sub write_service_config {
+ my ($self, $conf) = @_;
+
+ $assert_cfs_can_rw->($self);
+
+ $self->{hardware}->write_service_config($conf);
+}
+
sub parse_sid {
my ($self, $sid) = @_;
@@ -259,6 +267,14 @@ sub read_rules_config {
return $self->{hardware}->read_rules_config();
}
+sub write_rules_config {
+ my ($self, $rules) = @_;
+
+ $assert_cfs_can_rw->($self);
+
+ $self->{hardware}->write_rules_config($rules);
+}
+
sub read_group_config {
my ($self) = @_;
@@ -267,6 +283,14 @@ sub read_group_config {
return $self->{hardware}->read_group_config();
}
+sub delete_group_config {
+ my ($self) = @_;
+
+ $assert_cfs_can_rw->($self);
+
+ $self->{hardware}->delete_group_config();
+}
+
# this is normally only allowed by the master to recover a _fenced_ service
sub steal_service {
my ($self, $sid, $current_node, $new_node) = @_;
@@ -468,4 +492,10 @@ sub get_static_node_stats {
return $self->{hardware}->get_static_node_stats();
}
+sub get_node_version {
+ my ($self, $node) = @_;
+
+ return $self->{hardware}->get_node_version($node);
+}
+
1;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 579be2a..a44bc9b 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -337,6 +337,13 @@ sub read_rules_config {
return $rules;
}
+sub write_rules_config {
+ my ($self, $rules) = @_;
+
+ my $filename = "$self->{statusdir}/rules_config";
+ PVE::HA::Rules->write_config($filename, $rules);
+}
+
sub read_group_config {
my ($self) = @_;
@@ -347,6 +354,13 @@ sub read_group_config {
return PVE::HA::Groups->parse_config($filename, $raw);
}
+sub delete_group_config {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/groups";
+ unlink $filename or die "failed to remove group config: $!\n";
+}
+
sub read_service_status {
my ($self, $node) = @_;
@@ -942,4 +956,12 @@ sub get_static_node_stats {
return $stats;
}
+sub get_node_version {
+ my ($self, $node) = @_;
+
+ my $cstatus = $self->read_hardware_status_nolock();
+
+ return $cstatus->{$node}->{version} // "9.0.0~2";
+}
+
1;
diff --git a/src/test/test-group-migrate1/README b/src/test/test-group-migrate1/README
new file mode 100644
index 0000000..8775b6c
--- /dev/null
+++ b/src/test/test-group-migrate1/README
@@ -0,0 +1,10 @@
+Test whether a service in an unrestricted group will automatically migrate back
+to a member node in case of a manual migration to a non-member node.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is a group member and has higher priority than the other nodes
diff --git a/src/test/test-group-migrate1/cmdlist b/src/test/test-group-migrate1/cmdlist
new file mode 100644
index 0000000..3bfad44
--- /dev/null
+++ b/src/test/test-group-migrate1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"]
+]
diff --git a/src/test/test-group-migrate1/groups b/src/test/test-group-migrate1/groups
new file mode 100644
index 0000000..bad746c
--- /dev/null
+++ b/src/test/test-group-migrate1/groups
@@ -0,0 +1,7 @@
+group: group1
+ nodes node1
+ restricted 1
+
+group: group2
+ nodes node2:2,node3
+ nofailback 1
diff --git a/src/test/test-group-migrate1/hardware_status b/src/test/test-group-migrate1/hardware_status
new file mode 100644
index 0000000..d66fbf0
--- /dev/null
+++ b/src/test/test-group-migrate1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "version": "9.0.0~2" },
+ "node2": { "power": "off", "network": "off", "version": "9.0.0~2" },
+ "node3": { "power": "off", "network": "off", "version": "8.4.1" }
+}
diff --git a/src/test/test-group-migrate1/log.expect b/src/test/test-group-migrate1/log.expect
new file mode 100644
index 0000000..d0ddf3d
--- /dev/null
+++ b/src/test/test-group-migrate1/log.expect
@@ -0,0 +1,306 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+noti 20 node1/crm: Start migrating HA groups...
+noti 20 node1/crm: HA groups to rules config migration successful
+noti 20 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 20 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 20 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 20 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 20 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 20 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 20 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+noti 40 node1/crm: Start migrating HA groups...
+noti 40 node1/crm: HA groups to rules config migration successful
+noti 40 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 40 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 40 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 40 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 40 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 40 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 40 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 60 node1/crm: Start migrating HA groups...
+noti 60 node1/crm: HA groups to rules config migration successful
+noti 60 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 60 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 60 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 60 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 60 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 60 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 60 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 80 node1/crm: Start migrating HA groups...
+noti 80 node1/crm: HA groups to rules config migration successful
+noti 80 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 80 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 80 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 80 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 80 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 80 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 80 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 100 node1/crm: Start migrating HA groups...
+noti 100 node1/crm: HA groups to rules config migration successful
+noti 100 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 100 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 100 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 100 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 100 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 100 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 100 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 120 node1/crm: Start migrating HA groups...
+noti 120 node1/crm: HA groups to rules config migration successful
+noti 120 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 120 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 120 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 120 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 120 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 120 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 120 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 140 node1/crm: Start migrating HA groups...
+noti 140 node1/crm: HA groups to rules config migration successful
+noti 140 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 140 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 140 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 140 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 140 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 140 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 140 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 160 node1/crm: Start migrating HA groups...
+noti 160 node1/crm: HA groups to rules config migration successful
+noti 160 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 160 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 160 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 160 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 160 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 160 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 160 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 180 node1/crm: Start migrating HA groups...
+noti 180 node1/crm: HA groups to rules config migration successful
+noti 180 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 180 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 180 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 180 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 180 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 180 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 180 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 200 node1/crm: Start migrating HA groups...
+noti 200 node1/crm: HA groups to rules config migration successful
+noti 200 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 200 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 200 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 200 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 200 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 200 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 200 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 220 node1/crm: Start migrating HA groups...
+noti 220 node1/crm: HA groups to rules config migration successful
+noti 220 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 220 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 220 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 220 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 220 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 220 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 220 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 240 node1/crm: Start migrating HA groups...
+noti 240 node1/crm: HA groups to rules config migration successful
+noti 240 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 240 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 240 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 240 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 240 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 240 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 240 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 260 node1/crm: Start migrating HA groups...
+noti 260 node1/crm: HA groups to rules config migration successful
+noti 260 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 260 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 260 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 260 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 260 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 260 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 260 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 280 node1/crm: Start migrating HA groups...
+noti 280 node1/crm: HA groups to rules config migration successful
+noti 280 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 280 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 280 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 280 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 280 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 280 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 280 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 300 node1/crm: Start migrating HA groups...
+noti 300 node1/crm: HA groups to rules config migration successful
+noti 300 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 300 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 300 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 300 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 300 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 300 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 300 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 320 node1/crm: Start migrating HA groups...
+noti 320 node1/crm: HA groups to rules config migration successful
+noti 320 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 320 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 320 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 320 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 320 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 320 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 320 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 340 node1/crm: Start migrating HA groups...
+noti 340 node1/crm: HA groups to rules config migration successful
+noti 340 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 340 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 340 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 340 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 340 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 340 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 340 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 360 node1/crm: Start migrating HA groups...
+noti 360 node1/crm: HA groups to rules config migration successful
+noti 360 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 360 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 360 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 360 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 360 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 360 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 360 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 380 node1/crm: Start migrating HA groups...
+noti 380 node1/crm: HA groups to rules config migration successful
+noti 380 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 380 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 380 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 380 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 380 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 380 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 380 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 400 node1/crm: Start migrating HA groups...
+noti 400 node1/crm: HA groups to rules config migration successful
+noti 400 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 400 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 400 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 400 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 400 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 400 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 400 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 420 node1/crm: Start migrating HA groups...
+noti 420 node1/crm: HA groups to rules config migration successful
+noti 420 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 420 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 420 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 420 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 420 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 420 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 420 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 440 node1/crm: Start migrating HA groups...
+noti 440 node1/crm: HA groups to rules config migration successful
+noti 440 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 440 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 440 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 440 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 440 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 440 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 440 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 460 node1/crm: Start migrating HA groups...
+noti 460 node1/crm: HA groups to rules config migration successful
+noti 460 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 460 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 460 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 460 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 460 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 460 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 460 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 480 node1/crm: Start migrating HA groups...
+noti 480 node1/crm: HA groups to rules config migration successful
+noti 480 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 480 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 480 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 480 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 480 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 480 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 480 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 500 node1/crm: Start migrating HA groups...
+noti 500 node1/crm: HA groups to rules config migration successful
+noti 500 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 500 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 500 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 500 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 500 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 500 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 500 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 520 node1/crm: Start migrating HA groups...
+noti 520 node1/crm: HA groups to rules config migration successful
+noti 520 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 520 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 520 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 520 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 520 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 520 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 520 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 540 node1/crm: Start migrating HA groups...
+noti 540 node1/crm: HA groups to rules config migration successful
+noti 540 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 540 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 540 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 540 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 540 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 540 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 540 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 560 node1/crm: Start migrating HA groups...
+noti 560 node1/crm: HA groups to rules config migration successful
+noti 560 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 560 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 560 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 560 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 560 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 560 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 560 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 580 node1/crm: Start migrating HA groups...
+noti 580 node1/crm: HA groups to rules config migration successful
+noti 580 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 580 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 580 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 580 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 580 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 580 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 580 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+noti 600 node1/crm: Start migrating HA groups...
+noti 600 node1/crm: HA groups to rules config migration successful
+noti 600 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 600 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 600 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 600 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 600 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 600 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 600 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0~2' to migrate HA groups
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-group-migrate1/manager_status b/src/test/test-group-migrate1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-group-migrate1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-group-migrate1/service_config b/src/test/test-group-migrate1/service_config
new file mode 100644
index 0000000..a27551e
--- /dev/null
+++ b/src/test/test-group-migrate1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started", "group": "group1" },
+ "vm:102": { "node": "node2", "state": "started", "group": "group2" },
+ "vm:103": { "node": "node3", "state": "started", "group": "group2" }
+}
diff --git a/src/test/test-group-migrate2/README b/src/test/test-group-migrate2/README
new file mode 100644
index 0000000..8775b6c
--- /dev/null
+++ b/src/test/test-group-migrate2/README
@@ -0,0 +1,10 @@
+Test whether a service in an unrestricted group will automatically migrate back
+to a member node in case of a manual migration to a non-member node.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is a group member and has higher priority than the other nodes
diff --git a/src/test/test-group-migrate2/cmdlist b/src/test/test-group-migrate2/cmdlist
new file mode 100644
index 0000000..3bfad44
--- /dev/null
+++ b/src/test/test-group-migrate2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"]
+]
diff --git a/src/test/test-group-migrate2/groups b/src/test/test-group-migrate2/groups
new file mode 100644
index 0000000..bad746c
--- /dev/null
+++ b/src/test/test-group-migrate2/groups
@@ -0,0 +1,7 @@
+group: group1
+ nodes node1
+ restricted 1
+
+group: group2
+ nodes node2:2,node3
+ nofailback 1
diff --git a/src/test/test-group-migrate2/hardware_status b/src/test/test-group-migrate2/hardware_status
new file mode 100644
index 0000000..5253589
--- /dev/null
+++ b/src/test/test-group-migrate2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "version": "9.0.0~2" },
+ "node2": { "power": "off", "network": "off", "version": "9.0.0~2" },
+ "node3": { "power": "off", "network": "off", "version": "9.0.0~2" }
+}
diff --git a/src/test/test-group-migrate2/log.expect b/src/test/test-group-migrate2/log.expect
new file mode 100644
index 0000000..8349f57
--- /dev/null
+++ b/src/test/test-group-migrate2/log.expect
@@ -0,0 +1,47 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+noti 20 node1/crm: Start migrating HA groups...
+noti 20 node1/crm: HA groups to rules config migration successful
+noti 20 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 20 node1/crm: Node 'node1' has pve-manager version '9.0.0~2'
+noti 20 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 20 node1/crm: Node 'node2' has pve-manager version '9.0.0~2'
+noti 20 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 20 node1/crm: Node 'node3' has pve-manager version '9.0.0~2'
+noti 20 node1/crm: HA groups to services config migration successful
+noti 20 node1/crm: HA groups config deletion successful
+noti 20 node1/crm: HA groups migration successful
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-group-migrate2/manager_status b/src/test/test-group-migrate2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-group-migrate2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-group-migrate2/service_config b/src/test/test-group-migrate2/service_config
new file mode 100644
index 0000000..a27551e
--- /dev/null
+++ b/src/test/test-group-migrate2/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started", "group": "group1" },
+ "vm:102": { "node": "node2", "state": "started", "group": "group2" },
+ "vm:103": { "node": "node3", "state": "started", "group": "group2" }
+}
--
2.39.5
* [pve-devel] [PATCH docs v3 1/1] ha: add documentation about ha rules and ha node affinity rules
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (15 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [RFC ha-manager v3 15/15] manager: persistently migrate ha groups to ha rules Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 1/3] api: ha: add ha rules api endpoints Daniel Kral
` (2 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Add documentation about HA Node Affinity rules, as well as general
documentation about what HA rules are for, in a format that can be extended
with other HA rule types in the future.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
append to ha intro
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
Makefile | 2 +
gen-ha-rules-node-affinity-opts.pl | 20 ++++++
gen-ha-rules-opts.pl | 17 +++++
ha-manager.adoc | 103 +++++++++++++++++++++++++++++
ha-rules-node-affinity-opts.adoc | 18 +++++
ha-rules-opts.adoc | 12 ++++
pmxcfs.adoc | 1 +
7 files changed, 173 insertions(+)
create mode 100755 gen-ha-rules-node-affinity-opts.pl
create mode 100755 gen-ha-rules-opts.pl
create mode 100644 ha-rules-node-affinity-opts.adoc
create mode 100644 ha-rules-opts.adoc
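For context, the two new generator scripts follow the existing gen-ha-*-opts.pl
pattern: they dump the schema-derived option descriptions to stdout, so the
committed ha-rules*-opts.adoc snippets can be regenerated from them. A minimal
sketch of the assumed invocation, mirroring how the other gen-ha-*-opts.pl
scripts are typically wired up in the pve-docs Makefile:
 # ./gen-ha-rules-opts.pl > ha-rules-opts.adoc
 # ./gen-ha-rules-node-affinity-opts.pl > ha-rules-node-affinity-opts.adoc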
diff --git a/Makefile b/Makefile
index f30d77a..c5e506e 100644
--- a/Makefile
+++ b/Makefile
@@ -49,6 +49,8 @@ GEN_DEB_SOURCES= \
GEN_SCRIPTS= \
gen-ha-groups-opts.pl \
gen-ha-resources-opts.pl \
+ gen-ha-rules-node-affinity-opts.pl \
+ gen-ha-rules-opts.pl \
gen-datacenter.cfg.5-opts.pl \
gen-pct.conf.5-opts.pl \
gen-pct-network-opts.pl \
diff --git a/gen-ha-rules-node-affinity-opts.pl b/gen-ha-rules-node-affinity-opts.pl
new file mode 100755
index 0000000..e2f07fa
--- /dev/null
+++ b/gen-ha-rules-node-affinity-opts.pl
@@ -0,0 +1,20 @@
+#!/usr/bin/perl
+
+use lib '.';
+use strict;
+use warnings;
+use PVE::RESTHandler;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
+
+my $private = PVE::HA::Rules::private();
+my $node_affinity_rule_props = PVE::HA::Rules::NodeAffinity::properties();
+my $properties = {
+ resources => $private->{propertyList}->{resources},
+ $node_affinity_rule_props->%*,
+};
+
+print PVE::RESTHandler::dump_properties($properties);
diff --git a/gen-ha-rules-opts.pl b/gen-ha-rules-opts.pl
new file mode 100755
index 0000000..66dd174
--- /dev/null
+++ b/gen-ha-rules-opts.pl
@@ -0,0 +1,17 @@
+#!/usr/bin/perl
+
+use lib '.';
+use strict;
+use warnings;
+use PVE::RESTHandler;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+
+my $private = PVE::HA::Rules::private();
+my $properties = $private->{propertyList};
+delete $properties->{type};
+delete $properties->{rule};
+
+print PVE::RESTHandler::dump_properties($properties);
diff --git a/ha-manager.adoc b/ha-manager.adoc
index 3d6fc4a..d316bb7 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -670,6 +670,109 @@ up online again to investigate the cause of failure and check if it runs
stably again. Setting the `nofailback` flag prevents the recovered services from
moving straight back to the fenced node.
+[[ha_manager_rules]]
+Rules
+~~~~~
+
+HA rules are used to put certain constraints on HA-managed resources. They are
+defined in the HA rules configuration file `/etc/pve/ha/rules.cfg`.
+
+----
+<type>: <rule>
+ resources <resources_list>
+ <property> <value>
+ ...
+----
+
+include::ha-rules-opts.adoc[]
+
+.Available HA Rule Types
+[width="100%",cols="1,3",options="header"]
+|===========================================================
+| HA Rule Type | Description
+| `node-affinity` | Defines an affinity between one or more HA resources and
+one or more nodes.
+|===========================================================
+
+[[ha_manager_node_affinity_rules]]
+Node Affinity Rules
+^^^^^^^^^^^^^^^^^^^
+
+NOTE: HA Node Affinity rules are equivalent to HA Groups and will replace them
+in an upcoming major release.
+
+By default, a HA resource is able to run on any cluster node, but a common
+requirement is that a HA resource should run on a specific node. That can be
+implemented by defining a HA node affinity rule to make the HA resource
+`vm:100` prefer the node `node1`:
+
+----
+# ha-manager rules add node-affinity ha-rule-vm100 --resources vm:100 --nodes node1
+----
+
+By default, node affinity rules are not strict, i.e., if none of the specified
+nodes are available, the HA resource can also be moved to other nodes.
+If, on the other hand, a HA resource must be restricted to the specified nodes,
+then the node affinity rule must be made strict.
+
+In the previous example, the node affinity rule can be modified to restrict the
+resource `vm:100` to be only on `node1`:
+
+----
+# ha-manager rules set node-affinity ha-rule-vm100 --strict 1
+----
+
+For bigger clusters or specific use cases, it makes sense to define a more
+detailed failover behavior. For example, the resources `vm:200` and `ct:300`
+should run on `node1`. If `node1` becomes unavailable, the resources should be
+distributed across `node2` and `node3`. If `node2` and `node3` are also
+unavailable, the resources should run on `node4`.
+
+To implement this behavior in a node affinity rule, nodes can be paired with
+priorities to order the node preference. If two or more nodes have the same
+priority, the resources can run on any of them. For the above example, `node1`
+gets the highest priority, `node2` and `node3` get the same priority, and
+finally `node4` gets the lowest priority, which can be omitted to default to `0`:
+
+----
+# ha-manager rules add node-affinity priority-cascade \
+ --resources vm:200,ct:300 --nodes "node1:2,node2:1,node3:1,node4"
+----
+
+The above commands create the following rules in the rules configuration file:
+
+.Node Affinity Rules Configuration Example (`/etc/pve/ha/rules.cfg`)
+----
+node-affinity: ha-rule-vm100
+ resources vm:100
+ nodes node1
+ strict 1
+
+node-affinity: priority-cascade
+ resources vm:200,ct:300
+ nodes node1:2,node2:1,node3:1,node4
+----
+
+Node Affinity Rule Properties
++++++++++++++++++++++++++++++
+
+include::ha-rules-node-affinity-opts.adoc[]
+
+[[ha_manager_rule_conflicts]]
+Rule Conflicts and Errors
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+HA rules can impose rather complex constraints on the HA resources. To ensure
+that a new or modified HA rule does not introduce uncertainty into the HA
+stack's CRS scheduler, HA rules are tested for feasibility before they are
+applied. If a rule fails any of these tests, the rule is disabled until the
+conflicts and errors are resolved.
+
+Currently, the following feasibility tests are performed on HA rules:
+
+* A HA resource can only be referenced by a single HA node affinity rule in
+  total. If two or more HA node affinity rules specify the same HA resource,
+  all of these HA node affinity rules will be disabled.
[[ha_manager_fencing]]
Fencing
diff --git a/ha-rules-node-affinity-opts.adoc b/ha-rules-node-affinity-opts.adoc
new file mode 100644
index 0000000..852636c
--- /dev/null
+++ b/ha-rules-node-affinity-opts.adoc
@@ -0,0 +1,18 @@
+`nodes`: `<node>[:<pri>]{,<node>[:<pri>]}*` ::
+
+List of cluster node members, where a priority can be given to each node. A resource bound to such a rule will run on the available nodes with the highest priority. If multiple nodes are in the highest priority class, the resources will be distributed among those nodes. The priorities have a relative meaning only. The higher the number, the higher the priority.
+
+`resources`: `<type>:<name>{,<type>:<name>}*` ::
+
+List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).
+
+`strict`: `<boolean>` ('default =' `0`)::
+
+Describes whether the node affinity rule is strict or non-strict.
++
+A non-strict node affinity rule makes resources prefer to be on the defined nodes.
+If none of the defined nodes are available, the resource may run on any other node.
++
+A strict node affinity rule restricts resources to the defined nodes. If none
+of the defined nodes are available, the resource will be stopped.
+
diff --git a/ha-rules-opts.adoc b/ha-rules-opts.adoc
new file mode 100644
index 0000000..b50b289
--- /dev/null
+++ b/ha-rules-opts.adoc
@@ -0,0 +1,12 @@
+`comment`: `<string>` ::
+
+HA rule description.
+
+`disable`: `<boolean>` ('default =' `0`)::
+
+Whether the HA rule is disabled.
+
+`resources`: `<type>:<name>{,<type>:<name>}*` ::
+
+List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).
+
diff --git a/pmxcfs.adoc b/pmxcfs.adoc
index f4aa847..8ca7284 100644
--- a/pmxcfs.adoc
+++ b/pmxcfs.adoc
@@ -104,6 +104,7 @@ Files
|`ha/crm_commands` | Displays HA operations that are currently being carried out by the CRM
|`ha/manager_status` | JSON-formatted information regarding HA services on the cluster
|`ha/resources.cfg` | Resources managed by high availability, and their current state
+|`ha/rules.cfg` | Rules putting constraints on the HA manager's scheduling of HA resources
|`nodes/<NAME>/config` | Node-specific configuration
|`nodes/<NAME>/lxc/<VMID>.conf` | VM configuration data for LXC containers
|`nodes/<NAME>/openvz/` | Prior to {pve} 4.0, used for container configuration data (deprecated, removed soon)
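To make the "Rule Conflicts and Errors" section above more concrete, here is a
hedged sketch of a rules.cfg that would trip the documented feasibility test
(the rule IDs are made up for illustration); since both rules reference
vm:100, both would be disabled until one of them no longer does:

node-affinity: keep-on-node1
    resources vm:100
    nodes node1

node-affinity: keep-on-node2
    resources vm:100,ct:200
    nodes node2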
--
2.39.5
* [pve-devel] [PATCH manager v3 1/3] api: ha: add ha rules api endpoints
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (16 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH docs v3 1/1] ha: add documentation about ha rules and ha node affinity rules Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 2/3] ui: ha: remove ha groups from ha resource components Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 3/3] ui: ha: show failback flag in resources status view Daniel Kral
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
PVE/API2/HAConfig.pm | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/PVE/API2/HAConfig.pm b/PVE/API2/HAConfig.pm
index 35f49cbb..d29211fb 100644
--- a/PVE/API2/HAConfig.pm
+++ b/PVE/API2/HAConfig.pm
@@ -12,6 +12,7 @@ use PVE::JSONSchema qw(get_standard_option);
use PVE::Exception qw(raise_param_exc);
use PVE::API2::HA::Resources;
use PVE::API2::HA::Groups;
+use PVE::API2::HA::Rules;
use PVE::API2::HA::Status;
use base qw(PVE::RESTHandler);
@@ -26,6 +27,11 @@ __PACKAGE__->register_method({
path => 'groups',
});
+__PACKAGE__->register_method({
+ subclass => "PVE::API2::HA::Rules",
+ path => 'rules',
+});
+
__PACKAGE__->register_method({
subclass => "PVE::API2::HA::Status",
path => 'status',
@@ -57,7 +63,7 @@ __PACKAGE__->register_method({
my ($param) = @_;
my $res = [
- { id => 'status' }, { id => 'resources' }, { id => 'groups' },
+ { id => 'status' }, { id => 'resources' }, { id => 'groups' }, { id => 'rules' },
];
return $res;
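For illustration, assuming the HAConfig handler stays mounted at /cluster/ha as
before, registering the rules subclass should make the new endpoints reachable
via pvesh, roughly like this (a sketch, not verified against this series):
 # pvesh get /cluster/ha/rules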
--
2.39.5
* [pve-devel] [PATCH manager v3 2/3] ui: ha: remove ha groups from ha resource components
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (17 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 1/3] api: ha: add ha rules api endpoints Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 3/3] ui: ha: show failback flag in resources status view Daniel Kral
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
Remove the HA group column from the HA Resources grid view and the HA
group selector from the HA Resources edit window, as these will be
replaced by semantically equivalent HA node affinity rules in the next
patch.
Add the 'failback' field, which is moved to the HA Resources config as part of
the migration from groups to node affinity rules.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
www/manager6/ha/ResourceEdit.js | 16 ++++++++++++----
www/manager6/ha/Resources.js | 17 -----------------
www/manager6/ha/StatusView.js | 1 -
3 files changed, 12 insertions(+), 22 deletions(-)
diff --git a/www/manager6/ha/ResourceEdit.js b/www/manager6/ha/ResourceEdit.js
index 1048ccca..428672a8 100644
--- a/www/manager6/ha/ResourceEdit.js
+++ b/www/manager6/ha/ResourceEdit.js
@@ -11,7 +11,7 @@ Ext.define('PVE.ha.VMResourceInputPanel', {
}
delete values.vmid;
- PVE.Utils.delete_if_default(values, 'group', '', me.isCreate);
+ PVE.Utils.delete_if_default(values, 'failback', '1', me.isCreate);
PVE.Utils.delete_if_default(values, 'max_restart', '1', me.isCreate);
PVE.Utils.delete_if_default(values, 'max_relocate', '1', me.isCreate);
@@ -110,9 +110,17 @@ Ext.define('PVE.ha.VMResourceInputPanel', {
me.column2 = [
{
- xtype: 'pveHAGroupSelector',
- name: 'group',
- fieldLabel: gettext('Group'),
+ xtype: 'proxmoxcheckbox',
+ name: 'failback',
+ fieldLabel: gettext('Failback'),
+ autoEl: {
+ tag: 'div',
+ 'data-qtip': gettext(
+ 'Enable if the HA resource should automatically adjust to HA rules.',
+ ),
+ },
+ uncheckedValue: 0,
+ value: 1,
},
{
xtype: 'proxmoxKVComboBox',
diff --git a/www/manager6/ha/Resources.js b/www/manager6/ha/Resources.js
index e8e53b3b..097097dc 100644
--- a/www/manager6/ha/Resources.js
+++ b/www/manager6/ha/Resources.js
@@ -136,23 +136,6 @@ Ext.define('PVE.ha.ResourcesView', {
renderer: (v) => (v === undefined ? '1' : v),
dataIndex: 'max_relocate',
},
- {
- header: gettext('Group'),
- width: 200,
- sortable: true,
- renderer: function (value, metaData, { data }) {
- if (data.errors && data.errors.group) {
- metaData.tdCls = 'proxmox-invalid-row';
- let html = Ext.htmlEncode(
- `<p>${Ext.htmlEncode(data.errors.group)}</p>`,
- );
- metaData.tdAttr =
- 'data-qwidth=600 data-qtitle="ERROR" data-qtip="' + html + '"';
- }
- return value;
- },
- dataIndex: 'group',
- },
{
header: gettext('Description'),
flex: 1,
diff --git a/www/manager6/ha/StatusView.js b/www/manager6/ha/StatusView.js
index 3e3205a5..a3ca9fdf 100644
--- a/www/manager6/ha/StatusView.js
+++ b/www/manager6/ha/StatusView.js
@@ -78,7 +78,6 @@ Ext.define(
'status',
'sid',
'state',
- 'group',
'comment',
'max_restart',
'max_relocate',
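For context, once the failback flag is part of the HA resources config (as
introduced earlier in this series), a resources.cfg entry might look roughly
like the following; the exact layout is an assumption based on the existing
resources config format, with a non-default value shown so it gets persisted:

vm: 100
    state started
    failback 0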
--
2.39.5
* [pve-devel] [PATCH manager v3 3/3] ui: ha: show failback flag in resources status view
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
` (18 preceding siblings ...)
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 2/3] ui: ha: remove ha groups from ha resource components Daniel Kral
@ 2025-07-04 18:16 ` Daniel Kral
19 siblings, 0 replies; 21+ messages in thread
From: Daniel Kral @ 2025-07-04 18:16 UTC (permalink / raw)
To: pve-devel
As the HA groups' failback flag is now part of the HA resources config, it
should also be shown there instead of in the previous HA groups view.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
www/manager6/ha/Resources.js | 6 ++++++
www/manager6/ha/StatusView.js | 4 ++++
2 files changed, 10 insertions(+)
diff --git a/www/manager6/ha/Resources.js b/www/manager6/ha/Resources.js
index 097097dc..65897bed 100644
--- a/www/manager6/ha/Resources.js
+++ b/www/manager6/ha/Resources.js
@@ -136,6 +136,12 @@ Ext.define('PVE.ha.ResourcesView', {
renderer: (v) => (v === undefined ? '1' : v),
dataIndex: 'max_relocate',
},
+ {
+ header: gettext('Failback'),
+ width: 100,
+ sortable: true,
+ dataIndex: 'failback',
+ },
{
header: gettext('Description'),
flex: 1,
diff --git a/www/manager6/ha/StatusView.js b/www/manager6/ha/StatusView.js
index a3ca9fdf..50ad8e84 100644
--- a/www/manager6/ha/StatusView.js
+++ b/www/manager6/ha/StatusView.js
@@ -79,6 +79,10 @@ Ext.define(
'sid',
'state',
'comment',
+ {
+ name: 'failback',
+ type: 'boolean',
+ },
'max_restart',
'max_relocate',
'type',
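The new Failback column in Resources.js displays the raw config value; below is
a hedged sketch of the same column with a Yes/No renderer, assuming the
widget-toolkit helper Proxmox.Utils.format_boolean is available in this context
(not part of this patch):

    {
        header: gettext('Failback'),
        width: 100,
        sortable: true,
        // format the boolean config value as Yes/No instead of 0/1
        renderer: Proxmox.Utils.format_boolean,
        dataIndex: 'failback',
    },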
--
2.39.5
Thread overview: 21+ messages
2025-07-04 18:16 [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH cluster v3 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 01/15] tree-wide: make arguments for select_service_node explicit Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 02/15] manager: improve signature of select_service_node Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 03/15] introduce rules base plugin Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 04/15] rules: introduce node affinity rule plugin Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 05/15] config, env, hw: add rules read and parse methods Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 06/15] config: delete services from rules if services are deleted from config Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 07/15] manager: read and update rules config Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 08/15] test: ha tester: add test cases for future node affinity rules Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 09/15] resources: introduce failback property in ha resource config Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 10/15] manager: migrate ha groups to node affinity rules in-memory Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 11/15] manager: apply node affinity rules when selecting service nodes Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 12/15] test: add test cases for rules config Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 13/15] api: introduce ha rules api endpoints Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 14/15] cli: expose ha rules api endpoints to ha-manager cli Daniel Kral
2025-07-04 18:16 ` [pve-devel] [RFC ha-manager v3 15/15] manager: persistently migrate ha groups to ha rules Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH docs v3 1/1] ha: add documentation about ha rules and ha node affinity rules Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 1/3] api: ha: add ha rules api endpoints Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 2/3] ui: ha: remove ha groups from ha resource components Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 3/3] ui: ha: show failback flag in resources status view Daniel Kral