* [pve-devel] [PATCH ha-manager v4 01/19] tree-wide: make arguments for select_service_node explicit
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 02/19] manager: improve signature of select_service_node Daniel Kral
` (24 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Explicitly state all the parameters at all call sites of
select_service_node(...) to make it clearer which state they are in at
each call site.
The call site in next_state_recovery(...) sets $best_scored to 1, as it
should find the next best node when recovering from the failed node
$current_node. All references to $best_scored in select_service_node()
are there to check whether $current_node can be selected, but as
$current_node is not available anyway, this change should not affect
the result of select_service_node(...).
Otherwise, $sd->{failed_nodes} and $sd->{maintenance_node} should only
contain the failed $current_node in next_state_recovery(...), so both
can be passed safely, as these are impossible states here anyway. A
cleaner way would be to explicitly remove them beforehand or to do
extra checks in select_service_node(...).
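For reference, the positional parameters spelled out at these call sites
correspond to the current signature of select_service_node(...) (also
visible in the following patch), roughly:

    sub select_service_node {
        # positional parameters, in the order used at the call sites above
        my (
            $groups, $online_node_usage, $sid, $service_conf, $current_node,
            $try_next, $tried_nodes, $maintenance_fallback, $best_scored,
        ) = @_;
        ...
    }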
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Manager.pm | 11 ++++++++++-
src/test/test_failover1.pl | 15 ++++++++++++++-
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 12292e67..85f2b1ab 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -971,6 +971,7 @@ sub next_state_started {
$try_next,
$sd->{failed_nodes},
$sd->{maintenance_node},
+ 0, # best_score
);
if ($node && ($sd->{node} ne $node)) {
@@ -1083,7 +1084,15 @@ sub next_state_recovery {
$self->recompute_online_node_usage(); # we want the most current node state
my $recovery_node = select_service_node(
- $self->{groups}, $self->{online_node_usage}, $sid, $cd, $sd->{node},
+ $self->{groups},
+ $self->{online_node_usage},
+ $sid,
+ $cd,
+ $sd->{node},
+ 0, # try_next
+ $sd->{failed_nodes},
+ $sd->{maintenance_node},
+ 1, # best_score
);
if ($recovery_node) {
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 371bdcfb..2478b2bc 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -24,13 +24,26 @@ my $service_conf = {
group => 'prefer_node1',
};
+my $sd = {
+ failed_nodes => undef,
+ maintenance_node => undef,
+};
+
my $current_node = $service_conf->{node};
sub test {
my ($expected_node, $try_next) = @_;
my $node = PVE::HA::Manager::select_service_node(
- $groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next,
+ $groups,
+ $online_node_usage,
+ "vm:111",
+ $service_conf,
+ $current_node,
+ $try_next,
+ $sd->{failed_nodes},
+ $sd->{maintenance_node},
+ 0, # best_score
);
my (undef, undef, $line) = caller();
--
2.47.2
* [pve-devel] [PATCH ha-manager v4 02/19] manager: improve signature of select_service_node
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 01/19] tree-wide: make arguments for select_service_node explicit Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 03/19] introduce rules base plugin Daniel Kral
` (23 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
As the signature of select_service_node(...) has become rather long
already, make it more compact by retrieving service- and
affinity-related data directly from the service state in $sd, and
introduce a $node_preference parameter to distinguish the behaviors of
$try_next and $best_scored, which were already mutually exclusive
before.
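As a usage sketch (the same pattern as in the call sites updated below),
a caller now passes the whole service state hash and one of the three
node preference modes:

    my $node = select_service_node(
        $groups, $online_node_usage, $sid, $cd, $sd,
        'best-score', # or 'none' / 'try-next'
    );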
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Manager.pm | 79 +++++++++++++++++++++-----------------
src/test/test_failover1.pl | 17 +++-----
2 files changed, 49 insertions(+), 47 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 85f2b1ab..c57a280c 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -149,18 +149,37 @@ sub get_node_priority_groups {
return ($pri_groups, $group_members);
}
+=head3 select_service_node(...)
+
+=head3 select_service_node($groups, $online_node_usage, $sid, $service_conf, $sd, $node_preference)
+
+Used to select the best fitting node for the service C<$sid>, with the
+configuration C<$service_conf> and state C<$sd>, according to the groups defined
+in C<$groups>, available node utilization in C<$online_node_usage>, and the
+given C<$node_preference>.
+
+The C<$node_preference> can be set to:
+
+=over
+
+=item C<'none'>: Try to stay on the current node as much as possible.
+
+=item C<'best-score'>: Try to select the best-scored node.
+
+=item C<'try-next'>: Try to select the best-scored node, which is not in C<< $sd->{failed_nodes} >>.
+
+=back
+
+=cut
+
sub select_service_node {
- my (
- $groups,
- $online_node_usage,
- $sid,
- $service_conf,
- $current_node,
- $try_next,
- $tried_nodes,
- $maintenance_fallback,
- $best_scored,
- ) = @_;
+ my ($groups, $online_node_usage, $sid, $service_conf, $sd, $node_preference) = @_;
+
+ die "'$node_preference' is not a valid node_preference for select_service_node\n"
+ if $node_preference !~ m/(none|best-score|try-next)/;
+
+ my ($current_node, $tried_nodes, $maintenance_fallback) =
+ $sd->@{qw(node failed_nodes maintenance_node)};
my $group = get_service_group($groups, $online_node_usage, $service_conf);
@@ -171,7 +190,7 @@ sub select_service_node {
# stay on current node if possible (avoids random migrations)
if (
- (!$try_next && !$best_scored)
+ $node_preference eq 'none'
&& $group->{nofailback}
&& defined($group_members->{$current_node})
) {
@@ -183,7 +202,7 @@ sub select_service_node {
my $top_pri = $pri_list[0];
# try to avoid nodes where the service failed already if we want to relocate
- if ($try_next) {
+ if ($node_preference eq 'try-next') {
foreach my $node (@$tried_nodes) {
delete $pri_groups->{$top_pri}->{$node};
}
@@ -192,8 +211,7 @@ sub select_service_node {
return $maintenance_fallback
if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
- return $current_node
- if (!$try_next && !$best_scored) && $pri_groups->{$top_pri}->{$current_node};
+ return $current_node if $node_preference eq 'none' && $pri_groups->{$top_pri}->{$current_node};
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
@@ -208,8 +226,8 @@ sub select_service_node {
}
}
- if ($try_next) {
- if (!$best_scored && defined($found) && ($found < (scalar(@nodes) - 1))) {
+ if ($node_preference eq 'try-next') {
+ if (defined($found) && ($found < (scalar(@nodes) - 1))) {
return $nodes[$found + 1];
} else {
return $nodes[0];
@@ -797,11 +815,8 @@ sub next_state_request_start {
$self->{online_node_usage},
$sid,
$cd,
- $sd->{node},
- 0, # try_next
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 1, # best_score
+ $sd,
+ 'best-score',
);
my $select_text = $selected_node ne $current_node ? 'new' : 'current';
$haenv->log(
@@ -901,7 +916,7 @@ sub next_state_started {
} else {
- my $try_next = 0;
+ my $select_node_preference = 'none';
if ($lrm_res) {
@@ -932,7 +947,7 @@ sub next_state_started {
if (scalar(@{ $sd->{failed_nodes} }) <= $cd->{max_relocate}) {
# tell select_service_node to relocate if possible
- $try_next = 1;
+ $select_node_preference = 'try-next';
$haenv->log(
'warning',
@@ -967,11 +982,8 @@ sub next_state_started {
$self->{online_node_usage},
$sid,
$cd,
- $sd->{node},
- $try_next,
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 0, # best_score
+ $sd,
+ $select_node_preference,
);
if ($node && ($sd->{node} ne $node)) {
@@ -1009,7 +1021,7 @@ sub next_state_started {
);
}
} else {
- if ($try_next && !defined($node)) {
+ if ($select_node_preference eq 'try-next' && !defined($node)) {
$haenv->log(
'warning',
"Start Error Recovery: Tried all available nodes for service '$sid', retry"
@@ -1088,11 +1100,8 @@ sub next_state_recovery {
$self->{online_node_usage},
$sid,
$cd,
- $sd->{node},
- 0, # try_next
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 1, # best_score
+ $sd,
+ 'best-score',
);
if ($recovery_node) {
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 2478b2bc..29b56c68 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -25,32 +25,25 @@ my $service_conf = {
};
my $sd = {
+ node => $service_conf->{node},
failed_nodes => undef,
maintenance_node => undef,
};
-my $current_node = $service_conf->{node};
-
sub test {
my ($expected_node, $try_next) = @_;
+ my $select_node_preference = $try_next ? 'try-next' : 'none';
+
my $node = PVE::HA::Manager::select_service_node(
- $groups,
- $online_node_usage,
- "vm:111",
- $service_conf,
- $current_node,
- $try_next,
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 0, # best_score
+ $groups, $online_node_usage, "vm:111", $service_conf, $sd, $select_node_preference,
);
my (undef, undef, $line) = caller();
die "unexpected result: $node != ${expected_node} at line $line\n"
if $node ne $expected_node;
- $current_node = $node;
+ $sd->{node} = $node;
}
test('node1');
--
2.47.2
* [pve-devel] [PATCH ha-manager v4 03/19] introduce rules base plugin
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 01/19] tree-wide: make arguments for select_service_node explicit Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 02/19] manager: improve signature of select_service_node Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 04/19] rules: introduce node affinity rule plugin Daniel Kral
` (22 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Add a rules base plugin to allow users to specify different kinds of HA
rules, which put constraints on the HA Manager's behavior, in a single
configuration file.
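As a minimal sketch of how a concrete rule plugin is expected to hook into
this base plugin (the 'my-rule' type and its empty property set are made up
for illustration; the registration and init calls mirror what later patches
in this series do for the node affinity plugin):

    package PVE::HA::Rules::MyRule;

    use strict;
    use warnings;

    use base qw(PVE::HA::Rules);

    sub type { return 'my-rule'; }

    sub properties { return {}; } # plugin-specific properties go here

    sub options {
        return {
            disable => { optional => 1 },
            comment => { optional => 1 },
        };
    }

    # elsewhere, at startup (as later done for the node affinity plugin):
    PVE::HA::Rules::MyRule->register();
    PVE::HA::Rules->init(property_isolation => 1);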
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Makefile | 2 +-
src/PVE/HA/Rules.pm | 430 ++++++++++++++++++++++++++++++++++
src/PVE/HA/Tools.pm | 22 ++
4 files changed, 454 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Rules.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 0ffbd8dd..9bbd375d 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -32,6 +32,7 @@
/usr/share/perl5/PVE/HA/Resources.pm
/usr/share/perl5/PVE/HA/Resources/PVECT.pm
/usr/share/perl5/PVE/HA/Resources/PVEVM.pm
+/usr/share/perl5/PVE/HA/Rules.pm
/usr/share/perl5/PVE/HA/Tools.pm
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index 8c91b97b..489cbc05 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -1,4 +1,4 @@
-SIM_SOURCES=CRM.pm Env.pm Groups.pm Resources.pm LRM.pm Manager.pm \
+SIM_SOURCES=CRM.pm Env.pm Groups.pm Rules.pm Resources.pm LRM.pm Manager.pm \
NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm Usage.pm
SOURCES=${SIM_SOURCES} Config.pm
diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
new file mode 100644
index 00000000..d786669c
--- /dev/null
+++ b/src/PVE/HA/Rules.pm
@@ -0,0 +1,430 @@
+package PVE::HA::Rules;
+
+use strict;
+use warnings;
+
+use PVE::JSONSchema qw(get_standard_option);
+use PVE::Tools;
+
+use PVE::HA::Tools;
+
+use base qw(PVE::SectionConfig);
+
+=head1 NAME
+
+PVE::HA::Rules - Base Plugin for HA Rules
+
+=head1 SYNOPSIS
+
+ use base qw(PVE::HA::Rules);
+
+=head1 DESCRIPTION
+
+This package provides the capability to have different types of rules in the
+same config file, which put constraints or other restrictions on the HA
+Manager's behavior when handling HA resources.
+
+Since rules can interfere with each other, i.e., rules can make other rules
+invalid or infeasible, this package also provides the capability to check the
+feasibility between rules of the same type and between rules of different
+types, and to prune the rule set in such a way that it becomes feasible again,
+while minimizing the number of rules that need to be pruned.
+
+This package inherits its config-related methods from C<L<PVE::SectionConfig>>
+and therefore rule plugins need to implement methods from there as well.
+
+=head1 USAGE
+
+Each I<rule plugin> is required to implement the methods C<L<type()>>,
+C<L<properties()>>, and C<L<options()>> from C<L<PVE::SectionConfig>> to
+extend the properties of this I<base plugin> with plugin-specific properties.
+
+=head2 REGISTERING CHECKS
+
+In order to C<L<< register checks|/$class->register_check(...) >>> for a rule
+plugin, the plugin can override the
+C<L<< get_plugin_check_arguments()|/$class->get_plugin_check_arguments(...) >>>
+method, which allows the plugin's checkers to pass plugin-specific data, usually
+subsets of specific rules, which are relevant to the checks.
+
+The following example shows a plugin's implementation of its
+C<L<< get_plugin_check_arguments()|/$class->get_plugin_check_arguments(...) >>>
+and a trivial check, which will render all rules defining a comment erroneous,
+and blames these errors on the I<comment> property:
+
+ sub get_plugin_check_arguments {
+ my ($class, $rules) = @_;
+
+ my @ruleids = sort {
+ $rules->{order}->{$a} <=> $rules->{order}->{$b}
+ } keys %{$rules->{ids}};
+
+ my $result = {
+ custom_rules => {},
+ };
+
+ for my $ruleid (@ruleids) {
+ my $rule = $rules->{ids}->{$ruleid};
+
+ $result->{custom_rules}->{$ruleid} = $rule if defined($rule->{comment});
+ }
+
+ return $result;
+ }
+
+ __PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return [ sort keys $args->{custom_rules}->%* ];
+ },
+ sub {
+ my ($ruleids, $errors) = @_;
+
+ for my $ruleid (@$ruleids) {
+ push @{$errors->{$ruleid}->{comment}},
+ "rule is ineffective, because I said so.";
+ }
+ }
+ );
+
+=head1 METHODS
+
+=cut
+
+my $defaultData = {
+ propertyList => {
+ type => {
+ description => "HA rule type.",
+ },
+ rule => get_standard_option(
+ 'pve-ha-rule-id',
+ {
+ completion => \&PVE::HA::Tools::complete_rule,
+ optional => 0,
+ },
+ ),
+ disable => {
+ description => 'Whether the HA rule is disabled.',
+ type => 'boolean',
+ optional => 1,
+ },
+ comment => {
+ description => "HA rule description.",
+ type => 'string',
+ maxLength => 4096,
+ optional => 1,
+ },
+ },
+};
+
+sub private {
+ return $defaultData;
+}
+
+=head3 $class->decode_plugin_value(...)
+
+=head3 $class->decode_plugin_value($type, $key, $value)
+
+B<OPTIONAL:> Can be implemented in a I<rule plugin>.
+
+Called during base plugin's C<decode_value(...)> in order to extend the
+deserialization for plugin-specific values which need it (e.g. lists).
+
+If it is not overridden by the I<rule plugin>, then it does nothing to
+C<$value> by default.
+
+=cut
+
+sub decode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ return $value;
+}
+
+sub decode_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'comment') {
+ return PVE::Tools::decode_text($value);
+ }
+
+ my $plugin = $class->lookup($type);
+ return $plugin->decode_plugin_value($type, $key, $value);
+}
+
+=head3 $class->encode_plugin_value(...)
+
+=head3 $class->encode_plugin_value($type, $key, $value)
+
+B<OPTIONAL:> Can be implemented in a I<rule plugin>.
+
+Called during base plugin's C<encode_value(...)> in order to extend the
+serialization for plugin-specific values which need it (e.g. lists).
+
+If it is not overridden by the I<rule plugin>, then it does nothing to
+C<$value> by default.
+
+=cut
+
+sub encode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ return $value;
+}
+
+sub encode_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'comment') {
+ return PVE::Tools::encode_text($value);
+ }
+
+ my $plugin = $class->lookup($type);
+ return $plugin->encode_plugin_value($type, $key, $value);
+}
+
+sub parse_section_header {
+ my ($class, $line) = @_;
+
+ if ($line =~ m/^(\S+):\s*(\S+)\s*$/) {
+ my ($type, $ruleid) = (lc($1), $2);
+ my $errmsg = undef; # set if you want to skip whole section
+ eval { PVE::JSONSchema::pve_verify_configid($ruleid); };
+ $errmsg = $@ if $@;
+ my $config = {}; # to return additional attributes
+ return ($type, $ruleid, $errmsg, $config);
+ }
+ return undef;
+}
+
+# General rule helpers
+
+=head3 $class->set_rule_defaults($rule)
+
+Sets the optional properties in the C<$rule>, which have default values, but
+haven't been explicitly set yet.
+
+=cut
+
+sub set_rule_defaults : prototype($$) {
+ my ($class, $rule) = @_;
+
+ if (my $plugin = $class->lookup($rule->{type})) {
+ my $properties = $plugin->properties();
+
+ for my $prop (keys %$properties) {
+ next if defined($rule->{$prop});
+ next if !$properties->{$prop}->{default};
+ next if !$properties->{$prop}->{optional};
+
+ $rule->{$prop} = $properties->{$prop}->{default};
+ }
+ }
+}
+
+# Rule checks definition and methods
+
+my $types = [];
+my $checkdef;
+
+sub register {
+ my ($class) = @_;
+
+ $class->SUPER::register($class);
+
+ # store order in which plugin types are registered
+ push @$types, $class->type();
+}
+
+=head3 $class->register_check(...)
+
+=head3 $class->register_check($check_func, $collect_errors_func)
+
+Used to register rule checks for a rule plugin.
+
+=cut
+
+sub register_check : prototype($$$) {
+ my ($class, $check_func, $collect_errors_func) = @_;
+
+ my $type = eval { $class->type() };
+ $type = 'global' if $@; # check registered here in the base plugin
+
+ push @{ $checkdef->{$type} }, [
+ $check_func, $collect_errors_func,
+ ];
+}
+
+=head3 $class->get_plugin_check_arguments(...)
+
+=head3 $class->get_plugin_check_arguments($rules)
+
+B<OPTIONAL:> Can be implemented in the I<rule plugin>.
+
+Returns a hash, usually subsets of rules relevant to the plugin, which are
+passed to the plugin's C<L<< registered checks|/$class->register_check(...) >>>
+so that the creation of these can be shared between rule check
+implementations.
+
+=cut
+
+sub get_plugin_check_arguments : prototype($$) {
+ my ($class, $rules) = @_;
+
+ return {};
+}
+
+=head3 $class->get_check_arguments(...)
+
+=head3 $class->get_check_arguments($rules)
+
+Returns the union of the plugin's check argument hashes, which are passed to the
+plugin's C<L<< registered checks|/$class->register_check(...) >>> so that the
+creation of these can be shared between rule check implementations.
+
+=cut
+
+sub get_check_arguments : prototype($$) {
+ my ($class, $rules) = @_;
+
+ my $global_args = {};
+
+ for my $type (@$types) {
+ my $plugin = $class->lookup($type);
+ my $plugin_args = eval { $plugin->get_plugin_check_arguments($rules) };
+ next if $@; # plugin doesn't implement get_plugin_check_arguments(...)
+
+ $global_args = { $global_args->%*, $plugin_args->%* };
+ }
+
+ return $global_args;
+}
+
+=head3 $class->check_feasibility($rules)
+
+Checks whether the given C<$rules> are feasible by running all checks, which
+were registered with C<L<< register_check()|/$class->register_check(...) >>>,
+and returns a hash map of erroneous rules.
+
+The checks are run in the order in which the rule plugins were registered,
+while global checks, i.e. checks between different rule types, are run at the
+very end.
+
+=cut
+
+sub check_feasibility : prototype($$) {
+ my ($class, $rules) = @_;
+
+ my $global_errors = {};
+ my $removable_ruleids = [];
+
+ my $global_args = $class->get_check_arguments($rules);
+
+ for my $type (@$types, 'global') {
+ for my $entry (@{ $checkdef->{$type} }) {
+ my ($check, $collect_errors) = @$entry;
+
+ my $errors = $check->($global_args);
+ $collect_errors->($errors, $global_errors);
+ }
+ }
+
+ return $global_errors;
+}
+
+=head3 $class->canonicalize($rules)
+
+Modifies C<$rules> to contain only feasible rules.
+
+This is done by running all checks, which were registered with
+C<L<< register_check()|/$class->register_check(...) >>> and removing any
+rule, which makes the rule set infeasible.
+
+Returns a list of messages with the reasons why rules were removed.
+
+=cut
+
+sub canonicalize : prototype($$) {
+ my ($class, $rules) = @_;
+
+ my $messages = [];
+ my $global_errors = $class->check_feasibility($rules);
+
+ for my $ruleid (keys %$global_errors) {
+ delete $rules->{ids}->{$ruleid};
+ delete $rules->{order}->{$ruleid};
+ }
+
+ for my $ruleid (sort keys %$global_errors) {
+ for my $opt (sort keys %{ $global_errors->{$ruleid} }) {
+ for my $message (@{ $global_errors->{$ruleid}->{$opt} }) {
+ push @$messages, "Drop rule '$ruleid', because $message.\n";
+ }
+ }
+ }
+
+ return $messages;
+}
+
+=head1 FUNCTIONS
+
+=cut
+
+=head3 foreach_rule(...)
+
+=head3 foreach_rule($rules, $func [, $opts])
+
+Filters the given C<$rules> according to the C<$opts> and loops over the
+resulting rules in the order as defined in the section config and executes
+C<$func> with the parameters C<L<< ($rule, $ruleid) >>>.
+
+The filter properties for C<$opts> are:
+
+=over
+
+=item C<$type>: Limits C<$rules> to those which are of rule type C<$type>.
+
+=item C<$exclude_disabled_rules>: Limits C<$rules> to those which are enabled.
+
+=back
+
+=cut
+
+sub foreach_rule : prototype($$;$) {
+ my ($rules, $func, $opts) = @_;
+
+ my $type = $opts->{type};
+ my $exclude_disabled_rules = $opts->{exclude_disabled_rules};
+
+ my @ruleids = sort {
+ $rules->{order}->{$a} <=> $rules->{order}->{$b}
+ } keys %{ $rules->{ids} };
+
+ for my $ruleid (@ruleids) {
+ my $rule = $rules->{ids}->{$ruleid};
+
+ next if !$rule; # skip invalid rules
+ next if defined($type) && $rule->{type} ne $type;
+ next if $exclude_disabled_rules && exists($rule->{disable});
+
+ $func->($rule, $ruleid);
+ }
+}
+
+=head3 get_next_ordinal($rules)
+
+Returns the next available ordinal number in the C<$rules> order hash that can
+be used by a newly introduced rule afterwards.
+
+=cut
+
+sub get_next_ordinal : prototype($) {
+ my ($rules) = @_;
+
+ my $current_order = (sort { $b <=> $a } values %{ $rules->{order} })[0] || 0;
+
+ return $current_order + 1;
+}
+
+1;
diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
index a01ac38f..767659ff 100644
--- a/src/PVE/HA/Tools.pm
+++ b/src/PVE/HA/Tools.pm
@@ -112,6 +112,15 @@ PVE::JSONSchema::register_standard_option(
},
);
+PVE::JSONSchema::register_standard_option(
+ 'pve-ha-rule-id',
+ {
+ description => "HA rule identifier.",
+ type => 'string',
+ format => 'pve-configid',
+ },
+);
+
sub read_json_from_file {
my ($filename, $default) = @_;
@@ -292,4 +301,17 @@ sub complete_group {
return $res;
}
+sub complete_rule {
+ my ($cmd, $pname, $cur) = @_;
+
+ my $cfg = PVE::HA::Config::read_rules_config();
+
+ my $res = [];
+ foreach my $rule (keys %{ $cfg->{ids} }) {
+ push @$res, $rule;
+ }
+
+ return $res;
+}
+
1;
--
2.47.2
* [pve-devel] [PATCH ha-manager v4 04/19] rules: introduce node affinity rule plugin
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (2 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 03/19] introduce rules base plugin Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 05/19] config, env, hw: add rules read and parse methods Daniel Kral
` (21 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Introduce the node affinity rule plugin to allow users to specify node
affinity constraints for independent HA resources.
Node affinity rules must specify one or more HA resources, one or more
nodes with optional priorities (the default is 0), and a strictness,
which is either
* 0 (non-strict): HA resources SHOULD be on one of the rules' nodes, or
* 1 (strict): HA resources MUST be on one of the rules' nodes.
The initial implementation restricts each HA resource to being
referenced by at most one node affinity rule; otherwise, the
conflicting node affinity rules will not be applied.
This makes node affinity rules structurally equivalent to HA groups with
the exception of the "failback" option, which will be moved to the HA
resource config in an upcoming patch.
The HA resources property is added to the rules base plugin as it is
also planned to be used by other rule plugins, e.g., the resource
affinity rule plugin.
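To illustrate the resulting configuration format, a node affinity rule in
the rules config could look roughly like the following (the rule id,
resource ids and node names are made up; a node may carry an optional
':<priority>' suffix and strict defaults to 0):

    node-affinity: keep-db-on-fast-nodes
        resources vm:101,ct:102
        nodes node2:2,node3:1,node1
        strict 0
        comment prefer node2, then node3, then node1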
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Makefile | 1 +
src/PVE/HA/Rules.pm | 29 ++++-
src/PVE/HA/Rules/Makefile | 6 +
src/PVE/HA/Rules/NodeAffinity.pm | 213 +++++++++++++++++++++++++++++++
src/PVE/HA/Tools.pm | 24 ++++
6 files changed, 272 insertions(+), 2 deletions(-)
create mode 100644 src/PVE/HA/Rules/Makefile
create mode 100644 src/PVE/HA/Rules/NodeAffinity.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 9bbd375d..7462663b 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -33,6 +33,7 @@
/usr/share/perl5/PVE/HA/Resources/PVECT.pm
/usr/share/perl5/PVE/HA/Resources/PVEVM.pm
/usr/share/perl5/PVE/HA/Rules.pm
+/usr/share/perl5/PVE/HA/Rules/NodeAffinity.pm
/usr/share/perl5/PVE/HA/Tools.pm
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index 489cbc05..e386cbfc 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -8,6 +8,7 @@ install:
install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA
for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/$$i; done
make -C Resources install
+ make -C Rules install
make -C Usage install
make -C Env install
diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
index d786669c..bda0b5d1 100644
--- a/src/PVE/HA/Rules.pm
+++ b/src/PVE/HA/Rules.pm
@@ -109,6 +109,13 @@ my $defaultData = {
type => 'boolean',
optional => 1,
},
+ resources => get_standard_option(
+ 'pve-ha-resource-id-list',
+ {
+ completion => \&PVE::HA::Tools::complete_sid,
+ optional => 0,
+ },
+ ),
comment => {
description => "HA rule description.",
type => 'string',
@@ -145,7 +152,17 @@ sub decode_plugin_value {
sub decode_value {
my ($class, $type, $key, $value) = @_;
- if ($key eq 'comment') {
+ if ($key eq 'resources') {
+ my $res = {};
+
+ for my $sid (PVE::Tools::split_list($value)) {
+ if (PVE::HA::Tools::pve_verify_ha_resource_id($sid)) {
+ $res->{$sid} = 1;
+ }
+ }
+
+ return $res;
+ } elsif ($key eq 'comment') {
return PVE::Tools::decode_text($value);
}
@@ -176,7 +193,11 @@ sub encode_plugin_value {
sub encode_value {
my ($class, $type, $key, $value) = @_;
- if ($key eq 'comment') {
+ if ($key eq 'resources') {
+ PVE::HA::Tools::pve_verify_ha_resource_id($_) for keys %$value;
+
+ return join(',', sort keys %$value);
+ } elsif ($key eq 'comment') {
return PVE::Tools::encode_text($value);
}
@@ -383,6 +404,8 @@ The filter properties for C<$opts> are:
=over
+=item C<$sid>: Limits C<$rules> to those which contain the given resource C<$sid>.
+
=item C<$type>: Limits C<$rules> to those which are of rule type C<$type>.
=item C<$exclude_disabled_rules>: Limits C<$rules> to those which are enabled.
@@ -394,6 +417,7 @@ The filter properties for C<$opts> are:
sub foreach_rule : prototype($$;$) {
my ($rules, $func, $opts) = @_;
+ my $sid = $opts->{sid};
my $type = $opts->{type};
my $exclude_disabled_rules = $opts->{exclude_disabled_rules};
@@ -405,6 +429,7 @@ sub foreach_rule : prototype($$;$) {
my $rule = $rules->{ids}->{$ruleid};
next if !$rule; # skip invalid rules
+ next if defined($sid) && !defined($rule->{resources}->{$sid});
next if defined($type) && $rule->{type} ne $type;
next if $exclude_disabled_rules && exists($rule->{disable});
diff --git a/src/PVE/HA/Rules/Makefile b/src/PVE/HA/Rules/Makefile
new file mode 100644
index 00000000..dfef257d
--- /dev/null
+++ b/src/PVE/HA/Rules/Makefile
@@ -0,0 +1,6 @@
+SOURCES=NodeAffinity.pm
+
+.PHONY: install
+install:
+ install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA/Rules
+ for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/Rules/$$i; done
diff --git a/src/PVE/HA/Rules/NodeAffinity.pm b/src/PVE/HA/Rules/NodeAffinity.pm
new file mode 100644
index 00000000..2b3d7390
--- /dev/null
+++ b/src/PVE/HA/Rules/NodeAffinity.pm
@@ -0,0 +1,213 @@
+package PVE::HA::Rules::NodeAffinity;
+
+use strict;
+use warnings;
+
+use Storable qw(dclone);
+
+use PVE::Cluster;
+use PVE::JSONSchema qw(get_standard_option);
+use PVE::Tools;
+
+use PVE::HA::Rules;
+use PVE::HA::Tools;
+
+use base qw(PVE::HA::Rules);
+
+=head1 NAME
+
+PVE::HA::Rules::NodeAffinity
+
+=head1 DESCRIPTION
+
+This package provides the capability to specify and apply rules, which put
+affinity constraints between a set of HA resources and a set of nodes.
+
+HA Node Affinity rules can be either C<'non-strict'> or C<'strict'>:
+
+=over
+
+=item C<'non-strict'>
+
+Non-strict node affinity rules SHOULD be applied if possible.
+
+That is, HA resources SHOULD prefer to be on the defined nodes, but may fall
+back to other nodes, if none of the defined nodes are available.
+
+=item C<'strict'>
+
+Strict node affinity rules MUST be applied.
+
+That is, HA resources MUST prefer to be on the defined nodes. In other words,
+these HA resources are restricted to the defined nodes and may not run on any
+other node.
+
+=back
+
+=cut
+
+sub type {
+ return 'node-affinity';
+}
+
+sub properties {
+ return {
+ nodes => get_standard_option(
+ 'pve-ha-group-node-list',
+ {
+ completion => \&PVE::Cluster::get_nodelist,
+ optional => 0,
+ },
+ ),
+ strict => {
+ description => "Describes whether the node affinity rule is strict or non-strict.",
+ verbose_description => <<EODESC,
+Describes whether the node affinity rule is strict or non-strict.
+
+A non-strict node affinity rule makes resources prefer to be on the defined nodes.
+If none of the defined nodes are available, the resource may run on any other node.
+
+A strict node affinity rule makes resources be restricted to the defined nodes. If
+none of the defined nodes are available, the resource will be stopped.
+EODESC
+ type => 'boolean',
+ optional => 1,
+ default => 0,
+ },
+ };
+}
+
+sub options {
+ return {
+ resources => { optional => 0 },
+ nodes => { optional => 0 },
+ strict => { optional => 1 },
+ disable => { optional => 1 },
+ comment => { optional => 1 },
+ };
+}
+
+sub decode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'nodes') {
+ my $res = {};
+
+ for my $node (PVE::Tools::split_list($value)) {
+ if (my ($node, $priority) = PVE::HA::Tools::parse_node_priority($node, 1)) {
+ $res->{$node} = {
+ priority => $priority,
+ };
+ }
+ }
+
+ return $res;
+ }
+
+ return $value;
+}
+
+sub encode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'nodes') {
+ my $res = [];
+
+ for my $node (sort keys %$value) {
+ my $priority = $value->{$node}->{priority};
+
+ if ($priority) {
+ push @$res, "$node:$priority";
+ } else {
+ push @$res, "$node";
+ }
+ }
+
+ return join(',', @$res);
+ }
+
+ return $value;
+}
+
+sub get_plugin_check_arguments {
+ my ($self, $rules) = @_;
+
+ my $result = {
+ node_affinity_rules => {},
+ };
+
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule, $ruleid) = @_;
+
+ $result->{node_affinity_rules}->{$ruleid} = $rule;
+ },
+ {
+ type => 'node-affinity',
+ exclude_disabled_rules => 1,
+ },
+ );
+
+ return $result;
+}
+
+=head1 NODE AFFINITY RULE CHECKERS
+
+=cut
+
+=head3 check_single_resource_reference($node_affinity_rules)
+
+Returns all conflicts in C<$node_affinity_rules> as a list of lists, each
+consisting of a node affinity rule id and a resource id, where the resource
+is shared between more than one node affinity rule.
+
+If there are none, the returned list is empty.
+
+=cut
+
+sub check_single_resource_reference {
+ my ($node_affinity_rules) = @_;
+
+ my @conflicts = ();
+ my $resource_ruleids = {};
+
+ while (my ($ruleid, $rule) = each %$node_affinity_rules) {
+ for my $sid (keys %{ $rule->{resources} }) {
+ push @{ $resource_ruleids->{$sid} }, $ruleid;
+ }
+ }
+
+ for my $sid (keys %$resource_ruleids) {
+ my $ruleids = $resource_ruleids->{$sid};
+
+ next if @$ruleids < 2;
+
+ for my $ruleid (@$ruleids) {
+ push @conflicts, [$ruleid, $sid];
+ }
+ }
+
+ @conflicts = sort { $a->[0] cmp $b->[0] } @conflicts;
+ return \@conflicts;
+}
+
+__PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return check_single_resource_reference($args->{node_affinity_rules});
+ },
+ sub {
+ my ($conflicts, $errors) = @_;
+
+ for my $conflict (@$conflicts) {
+ my ($ruleid, $sid) = @$conflict;
+
+ push @{ $errors->{$ruleid}->{resources} },
+ "resource '$sid' is already used in another node affinity rule";
+ }
+ },
+);
+
+1;
diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
index 767659ff..549cbe14 100644
--- a/src/PVE/HA/Tools.pm
+++ b/src/PVE/HA/Tools.pm
@@ -51,6 +51,18 @@ PVE::JSONSchema::register_standard_option(
},
);
+PVE::JSONSchema::register_standard_option(
+ 'pve-ha-resource-id-list',
+ {
+ description =>
+ "List of HA resource IDs. This consists of a list of resource types followed"
+ . " by a resource specific name separated with a colon (example: vm:100,ct:101).",
+ typetext => "<type>:<name>{,<type>:<name>}*",
+ type => 'string',
+ format => 'pve-ha-resource-id-list',
+ },
+);
+
PVE::JSONSchema::register_format('pve-ha-resource-or-vm-id', \&pve_verify_ha_resource_or_vm_id);
sub pve_verify_ha_resource_or_vm_id {
@@ -103,6 +115,18 @@ PVE::JSONSchema::register_standard_option(
},
);
+sub parse_node_priority {
+ my ($value, $noerr) = @_;
+
+ if ($value =~ m/^([a-zA-Z0-9]([a-zA-Z0-9\-]*[a-zA-Z0-9])?)(:(\d+))?$/) {
+ # node without priority set defaults to priority 0
+ return ($1, int($4 // 0));
+ }
+
+ return undef if $noerr;
+ die "unable to parse HA node entry '$value'\n";
+}
+
PVE::JSONSchema::register_standard_option(
'pve-ha-group-id',
{
--
2.47.2
* [pve-devel] [PATCH ha-manager v4 05/19] config, env, hw: add rules read and parse methods
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (3 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 04/19] rules: introduce node affinity rule plugin Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 06/19] config: delete services from rules if services are deleted from config Daniel Kral
` (20 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Adds methods to the HA environment to read and write the rules
configuration file for the different environment implementations.
The HA Rules are initialized with property isolation since it is
expected that other rule types will use similar property names with
different semantic meanings and/or possible values.
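For illustration, code that runs on a node with access to the cluster
filesystem can use the new helpers roughly like this (a sketch only,
mirroring the functions added below):

    use PVE::HA::Config;

    # read the rules and fill in defaults for optional properties
    my $rules = PVE::HA::Config::read_and_check_rules_config();

    # ... inspect or modify $rules->{ids} and $rules->{order} ...

    PVE::HA::Config::write_rules_config($rules);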
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Config.pm | 30 ++++++++++++++++++++++++++++++
src/PVE/HA/Env.pm | 6 ++++++
src/PVE/HA/Env/PVE2.pm | 12 ++++++++++++
src/PVE/HA/Sim/Env.pm | 14 ++++++++++++++
src/PVE/HA/Sim/Hardware.pm | 21 +++++++++++++++++++++
5 files changed, 83 insertions(+)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index ec9360ef..012ae16d 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -7,12 +7,14 @@ use JSON;
use PVE::HA::Tools;
use PVE::HA::Groups;
+use PVE::HA::Rules;
use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
use PVE::HA::Resources;
my $manager_status_filename = "ha/manager_status";
my $ha_groups_config = "ha/groups.cfg";
my $ha_resources_config = "ha/resources.cfg";
+my $ha_rules_config = "ha/rules.cfg";
my $crm_commands_filename = "ha/crm_commands";
my $ha_fence_config = "ha/fence.cfg";
@@ -31,6 +33,11 @@ cfs_register_file(
sub { PVE::HA::Resources->parse_config(@_); },
sub { PVE::HA::Resources->write_config(@_); },
);
+cfs_register_file(
+ $ha_rules_config,
+ sub { PVE::HA::Rules->parse_config(@_); },
+ sub { PVE::HA::Rules->write_config(@_); },
+);
cfs_register_file($manager_status_filename, \&json_reader, \&json_writer);
cfs_register_file(
$ha_fence_config,
@@ -197,6 +204,29 @@ sub parse_sid {
return wantarray ? ($sid, $type, $name) : $sid;
}
+sub read_rules_config {
+
+ return cfs_read_file($ha_rules_config);
+}
+
+sub read_and_check_rules_config {
+
+ my $rules = cfs_read_file($ha_rules_config);
+
+ # set optional rule parameter's default values
+ for my $rule (values %{ $rules->{ids} }) {
+ PVE::HA::Rules->set_rule_defaults($rule);
+ }
+
+ return $rules;
+}
+
+sub write_rules_config {
+ my ($cfg) = @_;
+
+ cfs_write_file($ha_rules_config, $cfg);
+}
+
sub read_group_config {
return cfs_read_file($ha_groups_config);
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 285e4400..5cee7b30 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -131,6 +131,12 @@ sub steal_service {
return $self->{plug}->steal_service($sid, $current_node, $new_node);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ return $self->{plug}->read_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index b709f303..58fd36e3 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -22,12 +22,18 @@ use PVE::HA::FenceConfig;
use PVE::HA::Resources;
use PVE::HA::Resources::PVEVM;
use PVE::HA::Resources::PVECT;
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
PVE::HA::Resources::PVEVM->register();
PVE::HA::Resources::PVECT->register();
PVE::HA::Resources->init();
+PVE::HA::Rules::NodeAffinity->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
my $lockdir = "/etc/pve/priv/lock";
sub new {
@@ -189,6 +195,12 @@ sub steal_service {
$self->cluster_state_update();
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ return PVE::HA::Config::read_and_check_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index d892a006..bb76b7fa 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -10,6 +10,8 @@ use Fcntl qw(:DEFAULT :flock);
use PVE::HA::Tools;
use PVE::HA::Env;
use PVE::HA::Resources;
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
use PVE::HA::Sim::Resources::VirtVM;
use PVE::HA::Sim::Resources::VirtCT;
use PVE::HA::Sim::Resources::VirtFail;
@@ -20,6 +22,10 @@ PVE::HA::Sim::Resources::VirtFail->register();
PVE::HA::Resources->init();
+PVE::HA::Rules::NodeAffinity->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
sub new {
my ($this, $nodename, $hardware, $log_id) = @_;
@@ -245,6 +251,14 @@ sub exec_fence_agent {
return $self->{hardware}->exec_fence_agent($agent, $node, @param);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ $assert_cfs_can_rw->($self);
+
+ return $self->{hardware}->read_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 576527d5..89dbdfa4 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -28,6 +28,7 @@ my $watchdog_timeout = 60;
# $testdir/cmdlist Command list for simulation
# $testdir/hardware_status Hardware description (number of nodes, ...)
# $testdir/manager_status CRM status (start with {})
+# $testdir/rules_config Constraints / Rules configuration
# $testdir/service_config Service configuration
# $testdir/static_service_stats Static service usage information (cpu, memory)
# $testdir/groups HA groups configuration
@@ -319,6 +320,22 @@ sub read_crm_commands {
return $self->global_lock($code);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/rules_config";
+ my $raw = '';
+ $raw = PVE::Tools::file_get_contents($filename) if -f $filename;
+ my $rules = PVE::HA::Rules->parse_config($filename, $raw);
+
+ # set optional rule parameter's default values
+ for my $rule (values %{ $rules->{ids} }) {
+ PVE::HA::Rules->set_rule_defaults($rule);
+ }
+
+ return $rules;
+}
+
sub read_group_config {
my ($self) = @_;
@@ -391,6 +408,10 @@ sub new {
# copy initial configuartion
copy("$testdir/manager_status", "$statusdir/manager_status"); # optional
+ if (-f "$testdir/rules_config") {
+ copy("$testdir/rules_config", "$statusdir/rules_config");
+ }
+
if (-f "$testdir/groups") {
copy("$testdir/groups", "$statusdir/groups");
} else {
--
2.47.2
* [pve-devel] [PATCH ha-manager v4 06/19] config: delete services from rules if services are deleted from config
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (4 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 05/19] config, env, hw: add rules read and parse methods Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 07/19] manager: read and update rules config Daniel Kral
` (19 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Remove HA resources from the rules referencing them if the HA resources
are removed by delete_service_from_config(...), which is called by the
HA resources' delete API endpoint and possibly by external callers,
e.g. if the HA resource is removed externally.
If all of a rule's HA resources have been removed, the rule itself must
be removed as well, as it would otherwise result in an erroneous rules
config, which would become user-visible at the next read and parse of
the rules config.
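For example (illustrative rule and resource ids only), assume the rules
config contains:

    node-affinity: only-vm101
        resources vm:101
        nodes node1

    node-affinity: web-stack
        resources vm:101,ct:200
        nodes node2

After deleting vm:101, the first rule is dropped entirely and the second
one only loses its vm:101 entry:

    node-affinity: web-stack
        resources ct:200
        nodes node2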
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Config.pm | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 012ae16d..2e520aab 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -360,6 +360,25 @@ sub delete_service_from_config {
"delete resource failed",
);
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = read_rules_config();
+
+ return if !defined($rules->{ids});
+
+ for my $ruleid (keys %{ $rules->{ids} }) {
+ my $rule_resources = $rules->{ids}->{$ruleid}->{resources} // {};
+
+ delete $rule_resources->{$sid};
+
+ delete $rules->{ids}->{$ruleid} if !%$rule_resources;
+ }
+
+ write_rules_config($rules);
+ },
+ "delete resource from rules failed",
+ );
+
return !!$res;
}
--
2.47.2
* [pve-devel] [PATCH ha-manager v4 07/19] manager: read and update rules config
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (5 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 06/19] config: delete services from rules if services are deleted from config Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 08/19] test: ha tester: add test cases for future node affinity rules Daniel Kral
` (18 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Read the rules configuration in each round and update the canonicalized
rules configuration only if there were any changes since the last round,
to reduce how often the rule set needs to be verified.
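If canonicalization drops rules, this becomes visible in the manager log;
with the node affinity checks from the earlier patch, a dropped rule would
be logged roughly as follows (rule and resource ids are made up, the
message text is what canonicalize(...) produces):

    info ... node1/crm: Drop rule 'rule2', because resource 'vm:101' is already used in another node affinity rule.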
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Manager.pm | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index c57a280c..88ff4a65 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -8,6 +8,8 @@ use Digest::MD5 qw(md5_base64);
use PVE::Tools;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
use PVE::HA::Usage::Basic;
use PVE::HA::Usage::Static;
@@ -41,7 +43,11 @@ sub new {
my $class = ref($this) || $this;
- my $self = bless { haenv => $haenv, crs => {} }, $class;
+ my $self = bless {
+ haenv => $haenv,
+ crs => {},
+ last_rules_digest => '',
+ }, $class;
my $old_ms = $haenv->read_manager_status();
@@ -556,6 +562,18 @@ sub manage {
delete $ss->{$sid};
}
+ my $new_rules = $haenv->read_rules_config();
+
+ if ($new_rules->{digest} ne $self->{last_rules_digest}) {
+
+ my $messages = PVE::HA::Rules->canonicalize($new_rules);
+ $haenv->log('info', $_) for @$messages;
+
+ $self->{rules} = $new_rules;
+
+ $self->{last_rules_digest} = $self->{rules}->{digest};
+ }
+
$self->update_crm_commands();
for (;;) {
--
2.47.2
* [pve-devel] [PATCH ha-manager v4 08/19] test: ha tester: add test cases for future node affinity rules
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (6 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 07/19] manager: read and update rules config Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 09/19] resources: introduce failback property in ha resource config Daniel Kral
` (17 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Add test cases to verify that the node affinity rules, which will be
added in a following patch, are functionally equivalent to the
existing HA groups.
These test cases verify the following scenarios for (a) unrestricted and
(b) restricted groups (i.e. non-strict and strict node affinity rules):
1. If a service is manually migrated to a non-member node and failback
is enabled, then (a)(b) migrate the service back to a member node.
2. If a service is manually migrated to a non-member node and failback
is disabled, then (a) do nothing, or (b) migrate the service back to a
member node.
3. If a service's node fails, where the failed node is the only
available group member left, (a) migrate the service to a non-member
node, or (b) stay in recovery.
4. If a service's node fails, but there is another available group
member left, (a)(b) migrate the service to the other member node.
5. If a service's group has failback enabled and the service's node,
which is the node with the highest priority in the group, fails and
comes back later, (a)(b) migrate it to the second-highest prioritized
node and automatically migrate it back to the highest priority node
as soon as it is available again.
6. If a service's group has failback disabled and the service's node,
which is the node with the highest priority in the group, fails and
comes back later, (a)(b) migrate it to the second-highest prioritized
node, but do not migrate it back to the highest priority node if it
becomes available again.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/test/test-node-affinity-nonstrict1/README | 10 +++
.../test-node-affinity-nonstrict1/cmdlist | 4 +
src/test/test-node-affinity-nonstrict1/groups | 2 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict1/log.expect | 40 ++++++++++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-nonstrict2/README | 12 +++
.../test-node-affinity-nonstrict2/cmdlist | 4 +
src/test/test-node-affinity-nonstrict2/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict2/log.expect | 35 +++++++++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-nonstrict3/README | 10 +++
.../test-node-affinity-nonstrict3/cmdlist | 4 +
src/test/test-node-affinity-nonstrict3/groups | 2 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict3/log.expect | 56 ++++++++++++++
.../manager_status | 1 +
.../service_config | 5 ++
src/test/test-node-affinity-nonstrict4/README | 14 ++++
.../test-node-affinity-nonstrict4/cmdlist | 4 +
src/test/test-node-affinity-nonstrict4/groups | 2 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict4/log.expect | 54 ++++++++++++++
.../manager_status | 1 +
.../service_config | 5 ++
src/test/test-node-affinity-nonstrict5/README | 16 ++++
.../test-node-affinity-nonstrict5/cmdlist | 5 ++
src/test/test-node-affinity-nonstrict5/groups | 2 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict5/log.expect | 66 +++++++++++++++++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-nonstrict6/README | 14 ++++
.../test-node-affinity-nonstrict6/cmdlist | 5 ++
src/test/test-node-affinity-nonstrict6/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-nonstrict6/log.expect | 52 +++++++++++++
.../manager_status | 1 +
.../service_config | 3 +
src/test/test-node-affinity-strict1/README | 10 +++
src/test/test-node-affinity-strict1/cmdlist | 4 +
src/test/test-node-affinity-strict1/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-strict1/log.expect | 40 ++++++++++
.../test-node-affinity-strict1/manager_status | 1 +
.../test-node-affinity-strict1/service_config | 3 +
src/test/test-node-affinity-strict2/README | 11 +++
src/test/test-node-affinity-strict2/cmdlist | 4 +
src/test/test-node-affinity-strict2/groups | 4 +
.../hardware_status | 5 ++
.../test-node-affinity-strict2/log.expect | 40 ++++++++++
.../test-node-affinity-strict2/manager_status | 1 +
.../test-node-affinity-strict2/service_config | 3 +
src/test/test-node-affinity-strict3/README | 10 +++
src/test/test-node-affinity-strict3/cmdlist | 4 +
src/test/test-node-affinity-strict3/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-strict3/log.expect | 74 +++++++++++++++++++
.../test-node-affinity-strict3/manager_status | 1 +
.../test-node-affinity-strict3/service_config | 5 ++
src/test/test-node-affinity-strict4/README | 14 ++++
src/test/test-node-affinity-strict4/cmdlist | 4 +
src/test/test-node-affinity-strict4/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-strict4/log.expect | 54 ++++++++++++++
.../test-node-affinity-strict4/manager_status | 1 +
.../test-node-affinity-strict4/service_config | 5 ++
src/test/test-node-affinity-strict5/README | 16 ++++
src/test/test-node-affinity-strict5/cmdlist | 5 ++
src/test/test-node-affinity-strict5/groups | 3 +
.../hardware_status | 5 ++
.../test-node-affinity-strict5/log.expect | 66 +++++++++++++++++
.../test-node-affinity-strict5/manager_status | 1 +
.../test-node-affinity-strict5/service_config | 3 +
src/test/test-node-affinity-strict6/README | 14 ++++
src/test/test-node-affinity-strict6/cmdlist | 5 ++
src/test/test-node-affinity-strict6/groups | 4 +
.../hardware_status | 5 ++
.../test-node-affinity-strict6/log.expect | 52 +++++++++++++
.../test-node-affinity-strict6/manager_status | 1 +
.../test-node-affinity-strict6/service_config | 3 +
84 files changed, 982 insertions(+)
create mode 100644 src/test/test-node-affinity-nonstrict1/README
create mode 100644 src/test/test-node-affinity-nonstrict1/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict1/groups
create mode 100644 src/test/test-node-affinity-nonstrict1/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict1/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict1/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict1/service_config
create mode 100644 src/test/test-node-affinity-nonstrict2/README
create mode 100644 src/test/test-node-affinity-nonstrict2/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict2/groups
create mode 100644 src/test/test-node-affinity-nonstrict2/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict2/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict2/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict2/service_config
create mode 100644 src/test/test-node-affinity-nonstrict3/README
create mode 100644 src/test/test-node-affinity-nonstrict3/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict3/groups
create mode 100644 src/test/test-node-affinity-nonstrict3/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict3/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict3/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict3/service_config
create mode 100644 src/test/test-node-affinity-nonstrict4/README
create mode 100644 src/test/test-node-affinity-nonstrict4/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict4/groups
create mode 100644 src/test/test-node-affinity-nonstrict4/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict4/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict4/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict4/service_config
create mode 100644 src/test/test-node-affinity-nonstrict5/README
create mode 100644 src/test/test-node-affinity-nonstrict5/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict5/groups
create mode 100644 src/test/test-node-affinity-nonstrict5/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict5/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict5/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict5/service_config
create mode 100644 src/test/test-node-affinity-nonstrict6/README
create mode 100644 src/test/test-node-affinity-nonstrict6/cmdlist
create mode 100644 src/test/test-node-affinity-nonstrict6/groups
create mode 100644 src/test/test-node-affinity-nonstrict6/hardware_status
create mode 100644 src/test/test-node-affinity-nonstrict6/log.expect
create mode 100644 src/test/test-node-affinity-nonstrict6/manager_status
create mode 100644 src/test/test-node-affinity-nonstrict6/service_config
create mode 100644 src/test/test-node-affinity-strict1/README
create mode 100644 src/test/test-node-affinity-strict1/cmdlist
create mode 100644 src/test/test-node-affinity-strict1/groups
create mode 100644 src/test/test-node-affinity-strict1/hardware_status
create mode 100644 src/test/test-node-affinity-strict1/log.expect
create mode 100644 src/test/test-node-affinity-strict1/manager_status
create mode 100644 src/test/test-node-affinity-strict1/service_config
create mode 100644 src/test/test-node-affinity-strict2/README
create mode 100644 src/test/test-node-affinity-strict2/cmdlist
create mode 100644 src/test/test-node-affinity-strict2/groups
create mode 100644 src/test/test-node-affinity-strict2/hardware_status
create mode 100644 src/test/test-node-affinity-strict2/log.expect
create mode 100644 src/test/test-node-affinity-strict2/manager_status
create mode 100644 src/test/test-node-affinity-strict2/service_config
create mode 100644 src/test/test-node-affinity-strict3/README
create mode 100644 src/test/test-node-affinity-strict3/cmdlist
create mode 100644 src/test/test-node-affinity-strict3/groups
create mode 100644 src/test/test-node-affinity-strict3/hardware_status
create mode 100644 src/test/test-node-affinity-strict3/log.expect
create mode 100644 src/test/test-node-affinity-strict3/manager_status
create mode 100644 src/test/test-node-affinity-strict3/service_config
create mode 100644 src/test/test-node-affinity-strict4/README
create mode 100644 src/test/test-node-affinity-strict4/cmdlist
create mode 100644 src/test/test-node-affinity-strict4/groups
create mode 100644 src/test/test-node-affinity-strict4/hardware_status
create mode 100644 src/test/test-node-affinity-strict4/log.expect
create mode 100644 src/test/test-node-affinity-strict4/manager_status
create mode 100644 src/test/test-node-affinity-strict4/service_config
create mode 100644 src/test/test-node-affinity-strict5/README
create mode 100644 src/test/test-node-affinity-strict5/cmdlist
create mode 100644 src/test/test-node-affinity-strict5/groups
create mode 100644 src/test/test-node-affinity-strict5/hardware_status
create mode 100644 src/test/test-node-affinity-strict5/log.expect
create mode 100644 src/test/test-node-affinity-strict5/manager_status
create mode 100644 src/test/test-node-affinity-strict5/service_config
create mode 100644 src/test/test-node-affinity-strict6/README
create mode 100644 src/test/test-node-affinity-strict6/cmdlist
create mode 100644 src/test/test-node-affinity-strict6/groups
create mode 100644 src/test/test-node-affinity-strict6/hardware_status
create mode 100644 src/test/test-node-affinity-strict6/log.expect
create mode 100644 src/test/test-node-affinity-strict6/manager_status
create mode 100644 src/test/test-node-affinity-strict6/service_config
diff --git a/src/test/test-node-affinity-nonstrict1/README b/src/test/test-node-affinity-nonstrict1/README
new file mode 100644
index 00000000..8775b6ca
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/README
@@ -0,0 +1,10 @@
+Test whether a service in an unrestricted group will automatically migrate
+back to a node member in case of a manual migration to a non-member node.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is a group member and has higher priority than the other nodes
diff --git a/src/test/test-node-affinity-nonstrict1/cmdlist b/src/test/test-node-affinity-nonstrict1/cmdlist
new file mode 100644
index 00000000..a63e4fdf
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict1/groups b/src/test/test-node-affinity-nonstrict1/groups
new file mode 100644
index 00000000..50c9a2d7
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node3
diff --git a/src/test/test-node-affinity-nonstrict1/hardware_status b/src/test/test-node-affinity-nonstrict1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict1/log.expect b/src/test/test-node-affinity-nonstrict1/log.expect
new file mode 100644
index 00000000..d86c69de
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/log.expect
@@ -0,0 +1,40 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 123 node2/lrm: got lock 'ha_agent_node2_lock'
+info 123 node2/lrm: status change wait_for_agent_lock => active
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 143 node2/lrm: service vm:101 - start migrate to node 'node3'
+info 143 node2/lrm: service vm:101 - end migrate to node 'node3'
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 165 node3/lrm: starting service vm:101
+info 165 node3/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict1/manager_status b/src/test/test-node-affinity-nonstrict1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-nonstrict1/service_config b/src/test/test-node-affinity-nonstrict1/service_config
new file mode 100644
index 00000000..5f558431
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-node-affinity-nonstrict2/README b/src/test/test-node-affinity-nonstrict2/README
new file mode 100644
index 00000000..f27414b1
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/README
@@ -0,0 +1,12 @@
+Test whether a service in an unrestricted group with nofailback enabled will
+stay on the manual migration target node, even though the target node is not a
+member of the unrestricted group.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, vm:101 stays on node2; even though
+  node2 is not a group member, the nofailback flag prevents vm:101 from being
+  migrated back to a group member
diff --git a/src/test/test-node-affinity-nonstrict2/cmdlist b/src/test/test-node-affinity-nonstrict2/cmdlist
new file mode 100644
index 00000000..a63e4fdf
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict2/groups b/src/test/test-node-affinity-nonstrict2/groups
new file mode 100644
index 00000000..59192fad
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/groups
@@ -0,0 +1,3 @@
+group: should_stay_here
+ nodes node3
+ nofailback 1
diff --git a/src/test/test-node-affinity-nonstrict2/hardware_status b/src/test/test-node-affinity-nonstrict2/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict2/log.expect b/src/test/test-node-affinity-nonstrict2/log.expect
new file mode 100644
index 00000000..c574097d
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/log.expect
@@ -0,0 +1,35 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 123 node2/lrm: got lock 'ha_agent_node2_lock'
+info 123 node2/lrm: status change wait_for_agent_lock => active
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 143 node2/lrm: starting service vm:101
+info 143 node2/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict2/manager_status b/src/test/test-node-affinity-nonstrict2/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-nonstrict2/service_config b/src/test/test-node-affinity-nonstrict2/service_config
new file mode 100644
index 00000000..5f558431
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-node-affinity-nonstrict3/README b/src/test/test-node-affinity-nonstrict3/README
new file mode 100644
index 00000000..c4ddfab8
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/README
@@ -0,0 +1,10 @@
+Test whether a service in an unrestricted group with only one node member
+will be migrated to a non-member node in case of a failover of its previously
+assigned node.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As node3 fails, vm:101 is migrated to node1
diff --git a/src/test/test-node-affinity-nonstrict3/cmdlist b/src/test/test-node-affinity-nonstrict3/cmdlist
new file mode 100644
index 00000000..eee0e40e
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict3/groups b/src/test/test-node-affinity-nonstrict3/groups
new file mode 100644
index 00000000..50c9a2d7
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node3
diff --git a/src/test/test-node-affinity-nonstrict3/hardware_status b/src/test/test-node-affinity-nonstrict3/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict3/log.expect b/src/test/test-node-affinity-nonstrict3/log.expect
new file mode 100644
index 00000000..752300bc
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/log.expect
@@ -0,0 +1,56 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: got lock 'ha_agent_node1_lock'
+info 241 node1/lrm: status change wait_for_agent_lock => active
+info 241 node1/lrm: starting service vm:101
+info 241 node1/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict3/manager_status b/src/test/test-node-affinity-nonstrict3/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-node-affinity-nonstrict3/service_config b/src/test/test-node-affinity-nonstrict3/service_config
new file mode 100644
index 00000000..777b2a7e
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-nonstrict4/README b/src/test/test-node-affinity-nonstrict4/README
new file mode 100644
index 00000000..a08f0e1d
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/README
@@ -0,0 +1,14 @@
+Test whether a service in an unrestricted group with two node members will
+stay assigned to one of the node members in case of a failover of its
+previously assigned node.
+
+The test scenario is:
+- vm:101 should be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher service count than node1 to test whether the restriction
+ to node2 and node3 is applied even though the scheduler would prefer the less
+ utilized node1
+
+The expected outcome is:
+- As node3 fails, vm:101 is migrated to node2, as it's the only available node
+ left in the unrestricted group
diff --git a/src/test/test-node-affinity-nonstrict4/cmdlist b/src/test/test-node-affinity-nonstrict4/cmdlist
new file mode 100644
index 00000000..eee0e40e
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict4/groups b/src/test/test-node-affinity-nonstrict4/groups
new file mode 100644
index 00000000..b1584b55
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node2,node3
diff --git a/src/test/test-node-affinity-nonstrict4/hardware_status b/src/test/test-node-affinity-nonstrict4/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict4/log.expect b/src/test/test-node-affinity-nonstrict4/log.expect
new file mode 100644
index 00000000..847e157c
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/log.expect
@@ -0,0 +1,54 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict4/manager_status b/src/test/test-node-affinity-nonstrict4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-nonstrict4/service_config b/src/test/test-node-affinity-nonstrict4/service_config
new file mode 100644
index 00000000..777b2a7e
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-nonstrict5/README b/src/test/test-node-affinity-nonstrict5/README
new file mode 100644
index 00000000..0c370446
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/README
@@ -0,0 +1,16 @@
+Test whether a service in an unrestricted group with two differently
+prioritized node members will stay on the node with the highest priority in
+case of a failover or when the service is on a lower-priority node.
+
+The test scenario is:
+- vm:101 should be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As vm:101 runs on node3, it is automatically migrated to node2, as node2 has
+ a higher priority than node3
+- As node2 fails, vm:101 is migrated to node3 as node3 is the next and only
+ available node member left in the unrestricted group
+- As node2 comes back online, vm:101 is migrated back to node2, as node2 has a
+ higher priority than node3
diff --git a/src/test/test-node-affinity-nonstrict5/cmdlist b/src/test/test-node-affinity-nonstrict5/cmdlist
new file mode 100644
index 00000000..6932aa78
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off" ],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict5/groups b/src/test/test-node-affinity-nonstrict5/groups
new file mode 100644
index 00000000..03a0ee9b
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node2:2,node3:1
diff --git a/src/test/test-node-affinity-nonstrict5/hardware_status b/src/test/test-node-affinity-nonstrict5/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict5/log.expect b/src/test/test-node-affinity-nonstrict5/log.expect
new file mode 100644
index 00000000..ca6e4e4f
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 20 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 25 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 43 node2/lrm: starting service vm:101
+info 43 node2/lrm: service status vm:101 started
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 260 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 260 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 265 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 265 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 280 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 363 node2/lrm: got lock 'ha_agent_node2_lock'
+info 363 node2/lrm: status change wait_for_agent_lock => active
+info 363 node2/lrm: starting service vm:101
+info 363 node2/lrm: service status vm:101 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict5/manager_status b/src/test/test-node-affinity-nonstrict5/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-nonstrict5/service_config b/src/test/test-node-affinity-nonstrict5/service_config
new file mode 100644
index 00000000..5f558431
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-node-affinity-nonstrict6/README b/src/test/test-node-affinity-nonstrict6/README
new file mode 100644
index 00000000..4ab12756
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/README
@@ -0,0 +1,14 @@
+Test whether a service in an unrestricted group with nofailback enabled and
+two differently prioritized node members will stay on the current node without
+migrating back to the highest priority node.
+
+The test scenario is:
+- vm:101 should be kept on node2 or node3
+- vm:101 is currently running on node2
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As node2 fails, vm:101 is migrated to node3 as it is the only available node
+ member left in the unrestricted group
+- As node2 comes back online, vm:101 stays on node3, as the nofailback flag
+  prevents vm:101 from migrating back to the higher-priority node2
diff --git a/src/test/test-node-affinity-nonstrict6/cmdlist b/src/test/test-node-affinity-nonstrict6/cmdlist
new file mode 100644
index 00000000..4dd33cc4
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off"],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-node-affinity-nonstrict6/groups b/src/test/test-node-affinity-nonstrict6/groups
new file mode 100644
index 00000000..a7aed178
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/groups
@@ -0,0 +1,3 @@
+group: should_stay_here
+ nodes node2:2,node3:1
+ nofailback 1
diff --git a/src/test/test-node-affinity-nonstrict6/hardware_status b/src/test/test-node-affinity-nonstrict6/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-nonstrict6/log.expect b/src/test/test-node-affinity-nonstrict6/log.expect
new file mode 100644
index 00000000..bcb472ba
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/log.expect
@@ -0,0 +1,52 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: got lock 'ha_agent_node3_lock'
+info 245 node3/lrm: status change wait_for_agent_lock => active
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-nonstrict6/manager_status b/src/test/test-node-affinity-nonstrict6/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-nonstrict6/service_config b/src/test/test-node-affinity-nonstrict6/service_config
new file mode 100644
index 00000000..c4ece62c
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node2", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-node-affinity-strict1/README b/src/test/test-node-affinity-strict1/README
new file mode 100644
index 00000000..c717d589
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/README
@@ -0,0 +1,10 @@
+Test whether a service in a restricted group will automatically migrate back to
+a restricted node member in case of a manual migration to a non-member node.
+
+The test scenario is:
+- vm:101 must be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is the only available node member left in the restricted group
diff --git a/src/test/test-node-affinity-strict1/cmdlist b/src/test/test-node-affinity-strict1/cmdlist
new file mode 100644
index 00000000..a63e4fdf
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-node-affinity-strict1/groups b/src/test/test-node-affinity-strict1/groups
new file mode 100644
index 00000000..370865f6
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node3
+ restricted 1
diff --git a/src/test/test-node-affinity-strict1/hardware_status b/src/test/test-node-affinity-strict1/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict1/log.expect b/src/test/test-node-affinity-strict1/log.expect
new file mode 100644
index 00000000..d86c69de
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/log.expect
@@ -0,0 +1,40 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 123 node2/lrm: got lock 'ha_agent_node2_lock'
+info 123 node2/lrm: status change wait_for_agent_lock => active
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 143 node2/lrm: service vm:101 - start migrate to node 'node3'
+info 143 node2/lrm: service vm:101 - end migrate to node 'node3'
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 165 node3/lrm: starting service vm:101
+info 165 node3/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict1/manager_status b/src/test/test-node-affinity-strict1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-strict1/service_config b/src/test/test-node-affinity-strict1/service_config
new file mode 100644
index 00000000..36ea15b1
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+}
diff --git a/src/test/test-node-affinity-strict2/README b/src/test/test-node-affinity-strict2/README
new file mode 100644
index 00000000..f4d06a14
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/README
@@ -0,0 +1,11 @@
+Test whether a service in a restricted group with nofailback enabled will
+automatically migrate back to a restricted node member in case of a manual
+migration to a non-member node.
+
+The test scenario is:
+- vm:101 must be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is the only available node member left in the restricted group
diff --git a/src/test/test-node-affinity-strict2/cmdlist b/src/test/test-node-affinity-strict2/cmdlist
new file mode 100644
index 00000000..a63e4fdf
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-node-affinity-strict2/groups b/src/test/test-node-affinity-strict2/groups
new file mode 100644
index 00000000..e43eafc5
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/groups
@@ -0,0 +1,4 @@
+group: must_stay_here
+ nodes node3
+ restricted 1
+ nofailback 1
diff --git a/src/test/test-node-affinity-strict2/hardware_status b/src/test/test-node-affinity-strict2/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict2/log.expect b/src/test/test-node-affinity-strict2/log.expect
new file mode 100644
index 00000000..d86c69de
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/log.expect
@@ -0,0 +1,40 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 123 node2/lrm: got lock 'ha_agent_node2_lock'
+info 123 node2/lrm: status change wait_for_agent_lock => active
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 143 node2/lrm: service vm:101 - start migrate to node 'node3'
+info 143 node2/lrm: service vm:101 - end migrate to node 'node3'
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 165 node3/lrm: starting service vm:101
+info 165 node3/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict2/manager_status b/src/test/test-node-affinity-strict2/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-strict2/service_config b/src/test/test-node-affinity-strict2/service_config
new file mode 100644
index 00000000..36ea15b1
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+}
diff --git a/src/test/test-node-affinity-strict3/README b/src/test/test-node-affinity-strict3/README
new file mode 100644
index 00000000..5aced390
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/README
@@ -0,0 +1,10 @@
+Test whether a service in a restricted group with only one node member will
+stay in recovery in case of a failover of its previously assigned node.
+
+The test scenario is:
+- vm:101 must be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As node3 fails, vm:101 stays in recovery since there's no available node
+ member left in the restricted group
diff --git a/src/test/test-node-affinity-strict3/cmdlist b/src/test/test-node-affinity-strict3/cmdlist
new file mode 100644
index 00000000..eee0e40e
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-node-affinity-strict3/groups b/src/test/test-node-affinity-strict3/groups
new file mode 100644
index 00000000..370865f6
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node3
+ restricted 1
diff --git a/src/test/test-node-affinity-strict3/hardware_status b/src/test/test-node-affinity-strict3/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict3/log.expect b/src/test/test-node-affinity-strict3/log.expect
new file mode 100644
index 00000000..47f97767
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/log.expect
@@ -0,0 +1,74 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+err 240 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 260 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 280 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 300 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 320 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 340 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 360 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 380 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 400 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 420 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 440 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 460 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 480 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 500 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 520 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 540 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 560 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 580 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 600 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 620 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 640 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 660 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 680 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 700 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict3/manager_status b/src/test/test-node-affinity-strict3/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-node-affinity-strict3/service_config b/src/test/test-node-affinity-strict3/service_config
new file mode 100644
index 00000000..9adf02c8
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-strict4/README b/src/test/test-node-affinity-strict4/README
new file mode 100644
index 00000000..25ded53e
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/README
@@ -0,0 +1,14 @@
+Test whether a service in a restricted group with two node members will stay
+assigned to one of the node members in case of a failover of its previously
+assigned node.
+
+The test scenario is:
+- vm:101 must be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher service count than node1 to test whether the restriction
+ to node2 and node3 is applied even though the scheduler would prefer the less
+ utilized node1
+
+The expected outcome is:
+- As node3 fails, vm:101 is migrated to node2, as it's the only available node
+ left in the restricted group
diff --git a/src/test/test-node-affinity-strict4/cmdlist b/src/test/test-node-affinity-strict4/cmdlist
new file mode 100644
index 00000000..eee0e40e
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-node-affinity-strict4/groups b/src/test/test-node-affinity-strict4/groups
new file mode 100644
index 00000000..0ad2abc6
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node2,node3
+ restricted 1
diff --git a/src/test/test-node-affinity-strict4/hardware_status b/src/test/test-node-affinity-strict4/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict4/log.expect b/src/test/test-node-affinity-strict4/log.expect
new file mode 100644
index 00000000..847e157c
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/log.expect
@@ -0,0 +1,54 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict4/manager_status b/src/test/test-node-affinity-strict4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-strict4/service_config b/src/test/test-node-affinity-strict4/service_config
new file mode 100644
index 00000000..9adf02c8
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-node-affinity-strict5/README b/src/test/test-node-affinity-strict5/README
new file mode 100644
index 00000000..a4e67f42
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/README
@@ -0,0 +1,16 @@
+Test whether a service in a restricted group with two differently prioritized
+node members will stay on the node with the highest priority in case of a
+failover or when the service is on a lower-priority node.
+
+The test scenario is:
+- vm:101 must be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As vm:101 runs on node3, it is automatically migrated to node2, as node2 has
+ a higher priority than node3
+- As node2 fails, vm:101 is migrated to node3 as node3 is the next and only
+ available node member left in the restricted group
+- As node2 comes back online, vm:101 is migrated back to node2, as node2 has a
+ higher priority than node3
diff --git a/src/test/test-node-affinity-strict5/cmdlist b/src/test/test-node-affinity-strict5/cmdlist
new file mode 100644
index 00000000..6932aa78
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off" ],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-node-affinity-strict5/groups b/src/test/test-node-affinity-strict5/groups
new file mode 100644
index 00000000..ec3cd799
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node2:2,node3:1
+ restricted 1
diff --git a/src/test/test-node-affinity-strict5/hardware_status b/src/test/test-node-affinity-strict5/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict5/log.expect b/src/test/test-node-affinity-strict5/log.expect
new file mode 100644
index 00000000..ca6e4e4f
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 20 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 25 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 43 node2/lrm: starting service vm:101
+info 43 node2/lrm: service status vm:101 started
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 260 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 260 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 265 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 265 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 280 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 363 node2/lrm: got lock 'ha_agent_node2_lock'
+info 363 node2/lrm: status change wait_for_agent_lock => active
+info 363 node2/lrm: starting service vm:101
+info 363 node2/lrm: service status vm:101 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict5/manager_status b/src/test/test-node-affinity-strict5/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-strict5/service_config b/src/test/test-node-affinity-strict5/service_config
new file mode 100644
index 00000000..36ea15b1
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+}
diff --git a/src/test/test-node-affinity-strict6/README b/src/test/test-node-affinity-strict6/README
new file mode 100644
index 00000000..c558afd1
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/README
@@ -0,0 +1,14 @@
+Test whether a service in a restricted group with nofailback enabled and two
+differently prioritized node members will stay on the current node without
+migrating back to the highest priority node.
+
+The test scenario is:
+- vm:101 must be kept on node2 or node3
+- vm:101 is currently running on node2
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As node2 fails, vm:101 is migrated to node3 as it is the only available node
+ member left in the restricted group
+- As node2 comes back online, vm:101 stays on node3; even though node2 has a
+  higher priority, the nofailback flag prevents vm:101 from migrating back to node2
diff --git a/src/test/test-node-affinity-strict6/cmdlist b/src/test/test-node-affinity-strict6/cmdlist
new file mode 100644
index 00000000..4dd33cc4
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off"],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-node-affinity-strict6/groups b/src/test/test-node-affinity-strict6/groups
new file mode 100644
index 00000000..cdd0e502
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/groups
@@ -0,0 +1,4 @@
+group: must_stay_here
+ nodes node2:2,node3:1
+ restricted 1
+ nofailback 1
diff --git a/src/test/test-node-affinity-strict6/hardware_status b/src/test/test-node-affinity-strict6/hardware_status
new file mode 100644
index 00000000..451beb13
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-node-affinity-strict6/log.expect b/src/test/test-node-affinity-strict6/log.expect
new file mode 100644
index 00000000..bcb472ba
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/log.expect
@@ -0,0 +1,52 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: got lock 'ha_agent_node3_lock'
+info 245 node3/lrm: status change wait_for_agent_lock => active
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-node-affinity-strict6/manager_status b/src/test/test-node-affinity-strict6/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-node-affinity-strict6/service_config b/src/test/test-node-affinity-strict6/service_config
new file mode 100644
index 00000000..1d371e1e
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node2", "state": "started", "group": "must_stay_here" }
+}
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 09/19] resources: introduce failback property in ha resource config
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (7 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 08/19] test: ha tester: add test cases for future node affinity rules Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 10/19] manager: migrate ha groups to node affinity rules in-memory Daniel Kral
` (16 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Add the failback property to the HA resources config, which is
functionally equivalent to the negation of the HA group's nofailback
property. It will be used to migrate HA groups to HA node affinity
rules.
The 'failback' flag is enabled by default, as the HA group's nofailback
property was disabled by default.
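For illustration, a resource could opt out of automatic failback in the
HA resources config roughly like this (the VMID is just an example):
    vm: 101
        state started
        failback 0
Omitting 'failback' keeps the previous behavior, since it defaults to 1,
mirroring nofailback defaulting to 0 in HA groups.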
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/API2/HA/Resources.pm | 9 +++++++++
src/PVE/API2/HA/Status.pm | 11 ++++++++++-
src/PVE/HA/Config.pm | 1 +
src/PVE/HA/Resources.pm | 9 +++++++++
src/PVE/HA/Resources/PVECT.pm | 1 +
src/PVE/HA/Resources/PVEVM.pm | 1 +
src/PVE/HA/Sim/Hardware.pm | 1 +
src/test/test_failover1.pl | 1 +
8 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/src/PVE/API2/HA/Resources.pm b/src/PVE/API2/HA/Resources.pm
index 59162044..26ef9e33 100644
--- a/src/PVE/API2/HA/Resources.pm
+++ b/src/PVE/API2/HA/Resources.pm
@@ -127,6 +127,15 @@ __PACKAGE__->register_method({
optional => 1,
description => "Requested resource state.",
},
+ failback => {
+ description => "The HA resource is automatically migrated to"
+ . " the node with the highest priority according to their"
+ . " node affinity rule, if a node with a higher priority"
+ . " than the current node comes online.",
+ type => 'boolean',
+ optional => 1,
+ default => 1,
+ },
group => get_standard_option('pve-ha-group-id', { optional => 1 }),
max_restart => {
description => "Maximal number of tries to restart the service on"
diff --git a/src/PVE/API2/HA/Status.pm b/src/PVE/API2/HA/Status.pm
index 1547e0ec..6e13c2c8 100644
--- a/src/PVE/API2/HA/Status.pm
+++ b/src/PVE/API2/HA/Status.pm
@@ -109,6 +109,15 @@ __PACKAGE__->register_method({
type => "string",
optional => 1,
},
+ failback => {
+ description => "The HA resource is automatically migrated"
+ . " to the node with the highest priority according to"
+ . " their node affinity rule, if a node with a higher"
+ . " priority than the current node comes online.",
+ type => "boolean",
+ optional => 1,
+ default => 1,
+ },
max_relocate => {
description => "For type 'service'.",
type => "integer",
@@ -260,7 +269,7 @@ __PACKAGE__->register_method({
# also return common resource attributes
if (defined($sc)) {
$data->{request_state} = $sc->{state};
- foreach my $key (qw(group max_restart max_relocate comment)) {
+ foreach my $key (qw(group max_restart max_relocate failback comment)) {
$data->{$key} = $sc->{$key} if defined($sc->{$key});
}
}
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 2e520aab..7d071f3b 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -116,6 +116,7 @@ sub read_and_check_resources_config {
my (undef, undef, $name) = parse_sid($sid);
$d->{state} = 'started' if !defined($d->{state});
$d->{state} = 'started' if $d->{state} eq 'enabled'; # backward compatibility
+ $d->{failback} = 1 if !defined($d->{failback});
$d->{max_restart} = 1 if !defined($d->{max_restart});
$d->{max_relocate} = 1 if !defined($d->{max_relocate});
if (PVE::HA::Resources->lookup($d->{type})) {
diff --git a/src/PVE/HA/Resources.pm b/src/PVE/HA/Resources.pm
index 873387e3..b6d4a732 100644
--- a/src/PVE/HA/Resources.pm
+++ b/src/PVE/HA/Resources.pm
@@ -62,6 +62,15 @@ EODESC
completion => \&PVE::HA::Tools::complete_group,
},
),
+ failback => {
+ description => "Automatically migrate HA resource to the node with"
+ . " the highest priority according to their node affinity "
+ . " rules, if a node with a higher priority than the current"
+ . " node comes online.",
+ type => 'boolean',
+ optional => 1,
+ default => 1,
+ },
max_restart => {
description => "Maximal number of tries to restart the service on"
. " a node after its start failed.",
diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
index d1ab6796..44644d92 100644
--- a/src/PVE/HA/Resources/PVECT.pm
+++ b/src/PVE/HA/Resources/PVECT.pm
@@ -37,6 +37,7 @@ sub options {
state => { optional => 1 },
group => { optional => 1 },
comment => { optional => 1 },
+ failback => { optional => 1 },
max_restart => { optional => 1 },
max_relocate => { optional => 1 },
};
diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
index fe65577c..e634fe3c 100644
--- a/src/PVE/HA/Resources/PVEVM.pm
+++ b/src/PVE/HA/Resources/PVEVM.pm
@@ -37,6 +37,7 @@ sub options {
state => { optional => 1 },
group => { optional => 1 },
comment => { optional => 1 },
+ failback => { optional => 1 },
max_restart => { optional => 1 },
max_relocate => { optional => 1 },
};
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 89dbdfa4..579be2ad 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -106,6 +106,7 @@ sub read_service_config {
}
$d->{state} = 'disabled' if !$d->{state};
$d->{state} = 'started' if $d->{state} eq 'enabled'; # backward compatibility
+ $d->{failback} = 1 if !defined($d->{failback});
$d->{max_restart} = 1 if !defined($d->{max_restart});
$d->{max_relocate} = 1 if !defined($d->{max_relocate});
}
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 29b56c68..f6faa386 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -22,6 +22,7 @@ $online_node_usage->add_node("node3");
my $service_conf = {
node => 'node1',
group => 'prefer_node1',
+ failback => 1,
};
my $sd = {
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 10/19] manager: migrate ha groups to node affinity rules in-memory
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (8 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 09/19] resources: introduce failback property in ha resource config Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 11/19] manager: apply node affinity rules when selecting service nodes Daniel Kral
` (15 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Migrate the currently configured groups to node affinity rules
in-memory, so that they can be applied as such in the next patches and
therefore replace HA groups internally.
HA node affinity rules in their initial implementation are designed to
be as restrictive as HA groups, i.e. only allow a HA resource to be used
in a single node affinity rule, to ease the migration between them.
HA groups map directly to node affinity rules, except that the
'restricted' property is renamed to 'strict' and that the group's
'nofailback' property is replaced by the negated 'failback' property in
the HA resources config.
Moving the failback setting to the HA resources config allows users to
set it more granularly for individual HA resources and keeps node
affinity rules more extensible in the future, e.g. multiple node
affinity rules for a single HA resource.
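For example, a group like the following (group name, nodes and resource
chosen for illustration)
    group: prefer_node1
        nodes node1:2,node2
        restricted 1
        nofailback 1
is translated in-memory into roughly this node affinity rule, while the
resources referencing the group get failback set to 0:
    $rules->{ids}->{'ha-group-prefer_node1'} = {
        type => 'node-affinity',
        resources => { 'vm:101' => 1 },   # resources that referenced the group
        nodes => {
            node1 => { priority => 2 },
            node2 => { priority => 0 },
        },
        strict => 1,                      # taken from the group's 'restricted' flag
        comment => "Generated from HA group 'prefer_node1'.",
    };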
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Config.pm | 3 ++-
src/PVE/HA/Groups.pm | 47 +++++++++++++++++++++++++++++++++++++++++++
src/PVE/HA/Manager.pm | 18 +++++++++++++++--
3 files changed, 65 insertions(+), 3 deletions(-)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 7d071f3b..424a6e10 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -131,7 +131,8 @@ sub read_and_check_resources_config {
}
}
- return $conf;
+ # TODO PVE 10: Remove digest when HA groups have been fully migrated to rules
+ return wantarray ? ($conf, $res->{digest}) : $conf;
}
sub update_resources_config {
diff --git a/src/PVE/HA/Groups.pm b/src/PVE/HA/Groups.pm
index 821d969b..b39b0373 100644
--- a/src/PVE/HA/Groups.pm
+++ b/src/PVE/HA/Groups.pm
@@ -107,4 +107,51 @@ sub parse_section_header {
__PACKAGE__->register();
__PACKAGE__->init();
+# Migrate nofailback flag from $groups to $resources
+sub migrate_groups_to_resources {
+ my ($groups, $resources) = @_;
+
+ for my $sid (keys %$resources) {
+ my $groupid = $resources->{$sid}->{group}
+ or next; # skip resources without groups
+
+ $resources->{$sid}->{failback} = int(!$groups->{ids}->{$groupid}->{nofailback});
+ }
+}
+
+# Migrate groups from $groups and $resources to node affinity rules in $rules
+sub migrate_groups_to_rules {
+ my ($rules, $groups, $resources) = @_;
+
+ my $group_resources = {};
+
+ for my $sid (keys %$resources) {
+ my $groupid = $resources->{$sid}->{group}
+ or next; # skip resources without groups
+
+ $group_resources->{$groupid}->{$sid} = 1;
+ }
+
+ while (my ($group, $resources) = each %$group_resources) {
+ next if !$groups->{ids}->{$group}; # skip non-existent groups
+
+ my $new_ruleid = "ha-group-$group";
+ my $nodes = {};
+ for my $entry (keys $groups->{ids}->{$group}->{nodes}->%*) {
+ my ($node, $priority) = PVE::HA::Tools::parse_node_priority($entry);
+
+ $nodes->{$node} = { priority => $priority };
+ }
+
+ $rules->{ids}->{$new_ruleid} = {
+ type => 'node-affinity',
+ resources => $resources,
+ nodes => $nodes,
+ strict => $groups->{ids}->{$group}->{restricted},
+ comment => "Generated from HA group '$group'.",
+ };
+ $rules->{order}->{$new_ruleid} = PVE::HA::Rules::get_next_ordinal($rules);
+ }
+}
+
1;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 88ff4a65..148447d6 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -6,6 +6,7 @@ use warnings;
use Digest::MD5 qw(md5_base64);
use PVE::Tools;
+use PVE::HA::Groups;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
use PVE::HA::Rules;
@@ -47,6 +48,8 @@ sub new {
haenv => $haenv,
crs => {},
last_rules_digest => '',
+ last_groups_digest => '',
+ last_services_digest => '',
}, $class;
my $old_ms = $haenv->read_manager_status();
@@ -529,7 +532,7 @@ sub manage {
$self->update_crs_scheduler_mode();
- my $sc = $haenv->read_service_config();
+ my ($sc, $services_digest) = $haenv->read_service_config();
$self->{groups} = $haenv->read_group_config(); # update
@@ -564,7 +567,16 @@ sub manage {
my $new_rules = $haenv->read_rules_config();
- if ($new_rules->{digest} ne $self->{last_rules_digest}) {
+ # TODO PVE 10: Remove group migration when HA groups have been fully migrated to rules
+ PVE::HA::Groups::migrate_groups_to_resources($self->{groups}, $sc);
+
+ if (
+ !$self->{rules}
+ || $new_rules->{digest} ne $self->{last_rules_digest}
+ || $self->{groups}->{digest} ne $self->{last_groups_digest}
+ || $services_digest && $services_digest ne $self->{last_services_digest}
+ ) {
+ PVE::HA::Groups::migrate_groups_to_rules($new_rules, $self->{groups}, $sc);
my $messages = PVE::HA::Rules->canonicalize($new_rules);
$haenv->log('info', $_) for @$messages;
@@ -572,6 +584,8 @@ sub manage {
$self->{rules} = $new_rules;
$self->{last_rules_digest} = $self->{rules}->{digest};
+ $self->{last_groups_digest} = $self->{groups}->{digest};
+ $self->{last_services_digest} = $services_digest;
}
$self->update_crm_commands();
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 11/19] manager: apply node affinity rules when selecting service nodes
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (9 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 10/19] manager: migrate ha groups to node affinity rules in-memory Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 12/19] test: add test cases for rules config Daniel Kral
` (14 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Replace the HA group mechanism with the functionally equivalent
get_node_affinity(...) helper, which enforces the node affinity rules
defined in the rules config.
This allows the $groups parameter to be replaced with the $rules
parameter in select_service_node(...) as all behavior of the HA groups
is now encoded in $service_conf and $rules.
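A call site then looks roughly as follows (the resource id and node
preference are just examples, mirroring the updated test below):
    my ($allowed_nodes, $preferred_nodes) =
        get_node_affinity($rules, 'vm:101', $online_node_usage);
    my $node = PVE::HA::Manager::select_service_node(
        $rules,             # parsed rules config instead of the group config
        $online_node_usage,
        'vm:101',
        $service_conf,      # carries 'failback' instead of the group binding
        $sd,
        'none',             # $node_preference
    );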
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Manager.pm | 83 ++++++--------------------------
src/PVE/HA/Rules/NodeAffinity.pm | 83 ++++++++++++++++++++++++++++++++
src/test/test_failover1.pl | 16 ++++--
3 files changed, 110 insertions(+), 72 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 148447d6..43572531 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -10,7 +10,7 @@ use PVE::HA::Groups;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
use PVE::HA::Rules;
-use PVE::HA::Rules::NodeAffinity;
+use PVE::HA::Rules::NodeAffinity qw(get_node_affinity);
use PVE::HA::Usage::Basic;
use PVE::HA::Usage::Static;
@@ -114,57 +114,13 @@ sub flush_master_status {
$haenv->write_manager_status($ms);
}
-sub get_service_group {
- my ($groups, $online_node_usage, $service_conf) = @_;
-
- my $group = {};
- # add all online nodes to default group to allow try_next when no group set
- $group->{nodes}->{$_} = 1 for $online_node_usage->list_nodes();
-
- # overwrite default if service is bound to a specific group
- if (my $group_id = $service_conf->{group}) {
- $group = $groups->{ids}->{$group_id} if $groups->{ids}->{$group_id};
- }
-
- return $group;
-}
-
-# groups available nodes with their priority as group index
-sub get_node_priority_groups {
- my ($group, $online_node_usage) = @_;
-
- my $pri_groups = {};
- my $group_members = {};
- foreach my $entry (keys %{ $group->{nodes} }) {
- my ($node, $pri) = ($entry, 0);
- if ($entry =~ m/^(\S+):(\d+)$/) {
- ($node, $pri) = ($1, $2);
- }
- next if !$online_node_usage->contains_node($node); # offline
- $pri_groups->{$pri}->{$node} = 1;
- $group_members->{$node} = $pri;
- }
-
- # add non-group members to unrestricted groups (priority -1)
- if (!$group->{restricted}) {
- my $pri = -1;
- for my $node ($online_node_usage->list_nodes()) {
- next if defined($group_members->{$node});
- $pri_groups->{$pri}->{$node} = 1;
- $group_members->{$node} = -1;
- }
- }
-
- return ($pri_groups, $group_members);
-}
-
=head3 select_service_node(...)
-=head3 select_service_node($groups, $online_node_usage, $sid, $service_conf, $sd, $node_preference)
+=head3 select_service_node($rules, $online_node_usage, $sid, $service_conf, $sd, $node_preference)
Used to select the best fitting node for the service C<$sid>, with the
-configuration C<$service_conf> and state C<$sd>, according to the groups defined
-in C<$groups>, available node utilization in C<$online_node_usage>, and the
+configuration C<$service_conf> and state C<$sd>, according to the rules defined
+in C<$rules>, available node utilization in C<$online_node_usage>, and the
given C<$node_preference>.
The C<$node_preference> can be set to:
@@ -182,7 +138,7 @@ The C<$node_preference> can be set to:
=cut
sub select_service_node {
- my ($groups, $online_node_usage, $sid, $service_conf, $sd, $node_preference) = @_;
+ my ($rules, $online_node_usage, $sid, $service_conf, $sd, $node_preference) = @_;
die "'$node_preference' is not a valid node_preference for select_service_node\n"
if $node_preference !~ m/(none|best-score|try-next)/;
@@ -190,42 +146,35 @@ sub select_service_node {
my ($current_node, $tried_nodes, $maintenance_fallback) =
$sd->@{qw(node failed_nodes maintenance_node)};
- my $group = get_service_group($groups, $online_node_usage, $service_conf);
+ my ($allowed_nodes, $pri_nodes) = get_node_affinity($rules, $sid, $online_node_usage);
- my ($pri_groups, $group_members) = get_node_priority_groups($group, $online_node_usage);
-
- my @pri_list = sort { $b <=> $a } keys %$pri_groups;
- return undef if !scalar(@pri_list);
+ return undef if !%$pri_nodes;
# stay on current node if possible (avoids random migrations)
if (
$node_preference eq 'none'
- && $group->{nofailback}
- && defined($group_members->{$current_node})
+ && !$service_conf->{failback}
+ && $allowed_nodes->{$current_node}
) {
return $current_node;
}
- # select node from top priority node list
-
- my $top_pri = $pri_list[0];
-
# try to avoid nodes where the service failed already if we want to relocate
if ($node_preference eq 'try-next') {
foreach my $node (@$tried_nodes) {
- delete $pri_groups->{$top_pri}->{$node};
+ delete $pri_nodes->{$node};
}
}
return $maintenance_fallback
- if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
+ if defined($maintenance_fallback) && $pri_nodes->{$maintenance_fallback};
- return $current_node if $node_preference eq 'none' && $pri_groups->{$top_pri}->{$current_node};
+ return $current_node if $node_preference eq 'none' && $pri_nodes->{$current_node};
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
$scores->{$a} <=> $scores->{$b} || $a cmp $b
- } keys %{ $pri_groups->{$top_pri} };
+ } keys %$pri_nodes;
my $found;
for (my $i = scalar(@nodes) - 1; $i >= 0; $i--) {
@@ -843,7 +792,7 @@ sub next_state_request_start {
if ($self->{crs}->{rebalance_on_request_start}) {
my $selected_node = select_service_node(
- $self->{groups},
+ $self->{rules},
$self->{online_node_usage},
$sid,
$cd,
@@ -1010,7 +959,7 @@ sub next_state_started {
}
my $node = select_service_node(
- $self->{groups},
+ $self->{rules},
$self->{online_node_usage},
$sid,
$cd,
@@ -1128,7 +1077,7 @@ sub next_state_recovery {
$self->recompute_online_node_usage(); # we want the most current node state
my $recovery_node = select_service_node(
- $self->{groups},
+ $self->{rules},
$self->{online_node_usage},
$sid,
$cd,
diff --git a/src/PVE/HA/Rules/NodeAffinity.pm b/src/PVE/HA/Rules/NodeAffinity.pm
index 2b3d7390..03313997 100644
--- a/src/PVE/HA/Rules/NodeAffinity.pm
+++ b/src/PVE/HA/Rules/NodeAffinity.pm
@@ -12,8 +12,13 @@ use PVE::Tools;
use PVE::HA::Rules;
use PVE::HA::Tools;
+use base qw(Exporter);
use base qw(PVE::HA::Rules);
+our @EXPORT_OK = qw(
+ get_node_affinity
+);
+
=head1 NAME
PVE::HA::Rules::NodeAffinity
@@ -210,4 +215,82 @@ __PACKAGE__->register_check(
},
);
+=head1 NODE AFFINITY RULE HELPERS
+
+=cut
+
+my $get_resource_node_affinity_rule = sub {
+ my ($rules, $sid) = @_;
+
+ # with the current restriction a resource can only be in one node affinity rule
+ my $node_affinity_rule;
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule) = @_;
+
+ $node_affinity_rule = dclone($rule) if !$node_affinity_rule;
+ },
+ {
+ sid => $sid,
+ type => 'node-affinity',
+ exclude_disabled_rules => 1,
+ },
+ );
+
+ return $node_affinity_rule;
+};
+
+=head3 get_node_affinity($rules, $sid, $online_node_usage)
+
+Returns a list of two hashes representing the node affinity of C<$sid>
+according to the node affinity rules in C<$rules> and the available nodes in
+C<$online_node_usage>.
+
+The first hash is a hash set of available nodes, i.e. nodes to which the
+resource C<$sid> is allowed to be assigned, and the second hash is a hash set
+of preferred nodes, i.e. nodes to which the resource C<$sid> should be assigned.
+
+If there are no available nodes at all, returns C<undef>.
+
+=cut
+
+sub get_node_affinity : prototype($$$) {
+ my ($rules, $sid, $online_node_usage) = @_;
+
+ my $node_affinity_rule = $get_resource_node_affinity_rule->($rules, $sid);
+
+ # default to a node affinity rule with all available nodes
+ if (!$node_affinity_rule) {
+ for my $node ($online_node_usage->list_nodes()) {
+ $node_affinity_rule->{nodes}->{$node} = { priority => 0 };
+ }
+ }
+
+ # add remaining nodes with low priority for non-strict node affinity rules
+ if (!$node_affinity_rule->{strict}) {
+ for my $node ($online_node_usage->list_nodes()) {
+ next if defined($node_affinity_rule->{nodes}->{$node});
+
+ $node_affinity_rule->{nodes}->{$node} = { priority => -1 };
+ }
+ }
+
+ my $allowed_nodes = {};
+ my $prioritized_nodes = {};
+
+ while (my ($node, $props) = each %{ $node_affinity_rule->{nodes} }) {
+ next if !$online_node_usage->contains_node($node); # node is offline
+
+ $allowed_nodes->{$node} = 1;
+ $prioritized_nodes->{ $props->{priority} }->{$node} = 1;
+ }
+
+ my $preferred_nodes = {};
+ my $highest_priority = (sort { $b <=> $a } keys %$prioritized_nodes)[0];
+ $preferred_nodes = $prioritized_nodes->{$highest_priority} if defined($highest_priority);
+
+ return ($allowed_nodes, $preferred_nodes);
+}
+
1;
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index f6faa386..78a001eb 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -4,12 +4,19 @@ use strict;
use warnings;
use lib '..';
-use PVE::HA::Groups;
use PVE::HA::Manager;
use PVE::HA::Usage::Basic;
-my $groups = PVE::HA::Groups->parse_config("groups.tmp", <<EOD);
-group: prefer_node1
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
+
+PVE::HA::Rules::NodeAffinity->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
+my $rules = PVE::HA::Rules->parse_config("rules.tmp", <<EOD);
+node-affinity: prefer_node1
+ resources vm:111
nodes node1
EOD
@@ -21,7 +28,6 @@ $online_node_usage->add_node("node3");
my $service_conf = {
node => 'node1',
- group => 'prefer_node1',
failback => 1,
};
@@ -37,7 +43,7 @@ sub test {
my $select_node_preference = $try_next ? 'try-next' : 'none';
my $node = PVE::HA::Manager::select_service_node(
- $groups, $online_node_usage, "vm:111", $service_conf, $sd, $select_node_preference,
+ $rules, $online_node_usage, "vm:111", $service_conf, $sd, $select_node_preference,
);
my (undef, undef, $line) = caller();
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 12/19] test: add test cases for rules config
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (10 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 11/19] manager: apply node affinity rules when selecting service nodes Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 13/19] api: introduce ha rules api endpoints Daniel Kral
` (13 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Add test cases to verify that the rule checkers correctly identify and
drop infeasible HA rules so that the remaining rule set is feasible. For
now, there are only HA Node Affinity rules, and the test cases verify
that:
- Node Affinity rules retrieve the correct optional default values
- Node Affinity rules which reference the same HA resource more than
once are dropped from the rule set
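New cases can be added by dropping a rules config into
src/test/rules_cfgs/, for example (rule name and resources made up):
    # rules_cfgs/my-new-check.cfg
    node-affinity: example-rule
        resources vm:107
        nodes node1:2,node2
Running ./test_rules_config.pl rules_cfgs/my-new-check.cfg once creates
the matching .cfg.expect from the output; subsequent runs (and 'make
test') diff against it.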
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
.gitignore | 1 +
src/test/Makefile | 4 +-
.../defaults-for-node-affinity-rules.cfg | 22 ++++
...efaults-for-node-affinity-rules.cfg.expect | 60 +++++++++++
...e-resource-refs-in-node-affinity-rules.cfg | 31 ++++++
...rce-refs-in-node-affinity-rules.cfg.expect | 63 +++++++++++
src/test/test_rules_config.pl | 100 ++++++++++++++++++
7 files changed, 280 insertions(+), 1 deletion(-)
create mode 100644 src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg
create mode 100644 src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg.expect
create mode 100644 src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg
create mode 100644 src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg.expect
create mode 100755 src/test/test_rules_config.pl
diff --git a/.gitignore b/.gitignore
index c35280ee..35de63f6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,4 @@
/src/test/test-*/status/*
/src/test/fence_cfgs/*.cfg.commands
/src/test/fence_cfgs/*.cfg.write
+/src/test/rules_cfgs/*.cfg.output
diff --git a/src/test/Makefile b/src/test/Makefile
index e54959fb..6da9e100 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -5,6 +5,7 @@ all:
test:
@echo "-- start regression tests --"
./test_failover1.pl
+ ./test_rules_config.pl
./ha-tester.pl
./test_fence_config.pl
@echo "-- end regression tests (success) --"
@@ -12,4 +13,5 @@ test:
.PHONY: clean
clean:
rm -rf *~ test-*/log test-*/*~ test-*/status \
- fence_cfgs/*.cfg.commands fence_cfgs/*.write
+ fence_cfgs/*.cfg.commands fence_cfgs/*.write \
+ rules_cfgs/*.cfg.output
diff --git a/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg b/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg
new file mode 100644
index 00000000..c8b2f2dd
--- /dev/null
+++ b/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg
@@ -0,0 +1,22 @@
+# Case 1: Node Affinity rules are enabled and loose by default, so set it so if it isn't yet.
+node-affinity: node-affinity-defaults
+ resources vm:101
+ nodes node1
+
+# Case 2: Node Affinity rule is disabled, it shouldn't be enabled afterwards.
+node-affinity: node-affinity-disabled
+ resources vm:102
+ nodes node2
+ disable
+
+# Case 3: Node Affinity rule is disabled with explicit 1 set, it shouldn't be enabled afterwards.
+node-affinity: node-affinity-disabled-explicit
+ resources vm:103
+ nodes node2
+ disable 1
+
+# Case 4: Node Affinity rule is set to strict, so it shouldn't be loose afterwards.
+node-affinity: node-affinity-strict
+ resources vm:104
+ nodes node3
+ strict 1
diff --git a/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg.expect b/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg.expect
new file mode 100644
index 00000000..59a2c364
--- /dev/null
+++ b/src/test/rules_cfgs/defaults-for-node-affinity-rules.cfg.expect
@@ -0,0 +1,60 @@
+--- Log ---
+--- Config ---
+$VAR1 = {
+ 'digest' => 'c96c9de143221a82e44efa8bb4814b8248a8ea11',
+ 'ids' => {
+ 'node-affinity-defaults' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ }
+ },
+ 'resources' => {
+ 'vm:101' => 1
+ },
+ 'type' => 'node-affinity'
+ },
+ 'node-affinity-disabled' => {
+ 'disable' => 1,
+ 'nodes' => {
+ 'node2' => {
+ 'priority' => 0
+ }
+ },
+ 'resources' => {
+ 'vm:102' => 1
+ },
+ 'type' => 'node-affinity'
+ },
+ 'node-affinity-disabled-explicit' => {
+ 'disable' => 1,
+ 'nodes' => {
+ 'node2' => {
+ 'priority' => 0
+ }
+ },
+ 'resources' => {
+ 'vm:103' => 1
+ },
+ 'type' => 'node-affinity'
+ },
+ 'node-affinity-strict' => {
+ 'nodes' => {
+ 'node3' => {
+ 'priority' => 0
+ }
+ },
+ 'resources' => {
+ 'vm:104' => 1
+ },
+ 'strict' => 1,
+ 'type' => 'node-affinity'
+ }
+ },
+ 'order' => {
+ 'node-affinity-defaults' => 1,
+ 'node-affinity-disabled' => 2,
+ 'node-affinity-disabled-explicit' => 3,
+ 'node-affinity-strict' => 4
+ }
+ };
diff --git a/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg b/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg
new file mode 100644
index 00000000..1e279e73
--- /dev/null
+++ b/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg
@@ -0,0 +1,31 @@
+# Case 1: Do not remove two Node Affinity rules, which do not share resources.
+node-affinity: no-same-resource1
+ resources vm:101,vm:102,vm:103
+ nodes node1,node2:2
+ strict 0
+
+node-affinity: no-same-resource2
+ resources vm:104,vm:105
+ nodes node1,node2:2
+ strict 0
+
+node-affinity: no-same-resource3
+ resources vm:106
+ nodes node1,node2:2
+ strict 1
+
+# Case 2: Remove Node Affinity rules, which share the same resource between them.
+node-affinity: same-resource1
+ resources vm:201
+ nodes node1,node2:2
+ strict 0
+
+node-affinity: same-resource2
+ resources vm:201,vm:202
+ nodes node3
+ strict 1
+
+node-affinity: same-resource3
+ resources vm:201,vm:203,vm:204
+ nodes node1:2,node3:3
+ strict 0
diff --git a/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg.expect b/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg.expect
new file mode 100644
index 00000000..3fd0c9ca
--- /dev/null
+++ b/src/test/rules_cfgs/multiple-resource-refs-in-node-affinity-rules.cfg.expect
@@ -0,0 +1,63 @@
+--- Log ---
+Drop rule 'same-resource1', because resource 'vm:201' is already used in another node affinity rule.
+Drop rule 'same-resource2', because resource 'vm:201' is already used in another node affinity rule.
+Drop rule 'same-resource3', because resource 'vm:201' is already used in another node affinity rule.
+--- Config ---
+$VAR1 = {
+ 'digest' => '5865d23b1a342e7f8cfa68bd0e1da556ca8d28a6',
+ 'ids' => {
+ 'no-same-resource1' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'resources' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ 'vm:103' => 1
+ },
+ 'strict' => 0,
+ 'type' => 'node-affinity'
+ },
+ 'no-same-resource2' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'resources' => {
+ 'vm:104' => 1,
+ 'vm:105' => 1
+ },
+ 'strict' => 0,
+ 'type' => 'node-affinity'
+ },
+ 'no-same-resource3' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'resources' => {
+ 'vm:106' => 1
+ },
+ 'strict' => 1,
+ 'type' => 'node-affinity'
+ }
+ },
+ 'order' => {
+ 'no-same-resource1' => 1,
+ 'no-same-resource2' => 2,
+ 'no-same-resource3' => 3
+ }
+ };
diff --git a/src/test/test_rules_config.pl b/src/test/test_rules_config.pl
new file mode 100755
index 00000000..824afed1
--- /dev/null
+++ b/src/test/test_rules_config.pl
@@ -0,0 +1,100 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+use Getopt::Long;
+
+use lib qw(..);
+
+use Test::More;
+use Test::MockModule;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
+
+PVE::HA::Rules::NodeAffinity->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
+my $opt_nodiff;
+
+if (!GetOptions("nodiff" => \$opt_nodiff)) {
+ print "usage: $0 [test.cfg] [--nodiff]\n";
+ exit -1;
+}
+
+sub _log {
+ my ($fh, $source, $message) = @_;
+
+ chomp $message;
+ $message = "[$source] $message" if $source;
+
+ print "$message\n";
+
+ $fh->print("$message\n");
+ $fh->flush();
+}
+
+sub check_cfg {
+ my ($cfg_fn, $outfile) = @_;
+
+ my $raw = PVE::Tools::file_get_contents($cfg_fn);
+
+ open(my $LOG, '>', "$outfile");
+ select($LOG);
+ $| = 1;
+
+ print "--- Log ---\n";
+ my $cfg = PVE::HA::Rules->parse_config($cfg_fn, $raw);
+ PVE::HA::Rules->set_rule_defaults($_) for values %{ $cfg->{ids} };
+ my $messages = PVE::HA::Rules->canonicalize($cfg);
+ print $_ for @$messages;
+ print "--- Config ---\n";
+ {
+ local $Data::Dumper::Sortkeys = 1;
+ print Dumper($cfg);
+ }
+
+ select(STDOUT);
+}
+
+sub run_test {
+ my ($cfg_fn) = @_;
+
+ print "* check: $cfg_fn\n";
+
+ my $outfile = "$cfg_fn.output";
+ my $expect = "$cfg_fn.expect";
+
+ eval { check_cfg($cfg_fn, $outfile); };
+ if (my $err = $@) {
+ die "Test '$cfg_fn' failed:\n$err\n";
+ }
+
+ return if $opt_nodiff;
+
+ my $res;
+
+ if (-f $expect) {
+ my $cmd = ['diff', '-u', $expect, $outfile];
+ $res = system(@$cmd);
+ die "test '$cfg_fn' failed\n" if $res != 0;
+ } else {
+ $res = system('cp', $outfile, $expect);
+ die "test '$cfg_fn' failed\n" if $res != 0;
+ }
+
+ print "* end rules test: $cfg_fn (success)\n\n";
+}
+
+# exec tests
+
+if (my $testcfg = shift) {
+ run_test($testcfg);
+} else {
+ for my $cfg (<rules_cfgs/*cfg>) {
+ run_test($cfg);
+ }
+}
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 13/19] api: introduce ha rules api endpoints
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (11 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 12/19] test: add test cases for rules config Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 14/19] cli: expose ha rules api endpoints to ha-manager cli Daniel Kral
` (12 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Add CRUD API endpoints for HA rules, which check that the given rule
properties are valid and will not make the existing rule set infeasible.
Disallowing changes via the API that would make this or other rules
infeasible makes it safer for users of the HA Manager, as they cannot
accidentally disrupt the behavior that other rules already enforce.
This functionality obviously cannot safeguard against manual changes to
the rules config file itself, but manual changes that result in
infeasible rules will be dropped, with a log message, on the HA
Manager's next canonicalize(...) call anyway.
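As a usage sketch (assuming the endpoints end up mounted under
/cluster/ha/rules; rule and resource names are made up), the new CRUD
calls could be exercised with pvesh like:
    pvesh create /cluster/ha/rules --rule keep-on-node1 --type node-affinity \
        --resources vm:101,vm:102 --nodes node1:2,node2 --strict 1
    pvesh get /cluster/ha/rules
    pvesh set /cluster/ha/rules/keep-on-node1 --nodes node1
    pvesh delete /cluster/ha/rules/keep-on-node1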
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
debian/pve-ha-manager.install | 1 +
src/PVE/API2/HA/Makefile | 2 +-
src/PVE/API2/HA/Rules.pm | 391 ++++++++++++++++++++++++++++++++++
3 files changed, 393 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/API2/HA/Rules.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 7462663b..b4eff279 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -16,6 +16,7 @@
/usr/share/man/man8/pve-ha-lrm.8.gz
/usr/share/perl5/PVE/API2/HA/Groups.pm
/usr/share/perl5/PVE/API2/HA/Resources.pm
+/usr/share/perl5/PVE/API2/HA/Rules.pm
/usr/share/perl5/PVE/API2/HA/Status.pm
/usr/share/perl5/PVE/CLI/ha_manager.pm
/usr/share/perl5/PVE/HA/CRM.pm
diff --git a/src/PVE/API2/HA/Makefile b/src/PVE/API2/HA/Makefile
index 5686efcb..86c10135 100644
--- a/src/PVE/API2/HA/Makefile
+++ b/src/PVE/API2/HA/Makefile
@@ -1,4 +1,4 @@
-SOURCES=Resources.pm Groups.pm Status.pm
+SOURCES=Resources.pm Groups.pm Rules.pm Status.pm
.PHONY: install
install:
diff --git a/src/PVE/API2/HA/Rules.pm b/src/PVE/API2/HA/Rules.pm
new file mode 100644
index 00000000..2e5e3820
--- /dev/null
+++ b/src/PVE/API2/HA/Rules.pm
@@ -0,0 +1,391 @@
+package PVE::API2::HA::Rules;
+
+use strict;
+use warnings;
+
+use HTTP::Status qw(:constants);
+
+use Storable qw(dclone);
+
+use PVE::Cluster qw(cfs_read_file);
+use PVE::Exception;
+use PVE::Tools qw(extract_param);
+use PVE::JSONSchema qw(get_standard_option);
+
+use PVE::HA::Config;
+use PVE::HA::Groups;
+use PVE::HA::Rules;
+
+use base qw(PVE::RESTHandler);
+
+my $get_api_ha_rule = sub {
+ my ($rules, $ruleid, $rule_errors) = @_;
+
+ die "no such ha rule '$ruleid'\n" if !$rules->{ids}->{$ruleid};
+
+ my $rule_cfg = dclone($rules->{ids}->{$ruleid});
+
+ $rule_cfg->{rule} = $ruleid;
+ $rule_cfg->{digest} = $rules->{digest};
+ $rule_cfg->{order} = $rules->{order}->{$ruleid};
+
+ # set optional rule parameter's default values
+ PVE::HA::Rules->set_rule_defaults($rule_cfg);
+
+ if ($rule_cfg->{resources}) {
+ $rule_cfg->{resources} =
+ PVE::HA::Rules->encode_value($rule_cfg->{type}, 'resources', $rule_cfg->{resources});
+ }
+
+ if ($rule_cfg->{nodes}) {
+ $rule_cfg->{nodes} =
+ PVE::HA::Rules->encode_value($rule_cfg->{type}, 'nodes', $rule_cfg->{nodes});
+ }
+
+ if ($rule_errors) {
+ $rule_cfg->{errors} = $rule_errors;
+ }
+
+ return $rule_cfg;
+};
+
+my $assert_resources_are_configured = sub {
+ my ($resources) = @_;
+
+ my $unconfigured_resources = [];
+
+ for my $resource (sort keys %$resources) {
+ push @$unconfigured_resources, $resource
+ if !PVE::HA::Config::service_is_configured($resource);
+ }
+
+ die "cannot use unmanaged resource(s) " . join(', ', @$unconfigured_resources) . ".\n"
+ if @$unconfigured_resources;
+};
+
+my $assert_nodes_do_exist = sub {
+ my ($nodes) = @_;
+
+ my $nonexistent_nodes = [];
+
+ for my $node (sort keys %$nodes) {
+ push @$nonexistent_nodes, $node
+ if !PVE::Cluster::check_node_exists($node, 1);
+ }
+
+ die "cannot use non-existent node(s) " . join(', ', @$nonexistent_nodes) . ".\n"
+ if @$nonexistent_nodes;
+};
+
+my $get_full_rules_config = sub {
+ my ($rules) = @_;
+
+ # set optional rule parameter's default values
+ for my $rule (values %{ $rules->{ids} }) {
+ PVE::HA::Rules->set_rule_defaults($rule);
+ }
+
+ # TODO PVE 10: Remove group migration when HA groups have been fully migrated to node affinity rules
+ my $groups = PVE::HA::Config::read_group_config();
+ my $resources = PVE::HA::Config::read_and_check_resources_config();
+
+ PVE::HA::Groups::migrate_groups_to_rules($rules, $groups, $resources);
+
+ return $rules;
+};
+
+my $check_feasibility = sub {
+ my ($rules) = @_;
+
+ $rules = dclone($rules);
+
+ $rules = $get_full_rules_config->($rules);
+
+ return PVE::HA::Rules->check_feasibility($rules);
+};
+
+my $assert_feasibility = sub {
+ my ($rules, $ruleid) = @_;
+
+ my $global_errors = $check_feasibility->($rules);
+ my $rule_errors = $global_errors->{$ruleid};
+
+ return if !$rule_errors;
+
+ # stringify error messages
+ for my $opt (keys %$rule_errors) {
+ $rule_errors->{$opt} = join(', ', @{ $rule_errors->{$opt} });
+ }
+
+ my $param = {
+ code => HTTP_BAD_REQUEST,
+ errors => $rule_errors,
+ };
+
+ my $exc = PVE::Exception->new("Rule '$ruleid' is invalid.\n", %$param);
+
+ my ($pkg, $filename, $line) = caller;
+
+ $exc->{filename} = $filename;
+ $exc->{line} = $line;
+
+ die $exc;
+};
+
+__PACKAGE__->register_method({
+ name => 'index',
+ path => '',
+ method => 'GET',
+ description => "Get HA rules.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Audit']],
+ },
+ parameters => {
+ additionalProperties => 0,
+ properties => {
+ type => {
+ type => 'string',
+ description => "Limit the returned list to the specified rule type.",
+ enum => PVE::HA::Rules->lookup_types(),
+ optional => 1,
+ },
+ resource => {
+ type => 'string',
+ description =>
+ "Limit the returned list to rules affecting the specified resource.",
+ completion => \&PVE::HA::Tools::complete_sid,
+ optional => 1,
+ },
+ },
+ },
+ returns => {
+ type => 'array',
+ items => {
+ type => 'object',
+ properties => {
+ rule => { type => 'string' },
+ },
+ links => [{ rel => 'child', href => '{rule}' }],
+ },
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $type = extract_param($param, 'type');
+ my $state = extract_param($param, 'state');
+ my $resource = extract_param($param, 'resource');
+
+ my $rules = PVE::HA::Config::read_rules_config();
+ $rules = $get_full_rules_config->($rules);
+
+ my $global_errors = $check_feasibility->($rules);
+
+ my $res = [];
+
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule, $ruleid) = @_;
+
+ my $rule_errors = $global_errors->{$ruleid};
+ my $rule_cfg = $get_api_ha_rule->($rules, $ruleid, $rule_errors);
+
+ push @$res, $rule_cfg;
+ },
+ {
+ type => $type,
+ sid => $resource,
+ },
+ );
+
+ return $res;
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'read_rule',
+ method => 'GET',
+ path => '{rule}',
+ description => "Read HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Audit']],
+ },
+ parameters => {
+ additionalProperties => 0,
+ properties => {
+ rule => get_standard_option(
+ 'pve-ha-rule-id',
+ { completion => \&PVE::HA::Tools::complete_rule },
+ ),
+ },
+ },
+ returns => {
+ type => 'object',
+ properties => {
+ rule => get_standard_option('pve-ha-rule-id'),
+ type => {
+ type => 'string',
+ },
+ },
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $ruleid = extract_param($param, 'rule');
+
+ my $rules = PVE::HA::Config::read_rules_config();
+ $rules = $get_full_rules_config->($rules);
+
+ my $global_errors = $check_feasibility->($rules);
+ my $rule_errors = $global_errors->{$ruleid};
+
+ return $get_api_ha_rule->($rules, $ruleid, $rule_errors);
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'create_rule',
+ method => 'POST',
+ path => '',
+ description => "Create HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ protected => 1,
+ parameters => PVE::HA::Rules->createSchema(),
+ returns => {
+ type => 'null',
+ },
+ code => sub {
+ my ($param) = @_;
+
+ PVE::Cluster::check_cfs_quorum();
+ mkdir("/etc/pve/ha");
+
+ my $type = extract_param($param, 'type');
+ my $ruleid = extract_param($param, 'rule');
+
+ my $plugin = PVE::HA::Rules->lookup($type);
+
+ my $opts = $plugin->check_config($ruleid, $param, 1, 1);
+
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ die "HA rule '$ruleid' already defined\n" if $rules->{ids}->{$ruleid};
+
+ $assert_resources_are_configured->($opts->{resources});
+ $assert_nodes_do_exist->($opts->{nodes}) if $opts->{nodes};
+
+ $rules->{order}->{$ruleid} = PVE::HA::Rules::get_next_ordinal($rules);
+ $rules->{ids}->{$ruleid} = $opts;
+
+ $assert_feasibility->($rules, $ruleid);
+
+ PVE::HA::Config::write_rules_config($rules);
+ },
+ "create ha rule failed",
+ );
+
+ return undef;
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'update_rule',
+ method => 'PUT',
+ path => '{rule}',
+ description => "Update HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ protected => 1,
+ parameters => PVE::HA::Rules->updateSchema(),
+ returns => {
+ type => 'null',
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $ruleid = extract_param($param, 'rule');
+ my $digest = extract_param($param, 'digest');
+ my $delete = extract_param($param, 'delete');
+
+ if ($delete) {
+ $delete = [PVE::Tools::split_list($delete)];
+ }
+
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ PVE::SectionConfig::assert_if_modified($rules, $digest);
+
+ my $rule = $rules->{ids}->{$ruleid} || die "HA rule '$ruleid' does not exist\n";
+
+ my $type = $rule->{type};
+ my $plugin = PVE::HA::Rules->lookup($type);
+ my $opts = $plugin->check_config($ruleid, $param, 0, 1);
+
+ $assert_resources_are_configured->($opts->{resources});
+ $assert_nodes_do_exist->($opts->{nodes}) if $opts->{nodes};
+
+ my $options = $plugin->private()->{options}->{$type};
+ PVE::SectionConfig::delete_from_config($rule, $options, $opts, $delete);
+
+ $rule->{$_} = $opts->{$_} for keys $opts->%*;
+
+ $assert_feasibility->($rules, $ruleid);
+
+ PVE::HA::Config::write_rules_config($rules);
+ },
+ "update HA rules failed",
+ );
+
+ return undef;
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'delete_rule',
+ method => 'DELETE',
+ path => '{rule}',
+ description => "Delete HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ protected => 1,
+ parameters => {
+ additionalProperties => 0,
+ properties => {
+ rule => get_standard_option(
+ 'pve-ha-rule-id',
+ { completion => \&PVE::HA::Tools::complete_rule },
+ ),
+ },
+ },
+ returns => {
+ type => 'null',
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $ruleid = extract_param($param, 'rule');
+
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ delete $rules->{ids}->{$ruleid};
+
+ PVE::HA::Config::write_rules_config($rules);
+ },
+ "delete ha rule failed",
+ );
+
+ return undef;
+ },
+});
+
+1;
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 14/19] cli: expose ha rules api endpoints to ha-manager cli
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (12 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 13/19] api: introduce ha rules api endpoints Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 15/19] sim: do not create default groups for test cases Daniel Kral
` (11 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Expose the HA rules API endpoints through the CLI in their own 'rules'
subcommand.
The names of the nested commands are chosen to be consistent with the
commands the ha-manager CLI already provides for HA resources and
groups, but they are grouped under a single subcommand.
The properties printed by the 'rules config' command are chosen to
reflect the columns shown for HA rules in the WebGUI.
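For illustration, the resulting CLI calls would look roughly like this
(a sketch only; the option names for 'add'/'set' come from the rule
plugins' create/update schemas and are assumed here, e.g. '--resources'
and '--nodes' for node affinity rules):
    ha-manager rules list
    ha-manager rules config
    ha-manager rules add node-affinity <rule> --resources vm:101 --nodes node1
    ha-manager rules set node-affinity <rule> --nodes node1,node2
    ha-manager rules remove <rule>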
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/CLI/ha_manager.pm | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/src/PVE/CLI/ha_manager.pm b/src/PVE/CLI/ha_manager.pm
index ca230f29..ef936cda 100644
--- a/src/PVE/CLI/ha_manager.pm
+++ b/src/PVE/CLI/ha_manager.pm
@@ -17,6 +17,7 @@ use PVE::HA::Env::PVE2;
use PVE::HA::Tools;
use PVE::API2::HA::Resources;
use PVE::API2::HA::Groups;
+use PVE::API2::HA::Rules;
use PVE::API2::HA::Status;
use base qw(PVE::CLIHandler);
@@ -199,6 +200,37 @@ our $cmddef = {
groupremove => ["PVE::API2::HA::Groups", 'delete', ['group']],
groupset => ["PVE::API2::HA::Groups", 'update', ['group']],
+ rules => {
+ list => [
+ 'PVE::API2::HA::Rules',
+ 'index',
+ [],
+ {},
+ sub {
+ my ($data, $schema, $options) = @_;
+ PVE::CLIFormatter::print_api_result($data, $schema, undef, $options);
+ },
+ $PVE::RESTHandler::standard_output_options,
+ ],
+ config => [
+ 'PVE::API2::HA::Rules',
+ 'index',
+ ['rule'],
+ {},
+ sub {
+ my ($data, $schema, $options) = @_;
+ my $props_to_print = [
+ 'rule', 'type', 'state', 'affinity', 'strict', 'resources', 'nodes',
+ ];
+ PVE::CLIFormatter::print_api_result($data, $schema, $props_to_print, $options);
+ },
+ $PVE::RESTHandler::standard_output_options,
+ ],
+ add => ['PVE::API2::HA::Rules', 'create_rule', ['type', 'rule']],
+ remove => ['PVE::API2::HA::Rules', 'delete_rule', ['rule']],
+ set => ['PVE::API2::HA::Rules', 'update_rule', ['type', 'rule']],
+ },
+
add => ["PVE::API2::HA::Resources", 'create', ['sid']],
remove => ["PVE::API2::HA::Resources", 'delete', ['sid']],
set => ["PVE::API2::HA::Resources", 'update', ['sid']],
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 15/19] sim: do not create default groups for test cases
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (13 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 14/19] cli: expose ha rules api endpoints to ha-manager cli Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-30 10:01 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 16/19] test: ha tester: migrate groups to service and rules config Daniel Kral
` (10 subsequent siblings)
25 siblings, 1 reply; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
As none of the existing HA test cases rely on the default HA groups
created by the simulated hardware anymore, create them only for the
ha-simulator hardware.
This is done because an upcoming patch, which persistently migrates HA
groups to node affinity rules, would otherwise unnecessarily trigger the
migration for every default group config.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Sim/Hardware.pm | 16 ----------------
src/PVE/HA/Sim/RTHardware.pm | 18 ++++++++++++++++++
2 files changed, 18 insertions(+), 16 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 579be2ad..35107446 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -375,20 +375,6 @@ sub read_static_service_stats {
return $stats;
}
-my $default_group_config = <<__EOD;
-group: prefer_node1
- nodes node1
- nofailback 1
-
-group: prefer_node2
- nodes node2
- nofailback 1
-
-group: prefer_node3
- nodes node3
- nofailback 1
-__EOD
-
sub new {
my ($this, $testdir) = @_;
@@ -415,8 +401,6 @@ sub new {
if (-f "$testdir/groups") {
copy("$testdir/groups", "$statusdir/groups");
- } else {
- PVE::Tools::file_set_contents("$statusdir/groups", $default_group_config);
}
if (-f "$testdir/service_config") {
diff --git a/src/PVE/HA/Sim/RTHardware.pm b/src/PVE/HA/Sim/RTHardware.pm
index 0dfe0b21..611f9386 100644
--- a/src/PVE/HA/Sim/RTHardware.pm
+++ b/src/PVE/HA/Sim/RTHardware.pm
@@ -24,6 +24,20 @@ use PVE::HA::LRM;
use PVE::HA::Sim::RTEnv;
use base qw(PVE::HA::Sim::Hardware);
+my $default_group_config = <<__EOD;
+group: prefer_node1
+ nodes node1
+ nofailback 1
+
+group: prefer_node2
+ nodes node2
+ nofailback 1
+
+group: prefer_node3
+ nodes node3
+ nofailback 1
+__EOD
+
sub new {
my ($this, $testdir) = @_;
@@ -31,6 +45,10 @@ sub new {
my $self = $class->SUPER::new($testdir);
+ if (!-f "$testdir/groups") {
+ PVE::Tools::file_set_contents("$self->{statusdir}/groups", $default_group_config);
+ }
+
my $logfile = "$testdir/log";
$self->{logfh} = IO::File->new(">$logfile")
|| die "unable to open '$logfile' - $!";
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [pve-devel] [PATCH ha-manager v4 15/19] sim: do not create default groups for test cases
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 15/19] sim: do not create default groups for test cases Daniel Kral
@ 2025-07-30 10:01 ` Daniel Kral
0 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-30 10:01 UTC (permalink / raw)
To: Proxmox VE development discussion; +Cc: pve-devel
On Tue Jul 29, 2025 at 8:00 PM CEST, Daniel Kral wrote:
> As none of the existing HA test cases rely on the default HA groups
> created by the simulated hardware anymore, create them only for the
> ha-simulator hardware.
>
> This is done, because in an upcoming patch, which persistently migrates
> HA groups to node affinity rules, it would unnecessarily fire the
> migration for every default group config.
I'll remove the creation of default groups entirely in the next revision,
as these were not used by the test cases. They're also not really usable
by the pve-ha-simulator anyway (except if the files are edited manually),
as there is no GUI integration for HA group CRUD nor for assigning HA
resources to these groups.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 16/19] test: ha tester: migrate groups to service and rules config
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (14 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 15/19] sim: do not create default groups for test cases Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 17/19] test: ha tester: replace any reference to groups with node affinity rules Daniel Kral
` (9 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
This is done because an upcoming patch, which persistently migrates HA
groups to node affinity rules, would otherwise make all of these test
cases run that migration from the HA groups config to the service and
rules config themselves. As this is not the responsibility of these test
cases and HA groups are becoming deprecated anyway, migrate their
configs now.
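As a concrete example of the conversion done here (taken from
test-node-affinity-strict2 below), a group like
    group: must_stay_here
        nodes node3
        restricted 1
        nofailback 1
with the resource referencing it via "group": "must_stay_here" becomes
the node affinity rule
    node-affinity: vm101-must-be-on-node3
        nodes node3
        resources vm:101
        strict 1
while the resource entry drops the 'group' property and gets
"failback": 0 instead.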
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/test/test-basic5/groups | 2 --
src/test/test-basic5/rules_config | 3 +++
src/test/test-basic5/service_config | 2 +-
src/test/test-crs-static2/groups | 2 --
src/test/test-crs-static2/rules_config | 3 +++
src/test/test-crs-static2/service_config | 2 +-
src/test/test-node-affinity-nonstrict1/groups | 2 --
src/test/test-node-affinity-nonstrict1/rules_config | 3 +++
src/test/test-node-affinity-nonstrict1/service_config | 2 +-
src/test/test-node-affinity-nonstrict2/groups | 3 ---
src/test/test-node-affinity-nonstrict2/rules_config | 3 +++
src/test/test-node-affinity-nonstrict2/service_config | 2 +-
src/test/test-node-affinity-nonstrict3/groups | 2 --
src/test/test-node-affinity-nonstrict3/rules_config | 3 +++
src/test/test-node-affinity-nonstrict3/service_config | 2 +-
src/test/test-node-affinity-nonstrict4/groups | 2 --
src/test/test-node-affinity-nonstrict4/rules_config | 3 +++
src/test/test-node-affinity-nonstrict4/service_config | 2 +-
src/test/test-node-affinity-nonstrict5/groups | 2 --
src/test/test-node-affinity-nonstrict5/rules_config | 3 +++
src/test/test-node-affinity-nonstrict5/service_config | 2 +-
src/test/test-node-affinity-nonstrict6/groups | 3 ---
src/test/test-node-affinity-nonstrict6/rules_config | 3 +++
src/test/test-node-affinity-nonstrict6/service_config | 2 +-
src/test/test-node-affinity-strict1/groups | 3 ---
src/test/test-node-affinity-strict1/rules_config | 4 ++++
src/test/test-node-affinity-strict1/service_config | 2 +-
src/test/test-node-affinity-strict2/groups | 4 ----
src/test/test-node-affinity-strict2/rules_config | 4 ++++
src/test/test-node-affinity-strict2/service_config | 2 +-
src/test/test-node-affinity-strict3/groups | 3 ---
src/test/test-node-affinity-strict3/rules_config | 4 ++++
src/test/test-node-affinity-strict3/service_config | 2 +-
src/test/test-node-affinity-strict4/groups | 3 ---
src/test/test-node-affinity-strict4/rules_config | 4 ++++
src/test/test-node-affinity-strict4/service_config | 2 +-
src/test/test-node-affinity-strict5/groups | 3 ---
src/test/test-node-affinity-strict5/rules_config | 4 ++++
src/test/test-node-affinity-strict5/service_config | 2 +-
src/test/test-node-affinity-strict6/groups | 4 ----
src/test/test-node-affinity-strict6/rules_config | 4 ++++
src/test/test-node-affinity-strict6/service_config | 2 +-
src/test/test-recovery1/groups | 4 ----
src/test/test-recovery1/rules_config | 4 ++++
src/test/test-recovery1/service_config | 2 +-
src/test/test-recovery2/groups | 4 ----
src/test/test-recovery2/rules_config | 4 ++++
src/test/test-recovery2/service_config | 2 +-
src/test/test-recovery3/groups | 4 ----
src/test/test-recovery3/rules_config | 4 ++++
src/test/test-recovery3/service_config | 2 +-
src/test/test-recovery4/groups | 4 ----
src/test/test-recovery4/rules_config | 4 ++++
src/test/test-recovery4/service_config | 2 +-
src/test/test-resource-failure2/groups | 2 --
src/test/test-resource-failure2/rules_config | 3 +++
src/test/test-resource-failure2/service_config | 2 +-
src/test/test-resource-failure3/service_config | 2 +-
src/test/test-shutdown2/groups | 2 --
src/test/test-shutdown2/rules_config | 3 +++
src/test/test-shutdown2/service_config | 4 ++--
src/test/test-shutdown3/groups | 2 --
src/test/test-shutdown3/rules_config | 3 +++
src/test/test-shutdown3/service_config | 4 ++--
64 files changed, 97 insertions(+), 84 deletions(-)
delete mode 100644 src/test/test-basic5/groups
create mode 100644 src/test/test-basic5/rules_config
delete mode 100644 src/test/test-crs-static2/groups
create mode 100644 src/test/test-crs-static2/rules_config
delete mode 100644 src/test/test-node-affinity-nonstrict1/groups
create mode 100644 src/test/test-node-affinity-nonstrict1/rules_config
delete mode 100644 src/test/test-node-affinity-nonstrict2/groups
create mode 100644 src/test/test-node-affinity-nonstrict2/rules_config
delete mode 100644 src/test/test-node-affinity-nonstrict3/groups
create mode 100644 src/test/test-node-affinity-nonstrict3/rules_config
delete mode 100644 src/test/test-node-affinity-nonstrict4/groups
create mode 100644 src/test/test-node-affinity-nonstrict4/rules_config
delete mode 100644 src/test/test-node-affinity-nonstrict5/groups
create mode 100644 src/test/test-node-affinity-nonstrict5/rules_config
delete mode 100644 src/test/test-node-affinity-nonstrict6/groups
create mode 100644 src/test/test-node-affinity-nonstrict6/rules_config
delete mode 100644 src/test/test-node-affinity-strict1/groups
create mode 100644 src/test/test-node-affinity-strict1/rules_config
delete mode 100644 src/test/test-node-affinity-strict2/groups
create mode 100644 src/test/test-node-affinity-strict2/rules_config
delete mode 100644 src/test/test-node-affinity-strict3/groups
create mode 100644 src/test/test-node-affinity-strict3/rules_config
delete mode 100644 src/test/test-node-affinity-strict4/groups
create mode 100644 src/test/test-node-affinity-strict4/rules_config
delete mode 100644 src/test/test-node-affinity-strict5/groups
create mode 100644 src/test/test-node-affinity-strict5/rules_config
delete mode 100644 src/test/test-node-affinity-strict6/groups
create mode 100644 src/test/test-node-affinity-strict6/rules_config
delete mode 100644 src/test/test-recovery1/groups
create mode 100644 src/test/test-recovery1/rules_config
delete mode 100644 src/test/test-recovery2/groups
create mode 100644 src/test/test-recovery2/rules_config
delete mode 100644 src/test/test-recovery3/groups
create mode 100644 src/test/test-recovery3/rules_config
delete mode 100644 src/test/test-recovery4/groups
create mode 100644 src/test/test-recovery4/rules_config
delete mode 100644 src/test/test-resource-failure2/groups
create mode 100644 src/test/test-resource-failure2/rules_config
delete mode 100644 src/test/test-shutdown2/groups
create mode 100644 src/test/test-shutdown2/rules_config
delete mode 100644 src/test/test-shutdown3/groups
create mode 100644 src/test/test-shutdown3/rules_config
diff --git a/src/test/test-basic5/groups b/src/test/test-basic5/groups
deleted file mode 100644
index 3c0cff1e..00000000
--- a/src/test/test-basic5/groups
+++ /dev/null
@@ -1,2 +0,0 @@
-group: prefer_node1
- nodes node1
\ No newline at end of file
diff --git a/src/test/test-basic5/rules_config b/src/test/test-basic5/rules_config
new file mode 100644
index 00000000..d980209a
--- /dev/null
+++ b/src/test/test-basic5/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm101-should-be-on-node1
+ nodes node1
+ resources vm:101
diff --git a/src/test/test-basic5/service_config b/src/test/test-basic5/service_config
index 5bf422ca..c202a349 100644
--- a/src/test/test-basic5/service_config
+++ b/src/test/test-basic5/service_config
@@ -1,5 +1,5 @@
{
- "vm:101": { "node": "node1", "state": "enabled", "group": "prefer_node1" },
+ "vm:101": { "node": "node1", "state": "enabled" },
"vm:102": { "node": "node2", "state": "enabled" },
"vm:103": { "node": "node3", "state": "enabled" }
}
\ No newline at end of file
diff --git a/src/test/test-crs-static2/groups b/src/test/test-crs-static2/groups
deleted file mode 100644
index 43e9bf5f..00000000
--- a/src/test/test-crs-static2/groups
+++ /dev/null
@@ -1,2 +0,0 @@
-group: prefer_node1
- nodes node1
diff --git a/src/test/test-crs-static2/rules_config b/src/test/test-crs-static2/rules_config
new file mode 100644
index 00000000..33df2db0
--- /dev/null
+++ b/src/test/test-crs-static2/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm102-should-be-on-node1
+ nodes node1
+ resources vm:102
diff --git a/src/test/test-crs-static2/service_config b/src/test/test-crs-static2/service_config
index 1f2333d0..9c124471 100644
--- a/src/test/test-crs-static2/service_config
+++ b/src/test/test-crs-static2/service_config
@@ -1,3 +1,3 @@
{
- "vm:102": { "node": "node1", "state": "enabled", "group": "prefer_node1" }
+ "vm:102": { "node": "node1", "state": "enabled" }
}
diff --git a/src/test/test-node-affinity-nonstrict1/groups b/src/test/test-node-affinity-nonstrict1/groups
deleted file mode 100644
index 50c9a2d7..00000000
--- a/src/test/test-node-affinity-nonstrict1/groups
+++ /dev/null
@@ -1,2 +0,0 @@
-group: should_stay_here
- nodes node3
diff --git a/src/test/test-node-affinity-nonstrict1/rules_config b/src/test/test-node-affinity-nonstrict1/rules_config
new file mode 100644
index 00000000..f758b512
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict1/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm101-should-be-on-node3
+ nodes node3
+ resources vm:101
diff --git a/src/test/test-node-affinity-nonstrict1/service_config b/src/test/test-node-affinity-nonstrict1/service_config
index 5f558431..7f0b1bf9 100644
--- a/src/test/test-node-affinity-nonstrict1/service_config
+++ b/src/test/test-node-affinity-nonstrict1/service_config
@@ -1,3 +1,3 @@
{
- "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+ "vm:101": { "node": "node3", "state": "started" }
}
diff --git a/src/test/test-node-affinity-nonstrict2/groups b/src/test/test-node-affinity-nonstrict2/groups
deleted file mode 100644
index 59192fad..00000000
--- a/src/test/test-node-affinity-nonstrict2/groups
+++ /dev/null
@@ -1,3 +0,0 @@
-group: should_stay_here
- nodes node3
- nofailback 1
diff --git a/src/test/test-node-affinity-nonstrict2/rules_config b/src/test/test-node-affinity-nonstrict2/rules_config
new file mode 100644
index 00000000..f758b512
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict2/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm101-should-be-on-node3
+ nodes node3
+ resources vm:101
diff --git a/src/test/test-node-affinity-nonstrict2/service_config b/src/test/test-node-affinity-nonstrict2/service_config
index 5f558431..c7266eec 100644
--- a/src/test/test-node-affinity-nonstrict2/service_config
+++ b/src/test/test-node-affinity-nonstrict2/service_config
@@ -1,3 +1,3 @@
{
- "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+ "vm:101": { "node": "node3", "state": "started", "failback": 0 }
}
diff --git a/src/test/test-node-affinity-nonstrict3/groups b/src/test/test-node-affinity-nonstrict3/groups
deleted file mode 100644
index 50c9a2d7..00000000
--- a/src/test/test-node-affinity-nonstrict3/groups
+++ /dev/null
@@ -1,2 +0,0 @@
-group: should_stay_here
- nodes node3
diff --git a/src/test/test-node-affinity-nonstrict3/rules_config b/src/test/test-node-affinity-nonstrict3/rules_config
new file mode 100644
index 00000000..f758b512
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict3/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm101-should-be-on-node3
+ nodes node3
+ resources vm:101
diff --git a/src/test/test-node-affinity-nonstrict3/service_config b/src/test/test-node-affinity-nonstrict3/service_config
index 777b2a7e..cdf0bd0c 100644
--- a/src/test/test-node-affinity-nonstrict3/service_config
+++ b/src/test/test-node-affinity-nonstrict3/service_config
@@ -1,5 +1,5 @@
{
- "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" },
+ "vm:101": { "node": "node3", "state": "started" },
"vm:102": { "node": "node2", "state": "started" },
"vm:103": { "node": "node2", "state": "started" }
}
diff --git a/src/test/test-node-affinity-nonstrict4/groups b/src/test/test-node-affinity-nonstrict4/groups
deleted file mode 100644
index b1584b55..00000000
--- a/src/test/test-node-affinity-nonstrict4/groups
+++ /dev/null
@@ -1,2 +0,0 @@
-group: should_stay_here
- nodes node2,node3
diff --git a/src/test/test-node-affinity-nonstrict4/rules_config b/src/test/test-node-affinity-nonstrict4/rules_config
new file mode 100644
index 00000000..c9faedb1
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict4/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm101-should-be-on-node2-node3
+ nodes node2,node3
+ resources vm:101
diff --git a/src/test/test-node-affinity-nonstrict4/service_config b/src/test/test-node-affinity-nonstrict4/service_config
index 777b2a7e..cdf0bd0c 100644
--- a/src/test/test-node-affinity-nonstrict4/service_config
+++ b/src/test/test-node-affinity-nonstrict4/service_config
@@ -1,5 +1,5 @@
{
- "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" },
+ "vm:101": { "node": "node3", "state": "started" },
"vm:102": { "node": "node2", "state": "started" },
"vm:103": { "node": "node2", "state": "started" }
}
diff --git a/src/test/test-node-affinity-nonstrict5/groups b/src/test/test-node-affinity-nonstrict5/groups
deleted file mode 100644
index 03a0ee9b..00000000
--- a/src/test/test-node-affinity-nonstrict5/groups
+++ /dev/null
@@ -1,2 +0,0 @@
-group: should_stay_here
- nodes node2:2,node3:1
diff --git a/src/test/test-node-affinity-nonstrict5/rules_config b/src/test/test-node-affinity-nonstrict5/rules_config
new file mode 100644
index 00000000..233b25d2
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict5/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm101-should-be-on-node2-node3
+ nodes node2:2,node3:1
+ resources vm:101
diff --git a/src/test/test-node-affinity-nonstrict5/service_config b/src/test/test-node-affinity-nonstrict5/service_config
index 5f558431..7f0b1bf9 100644
--- a/src/test/test-node-affinity-nonstrict5/service_config
+++ b/src/test/test-node-affinity-nonstrict5/service_config
@@ -1,3 +1,3 @@
{
- "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+ "vm:101": { "node": "node3", "state": "started" }
}
diff --git a/src/test/test-node-affinity-nonstrict6/groups b/src/test/test-node-affinity-nonstrict6/groups
deleted file mode 100644
index a7aed178..00000000
--- a/src/test/test-node-affinity-nonstrict6/groups
+++ /dev/null
@@ -1,3 +0,0 @@
-group: should_stay_here
- nodes node2:2,node3:1
- nofailback 1
diff --git a/src/test/test-node-affinity-nonstrict6/rules_config b/src/test/test-node-affinity-nonstrict6/rules_config
new file mode 100644
index 00000000..233b25d2
--- /dev/null
+++ b/src/test/test-node-affinity-nonstrict6/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm101-should-be-on-node2-node3
+ nodes node2:2,node3:1
+ resources vm:101
diff --git a/src/test/test-node-affinity-nonstrict6/service_config b/src/test/test-node-affinity-nonstrict6/service_config
index c4ece62c..98aef4e9 100644
--- a/src/test/test-node-affinity-nonstrict6/service_config
+++ b/src/test/test-node-affinity-nonstrict6/service_config
@@ -1,3 +1,3 @@
{
- "vm:101": { "node": "node2", "state": "started", "group": "should_stay_here" }
+ "vm:101": { "node": "node2", "state": "started", "failback": 0 }
}
diff --git a/src/test/test-node-affinity-strict1/groups b/src/test/test-node-affinity-strict1/groups
deleted file mode 100644
index 370865f6..00000000
--- a/src/test/test-node-affinity-strict1/groups
+++ /dev/null
@@ -1,3 +0,0 @@
-group: must_stay_here
- nodes node3
- restricted 1
diff --git a/src/test/test-node-affinity-strict1/rules_config b/src/test/test-node-affinity-strict1/rules_config
new file mode 100644
index 00000000..25aa655f
--- /dev/null
+++ b/src/test/test-node-affinity-strict1/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-must-be-on-node3
+ nodes node3
+ resources vm:101
+ strict 1
diff --git a/src/test/test-node-affinity-strict1/service_config b/src/test/test-node-affinity-strict1/service_config
index 36ea15b1..7f0b1bf9 100644
--- a/src/test/test-node-affinity-strict1/service_config
+++ b/src/test/test-node-affinity-strict1/service_config
@@ -1,3 +1,3 @@
{
- "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+ "vm:101": { "node": "node3", "state": "started" }
}
diff --git a/src/test/test-node-affinity-strict2/groups b/src/test/test-node-affinity-strict2/groups
deleted file mode 100644
index e43eafc5..00000000
--- a/src/test/test-node-affinity-strict2/groups
+++ /dev/null
@@ -1,4 +0,0 @@
-group: must_stay_here
- nodes node3
- restricted 1
- nofailback 1
diff --git a/src/test/test-node-affinity-strict2/rules_config b/src/test/test-node-affinity-strict2/rules_config
new file mode 100644
index 00000000..25aa655f
--- /dev/null
+++ b/src/test/test-node-affinity-strict2/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-must-be-on-node3
+ nodes node3
+ resources vm:101
+ strict 1
diff --git a/src/test/test-node-affinity-strict2/service_config b/src/test/test-node-affinity-strict2/service_config
index 36ea15b1..c7266eec 100644
--- a/src/test/test-node-affinity-strict2/service_config
+++ b/src/test/test-node-affinity-strict2/service_config
@@ -1,3 +1,3 @@
{
- "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+ "vm:101": { "node": "node3", "state": "started", "failback": 0 }
}
diff --git a/src/test/test-node-affinity-strict3/groups b/src/test/test-node-affinity-strict3/groups
deleted file mode 100644
index 370865f6..00000000
--- a/src/test/test-node-affinity-strict3/groups
+++ /dev/null
@@ -1,3 +0,0 @@
-group: must_stay_here
- nodes node3
- restricted 1
diff --git a/src/test/test-node-affinity-strict3/rules_config b/src/test/test-node-affinity-strict3/rules_config
new file mode 100644
index 00000000..25aa655f
--- /dev/null
+++ b/src/test/test-node-affinity-strict3/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-must-be-on-node3
+ nodes node3
+ resources vm:101
+ strict 1
diff --git a/src/test/test-node-affinity-strict3/service_config b/src/test/test-node-affinity-strict3/service_config
index 9adf02c8..cdf0bd0c 100644
--- a/src/test/test-node-affinity-strict3/service_config
+++ b/src/test/test-node-affinity-strict3/service_config
@@ -1,5 +1,5 @@
{
- "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" },
+ "vm:101": { "node": "node3", "state": "started" },
"vm:102": { "node": "node2", "state": "started" },
"vm:103": { "node": "node2", "state": "started" }
}
diff --git a/src/test/test-node-affinity-strict4/groups b/src/test/test-node-affinity-strict4/groups
deleted file mode 100644
index 0ad2abc6..00000000
--- a/src/test/test-node-affinity-strict4/groups
+++ /dev/null
@@ -1,3 +0,0 @@
-group: must_stay_here
- nodes node2,node3
- restricted 1
diff --git a/src/test/test-node-affinity-strict4/rules_config b/src/test/test-node-affinity-strict4/rules_config
new file mode 100644
index 00000000..ceb59540
--- /dev/null
+++ b/src/test/test-node-affinity-strict4/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-must-be-on-node2-node3
+ nodes node2,node3
+ resources vm:101
+ strict 1
diff --git a/src/test/test-node-affinity-strict4/service_config b/src/test/test-node-affinity-strict4/service_config
index 9adf02c8..cdf0bd0c 100644
--- a/src/test/test-node-affinity-strict4/service_config
+++ b/src/test/test-node-affinity-strict4/service_config
@@ -1,5 +1,5 @@
{
- "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" },
+ "vm:101": { "node": "node3", "state": "started" },
"vm:102": { "node": "node2", "state": "started" },
"vm:103": { "node": "node2", "state": "started" }
}
diff --git a/src/test/test-node-affinity-strict5/groups b/src/test/test-node-affinity-strict5/groups
deleted file mode 100644
index ec3cd799..00000000
--- a/src/test/test-node-affinity-strict5/groups
+++ /dev/null
@@ -1,3 +0,0 @@
-group: must_stay_here
- nodes node2:2,node3:1
- restricted 1
diff --git a/src/test/test-node-affinity-strict5/rules_config b/src/test/test-node-affinity-strict5/rules_config
new file mode 100644
index 00000000..8ad48205
--- /dev/null
+++ b/src/test/test-node-affinity-strict5/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-must-be-on-node2-node3
+ nodes node2:2,node3:1
+ resources vm:101
+ strict 1
diff --git a/src/test/test-node-affinity-strict5/service_config b/src/test/test-node-affinity-strict5/service_config
index 36ea15b1..7f0b1bf9 100644
--- a/src/test/test-node-affinity-strict5/service_config
+++ b/src/test/test-node-affinity-strict5/service_config
@@ -1,3 +1,3 @@
{
- "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+ "vm:101": { "node": "node3", "state": "started" }
}
diff --git a/src/test/test-node-affinity-strict6/groups b/src/test/test-node-affinity-strict6/groups
deleted file mode 100644
index cdd0e502..00000000
--- a/src/test/test-node-affinity-strict6/groups
+++ /dev/null
@@ -1,4 +0,0 @@
-group: must_stay_here
- nodes node2:2,node3:1
- restricted 1
- nofailback 1
diff --git a/src/test/test-node-affinity-strict6/rules_config b/src/test/test-node-affinity-strict6/rules_config
new file mode 100644
index 00000000..8ad48205
--- /dev/null
+++ b/src/test/test-node-affinity-strict6/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-must-be-on-node2-node3
+ nodes node2:2,node3:1
+ resources vm:101
+ strict 1
diff --git a/src/test/test-node-affinity-strict6/service_config b/src/test/test-node-affinity-strict6/service_config
index 1d371e1e..98aef4e9 100644
--- a/src/test/test-node-affinity-strict6/service_config
+++ b/src/test/test-node-affinity-strict6/service_config
@@ -1,3 +1,3 @@
{
- "vm:101": { "node": "node2", "state": "started", "group": "must_stay_here" }
+ "vm:101": { "node": "node2", "state": "started", "failback": 0 }
}
diff --git a/src/test/test-recovery1/groups b/src/test/test-recovery1/groups
deleted file mode 100644
index 06c7f76e..00000000
--- a/src/test/test-recovery1/groups
+++ /dev/null
@@ -1,4 +0,0 @@
-group: prefer_node2
- nodes node2
- restricted 1
-
diff --git a/src/test/test-recovery1/rules_config b/src/test/test-recovery1/rules_config
new file mode 100644
index 00000000..7ce791f5
--- /dev/null
+++ b/src/test/test-recovery1/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm102-must-be-on-node2
+ nodes node2
+ resources vm:102
+ strict 1
diff --git a/src/test/test-recovery1/service_config b/src/test/test-recovery1/service_config
index 39a05e59..933564e3 100644
--- a/src/test/test-recovery1/service_config
+++ b/src/test/test-recovery1/service_config
@@ -1,3 +1,3 @@
{
- "vm:102": { "node": "node2", "state": "enabled", "group": "prefer_node2" }
+ "vm:102": { "node": "node2", "state": "enabled" }
}
diff --git a/src/test/test-recovery2/groups b/src/test/test-recovery2/groups
deleted file mode 100644
index 06c7f76e..00000000
--- a/src/test/test-recovery2/groups
+++ /dev/null
@@ -1,4 +0,0 @@
-group: prefer_node2
- nodes node2
- restricted 1
-
diff --git a/src/test/test-recovery2/rules_config b/src/test/test-recovery2/rules_config
new file mode 100644
index 00000000..7ce791f5
--- /dev/null
+++ b/src/test/test-recovery2/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm102-must-be-on-node2
+ nodes node2
+ resources vm:102
+ strict 1
diff --git a/src/test/test-recovery2/service_config b/src/test/test-recovery2/service_config
index 39a05e59..933564e3 100644
--- a/src/test/test-recovery2/service_config
+++ b/src/test/test-recovery2/service_config
@@ -1,3 +1,3 @@
{
- "vm:102": { "node": "node2", "state": "enabled", "group": "prefer_node2" }
+ "vm:102": { "node": "node2", "state": "enabled" }
}
diff --git a/src/test/test-recovery3/groups b/src/test/test-recovery3/groups
deleted file mode 100644
index 06c7f76e..00000000
--- a/src/test/test-recovery3/groups
+++ /dev/null
@@ -1,4 +0,0 @@
-group: prefer_node2
- nodes node2
- restricted 1
-
diff --git a/src/test/test-recovery3/rules_config b/src/test/test-recovery3/rules_config
new file mode 100644
index 00000000..7ce791f5
--- /dev/null
+++ b/src/test/test-recovery3/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm102-must-be-on-node2
+ nodes node2
+ resources vm:102
+ strict 1
diff --git a/src/test/test-recovery3/service_config b/src/test/test-recovery3/service_config
index 39a05e59..933564e3 100644
--- a/src/test/test-recovery3/service_config
+++ b/src/test/test-recovery3/service_config
@@ -1,3 +1,3 @@
{
- "vm:102": { "node": "node2", "state": "enabled", "group": "prefer_node2" }
+ "vm:102": { "node": "node2", "state": "enabled" }
}
diff --git a/src/test/test-recovery4/groups b/src/test/test-recovery4/groups
deleted file mode 100644
index 06c7f76e..00000000
--- a/src/test/test-recovery4/groups
+++ /dev/null
@@ -1,4 +0,0 @@
-group: prefer_node2
- nodes node2
- restricted 1
-
diff --git a/src/test/test-recovery4/rules_config b/src/test/test-recovery4/rules_config
new file mode 100644
index 00000000..7ce791f5
--- /dev/null
+++ b/src/test/test-recovery4/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm102-must-be-on-node2
+ nodes node2
+ resources vm:102
+ strict 1
diff --git a/src/test/test-recovery4/service_config b/src/test/test-recovery4/service_config
index 39a05e59..933564e3 100644
--- a/src/test/test-recovery4/service_config
+++ b/src/test/test-recovery4/service_config
@@ -1,3 +1,3 @@
{
- "vm:102": { "node": "node2", "state": "enabled", "group": "prefer_node2" }
+ "vm:102": { "node": "node2", "state": "enabled" }
}
diff --git a/src/test/test-resource-failure2/groups b/src/test/test-resource-failure2/groups
deleted file mode 100644
index 01d634f2..00000000
--- a/src/test/test-resource-failure2/groups
+++ /dev/null
@@ -1,2 +0,0 @@
-group: all
- nodes node1,node2,node3
diff --git a/src/test/test-resource-failure2/rules_config b/src/test/test-resource-failure2/rules_config
new file mode 100644
index 00000000..11ff0187
--- /dev/null
+++ b/src/test/test-resource-failure2/rules_config
@@ -0,0 +1,3 @@
+node-affinity: fa130-should-be-on-node1-node2-node3
+ nodes node1,node2,node3
+ resources fa:130
diff --git a/src/test/test-resource-failure2/service_config b/src/test/test-resource-failure2/service_config
index a3f54599..7f0a481c 100644
--- a/src/test/test-resource-failure2/service_config
+++ b/src/test/test-resource-failure2/service_config
@@ -1,3 +1,3 @@
{
- "fa:130": { "node": "node2", "max_restart": "2", "group" : "all" }
+ "fa:130": { "node": "node2", "max_restart": "2" }
}
diff --git a/src/test/test-resource-failure3/service_config b/src/test/test-resource-failure3/service_config
index d596b9cf..61cfa1e1 100644
--- a/src/test/test-resource-failure3/service_config
+++ b/src/test/test-resource-failure3/service_config
@@ -1,3 +1,3 @@
{
- "fa:101": { "node": "node2", "group" : "all", "state" : "enabled" }
+ "fa:101": { "node": "node2", "state" : "enabled" }
}
diff --git a/src/test/test-shutdown2/groups b/src/test/test-shutdown2/groups
deleted file mode 100644
index d8ee675e..00000000
--- a/src/test/test-shutdown2/groups
+++ /dev/null
@@ -1,2 +0,0 @@
-group: prefer_node3
- nodes node3
diff --git a/src/test/test-shutdown2/rules_config b/src/test/test-shutdown2/rules_config
new file mode 100644
index 00000000..a7b9226d
--- /dev/null
+++ b/src/test/test-shutdown2/rules_config
@@ -0,0 +1,3 @@
+node-affinity: vm103-should-be-on-node3
+ nodes node3
+ resources vm:103
diff --git a/src/test/test-shutdown2/service_config b/src/test/test-shutdown2/service_config
index 7153f420..c6860e7c 100644
--- a/src/test/test-shutdown2/service_config
+++ b/src/test/test-shutdown2/service_config
@@ -1,3 +1,3 @@
{
- "vm:103": { "node": "node3", "state": "enabled", "group": "prefer_node3" }
-}
\ No newline at end of file
+ "vm:103": { "node": "node3", "state": "enabled" }
+}
diff --git a/src/test/test-shutdown3/groups b/src/test/test-shutdown3/groups
deleted file mode 100644
index d8ee675e..00000000
--- a/src/test/test-shutdown3/groups
+++ /dev/null
@@ -1,2 +0,0 @@
-group: prefer_node3
- nodes node3
diff --git a/src/test/test-shutdown3/rules_config b/src/test/test-shutdown3/rules_config
new file mode 100644
index 00000000..aae55621
--- /dev/null
+++ b/src/test/test-shutdown3/rules_config
@@ -0,0 +1,3 @@
+node-affinity: ct103-should-be-on-node3
+ nodes node3
+ resources ct:103
diff --git a/src/test/test-shutdown3/service_config b/src/test/test-shutdown3/service_config
index 1b98cf4d..f2624ede 100644
--- a/src/test/test-shutdown3/service_config
+++ b/src/test/test-shutdown3/service_config
@@ -1,3 +1,3 @@
{
- "ct:103": { "node": "node3", "state": "enabled", "group": "prefer_node3" }
-}
\ No newline at end of file
+ "ct:103": { "node": "node3", "state": "enabled" }
+}
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 17/19] test: ha tester: replace any reference to groups with node affinity rules
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (15 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 16/19] test: ha tester: migrate groups to service and rules config Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 18/19] env: add property delete for update_service_config Daniel Kral
` (8 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
As these test cases now work with node affinity rules, replace the
references to unrestricted/restricted groups with non-strict/strict node
affinity rules and replace "nofailback" with "disabled failback".
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/test/test-crs-static2/README | 3 ++-
src/test/test-node-affinity-nonstrict1/README | 7 ++++---
src/test/test-node-affinity-nonstrict2/README | 10 +++++-----
src/test/test-node-affinity-nonstrict3/README | 6 +++---
src/test/test-node-affinity-nonstrict4/README | 8 ++++----
src/test/test-node-affinity-nonstrict5/README | 8 ++++----
src/test/test-node-affinity-nonstrict6/README | 8 ++++----
src/test/test-node-affinity-strict1/README | 7 ++++---
src/test/test-node-affinity-strict2/README | 8 ++++----
src/test/test-node-affinity-strict3/README | 6 +++---
src/test/test-node-affinity-strict4/README | 8 ++++----
src/test/test-node-affinity-strict5/README | 8 ++++----
src/test/test-node-affinity-strict6/README | 11 ++++++-----
src/test/test-recovery2/README | 4 ++--
src/test/test-relocate-policy-default-group/README | 6 +++---
src/test/test-resource-failure6/README | 6 +++---
16 files changed, 59 insertions(+), 55 deletions(-)
diff --git a/src/test/test-crs-static2/README b/src/test/test-crs-static2/README
index 61530a76..c4812b5b 100644
--- a/src/test/test-crs-static2/README
+++ b/src/test/test-crs-static2/README
@@ -1,4 +1,5 @@
Test how service recovery works with the 'static' resource scheduling mode.
Expect that the single service always gets recovered to the node with the most
-available resources. Also tests that the group priority still takes precedence.
+available resources. Also tests that the node affinity rule's node priority
+still takes precedence.
diff --git a/src/test/test-node-affinity-nonstrict1/README b/src/test/test-node-affinity-nonstrict1/README
index 8775b6ca..15798005 100644
--- a/src/test/test-node-affinity-nonstrict1/README
+++ b/src/test/test-node-affinity-nonstrict1/README
@@ -1,5 +1,6 @@
-Test whether a service in a unrestricted group will automatically migrate back
-to a node member in case of a manual migration to a non-member node.
+Test whether an HA resource in a non-strict node affinity rule will
+automatically migrate back to a node member in case of a manual migration to a
+non-member node.
The test scenario is:
- vm:101 should be kept on node3
@@ -7,4 +8,4 @@ The test scenario is:
The expected outcome is:
- As vm:101 is manually migrated to node2, it is migrated back to node3, as
- node3 is a group member and has higher priority than the other nodes
+ node3 is a rule member and has higher priority than the other nodes
diff --git a/src/test/test-node-affinity-nonstrict2/README b/src/test/test-node-affinity-nonstrict2/README
index f27414b1..a2ad43ba 100644
--- a/src/test/test-node-affinity-nonstrict2/README
+++ b/src/test/test-node-affinity-nonstrict2/README
@@ -1,6 +1,6 @@
-Test whether a service in a unrestricted group with nofailback enabled will
-stay on the manual migration target node, even though the target node is not a
-member of the unrestricted group.
+Test whether a service in a non-strict node affinity rule, where the service
+has failback disabled, will stay on the manual migration target node, even
+though the target node is not a member of the non-strict node affinity rule.
The test scenario is:
- vm:101 should be kept on node3
@@ -8,5 +8,5 @@ The test scenario is:
The expected outcome is:
- As vm:101 is manually migrated to node2, vm:101 stays on node2; even though
- node2 is not a group member, the nofailback flag prevents vm:101 to be
- migrated back to a group member
+ node2 is not a rule member, the disabled failback flag prevents vm:101 from
+ being migrated back to a rule member.
diff --git a/src/test/test-node-affinity-nonstrict3/README b/src/test/test-node-affinity-nonstrict3/README
index c4ddfab8..98507ebd 100644
--- a/src/test/test-node-affinity-nonstrict3/README
+++ b/src/test/test-node-affinity-nonstrict3/README
@@ -1,6 +1,6 @@
-Test whether a service in a unrestricted group with only one node member will
-be migrated to a non-member node in case of a failover of their previously
-assigned node.
+Test whether a service in a non-strict node affinity rule with only one node
+member will be migrated to a non-member node in case of a failover of their
+previously assigned node.
The test scenario is:
- vm:101 should be kept on node3
diff --git a/src/test/test-node-affinity-nonstrict4/README b/src/test/test-node-affinity-nonstrict4/README
index a08f0e1d..31a46881 100644
--- a/src/test/test-node-affinity-nonstrict4/README
+++ b/src/test/test-node-affinity-nonstrict4/README
@@ -1,6 +1,6 @@
-Test whether a service in a unrestricted group with two node members will stay
-assigned to one of the node members in case of a failover of their previously
-assigned node.
+Test whether a service in a non-strict node affinity rule with two node members
+will stay assigned to one of the node members in case of a failover of their
+previously assigned node.
The test scenario is:
- vm:101 should be kept on node2 or node3
@@ -11,4 +11,4 @@ The test scenario is:
The expected outcome is:
- As node3 fails, vm:101 is migrated to node2, as it's the only available node
- left in the unrestricted group
+ left in the non-strict node affinity rule
diff --git a/src/test/test-node-affinity-nonstrict5/README b/src/test/test-node-affinity-nonstrict5/README
index 0c370446..118cd14c 100644
--- a/src/test/test-node-affinity-nonstrict5/README
+++ b/src/test/test-node-affinity-nonstrict5/README
@@ -1,6 +1,6 @@
-Test whether a service in a unrestricted group with two differently prioritized
-node members will stay on the node with the highest priority in case of a
-failover or when the service is on a lower-priority node.
+Test whether a service in a non-strict node affinity rule with two differently
+prioritized node members will stay on the node with the highest priority in
+case of a failover or when the service is on a lower-priority node.
The test scenario is:
- vm:101 should be kept on node2 or node3
@@ -11,6 +11,6 @@ The expected outcome is:
- As vm:101 runs on node3, it is automatically migrated to node2, as node2 has
a higher priority than node3
- As node2 fails, vm:101 is migrated to node3 as node3 is the next and only
- available node member left in the unrestricted group
+ available node member left in the non-strict node affinity rule
- As node2 comes back online, vm:101 is migrated back to node2, as node2 has a
higher priority than node3
diff --git a/src/test/test-node-affinity-nonstrict6/README b/src/test/test-node-affinity-nonstrict6/README
index 4ab12756..64cdaecd 100644
--- a/src/test/test-node-affinity-nonstrict6/README
+++ b/src/test/test-node-affinity-nonstrict6/README
@@ -1,6 +1,6 @@
-Test whether a service in a unrestricted group with nofailback enabled and two
-differently prioritized node members will stay on the current node without
-migrating back to the highest priority node.
+Test whether a service in a non-strict node affinity rule with nofailback
+enabled and two differently prioritized node members will stay on the current
+node without migrating back to the highest priority node.
The test scenario is:
- vm:101 should be kept on node2 or node3
@@ -9,6 +9,6 @@ The test scenario is:
The expected outcome is:
- As node2 fails, vm:101 is migrated to node3 as it is the only available node
- member left in the unrestricted group
+ member left in the non-strict node affinity rule
- As node2 comes back online, vm:101 stays on node3; even though node2 has a
higher priority, the nofailback flag prevents vm:101 to migrate back to node2
diff --git a/src/test/test-node-affinity-strict1/README b/src/test/test-node-affinity-strict1/README
index c717d589..85e82fa7 100644
--- a/src/test/test-node-affinity-strict1/README
+++ b/src/test/test-node-affinity-strict1/README
@@ -1,5 +1,6 @@
-Test whether a service in a restricted group will automatically migrate back to
-a restricted node member in case of a manual migration to a non-member node.
+Test whether a service in a strict node affinity rule will automatically
+migrate back to a restricted node member in case of a manual migration to a
+non-member node.
The test scenario is:
- vm:101 must be kept on node3
@@ -7,4 +8,4 @@ The test scenario is:
The expected outcome is:
- As vm:101 is manually migrated to node2, it is migrated back to node3, as
- node3 is the only available node member left in the restricted group
+ node3 is the only available node member left in the strict node affinity rule
diff --git a/src/test/test-node-affinity-strict2/README b/src/test/test-node-affinity-strict2/README
index f4d06a14..156ca2dc 100644
--- a/src/test/test-node-affinity-strict2/README
+++ b/src/test/test-node-affinity-strict2/README
@@ -1,6 +1,6 @@
-Test whether a service in a restricted group with nofailback enabled will
-automatically migrate back to a restricted node member in case of a manual
-migration to a non-member node.
+Test whether a service in a strict node affinity rule, where the service has
+failback disabled, will automatically migrate back to a restricted node member
+in case of a manual migration to a non-member node.
The test scenario is:
- vm:101 must be kept on node3
@@ -8,4 +8,4 @@ The test scenario is:
The expected outcome is:
- As vm:101 is manually migrated to node2, it is migrated back to node3, as
- node3 is the only available node member left in the restricted group
+ node3 is the only available node member left in the strict node affinity rule
diff --git a/src/test/test-node-affinity-strict3/README b/src/test/test-node-affinity-strict3/README
index 5aced390..b2167adb 100644
--- a/src/test/test-node-affinity-strict3/README
+++ b/src/test/test-node-affinity-strict3/README
@@ -1,5 +1,5 @@
-Test whether a service in a restricted group with only one node member will
-stay in recovery in case of a failover of their previously assigned node.
+Test whether a service in a strict node affinity rule with only one node member
+will stay in recovery in case of a failover of their previously assigned node.
The test scenario is:
- vm:101 must be kept on node3
@@ -7,4 +7,4 @@ The test scenario is:
The expected outcome is:
- As node3 fails, vm:101 stays in recovery since there's no available node
- member left in the restricted group
+ member left in the strict node affinity rule
diff --git a/src/test/test-node-affinity-strict4/README b/src/test/test-node-affinity-strict4/README
index 25ded53e..f9ea4282 100644
--- a/src/test/test-node-affinity-strict4/README
+++ b/src/test/test-node-affinity-strict4/README
@@ -1,6 +1,6 @@
-Test whether a service in a restricted group with two node members will stay
-assigned to one of the node members in case of a failover of their previously
-assigned node.
+Test whether a service in a strict node affinity rule with two node members
+will stay assigned to one of the node members in case of a failover of their
+previously assigned node.
The test scenario is:
- vm:101 must be kept on node2 or node3
@@ -11,4 +11,4 @@ The test scenario is:
The expected outcome is:
- As node3 fails, vm:101 is migrated to node2, as it's the only available node
- left in the restricted group
+ left in the strict node affinity rule
diff --git a/src/test/test-node-affinity-strict5/README b/src/test/test-node-affinity-strict5/README
index a4e67f42..56b31a88 100644
--- a/src/test/test-node-affinity-strict5/README
+++ b/src/test/test-node-affinity-strict5/README
@@ -1,6 +1,6 @@
-Test whether a service in a restricted group with two differently prioritized
-node members will stay on the node with the highest priority in case of a
-failover or when the service is on a lower-priority node.
+Test whether a service in a strict node affinity rule with two differently
+prioritized node members will stay on the node with the highest priority in
+case of a failover or when the service is on a lower-priority node.
The test scenario is:
- vm:101 must be kept on node2 or node3
@@ -11,6 +11,6 @@ The expected outcome is:
- As vm:101 runs on node3, it is automatically migrated to node2, as node2 has
a higher priority than node3
- As node2 fails, vm:101 is migrated to node3 as node3 is the next and only
- available node member left in the restricted group
+ available node member left in the strict node affinity rule
- As node2 comes back online, vm:101 is migrated back to node2, as node2 has a
higher priority than node3
diff --git a/src/test/test-node-affinity-strict6/README b/src/test/test-node-affinity-strict6/README
index c558afd1..57d7e701 100644
--- a/src/test/test-node-affinity-strict6/README
+++ b/src/test/test-node-affinity-strict6/README
@@ -1,6 +1,6 @@
-Test whether a service in a restricted group with nofailback enabled and two
-differently prioritized node members will stay on the current node without
-migrating back to the highest priority node.
+Test whether a service in a strict node affinity rule, where the service has
+failback disabled, and two differently prioritized node members will stay on
+the current node without migrating back to the highest priority node.
The test scenario is:
- vm:101 must be kept on node2 or node3
@@ -9,6 +9,7 @@ The test scenario is:
The expected outcome is:
- As node2 fails, vm:101 is migrated to node3 as it is the only available node
- member left in the restricted group
+ member left in the strict node affinity rule
- As node2 comes back online, vm:101 stays on node3; even though node2 has a
- higher priority, the nofailback flag prevents vm:101 to migrate back to node2
+ higher priority, the disabled failback flag prevents vm:101 from migrating
+ back to node2
diff --git a/src/test/test-recovery2/README b/src/test/test-recovery2/README
index 017d0f20..61294c94 100644
--- a/src/test/test-recovery2/README
+++ b/src/test/test-recovery2/README
@@ -1,3 +1,3 @@
Test what happens if a service needs to get recovered but select_service_node
-cannot return any possible node due to restricted groups, but after a while the
-original node comes up, in which case the service must be recovered.
+cannot return any possible node due to a strict node affinity rule, but after a
+while the original node comes up, in which case the service must be recovered.
diff --git a/src/test/test-relocate-policy-default-group/README b/src/test/test-relocate-policy-default-group/README
index 18ee13a0..4553efeb 100644
--- a/src/test/test-relocate-policy-default-group/README
+++ b/src/test/test-relocate-policy-default-group/README
@@ -1,7 +1,7 @@
-Test relocate policy on services with no group.
+Test relocate policy on services with no node affinity rule.
Service 'fa:130' fails three times to restart and has a 'max_restart' policy
of 0, thus will be relocated after each start try.
-As it has no group configured all available nodes should get chosen for
-when relocating.
+As it has no node affinity rule configured, all available nodes should get
+chosen when relocating.
As we allow to relocate twice but the service fails three times we place
it in the error state after all tries where used and all nodes where visited
diff --git a/src/test/test-resource-failure6/README b/src/test/test-resource-failure6/README
index 787af014..49a6f3a0 100644
--- a/src/test/test-resource-failure6/README
+++ b/src/test/test-resource-failure6/README
@@ -1,5 +1,5 @@
-Test relocate policy on services with no group.
+Test relocate policy on services with no node affinity rule.
Service 'fa:130' fails three times to restart and has a 'max_restart' policy
of 0, thus will be relocated after each start try.
-As it has no group configured all available nodes should get chosen for
-when relocating.
+As it has no node affinity rule configured, all available nodes should get
+chosen when relocating.
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 18/19] env: add property delete for update_service_config
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (16 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 17/19] test: ha tester: replace any reference to groups with node affinity rules Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 19/19] manager: persistently migrate ha groups to ha rules Daniel Kral
` (7 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Allow callers of update_service_config(...) to provide a list of
properties that should be deleted from a HA resource config.
This is needed for the migration of HA groups, as the 'group' property
must be removed to completely migrate them to the respective HA
resource configs. Otherwise, these groups would be reported as
non-existent after the HA group config is removed.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
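A minimal usage sketch (not part of this patch) of the new $delete argument;
the call site and the concrete property names are assumptions borrowed from
the later group-migration patch:

    # hypothetical caller: set 'failback' and drop the now-migrated 'group'
    # property in a single update
    my $param = { failback => 1 };
    $haenv->update_service_config('vm:100', $param, 'group');

    # $delete is parsed with PVE::Tools::split_list(), so a value such as
    # 'group,comment' would remove several properties at once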
src/PVE/HA/Env.pm | 4 ++--
src/PVE/HA/Env/PVE2.pm | 4 ++--
src/PVE/HA/Sim/Env.pm | 4 ++--
src/PVE/HA/Sim/Hardware.pm | 8 +++++++-
4 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 5cee7b30..70e39ad4 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -95,9 +95,9 @@ sub read_service_config {
}
sub update_service_config {
- my ($self, $sid, $param) = @_;
+ my ($self, $sid, $param, $delete) = @_;
- return $self->{plug}->update_service_config($sid, $param);
+ return $self->{plug}->update_service_config($sid, $param, $delete);
}
sub parse_sid {
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 58fd36e3..854c8942 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -136,9 +136,9 @@ sub read_service_config {
}
sub update_service_config {
- my ($self, $sid, $param) = @_;
+ my ($self, $sid, $param, $delete) = @_;
- return PVE::HA::Config::update_resources_config($sid, $param);
+ return PVE::HA::Config::update_resources_config($sid, $param, $delete);
}
sub parse_sid {
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index bb76b7fa..528ea3f8 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -210,9 +210,9 @@ sub read_service_config {
}
sub update_service_config {
- my ($self, $sid, $param) = @_;
+ my ($self, $sid, $param, $delete) = @_;
- return $self->{hardware}->update_service_config($sid, $param);
+ return $self->{hardware}->update_service_config($sid, $param, $delete);
}
sub parse_sid {
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 35107446..3a1ebf25 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -115,7 +115,7 @@ sub read_service_config {
}
sub update_service_config {
- my ($self, $sid, $param) = @_;
+ my ($self, $sid, $param, $delete) = @_;
my $conf = $self->read_service_config();
@@ -125,6 +125,12 @@ sub update_service_config {
$sconf->{$k} = $param->{$k};
}
+ if ($delete) {
+ for my $k (PVE::Tools::split_list($delete)) {
+ delete $sconf->{$k};
+ }
+ }
+
$self->write_service_config($conf);
}
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH ha-manager v4 19/19] manager: persistently migrate ha groups to ha rules
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (17 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 18/19] env: add property delete for update_service_config Daniel Kral
@ 2025-07-29 18:00 ` Daniel Kral
2025-07-29 18:01 ` [pve-devel] [PATCH docs v4 1/2] ha: add documentation about ha rules and ha node affinity rules Daniel Kral
` (6 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:00 UTC (permalink / raw)
To: pve-devel
Migrate the HA groups config to the HA resources and HA rules config
persistently on disk and retry until it succeeds. The HA group config is
already migrated in memory by the HA Manager, but to use the groups
persistently as HA node affinity rules, they must be migrated to the HA
rules config.
As the new 'failback' flag can only be read by newer HA Manager versions
and the rules config cannot be read by older HA Manager versions at all,
the HA resources config can only be migrated, and the HA groups config
deleted, once all nodes are upgraded to a pve-manager version that
depends on an ha-manager package which can read and apply the HA rules.
If the HA group migration fails, it is retried every 10 rounds.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
This patch must be updated with the correct pve-manager version, which
the HA Manager must check for before fully migrating (i.e. deleting the
groups config, etc.).
I guessed pve-manager 9.0.0 for now, but let's see what it'll be.
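As a self-contained illustration for reviewers (not part of the patch), the
following sketch mirrors the $get_version_parts/$has_node_min_version helpers
added to src/PVE/HA/Manager.pm below; the version strings are taken from the
test fixtures:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # only the leading x.y.z triple matters, any '~suffix' is ignored
    sub version_parts { return $_[0] =~ m/^(\d+)\.(\d+)\.(\d+)/; }

    # a node passes the gate iff its (major, minor, patch) triple is not
    # smaller than the required minimum version
    sub has_min_version {
        my ($node_version, $min_version) = @_;
        my ($maj, $min, $patch) = version_parts($node_version);
        my ($min_maj, $min_min, $min_patch) = version_parts($min_version);
        return 0 if $maj < $min_maj;
        return 0 if $maj == $min_maj && $min < $min_min;
        return 0 if $maj == $min_maj && $min == $min_min && $patch < $min_patch;
        return 1;
    }

    print has_min_version("9.0.0~11", "9.0.0"), "\n"; # 1 - passes the gate
    print has_min_version("8.4.1", "9.0.0"), "\n";    # 0 - blocks the migration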
src/PVE/HA/Config.pm | 5 +
src/PVE/HA/Env.pm | 24 ++++
src/PVE/HA/Env/PVE2.pm | 29 +++++
src/PVE/HA/Manager.pm | 114 +++++++++++++++++++
src/PVE/HA/Sim/Env.pm | 30 +++++
src/PVE/HA/Sim/Hardware.pm | 24 ++++
src/test/test-group-migrate1/README | 4 +
src/test/test-group-migrate1/cmdlist | 4 +
src/test/test-group-migrate1/groups | 7 ++
src/test/test-group-migrate1/hardware_status | 5 +
src/test/test-group-migrate1/log.expect | 87 ++++++++++++++
src/test/test-group-migrate1/manager_status | 1 +
src/test/test-group-migrate1/service_config | 5 +
src/test/test-group-migrate2/README | 3 +
src/test/test-group-migrate2/cmdlist | 3 +
src/test/test-group-migrate2/groups | 7 ++
src/test/test-group-migrate2/hardware_status | 5 +
src/test/test-group-migrate2/log.expect | 47 ++++++++
src/test/test-group-migrate2/manager_status | 1 +
src/test/test-group-migrate2/service_config | 5 +
20 files changed, 410 insertions(+)
create mode 100644 src/test/test-group-migrate1/README
create mode 100644 src/test/test-group-migrate1/cmdlist
create mode 100644 src/test/test-group-migrate1/groups
create mode 100644 src/test/test-group-migrate1/hardware_status
create mode 100644 src/test/test-group-migrate1/log.expect
create mode 100644 src/test/test-group-migrate1/manager_status
create mode 100644 src/test/test-group-migrate1/service_config
create mode 100644 src/test/test-group-migrate2/README
create mode 100644 src/test/test-group-migrate2/cmdlist
create mode 100644 src/test/test-group-migrate2/groups
create mode 100644 src/test/test-group-migrate2/hardware_status
create mode 100644 src/test/test-group-migrate2/log.expect
create mode 100644 src/test/test-group-migrate2/manager_status
create mode 100644 src/test/test-group-migrate2/service_config
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 424a6e10..92d04443 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -234,6 +234,11 @@ sub read_group_config {
return cfs_read_file($ha_groups_config);
}
+sub delete_group_config {
+
+ unlink "/etc/pve/$ha_groups_config" or die "failed to remove group config: $!\n";
+}
+
sub write_group_config {
my ($cfg) = @_;
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 70e39ad4..e00272a0 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -100,6 +100,12 @@ sub update_service_config {
return $self->{plug}->update_service_config($sid, $param, $delete);
}
+sub write_service_config {
+ my ($self, $conf) = @_;
+
+ $self->{plug}->write_service_config($conf);
+}
+
sub parse_sid {
my ($self, $sid) = @_;
@@ -137,12 +143,24 @@ sub read_rules_config {
return $self->{plug}->read_rules_config();
}
+sub write_rules_config {
+ my ($self, $rules) = @_;
+
+ $self->{plug}->write_rules_config($rules);
+}
+
sub read_group_config {
my ($self) = @_;
return $self->{plug}->read_group_config();
}
+sub delete_group_config {
+ my ($self) = @_;
+
+ $self->{plug}->delete_group_config();
+}
+
# this should return a hash containing info
# what nodes are members and online.
sub get_node_info {
@@ -288,4 +306,10 @@ sub get_static_node_stats {
return $self->{plug}->get_static_node_stats();
}
+sub get_node_version {
+ my ($self, $node) = @_;
+
+ return $self->{plug}->get_node_version($node);
+}
+
1;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 854c8942..78ce5616 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -141,6 +141,12 @@ sub update_service_config {
return PVE::HA::Config::update_resources_config($sid, $param, $delete);
}
+sub write_service_config {
+ my ($self, $conf) = @_;
+
+ return PVE::HA::Config::write_resources_config($conf);
+}
+
sub parse_sid {
my ($self, $sid) = @_;
@@ -201,12 +207,24 @@ sub read_rules_config {
return PVE::HA::Config::read_and_check_rules_config();
}
+sub write_rules_config {
+ my ($self, $rules) = @_;
+
+ PVE::HA::Config::write_rules_config($rules);
+}
+
sub read_group_config {
my ($self) = @_;
return PVE::HA::Config::read_group_config();
}
+sub delete_group_config {
+ my ($self) = @_;
+
+ PVE::HA::Config::delete_group_config();
+}
+
# this should return a hash containing info
# what nodes are members and online.
sub get_node_info {
@@ -489,4 +507,15 @@ sub get_static_node_stats {
return $stats;
}
+sub get_node_version {
+ my ($self, $node) = @_;
+
+ my $version_info = PVE::Cluster::get_node_kv('version-info', $node);
+ return undef if !$version_info->{$node};
+
+ my $node_version_info = eval { decode_json($version_info->{$node}) };
+
+ return $node_version_info->{version};
+}
+
1;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 43572531..b85a81f4 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -39,6 +39,8 @@ use PVE::HA::Usage::Static;
# patches for changing above, as that set is mostly sensible and should be easy to remember once
# spending a bit time in the HA code base.
+my $max_group_migration_round = 10;
+
sub new {
my ($this, $haenv) = @_;
@@ -50,6 +52,7 @@ sub new {
last_rules_digest => '',
last_groups_digest => '',
last_services_digest => '',
+ group_migration_round => 0,
}, $class;
my $old_ms = $haenv->read_manager_status();
@@ -464,6 +467,115 @@ sub update_crm_commands {
}
+my $have_groups_been_migrated = sub {
+ my ($haenv) = @_;
+
+ my $groups = $haenv->read_group_config();
+
+ return 1 if !$groups;
+ return keys $groups->{ids}->%* < 1;
+};
+
+my $get_version_parts = sub {
+ my ($node_version) = @_;
+
+ return $node_version =~ m/^(\d+)\.(\d+)\.(\d+)/;
+};
+
+my $has_node_min_version = sub {
+ my ($node_version, $min_version) = @_;
+
+ my ($major, $minor, $patch) = $get_version_parts->($node_version);
+ my ($min_major, $min_minor, $min_patch) = $get_version_parts->($min_version);
+
+ return 0 if $major < $min_major;
+ return 0 if $major == $min_major && $minor < $min_minor;
+ return 0 if $major == $min_major && $minor == $min_minor && $patch < $min_patch;
+
+ return 1;
+};
+
+my $migrate_group_persistently = sub {
+ my ($haenv, $ns) = @_;
+
+ $haenv->log('notice', "Start migrating HA groups...");
+
+ # NOTE pve-manager has a version dependency on the ha-manager which supports HA rules
+ # FIXME Set the actual minimum version which depends on the correct ha-manager version
+ my $HA_RULES_MINVERSION = "9.0.0";
+
+ eval {
+ my $resources = $haenv->read_service_config();
+ my $groups = $haenv->read_group_config();
+ my $rules = $haenv->read_rules_config();
+
+ # write changes to rules config whenever possible to allow users to
+ # already modify migrated rules
+ PVE::HA::Groups::migrate_groups_to_rules($rules, $groups, $resources);
+ $haenv->write_rules_config($rules);
+ $haenv->log('notice', "HA groups to rules config migration successful");
+
+ for my $node ($ns->list_nodes()->@*) {
+ my $node_status = $ns->get_node_state($node);
+ $haenv->log(
+ 'notice',
+ "node '$node' is in state '$node_status' during HA group migration.",
+ );
+ die "node '$node' is not online\n" if $node_status ne 'online';
+
+ my $node_version = $haenv->get_node_version($node);
+ die "could not retrieve version from node '$node'\n" if !$node_version;
+ $haenv->log('notice', "Node '$node' has pve-manager version '$node_version'");
+
+ my $has_min_version = $has_node_min_version->($node_version, $HA_RULES_MINVERSION);
+
+ die "node '$node' needs at least '$HA_RULES_MINVERSION' to migrate HA groups\n"
+ if !$has_min_version;
+ }
+
+ # write changes to resources config only after node checks, because old
+ # nodes cannot read the 'failback' flag yet
+ PVE::HA::Groups::migrate_groups_to_resources($groups, $resources);
+
+ for my $sid (keys %$resources) {
+ my $param = { failback => $resources->{$sid}->{failback} };
+
+ $haenv->update_service_config($sid, $param, 'group');
+ }
+
+ $haenv->log('notice', "HA groups to services config migration successful");
+
+ $haenv->delete_group_config();
+
+ $haenv->log('notice', "HA groups config deletion successful");
+ };
+ if (my $err = $@) {
+ $haenv->log('err', "Abort HA group migration: $err");
+ return 0;
+ }
+
+ $haenv->log('notice', "HA groups migration successful");
+
+ return 1;
+};
+
+# TODO PVE 10: Remove group migration when HA groups have been fully migrated to rules
+sub try_persistent_group_migration {
+ my ($self) = @_;
+
+ my ($haenv, $ns) = ($self->{haenv}, $self->{ns});
+
+ return if $have_groups_been_migrated->($haenv);
+
+ $self->{group_migration_round}++;
+ return if $self->{group_migration_round} < $max_group_migration_round;
+ $self->{group_migration_round} = 0;
+
+ my $success = $migrate_group_persistently->($haenv, $ns);
+
+ $haenv->log('err', "retry in $max_group_migration_round rounds.") if !$success;
+}
+
sub manage {
my ($self) = @_;
@@ -481,6 +593,8 @@ sub manage {
$self->update_crs_scheduler_mode();
+ $self->try_persistent_group_migration();
+
my ($sc, $services_digest) = $haenv->read_service_config();
$self->{groups} = $haenv->read_group_config(); # update
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index 528ea3f8..fab270c1 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -215,6 +215,14 @@ sub update_service_config {
return $self->{hardware}->update_service_config($sid, $param, $delete);
}
+sub write_service_config {
+ my ($self, $conf) = @_;
+
+ $assert_cfs_can_rw->($self);
+
+ $self->{hardware}->write_service_config($conf);
+}
+
sub parse_sid {
my ($self, $sid) = @_;
@@ -259,6 +267,14 @@ sub read_rules_config {
return $self->{hardware}->read_rules_config();
}
+sub write_rules_config {
+ my ($self, $rules) = @_;
+
+ $assert_cfs_can_rw->($self);
+
+ $self->{hardware}->write_rules_config($rules);
+}
+
sub read_group_config {
my ($self) = @_;
@@ -267,6 +283,14 @@ sub read_group_config {
return $self->{hardware}->read_group_config();
}
+sub delete_group_config {
+ my ($self) = @_;
+
+ $assert_cfs_can_rw->($self);
+
+ $self->{hardware}->delete_group_config();
+}
+
# this is normally only allowed by the master to recover a _fenced_ service
sub steal_service {
my ($self, $sid, $current_node, $new_node) = @_;
@@ -468,4 +492,10 @@ sub get_static_node_stats {
return $self->{hardware}->get_static_node_stats();
}
+sub get_node_version {
+ my ($self, $node) = @_;
+
+ return $self->{hardware}->get_node_version($node);
+}
+
1;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 3a1ebf25..4207ce31 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -343,6 +343,15 @@ sub read_rules_config {
return $rules;
}
+sub write_rules_config {
+ my ($self, $rules) = @_;
+
+ my $filename = "$self->{statusdir}/rules_config";
+
+ my $data = PVE::HA::Rules->write_config($filename, $rules);
+ PVE::Tools::file_set_contents($filename, $data);
+}
+
sub read_group_config {
my ($self) = @_;
@@ -353,6 +362,13 @@ sub read_group_config {
return PVE::HA::Groups->parse_config($filename, $raw);
}
+sub delete_group_config {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/groups";
+ unlink $filename or die "failed to remove group config: $!\n";
+}
+
sub read_service_status {
my ($self, $node) = @_;
@@ -932,4 +948,12 @@ sub get_static_node_stats {
return $stats;
}
+sub get_node_version {
+ my ($self, $node) = @_;
+
+ my $cstatus = $self->read_hardware_status_nolock();
+
+ return $cstatus->{$node}->{version} // "9.0.0~2";
+}
+
1;
diff --git a/src/test/test-group-migrate1/README b/src/test/test-group-migrate1/README
new file mode 100644
index 00000000..7fb2109b
--- /dev/null
+++ b/src/test/test-group-migrate1/README
@@ -0,0 +1,4 @@
+Test that a partially upgraded cluster, i.e. one where at least one node has
+not reached the minimum version to understand HA rules, does not fully migrate
+the HA group config. That is, the HA groups config will not be deleted and the
+failback flag is not written to the service config.
diff --git a/src/test/test-group-migrate1/cmdlist b/src/test/test-group-migrate1/cmdlist
new file mode 100644
index 00000000..ae62801b
--- /dev/null
+++ b/src/test/test-group-migrate1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "delay 1000" ]
+]
diff --git a/src/test/test-group-migrate1/groups b/src/test/test-group-migrate1/groups
new file mode 100644
index 00000000..bad746ca
--- /dev/null
+++ b/src/test/test-group-migrate1/groups
@@ -0,0 +1,7 @@
+group: group1
+ nodes node1
+ restricted 1
+
+group: group2
+ nodes node2:2,node3
+ nofailback 1
diff --git a/src/test/test-group-migrate1/hardware_status b/src/test/test-group-migrate1/hardware_status
new file mode 100644
index 00000000..f8c6c787
--- /dev/null
+++ b/src/test/test-group-migrate1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "version": "9.1.2" },
+ "node2": { "power": "off", "network": "off", "version": "9.0.0~11" },
+ "node3": { "power": "off", "network": "off", "version": "8.4.1" }
+}
diff --git a/src/test/test-group-migrate1/log.expect b/src/test/test-group-migrate1/log.expect
new file mode 100644
index 00000000..ef173568
--- /dev/null
+++ b/src/test/test-group-migrate1/log.expect
@@ -0,0 +1,87 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute delay 1000
+noti 200 node1/crm: Start migrating HA groups...
+noti 200 node1/crm: HA groups to rules config migration successful
+noti 200 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 200 node1/crm: Node 'node1' has pve-manager version '9.1.2'
+noti 200 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 200 node1/crm: Node 'node2' has pve-manager version '9.0.0~11'
+noti 200 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 200 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 200 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0' to migrate HA groups
+err 200 node1/crm: retry in 10 rounds.
+noti 400 node1/crm: Start migrating HA groups...
+noti 400 node1/crm: HA groups to rules config migration successful
+noti 400 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 400 node1/crm: Node 'node1' has pve-manager version '9.1.2'
+noti 400 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 400 node1/crm: Node 'node2' has pve-manager version '9.0.0~11'
+noti 400 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 400 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 400 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0' to migrate HA groups
+err 400 node1/crm: retry in 10 rounds.
+noti 600 node1/crm: Start migrating HA groups...
+noti 600 node1/crm: HA groups to rules config migration successful
+noti 600 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 600 node1/crm: Node 'node1' has pve-manager version '9.1.2'
+noti 600 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 600 node1/crm: Node 'node2' has pve-manager version '9.0.0~11'
+noti 600 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 600 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 600 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0' to migrate HA groups
+err 600 node1/crm: retry in 10 rounds.
+noti 800 node1/crm: Start migrating HA groups...
+noti 800 node1/crm: HA groups to rules config migration successful
+noti 800 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 800 node1/crm: Node 'node1' has pve-manager version '9.1.2'
+noti 800 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 800 node1/crm: Node 'node2' has pve-manager version '9.0.0~11'
+noti 800 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 800 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 800 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0' to migrate HA groups
+err 800 node1/crm: retry in 10 rounds.
+noti 1000 node1/crm: Start migrating HA groups...
+noti 1000 node1/crm: HA groups to rules config migration successful
+noti 1000 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 1000 node1/crm: Node 'node1' has pve-manager version '9.1.2'
+noti 1000 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 1000 node1/crm: Node 'node2' has pve-manager version '9.0.0~11'
+noti 1000 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 1000 node1/crm: Node 'node3' has pve-manager version '8.4.1'
+err 1000 node1/crm: Abort HA group migration: node 'node3' needs at least '9.0.0' to migrate HA groups
+err 1000 node1/crm: retry in 10 rounds.
+info 1200 hardware: exit simulation - done
diff --git a/src/test/test-group-migrate1/manager_status b/src/test/test-group-migrate1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-group-migrate1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-group-migrate1/service_config b/src/test/test-group-migrate1/service_config
new file mode 100644
index 00000000..a27551e5
--- /dev/null
+++ b/src/test/test-group-migrate1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started", "group": "group1" },
+ "vm:102": { "node": "node2", "state": "started", "group": "group2" },
+ "vm:103": { "node": "node3", "state": "started", "group": "group2" }
+}
diff --git a/src/test/test-group-migrate2/README b/src/test/test-group-migrate2/README
new file mode 100644
index 00000000..0430bf25
--- /dev/null
+++ b/src/test/test-group-migrate2/README
@@ -0,0 +1,3 @@
+Test that a fully upgraded cluster, i.e. one where each node has reached the
+minimum version to understand HA rules, correctly migrates the HA group config
+to the HA rules config and deletes the HA groups config.
diff --git a/src/test/test-group-migrate2/cmdlist b/src/test/test-group-migrate2/cmdlist
new file mode 100644
index 00000000..3bfad442
--- /dev/null
+++ b/src/test/test-group-migrate2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"]
+]
diff --git a/src/test/test-group-migrate2/groups b/src/test/test-group-migrate2/groups
new file mode 100644
index 00000000..bad746ca
--- /dev/null
+++ b/src/test/test-group-migrate2/groups
@@ -0,0 +1,7 @@
+group: group1
+ nodes node1
+ restricted 1
+
+group: group2
+ nodes node2:2,node3
+ nofailback 1
diff --git a/src/test/test-group-migrate2/hardware_status b/src/test/test-group-migrate2/hardware_status
new file mode 100644
index 00000000..ec45176b
--- /dev/null
+++ b/src/test/test-group-migrate2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "version": "9.0.0~11" },
+ "node2": { "power": "off", "network": "off", "version": "9.0.1" },
+ "node3": { "power": "off", "network": "off", "version": "9.4.1" }
+}
diff --git a/src/test/test-group-migrate2/log.expect b/src/test/test-group-migrate2/log.expect
new file mode 100644
index 00000000..d80aecc0
--- /dev/null
+++ b/src/test/test-group-migrate2/log.expect
@@ -0,0 +1,47 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+noti 200 node1/crm: Start migrating HA groups...
+noti 200 node1/crm: HA groups to rules config migration successful
+noti 200 node1/crm: node 'node1' is in state 'online' during HA group migration.
+noti 200 node1/crm: Node 'node1' has pve-manager version '9.0.0~11'
+noti 200 node1/crm: node 'node2' is in state 'online' during HA group migration.
+noti 200 node1/crm: Node 'node2' has pve-manager version '9.0.1'
+noti 200 node1/crm: node 'node3' is in state 'online' during HA group migration.
+noti 200 node1/crm: Node 'node3' has pve-manager version '9.4.1'
+noti 200 node1/crm: HA groups to services config migration successful
+noti 200 node1/crm: HA groups config deletion successful
+noti 200 node1/crm: HA groups migration successful
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-group-migrate2/manager_status b/src/test/test-group-migrate2/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-group-migrate2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-group-migrate2/service_config b/src/test/test-group-migrate2/service_config
new file mode 100644
index 00000000..a27551e5
--- /dev/null
+++ b/src/test/test-group-migrate2/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started", "group": "group1" },
+ "vm:102": { "node": "node2", "state": "started", "group": "group2" },
+ "vm:103": { "node": "node3", "state": "started", "group": "group2" }
+}
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH docs v4 1/2] ha: add documentation about ha rules and ha node affinity rules
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (18 preceding siblings ...)
2025-07-29 18:00 ` [pve-devel] [PATCH ha-manager v4 19/19] manager: persistently migrate ha groups to ha rules Daniel Kral
@ 2025-07-29 18:01 ` Daniel Kral
2025-07-29 18:01 ` [pve-devel] [PATCH docs v4 2/2] ha: crs: add effects of ha node affinity rule on the crs scheduler Daniel Kral
` (5 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:01 UTC (permalink / raw)
To: pve-devel
Add documentation about HA Node Affinity rules, as well as general
documentation on what HA rules are for, in a format that can be extended
with other HA rule types in the future.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
Makefile | 2 +
gen-ha-rules-node-affinity-opts.pl | 20 ++++++
gen-ha-rules-opts.pl | 17 +++++
ha-manager.adoc | 103 +++++++++++++++++++++++++++++
ha-rules-node-affinity-opts.adoc | 18 +++++
ha-rules-opts.adoc | 12 ++++
pmxcfs.adoc | 1 +
7 files changed, 173 insertions(+)
create mode 100755 gen-ha-rules-node-affinity-opts.pl
create mode 100755 gen-ha-rules-opts.pl
create mode 100644 ha-rules-node-affinity-opts.adoc
create mode 100644 ha-rules-opts.adoc
diff --git a/Makefile b/Makefile
index f30d77a..c5e506e 100644
--- a/Makefile
+++ b/Makefile
@@ -49,6 +49,8 @@ GEN_DEB_SOURCES= \
GEN_SCRIPTS= \
gen-ha-groups-opts.pl \
gen-ha-resources-opts.pl \
+ gen-ha-rules-node-affinity-opts.pl \
+ gen-ha-rules-opts.pl \
gen-datacenter.cfg.5-opts.pl \
gen-pct.conf.5-opts.pl \
gen-pct-network-opts.pl \
diff --git a/gen-ha-rules-node-affinity-opts.pl b/gen-ha-rules-node-affinity-opts.pl
new file mode 100755
index 0000000..e2f07fa
--- /dev/null
+++ b/gen-ha-rules-node-affinity-opts.pl
@@ -0,0 +1,20 @@
+#!/usr/bin/perl
+
+use lib '.';
+use strict;
+use warnings;
+use PVE::RESTHandler;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+use PVE::HA::Rules::NodeAffinity;
+
+my $private = PVE::HA::Rules::private();
+my $node_affinity_rule_props = PVE::HA::Rules::NodeAffinity::properties();
+my $properties = {
+ resources => $private->{propertyList}->{resources},
+ $node_affinity_rule_props->%*,
+};
+
+print PVE::RESTHandler::dump_properties($properties);
diff --git a/gen-ha-rules-opts.pl b/gen-ha-rules-opts.pl
new file mode 100755
index 0000000..66dd174
--- /dev/null
+++ b/gen-ha-rules-opts.pl
@@ -0,0 +1,17 @@
+#!/usr/bin/perl
+
+use lib '.';
+use strict;
+use warnings;
+use PVE::RESTHandler;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+
+my $private = PVE::HA::Rules::private();
+my $properties = $private->{propertyList};
+delete $properties->{type};
+delete $properties->{rule};
+
+print PVE::RESTHandler::dump_properties($properties);
diff --git a/ha-manager.adoc b/ha-manager.adoc
index 5fdd5cf..3b3f87d 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -668,6 +668,109 @@ up online again to investigate the cause of failure and check if it runs
stably again. Setting the `nofailback` flag prevents the recovered services from
moving straight back to the fenced node.
+[[ha_manager_rules]]
+Rules
+~~~~~
+
+HA rules are used to put certain constraints on HA-managed resources, which are
+defined in the HA rules configuration file `/etc/pve/ha/rules.cfg`.
+
+----
+<type>: <rule>
+ resources <resources_list>
+ <property> <value>
+ ...
+----
+
+include::ha-rules-opts.adoc[]
+
+.Available HA Rule Types
+[width="100%",cols="1,3",options="header"]
+|===========================================================
+| HA Rule Type | Description
+| `node-affinity` | Places affinity from one or more HA resources to one or
+more nodes.
+|===========================================================
+
+[[ha_manager_node_affinity_rules]]
+Node Affinity Rules
+^^^^^^^^^^^^^^^^^^^
+
+NOTE: HA Node Affinity rules are equivalent to HA Groups and will replace them
+in an upcoming major release.
+
+By default, a HA resource is able to run on any cluster node, but a common
+requirement is that a HA resource should run on a specific node. That can be
+implemented by defining a HA node affinity rule to make the HA resource
+`vm:100` prefer the node `node1`:
+
+----
+# ha-manager rules add node-affinity ha-rule-vm100 --resources vm:100 --nodes node1
+----
+
+By default, node affinity rules are not strict, i.e., if none of the specified
+nodes are available, the HA resource can also be moved to other nodes. If, on
+the other hand, a HA resource must be restricted to the specified nodes, then
+the node affinity rule must be set to strict.
+
+In the previous example, the node affinity rule can be modified to restrict the
+resource `vm:100` to be only on `node1`:
+
+----
+# ha-manager rules set node-affinity ha-rule-vm100 --strict 1
+----
+
+For bigger clusters or specific use cases, it makes sense to define a more
+detailed failover behavior. For example, the resources `vm:200` and `ct:300`
+should run on `node1`. If `node1` becomes unavailable, the resources should be
+distributed on `node2` and `node3`. If `node2` and `node3` are also
+unavailable, the resources should run on `node4`.
+
+To implement this behavior in a node affinity rule, nodes can be paired with
+priorities to order the preference for nodes. If two or more nodes have the same
+priority, the resources can run on any of them. For the above example, `node1`
+gets the highest priority, `node2` and `node3` get the same priority, and
+lastly `node4` gets the lowest priority, which can be omitted to default to `0`:
+
+----
+# ha-manager rules add node-affinity priority-cascade \
+ --resources vm:200,ct:300 --nodes "node1:2,node2:1,node3:1,node4"
+----
+
+The above commands create the following rules in the rules configuration file:
+
+.Node Affinity Rules Configuration Example (`/etc/pve/ha/rules.cfg`)
+----
+node-affinity: ha-rule-vm100
+ resources vm:100
+ nodes node1
+ strict 1
+
+node-affinity: priority-cascade
+ resources vm:200,ct:300
+ nodes node1:2,node2:1,node3:1,node4
+----
+
+Node Affinity Rule Properties
++++++++++++++++++++++++++++++
+
+include::ha-rules-node-affinity-opts.adoc[]
+
+[[ha_manager_rule_conflicts]]
+Rule Conflicts and Errors
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+HA rules can impose rather complex constraints on the HA resources. To ensure
+that a new or modified HA rule does not introduce uncertainty into the HA
+stack's CRS scheduler, HA rules are tested for feasibility before they are
+applied. If a rule fails any of these tests, the rule is disabled until the
+conflicts and errors are resolved.
+
+Currently, HA rules are checked for the following feasibility tests:
+
+* A HA resource can only be referenced by a single HA node affinity rule in
+ total. If two or more HA node affinity rules specify the same HA resource,
+ these HA node affinity rules will be disabled.
[[ha_manager_fencing]]
Fencing
diff --git a/ha-rules-node-affinity-opts.adoc b/ha-rules-node-affinity-opts.adoc
new file mode 100644
index 0000000..852636c
--- /dev/null
+++ b/ha-rules-node-affinity-opts.adoc
@@ -0,0 +1,18 @@
+`nodes`: `<node>[:<pri>]{,<node>[:<pri>]}*` ::
+
List of cluster node members, where a priority can be given to each node. A resource bound to this rule will run on the available nodes with the highest priority. If there are multiple nodes in the highest priority class, the services will get distributed to those nodes. The priorities have a relative meaning only. The higher the number, the higher the priority.
+
+`resources`: `<type>:<name>{,<type>:<name>}*` ::
+
+List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).
+
+`strict`: `<boolean>` ('default =' `0`)::
+
+Describes whether the node affinity rule is strict or non-strict.
++
+A non-strict node affinity rule makes resources prefer to be on the defined nodes.
+If none of the defined nodes are available, the resource may run on any other node.
++
+A strict node affinity rule restricts resources to the defined nodes. If
+none of the defined nodes are available, the resource will be stopped.
+
diff --git a/ha-rules-opts.adoc b/ha-rules-opts.adoc
new file mode 100644
index 0000000..b50b289
--- /dev/null
+++ b/ha-rules-opts.adoc
@@ -0,0 +1,12 @@
+`comment`: `<string>` ::
+
+HA rule description.
+
+`disable`: `<boolean>` ('default =' `0`)::
+
+Whether the HA rule is disabled.
+
+`resources`: `<type>:<name>{,<type>:<name>}*` ::
+
+List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).
+
diff --git a/pmxcfs.adoc b/pmxcfs.adoc
index f4aa847..8ca7284 100644
--- a/pmxcfs.adoc
+++ b/pmxcfs.adoc
@@ -104,6 +104,7 @@ Files
|`ha/crm_commands` | Displays HA operations that are currently being carried out by the CRM
|`ha/manager_status` | JSON-formatted information regarding HA services on the cluster
|`ha/resources.cfg` | Resources managed by high availability, and their current state
+|`ha/rules.cfg` | Rules putting constraints on the HA manager's scheduling of HA resources
|`nodes/<NAME>/config` | Node-specific configuration
|`nodes/<NAME>/lxc/<VMID>.conf` | VM configuration data for LXC containers
|`nodes/<NAME>/openvz/` | Prior to {pve} 4.0, used for container configuration data (deprecated, removed soon)
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH docs v4 2/2] ha: crs: add effects of ha node affinity rule on the crs scheduler
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (19 preceding siblings ...)
2025-07-29 18:01 ` [pve-devel] [PATCH docs v4 1/2] ha: add documentation about ha rules and ha node affinity rules Daniel Kral
@ 2025-07-29 18:01 ` Daniel Kral
2025-07-29 18:01 ` [pve-devel] [PATCH manager v4 1/4] api: ha: add ha rules api endpoints Daniel Kral
` (4 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:01 UTC (permalink / raw)
To: pve-devel
Add information about the effects that HA rules and HA Node Affinity
rules have on the CRS scheduler and what a user can expect when making
changes to them.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
ha-manager.adoc | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/ha-manager.adoc b/ha-manager.adoc
index 3b3f87d..20eeb88 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -1193,6 +1193,16 @@ The CRS is currently used at the following scheduling points:
new target node for the HA services in that group, matching the adapted
priority constraints.
+- HA rule config changes (always active). If a rule imposes different
+ constraints on the HA resources, the HA stack will use the CRS algorithm to
+ find a new target node for the HA resources affected by these rules depending
+ on the type of the new rules:
+
+** Node affinity rules: If a node affinity rule is created or HA resources/nodes
+ are added to an existing node affinity rule, the HA stack will use the CRS
+ algorithm to ensure that these HA resources are assigned according to their
+ node and priority constraints.
+
- HA service stopped -> start transition (opt-in). Requesting that a stopped
service should be started is a good opportunity to check for the best suited
node as per the CRS algorithm, as moving stopped services is cheaper to do
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH manager v4 1/4] api: ha: add ha rules api endpoints
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (20 preceding siblings ...)
2025-07-29 18:01 ` [pve-devel] [PATCH docs v4 2/2] ha: crs: add effects of ha node affinity rule on the crs scheduler Daniel Kral
@ 2025-07-29 18:01 ` Daniel Kral
2025-07-29 18:01 ` [pve-devel] [PATCH manager v4 2/4] ui: ha: remove ha groups from ha resource components Daniel Kral
` (3 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:01 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
PVE/API2/HAConfig.pm | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/PVE/API2/HAConfig.pm b/PVE/API2/HAConfig.pm
index 35f49cbb..d29211fb 100644
--- a/PVE/API2/HAConfig.pm
+++ b/PVE/API2/HAConfig.pm
@@ -12,6 +12,7 @@ use PVE::JSONSchema qw(get_standard_option);
use PVE::Exception qw(raise_param_exc);
use PVE::API2::HA::Resources;
use PVE::API2::HA::Groups;
+use PVE::API2::HA::Rules;
use PVE::API2::HA::Status;
use base qw(PVE::RESTHandler);
@@ -26,6 +27,11 @@ __PACKAGE__->register_method({
path => 'groups',
});
+__PACKAGE__->register_method({
+ subclass => "PVE::API2::HA::Rules",
+ path => 'rules',
+});
+
__PACKAGE__->register_method({
subclass => "PVE::API2::HA::Status",
path => 'status',
@@ -57,7 +63,7 @@ __PACKAGE__->register_method({
my ($param) = @_;
my $res = [
- { id => 'status' }, { id => 'resources' }, { id => 'groups' },
+ { id => 'status' }, { id => 'resources' }, { id => 'groups' }, { id => 'rules' },
];
return $res;
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH manager v4 2/4] ui: ha: remove ha groups from ha resource components
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (21 preceding siblings ...)
2025-07-29 18:01 ` [pve-devel] [PATCH manager v4 1/4] api: ha: add ha rules api endpoints Daniel Kral
@ 2025-07-29 18:01 ` Daniel Kral
2025-07-29 18:01 ` [pve-devel] [PATCH manager v4 3/4] ui: ha: show failback flag in resources status view Daniel Kral
` (2 subsequent siblings)
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:01 UTC (permalink / raw)
To: pve-devel
Remove the HA group column from the HA Resources grid view and the HA
group selector from the HA Resources edit window, as these will be
replaced by semantically equivalent HA node affinity rules in the next
patch.
Add the 'failback' field, which is moved to the HA Resources config as
part of the migration from groups to node affinity rules.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
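For context, a rough sketch (not part of this patch) of how a migrated entry
could look in /etc/pve/ha/resources.cfg once 'failback' replaces the group
reference; the resource ID and values are made up for illustration:

    vm: 100
        state started
        failback 0
        max_restart 1
        max_relocate 1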
www/manager6/ha/ResourceEdit.js | 16 ++++++++++++----
www/manager6/ha/Resources.js | 17 -----------------
www/manager6/ha/StatusView.js | 1 -
3 files changed, 12 insertions(+), 22 deletions(-)
diff --git a/www/manager6/ha/ResourceEdit.js b/www/manager6/ha/ResourceEdit.js
index 1048ccca..428672a8 100644
--- a/www/manager6/ha/ResourceEdit.js
+++ b/www/manager6/ha/ResourceEdit.js
@@ -11,7 +11,7 @@ Ext.define('PVE.ha.VMResourceInputPanel', {
}
delete values.vmid;
- PVE.Utils.delete_if_default(values, 'group', '', me.isCreate);
+ PVE.Utils.delete_if_default(values, 'failback', '1', me.isCreate);
PVE.Utils.delete_if_default(values, 'max_restart', '1', me.isCreate);
PVE.Utils.delete_if_default(values, 'max_relocate', '1', me.isCreate);
@@ -110,9 +110,17 @@ Ext.define('PVE.ha.VMResourceInputPanel', {
me.column2 = [
{
- xtype: 'pveHAGroupSelector',
- name: 'group',
- fieldLabel: gettext('Group'),
+ xtype: 'proxmoxcheckbox',
+ name: 'failback',
+ fieldLabel: gettext('Failback'),
+ autoEl: {
+ tag: 'div',
+ 'data-qtip': gettext(
+ 'Enable if HA resource should automatically adjust to HA rules.',
+ ),
+ },
+ uncheckedValue: 0,
+ value: 1,
},
{
xtype: 'proxmoxKVComboBox',
diff --git a/www/manager6/ha/Resources.js b/www/manager6/ha/Resources.js
index e8e53b3b..097097dc 100644
--- a/www/manager6/ha/Resources.js
+++ b/www/manager6/ha/Resources.js
@@ -136,23 +136,6 @@ Ext.define('PVE.ha.ResourcesView', {
renderer: (v) => (v === undefined ? '1' : v),
dataIndex: 'max_relocate',
},
- {
- header: gettext('Group'),
- width: 200,
- sortable: true,
- renderer: function (value, metaData, { data }) {
- if (data.errors && data.errors.group) {
- metaData.tdCls = 'proxmox-invalid-row';
- let html = Ext.htmlEncode(
- `<p>${Ext.htmlEncode(data.errors.group)}</p>`,
- );
- metaData.tdAttr =
- 'data-qwidth=600 data-qtitle="ERROR" data-qtip="' + html + '"';
- }
- return value;
- },
- dataIndex: 'group',
- },
{
header: gettext('Description'),
flex: 1,
diff --git a/www/manager6/ha/StatusView.js b/www/manager6/ha/StatusView.js
index 3e3205a5..a3ca9fdf 100644
--- a/www/manager6/ha/StatusView.js
+++ b/www/manager6/ha/StatusView.js
@@ -78,7 +78,6 @@ Ext.define(
'status',
'sid',
'state',
- 'group',
'comment',
'max_restart',
'max_relocate',
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH manager v4 3/4] ui: ha: show failback flag in resources status view
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (22 preceding siblings ...)
2025-07-29 18:01 ` [pve-devel] [PATCH manager v4 2/4] ui: ha: remove ha groups from ha resource components Daniel Kral
@ 2025-07-29 18:01 ` Daniel Kral
2025-07-29 18:01 ` [pve-devel] [PATCH manager v4 4/4] ui: ha: replace ha groups with ha node affinity rules Daniel Kral
2025-07-30 17:29 ` [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Michael Köppl
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:01 UTC (permalink / raw)
To: pve-devel
As the HA groups' failback flag is now part of the HA resources
config, it should also be shown there instead of the previous HA groups
view.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
www/manager6/ha/Resources.js | 6 ++++++
www/manager6/ha/StatusView.js | 4 ++++
2 files changed, 10 insertions(+)
diff --git a/www/manager6/ha/Resources.js b/www/manager6/ha/Resources.js
index 097097dc..65897bed 100644
--- a/www/manager6/ha/Resources.js
+++ b/www/manager6/ha/Resources.js
@@ -136,6 +136,12 @@ Ext.define('PVE.ha.ResourcesView', {
renderer: (v) => (v === undefined ? '1' : v),
dataIndex: 'max_relocate',
},
+ {
+ header: gettext('Failback'),
+ width: 100,
+ sortable: true,
+ dataIndex: 'failback',
+ },
{
header: gettext('Description'),
flex: 1,
diff --git a/www/manager6/ha/StatusView.js b/www/manager6/ha/StatusView.js
index a3ca9fdf..50ad8e84 100644
--- a/www/manager6/ha/StatusView.js
+++ b/www/manager6/ha/StatusView.js
@@ -79,6 +79,10 @@ Ext.define(
'sid',
'state',
'comment',
+ {
+ name: 'failback',
+ type: 'boolean',
+ },
'max_restart',
'max_relocate',
'type',
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [pve-devel] [PATCH manager v4 4/4] ui: ha: replace ha groups with ha node affinity rules
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (23 preceding siblings ...)
2025-07-29 18:01 ` [pve-devel] [PATCH manager v4 3/4] ui: ha: show failback flag in resources status view Daniel Kral
@ 2025-07-29 18:01 ` Daniel Kral
2025-07-30 17:29 ` [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Michael Köppl
25 siblings, 0 replies; 28+ messages in thread
From: Daniel Kral @ 2025-07-29 18:01 UTC (permalink / raw)
To: pve-devel
Introduce HA rules and replace the existing HA groups with the new HA
node affinity rules in the web interface.
The HA rules components are designed to be extensible for other new rule
types and, in addition to the basic CRUD operations, allow users to
display the errors of contradictory HA rules, if there are any.
HA rule ids are automatically generated as a 13-character UUID string in
the web interface, as is already done for other concepts, e.g., backup
jobs, because coming up with future-proof rule ids that cannot be
changed later is not very user friendly. The HA rule's comment field is
meant to store that information instead.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
www/manager6/Makefile | 8 +-
www/manager6/StateProvider.js | 2 +-
www/manager6/dc/Config.js | 8 +-
www/manager6/ha/GroupSelector.js | 71 -------
www/manager6/ha/Groups.js | 117 -----------
www/manager6/ha/RuleEdit.js | 146 +++++++++++++
www/manager6/ha/RuleErrorsModal.js | 50 +++++
www/manager6/ha/Rules.js | 196 ++++++++++++++++++
.../NodeAffinityRuleEdit.js} | 105 ++--------
www/manager6/ha/rules/NodeAffinityRules.js | 36 ++++
10 files changed, 455 insertions(+), 284 deletions(-)
delete mode 100644 www/manager6/ha/GroupSelector.js
delete mode 100644 www/manager6/ha/Groups.js
create mode 100644 www/manager6/ha/RuleEdit.js
create mode 100644 www/manager6/ha/RuleErrorsModal.js
create mode 100644 www/manager6/ha/Rules.js
rename www/manager6/ha/{GroupEdit.js => rules/NodeAffinityRuleEdit.js} (67%)
create mode 100644 www/manager6/ha/rules/NodeAffinityRules.js
diff --git a/www/manager6/Makefile b/www/manager6/Makefile
index 84a8b4d0..9bea169a 100644
--- a/www/manager6/Makefile
+++ b/www/manager6/Makefile
@@ -143,13 +143,15 @@ JSSRC= \
window/DirMapEdit.js \
window/GuestImport.js \
ha/Fencing.js \
- ha/GroupEdit.js \
- ha/GroupSelector.js \
- ha/Groups.js \
ha/ResourceEdit.js \
ha/Resources.js \
+ ha/RuleEdit.js \
+ ha/RuleErrorsModal.js \
+ ha/Rules.js \
ha/Status.js \
ha/StatusView.js \
+ ha/rules/NodeAffinityRuleEdit.js \
+ ha/rules/NodeAffinityRules.js \
dc/ACLView.js \
dc/ACMEClusterView.js \
dc/AuthEditBase.js \
diff --git a/www/manager6/StateProvider.js b/www/manager6/StateProvider.js
index 5137ee55..889f198b 100644
--- a/www/manager6/StateProvider.js
+++ b/www/manager6/StateProvider.js
@@ -54,7 +54,7 @@ Ext.define('PVE.StateProvider', {
system: 50,
monitor: 49,
'ha-fencing': 48,
- 'ha-groups': 47,
+ 'ha-rules': 47,
'ha-resources': 46,
'ceph-log': 45,
'ceph-crushmap': 44,
diff --git a/www/manager6/dc/Config.js b/www/manager6/dc/Config.js
index 76c9a6ca..0de67c1b 100644
--- a/www/manager6/dc/Config.js
+++ b/www/manager6/dc/Config.js
@@ -170,11 +170,11 @@ Ext.define('PVE.dc.Config', {
itemId: 'ha',
},
{
- title: gettext('Groups'),
+ title: gettext('Rules'),
groups: ['ha'],
- xtype: 'pveHAGroupsView',
- iconCls: 'fa fa-object-group',
- itemId: 'ha-groups',
+ xtype: 'pveHARulesView',
+ iconCls: 'fa fa-gears',
+ itemId: 'ha-rules',
},
{
title: gettext('Fencing'),
diff --git a/www/manager6/ha/GroupSelector.js b/www/manager6/ha/GroupSelector.js
deleted file mode 100644
index 9dc1f4bb..00000000
--- a/www/manager6/ha/GroupSelector.js
+++ /dev/null
@@ -1,71 +0,0 @@
-Ext.define(
- 'PVE.ha.GroupSelector',
- {
- extend: 'Proxmox.form.ComboGrid',
- alias: ['widget.pveHAGroupSelector'],
-
- autoSelect: false,
- valueField: 'group',
- displayField: 'group',
- listConfig: {
- columns: [
- {
- header: gettext('Group'),
- width: 100,
- sortable: true,
- dataIndex: 'group',
- },
- {
- header: gettext('Nodes'),
- width: 100,
- sortable: false,
- dataIndex: 'nodes',
- },
- {
- header: gettext('Comment'),
- flex: 1,
- dataIndex: 'comment',
- renderer: Ext.String.htmlEncode,
- },
- ],
- },
- store: {
- model: 'pve-ha-groups',
- sorters: {
- property: 'group',
- direction: 'ASC',
- },
- },
-
- initComponent: function () {
- var me = this;
- me.callParent();
- me.getStore().load();
- },
- },
- function () {
- Ext.define('pve-ha-groups', {
- extend: 'Ext.data.Model',
- fields: [
- 'group',
- 'type',
- 'digest',
- 'nodes',
- 'comment',
- {
- name: 'restricted',
- type: 'boolean',
- },
- {
- name: 'nofailback',
- type: 'boolean',
- },
- ],
- proxy: {
- type: 'proxmox',
- url: '/api2/json/cluster/ha/groups',
- },
- idProperty: 'group',
- });
- },
-);
diff --git a/www/manager6/ha/Groups.js b/www/manager6/ha/Groups.js
deleted file mode 100644
index 6b4958f0..00000000
--- a/www/manager6/ha/Groups.js
+++ /dev/null
@@ -1,117 +0,0 @@
-Ext.define('PVE.ha.GroupsView', {
- extend: 'Ext.grid.GridPanel',
- alias: ['widget.pveHAGroupsView'],
-
- onlineHelp: 'ha_manager_groups',
-
- stateful: true,
- stateId: 'grid-ha-groups',
-
- initComponent: function () {
- var me = this;
-
- var caps = Ext.state.Manager.get('GuiCap');
-
- var store = new Ext.data.Store({
- model: 'pve-ha-groups',
- sorters: {
- property: 'group',
- direction: 'ASC',
- },
- });
-
- var reload = function () {
- store.load();
- };
-
- var sm = Ext.create('Ext.selection.RowModel', {});
-
- let run_editor = function () {
- let rec = sm.getSelection()[0];
- Ext.create('PVE.ha.GroupEdit', {
- groupId: rec.data.group,
- listeners: {
- destroy: () => store.load(),
- },
- autoShow: true,
- });
- };
-
- let remove_btn = Ext.create('Proxmox.button.StdRemoveButton', {
- selModel: sm,
- baseurl: '/cluster/ha/groups/',
- callback: () => store.load(),
- });
- let edit_btn = new Proxmox.button.Button({
- text: gettext('Edit'),
- disabled: true,
- selModel: sm,
- handler: run_editor,
- });
-
- Ext.apply(me, {
- store: store,
- selModel: sm,
- viewConfig: {
- trackOver: false,
- },
- tbar: [
- {
- text: gettext('Create'),
- disabled: !caps.nodes['Sys.Console'],
- handler: function () {
- Ext.create('PVE.ha.GroupEdit', {
- listeners: {
- destroy: () => store.load(),
- },
- autoShow: true,
- });
- },
- },
- edit_btn,
- remove_btn,
- ],
- columns: [
- {
- header: gettext('Group'),
- width: 150,
- sortable: true,
- dataIndex: 'group',
- },
- {
- header: 'restricted',
- width: 100,
- sortable: true,
- renderer: Proxmox.Utils.format_boolean,
- dataIndex: 'restricted',
- },
- {
- header: 'nofailback',
- width: 100,
- sortable: true,
- renderer: Proxmox.Utils.format_boolean,
- dataIndex: 'nofailback',
- },
- {
- header: gettext('Nodes'),
- flex: 1,
- sortable: false,
- dataIndex: 'nodes',
- },
- {
- header: gettext('Comment'),
- flex: 1,
- renderer: Ext.String.htmlEncode,
- dataIndex: 'comment',
- },
- ],
- listeners: {
- activate: reload,
- beforeselect: (grid, record, index, eOpts) => caps.nodes['Sys.Console'],
- itemdblclick: run_editor,
- },
- });
-
- me.callParent();
- },
-});
diff --git a/www/manager6/ha/RuleEdit.js b/www/manager6/ha/RuleEdit.js
new file mode 100644
index 00000000..9ecebd6d
--- /dev/null
+++ b/www/manager6/ha/RuleEdit.js
@@ -0,0 +1,146 @@
+Ext.define('PVE.ha.RuleInputPanel', {
+ extend: 'Proxmox.panel.InputPanel',
+
+ onlineHelp: 'ha_manager_rules',
+
+ formatResourceListString: function (resources) {
+ let me = this;
+
+ return resources.map((vmid) => {
+ if (me.resourcesStore.getById(`qemu/${vmid}`)) {
+ return `vm:${vmid}`;
+ } else if (me.resourcesStore.getById(`lxc/${vmid}`)) {
+ return `ct:${vmid}`;
+ } else {
+ Ext.Msg.alert(gettext('Error'), `Could not find resource type for ${vmid}`);
+ throw `Unknown resource type: ${vmid}`;
+ }
+ });
+ },
+
+ onGetValues: function (values) {
+ let me = this;
+
+ values.type = me.ruleType;
+
+ if (me.isCreate) {
+ values.rule = 'ha-rule-' + Ext.data.identifier.Uuid.Global.generate().slice(0, 13);
+ }
+
+ values.disable = values.enable ? 0 : 1;
+ delete values.enable;
+
+ values.resources = me.formatResourceListString(values.resources);
+
+ return values;
+ },
+
+ initComponent: function () {
+ let me = this;
+
+ let resourcesStore = Ext.create('Ext.data.Store', {
+ model: 'PVEResources',
+ autoLoad: true,
+ sorters: 'vmid',
+ filters: [
+ {
+ property: 'type',
+ value: /lxc|qemu/,
+ },
+ {
+ property: 'hastate',
+ operator: '!=',
+ value: 'unmanaged',
+ },
+ ],
+ });
+
+ Ext.apply(me, {
+ resourcesStore: resourcesStore,
+ });
+
+ me.column1 = me.column1 ?? [];
+ me.column1.unshift(
+ {
+ xtype: 'proxmoxcheckbox',
+ name: 'enable',
+ fieldLabel: gettext('Enable'),
+ uncheckedValue: 0,
+ defaultValue: 1,
+ checked: true,
+ },
+ {
+ xtype: 'vmComboSelector',
+ name: 'resources',
+ fieldLabel: gettext('HA Resources'),
+ store: me.resourcesStore,
+ allowBlank: false,
+ autoSelect: false,
+ multiSelect: true,
+ validateExists: true,
+ },
+ );
+
+ me.column2 = me.column2 ?? [];
+
+ me.columnB = me.columnB ?? [];
+ me.columnB.unshift({
+ xtype: 'textfield',
+ name: 'comment',
+ fieldLabel: gettext('Comment'),
+ allowBlank: true,
+ });
+
+ me.callParent();
+ },
+});
+
+Ext.define('PVE.ha.RuleEdit', {
+ extend: 'Proxmox.window.Edit',
+
+ defaultFocus: undefined, // prevent the vmComboSelector from being expanded when focusing the window
+
+ initComponent: function () {
+ let me = this;
+
+ me.isCreate = !me.ruleId;
+
+ if (me.isCreate) {
+ me.url = '/api2/extjs/cluster/ha/rules';
+ me.method = 'POST';
+ } else {
+ me.url = `/api2/extjs/cluster/ha/rules/${me.ruleId}`;
+ me.method = 'PUT';
+ }
+
+ let inputPanel = Ext.create(me.panelType, {
+ ruleId: me.ruleId,
+ ruleType: me.ruleType,
+ isCreate: me.isCreate,
+ });
+
+ Ext.apply(me, {
+ subject: me.panelName,
+ isAdd: true,
+ items: [inputPanel],
+ });
+
+ me.callParent();
+
+ if (!me.isCreate) {
+ me.load({
+ success: (response, options) => {
+ let values = response.result.data;
+
+ values.resources = values.resources
+ .split(',')
+ .map((resource) => resource.split(':')[1]);
+
+ values.enable = values.disable ? 0 : 1;
+
+ inputPanel.setValues(values);
+ },
+ });
+ }
+ },
+});
diff --git a/www/manager6/ha/RuleErrorsModal.js b/www/manager6/ha/RuleErrorsModal.js
new file mode 100644
index 00000000..ebd909fc
--- /dev/null
+++ b/www/manager6/ha/RuleErrorsModal.js
@@ -0,0 +1,50 @@
+Ext.define('PVE.ha.RuleErrorsModal', {
+ extend: 'Ext.window.Window',
+ alias: ['widget.pveHARulesErrorsModal'],
+ mixins: ['Proxmox.Mixin.CBind'],
+
+ modal: true,
+ scrollable: true,
+ resizable: false,
+
+ title: gettext('HA rule errors'),
+
+ initComponent: function () {
+ let me = this;
+
+ let renderHARuleErrors = (errors) => {
+ if (!errors) {
+ return gettext('The HA rule has no errors.');
+ }
+
+ let errorListItemsHtml = '';
+
+ for (let [opt, messages] of Object.entries(errors)) {
+ errorListItemsHtml += messages
+ .map((message) => `<li>${Ext.htmlEncode(`${opt}: ${message}`)}</li>`)
+ .join('');
+ }
+
+ return `<div>
+ <p>${gettext('The HA rule has the following errors:')}</p>
+ <ul>${errorListItemsHtml}</ul>
+ </div>`;
+ };
+
+ Ext.apply(me, {
+ modal: true,
+ border: false,
+ layout: 'fit',
+ items: [
+ {
+ xtype: 'displayfield',
+ padding: 20,
+ scrollable: true,
+ value: renderHARuleErrors(me.errors),
+ },
+ ],
+ });
+
+ me.callParent();
+ },
+});
diff --git a/www/manager6/ha/Rules.js b/www/manager6/ha/Rules.js
new file mode 100644
index 00000000..8f487465
--- /dev/null
+++ b/www/manager6/ha/Rules.js
@@ -0,0 +1,196 @@
+Ext.define('PVE.ha.RulesBaseView', {
+ extend: 'Ext.grid.GridPanel',
+
+ initComponent: function () {
+ let me = this;
+
+ if (!me.ruleType) {
+ throw 'no rule type given';
+ }
+
+ let store = new Ext.data.Store({
+ model: 'pve-ha-rules',
+ autoLoad: true,
+ filters: [
+ {
+ property: 'type',
+ value: me.ruleType,
+ },
+ ],
+ });
+
+ let reloadStore = () => store.load();
+
+ let sm = Ext.create('Ext.selection.RowModel', {});
+
+ let createRuleEditWindow = (ruleId) => {
+ if (!me.inputPanel) {
+ throw `no editor registered for ha rule type: ${me.ruleType}`;
+ }
+
+ Ext.create('PVE.ha.RuleEdit', {
+ panelType: `PVE.ha.rules.${me.inputPanel}`,
+ panelName: me.ruleTitle,
+ ruleType: me.ruleType,
+ ruleId: ruleId,
+ autoShow: true,
+ listeners: {
+ destroy: reloadStore,
+ },
+ });
+ };
+
+ let runEditor = () => {
+ let rec = sm.getSelection()[0];
+ if (!rec) {
+ return;
+ }
+ let { rule } = rec.data;
+ createRuleEditWindow(rule);
+ };
+
+ let editButton = Ext.create('Proxmox.button.Button', {
+ text: gettext('Edit'),
+ disabled: true,
+ selModel: sm,
+ handler: runEditor,
+ });
+
+ let removeButton = Ext.create('Proxmox.button.StdRemoveButton', {
+ selModel: sm,
+ baseurl: '/cluster/ha/rules/',
+ callback: reloadStore,
+ });
+
+ Ext.apply(me, {
+ store: store,
+ selModel: sm,
+ viewConfig: {
+ trackOver: false,
+ },
+ emptyText: Ext.String.format(gettext('No {0} rules configured.'), me.ruleTitle),
+ tbar: [
+ {
+ text: gettext('Add'),
+ handler: () => createRuleEditWindow(),
+ },
+ editButton,
+ removeButton,
+ ],
+ listeners: {
+ activate: reloadStore,
+ itemdblclick: runEditor,
+ },
+ });
+
+ me.columns.unshift(
+ {
+ header: gettext('Enabled'),
+ width: 80,
+ dataIndex: 'disable',
+ align: 'center',
+ renderer: function (value) {
+ return Proxmox.Utils.renderEnabledIcon(!value);
+ },
+ sortable: true,
+ },
+ {
+ header: gettext('State'),
+ xtype: 'actioncolumn',
+ width: 65,
+ align: 'center',
+ dataIndex: 'errors',
+ items: [
+ {
+ handler: (table, rowIndex, colIndex, item, event, { data }) => {
+ let errors = Object.keys(data.errors ?? {});
+ if (!errors.length) {
+ return;
+ }
+
+ Ext.create('PVE.ha.RuleErrorsModal', {
+ autoShow: true,
+ errors: data.errors ?? {},
+ });
+ },
+ getTip: (value) => {
+ let errors = Object.keys(value ?? {});
+ if (errors.length) {
+ return gettext('HA Rule has conflicts and/or errors.');
+ } else {
+ return gettext('HA Rule is OK.');
+ }
+ },
+ getClass: (value) => {
+ let iconName = 'check';
+
+ let errors = Object.keys(value ?? {});
+ if (errors.length) {
+ iconName = 'exclamation-triangle';
+ }
+
+ return `fa fa-${iconName}`;
+ },
+ },
+ ],
+ },
+ );
+
+ me.columns.push({
+ header: gettext('Comment'),
+ flex: 1,
+ renderer: Ext.String.htmlEncode,
+ dataIndex: 'comment',
+ });
+
+ me.callParent();
+ },
+});
+
+Ext.define(
+ 'PVE.ha.RulesView',
+ {
+ extend: 'Ext.panel.Panel',
+ alias: 'widget.pveHARulesView',
+
+ onlineHelp: 'ha_manager_rules',
+
+ layout: {
+ type: 'vbox',
+ align: 'stretch',
+ },
+
+ items: [
+ {
+ title: gettext('HA Node Affinity Rules'),
+ xtype: 'pveHANodeAffinityRulesView',
+ flex: 1,
+ border: 0,
+ },
+ ],
+ },
+ function () {
+ Ext.define('pve-ha-rules', {
+ extend: 'Ext.data.Model',
+ fields: [
+ 'rule',
+ 'type',
+ 'nodes',
+ 'digest',
+ 'errors',
+ 'disable',
+ 'comment',
+ 'resources',
+ {
+ name: 'strict',
+ type: 'boolean',
+ },
+ ],
+ proxy: {
+ type: 'proxmox',
+ url: '/api2/json/cluster/ha/rules',
+ },
+ idProperty: 'rule',
+ });
+ },
+);
diff --git a/www/manager6/ha/GroupEdit.js b/www/manager6/ha/rules/NodeAffinityRuleEdit.js
similarity index 67%
rename from www/manager6/ha/GroupEdit.js
rename to www/manager6/ha/rules/NodeAffinityRuleEdit.js
index f7eed22e..4574d9ef 100644
--- a/www/manager6/ha/GroupEdit.js
+++ b/www/manager6/ha/rules/NodeAffinityRuleEdit.js
@@ -1,22 +1,10 @@
-Ext.define('PVE.ha.GroupInputPanel', {
- extend: 'Proxmox.panel.InputPanel',
- onlineHelp: 'ha_manager_groups',
-
- groupId: undefined,
-
- onGetValues: function (values) {
- var me = this;
-
- if (me.isCreate) {
- values.type = 'group';
- }
-
- return values;
- },
+Ext.define('PVE.ha.rules.NodeAffinityInputPanel', {
+ extend: 'PVE.ha.RuleInputPanel',
initComponent: function () {
- var me = this;
+ let me = this;
+ /* TODO Node selector should be factored out into its own component */
let update_nodefield, update_node_selection;
let sm = Ext.create('Ext.selection.CheckboxModel', {
@@ -134,84 +122,25 @@ Ext.define('PVE.ha.GroupInputPanel', {
nodefield.resumeEvent('change');
};
- me.column1 = [
+ me.column2 = [
{
- xtype: me.isCreate ? 'textfield' : 'displayfield',
- name: 'group',
- value: me.groupId || '',
- fieldLabel: 'ID',
- vtype: 'StorageId',
- allowBlank: false,
+ xtype: 'proxmoxcheckbox',
+ name: 'strict',
+ fieldLabel: gettext('Strict'),
+ autoEl: {
+ tag: 'div',
+ 'data-qtip': gettext(
+ 'Enable if the HA Resources must be restricted to the nodes.',
+ ),
+ },
+ uncheckedValue: 0,
+ defaultValue: 0,
},
nodefield,
];
- me.column2 = [
- {
- xtype: 'proxmoxcheckbox',
- name: 'restricted',
- uncheckedValue: 0,
- fieldLabel: 'restricted',
- },
- {
- xtype: 'proxmoxcheckbox',
- name: 'nofailback',
- uncheckedValue: 0,
- fieldLabel: 'nofailback',
- },
- ];
-
- me.columnB = [
- {
- xtype: 'textfield',
- name: 'comment',
- fieldLabel: gettext('Comment'),
- },
- nodegrid,
- ];
+ me.columnB = [nodegrid];
me.callParent();
},
});
-
-Ext.define('PVE.ha.GroupEdit', {
- extend: 'Proxmox.window.Edit',
-
- groupId: undefined,
-
- initComponent: function () {
- var me = this;
-
- me.isCreate = !me.groupId;
-
- if (me.isCreate) {
- me.url = '/api2/extjs/cluster/ha/groups';
- me.method = 'POST';
- } else {
- me.url = '/api2/extjs/cluster/ha/groups/' + me.groupId;
- me.method = 'PUT';
- }
-
- var ipanel = Ext.create('PVE.ha.GroupInputPanel', {
- isCreate: me.isCreate,
- groupId: me.groupId,
- });
-
- Ext.apply(me, {
- subject: gettext('HA Group'),
- items: [ipanel],
- });
-
- me.callParent();
-
- if (!me.isCreate) {
- me.load({
- success: function (response, options) {
- var values = response.result.data;
-
- ipanel.setValues(values);
- },
- });
- }
- },
-});
diff --git a/www/manager6/ha/rules/NodeAffinityRules.js b/www/manager6/ha/rules/NodeAffinityRules.js
new file mode 100644
index 00000000..b6143acd
--- /dev/null
+++ b/www/manager6/ha/rules/NodeAffinityRules.js
@@ -0,0 +1,36 @@
+Ext.define('PVE.ha.NodeAffinityRulesView', {
+ extend: 'PVE.ha.RulesBaseView',
+ alias: 'widget.pveHANodeAffinityRulesView',
+
+ ruleType: 'node-affinity',
+ ruleTitle: gettext('HA Node Affinity'),
+ inputPanel: 'NodeAffinityInputPanel',
+ faIcon: 'map-pin',
+
+ stateful: true,
+ stateId: 'grid-ha-node-affinity-rules',
+
+ initComponent: function () {
+ let me = this;
+
+ me.columns = [
+ {
+ header: gettext('Strict'),
+ width: 75,
+ dataIndex: 'strict',
+ },
+ {
+ header: gettext('HA Resources'),
+ flex: 1,
+ dataIndex: 'resources',
+ },
+ {
+ header: gettext('Nodes'),
+ flex: 1,
+ dataIndex: 'nodes',
+ },
+ ];
+
+ me.callParent();
+ },
+});
--
2.47.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules
2025-07-29 18:00 [pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules Daniel Kral
` (24 preceding siblings ...)
2025-07-29 18:01 ` [pve-devel] [PATCH manager v4 4/4] ui: ha: replace ha groups with ha node affinity rules Daniel Kral
@ 2025-07-30 17:29 ` Michael Köppl
25 siblings, 0 replies; 28+ messages in thread
From: Michael Köppl @ 2025-07-30 17:29 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Gave this version another spin today, focusing on the migration from
groups to rules. I tested this on 3-node and 5-node clusters and went
through the following scenarios:
1) At least one of the nodes in the cluster not at the minimum version
required for the migration to rules
2) At least one node offline during the attempt to migrate to rules
In both of the above cases, only the in-memory mapping of groups to
rules happens. Groups continue to work on the PVE 8 nodes and rules
continue to work on the PVE 9 nodes. It should be noted that the
nofailback flag is not inverted for the resources while the rules only
exist in memory. This "switch" from nofailback to failback only occurs
once the migration is persisted.
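To make the inversion concrete (rough sketch only, the exact
section-config syntax is from memory): a group with nofailback set,
roughly

  group: prefer_node1
          nodes node1:2,node2
          nofailback 1

should end up, once the migration is persisted, as a failback flag with
the opposite value on the resource itself in resources.cfg, e.g.

  vm: 111
          failback 0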
3) Updating the remaining PVE 8 nodes one after another
Persistent migration started soon after all nodes were upgraded to PVE 9
(there is a slight delay since the check whether the groups need to be
migrated does not happen every round). This worked smoothly and I did not
notice any discrepancies in the rules.cfg generated from the groups.cfg.
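As an illustration of the mapping (the generated rule ID and exact
formatting below are placeholders, not copied from a cluster), a group
like

  group: prefer_node1
          nodes node1:2,node2
          restricted 1

with vm:111 as its only member maps to a node affinity rule along the
lines of

  node-affinity: <generated-rule-id>
          resources vm:111
          nodes node1:2,node2
          strict 1

with restricted carried over as strict.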
4) Migration with non-existent groups in resources.cfg
5) Invalid properties in resources.cfg or groups.cfg
6) Partially upgrading the cluster, then editing a rule on a PVE 9 node
Such an edit will not persist. This is not unexpected, since the rules
exist only in memory at this point, but users should probably be warned
about making any changes to rules mid-upgrade.
Dano already incorporated feedback from Hannes' and my tests, and we also
tested updated versions that fix the problems we noticed; I am just
documenting this here for the sake of completeness. Overall, the migration
from groups to rules worked very well in the cases where migration was
already possible, and did not proceed (and provided informative errors or
warnings) where it was not.
On 7/29/25 20:03, Daniel Kral wrote:
> Here's a quick update on the core HA rules series. This cleans up the
> series so that all tests are running again and includes the missing ui
> patch that I didn't see missing last time.
>
> The persistent migration path has been tested for at least four full
> upgrade runs now, always with one node being behind and checking that
> the group config is only removed as soon as all nodes are on the right
> version.
>
> I'll wait for tomorrow if something comes up and will do some testing
> myself, so I'm anticipating to follow up on this tomorrow. I'll also
> want to get a more mature version of the HA resource affinity series
> ready for tomorrow on the mailing list.
>
> For maintainers: ha-manager patch #19 should be updated to the correct
> pve-manager version that is dependent on the pve-ha-manager package
> which can interpret the HA rules config.
>
> Changelog since v3
> ------------------
>
> - rebased on newest available master
>
> - included missing ui patch for web interface
>
> - correction in failback property description (does not influence the ha
> node affinity rules)
>
> - migrated the groups configs in the test cases to node affinity rules
> in rules configs (except two test cases for the persistent migration)
>
> - improved persistent ha group migration process
>
> - try a persistent upgrade only every 10 HA manager rounds
>
> - various other minor touches
>
> TODO
> ----
>
> - More testing on edge cases for the HA Manager migration path
>
> - Some more testing of the ha-manager CLI and adding a deprecation
> warning on the HA Groups API and disallowing requests as soon as the
> groups config is fully migrated
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 28+ messages in thread