* [pve-devel] [PATCH common v2 1/1] introduce HashTools module
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH cluster v2 1/3] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
` (41 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add a new package PVE::HashTools to provide helpers for common
operations done on hashes.
These initial helper subroutines implement basic set operations done on
hash sets, i.e. hashes with elements set to a true value, e.g. 1.
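For illustration, a small usage sketch of the new helpers (the results noted
in the comments are the expected return values):

    use PVE::HashTools;

    my $a = { node1 => 1, node2 => 1 };
    my $b = { node2 => 1, node3 => 1 };

    my $common   = PVE::HashTools::set_intersect($a, $b);     # { node2 => 1 }
    my $only_a   = PVE::HashTools::set_difference($a, $b);    # { node1 => 1 }
    my $all      = PVE::HashTools::set_union($a, $b);         # all three nodes => 1
    my $disjoint = PVE::HashTools::sets_are_disjoint($a, $b); # 0, since node2 is shared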
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- moved from pve-ha-manager PVE::HA::Tools to pve-common as
PVE::HashTools
- improved implementations
- added documentation
src/Makefile | 1 +
src/PVE/HashTools.pm | 101 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 102 insertions(+)
create mode 100644 src/PVE/HashTools.pm
diff --git a/src/Makefile b/src/Makefile
index 2d8bdc4..ee114d1 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -17,6 +17,7 @@ LIB_SOURCES = \
Daemon.pm \
Exception.pm \
Format.pm \
+ HashTools.pm \
INotify.pm \
JSONSchema.pm \
Job/Registry.pm \
diff --git a/src/PVE/HashTools.pm b/src/PVE/HashTools.pm
new file mode 100644
index 0000000..463fe7c
--- /dev/null
+++ b/src/PVE/HashTools.pm
@@ -0,0 +1,101 @@
+package PVE::HashTools;
+
+use strict;
+use warnings;
+
+=head1 NAME
+
+PVE::HashTools - Helpers for Hashes
+
+=head1 DESCRIPTION
+
+This package provides helpers for common operations on hashes.
+
+Even though these operations' implementations are often one-liners, they are
+meant to improve code readability by stating an operation name instead of the
+more verbose implementation.
+
+=cut
+
+=head1 FUNCTIONS
+
+=cut
+
+=head3 set_intersect($hash1, $hash2)
+
+Returns a hash set of the intersection of the hash sets C<$hash1> and
+C<$hash2>, i.e. the elements that are in both C<$hash1> and C<$hash2>.
+
+The hashes C<$hash1> and C<$hash2> are expected to be hash sets, i.e.
+key-value pairs are always set to C<1> or another truthy value.
+
+=cut
+
+sub set_intersect : prototype($$) {
+ my ($hash1, $hash2) = @_;
+
+ my $result = { map { $hash1->{$_} && $hash2->{$_} ? ($_ => 1) : () } keys %$hash1 };
+
+ return $result;
+}
+
+=head3 set_difference($hash1, $hash2)
+
+Returns a hash set of the set difference between the hash sets C<$hash1> and
+C<$hash2>, i.e. the elements that are in C<$hash1> but not in C<$hash2>.
+
+The hashes C<$hash1> and C<$hash2> are expected to be hash sets, i.e.
+key-value pairs are always set to C<1> or another truthy value.
+
+=cut
+
+sub set_difference : prototype($$) {
+ my ($hash1, $hash2) = @_;
+
+ my $result = { map { $hash2->{$_} ? () : ($_ => 1) } keys %$hash1 };
+
+ return $result;
+}
+
+=head3 set_union($hash1, $hash2)
+
+Returns a hash set of the union of the hash sets C<$hash1> and C<$hash2>, i.e.
+the elements that are in either C<$hash1> or C<$hash2>.
+
+The hashes C<$hash1> and C<$hash2> are expected to be hash sets, i.e.
+key-value pairs are always set to C<1> or another truthy value.
+
+=cut
+
+sub set_union : prototype($$) {
+ my ($hash1, $hash2) = @_;
+
+ my $result = { map { $_ => 1 } keys %$hash1, keys %$hash2 };
+
+ return $result;
+}
+
+=head3 sets_are_disjoint($hash1, $hash2)
+
+Checks whether the two given hash sets C<$hash1> and C<$hash2> are disjoint,
+i.e. have no element in common.
+
+The hashes C<$hash1> and C<$hash2> are expected to be hash sets, i.e.
+key-value pairs are always set to C<1> or another truthy value.
+
+Returns C<1> if they are disjoint, C<0> otherwise.
+
+=cut
+
+sub sets_are_disjoint : prototype($$) {
+ my ($hash1, $hash2) = @_;
+
+ for my $key (keys %$hash1) {
+ return 0 if $hash2->{$key};
+ }
+
+ return 1;
+}
+
+1;
--
2.39.5
* [pve-devel] [PATCH cluster v2 1/3] cfs: add 'ha/rules.cfg' to observed files
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH common v2 1/1] introduce HashTools module Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH cluster v2 2/3] datacenter config: make pve-ha-shutdown-policy optional Daniel Kral
` (40 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- only rebased on master
src/PVE/Cluster.pm | 1 +
src/pmxcfs/status.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
index 3b1de57..9ec4f66 100644
--- a/src/PVE/Cluster.pm
+++ b/src/PVE/Cluster.pm
@@ -69,6 +69,7 @@ my $observed = {
'ha/crm_commands' => 1,
'ha/manager_status' => 1,
'ha/resources.cfg' => 1,
+ 'ha/rules.cfg' => 1,
'ha/groups.cfg' => 1,
'ha/fence.cfg' => 1,
'status.cfg' => 1,
diff --git a/src/pmxcfs/status.c b/src/pmxcfs/status.c
index 0895e53..38316b4 100644
--- a/src/pmxcfs/status.c
+++ b/src/pmxcfs/status.c
@@ -97,6 +97,7 @@ static memdb_change_t memdb_change_array[] = {
{.path = "ha/crm_commands"},
{.path = "ha/manager_status"},
{.path = "ha/resources.cfg"},
+ {.path = "ha/rules.cfg"},
{.path = "ha/groups.cfg"},
{.path = "ha/fence.cfg"},
{.path = "status.cfg"},
--
2.39.5
* [pve-devel] [PATCH cluster v2 2/3] datacenter config: make pve-ha-shutdown-policy optional
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH common v2 1/1] introduce HashTools module Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH cluster v2 1/3] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules Daniel Kral
` (39 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
If there are other properties in the HA config hash, these cannot be set
without also giving a value for shutdown_policy, which is unnecessary as
it already has a default value. Therefore, make it optional.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/DataCenterConfig.pm | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index 53095a1..3c983b8 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -124,6 +124,7 @@ register_standard_option(
. "be moved back to the previously powered-off node, at least if no other migration, "
. "reloaction or recovery took place.",
default => 'conditional',
+ optional => 1,
},
);
--
2.39.5
* [pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (2 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH cluster v2 2/3] datacenter config: make pve-ha-shutdown-policy optional Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-23 15:58 ` Thomas Lamprecht
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 01/26] tree-wide: make arguments for select_service_node explicit Daniel Kral
` (38 subsequent siblings)
42 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add a feature flag 'use-location-rules', which is used to control the
behavior of how the HA WebGUI interface and HA API endpoints handle HA
Groups and HA Location rules.
If the flag is not set, HA Location rules shouldn't be able to be
created or modified, but only allow their behavior to be represented
with HA Groups, i.e. as it has been before. If they are present in the
config, e.g. added manually, then they should be ignored.
If the flag is set, the HA WebGUI and API endpoints should not allow HA
Groups to be CRUD'd anymore, but only allow their behavior to be
represented with HA Location rules. This also should expose the
'failback' property on HA services and disallow HA services to be
assigned to HA groups through the API.
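For illustration, enabling the flag in datacenter.cfg would look roughly like
this (property-string syntax as for the existing ha options; sketch only):

    ha: shutdown_policy=conditional,use-location-rules=1

or, with the previous patch making shutdown_policy optional, simply:

    ha: use-location-rules=1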
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/DataCenterConfig.pm | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index 3c983b8..76c5706 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -130,6 +130,12 @@ register_standard_option(
my $ha_format = {
shutdown_policy => get_standard_option('pve-ha-shutdown-policy'),
+ 'use-location-rules' => {
+ type => 'boolean',
+ description => "Whether HA Location rules should be used instead of HA groups.",
+ optional => 1,
+ default => 0,
+ },
};
my $next_id_format = {
--
2.39.5
* Re: [pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules
2025-06-20 14:31 ` [pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules Daniel Kral
@ 2025-06-23 15:58 ` Thomas Lamprecht
2025-06-24 7:29 ` Daniel Kral
0 siblings, 1 reply; 70+ messages in thread
From: Thomas Lamprecht @ 2025-06-23 15:58 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 20.06.25 um 16:31 schrieb Daniel Kral:
> Add a feature flag 'use-location-rules', which is used to control the
> behavior of how the HA WebGUI interface and HA API endpoints handle HA
> Groups and HA Location rules.
>
> If the flag is not set, HA Location rules shouldn't be able to be
> created or modified, but only allow their behavior to be represented
> with HA Groups, i.e. as it has been before. If they are present in the
> config, e.g. added manually, then they should be ignored.
>
> If the flag is set, the HA WebGUI and API endpoints should not allow HA
> Groups to be CRUD'd anymore, but only allow their behavior to be
> represented with HA Location rules. This also should expose the
> 'failback' property on HA services and disallow HA services to be
> assigned to HA groups through the API.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes since v1:
> - NEW!
>
> src/PVE/DataCenterConfig.pm | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
> index 3c983b8..76c5706 100644
> --- a/src/PVE/DataCenterConfig.pm
> +++ b/src/PVE/DataCenterConfig.pm
> @@ -130,6 +130,12 @@ register_standard_option(
>
> my $ha_format = {
> shutdown_policy => get_standard_option('pve-ha-shutdown-policy'),
> + 'use-location-rules' => {
> + type => 'boolean',
> + description => "Whether HA Location rules should be used instead of HA groups.",
> + optional => 1,
> + default => 0,
it's IMO rather odd that one can enable this, then do some CRUD stuff and
then disable this flag here again, feels rather awkward and prone to user
errors.
I'd much rather see a transparent switch, where the new affinity system
parses and handles existing group definition, and when anything is written
out (group or affinity changes) then writes those group definitions also
out as such affinity rules and drops the group definitions (or just ignore
them completely once an affinity rule config exists).
If the building blocks are there this should not be really _that_ hard
I think.
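Roughly, as a very hand-wavy sketch (helper shape and property mapping are
made up for illustration here, not actual ha-manager API):

    # sketch: parse existing groups once and write them out as location rules
    sub convert_groups_to_location_rules {
        my ($groups, $rules) = @_;

        for my $group (sort keys %{ $groups->{ids} }) {
            my $cfg = $groups->{ids}->{$group};

            # reuse the group name as the rule id, so no naming conflicts can
            # occur; the exact property mapping is just assumed here
            $rules->{ids}->{$group} = {
                type => 'location',
                nodes => $cfg->{nodes},
                strict => $cfg->{restricted},
            };
        }

        return $rules; # caller writes rules.cfg and drops groups.cfg afterwards
    }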
> + },
> };
>
> my $next_id_format = {
* Re: [pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules
2025-06-23 15:58 ` Thomas Lamprecht
@ 2025-06-24 7:29 ` Daniel Kral
2025-06-24 7:51 ` Thomas Lamprecht
0 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-24 7:29 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox VE development discussion
On 6/23/25 17:58, Thomas Lamprecht wrote:
> Am 20.06.25 um 16:31 schrieb Daniel Kral:
>> diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
>> index 3c983b8..76c5706 100644
>> --- a/src/PVE/DataCenterConfig.pm
>> +++ b/src/PVE/DataCenterConfig.pm
>> @@ -130,6 +130,12 @@ register_standard_option(
>>
>> my $ha_format = {
>> shutdown_policy => get_standard_option('pve-ha-shutdown-policy'),
>> + 'use-location-rules' => {
>> + type => 'boolean',
>> + description => "Whether HA Location rules should be used instead of HA groups.",
>> + optional => 1,
>> + default => 0,
>
> it's IMO rather odd that one can enable this, then do some CRUD stuff and
> then disable this flag here again, feels rather awkward and prone to user
> errors.
>
> I'd much rather see a transparent switch, where the new affinity system
> parses and handles existing group definition, and when anything is written
> out (group or affinity changes) then writes those group definitions also
> out as such affinity rules and drops the group definitions (or just ignore
> them completely once an affinity rule config exists).
>
> If the building blocks are there this should not be really _that_ hard
> I think.
I'm not a fan of how it works right now myself and I would also prefer a
"one-time" switch and users not being able to go back and forth, as it
introduces many unnecessary states that we need to also test (i.e.
going back from location rules to groups). The migration part from
groups to the service config + location rules is already there, as HA
groups are converted to HA location rules and then applied as such after
patch ha-manager #11 and #12.
I like the idea of writing out new/existing groups to location rules,
but the only part that would still be missing here for me is how we
should inform users about this? It would feel rather irritating if
they're testing out the colocation rules and suddenly their groups are
gone and converted to location rules.
@Fiona suggested a "Convert to Location Rules" button in the web
interface and else a API/CLI endpoint once off-list, maybe that could do
the trick? As soon as the conversion was successful (no naming
conflicts, group reference was removed from services, etc.), the groups
config is dropped and that is the indicator whether location rules
should be used or not. For new users that would also be true as the
group config doesn't exist yet. What do you think?
I'd also prevent users then to create HA groups if the group config
doesn't already exist, so that new users already start to use the
location rules instead.
Resolving naming conflicts could just be a mapping table in the web
interface where users can choose the "new" location rules names, but I'm
wondering if there's a better way to do this, especially when users do
that migration on the CLI.
* Re: [pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules
2025-06-24 7:29 ` Daniel Kral
@ 2025-06-24 7:51 ` Thomas Lamprecht
2025-06-24 8:19 ` Daniel Kral
0 siblings, 1 reply; 70+ messages in thread
From: Thomas Lamprecht @ 2025-06-24 7:51 UTC (permalink / raw)
To: Daniel Kral, Proxmox VE development discussion
Am 24.06.25 um 09:29 schrieb Daniel Kral:
> I like the idea of writing out new/existing groups to location rules,
> but the only part that would still be missing here for me is how we
> should inform users about this? It would feel rather irritating if
> they're testing out the colocation rules and suddenly their groups are
> gone and converted to location rules.
If you think that's irritating it always will be, so then we should
keep the groups as is (how they are internal handled and represented
doesn't matter).
But if we really want to replace groups by this then why should this
even be irritating? Upgrade and the groups are gone and there are
just affinity/anti-affinity rules view/api left, the user handles
it purely over that groups are no more and the parse+write is the
transparent upgrade. That way the admin doesn't need to do anything,
which is always a huge win, and there is only one API/UI to manage
those so it's rather crystal clear what happens; IMO much less confusing
than any enable/disable or active conversion, as once we can convert
automatically anyway, why add extra work and make the user do it?
Sure, the implementation needs to work, but that it does anyway
(or what else should a user do when they enabled this or pressed
convert?¿), that's what unit tests and QA/testing is for.
Especially with the upcoming major release we would have a chance
to make an easy and clean transition here. But we can still do this
later on, then we need to keep the API for groups and wire it to
location rules, not as nice, but I do not think the group API is
heavily used in automation, or at least anybody will happily switch
over to affinity rule API as any automation probably tried to
workaround the lack of those flexible rules, so for the user it's
still better than what's proposed here IMO.
> @Fiona suggested a "Convert to Location Rules" button in the web
> interface and else a API/CLI endpoint once off-list, maybe that could do
> the trick? As soon as the conversion was successful (no naming
> conflicts, group reference was removed from services, etc.), the groups
> config is dropped and that is the indicator whether location rules
> should be used or not. For new users that would also be true as the
> group config doesn't exist yet. What do you think?
No, please no convert buttons, that's just as "irritating", especially
without a convert back button, and on top of that it puts mental load
to admins to make a decision they quite definitively do not care about.
Let's please not make ours (those having to maintain and those having
to provide support) and users life more difficult at the same time.
> I'd also prevent users then to create HA groups if the group config
> doesn't already exist, so that new users already start to use the
> location rules instead.
So does a replacement of the API/UI for groups, as we cannot really
fully hedge against user manually editing config files that's hardly
a case I'd look at.
>
> Resolving naming conflicts could just be a mapping table in the web
> interface where users can choose the "new" location rules names, but I'm
> wondering if there's a better way to do this, especially when users do
> that migration on the CLI.
Just use the group name, if you parse them as affinity rules right away
no conflict can happen due to them being already included in the state
and thus any uniqueness check.
That said, might not really require a named ID's here anyway, having
a list (array) of rules and a short 127 or 255 max-length free-form
comment might be more helpful here anyway, as that avoids the need
to choose an available ID while still allowing to encode hints and
rationale for certain rule entries.
* Re: [pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules
2025-06-24 7:51 ` Thomas Lamprecht
@ 2025-06-24 8:19 ` Daniel Kral
2025-06-24 8:25 ` Thomas Lamprecht
0 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-24 8:19 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox VE development discussion
On 6/24/25 09:51, Thomas Lamprecht wrote:
> Am 24.06.25 um 09:29 schrieb Daniel Kral:
>> I like the idea of writing out new/existing groups to location rules,
>> but the only part that would still be missing here for me is how we
>> should inform users about this? It would feel rather irritating if
>> they're testing out the colocation rules and suddenly their groups are
>> gone and converted to location rules.
>
> If you think that's irritating it always will be, so then we should
> keep the groups as is (how they are internal handled and represented
> doesn't matter).
>
> But if we really want to replace groups by this then why should this
> even be irritating? Upgrade and the groups are gone and there are
> just affinity/anti-affinity rules view/api left, the user handles
> it purely over that groups are no more and the parse+write is the
> transparent upgrade. That way the admin doesn't need to do anything,
> which is always a huge win, and there is only one API/UI to manage
> those so it's rather crystal clear what happens; IMO much less confusing
> than any enable/disable or active conversion, as once we can convert
> automatically anyway, why add extra work and make the user do it?
>
> Sure, the implementation needs to work, but that it does anyway
> (or what else should a user do when they enabled this or pressed
> convert?¿), that's what unit tests and QA/testing is for.
>
> Especially with the upcoming major release we would have a chance
> to make an easy and clean transition here. But we can still do this
> later on, then we need to keep the API for groups and wire it to
> location rules, not as nice, but I do not think the group API is
> heavily used in automation, or at least anybody will happily switch
> over to affinity rule API as any automation probably tried to
> workaround the lack of those flexible rules, so for the user it's
> still better than what's proposed here IMO.
>
>> @Fiona suggested a "Convert to Location Rules" button in the web
>> interface and else a API/CLI endpoint once off-list, maybe that could do
>> the trick? As soon as the conversion was successful (no naming
>> conflicts, group reference was removed from services, etc.), the groups
>> config is dropped and that is the indicator whether location rules
>> should be used or not. For new users that would also be true as the
>> group config doesn't exist yet. What do you think?
>
> No, please no convert buttons, that's just as "irritating", especially
> without a convert back button, and on top of that it puts mental load
> to admins to make a decision they quite definitively do not care about.
>
> Let's please not make ours (those having to maintain and those having
> to provide support) and users life more difficult at the same time.
Right, I fully agree with both of your answers above, we shouldn't
put the burden on the end user here to make the decision if in the end
groups should be replaced with location rules anyway.
As said I'd be happy to reduce the amount of compatibility burden for
users, support and in the code base, so I'll happily implement it that way!
>
>> I'd also prevent users then to create HA groups if the group config
>> doesn't already exist, so that new users already start to use the
>> location rules instead.
>
> So does a replacement of the API/UI for groups, as we cannot really
> fully hedge against user manually editing config files that's hardly
> a case I'd look at.
>
>>
>> Resolving naming conflicts could just be a mapping table in the web
>> interface where users can choose the "new" location rules names, but I'm
>> wondering if there's a better way to do this, especially when users do
>> that migration on the CLI.
>
> Just use the group name, if you parse them as affinity rules right away
> no conflict can happen due to them being already included in the state
> and thus any uniqueness check.
>
> That said, might not really require a named ID's here anyway, having
> a list (array) of rules and a short 127 or 255 max-length free-form
> comment might be more helpful here anyway, as that avoids the need
> to choose an available ID while still allowing to encode hints and
> rationale for certain rule entries.
Right, especially because ids can't be changed but comments can, I'd
also much prefer that over forcing users to come up with new names for
rules and not being able to change them (without the delete/create
cycle). I guess something similar to the user.cfg layout would do it?
* Re: [pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules
2025-06-24 8:19 ` Daniel Kral
@ 2025-06-24 8:25 ` Thomas Lamprecht
2025-06-24 8:52 ` Daniel Kral
0 siblings, 1 reply; 70+ messages in thread
From: Thomas Lamprecht @ 2025-06-24 8:25 UTC (permalink / raw)
To: Daniel Kral, Proxmox VE development discussion
Am 24.06.25 um 10:19 schrieb Daniel Kral:
> On 6/24/25 09:51, Thomas Lamprecht wrote:
>> That said, might not really require a named ID's here anyway, having
>> a list (array) of rules and a short 127 or 255 max-length free-form
>> comment might be more helpful here anyway, as that avoids the need
>> to choose an available ID while still allowing to encode hints and
>> rationale for certain rule entries.
>
> Right, especially because ids can't be changed but comments can, I'll
> also prefer that much over forcing users to come up with new names for
> rules and not being able to change them (without the delete/create
> cycle). I guess something similar to the user.cfg layout would do it?
FWIW, we could keep ID's for simplicity but auto-generate them on
creation (probably frontend side though to not break the POST-only-once
REST-API principle.) and not show them in the UI by default.
Similar to what we did at a few places in PBS, albeit there we allowed
overriding it as there references to those IDs can exist, for affinity
rules that probably should not matter, at least nothing would come to
my mind that would need referencing them.
Biggest advantage is that we can keep section config format here and
avoid a semi-custom format. The user.cfg is inspired by shadow.cfg,
which does a similar job so it's more relatable there (and it's one of
the older config files), so I'd not take it too much as example.
* Re: [pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules
2025-06-24 8:25 ` Thomas Lamprecht
@ 2025-06-24 8:52 ` Daniel Kral
0 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-24 8:52 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox VE development discussion
On 6/24/25 10:25, Thomas Lamprecht wrote:
> FWIW, we could keep ID's for simplicity but auto-generate them on
> creation (probably frontend side though to not break the POST-only-once
> REST-API principle.) and not show them in the UI by default.
> Similar to what we did at a few places in PBS, albeit there we allowed
> overriding it as there references to those IDs can exist, for affinity
> rules that probably should not matter, at least nothing would come to
> my mind that would need referencing them.
>
> Biggest advantage is that we can keep section config format here and
> avoid a semi-custom format. The user.cfg is inspired by shadow.cfg,
> which does a similar job so it's more relatable there (and it's one of
> the older config files), so I'd not take it too much as example.
ACK, that sounds good to me as well, I'll do that!
* [pve-devel] [PATCH ha-manager v2 01/26] tree-wide: make arguments for select_service_node explicit
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (3 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 02/26] manager: improve signature of select_service_node Daniel Kral
` (37 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Explicitly state all the parameters at all call sites for
select_service_node(...) to clarify which state these are in.
The call site in next_state_recovery(...) sets $best_scored to 1, as it
should find the next best node when recovering from the failed node
$current_node. All references to $best_scored in select_service_node()
are there to check whether $current_node can be selected, but as
$current_node is not available anyway, this change should not change
the result of select_service_node(...).
Otherwise, $sd->{failed_nodes} and $sd->{maintenance_node} should
contain only the failed $current_node in next_state_recovery(...), and
therefore both can be passed, as these should be impossible states here
anyway. A cleaner way could be to explicitly remove them beforehand or
do extra checks in select_service_node(...).
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/HA/Manager.pm | 11 ++++++++++-
src/test/test_failover1.pl | 15 ++++++++++++++-
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 12292e6..85f2b1a 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -971,6 +971,7 @@ sub next_state_started {
$try_next,
$sd->{failed_nodes},
$sd->{maintenance_node},
+ 0, # best_score
);
if ($node && ($sd->{node} ne $node)) {
@@ -1083,7 +1084,15 @@ sub next_state_recovery {
$self->recompute_online_node_usage(); # we want the most current node state
my $recovery_node = select_service_node(
- $self->{groups}, $self->{online_node_usage}, $sid, $cd, $sd->{node},
+ $self->{groups},
+ $self->{online_node_usage},
+ $sid,
+ $cd,
+ $sd->{node},
+ 0, # try_next
+ $sd->{failed_nodes},
+ $sd->{maintenance_node},
+ 1, # best_score
);
if ($recovery_node) {
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 371bdcf..2478b2b 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -24,13 +24,26 @@ my $service_conf = {
group => 'prefer_node1',
};
+my $sd = {
+ failed_nodes => undef,
+ maintenance_node => undef,
+};
+
my $current_node = $service_conf->{node};
sub test {
my ($expected_node, $try_next) = @_;
my $node = PVE::HA::Manager::select_service_node(
- $groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next,
+ $groups,
+ $online_node_usage,
+ "vm:111",
+ $service_conf,
+ $current_node,
+ $try_next,
+ $sd->{failed_nodes},
+ $sd->{maintenance_node},
+ 0, # best_score
);
my (undef, undef, $line) = caller();
--
2.39.5
* [pve-devel] [PATCH ha-manager v2 02/26] manager: improve signature of select_service_node
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (4 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 01/26] tree-wide: make arguments for select_service_node explicit Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-23 16:21 ` Thomas Lamprecht
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 03/26] introduce rules base plugin Daniel Kral
` (36 subsequent siblings)
42 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
As the signature of select_service_node(...) has become rather long
already, make it more compact by retrieving service- and
affinity-related data directly from the service state in $sd and
introduce a $mode parameter to distinguish the behaviors of $try_next
and $best_scored, which have already been mutually exclusive before.
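For illustration, a call with the new signature then looks like this (taken
from the next_state_recovery hunk below):

    my $recovery_node = select_service_node(
        $self->{groups},
        $self->{online_node_usage},
        $sid,
        $cd,
        $sd,
        'best-score',
    );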
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/HA/Manager.pm | 87 +++++++++++++++++++++-----------------
src/test/test_failover1.pl | 17 +++-----
2 files changed, 53 insertions(+), 51 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 85f2b1a..85bb114 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -149,18 +149,41 @@ sub get_node_priority_groups {
return ($pri_groups, $group_members);
}
+=head3 select_service_node(...)
+
+=head3 select_service_node($groups, $online_node_usage, $sid, $service_conf, $sd, $mode)
+
+Used to select the best fitting node for the service C<$sid>, with the
+configuration C<$service_conf> and state C<$sd>, according to the groups defined
+in C<$groups>, available node utilization in C<$online_node_usage>, and the
+given C<$mode>.
+
+The C<$mode> can be set to:
+
+=over
+
+=item C<'none'>
+
+Try to stay on the current node as much as possible.
+
+=item C<'best-score'>
+
+Try to select the best-scored node.
+
+=item C<'try-next'>
+
+Try to select the best-scored node, which is not in C<< $sd->{failed_nodes} >>,
+while trying to stay on the current node.
+
+=back
+
+=cut
+
sub select_service_node {
- my (
- $groups,
- $online_node_usage,
- $sid,
- $service_conf,
- $current_node,
- $try_next,
- $tried_nodes,
- $maintenance_fallback,
- $best_scored,
- ) = @_;
+ my ($groups, $online_node_usage, $sid, $service_conf, $sd, $mode) = @_;
+
+ my ($current_node, $tried_nodes, $maintenance_fallback) =
+ $sd->@{qw(node failed_nodes maintenance_node)};
my $group = get_service_group($groups, $online_node_usage, $service_conf);
@@ -170,11 +193,7 @@ sub select_service_node {
return undef if !scalar(@pri_list);
# stay on current node if possible (avoids random migrations)
- if (
- (!$try_next && !$best_scored)
- && $group->{nofailback}
- && defined($group_members->{$current_node})
- ) {
+ if ($mode eq 'none' && $group->{nofailback} && defined($group_members->{$current_node})) {
return $current_node;
}
@@ -183,7 +202,7 @@ sub select_service_node {
my $top_pri = $pri_list[0];
# try to avoid nodes where the service failed already if we want to relocate
- if ($try_next) {
+ if ($mode eq 'try-next') {
foreach my $node (@$tried_nodes) {
delete $pri_groups->{$top_pri}->{$node};
}
@@ -192,8 +211,7 @@ sub select_service_node {
return $maintenance_fallback
if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
- return $current_node
- if (!$try_next && !$best_scored) && $pri_groups->{$top_pri}->{$current_node};
+ return $current_node if $mode eq 'none' && $pri_groups->{$top_pri}->{$current_node};
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
@@ -208,8 +226,8 @@ sub select_service_node {
}
}
- if ($try_next) {
- if (!$best_scored && defined($found) && ($found < (scalar(@nodes) - 1))) {
+ if ($mode eq 'try-next') {
+ if (defined($found) && ($found < (scalar(@nodes) - 1))) {
return $nodes[$found + 1];
} else {
return $nodes[0];
@@ -797,11 +815,8 @@ sub next_state_request_start {
$self->{online_node_usage},
$sid,
$cd,
- $sd->{node},
- 0, # try_next
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 1, # best_score
+ $sd,
+ 'best-score',
);
my $select_text = $selected_node ne $current_node ? 'new' : 'current';
$haenv->log(
@@ -901,7 +916,7 @@ sub next_state_started {
} else {
- my $try_next = 0;
+ my $select_mode = 'none';
if ($lrm_res) {
@@ -932,7 +947,7 @@ sub next_state_started {
if (scalar(@{ $sd->{failed_nodes} }) <= $cd->{max_relocate}) {
# tell select_service_node to relocate if possible
- $try_next = 1;
+ $select_mode = 'try-next';
$haenv->log(
'warning',
@@ -967,11 +982,8 @@ sub next_state_started {
$self->{online_node_usage},
$sid,
$cd,
- $sd->{node},
- $try_next,
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 0, # best_score
+ $sd,
+ $select_mode,
);
if ($node && ($sd->{node} ne $node)) {
@@ -1009,7 +1021,7 @@ sub next_state_started {
);
}
} else {
- if ($try_next && !defined($node)) {
+ if ($select_mode eq 'try-next' && !defined($node)) {
$haenv->log(
'warning',
"Start Error Recovery: Tried all available nodes for service '$sid', retry"
@@ -1088,11 +1100,8 @@ sub next_state_recovery {
$self->{online_node_usage},
$sid,
$cd,
- $sd->{node},
- 0, # try_next
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 1, # best_score
+ $sd,
+ 'best-score',
);
if ($recovery_node) {
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 2478b2b..90f5cf4 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -25,32 +25,25 @@ my $service_conf = {
};
my $sd = {
+ node => $service_conf->{node},
failed_nodes => undef,
maintenance_node => undef,
};
-my $current_node = $service_conf->{node};
-
sub test {
my ($expected_node, $try_next) = @_;
+ my $select_mode = $try_next ? 'try-next' : 'none';
+
my $node = PVE::HA::Manager::select_service_node(
- $groups,
- $online_node_usage,
- "vm:111",
- $service_conf,
- $current_node,
- $try_next,
- $sd->{failed_nodes},
- $sd->{maintenance_node},
- 0, # best_score
+ $groups, $online_node_usage, "vm:111", $service_conf, $sd, $select_mode,
);
my (undef, undef, $line) = caller();
die "unexpected result: $node != ${expected_node} at line $line\n"
if $node ne $expected_node;
- $current_node = $node;
+ $sd->{node} = $node;
}
test('node1');
--
2.39.5
* Re: [pve-devel] [PATCH ha-manager v2 02/26] manager: improve signature of select_service_node
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 02/26] manager: improve signature of select_service_node Daniel Kral
@ 2025-06-23 16:21 ` Thomas Lamprecht
2025-06-24 8:06 ` Daniel Kral
0 siblings, 1 reply; 70+ messages in thread
From: Thomas Lamprecht @ 2025-06-23 16:21 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 20.06.25 um 16:31 schrieb Daniel Kral:
> As the signature of select_service_node(...) has become rather long
> already, make it more compact by retrieving service- and
> affinity-related data directly from the service state in $sd and
> introduce a $mode parameter to distinguish the behaviors of $try_next
> and $best_scored, which have already been mutually exclusive before.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes since v1:
> - NEW!
>
> src/PVE/HA/Manager.pm | 87 +++++++++++++++++++++-----------------
> src/test/test_failover1.pl | 17 +++-----
> 2 files changed, 53 insertions(+), 51 deletions(-)
>
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 85f2b1a..85bb114 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -149,18 +149,41 @@ sub get_node_priority_groups {
> return ($pri_groups, $group_members);
> }
>
> +=head3 select_service_node(...)
> +
> +=head3 select_service_node($groups, $online_node_usage, $sid, $service_conf, $sd, $mode)
> +
> +Used to select the best fitting node for the service C<$sid>, with the
> +configuration C<$service_conf> and state C<$sd>, according to the groups defined
> +in C<$groups>, available node utilization in C<$online_node_usage>, and the
> +given C<$mode>.
> +
> +The C<$mode> can be set to:
> +
> +=over
> +
> +=item C<'none'>
> +
> +Try to stay on the current node as much as possible.
Why use "none" then? "no mode" is not really helping understand
what happens. Maybe it can be resolved with better names/descriptions,
even for the parameter name (like maybe "node_preference"?), but another
option might be to combine the flags into a hash ref $opts parameter, as
that would also avoid the issues of rather opaque "1" or "0" params on
the call sites, not saying it's a must, but we use that pattern a few
times and conflating such flags into a single string param comes with
its own can of worms (bit more about that below).
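Just as a rough sketch of what I mean with the hash-ref variant (key names
made up here, not part of the patch):

    my $node = select_service_node(
        $groups, $online_node_usage, $sid, $service_conf, $sd,
        { try_next => 0, best_scored => 1 }, # $opts instead of a mode string
    );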
> +
> +=item C<'best-score'>
> +
> +Try to select the best-scored node.
> +
> +=item C<'try-next'>
Not sure if switching to a free form string that is nowhere checked for
unknown (invalid) values really is an improvement over the status quo.
Rejecting unknown modes upfront should be definitively added.
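E.g. something like (sketch only):

    # reject unknown modes upfront instead of silently falling through
    die "got unknown node selection mode '$mode'\n"
        if $mode ne 'none' && $mode ne 'best-score' && $mode ne 'try-next';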
Also can we please move the item description as short sentence after the
`=item ...`, i.e., on the same line, to reduce bloating up the code too
much with POD.
=item C<'none'>: Try to stay on the current node as much as possible.
=item C<'best-score'>: Try to select the best-scored node.
=item C<'try-next'>: ...?
> +
> +Try to select the best-scored node, which is not in C<< $sd->{failed_nodes} >>,
> +while trying to stay on the current node.
might be a bit to long since I did more in the HA stack, but is "while trying
to stay on the current node" correct?
> +
> +=back
> +
> +=cut
> +
> sub select_service_node {
> - my (
> - $groups,
> - $online_node_usage,
> - $sid,
> - $service_conf,
> - $current_node,
> - $try_next,
> - $tried_nodes,
> - $maintenance_fallback,
> - $best_scored,
> - ) = @_;
> + my ($groups, $online_node_usage, $sid, $service_conf, $sd, $mode) = @_;
> +
> + my ($current_node, $tried_nodes, $maintenance_fallback) =
> + $sd->@{qw(node failed_nodes maintenance_node)};
>
> my $group = get_service_group($groups, $online_node_usage, $service_conf);
>
* Re: [pve-devel] [PATCH ha-manager v2 02/26] manager: improve signature of select_service_node
2025-06-23 16:21 ` Thomas Lamprecht
@ 2025-06-24 8:06 ` Daniel Kral
0 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-24 8:06 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox VE development discussion
On 6/23/25 18:21, Thomas Lamprecht wrote:
> Am 20.06.25 um 16:31 schrieb Daniel Kral:
>> +=head3 select_service_node(...)
>> +
>> +=head3 select_service_node($groups, $online_node_usage, $sid, $service_conf, $sd, $mode)
>> +
>> +Used to select the best fitting node for the service C<$sid>, with the
>> +configuration C<$service_conf> and state C<$sd>, according to the groups defined
>> +in C<$groups>, available node utilization in C<$online_node_usage>, and the
>> +given C<$mode>.
>> +
>> +The C<$mode> can be set to:
>> +
>> +=over
>> +
>> +=item C<'none'>
>> +
>> +Try to stay on the current node as much as possible.
>
> Why use "none" then? "no mode" is not really helping understand
> what happens. Maybe it can be resolved with better names/descriptions,
> even for the parameter name (like maybe "node_preference"?), but another
> option might be to combine the flags into a hash ref $opts parameter, as
> that would also avoid the issues of rather opaque "1" or "0" params on
> the call sites, not saying it's a must, but we use that pattern a few
> times and conflating such flags into a single string param comes with
> its own can of worms (bit more about that below).
I thought about putting them in an $opts, but I felt like an enum-like
type would fit here better to reduce the amount of states slightly, as
setting $try_next more or less already implied $best_scored (not the
other way around). $mode = 'none' is really no good information here, I
think $node_preference will be a better name for it.
>
>> +
>> +=item C<'best-score'>
>> +
>> +Try to select the best-scored node.
>> +
>> +=item C<'try-next'>
>
> Not sure if switching to a free form string that is nowhere checked for
> unknown (invalid) values really is an improvement over the status quo.
> Rejecting unknown modes upfront should be definitively added.
>
> Also can we please move the item description as short sentence after the
> `=item ...`, i.e., on the same line, to reduce bloating up the code too
> much with POD.
>
> =item C<'none'>: Try to stay on the current node as much as possible.
> =item C<'best-score'>: Try to select the best-scored node.
> =item C<'try-next'>: ...?
Will definitely do that!
Speaking of 'try-next' (more below), shouldn't $sd->{failed_nodes}->@*
be cleaned as soon as the service was successfully started on a node? I
feel like $try_next was introduced to separate it from the 'stay on
current node as much as possible' behavior and might not be needed
anymore now with an explicit $node_preference = 'none'? Then we're only
left with $node_preference = 'none' / 'best-score' (where the latter is
the standard behavior of the helper).
>
>
>> +
>> +Try to select the best-scored node, which is not in C<< $sd->{failed_nodes} >>,
>> +while trying to stay on the current node.
>
> might be a bit to long since I did more in the HA stack, but is "while trying
> to stay on the current node" correct?
Right, I misread the last part at select_service_node(...)
my $found;
for (my $i = scalar(@nodes) - 1; $i >= 0; $i--) {
my $node = $nodes[$i];
if ($node eq $current_node) {
$found = $i;
}
}
if ($mode eq 'try-next') {
if (defined($found) && ($found < (scalar(@nodes) - 1))) {
return $nodes[$found + 1];
} else {
return $nodes[0];
}
} else {
return $nodes[0];
}
as preferring the $current_node there, but it actually is the next-best
node after $current_node after sorting the nodes by the Usage's scores.
If 'try-next' is set, shouldn't $current_node already have been deleted
from $pri_nodes/@nodes here by
if ($mode eq 'try-next') {
foreach my $node (@$tried_nodes) {
delete $pri_nodes->{$node};
}
}
Either way, the last part of the description is wrong and I'll remove it.
* [pve-devel] [PATCH ha-manager v2 03/26] introduce rules base plugin
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (5 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 02/26] manager: improve signature of select_service_node Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-07-04 14:18 ` Michael Köppl
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 04/26] rules: introduce location rule plugin Daniel Kral
` (35 subsequent siblings)
42 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add a rules base plugin to allow users to specify different kinds of HA
rules, which put constraints on the HA Manager's behavior, in a single
configuration file.
Rule checkers can be registered for every plugin with the
register_check(...) method and are used to check the feasibility of the
rule set with check_feasibility(...) and to create a feasible rule set
with canonicalize(...).
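For illustration, a rough sketch of how a consumer of this base plugin would
use it (the concrete rule plugin named here is only added later in the
series):

    use PVE::HA::Rules;
    use PVE::HA::Rules::Location; # stand-in for a concrete rule plugin

    PVE::HA::Rules::Location->register();
    PVE::HA::Rules->init();

    my $rules = PVE::HA::Rules->parse_config($filename, $raw);

    # either report infeasible rules without touching the rule set ...
    my $errors = PVE::HA::Rules->check_feasibility($rules);

    # ... or prune the rule set so that only feasible rules remain
    my $messages = PVE::HA::Rules->canonicalize($rules);
    print $_ for @$messages;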
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- added documentation
- added state property to enable/disable rules (and allow a
tri-state 'contradictory' state for the API endpoint there)
- replace `{encode,decode}_value` plugin calls to
`{encode,decode}_plugin_value` with default implementation to make
these implementations optional for rule plugins
- introduce `set_rule_defaults` to set optional fields with defaults
in later patches
- renamed `are_satisfiable` to `check_feasibility`
- factored `canonicalize` and `checked_config` to only
`canonicalize`
- refactored the feasibility checks to be registered as individual
checkers with `register_check`
- refactored the canonicalization for plugins to be a single call to
plugin_canonicalize (if implemented)
- added rule auto completion
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Makefile | 2 +-
src/PVE/HA/Rules.pm | 445 ++++++++++++++++++++++++++++++++++
src/PVE/HA/Tools.pm | 22 ++
4 files changed, 469 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Rules.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 0ffbd8d..9bbd375 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -32,6 +32,7 @@
/usr/share/perl5/PVE/HA/Resources.pm
/usr/share/perl5/PVE/HA/Resources/PVECT.pm
/usr/share/perl5/PVE/HA/Resources/PVEVM.pm
+/usr/share/perl5/PVE/HA/Rules.pm
/usr/share/perl5/PVE/HA/Tools.pm
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index 8c91b97..489cbc0 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -1,4 +1,4 @@
-SIM_SOURCES=CRM.pm Env.pm Groups.pm Resources.pm LRM.pm Manager.pm \
+SIM_SOURCES=CRM.pm Env.pm Groups.pm Rules.pm Resources.pm LRM.pm Manager.pm \
NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm Usage.pm
SOURCES=${SIM_SOURCES} Config.pm
diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
new file mode 100644
index 0000000..e1c84bc
--- /dev/null
+++ b/src/PVE/HA/Rules.pm
@@ -0,0 +1,445 @@
+package PVE::HA::Rules;
+
+use strict;
+use warnings;
+
+use PVE::JSONSchema qw(get_standard_option);
+use PVE::Tools;
+
+use PVE::HA::Tools;
+
+use base qw(PVE::SectionConfig);
+
+=head1 NAME
+
+PVE::HA::Rules - Base Plugin for HA Rules
+
+=head1 SYNOPSIS
+
+ use base qw(PVE::HA::Rules);
+
+=head1 DESCRIPTION
+
+This package provides the capability to have different types of rules in the
+same config file, which put constraints or other requirements on the HA Manager
+that it must or should follow.
+
+Since rules can interfere with each other, i.e., rules can make other rules
+invalid or infeasible, this package also provides the capability to check for
+the feasibility of the rules within a rule plugin and between different rule
+plugins, and also to prune the rule set in such a way that it becomes feasible
+again.
+
+This package inherits its config-related methods from C<L<PVE::SectionConfig>>
+and therefore rule plugins need to implement methods from there as well.
+
+=head1 USAGE
+
+Each I<rule plugin> is required to implement the methods C<L<type()>>,
+C<L<properties()>>, and C<L<options>> from the C<L<PVE::SectionConfig>> to
+extend the properties of this I<base plugin> with plugin-specific properties.
+
+=head2 REGISTERING CHECKS
+
+In order to C<L<< register|/$class->register_check(...) >>> checks for a rule
+plugin, the plugin can override the
+C<L<< get_plugin_check_arguments()|/$class->get_plugin_check_arguments(...) >>>
+method, which allows passing plugin-specific subsets of the rules, which are
+relevant to the checks, to the plugin's checkers.
+
+The following example shows a plugin's implementation of its
+C<L<< get_plugin_check_arguments()|/$class->get_plugin_check_arguments(...) >>>
+and a trivial check, which renders all rules defining a comment erroneous and
+blames these errors on the I<comment> property:
+
+ sub get_plugin_check_arguments {
+ my ($class, $rules) = @_;
+
+ my @ruleids = sort {
+ $rules->{order}->{$a} <=> $rules->{order}->{$b}
+ } keys %{$rules->{ids}};
+
+ my $result = {
+ custom_rules => {},
+ };
+
+ for my $ruleid (@ruleids) {
+ my $rule = $rules->{ids}->{$ruleid};
+
+ $result->{custom_rules}->{$ruleid} = $rule if defined($rule->{comment});
+ }
+
+ return $result;
+ }
+
+ __PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return [ sort keys $args->{custom_rules}->%* ];
+ },
+ sub {
+ my ($ruleids, $errors) = @_;
+
+ for my $ruleid (@$ruleids) {
+ push @{$errors->{$ruleid}->{comment}},
+ "rule is ineffective, because I said so.";
+ }
+ }
+ );
+
+=head1 METHODS
+
+=cut
+
+my $defaultData = {
+ propertyList => {
+ type => {
+ description => "HA rule type.",
+ },
+ rule => get_standard_option(
+ 'pve-ha-rule-id',
+ {
+ completion => \&PVE::HA::Tools::complete_rule,
+ optional => 0,
+ },
+ ),
+ state => {
+ description => "State of the HA rule.",
+ type => 'string',
+ enum => [
+ 'enabled', 'disabled',
+ ],
+ default => 'enabled',
+ optional => 1,
+ },
+ comment => {
+ description => "HA rule description.",
+ type => 'string',
+ maxLength => 4096,
+ optional => 1,
+ },
+ },
+};
+
+sub private {
+ return $defaultData;
+}
+
+=head3 $class->decode_plugin_value(...)
+
+=head3 $class->decode_plugin_value($type, $key, $value)
+
+B<OPTIONAL:> Can be implemented in a I<rule plugin>.
+
+Called during the base plugin's C<decode_value()> in order to extend the
+deserialization for plugin-specific values which need it (e.g. lists).
+
+If it is not overridden by the I<rule plugin>, then it does nothing to
+C<$value> by default.
+
+=cut
+
+sub decode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ return $value;
+}
+
+sub decode_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'comment') {
+ return PVE::Tools::decode_text($value);
+ }
+
+ my $plugin = $class->lookup($type);
+ return $plugin->decode_plugin_value($type, $key, $value);
+}
+
+=head3 $class->encode_plugin_value(...)
+
+=head3 $class->encode_plugin_value($type, $key, $value)
+
+B<OPTIONAL:> Can be implemented in a I<rule plugin>.
+
+Called during the base plugin's C<encode_value()> in order to extend the
+serialization for plugin-specific values which need it (e.g. lists).
+
+If it is not overridden by the I<rule plugin>, then it does nothing to
+C<$value> by default.
+
+=cut
+
+sub encode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ return $value;
+}
+
+sub encode_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'comment') {
+ return PVE::Tools::encode_text($value);
+ }
+
+ my $plugin = $class->lookup($type);
+ return $plugin->encode_plugin_value($type, $key, $value);
+}
+
+sub parse_section_header {
+ my ($class, $line) = @_;
+
+ if ($line =~ m/^(\S+):\s*(\S+)\s*$/) {
+ my ($type, $ruleid) = (lc($1), $2);
+ my $errmsg = undef; # set if you want to skip whole section
+ eval { PVE::JSONSchema::pve_verify_configid($ruleid); };
+ $errmsg = $@ if $@;
+ my $config = {}; # to return additional attributes
+ return ($type, $ruleid, $errmsg, $config);
+ }
+ return undef;
+}
+
+# General rule helpers
+
+=head3 $class->set_rule_defaults($rule)
+
+Sets the optional properties in the C<$rule>, which have default values, but
+haven't been explicitly set yet.
+
+=cut
+
+sub set_rule_defaults : prototype($$) {
+ my ($class, $rule) = @_;
+
+ $rule->{state} = 'enabled' if !defined($rule->{state});
+
+ if (my $plugin = $class->lookup($rule->{type})) {
+ my $properties = $plugin->properties();
+
+ for my $prop (keys %$properties) {
+ next if defined($rule->{$prop});
+ next if !$properties->{$prop}->{default};
+ next if !$properties->{$prop}->{optional};
+
+ $rule->{$prop} = $properties->{$prop}->{default};
+ }
+ }
+}
+
+# Rule checks definition and methods
+
+my $types = [];
+my $checkdef;
+
+sub register {
+ my ($class) = @_;
+
+ $class->SUPER::register($class);
+
+ # store order in which plugin types are registered
+ push @$types, $class->type();
+}
+
+=head3 $class->register_check(...)
+
+=head3 $class->register_check($check_func, $collect_errors_func)
+
+Used to register rule checks for a rule plugin.
+
+=cut
+
+sub register_check : prototype($$$) {
+ my ($class, $check_func, $collect_errors_func) = @_;
+
+ my $type = eval { $class->type() };
+ $type = 'global' if $@; # check registered here in the base plugin
+
+ push @{ $checkdef->{$type} }, [
+ $check_func, $collect_errors_func,
+ ];
+}
+
+=head3 $class->get_plugin_check_arguments(...)
+
+=head3 $class->get_plugin_check_arguments($rules)
+
+B<OPTIONAL:> Can be implemented in the I<rule plugin>.
+
+Returns a hash of subsets of rules, which are passed to the plugin's
+C<L<< registered checks|/$class->register_check(...) >>> so that the
+creation of these can be shared between rule check implementations.
+
+=cut
+
+sub get_plugin_check_arguments : prototype($$) {
+ my ($class, $rules) = @_;
+
+ return {};
+}
+
+=head3 $class->get_check_arguments(...)
+
+=head3 $class->get_check_arguments($rules)
+
+Returns the union of the plugin's subsets of rules, which are passed to the
+plugin's C<L<< registered checks|/$class->register_check(...) >>> so that the
+creation of these can be shared between rule check implementations.
+
+=cut
+
+sub get_check_arguments : prototype($$) {
+ my ($class, $rules) = @_;
+
+ my $global_args = {};
+
+ for my $type (@$types) {
+ my $plugin = $class->lookup($type);
+ my $plugin_args = eval { $plugin->get_plugin_check_arguments($rules) };
+ next if $@; # plugin doesn't implement get_plugin_check_arguments(...)
+
+ $global_args = { $global_args->%*, $plugin_args->%* };
+ }
+
+ return $global_args;
+}
+
+=head3 $class->check_feasibility($rules)
+
+Checks whether the given C<$rules> are feasible by running all checks, which
+were registered with C<L<< register_check()|/$class->register_check(...) >>>,
+and returns a hash map of erroneous rules.
+
+The checks are run in the order in which the rule plugins were registered,
+while global checks, i.e. checks between different rule types, are run at the
+very end.
+
+=cut
+
+sub check_feasibility : prototype($$) {
+ my ($class, $rules) = @_;
+
+ my $global_errors = {};
+ my $removable_ruleids = [];
+
+ my $global_args = $class->get_check_arguments($rules);
+
+ for my $type (@$types, 'global') {
+ for my $entry (@{ $checkdef->{$type} }) {
+ my ($check, $collect_errors) = @$entry;
+
+ my $errors = $check->($global_args);
+ $collect_errors->($errors, $global_errors);
+ }
+ }
+
+ return $global_errors;
+}
+
+=head3 $class->plugin_canonicalize($rules)
+
+B<OPTIONAL:> Can be implemented in the I<rule plugin>.
+
+Modifies the C<$rules> to a plugin-specific canonical form.
+
+=cut
+
+sub plugin_canonicalize : prototype($$) {
+ my ($class, $rules) = @_;
+}
+
+=head3 $class->canonicalize($rules)
+
+Modifies C<$rules> to contain only feasible rules.
+
+This is done by running all checks, which were registered with
+C<L<< register_check()|/$class->register_check(...) >>> and removing any
+rule, which makes the rule set infeasible.
+
+Returns a list of messages with the reasons why rules were removed.
+
+=cut
+
+sub canonicalize : prototype($$) {
+ my ($class, $rules) = @_;
+
+ my $messages = [];
+ my $global_errors = $class->check_feasibility($rules);
+
+ for my $ruleid (keys %$global_errors) {
+ delete $rules->{ids}->{$ruleid};
+ delete $rules->{order}->{$ruleid};
+ }
+
+ for my $ruleid (sort keys %$global_errors) {
+ for my $opt (sort keys %{ $global_errors->{$ruleid} }) {
+ for my $message (@{ $global_errors->{$ruleid}->{$opt} }) {
+ push @$messages, "Drop rule '$ruleid', because $message.\n";
+ }
+ }
+ }
+
+ for my $type (@$types) {
+ my $plugin = $class->lookup($type);
+ eval { $plugin->plugin_canonicalize($rules) };
+ next if $@; # plugin doesn't implement plugin_canonicalize(...)
+ }
+
+ return $messages;
+}
+
+=head1 FUNCTIONS
+
+=cut
+
+=head3 foreach_rule(...)
+
+=head3 foreach_rule($rules, $func [, $opts])
+
+Filters the given C<$rules> according to the C<$opts> and loops over the
+resulting rules in the order defined in the section config, executing
+C<$func> with the parameters C<L<< ($rule, $ruleid) >>>.
+
+The filter properties for C<$opts> are:
+
+=over
+
+=item C<$type>
+
+Limits C<$rules> to those which are of rule type C<$type>.
+
+=item C<$state>
+
+Limits C<$rules> to those which are in the rule state C<$state>.
+
+=back
+
+=cut
+
+sub foreach_rule : prototype($$;$) {
+ my ($rules, $func, $opts) = @_;
+
+ my $type = $opts->{type};
+ my $state = $opts->{state};
+
+ my @ruleids = sort {
+ $rules->{order}->{$a} <=> $rules->{order}->{$b}
+ } keys %{ $rules->{ids} };
+
+ for my $ruleid (@ruleids) {
+ my $rule = $rules->{ids}->{$ruleid};
+
+ next if !$rule; # skip invalid rules
+ next if defined($type) && $rule->{type} ne $type;
+
+ # rules are enabled by default
+ my $rule_state = defined($rule->{state}) ? $rule->{state} : 'enabled';
+
+ next if defined($state) && $rule_state ne $state;
+
+ $func->($rule, $ruleid);
+ }
+}
+
+1;
diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
index a01ac38..767659f 100644
--- a/src/PVE/HA/Tools.pm
+++ b/src/PVE/HA/Tools.pm
@@ -112,6 +112,15 @@ PVE::JSONSchema::register_standard_option(
},
);
+PVE::JSONSchema::register_standard_option(
+ 'pve-ha-rule-id',
+ {
+ description => "HA rule identifier.",
+ type => 'string',
+ format => 'pve-configid',
+ },
+);
+
sub read_json_from_file {
my ($filename, $default) = @_;
@@ -292,4 +301,17 @@ sub complete_group {
return $res;
}
+sub complete_rule {
+ my ($cmd, $pname, $cur) = @_;
+
+ my $cfg = PVE::HA::Config::read_rules_config();
+
+ my $res = [];
+ foreach my $rule (keys %{ $cfg->{ids} }) {
+ push @$res, $rule;
+ }
+
+ return $res;
+}
+
1;
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [PATCH ha-manager v2 03/26] introduce rules base plugin
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 03/26] introduce rules base plugin Daniel Kral
@ 2025-07-04 14:18 ` Michael Köppl
0 siblings, 0 replies; 70+ messages in thread
From: Michael Köppl @ 2025-07-04 14:18 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Just noted a couple of typos inline
On 6/20/25 16:31, Daniel Kral wrote:
> Add a rules base plugin to allow users to specify different kinds of HA
> rules in a single configuration file, which put constraints on the HA
> Manager's behavior.
>
> Rule checkers can be registered for every plugin with the
> register_check(...) method and are used for checking the feasibility of
> the rule set with check_feasibility(...), and creating a feasible rule
> set with canonicalize(...).
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes since v1:
> - added documentation
> - added state property to enable/disable rules (and allow a
> tri-state 'contradictory' state for the API endpoint there)
> - replace `{encode,decode}_value` plugin calls to
> `{encode,decode}_plugin_value` with default implementation to make
> these implementations optional for rule plugins
> - introduce `set_rule_defaults` to set optional fields with defaults
> in later patches
> - renamed `are_satisfiable` to `check_feasibility`
> - factored `canonicalize` and `checked_config` to only
> `canonicalize`
> - refactored the feasibility checks to be registered as individual
> checkers with `register_check`
> - refactored the canonicalization for plugins to be a single call to
> plugin_canonicalize (if implemented)
> - added rule auto completion
>
> debian/pve-ha-manager.install | 1 +
> src/PVE/HA/Makefile | 2 +-
> src/PVE/HA/Rules.pm | 445 ++++++++++++++++++++++++++++++++++
> src/PVE/HA/Tools.pm | 22 ++
> 4 files changed, 469 insertions(+), 1 deletion(-)
> create mode 100644 src/PVE/HA/Rules.pm
>
> diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
> index 0ffbd8d..9bbd375 100644
> --- a/debian/pve-ha-manager.install
> +++ b/debian/pve-ha-manager.install
> @@ -32,6 +32,7 @@
> /usr/share/perl5/PVE/HA/Resources.pm
> /usr/share/perl5/PVE/HA/Resources/PVECT.pm
> /usr/share/perl5/PVE/HA/Resources/PVEVM.pm
> +/usr/share/perl5/PVE/HA/Rules.pm
> /usr/share/perl5/PVE/HA/Tools.pm
> /usr/share/perl5/PVE/HA/Usage.pm
> /usr/share/perl5/PVE/HA/Usage/Basic.pm
> diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
> index 8c91b97..489cbc0 100644
> --- a/src/PVE/HA/Makefile
> +++ b/src/PVE/HA/Makefile
> @@ -1,4 +1,4 @@
> -SIM_SOURCES=CRM.pm Env.pm Groups.pm Resources.pm LRM.pm Manager.pm \
> +SIM_SOURCES=CRM.pm Env.pm Groups.pm Rules.pm Resources.pm LRM.pm Manager.pm \
> NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm Usage.pm
>
> SOURCES=${SIM_SOURCES} Config.pm
> diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
> new file mode 100644
> index 0000000..e1c84bc
> --- /dev/null
> +++ b/src/PVE/HA/Rules.pm
> @@ -0,0 +1,445 @@
> +package PVE::HA::Rules;
> +
> +use strict;
> +use warnings;
> +
> +use PVE::JSONSchema qw(get_standard_option);
> +use PVE::Tools;
> +
> +use PVE::HA::Tools;
> +
> +use base qw(PVE::SectionConfig);
> +
> +=head1 NAME
> +
> +PVE::HA::Rules - Base Plugin for HA Rules
> +
> +=head1 SYNOPSIS
> +
> + use base qw(PVE::HA::Rules);
> +
> +=head1 DESCRIPTION
> +
> +This package provides the capability to have different types of rules in the
> +same config file, which put constraints or other rules that the HA Manager must
> +or should follow.
> +
> +Since rules can interfere with each other, i.e., rules can make other rules
> +invalid or infeasible, this package also provides the capability to check for
> +the feasibility of the rules within a rule plugin and between different rule
> +plugins and also prune the rule set in such a way, that becomes feasible again.
> +
> +This packages inherits its config-related methods from C<L<PVE::SectionConfig>>
> +and therefore rule plugins need to implement methods from there as well.
> +
> +=head1 USAGE
> +
> +Each I<rule plugin> is required to implement the methods C<L<type()>>,
> +C<L<properties()>>, and C<L<options>> from the C<L<PVE::SectionConfig>> to
> +extend the properties of this I<base plugin> with plugin-specific properties.
> +
> +=head2 REGISTERING CHECKS
> +
> +In order to C<L<< register|/$class->register_check(...) >>> checks for a rule
> +plugin, the plugin can override the
> +C<L<< get_plugin_check_arguments()|/$class->get_plugin_check_arguments(...) >>>
> +method, which allows the plugin's checkers to pass plugin-specific subsets of
> +specific rules, which are relevant to the checks.
> +
> +The following example shows a plugin's implementation of its
> +C<L<< get_plugin_check_arguments()|/$class->get_plugin_check_arguments(...) >>>
> +and a trivial check, which will render all rules defining a comment erroneous,
> +and blames these errors on the I<comment> property:
> +
> + sub get_plugin_check_arguments {
> + my ($class, $rules) = @_;
> +
> + my @ruleids = sort {
> + $rules->{order}->{$a} <=> $rules->{order}->{$b}
> + } keys %{$rules->{ids}};
> +
> + my $result = {
> + custom_rules => {},
> + };
> +
> + for my $ruleid (@ruleids) {
> + my $rule = $rules->{ids}->{$ruleid};
> +
> + $result->{custom_rules}->{$ruleid} = $rule if defined($rule->{comment});
> + }
> +
> + return $result;
> + }
> +
> + __PACKAGE__->register_check(
> + sub {
> + my ($args) = @_;
> +
> + return [ sort keys $args->{custom_rules}->%* ];
> + },
> + sub {
> + my ($ruleids, $errors) = @_;
> +
> + for my $ruleid (@$ruleids) {
> + push @{$errors->{$ruleid}->{comment}},
> + "rule is ineffective, because I said so.";
> + }
> + }
> + );
> +
> +=head1 METHODS
> +
> +=cut
> +
> +my $defaultData = {
> + propertyList => {
> + type => {
> + description => "HA rule type.",
> + },
> + rule => get_standard_option(
> + 'pve-ha-rule-id',
> + {
> + completion => \&PVE::HA::Tools::complete_rule,
> + optional => 0,
> + },
> + ),
> + state => {
> + description => "State of the HA rule.",
> + type => 'string',
> + enum => [
> + 'enabled', 'disabled',
> + ],
> + default => 'enabled',
> + optional => 1,
> + },
> + comment => {
> + description => "HA rule description.",
> + type => 'string',
> + maxLength => 4096,
> + optional => 1,
> + },
> + },
> +};
> +
> +sub private {
> + return $defaultData;
> +}
> +
> +=head3 $class->decode_plugin_value(...)
> +
> +=head3 $class->decode_plugin_value($type, $key, $value)
> +
> +B<OPTIONAL:> Can be implemented in a I<rule plugin>.
> +
> +Called during base plugin's C<decode_value()> in order to extend the
> +deserialization for plugin-specific values which need it (e.g. lists).
> +
> +If it is not overrridden by the I<rule plugin>, then it does nothing to
> +C<$value> by default.
> +
> +=cut
> +
> +sub decode_plugin_value {
> + my ($class, $type, $key, $value) = @_;
> +
> + return $value;
> +}
> +
> +sub decode_value {
> + my ($class, $type, $key, $value) = @_;
> +
> + if ($key eq 'comment') {
> + return PVE::Tools::decode_text($value);
> + }
> +
> + my $plugin = $class->lookup($type);
> + return $plugin->decode_plugin_value($type, $key, $value);
> +}
> +
> +=head3 $class->encode_plugin_value(...)
> +
> +=head3 $class->encode_plugin_value($type, $key, $value)
> +
> +B<OPTIONAL:> Can be implemented in a I<rule plugin>.
> +
> +Called during base plugin's C<encode_value()> in order to extend the
> +serialization for plugin-specific values which need it (e.g. lists).
> +
> +If it is not overrridden by the I<rule plugin>, then it does nothing to
nit: there's an extra r
> +C<$value> by default.
> +
> +=cut
> +
> +sub encode_plugin_value {
> + my ($class, $type, $key, $value) = @_;
> +
> + return $value;
> +}
> +
> +sub encode_value {
> + my ($class, $type, $key, $value) = @_;
> +
> + if ($key eq 'comment') {
> + return PVE::Tools::encode_text($value);
> + }
> +
> + my $plugin = $class->lookup($type);
> + return $plugin->encode_plugin_value($type, $key, $value);
> +}
> +
> +sub parse_section_header {
> + my ($class, $line) = @_;
> +
> + if ($line =~ m/^(\S+):\s*(\S+)\s*$/) {
> + my ($type, $ruleid) = (lc($1), $2);
> + my $errmsg = undef; # set if you want to skip whole section
> + eval { PVE::JSONSchema::pve_verify_configid($ruleid); };
> + $errmsg = $@ if $@;
> + my $config = {}; # to return additional attributes
> + return ($type, $ruleid, $errmsg, $config);
> + }
> + return undef;
> +}
> +
> +# General rule helpers
> +
> +=head3 $class->set_rule_defaults($rule)
> +
> +Sets the optional properties in the C<$rule>, which have default values, but
> +haven't been explicitly set yet.
> +
> +=cut
> +
> +sub set_rule_defaults : prototype($$) {
> + my ($class, $rule) = @_;
> +
> + $rule->{state} = 'enabled' if !defined($rule->{state});
> +
> + if (my $plugin = $class->lookup($rule->{type})) {
> + my $properties = $plugin->properties();
> +
> + for my $prop (keys %$properties) {
> + next if defined($rule->{$prop});
> + next if !$properties->{$prop}->{default};
> + next if !$properties->{$prop}->{optional};
> +
> + $rule->{$prop} = $properties->{$prop}->{default};
> + }
> + }
> +}
> +
> +# Rule checks definition and methods
> +
> +my $types = [];
> +my $checkdef;
> +
> +sub register {
> + my ($class) = @_;
> +
> + $class->SUPER::register($class);
> +
> + # store order in which plugin types are registered
> + push @$types, $class->type();
> +}
> +
> +=head3 $class->register_check(...)
> +
> +=head3 $class->register_check($check_func, $collect_errors_func)
> +
> +Used to register rule checks for a rule plugin.
> +
> +=cut
> +
> +sub register_check : prototype($$$) {
> + my ($class, $check_func, $collect_errors_func) = @_;
> +
> + my $type = eval { $class->type() };
> + $type = 'global' if $@; # check registered here in the base plugin
> +
> + push @{ $checkdef->{$type} }, [
> + $check_func, $collect_errors_func,
> + ];
> +}
> +
> +=head3 $class->get_plugin_check_arguments(...)
> +
> +=head3 $class->get_plugin_check_arguments($rules)
> +
> +B<OPTIONAL:> Can be implemented in the I<rule plugin>.
> +
> +Returns a hash of subsets of rules, which are passed to the plugin's
> +C<L<< registered checks|/$class->register_check(...) >>> so that the
> +creation of these can be shared inbetween rule check implementations.
> +
> +=cut
> +
> +sub get_plugin_check_arguments : prototype($$) {
> + my ($class, $rules) = @_;
> +
> + return {};
> +}
> +
> +=head3 $class->get_check_arguments(...)
> +
> +=head3 $class->get_check_arguments($rules)
> +
> +Returns the union of the plugin's subsets of rules, which are passed to the
> +plugin's C<L<< registered checks|/$class->register_check(...) >>> so that the
> +creation of these can be shared inbetween rule check implementations.
> +
> +=cut
> +
> +sub get_check_arguments : prototype($$) {
> + my ($class, $rules) = @_;
> +
> + my $global_args = {};
> +
> + for my $type (@$types) {
> + my $plugin = $class->lookup($type);
> + my $plugin_args = eval { $plugin->get_plugin_check_arguments($rules) };
> + next if $@; # plugin doesn't implement get_plugin_check_arguments(...)
> +
> + $global_args = { $global_args->%*, $plugin_args->%* };
> + }
> +
> + return $global_args;
> +}
> +
> +=head3 $class->check_feasibility($rules)
> +
> +Checks whether the given C<$rules> are feasible by running all checks, which
> +were registered with C<L<< register_check()|/$class->register_check(...) >>>,
> +and returns a hash map of errorneous rules.
nit: s/errorneous/erroneous
> +
> +The checks are run in the order in which the rule plugins were registered,
> +while global checks, i.e. checks between different rule types, are run at the
> +very last.
> +
> +=cut
> +
> +sub check_feasibility : prototype($$) {
> + my ($class, $rules) = @_;
> +
> + my $global_errors = {};
> + my $removable_ruleids = [];
> +
> + my $global_args = $class->get_check_arguments($rules);
> +
> + for my $type (@$types, 'global') {
> + for my $entry (@{ $checkdef->{$type} }) {
> + my ($check, $collect_errors) = @$entry;
> +
> + my $errors = $check->($global_args);
> + $collect_errors->($errors, $global_errors);
> + }
> + }
> +
> + return $global_errors;
> +}
> +
> +=head3 $class->plugin_canonicalize($rules)
> +
> +B<OPTIONAL:> Can be implemented in the I<rule plugin>.
> +
> +Modifies the C<$rules> to a plugin-specific canonical form.
> +
> +=cut
> +
> +sub plugin_canonicalize : prototype($$) {
> + my ($class, $rules) = @_;
> +}
> +
> +=head3 $class->canonicalize($rules)
> +
> +Modifies C<$rules> to contain only feasible rules.
> +
> +This is done by running all checks, which were registered with
> +C<L<< register_check()|/$class->register_check(...) >>> and removing any
> +rule, which makes the rule set infeasible.
> +
> +Returns a list of messages with the reasons why rules were removed.
> +
> +=cut
> +
> +sub canonicalize : prototype($$) {
> + my ($class, $rules) = @_;
> +
> + my $messages = [];
> + my $global_errors = $class->check_feasibility($rules);
> +
> + for my $ruleid (keys %$global_errors) {
> + delete $rules->{ids}->{$ruleid};
> + delete $rules->{order}->{$ruleid};
> + }
> +
> + for my $ruleid (sort keys %$global_errors) {
> + for my $opt (sort keys %{ $global_errors->{$ruleid} }) {
> + for my $message (@{ $global_errors->{$ruleid}->{$opt} }) {
> + push @$messages, "Drop rule '$ruleid', because $message.\n";
> + }
> + }
> + }
> +
> + for my $type (@$types) {
> + my $plugin = $class->lookup($type);
> + eval { $plugin->plugin_canonicalize($rules) };
> + next if $@; # plugin doesn't implement plugin_canonicalize(...)
> + }
> +
> + return $messages;
> +}
> +
> +=head1 FUNCTIONS
> +
> +=cut
> +
> +=head3 foreach_rule(...)
> +
> +=head3 foreach_rule($rules, $func [, $opts])
> +
> +Filters the given C<$rules> according to the C<$opts> and loops over the
> +resulting rules in the order as defined in the section config and executes
> +C<$func> with the parameters C<L<< ($rule, $ruleid) >>>.
> +
> +The filter properties for C<$opts> are:
> +
> +=over
> +
> +=item C<$type>
> +
> +Limits C<$rules> to those which are of rule type C<$type>.
> +
> +=item C<$state>
> +
> +Limits C<$rules> to those which are in the rule state C<$state>.
> +
> +=back
> +
> +=cut
> +
> +sub foreach_rule : prototype($$;$) {
> + my ($rules, $func, $opts) = @_;
> +
> + my $type = $opts->{type};
> + my $state = $opts->{state};
> +
> + my @ruleids = sort {
> + $rules->{order}->{$a} <=> $rules->{order}->{$b}
> + } keys %{ $rules->{ids} };
> +
> + for my $ruleid (@ruleids) {
> + my $rule = $rules->{ids}->{$ruleid};
> +
> + next if !$rule; # skip invalid rules
> + next if defined($type) && $rule->{type} ne $type;
> +
> + # rules are enabled by default
> + my $rule_state = defined($rule->{state}) ? $rule->{state} : 'enabled';
> +
> + next if defined($state) && $rule_state ne $state;
> +
> + $func->($rule, $ruleid);
> + }
> +}
> +
> +1;
> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
> index a01ac38..767659f 100644
> --- a/src/PVE/HA/Tools.pm
> +++ b/src/PVE/HA/Tools.pm
> @@ -112,6 +112,15 @@ PVE::JSONSchema::register_standard_option(
> },
> );
>
> +PVE::JSONSchema::register_standard_option(
> + 'pve-ha-rule-id',
> + {
> + description => "HA rule identifier.",
> + type => 'string',
> + format => 'pve-configid',
> + },
> +);
> +
> sub read_json_from_file {
> my ($filename, $default) = @_;
>
> @@ -292,4 +301,17 @@ sub complete_group {
> return $res;
> }
>
> +sub complete_rule {
> + my ($cmd, $pname, $cur) = @_;
> +
> + my $cfg = PVE::HA::Config::read_rules_config();
> +
> + my $res = [];
> + foreach my $rule (keys %{ $cfg->{ids} }) {
> + push @$res, $rule;
> + }
> +
> + return $res;
> +}
> +
> 1;
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 04/26] rules: introduce location rule plugin
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (6 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 03/26] introduce rules base plugin Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 16:17 ` Jillian Morgan
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 05/26] rules: introduce colocation " Daniel Kral
` (34 subsequent siblings)
42 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add the location rule plugin to allow users to specify node affinity
constraints for independent services.
Location rules must specify one or more services, one or more nodes with
optional priorities (the default is 0), and a strictness, which is
either
* 0 (loose): services MUST be located on one of the rules' nodes, or
* 1 (strict): services SHOULD be located on one of the rules' nodes.
The initial implementation restricts location rules so that a service may
only be specified in a single location rule; otherwise these location rules
will not be applied.
This makes location rules structurally equivalent to HA groups with the
exception of the "failback" option, which will be moved to the service
config in an upcoming patch.
The services property is added to the rules base plugin as it will also
be used by the colocation rule plugin in the next patch.
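For illustration, a location rule in the new rules config could look roughly
like the following sketch (rule name, service and node names are placeholders;
the exact on-disk syntax is assumed to follow the section-config format parsed
by the base plugin and the encode/decode helpers below):

    location: vm100-prefers-node1
        services vm:100
        nodes node1:2,node2:1,node3
        strict 0

Assuming, as with HA groups, that higher-numbered priorities are preferred,
vm:100 would prefer node1 over node2 over node3 (no priority means 0), and
since the rule is not strict, it may still run elsewhere if none of the
listed nodes are available.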
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Makefile | 1 +
src/PVE/HA/Rules.pm | 31 ++++-
src/PVE/HA/Rules/Location.pm | 206 ++++++++++++++++++++++++++++++++++
src/PVE/HA/Rules/Makefile | 6 +
src/PVE/HA/Tools.pm | 24 ++++
6 files changed, 267 insertions(+), 2 deletions(-)
create mode 100644 src/PVE/HA/Rules/Location.pm
create mode 100644 src/PVE/HA/Rules/Makefile
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 9bbd375..2835492 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -33,6 +33,7 @@
/usr/share/perl5/PVE/HA/Resources/PVECT.pm
/usr/share/perl5/PVE/HA/Resources/PVEVM.pm
/usr/share/perl5/PVE/HA/Rules.pm
+/usr/share/perl5/PVE/HA/Rules/Location.pm
/usr/share/perl5/PVE/HA/Tools.pm
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index 489cbc0..e386cbf 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -8,6 +8,7 @@ install:
install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA
for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/$$i; done
make -C Resources install
+ make -C Rules install
make -C Usage install
make -C Env install
diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
index e1c84bc..4134283 100644
--- a/src/PVE/HA/Rules.pm
+++ b/src/PVE/HA/Rules.pm
@@ -112,6 +112,13 @@ my $defaultData = {
default => 'enabled',
optional => 1,
},
+ services => get_standard_option(
+ 'pve-ha-resource-id-list',
+ {
+ completion => \&PVE::HA::Tools::complete_sid,
+ optional => 0,
+ },
+ ),
comment => {
description => "HA rule description.",
type => 'string',
@@ -148,7 +155,17 @@ sub decode_plugin_value {
sub decode_value {
my ($class, $type, $key, $value) = @_;
- if ($key eq 'comment') {
+ if ($key eq 'services') {
+ my $res = {};
+
+ for my $service (PVE::Tools::split_list($value)) {
+ if (PVE::HA::Tools::pve_verify_ha_resource_id($service)) {
+ $res->{$service} = 1;
+ }
+ }
+
+ return $res;
+ } elsif ($key eq 'comment') {
return PVE::Tools::decode_text($value);
}
@@ -179,7 +196,11 @@ sub encode_plugin_value {
sub encode_value {
my ($class, $type, $key, $value) = @_;
- if ($key eq 'comment') {
+ if ($key eq 'services') {
+ PVE::HA::Tools::pve_verify_ha_resource_id($_) for keys %$value;
+
+ return join(',', sort keys %$value);
+ } elsif ($key eq 'comment') {
return PVE::Tools::encode_text($value);
}
@@ -405,6 +426,10 @@ The filter properties for C<$opts> are:
=over
+=item C<$sid>
+
+Limits C<$rules> to those which contain the given service C<$sid>.
+
=item C<$type>
Limits C<$rules> to those which are of rule type C<$type>.
@@ -420,6 +445,7 @@ Limits C<$rules> to those which are in the rule state C<$state>.
sub foreach_rule : prototype($$;$) {
my ($rules, $func, $opts) = @_;
+ my $sid = $opts->{sid};
my $type = $opts->{type};
my $state = $opts->{state};
@@ -432,6 +458,7 @@ sub foreach_rule : prototype($$;$) {
next if !$rule; # skip invalid rules
next if defined($type) && $rule->{type} ne $type;
+ next if defined($sid) && !defined($rule->{services}->{$sid});
# rules are enabled by default
my $rule_state = defined($rule->{state}) ? $rule->{state} : 'enabled';
diff --git a/src/PVE/HA/Rules/Location.pm b/src/PVE/HA/Rules/Location.pm
new file mode 100644
index 0000000..67f0b32
--- /dev/null
+++ b/src/PVE/HA/Rules/Location.pm
@@ -0,0 +1,206 @@
+package PVE::HA::Rules::Location;
+
+use strict;
+use warnings;
+
+use Storable qw(dclone);
+
+use PVE::Cluster;
+use PVE::JSONSchema qw(get_standard_option);
+use PVE::Tools;
+
+use PVE::HA::Rules;
+use PVE::HA::Tools;
+
+use base qw(PVE::HA::Rules);
+
+=head1 NAME
+
+PVE::HA::Rules::Location
+
+=head1 DESCRIPTION
+
+This package provides the capability to specify and apply location rules, which
+put affinity constraints between a set of HA services and a set of nodes.
+
+Location rules can be either C<'loose'> or C<'strict'>:
+
+=over
+
+=item C<'loose'>
+
+Loose location rules SHOULD be applied if possible, i.e., HA services SHOULD
+prefer to be on the defined nodes, but may fall back to other non-defined nodes,
+if none of the defined nodes are available.
+
+=item C<'strict'>
+
+Strict location rules MUST be applied, i.e., HA services MUST be kept on the
+defined nodes. In other words, these HA services are restricted to the
+defined nodes and may not run on any other non-defined node.
+
+=back
+
+=cut
+
+sub type {
+ return 'location';
+}
+
+sub properties {
+ return {
+ nodes => get_standard_option(
+ 'pve-ha-group-node-list',
+ {
+ completion => \&PVE::Cluster::get_nodelist,
+ optional => 0,
+ },
+ ),
+ strict => {
+ description => "Describes whether the location rule is mandatory or optional.",
+ verbose_description =>
+ "Describes whether the location rule is mandatory or optional."
+ . "\nA mandatory location rule makes services be restricted to the defined nodes."
+ . " If none of the nodes are available, the service will be stopped."
+ . "\nAn optional location rule makes services prefer to be on the defined nodes."
+ . " If none of the nodes are available, the service may run on any other node.",
+ type => 'boolean',
+ optional => 1,
+ default => 0,
+ },
+ };
+}
+
+sub options {
+ return {
+ services => { optional => 0 },
+ nodes => { optional => 0 },
+ strict => { optional => 1 },
+ state => { optional => 1 },
+ comment => { optional => 1 },
+ };
+}
+
+sub decode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'nodes') {
+ my $res = {};
+
+ for my $node (PVE::Tools::split_list($value)) {
+ if (my ($node, $priority) = PVE::HA::Tools::parse_node_priority($node, 1)) {
+ $res->{$node} = {
+ priority => $priority,
+ };
+ }
+ }
+
+ return $res;
+ }
+
+ return $value;
+}
+
+sub encode_plugin_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'nodes') {
+ my $res = [];
+
+ for my $node (sort keys %$value) {
+ my $priority = $value->{$node}->{priority};
+
+ if ($priority) {
+ push @$res, "$node:$priority";
+ } else {
+ push @$res, "$node";
+ }
+ }
+
+ return join(',', @$res);
+ }
+
+ return $value;
+}
+
+sub get_plugin_check_arguments {
+ my ($self, $rules) = @_;
+
+ my $result = {
+ location_rules => {},
+ };
+
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule, $ruleid) = @_;
+
+ $result->{location_rules}->{$ruleid} = $rule;
+ },
+ {
+ type => 'location',
+ state => 'enabled',
+ },
+ );
+
+ return $result;
+}
+
+=head1 LOCATION RULE CHECKERS
+
+=cut
+
+=head3 check_singular_service_location_rule($location_rules)
+
+Returns a list of pairs, each consisting of a location rule id defined in
+C<$location_rules> and a service id, for every service that is defined in
+more than one location rule.
+
+If there are none, the returned list is empty.
+
+=cut
+
+sub check_singular_service_location_rule {
+ my ($location_rules) = @_;
+
+ my @conflicts = ();
+ my $located_services = {};
+
+ while (my ($ruleid, $rule) = each %$location_rules) {
+ for my $sid (keys %{ $rule->{services} }) {
+ push @{ $located_services->{$sid} }, $ruleid;
+ }
+ }
+
+ for my $sid (keys %$located_services) {
+ my $ruleids = $located_services->{$sid};
+
+ next if @$ruleids < 2;
+ for my $ruleid (@$ruleids) {
+ push @conflicts, [$ruleid, $sid];
+ }
+ }
+
+ @conflicts = sort { $a->[0] cmp $b->[0] } @conflicts;
+ return \@conflicts;
+}
+
+__PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return check_singular_service_location_rule($args->{location_rules});
+ },
+ sub {
+ my ($conflicts, $errors) = @_;
+
+ for my $conflict (@$conflicts) {
+ my ($ruleid, $sid) = @$conflict;
+
+ push @{ $errors->{$ruleid}->{services} },
+ "service '$sid' is already used in another location rule";
+ }
+ },
+);
+
+1;
diff --git a/src/PVE/HA/Rules/Makefile b/src/PVE/HA/Rules/Makefile
new file mode 100644
index 0000000..e5cf737
--- /dev/null
+++ b/src/PVE/HA/Rules/Makefile
@@ -0,0 +1,6 @@
+SOURCES=Location.pm
+
+.PHONY: install
+install:
+ install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA/Rules
+ for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/Rules/$$i; done
diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
index 767659f..549cbe1 100644
--- a/src/PVE/HA/Tools.pm
+++ b/src/PVE/HA/Tools.pm
@@ -51,6 +51,18 @@ PVE::JSONSchema::register_standard_option(
},
);
+PVE::JSONSchema::register_standard_option(
+ 'pve-ha-resource-id-list',
+ {
+ description =>
+ "List of HA resource IDs. This consists of a list of resource types followed"
+ . " by a resource specific name separated with a colon (example: vm:100,ct:101).",
+ typetext => "<type>:<name>{,<type>:<name>}*",
+ type => 'string',
+ format => 'pve-ha-resource-id-list',
+ },
+);
+
PVE::JSONSchema::register_format('pve-ha-resource-or-vm-id', \&pve_verify_ha_resource_or_vm_id);
sub pve_verify_ha_resource_or_vm_id {
@@ -103,6 +115,18 @@ PVE::JSONSchema::register_standard_option(
},
);
+sub parse_node_priority {
+ my ($value, $noerr) = @_;
+
+ if ($value =~ m/^([a-zA-Z0-9]([a-zA-Z0-9\-]*[a-zA-Z0-9])?)(:(\d+))?$/) {
+ # node without priority set defaults to priority 0
+ return ($1, int($4 // 0));
+ }
+
+ return undef if $noerr;
+ die "unable to parse HA node entry '$value'\n";
+}
+
PVE::JSONSchema::register_standard_option(
'pve-ha-group-id',
{
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [PATCH ha-manager v2 04/26] rules: introduce location rule plugin
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 04/26] rules: introduce location rule plugin Daniel Kral
@ 2025-06-20 16:17 ` Jillian Morgan
2025-06-20 16:30 ` Daniel Kral
0 siblings, 1 reply; 70+ messages in thread
From: Jillian Morgan @ 2025-06-20 16:17 UTC (permalink / raw)
To: Proxmox VE development discussion
On Fri, Jun 20, 2025 at 10:32 AM Daniel Kral <d.kral@proxmox.com> wrote:
> Add the location rule plugin to allow users to specify node affinity
> constraints for independent services.
>
> Location rules must specify one or more services, one or more node with
> optional priorities (the default is 0), and a strictness, which is
> either
>
> * 0 (loose): services MUST be located on one of the rules' nodes, or
> * 1 (strict): services SHOULD be located on one of the rules' nodes.
>
Shouldn't these be the other way around (0/"loose" = SHOULD, and 1/"strict"
= MUST)? The code would seem to bear that out, so it's only your
description here that was backwards, but still..
--
Jillian Morgan (she/her)
Systems & Networking Specialist
Primordial Software Group & I.T. Consultancy
https://www.primordial.ca
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [PATCH ha-manager v2 04/26] rules: introduce location rule plugin
2025-06-20 16:17 ` Jillian Morgan
@ 2025-06-20 16:30 ` Daniel Kral
0 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 16:30 UTC (permalink / raw)
To: Proxmox VE development discussion, Jillian Morgan
On 6/20/25 18:17, Jillian Morgan wrote:
> On Fri, Jun 20, 2025 at 10:32 AM Daniel Kral <d.kral@proxmox.com> wrote:
>
>> Add the location rule plugin to allow users to specify node affinity
>> constraints for independent services.
>>
>> Location rules must specify one or more services, one or more node with
>> optional priorities (the default is 0), and a strictness, which is
>> either
>>
>> * 0 (loose): services MUST be located on one of the rules' nodes, or
>> * 1 (strict): services SHOULD be located on one of the rules' nodes.
>>
>
> Shouldn't these be the other way around (0/"loose" = SHOULD, and 1/"strict"
> = MUST)? The code would seem to bear that out, so it's only your
> description here that was backwards, but still..
>
Yes, thanks for pointing that out, it should be the other way around,
will fix that for the follow-up. Otherwise, it is implemented as
0/"loose" = SHOULD, and 1/"strict" = MUST of course, as documented in
the module documentation in the patch below.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 05/26] rules: introduce colocation rule plugin
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (7 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 04/26] rules: introduce location rule plugin Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 06/26] rules: add global checks between location and colocation rules Daniel Kral
` (33 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add the colocation rule plugin to allow users to specify inter-service
affinity constraints. Colocation rules must specify two or more services
and a colocation affinity. The inter-service affinity of colocation
rules must be either
* together (positive): keeping services together, or
* separate (negative): keeping services separate.
The initial implementation requires colocation rules to specify at least
two services, as they are ineffective otherwise, and disallows specifying
the same two or more services in both a positive and a negative colocation
rule, as that is an infeasible rule set.
Furthermore, positive colocation rules whose service sets overlap are
handled as a single positive colocation rule to make it easier to
retrieve the positively colocated services of a service in later
patches.
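For illustration, colocation rules could look roughly like the following
sketch (rule and service names are placeholders; the syntax is assumed to
follow the same section-config format as the other rule types):

    colocation: keep-db-and-app-together
        services vm:201,vm:202
        affinity together

    colocation: keep-routers-separate
        services vm:301,vm:302,vm:303
        affinity separate

The first rule asks the HA Manager to place vm:201 and vm:202 on the same
node, the second to spread vm:301, vm:302 and vm:303 across different nodes.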
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- added documentation
- dropped non-strict colocations
- replaced `foreach_colocation_rule` with `foreach_rule`
- removed `split_colocation_rules` and moved that logic mainly in
`get_plugin_check_arguments(...)`
- use new `register_check(...)` helper instead of implementing
`check_feasibility(...)` here
- renamed `check_service_count` to `check_colocation_service_count`
- replaced `check_positive_transitivity(...)` with
`$find_disjoint_colocation_rules` and
`merge_connected_positive_colocation_rules` canonicalization
helpers (which is more accurate to find mergeable positive
colocation rules and is now also not called during the feasibility
check but only when calling canonicalize)
- renamed `check_inner_consistency` to
`check_inter_colocation_consistency`
- renamed some variables in checkers to make them clearer
- move `check_{positive,negative}_group_consistency` checks to next
patch which introduces the global checks to the base plugin
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Rules/Colocation.pm | 287 +++++++++++++++++++++++++++++++++
src/PVE/HA/Rules/Makefile | 2 +-
3 files changed, 289 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Rules/Colocation.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 2835492..e83c0de 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -33,6 +33,7 @@
/usr/share/perl5/PVE/HA/Resources/PVECT.pm
/usr/share/perl5/PVE/HA/Resources/PVEVM.pm
/usr/share/perl5/PVE/HA/Rules.pm
+/usr/share/perl5/PVE/HA/Rules/Colocation.pm
/usr/share/perl5/PVE/HA/Rules/Location.pm
/usr/share/perl5/PVE/HA/Tools.pm
/usr/share/perl5/PVE/HA/Usage.pm
diff --git a/src/PVE/HA/Rules/Colocation.pm b/src/PVE/HA/Rules/Colocation.pm
new file mode 100644
index 0000000..0539eb3
--- /dev/null
+++ b/src/PVE/HA/Rules/Colocation.pm
@@ -0,0 +1,287 @@
+package PVE::HA::Rules::Colocation;
+
+use strict;
+use warnings;
+
+use PVE::HashTools;
+
+use PVE::HA::Rules;
+
+use base qw(PVE::HA::Rules);
+
+=head1 NAME
+
+PVE::HA::Rules::Colocation - Colocation Plugin for HA Rules
+
+=head1 DESCRIPTION
+
+This package provides the capability to specify and apply colocation rules,
+which put affinity constraints between the HA services. A colocation rule has
+one of the two types: positive (C<'together'>) or negative (C<'separate'>).
+
+Positive colocations specify that HA services need to be kept together, while
+negative colocations specify that HA services need to be kept separate.
+
+Colocation rules MUST be applied. That is, if an HA service cannot comply with
+the colocation rule, it is put in recovery or other error-like states if there
+is no other way to recover it.
+
+=cut
+
+sub type {
+ return 'colocation';
+}
+
+sub properties {
+ return {
+ affinity => {
+ description => "Describes whether the services are supposed to be kept on separate"
+ . " nodes, or are supposed to be kept together on the same node.",
+ type => 'string',
+ enum => ['separate', 'together'],
+ optional => 0,
+ },
+ };
+}
+
+sub options {
+ return {
+ services => { optional => 0 },
+ affinity => { optional => 0 },
+ state => { optional => 1 },
+ comment => { optional => 1 },
+ };
+}
+
+sub get_plugin_check_arguments {
+ my ($self, $rules) = @_;
+
+ my $result = {
+ colocation_rules => {},
+ positive_rules => {},
+ negative_rules => {},
+ };
+
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule, $ruleid) = @_;
+
+ $result->{colocation_rules}->{$ruleid} = $rule;
+
+ $result->{positive_rules}->{$ruleid} = $rule if $rule->{affinity} eq 'together';
+ $result->{negative_rules}->{$ruleid} = $rule if $rule->{affinity} eq 'separate';
+ },
+ {
+ type => 'colocation',
+ state => 'enabled',
+ },
+ );
+
+ return $result;
+}
+
+=head1 COLOCATION RULE CHECKERS
+
+=cut
+
+=head3 check_colocation_services_count($colocation_rules)
+
+Returns a list of colocation rule ids defined in C<$colocation_rules>, which do
+not have enough services defined to be effective colocation rules.
+
+If there are none, the returned list is empty.
+
+=cut
+
+sub check_colocation_services_count {
+ my ($colocation_rules) = @_;
+
+ my @conflicts = ();
+
+ while (my ($ruleid, $rule) = each %$colocation_rules) {
+ push @conflicts, $ruleid if keys %{ $rule->{services} } < 2;
+ }
+
+ @conflicts = sort @conflicts;
+ return \@conflicts;
+}
+
+__PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return check_colocation_services_count($args->{colocation_rules});
+ },
+ sub {
+ my ($ruleids, $errors) = @_;
+
+ for my $ruleid (@$ruleids) {
+ push @{ $errors->{$ruleid}->{services} },
+ "rule is ineffective as there are less than two services";
+ }
+ },
+);
+
+=head3 check_inter_colocation_consistency($positive_rules, $negative_rules)
+
+Returns a list of pairs, each consisting of a positive colocation rule id
+defined in C<$positive_rules> and a negative colocation rule id defined in
+C<$negative_rules>, which share at least two common services among them. This
+is an impossible constraint, as the same services cannot be kept together on
+the same node and kept separate on different nodes at the same time.
+
+If there are none, the returned list is empty.
+
+=cut
+
+sub check_inter_colocation_consistency {
+ my ($positive_rules, $negative_rules) = @_;
+
+ my @conflicts = ();
+
+ while (my ($positiveid, $positive) = each %$positive_rules) {
+ my $positive_services = $positive->{services};
+
+ while (my ($negativeid, $negative) = each %$negative_rules) {
+ my $common_services =
+ PVE::HashTools::set_intersect($positive_services, $negative->{services});
+ next if %$common_services < 2;
+
+ push @conflicts, [$positiveid, $negativeid];
+ }
+ }
+
+ @conflicts = sort { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] } @conflicts;
+ return \@conflicts;
+}
+
+__PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return check_inter_colocation_consistency(
+ $args->{positive_rules}, $args->{negative_rules},
+ );
+ },
+ sub {
+ my ($conflicts, $errors) = @_;
+
+ for my $conflict (@$conflicts) {
+ my ($positiveid, $negativeid) = @$conflict;
+
+ push @{ $errors->{$positiveid}->{services} },
+ "rule shares two or more services with '$negativeid'";
+ push @{ $errors->{$negativeid}->{services} },
+ "rule shares two or more services with '$positiveid'";
+ }
+ },
+);
+
+=head1 COLOCATION RULE CANONICALIZATION HELPERS
+
+=cut
+
+my $sort_by_lowest_service_id = sub {
+ my ($rules) = @_;
+
+ my $lowest_rule_service_id = {};
+ for my $ruleid (keys %$rules) {
+ my @rule_services = sort keys $rules->{$ruleid}->{services}->%*;
+ $lowest_rule_service_id->{$ruleid} = $rule_services[0];
+ }
+
+ # sort rules such that rules with the lowest numbered service come first
+ my @sorted_ruleids = sort {
+ $lowest_rule_service_id->{$a} cmp $lowest_rule_service_id->{$b}
+ } keys %$rules;
+
+ return @sorted_ruleids;
+};
+
+# returns a list of hashes, which contain disjoint colocation rules, i.e.,
+# put colocation constraints on disjoint sets of services
+my $find_disjoint_colocation_rules = sub {
+ my ($rules) = @_;
+
+ my @disjoint_rules = ();
+
+ # order needed so that it is easier to check whether there is an overlap
+ my @sorted_ruleids = $sort_by_lowest_service_id->($rules);
+
+ for my $ruleid (@sorted_ruleids) {
+ my $rule = $rules->{$ruleid};
+
+ my $found = 0;
+ for my $entry (@disjoint_rules) {
+ next if PVE::HashTools::sets_are_disjoint($rule->{services}, $entry->{services});
+
+ $found = 1;
+ push @{ $entry->{ruleids} }, $ruleid;
+ $entry->{services}->{$_} = 1 for keys $rule->{services}->%*;
+
+ last;
+ }
+ if (!$found) {
+ push @disjoint_rules,
+ {
+ ruleids => [$ruleid],
+ services => { $rule->{services}->%* },
+ };
+ }
+ }
+
+ return @disjoint_rules;
+};
+
+=head3 merge_connected_positive_colocation_rules($rules, $positive_rules)
+
+Modifies C<$rules> to contain only disjoint positive colocation rules among the
+ones defined in C<$positive_rules>, i.e., all positive colocation rules put
+positive colocation constraints on disjoint sets of services.
+
+If two or more positive colocation rules have overlapping service sets, then
+these will be removed from C<$rules> and a new positive colocation rule, where
+the rule id is the dashed concatenation of the rule ids (e.g. C<'$rule1-$rule2'>),
+is inserted in C<$rules>.
+
+This makes it cheaper to find the positively colocated services of a service in
+C<$rules> at a later point in time.
+
+=cut
+
+sub merge_connected_positive_colocation_rules {
+ my ($rules, $positive_rules) = @_;
+
+ my @disjoint_positive_rules = $find_disjoint_colocation_rules->($positive_rules);
+
+ for my $entry (@disjoint_positive_rules) {
+ next if @{ $entry->{ruleids} } < 2;
+
+ my $new_ruleid = join('-', @{ $entry->{ruleids} });
+ my $first_ruleid = @{ $entry->{ruleids} }[0];
+
+ $rules->{ids}->{$new_ruleid} = {
+ type => 'colocation',
+ affinity => 'together',
+ services => $entry->{services},
+ state => 'enabled',
+ };
+ $rules->{order}->{$new_ruleid} = $rules->{order}->{$first_ruleid};
+
+ for my $ruleid (@{ $entry->{ruleids} }) {
+ delete $rules->{ids}->{$ruleid};
+ delete $rules->{order}->{$ruleid};
+ }
+ }
+}
+
+sub plugin_canonicalize {
+ my ($class, $rules) = @_;
+
+ my $args = $class->get_plugin_check_arguments($rules);
+
+ merge_connected_positive_colocation_rules($rules, $args->{positive_rules});
+}
+
+1;
diff --git a/src/PVE/HA/Rules/Makefile b/src/PVE/HA/Rules/Makefile
index e5cf737..e08fd94 100644
--- a/src/PVE/HA/Rules/Makefile
+++ b/src/PVE/HA/Rules/Makefile
@@ -1,4 +1,4 @@
-SOURCES=Location.pm
+SOURCES=Colocation.pm Location.pm
.PHONY: install
install:
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 06/26] rules: add global checks between location and colocation rules
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (8 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 05/26] rules: introduce colocation " Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-07-01 11:02 ` Daniel Kral
2025-07-04 14:43 ` Michael Köppl
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 07/26] config, env, hw: add rules read and parse methods Daniel Kral
` (32 subsequent siblings)
42 siblings, 2 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add checks which determine infeasible colocation rules, i.e. rules whose
services are already restricted by their location rules in such a way that
the colocation rules cannot be satisfied, or cannot reasonably be proven to
be satisfiable.
Positive colocation rule services need to have at least one common node
to be feasible, and negative colocation rule services need at least as many
available nodes as there are services restricted by strict location rules.
Since location rules allow nodes to be put in priority groups, but which
priority group is relevant depends on the currently online nodes, these
checks currently prohibit colocation rules whose services make use of such
location rules.
Even though location rules are restricted to only allow a service to be
used in a single location rule, the checks here still go over all
location rules, as this restriction is bound to be changed in the
future.
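As a sketch of what these checks catch, consider the following hypothetical
rule set (rule, service and node names are placeholders):

    location: pin-101
        services vm:101
        nodes node1
        strict 1

    location: pin-102
        services vm:102
        nodes node2
        strict 1

    colocation: keep-together
        services vm:101,vm:102
        affinity together

The two strict location rules restrict vm:101 and vm:102 to disjoint node
sets, so the positive colocation rule can never be satisfied and is reported
as erroneous (and subsequently dropped during canonicalization).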
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- moved global checks from Colocation plugin to base plugin
- add check to only allow colocation rules for services which are in
single-priority location rules / ha groups because these are quite
stateful and cannot be easily verified to be possible
src/PVE/HA/Rules.pm | 189 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 189 insertions(+)
diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
index 4134283..588e53b 100644
--- a/src/PVE/HA/Rules.pm
+++ b/src/PVE/HA/Rules.pm
@@ -3,6 +3,7 @@ package PVE::HA::Rules;
use strict;
use warnings;
+use PVE::HashTools;
use PVE::JSONSchema qw(get_standard_option);
use PVE::Tools;
@@ -469,4 +470,192 @@ sub foreach_rule : prototype($$;$) {
}
}
+=head1 INTER-PLUGIN RULE CHECKERS
+
+=cut
+
+=head3 check_single_priority_location_for_colocated_services($colocation_rules, $location_rules)
+
+Returns a list of colocation rule ids defined in C<$colocation_rules>, where
+the services in the colocation rule are in location rules, defined in
+C<$location_rules>, which have multiple priority groups defined. That is, the
+colocation rule cannot be statically checked to be feasible as the selection of
+the priority group is dependent on the currently online nodes.
+
+If there are none, the returned list is empty.
+
+=cut
+
+sub check_single_priority_location_for_colocated_services {
+ my ($colocation_rules, $location_rules) = @_;
+
+ my @errors = ();
+
+ while (my ($colocationid, $colocation_rule) = each %$colocation_rules) {
+ my $priority;
+ my $services = $colocation_rule->{services};
+
+ for my $locationid (keys %$location_rules) {
+ my $location_rule = $location_rules->{$locationid};
+
+ next if PVE::HashTools::sets_are_disjoint($services, $location_rule->{services});
+
+ for my $node (values %{ $location_rule->{nodes} }) {
+ $priority = $node->{priority} if !defined($priority);
+
+ if ($priority != $node->{priority}) {
+ push @errors, $colocationid;
+ last; # early return to check next colocation rule
+ }
+ }
+ }
+ }
+
+ @errors = sort @errors;
+ return \@errors;
+}
+
+__PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return check_single_priority_location_for_colocated_services(
+ $args->{colocation_rules},
+ $args->{location_rules},
+ );
+ },
+ sub {
+ my ($ruleids, $errors) = @_;
+
+ for my $ruleid (@$ruleids) {
+ push @{ $errors->{$ruleid}->{services} },
+ "services are in location rules with multiple priorities";
+ }
+ },
+);
+
+=head3 check_positive_colocation_location_consistency($positive_rules, $location_rules)
+
+Returns a list of positive colocation rule ids defined in C<$positive_rules>,
+where the services in the positive colocation rule are restricted to a disjoint
+set of nodes by their location rules, defined in C<$location_rules>. That is,
+the positive colocation rule cannot be fulfilled as the services cannot be
+placed on the same node.
+
+If there are none, the returned list is empty.
+
+=cut
+
+sub check_positive_colocation_location_consistency {
+ my ($positive_rules, $location_rules) = @_;
+
+ my @errors = ();
+
+ while (my ($positiveid, $positive_rule) = each %$positive_rules) {
+ my $allowed_nodes;
+ my $services = $positive_rule->{services};
+
+ for my $locationid (keys %$location_rules) {
+ my $location_rule = $location_rules->{$locationid};
+
+ next if !$location_rule->{strict};
+ next if PVE::HashTools::sets_are_disjoint($services, $location_rule->{services});
+
+ $allowed_nodes = { $location_rule->{nodes}->%* } if !defined($allowed_nodes);
+ $allowed_nodes = PVE::HashTools::set_intersect($allowed_nodes, $location_rule->{nodes});
+
+ if (keys %$allowed_nodes < 1) {
+ push @errors, $positiveid;
+ last; # early return to check next positive colocation rule
+ }
+ }
+ }
+
+ @errors = sort @errors;
+ return \@errors;
+}
+
+__PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return check_positive_colocation_location_consistency(
+ $args->{positive_rules},
+ $args->{location_rules},
+ );
+ },
+ sub {
+ my ($ruleids, $errors) = @_;
+
+ for my $ruleid (@$ruleids) {
+ push @{ $errors->{$ruleid}->{services} },
+ "two or more services are restricted to different nodes";
+ }
+ },
+);
+
+=head3 check_negative_colocation_location_consistency($negative_rules, $location_rules)
+
+Returns a list of negative colocation rule ids defined in C<$negative_rules>,
+where the services in the negative colocation rule are restricted to fewer
+nodes than needed to keep them separate by their location rules, defined in
+C<$location_rules>. That is, the negative colocation rule cannot be fulfilled
+as there are not enough nodes to spread the services on.
+
+If there are none, the returned list is empty.
+
+=cut
+
+sub check_negative_colocation_location_consistency {
+ my ($negative_rules, $location_rules) = @_;
+
+ my @errors = ();
+
+ while (my ($negativeid, $negative_rule) = each %$negative_rules) {
+ my $allowed_nodes = {};
+ my $located_services;
+ my $services = $negative_rule->{services};
+
+ for my $locationid (keys %$location_rules) {
+ my $location_rule = $location_rules->{$locationid};
+
+ my $location_services = $location_rule->{services};
+ my $common_services = PVE::HashTools::set_intersect($services, $location_services);
+
+ next if !$location_rule->{strict};
+ next if keys %$common_services < 1;
+
+ $located_services = PVE::HashTools::set_union($located_services, $common_services);
+ $allowed_nodes = PVE::HashTools::set_union($allowed_nodes, $location_rule->{nodes});
+
+ if (keys %$allowed_nodes < keys %$located_services) {
+ push @errors, $negativeid;
+ last; # early return to check next negative colocation rule
+ }
+ }
+ }
+
+ @errors = sort @errors;
+ return \@errors;
+}
+
+__PACKAGE__->register_check(
+ sub {
+ my ($args) = @_;
+
+ return check_negative_colocation_location_consistency(
+ $args->{negative_rules},
+ $args->{location_rules},
+ );
+ },
+ sub {
+ my ($ruleids, $errors) = @_;
+
+ for my $ruleid (@$ruleids) {
+ push @{ $errors->{$ruleid}->{services} },
+ "two or more services are restricted to less nodes than available to the services";
+ }
+ },
+);
+
1;
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [PATCH ha-manager v2 06/26] rules: add global checks between location and colocation rules
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 06/26] rules: add global checks between location and colocation rules Daniel Kral
@ 2025-07-01 11:02 ` Daniel Kral
2025-07-04 14:43 ` Michael Köppl
1 sibling, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-07-01 11:02 UTC (permalink / raw)
To: pve-devel
On 6/20/25 16:31, Daniel Kral wrote:
> +=head3 check_positive_colocation_location_consistency($positive_rules, $location_rules)
> +
> +Returns a list of positive colocation rule ids defined in C<$positive_rules>,
> +where the services in the positive colocation rule are restricted to a disjoint
> +set of nodes by their location rules, defined in C<$location_rules>. That is,
> +the positive colocation rule cannot be fullfilled as the services cannot be
> +placed on the same node.
> +
> +If there are none, the returned list is empty.
> +
> +=cut
> +
> +sub check_positive_colocation_location_consistency {
> + my ($positive_rules, $location_rules) = @_;
> +
> + my @errors = ();
> +
> + while (my ($positiveid, $positive_rule) = each %$positive_rules) {
> + my $allowed_nodes;
> + my $services = $positive_rule->{services};
> +
> + for my $locationid (keys %$location_rules) {
> + my $location_rule = $location_rules->{$locationid};
> +
> + next if !$location_rule->{strict};
The "strict" requirement will be removed in a v3.
Service affinity (colocation rules) is determined after node affinity
(location rules). That is, service affinity selects from the nodes
that are in the highest priority node group determined by the node
affinity rules. Even if a service is in a non-strict node affinity rule,
the service affinity can only select one of the highest priority nodes,
no matter if it is strict or non-strict...
I also realized now that it is still in question what positive service
affinity rules should be allowed if one (or more) of their services are
also in a node affinity rule.
Consider the case where vm:101, vm:102 and vm:103 must be kept on the
same node and only one is in a node affinity rule restricting vm:101 to
only node1. What does that node affinity rule state? Should that rule
imply that vm:102 and vm:103 are also in the node affinity rule now
(which would be rather implicit for the user)?
I'd rather make these combinations invalid and remind the user that all
services should be put in the node affinity rule first with the same
node selection and then they can create the service affinity rule, but
feedback on that would be much appreciated.
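To make that concrete, the questionable combination would be something like
this (rule names made up):

    colocation: keep-together
        services vm:101,vm:102,vm:103
        affinity together

    location: pin-101
        services vm:101
        nodes node1

i.e. only vm:101 is explicitly pinned to node1, but the positive service
affinity would implicitly drag vm:102 and vm:103 to node1 as well.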
> + next if PVE::HashTools::sets_are_disjoint($services, $location_rule->{services});
> +
> + $allowed_nodes = { $location_rule->{nodes}->%* } if !defined($allowed_nodes);
> + $allowed_nodes = PVE::HashTools::set_intersect($allowed_nodes, $location_rule->{nodes});
> +
> + if (keys %$allowed_nodes < 1) {
> + push @errors, $positiveid;
> + last; # early return to check next positive colocation rule
> + }
> + }
> + }
> +
> + @errors = sort @errors;
> + return \@errors;
> +}
> +
> +__PACKAGE__->register_check(
> + sub {
> + my ($args) = @_;
> +
> + return check_positive_colocation_location_consistency(
> + $args->{positive_rules},
> + $args->{location_rules},
> + );
> + },
> + sub {
> + my ($ruleids, $errors) = @_;
> +
> + for my $ruleid (@$ruleids) {
> + push @{ $errors->{$ruleid}->{services} },
> + "two or more services are restricted to different nodes";
> + }
> + },
> +);
> +
> +=head3 check_negative_colocation_location_consistency($negative_rules, $location_rules)
> +
> +Returns a list of negative colocation rule ids defined in C<$negative_rules>,
> +where the services in the negative colocation rule are restricted to fewer
> +nodes than needed to keep them separate by their location rules, defined in
> +C<$location_rules>. That is, the negative colocation rule cannot be fulfilled
> +as there are not enough nodes to spread the services across.
> +
> +If there are none, the returned list is empty.
> +
> +=cut
> +
> +sub check_negative_colocation_location_consistency {
> + my ($negative_rules, $location_rules) = @_;
> +
> + my @errors = ();
> +
> + while (my ($negativeid, $negative_rule) = each %$negative_rules) {
> + my $allowed_nodes = {};
> + my $located_services;
> + my $services = $negative_rule->{services};
> +
> + for my $locationid (keys %$location_rules) {
> + my $location_rule = $location_rules->{$locationid};
> +
> + my $location_services = $location_rule->{services};
> + my $common_services = PVE::HashTools::set_intersect($services, $location_services);
> +
> + next if !$location_rule->{strict};
Same argument as above regarding the strictness.
> + next if keys %$common_services < 1;
> +
> + $located_services = PVE::HashTools::set_union($located_services, $common_services);
> + $allowed_nodes = PVE::HashTools::set_union($allowed_nodes, $location_rule->{nodes});
> +
> + if (keys %$allowed_nodes < keys %$located_services) {
> + push @errors, $negativeid;
> + last; # early return to check next negative colocation rule
> + }
> + }
> + }
> +
> + @errors = sort @errors;
> + return \@errors;
> +}
* Re: [pve-devel] [PATCH ha-manager v2 06/26] rules: add global checks between location and colocation rules
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 06/26] rules: add global checks between location and colocation rules Daniel Kral
2025-07-01 11:02 ` Daniel Kral
@ 2025-07-04 14:43 ` Michael Köppl
1 sibling, 0 replies; 70+ messages in thread
From: Michael Köppl @ 2025-07-04 14:43 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
On 6/20/25 16:31, Daniel Kral wrote:
> Add checks which determine infeasible colocation rules, because their
> services are already restricted by their location rules in such a way
> that these cannot be satisfied or cannot reasonably be proven to be
> satisfiable.
>
> Positive colocation rule services need to have at least one common node
> to be feasible, and negative colocation rule services need at least as
> many nodes available in total as there are services restricted by
> location rules, i.e. services that are in strict location rules.
>
> Since location rules allow nodes to be put in priority groups, but which
> priority group is relevant depends on the currently online nodes, these
> checks currently prohibit colocation rules for services which make use
> of these kinds of location rules.
>
> Even though location rules are restricted to only allow a service to be
> used in a single location rule, the checks here still go over all
> location rules, as this restriction is bound to be changed in the
> future.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes since v1:
> - moved global checks from Colocation plugin to base plugin
> - add check to only allow colocation rules for services which are in
> single-priority location rules / ha groups because these are quite
> stateful and cannot be easily verified to be possible
>
> src/PVE/HA/Rules.pm | 189 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 189 insertions(+)
>
> diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
> index 4134283..588e53b 100644
> --- a/src/PVE/HA/Rules.pm
> +++ b/src/PVE/HA/Rules.pm
> @@ -3,6 +3,7 @@ package PVE::HA::Rules;
> use strict;
> use warnings;
>
> +use PVE::HashTools;
> use PVE::JSONSchema qw(get_standard_option);
> use PVE::Tools;
>
> @@ -469,4 +470,192 @@ sub foreach_rule : prototype($$;$) {
> }
> }
>
> +=head1 INTER-PLUGIN RULE CHECKERS
> +
> +=cut
> +
> +=head3 check_single_priority_location_for_colocated_services($colocation_rules, $location_rules)
> +
> +Returns a list of colocation rule ids defined in C<$colocation_rules>, where
> +the services in the colocation rule are in location rules, defined in
> +C<$location_rules>, which have multiple priority groups defined. That is, the
> +colocation rule cannot be statically checked to be feasible as the selection of
> +the priority group is dependent on the currently online nodes.
Might be that I'm misinterpreting this, but doesn't that only apply when
the location rule contains more than one node? At the moment, this check
would fail if I have the location rules vm:100->node1 and vm:101->node2,
where either of them has a priority assigned, and I then try to add a
colocation rule that separates them. This would be possible without the
priority assigned.
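One way to encode that observation would be to skip location rules which
pin their services to a single node, since those cannot introduce any
priority ambiguity. A rough sketch (not the actual patch; it reuses the
variables from the quoted loop above):

    for my $locationid (keys %$location_rules) {
        my $location_rule = $location_rules->{$locationid};

        next if PVE::HashTools::sets_are_disjoint($services, $location_rule->{services});

        # a location rule with only a single node has no priority groups
        # to choose from, so it cannot make the colocation rule ambiguous
        next if keys %{ $location_rule->{nodes} } <= 1;

        # ... priority comparison as in the hunk above ...
    }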
> +
> +If there are none, the returned list is empty.
> +
> +=cut
> +
> +sub check_single_priority_location_for_colocated_services {
> + my ($colocation_rules, $location_rules) = @_;
> +
> + my @errors = ();
> +
> + while (my ($colocationid, $colocation_rule) = each %$colocation_rules) {
> + my $priority;
> + my $services = $colocation_rule->{services};
> +
> + for my $locationid (keys %$location_rules) {
> + my $location_rule = $location_rules->{$locationid};
> +
> + next if PVE::HashTools::sets_are_disjoint($services, $location_rule->{services});
> +
> + for my $node (values %{ $location_rule->{nodes} }) {
> + $priority = $node->{priority} if !defined($priority);
> +
> + if ($priority != $node->{priority}) {
> + push @errors, $colocationid;
> + last; # early return to check next colocation rule
> + }
> + }
> + }
> + }
> +
* [pve-devel] [PATCH ha-manager v2 07/26] config, env, hw: add rules read and parse methods
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (9 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 06/26] rules: add global checks between location and colocation rules Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 08/26] manager: read and update rules config Daniel Kral
` (31 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Adds methods to the HA environment to read and write the rules
configuration file for the different environment implementations.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- reorder use statements
- use property isolation for the rules plugin
- introduce `read_and_check_rules_config()` to add rule defaults
- add rule defaults also for the test/simulator environment
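For illustration, a consumer of the new helper could look roughly like
this (hypothetical example, not part of the patch; it only relies on the
'ids' key of the parsed config visible in the diff below):

    use PVE::HA::Config;

    # read ha/rules.cfg and fill in defaults for optional rule properties
    my $rules = PVE::HA::Config::read_and_check_rules_config();

    # $rules->{ids} maps the rule ids to their (defaulted) rule hashes
    for my $ruleid (sort keys %{ $rules->{ids} }) {
        print "found HA rule '$ruleid'\n";
    }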
src/PVE/HA/Config.pm | 30 ++++++++++++++++++++++++++++++
src/PVE/HA/Env.pm | 6 ++++++
src/PVE/HA/Env/PVE2.pm | 14 ++++++++++++++
src/PVE/HA/Sim/Env.pm | 16 ++++++++++++++++
src/PVE/HA/Sim/Hardware.pm | 21 +++++++++++++++++++++
5 files changed, 87 insertions(+)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index ec9360e..012ae16 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -7,12 +7,14 @@ use JSON;
use PVE::HA::Tools;
use PVE::HA::Groups;
+use PVE::HA::Rules;
use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
use PVE::HA::Resources;
my $manager_status_filename = "ha/manager_status";
my $ha_groups_config = "ha/groups.cfg";
my $ha_resources_config = "ha/resources.cfg";
+my $ha_rules_config = "ha/rules.cfg";
my $crm_commands_filename = "ha/crm_commands";
my $ha_fence_config = "ha/fence.cfg";
@@ -31,6 +33,11 @@ cfs_register_file(
sub { PVE::HA::Resources->parse_config(@_); },
sub { PVE::HA::Resources->write_config(@_); },
);
+cfs_register_file(
+ $ha_rules_config,
+ sub { PVE::HA::Rules->parse_config(@_); },
+ sub { PVE::HA::Rules->write_config(@_); },
+);
cfs_register_file($manager_status_filename, \&json_reader, \&json_writer);
cfs_register_file(
$ha_fence_config,
@@ -197,6 +204,29 @@ sub parse_sid {
return wantarray ? ($sid, $type, $name) : $sid;
}
+sub read_rules_config {
+
+ return cfs_read_file($ha_rules_config);
+}
+
+sub read_and_check_rules_config {
+
+ my $rules = cfs_read_file($ha_rules_config);
+
+ # set optional rule parameters' default values
+ for my $rule (values %{ $rules->{ids} }) {
+ PVE::HA::Rules->set_rule_defaults($rule);
+ }
+
+ return $rules;
+}
+
+sub write_rules_config {
+ my ($cfg) = @_;
+
+ cfs_write_file($ha_rules_config, $cfg);
+}
+
sub read_group_config {
return cfs_read_file($ha_groups_config);
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 285e440..5cee7b3 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -131,6 +131,12 @@ sub steal_service {
return $self->{plug}->steal_service($sid, $current_node, $new_node);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ return $self->{plug}->read_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index b709f30..1beba7d 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -22,12 +22,20 @@ use PVE::HA::FenceConfig;
use PVE::HA::Resources;
use PVE::HA::Resources::PVEVM;
use PVE::HA::Resources::PVECT;
+use PVE::HA::Rules;
+use PVE::HA::Rules::Location;
+use PVE::HA::Rules::Colocation;
PVE::HA::Resources::PVEVM->register();
PVE::HA::Resources::PVECT->register();
PVE::HA::Resources->init();
+PVE::HA::Rules::Location->register();
+PVE::HA::Rules::Colocation->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
my $lockdir = "/etc/pve/priv/lock";
sub new {
@@ -189,6 +197,12 @@ sub steal_service {
$self->cluster_state_update();
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ return PVE::HA::Config::read_and_check_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index d892a00..fc16d3e 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -10,6 +10,9 @@ use Fcntl qw(:DEFAULT :flock);
use PVE::HA::Tools;
use PVE::HA::Env;
use PVE::HA::Resources;
+use PVE::HA::Rules;
+use PVE::HA::Rules::Location;
+use PVE::HA::Rules::Colocation;
use PVE::HA::Sim::Resources::VirtVM;
use PVE::HA::Sim::Resources::VirtCT;
use PVE::HA::Sim::Resources::VirtFail;
@@ -20,6 +23,11 @@ PVE::HA::Sim::Resources::VirtFail->register();
PVE::HA::Resources->init();
+PVE::HA::Rules::Location->register();
+PVE::HA::Rules::Colocation->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
sub new {
my ($this, $nodename, $hardware, $log_id) = @_;
@@ -245,6 +253,14 @@ sub exec_fence_agent {
return $self->{hardware}->exec_fence_agent($agent, $node, @param);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ $assert_cfs_can_rw->($self);
+
+ return $self->{hardware}->read_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 576527d..89dbdfa 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -28,6 +28,7 @@ my $watchdog_timeout = 60;
# $testdir/cmdlist Command list for simulation
# $testdir/hardware_status Hardware description (number of nodes, ...)
# $testdir/manager_status CRM status (start with {})
+# $testdir/rules_config Constraints / Rules configuration
# $testdir/service_config Service configuration
# $testdir/static_service_stats Static service usage information (cpu, memory)
# $testdir/groups HA groups configuration
@@ -319,6 +320,22 @@ sub read_crm_commands {
return $self->global_lock($code);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/rules_config";
+ my $raw = '';
+ $raw = PVE::Tools::file_get_contents($filename) if -f $filename;
+ my $rules = PVE::HA::Rules->parse_config($filename, $raw);
+
+ # set optional rule parameters' default values
+ for my $rule (values %{ $rules->{ids} }) {
+ PVE::HA::Rules->set_rule_defaults($rule);
+ }
+
+ return $rules;
+}
+
sub read_group_config {
my ($self) = @_;
@@ -391,6 +408,10 @@ sub new {
# copy initial configuartion
copy("$testdir/manager_status", "$statusdir/manager_status"); # optional
+ if (-f "$testdir/rules_config") {
+ copy("$testdir/rules_config", "$statusdir/rules_config");
+ }
+
if (-f "$testdir/groups") {
copy("$testdir/groups", "$statusdir/groups");
} else {
--
2.39.5
* [pve-devel] [PATCH ha-manager v2 08/26] manager: read and update rules config
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (10 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 07/26] config, env, hw: add rules read and parse methods Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 09/26] test: ha tester: add test cases for future location rules Daniel Kral
` (30 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Read the rules configuration in each round and update the canonicalized
rules configuration if there were any changes since the last round, to
reduce how often the rule set has to be verified.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- only read and canonicalize rules here... introduce the migration
from groups to services and rules in a later patch
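Condensed, the change in manage() boils down to the following digest
guard (the same logic as in the diff below, shown here only for
readability):

    my $new_rules = $haenv->read_rules_config();

    # only canonicalize (and thereby verify) the rule set again if the
    # rules configuration actually changed since the last round
    if ($new_rules->{digest} ne $self->{last_rules_digest}) {
        my $messages = PVE::HA::Rules->canonicalize($new_rules);
        $haenv->log('info', $_) for @$messages;

        $self->{rules} = $new_rules;
        $self->{last_rules_digest} = $new_rules->{digest};
    }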
src/PVE/HA/Manager.pm | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 85bb114..08c2fd3 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -8,6 +8,9 @@ use Digest::MD5 qw(md5_base64);
use PVE::Tools;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
+use PVE::HA::Rules;
+use PVE::HA::Rules::Location;
+use PVE::HA::Rules::Colocation;
use PVE::HA::Usage::Basic;
use PVE::HA::Usage::Static;
@@ -41,7 +44,11 @@ sub new {
my $class = ref($this) || $this;
- my $self = bless { haenv => $haenv, crs => {} }, $class;
+ my $self = bless {
+ haenv => $haenv,
+ crs => {},
+ last_rules_digest => '',
+ }, $class;
my $old_ms = $haenv->read_manager_status();
@@ -556,6 +563,18 @@ sub manage {
delete $ss->{$sid};
}
+ my $new_rules = $haenv->read_rules_config();
+
+ if ($new_rules->{digest} ne $self->{last_rules_digest}) {
+
+ my $messages = PVE::HA::Rules->canonicalize($new_rules);
+ $haenv->log('info', $_) for @$messages;
+
+ $self->{rules} = $new_rules;
+
+ $self->{last_rules_digest} = $self->{rules}->{digest};
+ }
+
$self->update_crm_commands();
for (;;) {
--
2.39.5
* [pve-devel] [PATCH ha-manager v2 09/26] test: ha tester: add test cases for future location rules
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (11 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 08/26] manager: read and update rules config Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 10/26] resources: introduce failback property in service config Daniel Kral
` (29 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add test cases to verify that the location rules, which will be added in
a following patch, are functionally equivalent to the HA groups.
These test cases verify the following scenarios for (a) unrestricted and
(b) restricted groups (i.e. loose and strict location rules):
1. If a service is manually migrated to a non-member node and failback
is enabled, then (a)(b) migrate the service back to a member node.
2. If a service is manually migrated to a non-member node and failback
is disabled, then (a) do nothing for unrestricted groups, or (b)
migrate the service back to a member node for restricted groups.
3. If a service's node fails, where the failed node is the only
available group member left, (a) migrate the service to a non-member
node, or (b) stay in recovery for restricted groups.
4. If a service's node fails, but there is another available group
member left, (a)(b) migrate the service to the other member node.
5. If a service's group has failback enabled and the service's node,
which is the node with the highest priority in the group, fails and
comes back later, (a)(b) migrate it to the second-highest prioritized
node and automatically migrate it back to the highest priority node
as soon as it is available again.
6. If a service's group has failback disabled and the service's node,
which is the node with the highest priority in the group, fails and
comes back later, (a)(b) migrate it to the second-highest prioritized
node, but do not migrate it back to the highest priority node if it
becomes available again.
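For orientation, the mapping these tests anticipate looks roughly like
the following; the location rule syntax shown here is hypothetical and
only mirrors the properties ('services', 'nodes', 'strict') used by the
rule checks earlier in this series:

    # HA group as used in test-location-loose1/groups
    group: should_stay_here
            nodes node3

    # hypothetical equivalent "loose" location rule (strict 0); a strict
    # location rule would correspond to a restricted group
    location: should-stay-here
            services vm:101
            nodes node3
            strict 0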
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/test/test-location-loose1/README | 10 +++
src/test/test-location-loose1/cmdlist | 4 +
src/test/test-location-loose1/groups | 2 +
src/test/test-location-loose1/hardware_status | 5 ++
src/test/test-location-loose1/log.expect | 40 ++++++++++
src/test/test-location-loose1/manager_status | 1 +
src/test/test-location-loose1/service_config | 3 +
src/test/test-location-loose2/README | 12 +++
src/test/test-location-loose2/cmdlist | 4 +
src/test/test-location-loose2/groups | 3 +
src/test/test-location-loose2/hardware_status | 5 ++
src/test/test-location-loose2/log.expect | 35 +++++++++
src/test/test-location-loose2/manager_status | 1 +
src/test/test-location-loose2/service_config | 3 +
src/test/test-location-loose3/README | 10 +++
src/test/test-location-loose3/cmdlist | 4 +
src/test/test-location-loose3/groups | 2 +
src/test/test-location-loose3/hardware_status | 5 ++
src/test/test-location-loose3/log.expect | 56 ++++++++++++++
src/test/test-location-loose3/manager_status | 1 +
src/test/test-location-loose3/service_config | 5 ++
src/test/test-location-loose4/README | 14 ++++
src/test/test-location-loose4/cmdlist | 4 +
src/test/test-location-loose4/groups | 2 +
src/test/test-location-loose4/hardware_status | 5 ++
src/test/test-location-loose4/log.expect | 54 ++++++++++++++
src/test/test-location-loose4/manager_status | 1 +
src/test/test-location-loose4/service_config | 5 ++
src/test/test-location-loose5/README | 16 ++++
src/test/test-location-loose5/cmdlist | 5 ++
src/test/test-location-loose5/groups | 2 +
src/test/test-location-loose5/hardware_status | 5 ++
src/test/test-location-loose5/log.expect | 66 +++++++++++++++++
src/test/test-location-loose5/manager_status | 1 +
src/test/test-location-loose5/service_config | 3 +
src/test/test-location-loose6/README | 14 ++++
src/test/test-location-loose6/cmdlist | 5 ++
src/test/test-location-loose6/groups | 3 +
src/test/test-location-loose6/hardware_status | 5 ++
src/test/test-location-loose6/log.expect | 52 +++++++++++++
src/test/test-location-loose6/manager_status | 1 +
src/test/test-location-loose6/service_config | 3 +
src/test/test-location-strict1/README | 10 +++
src/test/test-location-strict1/cmdlist | 4 +
src/test/test-location-strict1/groups | 3 +
.../test-location-strict1/hardware_status | 5 ++
src/test/test-location-strict1/log.expect | 40 ++++++++++
src/test/test-location-strict1/manager_status | 1 +
src/test/test-location-strict1/service_config | 3 +
src/test/test-location-strict2/README | 11 +++
src/test/test-location-strict2/cmdlist | 4 +
src/test/test-location-strict2/groups | 4 +
.../test-location-strict2/hardware_status | 5 ++
src/test/test-location-strict2/log.expect | 40 ++++++++++
src/test/test-location-strict2/manager_status | 1 +
src/test/test-location-strict2/service_config | 3 +
src/test/test-location-strict3/README | 10 +++
src/test/test-location-strict3/cmdlist | 4 +
src/test/test-location-strict3/groups | 3 +
.../test-location-strict3/hardware_status | 5 ++
src/test/test-location-strict3/log.expect | 74 +++++++++++++++++++
src/test/test-location-strict3/manager_status | 1 +
src/test/test-location-strict3/service_config | 5 ++
src/test/test-location-strict4/README | 14 ++++
src/test/test-location-strict4/cmdlist | 4 +
src/test/test-location-strict4/groups | 3 +
.../test-location-strict4/hardware_status | 5 ++
src/test/test-location-strict4/log.expect | 54 ++++++++++++++
src/test/test-location-strict4/manager_status | 1 +
src/test/test-location-strict4/service_config | 5 ++
src/test/test-location-strict5/README | 16 ++++
src/test/test-location-strict5/cmdlist | 5 ++
src/test/test-location-strict5/groups | 3 +
.../test-location-strict5/hardware_status | 5 ++
src/test/test-location-strict5/log.expect | 66 +++++++++++++++++
src/test/test-location-strict5/manager_status | 1 +
src/test/test-location-strict5/service_config | 3 +
src/test/test-location-strict6/README | 14 ++++
src/test/test-location-strict6/cmdlist | 5 ++
src/test/test-location-strict6/groups | 4 +
.../test-location-strict6/hardware_status | 5 ++
src/test/test-location-strict6/log.expect | 52 +++++++++++++
src/test/test-location-strict6/manager_status | 1 +
src/test/test-location-strict6/service_config | 3 +
84 files changed, 982 insertions(+)
create mode 100644 src/test/test-location-loose1/README
create mode 100644 src/test/test-location-loose1/cmdlist
create mode 100644 src/test/test-location-loose1/groups
create mode 100644 src/test/test-location-loose1/hardware_status
create mode 100644 src/test/test-location-loose1/log.expect
create mode 100644 src/test/test-location-loose1/manager_status
create mode 100644 src/test/test-location-loose1/service_config
create mode 100644 src/test/test-location-loose2/README
create mode 100644 src/test/test-location-loose2/cmdlist
create mode 100644 src/test/test-location-loose2/groups
create mode 100644 src/test/test-location-loose2/hardware_status
create mode 100644 src/test/test-location-loose2/log.expect
create mode 100644 src/test/test-location-loose2/manager_status
create mode 100644 src/test/test-location-loose2/service_config
create mode 100644 src/test/test-location-loose3/README
create mode 100644 src/test/test-location-loose3/cmdlist
create mode 100644 src/test/test-location-loose3/groups
create mode 100644 src/test/test-location-loose3/hardware_status
create mode 100644 src/test/test-location-loose3/log.expect
create mode 100644 src/test/test-location-loose3/manager_status
create mode 100644 src/test/test-location-loose3/service_config
create mode 100644 src/test/test-location-loose4/README
create mode 100644 src/test/test-location-loose4/cmdlist
create mode 100644 src/test/test-location-loose4/groups
create mode 100644 src/test/test-location-loose4/hardware_status
create mode 100644 src/test/test-location-loose4/log.expect
create mode 100644 src/test/test-location-loose4/manager_status
create mode 100644 src/test/test-location-loose4/service_config
create mode 100644 src/test/test-location-loose5/README
create mode 100644 src/test/test-location-loose5/cmdlist
create mode 100644 src/test/test-location-loose5/groups
create mode 100644 src/test/test-location-loose5/hardware_status
create mode 100644 src/test/test-location-loose5/log.expect
create mode 100644 src/test/test-location-loose5/manager_status
create mode 100644 src/test/test-location-loose5/service_config
create mode 100644 src/test/test-location-loose6/README
create mode 100644 src/test/test-location-loose6/cmdlist
create mode 100644 src/test/test-location-loose6/groups
create mode 100644 src/test/test-location-loose6/hardware_status
create mode 100644 src/test/test-location-loose6/log.expect
create mode 100644 src/test/test-location-loose6/manager_status
create mode 100644 src/test/test-location-loose6/service_config
create mode 100644 src/test/test-location-strict1/README
create mode 100644 src/test/test-location-strict1/cmdlist
create mode 100644 src/test/test-location-strict1/groups
create mode 100644 src/test/test-location-strict1/hardware_status
create mode 100644 src/test/test-location-strict1/log.expect
create mode 100644 src/test/test-location-strict1/manager_status
create mode 100644 src/test/test-location-strict1/service_config
create mode 100644 src/test/test-location-strict2/README
create mode 100644 src/test/test-location-strict2/cmdlist
create mode 100644 src/test/test-location-strict2/groups
create mode 100644 src/test/test-location-strict2/hardware_status
create mode 100644 src/test/test-location-strict2/log.expect
create mode 100644 src/test/test-location-strict2/manager_status
create mode 100644 src/test/test-location-strict2/service_config
create mode 100644 src/test/test-location-strict3/README
create mode 100644 src/test/test-location-strict3/cmdlist
create mode 100644 src/test/test-location-strict3/groups
create mode 100644 src/test/test-location-strict3/hardware_status
create mode 100644 src/test/test-location-strict3/log.expect
create mode 100644 src/test/test-location-strict3/manager_status
create mode 100644 src/test/test-location-strict3/service_config
create mode 100644 src/test/test-location-strict4/README
create mode 100644 src/test/test-location-strict4/cmdlist
create mode 100644 src/test/test-location-strict4/groups
create mode 100644 src/test/test-location-strict4/hardware_status
create mode 100644 src/test/test-location-strict4/log.expect
create mode 100644 src/test/test-location-strict4/manager_status
create mode 100644 src/test/test-location-strict4/service_config
create mode 100644 src/test/test-location-strict5/README
create mode 100644 src/test/test-location-strict5/cmdlist
create mode 100644 src/test/test-location-strict5/groups
create mode 100644 src/test/test-location-strict5/hardware_status
create mode 100644 src/test/test-location-strict5/log.expect
create mode 100644 src/test/test-location-strict5/manager_status
create mode 100644 src/test/test-location-strict5/service_config
create mode 100644 src/test/test-location-strict6/README
create mode 100644 src/test/test-location-strict6/cmdlist
create mode 100644 src/test/test-location-strict6/groups
create mode 100644 src/test/test-location-strict6/hardware_status
create mode 100644 src/test/test-location-strict6/log.expect
create mode 100644 src/test/test-location-strict6/manager_status
create mode 100644 src/test/test-location-strict6/service_config
diff --git a/src/test/test-location-loose1/README b/src/test/test-location-loose1/README
new file mode 100644
index 0000000..8775b6c
--- /dev/null
+++ b/src/test/test-location-loose1/README
@@ -0,0 +1,10 @@
+Test whether a service in an unrestricted group will automatically migrate back
+to a node member in case of a manual migration to a non-member node.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is a group member and has higher priority than the other nodes
diff --git a/src/test/test-location-loose1/cmdlist b/src/test/test-location-loose1/cmdlist
new file mode 100644
index 0000000..a63e4fd
--- /dev/null
+++ b/src/test/test-location-loose1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-location-loose1/groups b/src/test/test-location-loose1/groups
new file mode 100644
index 0000000..50c9a2d
--- /dev/null
+++ b/src/test/test-location-loose1/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node3
diff --git a/src/test/test-location-loose1/hardware_status b/src/test/test-location-loose1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-loose1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-loose1/log.expect b/src/test/test-location-loose1/log.expect
new file mode 100644
index 0000000..e0f4d46
--- /dev/null
+++ b/src/test/test-location-loose1/log.expect
@@ -0,0 +1,40 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: service vm:101 - start migrate to node 'node3'
+info 143 node2/lrm: service vm:101 - end migrate to node 'node3'
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 165 node3/lrm: starting service vm:101
+info 165 node3/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-location-loose1/manager_status b/src/test/test-location-loose1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-location-loose1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-location-loose1/service_config b/src/test/test-location-loose1/service_config
new file mode 100644
index 0000000..5f55843
--- /dev/null
+++ b/src/test/test-location-loose1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-location-loose2/README b/src/test/test-location-loose2/README
new file mode 100644
index 0000000..f27414b
--- /dev/null
+++ b/src/test/test-location-loose2/README
@@ -0,0 +1,12 @@
+Test whether a service in an unrestricted group with nofailback enabled will
+stay on the manual migration target node, even though the target node is not a
+member of the unrestricted group.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, vm:101 stays on node2; even though
+ node2 is not a group member, the nofailback flag prevents vm:101 from being
+ migrated back to a group member
diff --git a/src/test/test-location-loose2/cmdlist b/src/test/test-location-loose2/cmdlist
new file mode 100644
index 0000000..a63e4fd
--- /dev/null
+++ b/src/test/test-location-loose2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-location-loose2/groups b/src/test/test-location-loose2/groups
new file mode 100644
index 0000000..59192fa
--- /dev/null
+++ b/src/test/test-location-loose2/groups
@@ -0,0 +1,3 @@
+group: should_stay_here
+ nodes node3
+ nofailback 1
diff --git a/src/test/test-location-loose2/hardware_status b/src/test/test-location-loose2/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-loose2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-loose2/log.expect b/src/test/test-location-loose2/log.expect
new file mode 100644
index 0000000..35e2470
--- /dev/null
+++ b/src/test/test-location-loose2/log.expect
@@ -0,0 +1,35 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: starting service vm:101
+info 143 node2/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-location-loose2/manager_status b/src/test/test-location-loose2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-location-loose2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-location-loose2/service_config b/src/test/test-location-loose2/service_config
new file mode 100644
index 0000000..5f55843
--- /dev/null
+++ b/src/test/test-location-loose2/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-location-loose3/README b/src/test/test-location-loose3/README
new file mode 100644
index 0000000..c4ddfab
--- /dev/null
+++ b/src/test/test-location-loose3/README
@@ -0,0 +1,10 @@
+Test whether a service in an unrestricted group with only one node member will
+be migrated to a non-member node in case of a failover of its previously
+assigned node.
+
+The test scenario is:
+- vm:101 should be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As node3 fails, vm:101 is migrated to node1
diff --git a/src/test/test-location-loose3/cmdlist b/src/test/test-location-loose3/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-location-loose3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-location-loose3/groups b/src/test/test-location-loose3/groups
new file mode 100644
index 0000000..50c9a2d
--- /dev/null
+++ b/src/test/test-location-loose3/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node3
diff --git a/src/test/test-location-loose3/hardware_status b/src/test/test-location-loose3/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-loose3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-loose3/log.expect b/src/test/test-location-loose3/log.expect
new file mode 100644
index 0000000..752300b
--- /dev/null
+++ b/src/test/test-location-loose3/log.expect
@@ -0,0 +1,56 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: got lock 'ha_agent_node1_lock'
+info 241 node1/lrm: status change wait_for_agent_lock => active
+info 241 node1/lrm: starting service vm:101
+info 241 node1/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-location-loose3/manager_status b/src/test/test-location-loose3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-location-loose3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-location-loose3/service_config b/src/test/test-location-loose3/service_config
new file mode 100644
index 0000000..777b2a7
--- /dev/null
+++ b/src/test/test-location-loose3/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-location-loose4/README b/src/test/test-location-loose4/README
new file mode 100644
index 0000000..a08f0e1
--- /dev/null
+++ b/src/test/test-location-loose4/README
@@ -0,0 +1,14 @@
+Test whether a service in an unrestricted group with two node members will stay
+assigned to one of the node members in case of a failover of its previously
+assigned node.
+
+The test scenario is:
+- vm:101 should be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher service count than node1 to test whether the restriction
+ to node2 and node3 is applied even though the scheduler would prefer the less
+ utilized node1
+
+The expected outcome is:
+- As node3 fails, vm:101 is migrated to node2, as it's the only available node
+ left in the unrestricted group
diff --git a/src/test/test-location-loose4/cmdlist b/src/test/test-location-loose4/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-location-loose4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-location-loose4/groups b/src/test/test-location-loose4/groups
new file mode 100644
index 0000000..b1584b5
--- /dev/null
+++ b/src/test/test-location-loose4/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node2,node3
diff --git a/src/test/test-location-loose4/hardware_status b/src/test/test-location-loose4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-loose4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-loose4/log.expect b/src/test/test-location-loose4/log.expect
new file mode 100644
index 0000000..847e157
--- /dev/null
+++ b/src/test/test-location-loose4/log.expect
@@ -0,0 +1,54 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-location-loose4/manager_status b/src/test/test-location-loose4/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-location-loose4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-location-loose4/service_config b/src/test/test-location-loose4/service_config
new file mode 100644
index 0000000..777b2a7
--- /dev/null
+++ b/src/test/test-location-loose4/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-location-loose5/README b/src/test/test-location-loose5/README
new file mode 100644
index 0000000..0c37044
--- /dev/null
+++ b/src/test/test-location-loose5/README
@@ -0,0 +1,16 @@
+Test whether a service in an unrestricted group with two differently prioritized
+node members will stay on the node with the highest priority in case of a
+failover or when the service is on a lower-priority node.
+
+The test scenario is:
+- vm:101 should be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As vm:101 runs on node3, it is automatically migrated to node2, as node2 has
+ a higher priority than node3
+- As node2 fails, vm:101 is migrated to node3 as node3 is the next and only
+ available node member left in the unrestricted group
+- As node2 comes back online, vm:101 is migrated back to node2, as node2 has a
+ higher priority than node3
diff --git a/src/test/test-location-loose5/cmdlist b/src/test/test-location-loose5/cmdlist
new file mode 100644
index 0000000..6932aa7
--- /dev/null
+++ b/src/test/test-location-loose5/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off" ],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-location-loose5/groups b/src/test/test-location-loose5/groups
new file mode 100644
index 0000000..03a0ee9
--- /dev/null
+++ b/src/test/test-location-loose5/groups
@@ -0,0 +1,2 @@
+group: should_stay_here
+ nodes node2:2,node3:1
diff --git a/src/test/test-location-loose5/hardware_status b/src/test/test-location-loose5/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-loose5/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-loose5/log.expect b/src/test/test-location-loose5/log.expect
new file mode 100644
index 0000000..a875e11
--- /dev/null
+++ b/src/test/test-location-loose5/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 20 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 25 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 43 node2/lrm: got lock 'ha_agent_node2_lock'
+info 43 node2/lrm: status change wait_for_agent_lock => active
+info 43 node2/lrm: starting service vm:101
+info 43 node2/lrm: service status vm:101 started
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 260 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 260 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 265 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 265 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 280 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 363 node2/lrm: got lock 'ha_agent_node2_lock'
+info 363 node2/lrm: status change wait_for_agent_lock => active
+info 363 node2/lrm: starting service vm:101
+info 363 node2/lrm: service status vm:101 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-location-loose5/manager_status b/src/test/test-location-loose5/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-location-loose5/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-location-loose5/service_config b/src/test/test-location-loose5/service_config
new file mode 100644
index 0000000..5f55843
--- /dev/null
+++ b/src/test/test-location-loose5/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-location-loose6/README b/src/test/test-location-loose6/README
new file mode 100644
index 0000000..4ab1275
--- /dev/null
+++ b/src/test/test-location-loose6/README
@@ -0,0 +1,14 @@
+Test whether a service in an unrestricted group with nofailback enabled and two
+differently prioritized node members will stay on the current node without
+migrating back to the highest priority node.
+
+The test scenario is:
+- vm:101 should be kept on node2 or node3
+- vm:101 is currently running on node2
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As node2 fails, vm:101 is migrated to node3 as it is the only available node
+ member left in the unrestricted group
+- As node2 comes back online, vm:101 stays on node3; even though node2 has a
+ higher priority, the nofailback flag prevents vm:101 from migrating back to node2
diff --git a/src/test/test-location-loose6/cmdlist b/src/test/test-location-loose6/cmdlist
new file mode 100644
index 0000000..4dd33cc
--- /dev/null
+++ b/src/test/test-location-loose6/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off"],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-location-loose6/groups b/src/test/test-location-loose6/groups
new file mode 100644
index 0000000..a7aed17
--- /dev/null
+++ b/src/test/test-location-loose6/groups
@@ -0,0 +1,3 @@
+group: should_stay_here
+ nodes node2:2,node3:1
+ nofailback 1
diff --git a/src/test/test-location-loose6/hardware_status b/src/test/test-location-loose6/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-loose6/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-loose6/log.expect b/src/test/test-location-loose6/log.expect
new file mode 100644
index 0000000..bcb472b
--- /dev/null
+++ b/src/test/test-location-loose6/log.expect
@@ -0,0 +1,52 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: got lock 'ha_agent_node3_lock'
+info 245 node3/lrm: status change wait_for_agent_lock => active
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-location-loose6/manager_status b/src/test/test-location-loose6/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-location-loose6/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-location-loose6/service_config b/src/test/test-location-loose6/service_config
new file mode 100644
index 0000000..c4ece62
--- /dev/null
+++ b/src/test/test-location-loose6/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node2", "state": "started", "group": "should_stay_here" }
+}
diff --git a/src/test/test-location-strict1/README b/src/test/test-location-strict1/README
new file mode 100644
index 0000000..c717d58
--- /dev/null
+++ b/src/test/test-location-strict1/README
@@ -0,0 +1,10 @@
+Test whether a service in a restricted group will automatically migrate back to
+a restricted node member in case of a manual migration to a non-member node.
+
+The test scenario is:
+- vm:101 must be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is the only available node member left in the restricted group
diff --git a/src/test/test-location-strict1/cmdlist b/src/test/test-location-strict1/cmdlist
new file mode 100644
index 0000000..a63e4fd
--- /dev/null
+++ b/src/test/test-location-strict1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-location-strict1/groups b/src/test/test-location-strict1/groups
new file mode 100644
index 0000000..370865f
--- /dev/null
+++ b/src/test/test-location-strict1/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node3
+ restricted 1
diff --git a/src/test/test-location-strict1/hardware_status b/src/test/test-location-strict1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-strict1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-strict1/log.expect b/src/test/test-location-strict1/log.expect
new file mode 100644
index 0000000..e0f4d46
--- /dev/null
+++ b/src/test/test-location-strict1/log.expect
@@ -0,0 +1,40 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: service vm:101 - start migrate to node 'node3'
+info 143 node2/lrm: service vm:101 - end migrate to node 'node3'
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 165 node3/lrm: starting service vm:101
+info 165 node3/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-location-strict1/manager_status b/src/test/test-location-strict1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-location-strict1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-location-strict1/service_config b/src/test/test-location-strict1/service_config
new file mode 100644
index 0000000..36ea15b
--- /dev/null
+++ b/src/test/test-location-strict1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+}
diff --git a/src/test/test-location-strict2/README b/src/test/test-location-strict2/README
new file mode 100644
index 0000000..f4d06a1
--- /dev/null
+++ b/src/test/test-location-strict2/README
@@ -0,0 +1,11 @@
+Test whether a service in a restricted group with nofailback enabled will
+automatically migrate back to a restricted node member in case of a manual
+migration to a non-member node.
+
+The test scenario is:
+- vm:101 must be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As vm:101 is manually migrated to node2, it is migrated back to node3, as
+ node3 is the only available node member left in the restricted group
diff --git a/src/test/test-location-strict2/cmdlist b/src/test/test-location-strict2/cmdlist
new file mode 100644
index 0000000..a63e4fd
--- /dev/null
+++ b/src/test/test-location-strict2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-location-strict2/groups b/src/test/test-location-strict2/groups
new file mode 100644
index 0000000..e43eafc
--- /dev/null
+++ b/src/test/test-location-strict2/groups
@@ -0,0 +1,4 @@
+group: must_stay_here
+ nodes node3
+ restricted 1
+ nofailback 1
diff --git a/src/test/test-location-strict2/hardware_status b/src/test/test-location-strict2/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-strict2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-strict2/log.expect b/src/test/test-location-strict2/log.expect
new file mode 100644
index 0000000..e0f4d46
--- /dev/null
+++ b/src/test/test-location-strict2/log.expect
@@ -0,0 +1,40 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 125 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 125 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: migrate service 'vm:101' to node 'node3' (running)
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: service vm:101 - start migrate to node 'node3'
+info 143 node2/lrm: service vm:101 - end migrate to node 'node3'
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 165 node3/lrm: starting service vm:101
+info 165 node3/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-location-strict2/manager_status b/src/test/test-location-strict2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-location-strict2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-location-strict2/service_config b/src/test/test-location-strict2/service_config
new file mode 100644
index 0000000..36ea15b
--- /dev/null
+++ b/src/test/test-location-strict2/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+}
diff --git a/src/test/test-location-strict3/README b/src/test/test-location-strict3/README
new file mode 100644
index 0000000..5aced39
--- /dev/null
+++ b/src/test/test-location-strict3/README
@@ -0,0 +1,10 @@
+Test whether a service in a restricted group with only one node member will
+stay in recovery in case of a failover of its previously assigned node.
+
+The test scenario is:
+- vm:101 must be kept on node3
+- vm:101 is currently running on node3
+
+The expected outcome is:
+- As node3 fails, vm:101 stays in recovery since there's no available node
+ member left in the restricted group
diff --git a/src/test/test-location-strict3/cmdlist b/src/test/test-location-strict3/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-location-strict3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-location-strict3/groups b/src/test/test-location-strict3/groups
new file mode 100644
index 0000000..370865f
--- /dev/null
+++ b/src/test/test-location-strict3/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node3
+ restricted 1
diff --git a/src/test/test-location-strict3/hardware_status b/src/test/test-location-strict3/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-strict3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-strict3/log.expect b/src/test/test-location-strict3/log.expect
new file mode 100644
index 0000000..47f9776
--- /dev/null
+++ b/src/test/test-location-strict3/log.expect
@@ -0,0 +1,74 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+err 240 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 260 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 280 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 300 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 320 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 340 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 360 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 380 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 400 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 420 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 440 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 460 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 480 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 500 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 520 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 540 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 560 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 580 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 600 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 620 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 640 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 660 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 680 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+err 700 node1/crm: recovering service 'vm:101' from fenced node 'node3' failed, no recovery node found
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-location-strict3/manager_status b/src/test/test-location-strict3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-location-strict3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-location-strict3/service_config b/src/test/test-location-strict3/service_config
new file mode 100644
index 0000000..9adf02c
--- /dev/null
+++ b/src/test/test-location-strict3/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-location-strict4/README b/src/test/test-location-strict4/README
new file mode 100644
index 0000000..25ded53
--- /dev/null
+++ b/src/test/test-location-strict4/README
@@ -0,0 +1,14 @@
+Test whether a service in a restricted group with two node members will stay
+assigned to one of the node members in case of a failover of its previously
+assigned node.
+
+The test scenario is:
+- vm:101 must be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher service count than node1 to test whether the restriction
+ to node2 and node3 is applied even though the scheduler would prefer the less
+ utilized node1
+
+The expected outcome is:
+- As node3 fails, vm:101 is migrated to node2, as it's the only available node
+ left in the restricted group
diff --git a/src/test/test-location-strict4/cmdlist b/src/test/test-location-strict4/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-location-strict4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-location-strict4/groups b/src/test/test-location-strict4/groups
new file mode 100644
index 0000000..0ad2abc
--- /dev/null
+++ b/src/test/test-location-strict4/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node2,node3
+ restricted 1
diff --git a/src/test/test-location-strict4/hardware_status b/src/test/test-location-strict4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-strict4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-strict4/log.expect b/src/test/test-location-strict4/log.expect
new file mode 100644
index 0000000..847e157
--- /dev/null
+++ b/src/test/test-location-strict4/log.expect
@@ -0,0 +1,54 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-location-strict4/manager_status b/src/test/test-location-strict4/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-location-strict4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-location-strict4/service_config b/src/test/test-location-strict4/service_config
new file mode 100644
index 0000000..9adf02c
--- /dev/null
+++ b/src/test/test-location-strict4/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-location-strict5/README b/src/test/test-location-strict5/README
new file mode 100644
index 0000000..a4e67f4
--- /dev/null
+++ b/src/test/test-location-strict5/README
@@ -0,0 +1,16 @@
+Test whether a service in a restricted group with two differently prioritized
+node members will stay on the node with the highest priority in case of a
+failover or when the service is on a lower-priority node.
+
+The test scenario is:
+- vm:101 must be kept on node2 or node3
+- vm:101 is currently running on node3
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As vm:101 runs on node3, it is automatically migrated to node2, as node2 has
+ a higher priority than node3
+- As node2 fails, vm:101 is migrated to node3 as node3 is the next and only
+ available node member left in the restricted group
+- As node2 comes back online, vm:101 is migrated back to node2, as node2 has a
+ higher priority than node3
diff --git a/src/test/test-location-strict5/cmdlist b/src/test/test-location-strict5/cmdlist
new file mode 100644
index 0000000..6932aa7
--- /dev/null
+++ b/src/test/test-location-strict5/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off" ],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-location-strict5/groups b/src/test/test-location-strict5/groups
new file mode 100644
index 0000000..ec3cd79
--- /dev/null
+++ b/src/test/test-location-strict5/groups
@@ -0,0 +1,3 @@
+group: must_stay_here
+ nodes node2:2,node3:1
+ restricted 1
diff --git a/src/test/test-location-strict5/hardware_status b/src/test/test-location-strict5/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-strict5/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-strict5/log.expect b/src/test/test-location-strict5/log.expect
new file mode 100644
index 0000000..a875e11
--- /dev/null
+++ b/src/test/test-location-strict5/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 20 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 25 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 43 node2/lrm: got lock 'ha_agent_node2_lock'
+info 43 node2/lrm: status change wait_for_agent_lock => active
+info 43 node2/lrm: starting service vm:101
+info 43 node2/lrm: service status vm:101 started
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 260 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 260 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 265 node3/lrm: service vm:101 - start migrate to node 'node2'
+info 265 node3/lrm: service vm:101 - end migrate to node 'node2'
+info 280 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 363 node2/lrm: got lock 'ha_agent_node2_lock'
+info 363 node2/lrm: status change wait_for_agent_lock => active
+info 363 node2/lrm: starting service vm:101
+info 363 node2/lrm: service status vm:101 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-location-strict5/manager_status b/src/test/test-location-strict5/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-location-strict5/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-location-strict5/service_config b/src/test/test-location-strict5/service_config
new file mode 100644
index 0000000..36ea15b
--- /dev/null
+++ b/src/test/test-location-strict5/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node3", "state": "started", "group": "must_stay_here" }
+}
diff --git a/src/test/test-location-strict6/README b/src/test/test-location-strict6/README
new file mode 100644
index 0000000..c558afd
--- /dev/null
+++ b/src/test/test-location-strict6/README
@@ -0,0 +1,14 @@
+Test whether a service in a restricted group with nofailback enabled and two
+differently prioritized node members will stay on the current node without
+migrating back to the highest priority node.
+
+The test scenario is:
+- vm:101 must be kept on node2 or node3
+- vm:101 is currently running on node2
+- node2 has a higher priority than node3
+
+The expected outcome is:
+- As node2 fails, vm:101 is migrated to node3 as it is the only available node
+ member left in the restricted group
+- As node2 comes back online, vm:101 stays on node3; even though node2 has a
+  higher priority, the nofailback flag prevents vm:101 from migrating back to node2
diff --git a/src/test/test-location-strict6/cmdlist b/src/test/test-location-strict6/cmdlist
new file mode 100644
index 0000000..4dd33cc
--- /dev/null
+++ b/src/test/test-location-strict6/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node2 off"],
+ [ "power node2 on", "network node2 on" ]
+]
diff --git a/src/test/test-location-strict6/groups b/src/test/test-location-strict6/groups
new file mode 100644
index 0000000..cdd0e50
--- /dev/null
+++ b/src/test/test-location-strict6/groups
@@ -0,0 +1,4 @@
+group: must_stay_here
+ nodes node2:2,node3:1
+ restricted 1
+ nofailback 1
diff --git a/src/test/test-location-strict6/hardware_status b/src/test/test-location-strict6/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-location-strict6/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-location-strict6/log.expect b/src/test/test-location-strict6/log.expect
new file mode 100644
index 0000000..bcb472b
--- /dev/null
+++ b/src/test/test-location-strict6/log.expect
@@ -0,0 +1,52 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node2 off
+info 120 node1/crm: node 'node2': state changed from 'online' => 'unknown'
+info 122 node2/crm: status change slave => wait_for_quorum
+info 123 node2/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node2'
+info 164 watchdog: execute power node2 off
+info 163 node2/crm: killed by poweroff
+info 164 node2/lrm: killed by poweroff
+info 164 hardware: server 'node2' stopped by poweroff (watchdog)
+info 220 cmdlist: execute power node2 on
+info 220 node2/crm: status change startup => wait_for_quorum
+info 220 node2/lrm: status change startup => wait_for_agent_lock
+info 220 cmdlist: execute network node2 on
+info 240 node1/crm: got lock 'ha_agent_node2_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node2' to node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node3)
+info 242 node2/crm: status change wait_for_quorum => slave
+info 245 node3/lrm: got lock 'ha_agent_node3_lock'
+info 245 node3/lrm: status change wait_for_agent_lock => active
+info 245 node3/lrm: starting service vm:101
+info 245 node3/lrm: service status vm:101 started
+info 260 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-location-strict6/manager_status b/src/test/test-location-strict6/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-location-strict6/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-location-strict6/service_config b/src/test/test-location-strict6/service_config
new file mode 100644
index 0000000..1d371e1
--- /dev/null
+++ b/src/test/test-location-strict6/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node2", "state": "started", "group": "must_stay_here" }
+}
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 10/26] resources: introduce failback property in service config
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (12 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 09/26] test: ha tester: add test cases for future location rules Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 11/26] manager: migrate ha groups to location rules in-memory Daniel Kral
` (28 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add the failback property to the service config, which is functionally
equivalent to the negation of the HA group's nofailback property.
It is enabled by default, since the HA group's nofailback property was
disabled by default.
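For illustration only, and not part of this patch: a minimal sketch of how
the new per-service default and the old group-level flag relate to each
other. The $service and $group hashes below are hypothetical examples.

    # hypothetical service and group entries, for illustration only
    my $service = { node => 'node2', state => 'started' };
    my $group   = { nofailback => 1 };

    # new default: failback is enabled unless explicitly disabled,
    # mirroring the old default of nofailback being disabled
    $service->{failback} = 1 if !defined($service->{failback});

    # when derived from an existing group, failback is simply the negation
    $service->{failback} = $group->{nofailback} ? 0 : 1;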
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/HA/Config.pm | 1 +
src/PVE/HA/Resources.pm | 8 ++++++++
src/PVE/HA/Resources/PVECT.pm | 1 +
src/PVE/HA/Resources/PVEVM.pm | 1 +
src/PVE/HA/Sim/Hardware.pm | 1 +
src/test/test_failover1.pl | 1 +
6 files changed, 13 insertions(+)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 012ae16..1b67443 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -116,6 +116,7 @@ sub read_and_check_resources_config {
my (undef, undef, $name) = parse_sid($sid);
$d->{state} = 'started' if !defined($d->{state});
$d->{state} = 'started' if $d->{state} eq 'enabled'; # backward compatibility
+ $d->{failback} = 1 if !defined($d->{failback});
$d->{max_restart} = 1 if !defined($d->{max_restart});
$d->{max_relocate} = 1 if !defined($d->{max_relocate});
if (PVE::HA::Resources->lookup($d->{type})) {
diff --git a/src/PVE/HA/Resources.pm b/src/PVE/HA/Resources.pm
index 873387e..90410a9 100644
--- a/src/PVE/HA/Resources.pm
+++ b/src/PVE/HA/Resources.pm
@@ -62,6 +62,14 @@ EODESC
completion => \&PVE::HA::Tools::complete_group,
},
),
+ failback => {
+ description => "Automatically migrate service to the node with the highest priority"
+ . " according to their location rules, if a node with a higher priority than the"
+ . " current node comes online.",
+ type => 'boolean',
+ optional => 1,
+ default => 1,
+ },
max_restart => {
description => "Maximal number of tries to restart the service on"
. " a node after its start failed.",
diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
index d1ab679..44644d9 100644
--- a/src/PVE/HA/Resources/PVECT.pm
+++ b/src/PVE/HA/Resources/PVECT.pm
@@ -37,6 +37,7 @@ sub options {
state => { optional => 1 },
group => { optional => 1 },
comment => { optional => 1 },
+ failback => { optional => 1 },
max_restart => { optional => 1 },
max_relocate => { optional => 1 },
};
diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
index fe65577..e634fe3 100644
--- a/src/PVE/HA/Resources/PVEVM.pm
+++ b/src/PVE/HA/Resources/PVEVM.pm
@@ -37,6 +37,7 @@ sub options {
state => { optional => 1 },
group => { optional => 1 },
comment => { optional => 1 },
+ failback => { optional => 1 },
max_restart => { optional => 1 },
max_relocate => { optional => 1 },
};
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 89dbdfa..579be2a 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -106,6 +106,7 @@ sub read_service_config {
}
$d->{state} = 'disabled' if !$d->{state};
$d->{state} = 'started' if $d->{state} eq 'enabled'; # backward compatibility
+ $d->{failback} = 1 if !defined($d->{failback});
$d->{max_restart} = 1 if !defined($d->{max_restart});
$d->{max_relocate} = 1 if !defined($d->{max_relocate});
}
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 90f5cf4..90bd61a 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -22,6 +22,7 @@ $online_node_usage->add_node("node3");
my $service_conf = {
node => 'node1',
group => 'prefer_node1',
+ failback => 1,
};
my $sd = {
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 11/26] manager: migrate ha groups to location rules in-memory
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (13 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 10/26] resources: introduce failback property in service config Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 12/26] manager: apply location rules when selecting service nodes Daniel Kral
` (27 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Migrate the currently configured HA groups to HA location rules in
memory if the use-location-rules feature flag isn't set, so that they
can be applied as such in the next patches and therefore replace HA
groups internally.
Also ignore location rules written to the rules config if the
use-location-rules feature flag isn't set; HA groups and HA location
rules should be mutually exclusive, so users can only use one of the
two at a time.
Location rules in their initial implementation are designed to be as
restrictive as HA groups, i.e. only allow a service to be used in a
single location rule, to ease the migration between them.
HA groups map directly to location rules, except that the 'restricted'
property is renamed to 'strict', so the name is the same as for
colocation rules, and that the 'nofailback' property becomes the
negated, per-service 'failback' property in the service config.
Moving the failback behavior to the service config allows users to set
it more granularly for individual services and keeps location rules
extensible in the future, e.g. multiple location rules for a single
service.
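For illustration only, a rough sketch (with hypothetical data) of what the
in-memory migration produces for a group that is referenced by a single
service:

    # hypothetical HA group as read from the group config
    my $group = {
        nodes      => { 'node2:2' => 1, 'node3:1' => 1 },
        restricted => 1,
        nofailback => 0,
    };

    # resulting location rule, keyed by a generated, '_'-prefixed rule id
    my $rule = {
        type     => 'location',
        services => { 'vm:101' => 1 },
        nodes    => {
            node2 => { priority => 2 },
            node3 => { priority => 1 },
        },
        strict   => 1,    # was 'restricted' in the group
    };

    # while the group's nofailback moves to the service config as its negation
    my $service = { node => 'node3', state => 'started', failback => 1 };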
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/HA/Config.pm | 3 ++-
src/PVE/HA/Groups.pm | 48 ++++++++++++++++++++++++++++++++++++
src/PVE/HA/Manager.pm | 24 ++++++++++++++++--
src/PVE/HA/Rules/Location.pm | 17 +++++++++++++
4 files changed, 89 insertions(+), 3 deletions(-)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 1b67443..2b3d726 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -131,7 +131,8 @@ sub read_and_check_resources_config {
}
}
- return $conf;
+ # TODO PVE 10: Remove digest when HA groups have been fully migrated to location rules
+ return wantarray ? ($conf, $res->{digest}) : $conf;
}
sub update_resources_config {
diff --git a/src/PVE/HA/Groups.pm b/src/PVE/HA/Groups.pm
index 821d969..d0f5721 100644
--- a/src/PVE/HA/Groups.pm
+++ b/src/PVE/HA/Groups.pm
@@ -107,4 +107,52 @@ sub parse_section_header {
__PACKAGE__->register();
__PACKAGE__->init();
+# Migrate nofailback from group config to service config
+sub migrate_groups_to_services {
+ my ($groups, $services) = @_;
+
+ for my $sid (keys %$services) {
+ my $groupid = $services->{$sid}->{group}
+ or next; # skip services without groups
+
+ # name it 'failback' to remove the double negation
+ $services->{$sid}->{failback} = !$groups->{ids}->{$groupid}->{nofailback};
+ }
+}
+
+# Migrate groups from group config to location rules in rules config
+sub migrate_groups_to_rules {
+ my ($rules, $groups, $services) = @_;
+
+ my $order = (sort { $a <=> $b } values %{ $rules->{order} })[0] || 0;
+
+ my $group_services = {};
+
+ for my $sid (keys %$services) {
+ my $groupid = $services->{$sid}->{group}
+ or next; # skip services without groups
+
+ $group_services->{$groupid}->{$sid} = 1;
+ }
+
+ while (my ($group, $services) = each %$group_services) {
+ # prefix generated rules with '_' as those are not allowed for user-provided ruleids
+ my $ruleid = "_group_$group";
+ my $nodes = {};
+ for my $entry (keys %{ $groups->{ids}->{$group}->{nodes} }) {
+ my ($node, $priority) = PVE::HA::Tools::parse_node_priority($entry);
+
+ $nodes->{$node} = { priority => $priority };
+ }
+
+ $rules->{order}->{$ruleid} = ++$order;
+ $rules->{ids}->{$ruleid} = {
+ type => 'location',
+ services => $services,
+ nodes => $nodes,
+ strict => $groups->{ids}->{$group}->{restricted},
+ };
+ }
+}
+
1;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 08c2fd3..5ae8da1 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -6,6 +6,7 @@ use warnings;
use Digest::MD5 qw(md5_base64);
use PVE::Tools;
+use PVE::HA::Groups;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
use PVE::HA::Rules;
@@ -48,6 +49,8 @@ sub new {
haenv => $haenv,
crs => {},
last_rules_digest => '',
+ last_groups_digest => '',
+ last_services_digest => '',
}, $class;
my $old_ms = $haenv->read_manager_status();
@@ -530,7 +533,7 @@ sub manage {
$self->update_crs_scheduler_mode();
- my $sc = $haenv->read_service_config();
+ my ($sc, $services_digest) = $haenv->read_service_config();
$self->{groups} = $haenv->read_group_config(); # update
@@ -565,7 +568,22 @@ sub manage {
my $new_rules = $haenv->read_rules_config();
- if ($new_rules->{digest} ne $self->{last_rules_digest}) {
+ my $dc_cfg = $haenv->get_datacenter_settings();
+ my $use_location_rules = $dc_cfg->{ha}->{'use-location-rules'};
+
+ # TODO PVE 10: Remove group migration when HA groups have been fully migrated to location rules
+ PVE::HA::Groups::migrate_groups_to_services($self->{groups}, $sc) if !$use_location_rules;
+
+ if (
+ $new_rules->{digest} ne $self->{last_rules_digest}
+ || $self->{groups}->{digest} ne $self->{last_groups_digest}
+ || $services_digest && $services_digest ne $self->{last_services_digest}
+ ) {
+
+ if (!$use_location_rules) {
+ PVE::HA::Rules::Location::delete_location_rules($new_rules);
+ PVE::HA::Groups::migrate_groups_to_rules($new_rules, $self->{groups}, $sc);
+ }
my $messages = PVE::HA::Rules->canonicalize($new_rules);
$haenv->log('info', $_) for @$messages;
@@ -573,6 +591,8 @@ sub manage {
$self->{rules} = $new_rules;
$self->{last_rules_digest} = $self->{rules}->{digest};
+ $self->{last_groups_digest} = $self->{groups}->{digest};
+ $self->{last_services_digest} = $services_digest;
}
$self->update_crm_commands();
diff --git a/src/PVE/HA/Rules/Location.pm b/src/PVE/HA/Rules/Location.pm
index 67f0b32..b9f76c7 100644
--- a/src/PVE/HA/Rules/Location.pm
+++ b/src/PVE/HA/Rules/Location.pm
@@ -203,4 +203,21 @@ __PACKAGE__->register_check(
},
);
+=head1 LOCATION RULE HELPERS
+
+=cut
+
+# Remove location rules from rules config
+# TODO PVE 10: Can be removed if use-location-rules feature flag is not needed anymore
+sub delete_location_rules {
+ my ($rules) = @_;
+
+ for my $ruleid (keys %{ $rules->{ids} }) {
+ next if $rules->{ids}->{$ruleid}->{type} ne 'location';
+
+ delete $rules->{ids}->{$ruleid};
+ delete $rules->{order}->{$ruleid};
+ }
+}
+
1;
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 12/26] manager: apply location rules when selecting service nodes
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (14 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 11/26] manager: migrate ha groups to location rules in-memory Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 13/26] usage: add information about a service's assigned nodes Daniel Kral
` (26 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Replace the HA group mechanism with the functionally equivalent
location rules' get_location_preference(...), which enforces the
location rules defined in the rules config.
This allows the $groups parameter to be replaced with the $rules
parameter in select_service_node(...) as all behavior of the HA groups
is now encoded in $service_conf and $rules, and $rules will also be
shared with colocation rules in the next patch.
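For illustration only, a sketch of a call shaped like the new signature
documented below; the concrete variables are placeholders:

    # pick the best node for $sid based on rules, utilization and state
    my $node = select_service_node(
        $rules,               # rules config, previously the groups config
        $online_node_usage,   # current node utilization
        $sid,                 # service id, e.g. 'vm:101'
        $service_conf,        # service configuration, includes 'failback'
        $sd,                  # service state (node, failed_nodes, ...)
        'none',               # selection mode
    );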
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/HA/Manager.pm | 81 +++++++-----------------------------
src/PVE/HA/Rules/Location.pm | 79 +++++++++++++++++++++++++++++++++++
src/test/test_failover1.pl | 18 +++++---
3 files changed, 107 insertions(+), 71 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 5ae8da1..00efc7c 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -10,7 +10,7 @@ use PVE::HA::Groups;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
use PVE::HA::Rules;
-use PVE::HA::Rules::Location;
+use PVE::HA::Rules::Location qw(get_location_preference);
use PVE::HA::Rules::Colocation;
use PVE::HA::Usage::Basic;
use PVE::HA::Usage::Static;
@@ -115,57 +115,13 @@ sub flush_master_status {
$haenv->write_manager_status($ms);
}
-sub get_service_group {
- my ($groups, $online_node_usage, $service_conf) = @_;
-
- my $group = {};
- # add all online nodes to default group to allow try_next when no group set
- $group->{nodes}->{$_} = 1 for $online_node_usage->list_nodes();
-
- # overwrite default if service is bound to a specific group
- if (my $group_id = $service_conf->{group}) {
- $group = $groups->{ids}->{$group_id} if $groups->{ids}->{$group_id};
- }
-
- return $group;
-}
-
-# groups available nodes with their priority as group index
-sub get_node_priority_groups {
- my ($group, $online_node_usage) = @_;
-
- my $pri_groups = {};
- my $group_members = {};
- foreach my $entry (keys %{ $group->{nodes} }) {
- my ($node, $pri) = ($entry, 0);
- if ($entry =~ m/^(\S+):(\d+)$/) {
- ($node, $pri) = ($1, $2);
- }
- next if !$online_node_usage->contains_node($node); # offline
- $pri_groups->{$pri}->{$node} = 1;
- $group_members->{$node} = $pri;
- }
-
- # add non-group members to unrestricted groups (priority -1)
- if (!$group->{restricted}) {
- my $pri = -1;
- for my $node ($online_node_usage->list_nodes()) {
- next if defined($group_members->{$node});
- $pri_groups->{$pri}->{$node} = 1;
- $group_members->{$node} = -1;
- }
- }
-
- return ($pri_groups, $group_members);
-}
-
=head3 select_service_node(...)
-=head3 select_service_node($groups, $online_node_usage, $sid, $service_conf, $sd, $mode)
+=head3 select_service_node($rules, $online_node_usage, $sid, $service_conf, $sd, $mode)
Used to select the best fitting node for the service C<$sid>, with the
-configuration C<$service_conf> and state C<$sd>, according to the groups defined
-in C<$groups>, available node utilization in C<$online_node_usage>, and the
+configuration C<$service_conf> and state C<$sd>, according to the rules defined
+in C<$rules>, available node utilization in C<$online_node_usage>, and the
given C<$mode>.
The C<$mode> can be set to:
@@ -190,43 +146,36 @@ while trying to stay on the current node.
=cut
sub select_service_node {
- my ($groups, $online_node_usage, $sid, $service_conf, $sd, $mode) = @_;
+ my ($rules, $online_node_usage, $sid, $service_conf, $sd, $mode) = @_;
my ($current_node, $tried_nodes, $maintenance_fallback) =
$sd->@{qw(node failed_nodes maintenance_node)};
- my $group = get_service_group($groups, $online_node_usage, $service_conf);
+ my ($allowed_nodes, $pri_nodes) = get_location_preference($rules, $sid, $online_node_usage);
- my ($pri_groups, $group_members) = get_node_priority_groups($group, $online_node_usage);
-
- my @pri_list = sort { $b <=> $a } keys %$pri_groups;
- return undef if !scalar(@pri_list);
+ return undef if !%$pri_nodes;
# stay on current node if possible (avoids random migrations)
- if ($mode eq 'none' && $group->{nofailback} && defined($group_members->{$current_node})) {
+ if ($mode eq 'none' && !$service_conf->{failback} && $allowed_nodes->{$current_node}) {
return $current_node;
}
- # select node from top priority node list
-
- my $top_pri = $pri_list[0];
-
# try to avoid nodes where the service failed already if we want to relocate
if ($mode eq 'try-next') {
foreach my $node (@$tried_nodes) {
- delete $pri_groups->{$top_pri}->{$node};
+ delete $pri_nodes->{$node};
}
}
return $maintenance_fallback
- if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
+ if defined($maintenance_fallback) && $pri_nodes->{$maintenance_fallback};
- return $current_node if $mode eq 'none' && $pri_groups->{$top_pri}->{$current_node};
+ return $current_node if $mode eq 'none' && $pri_nodes->{$current_node};
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
$scores->{$a} <=> $scores->{$b} || $a cmp $b
- } keys %{ $pri_groups->{$top_pri} };
+ } keys %$pri_nodes;
my $found;
for (my $i = scalar(@nodes) - 1; $i >= 0; $i--) {
@@ -850,7 +799,7 @@ sub next_state_request_start {
if ($self->{crs}->{rebalance_on_request_start}) {
my $selected_node = select_service_node(
- $self->{groups},
+ $self->{rules},
$self->{online_node_usage},
$sid,
$cd,
@@ -1017,7 +966,7 @@ sub next_state_started {
}
my $node = select_service_node(
- $self->{groups},
+ $self->{rules},
$self->{online_node_usage},
$sid,
$cd,
@@ -1135,7 +1084,7 @@ sub next_state_recovery {
$self->recompute_online_node_usage(); # we want the most current node state
my $recovery_node = select_service_node(
- $self->{groups},
+ $self->{rules},
$self->{online_node_usage},
$sid,
$cd,
diff --git a/src/PVE/HA/Rules/Location.pm b/src/PVE/HA/Rules/Location.pm
index b9f76c7..4e27174 100644
--- a/src/PVE/HA/Rules/Location.pm
+++ b/src/PVE/HA/Rules/Location.pm
@@ -12,8 +12,13 @@ use PVE::Tools;
use PVE::HA::Rules;
use PVE::HA::Tools;
+use base qw(Exporter);
use base qw(PVE::HA::Rules);
+our @EXPORT_OK = qw(
+ get_location_preference
+);
+
=head1 NAME
PVE::HA::Rules::Location
@@ -207,6 +212,80 @@ __PACKAGE__->register_check(
=cut
+my $get_service_location_rule = sub {
+ my ($rules, $sid) = @_;
+
+ # with the current restriction a service can only be in one location rule
+ my $location_rule;
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule) = @_;
+
+ $location_rule = dclone($rule) if !$location_rule;
+ },
+ {
+ sid => $sid,
+ type => 'location',
+ state => 'enabled',
+ },
+ );
+
+ return $location_rule;
+};
+
+=head3 get_location_preference($rules, $sid, $online_node_usage)
+
+Returns a list of two hashes, where each is a hash set of the location
+preference of C<$sid>, according to the location rules in C<$rules> and the
+available nodes in C<$online_node_usage>.
+
+The first hash is a hash set of available nodes, i.e. nodes that the service
+C<$sid> is allowed to be assigned to, and the second hash is a hash set of
+preferred nodes, i.e. nodes that the service C<$sid> should be assigned to.
+
+If there are no available nodes at all, returns C<undef>.
+
+=cut
+
+sub get_location_preference : prototype($$$) {
+ my ($rules, $sid, $online_node_usage) = @_;
+
+ my $location_rule = $get_service_location_rule->($rules, $sid);
+
+ # default to a location rule with all available nodes
+ if (!$location_rule) {
+ for my $node ($online_node_usage->list_nodes()) {
+ $location_rule->{nodes}->{$node} = { priority => 0 };
+ }
+ }
+
+ # add remaining nodes with low priority for non-strict location rules
+ if (!$location_rule->{strict}) {
+ for my $node ($online_node_usage->list_nodes()) {
+ next if defined($location_rule->{nodes}->{$node});
+
+ $location_rule->{nodes}->{$node} = { priority => -1 };
+ }
+ }
+
+ my $allowed_nodes = {};
+ my $prioritized_nodes = {};
+
+ while (my ($node, $props) = each %{ $location_rule->{nodes} }) {
+ next if !$online_node_usage->contains_node($node); # node is offline
+
+ $allowed_nodes->{$node} = 1;
+ $prioritized_nodes->{ $props->{priority} }->{$node} = 1;
+ }
+
+ my $preferred_nodes = {};
+ my $highest_priority = (sort { $b <=> $a } keys %$prioritized_nodes)[0];
+ $preferred_nodes = $prioritized_nodes->{$highest_priority} if defined($highest_priority);
+
+ return ($allowed_nodes, $preferred_nodes);
+}
+
# Remove location rules from rules config
# TODO PVE 10: Can be removed if use-location-rules feature flag is not needed anymore
sub delete_location_rules {
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 90bd61a..412d88d 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -4,12 +4,21 @@ use strict;
use warnings;
use lib '..';
-use PVE::HA::Groups;
use PVE::HA::Manager;
use PVE::HA::Usage::Basic;
-my $groups = PVE::HA::Groups->parse_config("groups.tmp", <<EOD);
-group: prefer_node1
+use PVE::HA::Rules;
+use PVE::HA::Rules::Location;
+use PVE::HA::Rules::Colocation;
+
+PVE::HA::Rules::Location->register();
+PVE::HA::Rules::Colocation->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
+my $rules = PVE::HA::Rules->parse_config("rules.tmp", <<EOD);
+location: prefer_node1
+ services vm:111
nodes node1
EOD
@@ -21,7 +30,6 @@ $online_node_usage->add_node("node3");
my $service_conf = {
node => 'node1',
- group => 'prefer_node1',
failback => 1,
};
@@ -37,7 +45,7 @@ sub test {
my $select_mode = $try_next ? 'try-next' : 'none';
my $node = PVE::HA::Manager::select_service_node(
- $groups, $online_node_usage, "vm:111", $service_conf, $sd, $select_mode,
+ $rules, $online_node_usage, "vm:111", $service_conf, $sd, $select_mode,
);
my (undef, undef, $line) = caller();
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 13/26] usage: add information about a service's assigned nodes
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (15 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 12/26] manager: apply location rules when selecting service nodes Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 14/26] manager: apply colocation rules when selecting service nodes Daniel Kral
` (25 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
This will be used to retrieve the nodes which a service currently puts load
on and whose resources it uses, when dealing with colocation rules in
select_service_node(...). For example, a migrating service in a negative
colocation relationship needs to block other negatively colocated services
from migrating to both its source and its target node.
This is implemented here because the service's usage of the nodes is
currently best encoded in recompute_online_node_usage(...) and other call
sites of add_service_usage_to_node(...).
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- let these be in Usage as it would introduce a second/third source
of truth where the services are supposed to be; this should
possibly be refactored in a follow-up (e.g. when making service
state/node load changes more granular)
- use services-nodes key for both Usage implementations
- replace `pin_service_node(...)` with `set_service_node(...)`
- introduce `add_service_node(...)` and allow multiple nodes in the
services-nodes hash
- make adds to the services-nodes hash more explicit with direct
calls to `{add,set}_service_node(...)` instead of being in
`add_service_usage_to_node(...)`
src/PVE/HA/Manager.pm | 17 +++++++++++++----
src/PVE/HA/Usage.pm | 18 ++++++++++++++++++
src/PVE/HA/Usage/Basic.pm | 19 +++++++++++++++++++
src/PVE/HA/Usage/Static.pm | 19 +++++++++++++++++++
4 files changed, 69 insertions(+), 4 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 00efc7c..4c7228e 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -258,6 +258,7 @@ sub recompute_online_node_usage {
my $sd = $self->{ss}->{$sid};
my $state = $sd->{state};
my $target = $sd->{target}; # optional
+
if ($online_node_usage->contains_node($sd->{node})) {
if (
$state eq 'started'
@@ -268,6 +269,7 @@ sub recompute_online_node_usage {
|| $state eq 'recovery'
) {
$online_node_usage->add_service_usage_to_node($sd->{node}, $sid, $sd->{node});
+ $online_node_usage->set_service_node($sid, $sd->{node});
} elsif (
$state eq 'migrate'
|| $state eq 'relocate'
@@ -275,10 +277,14 @@ sub recompute_online_node_usage {
) {
my $source = $sd->{node};
# count it for both, source and target as load is put on both
- $online_node_usage->add_service_usage_to_node($source, $sid, $source, $target)
- if $state ne 'request_start_balance';
- $online_node_usage->add_service_usage_to_node($target, $sid, $source, $target)
- if $online_node_usage->contains_node($target);
+ if ($state ne 'request_start_balance') {
+ $online_node_usage->add_service_usage_to_node($source, $sid, $source, $target);
+ $online_node_usage->add_service_node($sid, $source);
+ }
+ if ($online_node_usage->contains_node($target)) {
+ $online_node_usage->add_service_usage_to_node($target, $sid, $source, $target);
+ $online_node_usage->add_service_node($sid, $target);
+ }
} elsif ($state eq 'stopped' || $state eq 'request_start') {
# do nothing
} else {
@@ -290,6 +296,7 @@ sub recompute_online_node_usage {
# case a node dies, as we cannot really know if the to-be-aborted incoming migration
# has already cleaned up all used resources
$online_node_usage->add_service_usage_to_node($target, $sid, $sd->{node}, $target);
+ $online_node_usage->set_service_node($sid, $target);
}
}
}
@@ -976,6 +983,7 @@ sub next_state_started {
if ($node && ($sd->{node} ne $node)) {
$self->{online_node_usage}->add_service_usage_to_node($node, $sid, $sd->{node});
+ $self->{online_node_usage}->add_service_node($sid, $node);
if (defined(my $fallback = $sd->{maintenance_node})) {
if ($node eq $fallback) {
@@ -1104,6 +1112,7 @@ sub next_state_recovery {
$haenv->steal_service($sid, $sd->{node}, $recovery_node);
$self->{online_node_usage}->add_service_usage_to_node($recovery_node, $sid, $recovery_node);
+ $self->{online_node_usage}->add_service_node($sid, $recovery_node);
# NOTE: $sd *is normally read-only*, fencing is the exception
$cd->{node} = $sd->{node} = $recovery_node;
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 66d9572..7f4d9ca 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -27,6 +27,24 @@ sub list_nodes {
die "implement in subclass";
}
+sub get_service_nodes {
+ my ($self, $sid) = @_;
+
+ die "implement in subclass";
+}
+
+sub set_service_node {
+ my ($self, $sid, $nodename) = @_;
+
+ die "implement in subclass";
+}
+
+sub add_service_node {
+ my ($self, $sid, $nodename) = @_;
+
+ die "implement in subclass";
+}
+
sub contains_node {
my ($self, $nodename) = @_;
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
index ead08c5..afe3733 100644
--- a/src/PVE/HA/Usage/Basic.pm
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -11,6 +11,7 @@ sub new {
return bless {
nodes => {},
haenv => $haenv,
+ 'service-nodes' => {},
}, $class;
}
@@ -38,6 +39,24 @@ sub contains_node {
return defined($self->{nodes}->{$nodename});
}
+sub get_service_nodes {
+ my ($self, $sid) = @_;
+
+ return $self->{'service-nodes'}->{$sid};
+}
+
+sub set_service_node {
+ my ($self, $sid, $nodename) = @_;
+
+ $self->{'service-nodes'}->{$sid} = [$nodename];
+}
+
+sub add_service_node {
+ my ($self, $sid, $nodename) = @_;
+
+ push @{ $self->{'service-nodes'}->{$sid} }, $nodename;
+}
+
sub add_service_usage_to_node {
my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index 061e74a..6707a54 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -22,6 +22,7 @@ sub new {
'service-stats' => {},
haenv => $haenv,
scheduler => $scheduler,
+ 'service-nodes' => {},
'service-counts' => {}, # Service count on each node. Fallback if scoring calculation fails.
}, $class;
}
@@ -86,6 +87,24 @@ my sub get_service_usage {
return $service_stats;
}
+sub get_service_nodes {
+ my ($self, $sid) = @_;
+
+ return $self->{'service-nodes'}->{$sid};
+}
+
+sub set_service_node {
+ my ($self, $sid, $nodename) = @_;
+
+ $self->{'service-nodes'}->{$sid} = [$nodename];
+}
+
+sub add_service_node {
+ my ($self, $sid, $nodename) = @_;
+
+ push @{ $self->{'service-nodes'}->{$sid} }, $nodename;
+}
+
sub add_service_usage_to_node {
my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 14/26] manager: apply colocation rules when selecting service nodes
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (16 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 13/26] usage: add information about a service's assigned nodes Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 15/26] manager: handle migrations for colocated services Daniel Kral
` (24 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add a mechanism to the node selection subroutine that enforces the
colocation rules defined in the rules config.
The algorithm modifies the set of candidate nodes in place so that the final
set contains only the nodes on which the colocation rules allow the service
to run, depending on the affinity type of the colocation rules.
The meaning of the service's failback property also changes slightly,
because it now controls how services behave for colocation rules as well,
not only for location rules.
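To illustrate the pruning with the example values from the documentation
added below (three positively colocated services on node2, negatively
colocated services on node1 and node3; node4 is just an extra free node for
illustration):

    my $together  = { node2 => 3 };
    my $separate  = { node1 => 1, node3 => 1 };
    my $pri_nodes = { node1 => 1, node2 => 1, node3 => 1, node4 => 1 };

    apply_positive_colocation_rules($together, $pri_nodes);
    # $pri_nodes = { node2 => 1 } - only the most populated 'together' node is kept
    apply_negative_colocation_rules($separate, $pri_nodes);
    # $pri_nodes = { node2 => 1 } - node1 and node3 would be removed here anyway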
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- added documentation
- moved apply helpers from Manager.pm into Colocation rule plugin
- dropped loose colocations, so I could shrink down the apply
helpers to only a few lines
- dropped `get_colocated_services(...)` from this patch (which will
be introduced in another version in a later patch), and merged its
logic into `get_colocation_preference(...)`
- fix bug when positively colocated services are on different nodes
(e.g. when newly creating a rule for these), then they could still
favor their current node, because then multiple nodes are in the
$together hash set. For now, just select for all of them the node
which is the most populated by all other pos. colocated services;
this can be improved in a follow-up to check which node has the
resources for all of them, for example
- introduce `is_allowed_on_node(...)` helper to check in 'none' mode
whether the current node is compliant with the colocation rules
src/PVE/HA/Manager.pm | 15 +++-
src/PVE/HA/Resources.pm | 3 +-
src/PVE/HA/Rules/Colocation.pm | 151 +++++++++++++++++++++++++++++++++
3 files changed, 166 insertions(+), 3 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 4c7228e..a69898b 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -11,7 +11,8 @@ use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
use PVE::HA::Rules;
use PVE::HA::Rules::Location qw(get_location_preference);
-use PVE::HA::Rules::Colocation;
+use PVE::HA::Rules::Colocation
+ qw(get_colocation_preference apply_positive_colocation_rules apply_negative_colocation_rules);
use PVE::HA::Usage::Basic;
use PVE::HA::Usage::Static;
@@ -155,8 +156,15 @@ sub select_service_node {
return undef if !%$pri_nodes;
+ my ($together, $separate) = get_colocation_preference($rules, $sid, $online_node_usage);
+
# stay on current node if possible (avoids random migrations)
- if ($mode eq 'none' && !$service_conf->{failback} && $allowed_nodes->{$current_node}) {
+ if (
+ $mode eq 'none'
+ && !$service_conf->{failback}
+ && $allowed_nodes->{$current_node}
+ && PVE::HA::Rules::Colocation::is_allowed_on_node($together, $separate, $current_node)
+ ) {
return $current_node;
}
@@ -167,6 +175,9 @@ sub select_service_node {
}
}
+ apply_positive_colocation_rules($together, $pri_nodes);
+ apply_negative_colocation_rules($separate, $pri_nodes);
+
return $maintenance_fallback
if defined($maintenance_fallback) && $pri_nodes->{$maintenance_fallback};
diff --git a/src/PVE/HA/Resources.pm b/src/PVE/HA/Resources.pm
index 90410a9..f8aad35 100644
--- a/src/PVE/HA/Resources.pm
+++ b/src/PVE/HA/Resources.pm
@@ -65,7 +65,8 @@ EODESC
failback => {
description => "Automatically migrate service to the node with the highest priority"
. " according to their location rules, if a node with a higher priority than the"
- . " current node comes online.",
+        . " current node comes online, or migrate to a node which doesn't violate any"
+ . " colocation rule.",
type => 'boolean',
optional => 1,
default => 1,
diff --git a/src/PVE/HA/Rules/Colocation.pm b/src/PVE/HA/Rules/Colocation.pm
index 0539eb3..190478e 100644
--- a/src/PVE/HA/Rules/Colocation.pm
+++ b/src/PVE/HA/Rules/Colocation.pm
@@ -7,8 +7,15 @@ use PVE::HashTools;
use PVE::HA::Rules;
+use base qw(Exporter);
use base qw(PVE::HA::Rules);
+our @EXPORT_OK = qw(
+ get_colocation_preference
+ apply_positive_colocation_rules
+ apply_negative_colocation_rules
+);
+
=head1 NAME
PVE::HA::Rules::Colocation - Colocation Plugin for HA Rules
@@ -284,4 +291,148 @@ sub plugin_canonicalize {
merge_connected_positive_colocation_rules($rules, $args->{positive_rules});
}
+=head1 COLOCATION RULE HELPERS
+
+=cut
+
+=head3 get_colocation_preference($rules, $sid, $online_node_usage)
+
+Returns a list of two hashes, where the first describes the positive colocation
+preference and the second describes the negative colocation preference for
+C<$sid> according to the colocation rules in C<$rules> and the service
+locations in C<$online_node_usage>.
+
+For the positive colocation preference of a service C<$sid>, each key in the
+hash is an online node where other positively colocated services are already
+running, and its value is the number of those services. That is, each key is
+a node where the service must be.
+
+For the negative colocation preference of a service C<$sid>, each key in the
+hash is an online node where other negatively colocated services are already
+running. That is, each key is a node where the service
+must not be.
+
+For example, if there are already three running services on node2 which the
+service C<$sid> is in a positive colocation with, and two running services on
+node1 and node3 which the service C<$sid> is in a negative colocation with,
+the returned value will be:
+
+ {
+ together => {
+ node2 => 3
+ },
+ separate => {
+ node1 => 1,
+ node3 => 1
+ }
+ }
+
+=cut
+
+sub get_colocation_preference : prototype($$$) {
+ my ($rules, $sid, $online_node_usage) = @_;
+
+ my $together = {};
+ my $separate = {};
+
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule) = @_;
+
+ for my $csid (keys %{ $rule->{services} }) {
+ next if $csid eq $sid;
+
+ my $nodes = $online_node_usage->get_service_nodes($csid);
+
+ next if !$nodes || !@$nodes; # skip unassigned nodes
+
+ if ($rule->{affinity} eq 'together') {
+ $together->{$_}++ for @$nodes;
+ } elsif ($rule->{affinity} eq 'separate') {
+ $separate->{$_} = 1 for @$nodes;
+ } else {
+ die "unimplemented colocation affinity type $rule->{affinity}\n";
+ }
+ }
+ },
+ {
+ sid => $sid,
+ type => 'colocation',
+ state => 'enabled',
+ },
+ );
+
+ return ($together, $separate);
+}
+
+=head3 is_allowed_on_node($together, $separate, $node)
+
+Checks whether the node C<$node> is allowed according to the colocation
+preference hashes C<$together> and C<$separate>, i.e. whether C<$node> is a
+node the service must be on, or at least not a node it must be kept away from.
+
+=cut
+
+sub is_allowed_on_node : prototype($$$) {
+ my ($together, $separate, $node) = @_;
+
+ return $together->{$node} || !$separate->{$node};
+}
+
+=head3 apply_positive_colocation_rules($together, $allowed_nodes)
+
+Applies the positive colocation preference C<$together> on the allowed node
+hash set C<$allowed_nodes> by modifying it directly.
+
+Positive colocation means keeping services together on a single node and
+therefore minimizing the separation of services.
+
+The allowed node hash set C<$allowed_nodes> is expected to contain all nodes,
+which are available to the service this helper is called for, i.e. each node
+is currently online, available according to other location constraints, and the
+service has not failed running there yet.
+
+=cut
+
+sub apply_positive_colocation_rules : prototype($$) {
+ my ($together, $allowed_nodes) = @_;
+
+ my @possible_nodes = sort keys $together->%*
+ or return; # nothing to do if there are no positive colocation preferences
+
+ # select the most populated node as the target node for positive colocations
+ @possible_nodes = sort { $together->{$b} <=> $together->{$a} } @possible_nodes;
+ my $majority_node = $possible_nodes[0];
+
+ for my $node (keys %$allowed_nodes) {
+ delete $allowed_nodes->{$node} if $node ne $majority_node;
+ }
+}
+
+=head3 apply_negative_colocation_rules($separate, $allowed_nodes)
+
+Applies the negative colocation preference C<$separate> on the allowed node
+hash set C<$allowed_nodes> by modifying it directly.
+
+Negative colocation means keeping services separate on multiple nodes and
+therefore maximizing the separation of services.
+
+The allowed node hash set C<$allowed_nodes> is expected to contain all nodes,
+which are available to the service this helper is called for, i.e. each node
+is currently online, available according to other location constraints, and the
+service has not failed running there yet.
+
+=cut
+
+sub apply_negative_colocation_rules : prototype($$) {
+ my ($separate, $allowed_nodes) = @_;
+
+ my $forbidden_nodes = { $separate->%* };
+
+ for my $node (keys %$forbidden_nodes) {
+ delete $allowed_nodes->{$node};
+ }
+}
+
1;
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 15/26] manager: handle migrations for colocated services
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (17 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 14/26] manager: apply colocation rules when selecting service nodes Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-27 9:10 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 16/26] sim: resources: add option to limit start and migrate tries to node Daniel Kral
` (23 subsequent siblings)
42 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Make positively colocated services migrate to the same target node as the
manually migrated service, and prevent a service from being manually
migrated to a node that already runs negatively colocated services.
The log information here only goes to the HA Manager node's syslog, so
user-facing endpoints need to implement this logic as well to give users
adequate feedback about the errors and side effects.
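Condensed, the behavior added in execute_migration(...) below boils down to
the following (simplified sketch of the actual hunks, not a literal copy):

    my ($together, $separate) = get_colocated_services($rules, $sid);

    # a single negatively colocated service on (or moving to) $target is
    # enough to refuse the whole command
    for my $csid (sort keys %$separate) {
        next if $ss->{$csid}->{node} && $ss->{$csid}->{node} ne $target;
        next if $ss->{$csid}->{target} && $ss->{$csid}->{target} ne $target;
        return;
    }

    # otherwise queue the migration and comigrate every positively colocated
    # service that is not already on (or moving to) $target
    $ss->{$sid}->{cmd} = [$task, $target];
    for my $csid (sort keys %$together) {
        next if $ss->{$csid}->{node} && $ss->{$csid}->{node} eq $target;
        next if $ss->{$csid}->{target} && $ss->{$csid}->{target} eq $target;
        $ss->{$csid}->{cmd} = [$task, $target];
    }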
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/HA/Manager.pm | 44 ++++++++++++++++++++++++++++--
src/PVE/HA/Rules/Colocation.pm | 50 ++++++++++++++++++++++++++++++++++
2 files changed, 91 insertions(+), 3 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index a69898b..66e5710 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -12,7 +12,7 @@ use PVE::HA::NodeStatus;
use PVE::HA::Rules;
use PVE::HA::Rules::Location qw(get_location_preference);
use PVE::HA::Rules::Colocation
- qw(get_colocation_preference apply_positive_colocation_rules apply_negative_colocation_rules);
+ qw(get_colocated_services get_colocation_preference apply_positive_colocation_rules apply_negative_colocation_rules);
use PVE::HA::Usage::Basic;
use PVE::HA::Usage::Static;
@@ -412,6 +412,45 @@ sub read_lrm_status {
return ($results, $modes);
}
+sub execute_migration {
+ my ($self, $cmd, $task, $sid, $target) = @_;
+
+ my ($haenv, $ss) = $self->@{qw(haenv ss)};
+
+ my ($together, $separate) = get_colocated_services($self->{rules}, $sid);
+
+ for my $csid (sort keys %$separate) {
+ next if $ss->{$csid}->{node} && $ss->{$csid}->{node} ne $target;
+ next if $ss->{$csid}->{target} && $ss->{$csid}->{target} ne $target;
+
+ $haenv->log(
+ 'err',
+ "crm command '$cmd' error - negatively colocated service '$csid' on '$target'",
+ );
+
+ return; # one negative colocation is enough to not execute migration
+ }
+
+ $haenv->log('info', "got crm command: $cmd");
+ $ss->{$sid}->{cmd} = [$task, $target];
+
+ my $services_to_migrate = [];
+ for my $csid (sort keys %$together) {
+ next if $ss->{$csid}->{node} && $ss->{$csid}->{node} eq $target;
+ next if $ss->{$csid}->{target} && $ss->{$csid}->{target} eq $target;
+
+ push @$services_to_migrate, $csid;
+ }
+
+ for my $csid (@$services_to_migrate) {
+ $haenv->log(
+ 'info',
+ "crm command '$cmd' - $task positively colocated service '$csid' to '$target'",
+ );
+ $ss->{$csid}->{cmd} = [$task, $target];
+ }
+}
+
# read new crm commands and save them into crm master status
sub update_crm_commands {
my ($self) = @_;
@@ -435,8 +474,7 @@ sub update_crm_commands {
"ignore crm command - service already on target node: $cmd",
);
} else {
- $haenv->log('info', "got crm command: $cmd");
- $ss->{$sid}->{cmd} = [$task, $node];
+ $self->execute_migration($cmd, $task, $sid, $node);
}
}
} else {
diff --git a/src/PVE/HA/Rules/Colocation.pm b/src/PVE/HA/Rules/Colocation.pm
index 190478e..45d20d0 100644
--- a/src/PVE/HA/Rules/Colocation.pm
+++ b/src/PVE/HA/Rules/Colocation.pm
@@ -11,6 +11,7 @@ use base qw(Exporter);
use base qw(PVE::HA::Rules);
our @EXPORT_OK = qw(
+ get_colocated_services
get_colocation_preference
apply_positive_colocation_rules
apply_negative_colocation_rules
@@ -295,6 +296,55 @@ sub plugin_canonicalize {
=cut
+=head3 get_colocated_services($rules, $sid)
+
+Returns a list of two hash sets, where the first hash set contains the
+positively colocated services for C<$sid>, while the second hash set contains
+the negatively colocated services for C<$sid> according to the colocation rules
+in C<$rules>.
+
+For example, if a service is in a positive colocation with C<'vm:101'> and in a
+negative colocation with C<'ct:200'> and C<'ct:201'>, the returned value will be:
+
+ {
+ together => {
+ 'vm:101' => 1
+ },
+ separate => {
+ 'ct:200' => 1,
+ 'ct:201' => 1
+ }
+ }
+
+=cut
+
+sub get_colocated_services : prototype($$) {
+ my ($rules, $sid) = @_;
+
+ my $together = {};
+ my $separate = {};
+
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule, $ruleid) = @_;
+
+ my $affinity_set = $rule->{affinity} eq 'together' ? $together : $separate;
+
+ for my $csid (sort keys %{ $rule->{services} }) {
+ $affinity_set->{$csid} = 1 if $csid ne $sid;
+ }
+ },
+ {
+ sid => $sid,
+ type => 'colocation',
+ state => 'enabled',
+ },
+ );
+
+ return ($together, $separate);
+}
+
=head3 get_colocation_preference($rules, $sid, $online_node_usage)
Returns a list of two hashes, where the first describes the positive colocation
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [PATCH ha-manager v2 15/26] manager: handle migrations for colocated services
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 15/26] manager: handle migrations for colocated services Daniel Kral
@ 2025-06-27 9:10 ` Daniel Kral
0 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-27 9:10 UTC (permalink / raw)
To: pve-devel
On 6/20/25 16:31, Daniel Kral wrote:
> +=head3 get_colocated_services($rules, $sid)
> +
> +Returns a list of two hash sets, where the first hash set contains the
> +positively colocated services for C<$sid>, while the second hash set contains
> +the negatively colocated services for C<$sid> according to the colocation rules
> +in C<$rules>.
> +
> +For example, if a service is in a positive colocation with C<'vm:101'> and in a
> +negative colocation with C<'ct:200'> and C<'ct:201'>, the returned value will be:
> +
> + {
> + together => {
> + 'vm:101' => 1
> + },
> + separate => {
> + 'ct:200' => 1,
> + 'ct:201' => 1
> + }
> + }
> +
> +=cut
> +
I'd tend to introduce another colocation rule canonicalization helper in v3,
which also makes any service that is negatively colocated with one of the
positively colocated services of a service $sid negatively colocated with
that service $sid itself. This could also be done in
get_colocated_services(...) and get_colocation_preference(...) individually,
but introducing these inferred extra rules ends up in less code.
An example could help here to understand the above better:
Services A, B, and C must be kept together
Services A and Z must be kept separate
Therefore, services B and Z must be kept separate and services C and Z
must be kept separate too.
I hope this is still intuitive enough for users (I came across it while
implementing the display of comigrated / blocking services in the web
interface), and it would prevent invalid migrations, as a migration of
service B currently allows "comigrating" service A (!) and C to the node of
service Z.
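A rough sketch of the inference step I have in mind (helper name and data
layout are just placeholders, nothing is settled for v3 yet):

    # hypothetical helper, names not settled yet
    sub infer_negative_colocations {
        my ($positive_groups, $separate_of) = @_;
        # $positive_groups: list of hash sets of services that must be together
        # $separate_of: maps a service to the hash set of services it must be
        #               kept separate from

        for my $group (@$positive_groups) {
            # collect every service that any group member must stay away from
            my $inherited = {};
            for my $sid (keys %$group) {
                $inherited->{$_} = 1 for keys %{ $separate_of->{$sid} // {} };
            }
            # and propagate those relations to all group members (symmetrically)
            for my $sid (keys %$group) {
                for my $osid (keys %$inherited) {
                    next if $group->{$osid};
                    $separate_of->{$sid}->{$osid} = 1;
                    $separate_of->{$osid}->{$sid} = 1;
                }
            }
        }
    }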
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 16/26] sim: resources: add option to limit start and migrate tries to node
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (18 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 15/26] manager: handle migrations for colocated services Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 17/26] test: ha tester: add test cases for strict negative colocation rules Daniel Kral
` (22 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add an option to the VirtFail resource's name to allow the start and migrate
failure counts to only apply on a certain node, identified by its node
number in the naming scheme.
This allows a slightly more elaborate test type, e.g. where a service can
start on any node except one specific node, on which it fails to start even
though it is expected to start there after a migration.
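For example, decoding a hypothetical service ID (not taken from an existing
test) would yield:

    # full service ID would be fa:230012
    my ($start, $migrate, $stop, $exists, $limit_to_node) = $decode_id->('230012');
    # = (3, 0, 0, 1, 2): the first three start attempts fail, but only while
    # the service runs on node2; everywhere else, and for migrate, shutdown
    # and exists(), it behaves normally.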
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- make check if retries should be done for node simpler from a regex
to a string comparison
- inline `$should_retry_action->(...)` in if statements
src/PVE/HA/Sim/Resources/VirtFail.pm | 29 +++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/src/PVE/HA/Sim/Resources/VirtFail.pm b/src/PVE/HA/Sim/Resources/VirtFail.pm
index 3b476e1..13b72dc 100644
--- a/src/PVE/HA/Sim/Resources/VirtFail.pm
+++ b/src/PVE/HA/Sim/Resources/VirtFail.pm
@@ -10,25 +10,28 @@ use base qw(PVE::HA::Sim::Resources);
# To make it more interesting we can encode some behavior in the VMID
# with the following format, where fa: is the type and a, b, c, ...
# are digits in base 10, i.e. the full service ID would be:
-# fa:abcde
+# fa:abcdef
# And the digits after the fa: type prefix would mean:
# - a: no meaning but can be used for differentiating similar resources
# - b: how many tries are needed to start correctly (0 is normal behavior) (should be set)
# - c: how many tries are needed to migrate correctly (0 is normal behavior) (should be set)
# - d: should shutdown be successful (0 = yes, anything else no) (optional)
# - e: return value of $plugin->exists() defaults to 1 if not set (optional)
+# - f: limits the constraints of b and c to the node "nodeX" (0 = apply to all nodes) (optional)
my $decode_id = sub {
my $id = shift;
- my ($start, $migrate, $stop, $exists) = $id =~ /^\d(\d)(\d)(\d)?(\d)?/g;
+ my ($start, $migrate, $stop, $exists, $limit_to_node) =
+ $id =~ /^\d(\d)(\d)(\d)?(\d)?(\d)?/g;
$start = 0 if !defined($start);
$migrate = 0 if !defined($migrate);
$stop = 0 if !defined($stop);
$exists = 1 if !defined($exists);
+ $limit_to_node = 0 if !defined($limit_to_node);
- return ($start, $migrate, $stop, $exists);
+ return ($start, $migrate, $stop, $exists, $limit_to_node);
};
my $tries = {
@@ -52,12 +55,14 @@ sub exists {
sub start {
my ($class, $haenv, $id) = @_;
- my ($start_failure_count) = &$decode_id($id);
+ my ($start_failure_count, $limit_to_node) = ($decode_id->($id))[0, 4];
- $tries->{start}->{$id} = 0 if !$tries->{start}->{$id};
- $tries->{start}->{$id}++;
+ if ($limit_to_node == 0 || $haenv->nodename() eq "node$limit_to_node") {
+ $tries->{start}->{$id} = 0 if !$tries->{start}->{$id};
+ $tries->{start}->{$id}++;
- return if $start_failure_count >= $tries->{start}->{$id};
+ return if $start_failure_count >= $tries->{start}->{$id};
+ }
$tries->{start}->{$id} = 0; # reset counts
@@ -78,12 +83,14 @@ sub shutdown {
sub migrate {
my ($class, $haenv, $id, $target, $online) = @_;
- my (undef, $migrate_failure_count) = &$decode_id($id);
+ my ($migrate_failure_count, $limit_to_node) = ($decode_id->($id))[1, 4];
- $tries->{migrate}->{$id} = 0 if !$tries->{migrate}->{$id};
- $tries->{migrate}->{$id}++;
+ if ($limit_to_node == 0 || $haenv->nodename() eq "node$limit_to_node") {
+ $tries->{migrate}->{$id} = 0 if !$tries->{migrate}->{$id};
+ $tries->{migrate}->{$id}++;
- return if $migrate_failure_count >= $tries->{migrate}->{$id};
+ return if $migrate_failure_count >= $tries->{migrate}->{$id};
+ }
$tries->{migrate}->{$id} = 0; # reset counts
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH ha-manager v2 17/26] test: ha tester: add test cases for strict negative colocation rules
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (19 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 16/26] sim: resources: add option to limit start and migrate tries to node Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 18/26] test: ha tester: add test cases for strict positive " Daniel Kral
` (21 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add test cases for strict negative colocation rules, i.e. rules where
services must be kept on separate nodes. These verify the behavior of
services in strict negative colocation rules in case of a failover of the
node of one or more of these services in the following scenarios:
1. 2 neg. colocated services in a 3 node cluster; 1 node failing
2. 3 neg. colocated services in a 5 node cluster; 1 node failing
3. 3 neg. colocated services in a 5 node cluster; 2 nodes failing
4. 2 neg. colocated services in a 3 node cluster; 1 node failing, but
the recovery node cannot start the service
5. Pair of 2 neg. colocated services (with one common service in both)
in a 3 node cluster; 1 node failing
6. 2 neg. colocated services in a 3 node cluster; 1 node failing, but
both services cannot start on the recovery node
7. 2 neg. colocated services in a 3 node cluster; 1 service manually
migrated to another free node; other neg. colocated service cannot be
migrated to migrated service's source node during migration
8. 3 neg. colocated services in a 3 node cluster; 1 service manually
migrated to another neg. colocated service's node fails
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- added test cases 6, 7, and 8
- corrected README in test case #2
- removed strict from rules_config but let them in the test case
name and READMEs when loose colocation rules are added later
- other slight corrections or adaptions in existing READMEs
.../test-colocation-strict-separate1/README | 13 +++
.../test-colocation-strict-separate1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 60 ++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 6 +
.../test-colocation-strict-separate2/README | 15 +++
.../test-colocation-strict-separate2/cmdlist | 4 +
.../hardware_status | 7 ++
.../log.expect | 90 ++++++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 10 ++
.../test-colocation-strict-separate3/README | 16 +++
.../test-colocation-strict-separate3/cmdlist | 4 +
.../hardware_status | 7 ++
.../log.expect | 110 ++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 10 ++
.../test-colocation-strict-separate4/README | 18 +++
.../test-colocation-strict-separate4/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 69 +++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 6 +
.../test-colocation-strict-separate5/README | 11 ++
.../test-colocation-strict-separate5/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 56 +++++++++
.../manager_status | 1 +
.../rules_config | 7 ++
.../service_config | 5 +
.../test-colocation-strict-separate6/README | 18 +++
.../test-colocation-strict-separate6/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 69 +++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 6 +
.../test-colocation-strict-separate7/README | 15 +++
.../test-colocation-strict-separate7/cmdlist | 5 +
.../hardware_status | 5 +
.../log.expect | 52 +++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 4 +
.../test-colocation-strict-separate8/README | 11 ++
.../test-colocation-strict-separate8/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 38 ++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 5 +
56 files changed, 826 insertions(+)
create mode 100644 src/test/test-colocation-strict-separate1/README
create mode 100644 src/test/test-colocation-strict-separate1/cmdlist
create mode 100644 src/test/test-colocation-strict-separate1/hardware_status
create mode 100644 src/test/test-colocation-strict-separate1/log.expect
create mode 100644 src/test/test-colocation-strict-separate1/manager_status
create mode 100644 src/test/test-colocation-strict-separate1/rules_config
create mode 100644 src/test/test-colocation-strict-separate1/service_config
create mode 100644 src/test/test-colocation-strict-separate2/README
create mode 100644 src/test/test-colocation-strict-separate2/cmdlist
create mode 100644 src/test/test-colocation-strict-separate2/hardware_status
create mode 100644 src/test/test-colocation-strict-separate2/log.expect
create mode 100644 src/test/test-colocation-strict-separate2/manager_status
create mode 100644 src/test/test-colocation-strict-separate2/rules_config
create mode 100644 src/test/test-colocation-strict-separate2/service_config
create mode 100644 src/test/test-colocation-strict-separate3/README
create mode 100644 src/test/test-colocation-strict-separate3/cmdlist
create mode 100644 src/test/test-colocation-strict-separate3/hardware_status
create mode 100644 src/test/test-colocation-strict-separate3/log.expect
create mode 100644 src/test/test-colocation-strict-separate3/manager_status
create mode 100644 src/test/test-colocation-strict-separate3/rules_config
create mode 100644 src/test/test-colocation-strict-separate3/service_config
create mode 100644 src/test/test-colocation-strict-separate4/README
create mode 100644 src/test/test-colocation-strict-separate4/cmdlist
create mode 100644 src/test/test-colocation-strict-separate4/hardware_status
create mode 100644 src/test/test-colocation-strict-separate4/log.expect
create mode 100644 src/test/test-colocation-strict-separate4/manager_status
create mode 100644 src/test/test-colocation-strict-separate4/rules_config
create mode 100644 src/test/test-colocation-strict-separate4/service_config
create mode 100644 src/test/test-colocation-strict-separate5/README
create mode 100644 src/test/test-colocation-strict-separate5/cmdlist
create mode 100644 src/test/test-colocation-strict-separate5/hardware_status
create mode 100644 src/test/test-colocation-strict-separate5/log.expect
create mode 100644 src/test/test-colocation-strict-separate5/manager_status
create mode 100644 src/test/test-colocation-strict-separate5/rules_config
create mode 100644 src/test/test-colocation-strict-separate5/service_config
create mode 100644 src/test/test-colocation-strict-separate6/README
create mode 100644 src/test/test-colocation-strict-separate6/cmdlist
create mode 100644 src/test/test-colocation-strict-separate6/hardware_status
create mode 100644 src/test/test-colocation-strict-separate6/log.expect
create mode 100644 src/test/test-colocation-strict-separate6/manager_status
create mode 100644 src/test/test-colocation-strict-separate6/rules_config
create mode 100644 src/test/test-colocation-strict-separate6/service_config
create mode 100644 src/test/test-colocation-strict-separate7/README
create mode 100644 src/test/test-colocation-strict-separate7/cmdlist
create mode 100644 src/test/test-colocation-strict-separate7/hardware_status
create mode 100644 src/test/test-colocation-strict-separate7/log.expect
create mode 100644 src/test/test-colocation-strict-separate7/manager_status
create mode 100644 src/test/test-colocation-strict-separate7/rules_config
create mode 100644 src/test/test-colocation-strict-separate7/service_config
create mode 100644 src/test/test-colocation-strict-separate8/README
create mode 100644 src/test/test-colocation-strict-separate8/cmdlist
create mode 100644 src/test/test-colocation-strict-separate8/hardware_status
create mode 100644 src/test/test-colocation-strict-separate8/log.expect
create mode 100644 src/test/test-colocation-strict-separate8/manager_status
create mode 100644 src/test/test-colocation-strict-separate8/rules_config
create mode 100644 src/test/test-colocation-strict-separate8/service_config
diff --git a/src/test/test-colocation-strict-separate1/README b/src/test/test-colocation-strict-separate1/README
new file mode 100644
index 0000000..ae6c12f
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/README
@@ -0,0 +1,13 @@
+Test whether a strict negative colocation rule between two services makes one of
+the services migrate to a different recovery node than the other in case of a
+failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 must be kept separate
+- vm:101 and vm:102 are currently running on node2 and node3 respectively
+- node1 has a higher service count than node2 to test that the colocation rule
+  is applied even though the scheduler would prefer the less utilized node
+
+The expected outcome is:
+- As node3 fails, vm:102 is migrated to node1; even though the utilization of
+ node1 is high already, the services must be kept separate
diff --git a/src/test/test-colocation-strict-separate1/cmdlist b/src/test/test-colocation-strict-separate1/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate1/hardware_status b/src/test/test-colocation-strict-separate1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate1/log.expect b/src/test/test-colocation-strict-separate1/log.expect
new file mode 100644
index 0000000..475db39
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/log.expect
@@ -0,0 +1,60 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:102
+info 241 node1/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate1/manager_status b/src/test/test-colocation-strict-separate1/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-colocation-strict-separate1/rules_config b/src/test/test-colocation-strict-separate1/rules_config
new file mode 100644
index 0000000..87d309e
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/rules_config
@@ -0,0 +1,3 @@
+colocation: lonely-must-vms-be
+ services vm:101,vm:102
+ affinity separate
diff --git a/src/test/test-colocation-strict-separate1/service_config b/src/test/test-colocation-strict-separate1/service_config
new file mode 100644
index 0000000..6582e8c
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate2/README b/src/test/test-colocation-strict-separate2/README
new file mode 100644
index 0000000..37245a5
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/README
@@ -0,0 +1,15 @@
+Test whether a strict negative colocation rule among three services makes one
+of the services migrate to a different node than the other services in case of
+a failover of the service's previously assigned node.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept separate
+- vm:101, vm:102, and vm:103 are on node3, node4, and node5 respectively
+- node1 and node2 each have a higher service count than node3, node4 and
+  node5 to test that the rule is applied even though the scheduler would
+  prefer the less utilized nodes node3 and node4
+
+The expected outcome is:
+- As node5 fails, vm:103 is migrated to node2; even though the utilization of
+ node2 is high already, the services must be kept separate; node2 is chosen
+ since node1 has one more service running on it
diff --git a/src/test/test-colocation-strict-separate2/cmdlist b/src/test/test-colocation-strict-separate2/cmdlist
new file mode 100644
index 0000000..89d09c9
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "power node4 on", "power node5 on" ],
+ [ "network node5 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate2/hardware_status b/src/test/test-colocation-strict-separate2/hardware_status
new file mode 100644
index 0000000..7b8e961
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/hardware_status
@@ -0,0 +1,7 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" },
+ "node4": { "power": "off", "network": "off" },
+ "node5": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate2/log.expect b/src/test/test-colocation-strict-separate2/log.expect
new file mode 100644
index 0000000..858d3c9
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/log.expect
@@ -0,0 +1,90 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node4 on
+info 20 node4/crm: status change startup => wait_for_quorum
+info 20 node4/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node5 on
+info 20 node5/crm: status change startup => wait_for_quorum
+info 20 node5/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node4': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node4'
+info 20 node1/crm: adding new service 'vm:103' on node 'node5'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node1'
+info 20 node1/crm: adding new service 'vm:107' on node 'node2'
+info 20 node1/crm: adding new service 'vm:108' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node4)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node5)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:108': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 21 node1/lrm: starting service vm:106
+info 21 node1/lrm: service status vm:106 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:107
+info 23 node2/lrm: service status vm:107 started
+info 23 node2/lrm: starting service vm:108
+info 23 node2/lrm: service status vm:108 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 26 node4/crm: status change wait_for_quorum => slave
+info 27 node4/lrm: got lock 'ha_agent_node4_lock'
+info 27 node4/lrm: status change wait_for_agent_lock => active
+info 27 node4/lrm: starting service vm:102
+info 27 node4/lrm: service status vm:102 started
+info 28 node5/crm: status change wait_for_quorum => slave
+info 29 node5/lrm: got lock 'ha_agent_node5_lock'
+info 29 node5/lrm: status change wait_for_agent_lock => active
+info 29 node5/lrm: starting service vm:103
+info 29 node5/lrm: service status vm:103 started
+info 120 cmdlist: execute network node5 off
+info 120 node1/crm: node 'node5': state changed from 'online' => 'unknown'
+info 128 node5/crm: status change slave => wait_for_quorum
+info 129 node5/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node5': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node5'
+info 170 watchdog: execute power node5 off
+info 169 node5/crm: killed by poweroff
+info 170 node5/lrm: killed by poweroff
+info 170 hardware: server 'node5' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node5_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node5'
+info 240 node1/crm: node 'node5': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node5'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node5' to node 'node2'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:103
+info 243 node2/lrm: service status vm:103 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate2/manager_status b/src/test/test-colocation-strict-separate2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate2/rules_config b/src/test/test-colocation-strict-separate2/rules_config
new file mode 100644
index 0000000..64c7bfb
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/rules_config
@@ -0,0 +1,3 @@
+colocation: lonely-must-vms-be
+ services vm:101,vm:102,vm:103
+ affinity separate
diff --git a/src/test/test-colocation-strict-separate2/service_config b/src/test/test-colocation-strict-separate2/service_config
new file mode 100644
index 0000000..2c27816
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/service_config
@@ -0,0 +1,10 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node4", "state": "started" },
+ "vm:103": { "node": "node5", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node1", "state": "started" },
+ "vm:107": { "node": "node2", "state": "started" },
+ "vm:108": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate3/README b/src/test/test-colocation-strict-separate3/README
new file mode 100644
index 0000000..0397fdf
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/README
@@ -0,0 +1,16 @@
+Test whether a strict negative colocation rule among three services makes two
+of the services migrate to two different recovery nodes, distinct from the
+third service's node, in case of a failover of their two previously assigned nodes.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept separate
+- vm:101, vm:102, and vm:103 are respectively on node3, node4, and node5
+- node1 and node2 both have higher service counts than node3, node4 and node5
+  to test that the colocation rule is enforced even though the scheduler
+  would prefer the less utilized node3
+
+The expected outcome is:
+- As node4 and node5 fail, vm:102 and vm:103 are migrated to node2 and node1
+  respectively; even though the utilization of node1 and node2 is already
+  high, the services must be kept separate; node2 is chosen first since
+ node1 has one more service running on it
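
The recovery-node choice described in this README can be illustrated with a
minimal Perl sketch. This is an editorial illustration only, not the
ha-manager implementation; the data layout, the helper name
choose_recovery_node and the tie-break by node name are assumptions made for
the example.

use strict;
use warnings;

# Sketch: pick a recovery node for a service that is part of a 'separate'
# colocation rule. $online_nodes maps node names to their current service
# count; $forbidden_nodes is a hash set of nodes that already run a
# negatively colocated service.
sub choose_recovery_node {
    my ($online_nodes, $forbidden_nodes) = @_;

    # drop nodes that already host a service from the same 'separate' rule
    my @allowed = grep { !$forbidden_nodes->{$_} } keys %$online_nodes;
    return undef if !@allowed;

    # prefer the least utilized allowed node; tie-break by name (assumption)
    my ($best) = sort {
        $online_nodes->{$a} <=> $online_nodes->{$b} || $a cmp $b
    } @allowed;

    return $best;
}

# Mirrors the scenario above: node3 hosts vm:101 and is therefore excluded
# for vm:102, even though it has the lowest service count.
my $online    = { node1 => 3, node2 => 2, node3 => 1 };
my $forbidden = { node3 => 1 };
print choose_recovery_node($online, $forbidden), "\n";   # prints "node2"
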
diff --git a/src/test/test-colocation-strict-separate3/cmdlist b/src/test/test-colocation-strict-separate3/cmdlist
new file mode 100644
index 0000000..1934596
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "power node4 on", "power node5 on" ],
+ [ "network node4 off", "network node5 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate3/hardware_status b/src/test/test-colocation-strict-separate3/hardware_status
new file mode 100644
index 0000000..7b8e961
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/hardware_status
@@ -0,0 +1,7 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" },
+ "node4": { "power": "off", "network": "off" },
+ "node5": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate3/log.expect b/src/test/test-colocation-strict-separate3/log.expect
new file mode 100644
index 0000000..4acdcec
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/log.expect
@@ -0,0 +1,110 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node4 on
+info 20 node4/crm: status change startup => wait_for_quorum
+info 20 node4/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node5 on
+info 20 node5/crm: status change startup => wait_for_quorum
+info 20 node5/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node4': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node4'
+info 20 node1/crm: adding new service 'vm:103' on node 'node5'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node1'
+info 20 node1/crm: adding new service 'vm:107' on node 'node2'
+info 20 node1/crm: adding new service 'vm:108' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node4)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node5)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:108': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 21 node1/lrm: starting service vm:106
+info 21 node1/lrm: service status vm:106 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:107
+info 23 node2/lrm: service status vm:107 started
+info 23 node2/lrm: starting service vm:108
+info 23 node2/lrm: service status vm:108 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 26 node4/crm: status change wait_for_quorum => slave
+info 27 node4/lrm: got lock 'ha_agent_node4_lock'
+info 27 node4/lrm: status change wait_for_agent_lock => active
+info 27 node4/lrm: starting service vm:102
+info 27 node4/lrm: service status vm:102 started
+info 28 node5/crm: status change wait_for_quorum => slave
+info 29 node5/lrm: got lock 'ha_agent_node5_lock'
+info 29 node5/lrm: status change wait_for_agent_lock => active
+info 29 node5/lrm: starting service vm:103
+info 29 node5/lrm: service status vm:103 started
+info 120 cmdlist: execute network node4 off
+info 120 cmdlist: execute network node5 off
+info 120 node1/crm: node 'node4': state changed from 'online' => 'unknown'
+info 120 node1/crm: node 'node5': state changed from 'online' => 'unknown'
+info 126 node4/crm: status change slave => wait_for_quorum
+info 127 node4/lrm: status change active => lost_agent_lock
+info 128 node5/crm: status change slave => wait_for_quorum
+info 129 node5/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node4': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node4'
+info 160 node1/crm: node 'node5': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node5'
+info 168 watchdog: execute power node4 off
+info 167 node4/crm: killed by poweroff
+info 168 node4/lrm: killed by poweroff
+info 168 hardware: server 'node4' stopped by poweroff (watchdog)
+info 170 watchdog: execute power node5 off
+info 169 node5/crm: killed by poweroff
+info 170 node5/lrm: killed by poweroff
+info 170 hardware: server 'node5' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node4_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node4'
+info 240 node1/crm: node 'node4': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node4'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: got lock 'ha_agent_node5_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node5'
+info 240 node1/crm: node 'node5': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node5'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node4' to node 'node2'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node5' to node 'node1'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:103
+info 241 node1/lrm: service status vm:103 started
+info 243 node2/lrm: starting service vm:102
+info 243 node2/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate3/manager_status b/src/test/test-colocation-strict-separate3/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate3/rules_config b/src/test/test-colocation-strict-separate3/rules_config
new file mode 100644
index 0000000..64c7bfb
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/rules_config
@@ -0,0 +1,3 @@
+colocation: lonely-must-vms-be
+ services vm:101,vm:102,vm:103
+ affinity separate
diff --git a/src/test/test-colocation-strict-separate3/service_config b/src/test/test-colocation-strict-separate3/service_config
new file mode 100644
index 0000000..2c27816
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/service_config
@@ -0,0 +1,10 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node4", "state": "started" },
+ "vm:103": { "node": "node5", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node1", "state": "started" },
+ "vm:107": { "node": "node2", "state": "started" },
+ "vm:108": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate4/README b/src/test/test-colocation-strict-separate4/README
new file mode 100644
index 0000000..824274c
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/README
@@ -0,0 +1,18 @@
+Test whether a strict negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other service in
+case of a failover of the service's previously assigned node. As the service
+fails to start on the recovery node (e.g. insufficient resources), the failing
+service is kept on the recovery node.
+
+The test scenario is:
+- vm:101 and fa:120001 must be kept separate
+- vm:101 and fa:120001 are on node2 and node3 respectively
+- fa:120001 will fail to start on node1
+- node1 has a higher service count than node2 to test that the colocation rule
+  is applied even though the scheduler would prefer the less utilized node
+
+The expected outcome is:
+- As node3 fails, fa:120001 is migrated to node1
+- fa:120001 will stay on the node (potentially in recovery), since it cannot be
+ started on node1, but cannot be relocated to another one either due to the
+ strict colocation rule
diff --git a/src/test/test-colocation-strict-separate4/cmdlist b/src/test/test-colocation-strict-separate4/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate4/hardware_status b/src/test/test-colocation-strict-separate4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate4/log.expect b/src/test/test-colocation-strict-separate4/log.expect
new file mode 100644
index 0000000..f772ea8
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/log.expect
@@ -0,0 +1,69 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:120001' on node 'node3'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'fa:120001': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service fa:120001
+info 25 node3/lrm: service status fa:120001 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'fa:120001': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'fa:120001': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'fa:120001' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'fa:120001': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service fa:120001
+warn 241 node1/lrm: unable to start service fa:120001
+warn 241 node1/lrm: restart policy: retry number 1 for service 'fa:120001'
+info 261 node1/lrm: starting service fa:120001
+warn 261 node1/lrm: unable to start service fa:120001
+err 261 node1/lrm: unable to start service fa:120001 on local node after 1 retries
+warn 280 node1/crm: starting service fa:120001 on node 'node1' failed, relocating service.
+warn 280 node1/crm: Start Error Recovery: Tried all available nodes for service 'fa:120001', retry start on current node. Tried nodes: node1
+info 281 node1/lrm: starting service fa:120001
+info 281 node1/lrm: service status fa:120001 started
+info 300 node1/crm: relocation policy successful for 'fa:120001' on node 'node1', failed nodes: node1
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate4/manager_status b/src/test/test-colocation-strict-separate4/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-colocation-strict-separate4/rules_config b/src/test/test-colocation-strict-separate4/rules_config
new file mode 100644
index 0000000..90226b7
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/rules_config
@@ -0,0 +1,3 @@
+colocation: lonely-must-vms-be
+ services vm:101,fa:120001
+ affinity separate
diff --git a/src/test/test-colocation-strict-separate4/service_config b/src/test/test-colocation-strict-separate4/service_config
new file mode 100644
index 0000000..f53c2bc
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "fa:120001": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate5/README b/src/test/test-colocation-strict-separate5/README
new file mode 100644
index 0000000..7795e3d
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/README
@@ -0,0 +1,11 @@
+Test whether two pair-wise strict negative colocation rules, i.e. where one
+service is in two separate non-colocation relationships with two other
+services, make one of the outer services migrate to the same node as the other
+outer service in case of a failover of its previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102, and vm:101 and vm:103 must each be kept separate
+- vm:101, vm:102, and vm:103 are respectively on node1, node2, and node3
+
+The expected outcome is:
+- As node3 fails, vm:103 is migrated to node2 - the same as vm:102
diff --git a/src/test/test-colocation-strict-separate5/cmdlist b/src/test/test-colocation-strict-separate5/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate5/hardware_status b/src/test/test-colocation-strict-separate5/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate5/log.expect b/src/test/test-colocation-strict-separate5/log.expect
new file mode 100644
index 0000000..16156ad
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/log.expect
@@ -0,0 +1,56 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:103
+info 243 node2/lrm: service status vm:103 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate5/manager_status b/src/test/test-colocation-strict-separate5/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate5/rules_config b/src/test/test-colocation-strict-separate5/rules_config
new file mode 100644
index 0000000..b198427
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/rules_config
@@ -0,0 +1,7 @@
+colocation: lonely-must-some-vms-be1
+ services vm:101,vm:102
+ affinity separate
+
+colocation: lonely-must-some-vms-be2
+ services vm:101,vm:103
+ affinity separate
diff --git a/src/test/test-colocation-strict-separate5/service_config b/src/test/test-colocation-strict-separate5/service_config
new file mode 100644
index 0000000..4b26f6b
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate6/README b/src/test/test-colocation-strict-separate6/README
new file mode 100644
index 0000000..ff10171
--- /dev/null
+++ b/src/test/test-colocation-strict-separate6/README
@@ -0,0 +1,18 @@
+Test whether a strict negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other service in
+case of a failover of the service's previously assigned node. As the recovered
+service fails to start on the recovery node (e.g. insufficient resources), the
+failing service is kept on the recovery node.
+
+The test scenario is:
+- fa:120001 and fa:220001 must be kept separate
+- fa:120001 and fa:220001 are on node2 and node3 respectively
+- fa:120001 and fa:220001 will fail to start on node1
+- node1 has a higher service count than node2 to test that the colocation rule
+  is applied even though the scheduler would prefer the less utilized node
+
+The expected outcome is:
+- As node3 fails, fa:220001 is migrated to node1
+- fa:220001 will stay on the node (potentially in recovery), since it cannot be
+ started on node1, but cannot be relocated to another one either due to the
+ strict colocation rule
diff --git a/src/test/test-colocation-strict-separate6/cmdlist b/src/test/test-colocation-strict-separate6/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate6/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate6/hardware_status b/src/test/test-colocation-strict-separate6/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate6/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate6/log.expect b/src/test/test-colocation-strict-separate6/log.expect
new file mode 100644
index 0000000..0d9854a
--- /dev/null
+++ b/src/test/test-colocation-strict-separate6/log.expect
@@ -0,0 +1,69 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:120001' on node 'node2'
+info 20 node1/crm: adding new service 'fa:220001' on node 'node3'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: service 'fa:120001': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'fa:220001': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service fa:120001
+info 23 node2/lrm: service status fa:120001 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service fa:220001
+info 25 node3/lrm: service status fa:220001 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'fa:220001': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'fa:220001': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'fa:220001' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'fa:220001': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service fa:220001
+warn 241 node1/lrm: unable to start service fa:220001
+warn 241 node1/lrm: restart policy: retry number 1 for service 'fa:220001'
+info 261 node1/lrm: starting service fa:220001
+warn 261 node1/lrm: unable to start service fa:220001
+err 261 node1/lrm: unable to start service fa:220001 on local node after 1 retries
+warn 280 node1/crm: starting service fa:220001 on node 'node1' failed, relocating service.
+warn 280 node1/crm: Start Error Recovery: Tried all available nodes for service 'fa:220001', retry start on current node. Tried nodes: node1
+info 281 node1/lrm: starting service fa:220001
+info 281 node1/lrm: service status fa:220001 started
+info 300 node1/crm: relocation policy successful for 'fa:220001' on node 'node1', failed nodes: node1
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate6/manager_status b/src/test/test-colocation-strict-separate6/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate6/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate6/rules_config b/src/test/test-colocation-strict-separate6/rules_config
new file mode 100644
index 0000000..82482d0
--- /dev/null
+++ b/src/test/test-colocation-strict-separate6/rules_config
@@ -0,0 +1,3 @@
+colocation: lonely-must-vms-be
+ services fa:120001,fa:220001
+ affinity separate
diff --git a/src/test/test-colocation-strict-separate6/service_config b/src/test/test-colocation-strict-separate6/service_config
new file mode 100644
index 0000000..1f9480c
--- /dev/null
+++ b/src/test/test-colocation-strict-separate6/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "fa:120001": { "node": "node2", "state": "started" },
+ "fa:220001": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate7/README b/src/test/test-colocation-strict-separate7/README
new file mode 100644
index 0000000..b783a47
--- /dev/null
+++ b/src/test/test-colocation-strict-separate7/README
@@ -0,0 +1,15 @@
+Test whether a strict negative colocation rule among two services lets one of
+the services be manually migrated to another node, while disallowing the other
+negatively colocated service from being migrated to the migrated service's
+source node as long as that migration is still in progress.
+
+The test scenario is:
+- vm:101 and vm:102 must be kept separate
+- vm:101 and vm:102 are running on node1 and node2 respectively
+
+The expected outcome is:
+- vm:101 is migrated to node3
+- While vm:101 is being migrated, vm:102 cannot be migrated to node1, as
+  vm:101 is still putting load on node1 as its source node
+- After vm:101 is successfully migrated to node3, vm:102 can be migrated to
+ node1
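
A minimal Perl sketch of the migration check this test exercises follows. It
is an illustration under assumed data structures, not the actual CRM code:
the service-state layout ({ node => ..., target => ... }) and the helper name
check_separate_rule_for_migration are invented here; only the error text
mirrors the expected log above. The idea is that a target node is rejected
while a negatively colocated peer still occupies it, either as its current
node or as one end of an in-flight migration.

use strict;
use warnings;

sub check_separate_rule_for_migration {
    my ($service_states, $peers, $sid, $target) = @_;

    for my $peer (@$peers) {
        next if $peer eq $sid;
        my $state = $service_states->{$peer} or next;

        # a peer occupies its current node and, while migrating, its target too
        my @occupied = grep { defined } ($state->{node}, $state->{target});
        return "negatively colocated service '$peer' on '$target'"
            if grep { $_ eq $target } @occupied;
    }

    return undef;   # migration allowed
}

# vm:101 is still migrating away from node1, so migrating vm:102 to node1 is
# denied; once vm:101 has arrived on node3, the same call returns undef.
my $ss = {
    'vm:101' => { node => 'node1', target => 'node3' },
    'vm:102' => { node => 'node2' },
};
my $err = check_separate_rule_for_migration($ss, [ 'vm:101', 'vm:102' ], 'vm:102', 'node1');
print "$err\n" if defined($err);
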
diff --git a/src/test/test-colocation-strict-separate7/cmdlist b/src/test/test-colocation-strict-separate7/cmdlist
new file mode 100644
index 0000000..468ba56
--- /dev/null
+++ b/src/test/test-colocation-strict-separate7/cmdlist
@@ -0,0 +1,5 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node3", "service vm:102 migrate node1" ],
+ [ "service vm:102 migrate node1" ]
+]
diff --git a/src/test/test-colocation-strict-separate7/hardware_status b/src/test/test-colocation-strict-separate7/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate7/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate7/log.expect b/src/test/test-colocation-strict-separate7/log.expect
new file mode 100644
index 0000000..07213b2
--- /dev/null
+++ b/src/test/test-colocation-strict-separate7/log.expect
@@ -0,0 +1,52 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:101 migrate node3
+info 120 cmdlist: execute service vm:102 migrate node1
+info 120 node1/crm: got crm command: migrate vm:101 node3
+err 120 node1/crm: crm command 'migrate vm:102 node1' error - negatively colocated service 'vm:101' on 'node1'
+info 120 node1/crm: migrate service 'vm:101' to node 'node3'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 121 node1/lrm: service vm:101 - start migrate to node 'node3'
+info 121 node1/lrm: service vm:101 - end migrate to node 'node3'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 145 node3/lrm: got lock 'ha_agent_node3_lock'
+info 145 node3/lrm: status change wait_for_agent_lock => active
+info 145 node3/lrm: starting service vm:101
+info 145 node3/lrm: service status vm:101 started
+info 220 cmdlist: execute service vm:102 migrate node1
+info 220 node1/crm: got crm command: migrate vm:102 node1
+info 220 node1/crm: migrate service 'vm:102' to node 'node1'
+info 220 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 223 node2/lrm: service vm:102 - start migrate to node 'node1'
+info 223 node2/lrm: service vm:102 - end migrate to node 'node1'
+info 240 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:102
+info 241 node1/lrm: service status vm:102 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate7/manager_status b/src/test/test-colocation-strict-separate7/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate7/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate7/rules_config b/src/test/test-colocation-strict-separate7/rules_config
new file mode 100644
index 0000000..87d309e
--- /dev/null
+++ b/src/test/test-colocation-strict-separate7/rules_config
@@ -0,0 +1,3 @@
+colocation: lonely-must-vms-be
+ services vm:101,vm:102
+ affinity separate
diff --git a/src/test/test-colocation-strict-separate7/service_config b/src/test/test-colocation-strict-separate7/service_config
new file mode 100644
index 0000000..0336d09
--- /dev/null
+++ b/src/test/test-colocation-strict-separate7/service_config
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate8/README b/src/test/test-colocation-strict-separate8/README
new file mode 100644
index 0000000..78035a8
--- /dev/null
+++ b/src/test/test-colocation-strict-separate8/README
@@ -0,0 +1,11 @@
+Test whether a strict negative colocation rule among three services makes one
+of the services, whose manual migration to another negatively colocated
+service's node is rejected, stay on its current node.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept separate
+- vm:101, vm:102, and vm:103 are respectively on node1, node2, and node3
+
+The expected outcome is:
+- vm:101 cannot be migrated to node3 as it would conflict with the negative
+  colocation rule among vm:101, vm:102, and vm:103.
diff --git a/src/test/test-colocation-strict-separate8/cmdlist b/src/test/test-colocation-strict-separate8/cmdlist
new file mode 100644
index 0000000..13cab7b
--- /dev/null
+++ b/src/test/test-colocation-strict-separate8/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "service vm:101 migrate node3" ]
+]
diff --git a/src/test/test-colocation-strict-separate8/hardware_status b/src/test/test-colocation-strict-separate8/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate8/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate8/log.expect b/src/test/test-colocation-strict-separate8/log.expect
new file mode 100644
index 0000000..d1048ed
--- /dev/null
+++ b/src/test/test-colocation-strict-separate8/log.expect
@@ -0,0 +1,38 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute service vm:101 migrate node3
+err 120 node1/crm: crm command 'migrate vm:101 node3' error - negatively colocated service 'vm:103' on 'node3'
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate8/manager_status b/src/test/test-colocation-strict-separate8/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate8/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate8/rules_config b/src/test/test-colocation-strict-separate8/rules_config
new file mode 100644
index 0000000..64c7bfb
--- /dev/null
+++ b/src/test/test-colocation-strict-separate8/rules_config
@@ -0,0 +1,3 @@
+colocation: lonely-must-vms-be
+ services vm:101,vm:102,vm:103
+ affinity separate
diff --git a/src/test/test-colocation-strict-separate8/service_config b/src/test/test-colocation-strict-separate8/service_config
new file mode 100644
index 0000000..4b26f6b
--- /dev/null
+++ b/src/test/test-colocation-strict-separate8/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
--
2.39.5
* [pve-devel] [PATCH ha-manager v2 18/26] test: ha tester: add test cases for strict positive colocation rules
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (20 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 17/26] test: ha tester: add test cases for strict negative colocation rules Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 19/26] test: ha tester: add test cases in more complex scenarios Daniel Kral
` (20 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add test cases for strict positive colocation rules, i.e. where services
must be kept on the same node together. These verify the behavior of the
services in strict positive colocation rules in case of a failover of
their assigned nodes in the following scenarios:
1. 2 pos. colocated services in a 3 node cluster; 1 node failing
2. 3 pos. colocated services in a 3 node cluster; 1 node failing
3. 3 pos. colocated services in a 3 node cluster; 1 node failing, but
   the recovery node cannot start one of the services
4. 3 pos. colocated services in a 3 node cluster; manually migrating 1
   service to another node will also migrate the other pos. colocated
   services to that node
5. 9 pos. colocated services in a 3 node cluster; manually migrating 1
   service to another node will also migrate the other pos. colocated
   services to that node
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- added test case 4 and 5
- removed strict property from rules_config
- smaller adaptions in existing READMEs as noted by @Fiona
.../test-colocation-strict-together1/README | 11 +
.../test-colocation-strict-together1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 66 ++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 6 +
.../test-colocation-strict-together2/README | 10 +
.../test-colocation-strict-together2/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 80 +++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 8 +
.../test-colocation-strict-together3/README | 17 ++
.../test-colocation-strict-together3/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 89 ++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 8 +
.../test-colocation-strict-together4/README | 11 +
.../test-colocation-strict-together4/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 59 ++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 5 +
.../test-colocation-strict-together5/README | 19 ++
.../test-colocation-strict-together5/cmdlist | 8 +
.../hardware_status | 5 +
.../log.expect | 281 ++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 15 +
.../service_config | 11 +
35 files changed, 762 insertions(+)
create mode 100644 src/test/test-colocation-strict-together1/README
create mode 100644 src/test/test-colocation-strict-together1/cmdlist
create mode 100644 src/test/test-colocation-strict-together1/hardware_status
create mode 100644 src/test/test-colocation-strict-together1/log.expect
create mode 100644 src/test/test-colocation-strict-together1/manager_status
create mode 100644 src/test/test-colocation-strict-together1/rules_config
create mode 100644 src/test/test-colocation-strict-together1/service_config
create mode 100644 src/test/test-colocation-strict-together2/README
create mode 100644 src/test/test-colocation-strict-together2/cmdlist
create mode 100644 src/test/test-colocation-strict-together2/hardware_status
create mode 100644 src/test/test-colocation-strict-together2/log.expect
create mode 100644 src/test/test-colocation-strict-together2/manager_status
create mode 100644 src/test/test-colocation-strict-together2/rules_config
create mode 100644 src/test/test-colocation-strict-together2/service_config
create mode 100644 src/test/test-colocation-strict-together3/README
create mode 100644 src/test/test-colocation-strict-together3/cmdlist
create mode 100644 src/test/test-colocation-strict-together3/hardware_status
create mode 100644 src/test/test-colocation-strict-together3/log.expect
create mode 100644 src/test/test-colocation-strict-together3/manager_status
create mode 100644 src/test/test-colocation-strict-together3/rules_config
create mode 100644 src/test/test-colocation-strict-together3/service_config
create mode 100644 src/test/test-colocation-strict-together4/README
create mode 100644 src/test/test-colocation-strict-together4/cmdlist
create mode 100644 src/test/test-colocation-strict-together4/hardware_status
create mode 100644 src/test/test-colocation-strict-together4/log.expect
create mode 100644 src/test/test-colocation-strict-together4/manager_status
create mode 100644 src/test/test-colocation-strict-together4/rules_config
create mode 100644 src/test/test-colocation-strict-together4/service_config
create mode 100644 src/test/test-colocation-strict-together5/README
create mode 100644 src/test/test-colocation-strict-together5/cmdlist
create mode 100644 src/test/test-colocation-strict-together5/hardware_status
create mode 100644 src/test/test-colocation-strict-together5/log.expect
create mode 100644 src/test/test-colocation-strict-together5/manager_status
create mode 100644 src/test/test-colocation-strict-together5/rules_config
create mode 100644 src/test/test-colocation-strict-together5/service_config
diff --git a/src/test/test-colocation-strict-together1/README b/src/test/test-colocation-strict-together1/README
new file mode 100644
index 0000000..1678cf1
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/README
@@ -0,0 +1,11 @@
+Test whether a strict positive colocation rule makes two services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 must be kept together
+- vm:101 and vm:102 are both currently running on node3
+- node1 and node2 have the same service count to test that the rule is applied
+  even though the services would usually be balanced between both remaining nodes
+
+The expected outcome is:
+- As node3 fails, both services are migrated to node1
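
The 'together' recovery behavior can be sketched in a few lines of Perl. This
is again only an illustration under assumed data structures, not the
scheduler's real selection logic; in particular, the tie-break between
equally utilized nodes by node name is an assumption.

use strict;
use warnings;

# Sketch: after their node is fenced, all services of a 'together' rule are
# recovered to one common node, chosen among the remaining online nodes by
# lowest service count.
sub recover_together_group {
    my ($online_nodes, $group_services) = @_;

    my ($common) = sort {
        $online_nodes->{$a} <=> $online_nodes->{$b} || $a cmp $b
    } keys %$online_nodes;

    # every service of the group lands on the same node
    return { map { $_ => $common } @$group_services };
}

# Scenario above: node1 and node2 each run one service, so the tie is broken
# deterministically and vm:101 and vm:102 are both recovered to node1.
my $placement = recover_together_group({ node1 => 1, node2 => 1 }, [ 'vm:101', 'vm:102' ]);
print "$_ -> $placement->{$_}\n" for sort keys %$placement;
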
diff --git a/src/test/test-colocation-strict-together1/cmdlist b/src/test/test-colocation-strict-together1/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-together1/hardware_status b/src/test/test-colocation-strict-together1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-together1/log.expect b/src/test/test-colocation-strict-together1/log.expect
new file mode 100644
index 0000000..7d43314
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:101
+info 241 node1/lrm: service status vm:101 started
+info 241 node1/lrm: starting service vm:102
+info 241 node1/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-together1/manager_status b/src/test/test-colocation-strict-together1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-together1/rules_config b/src/test/test-colocation-strict-together1/rules_config
new file mode 100644
index 0000000..1e63579
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/rules_config
@@ -0,0 +1,3 @@
+colocation: vms-must-stick-together
+ services vm:101,vm:102
+ affinity together
diff --git a/src/test/test-colocation-strict-together1/service_config b/src/test/test-colocation-strict-together1/service_config
new file mode 100644
index 0000000..9fb091d
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-together2/README b/src/test/test-colocation-strict-together2/README
new file mode 100644
index 0000000..b282e5f
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/README
@@ -0,0 +1,10 @@
+Test whether a strict positive colocation rule makes three services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept together
+- vm:101, vm:102, and vm:103 are all currently running on node3
+
+The expected outcome is:
+- As node3 fails, all services are migrated to node2, as node2 is less
+  utilized than the other available node, node1
diff --git a/src/test/test-colocation-strict-together2/cmdlist b/src/test/test-colocation-strict-together2/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-together2/hardware_status b/src/test/test-colocation-strict-together2/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-together2/log.expect b/src/test/test-colocation-strict-together2/log.expect
new file mode 100644
index 0000000..78f4d66
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/log.expect
@@ -0,0 +1,80 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:106
+info 23 node2/lrm: service status vm:106 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 243 node2/lrm: starting service vm:102
+info 243 node2/lrm: service status vm:102 started
+info 243 node2/lrm: starting service vm:103
+info 243 node2/lrm: service status vm:103 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-together2/manager_status b/src/test/test-colocation-strict-together2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-together2/rules_config b/src/test/test-colocation-strict-together2/rules_config
new file mode 100644
index 0000000..22ffa1e
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/rules_config
@@ -0,0 +1,3 @@
+colocation: vms-must-stick-together
+ services vm:101,vm:102,vm:103
+ affinity together
diff --git a/src/test/test-colocation-strict-together2/service_config b/src/test/test-colocation-strict-together2/service_config
new file mode 100644
index 0000000..fd4a87e
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/service_config
@@ -0,0 +1,8 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-together3/README b/src/test/test-colocation-strict-together3/README
new file mode 100644
index 0000000..35ce2e4
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/README
@@ -0,0 +1,17 @@
+Test whether a strict positive colocation rule makes three services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+If one of those fails to start on the recovery node (e.g. due to insufficient
+resources), the failing service will be kept on the recovery node.
+
+The test scenario is:
+- vm:101, vm:102, and fa:120002 must be kept together
+- vm:101, vm:102, and fa:120002 are all currently running on node3
+- fa:120002 will fail to start on node2
+- node1 has a higher service count than node2, so node2 is selected as the
+  recovery node, where fa:120002 is guaranteed to fail to start
+
+The expected outcome is:
+- As node3 fails, all services are migrated to node2
+- Two of those services will start successfully, but fa:120002 will be kept on
+  the recovery node and retried there, since it cannot be relocated to another
+  node due to the strict colocation rule
diff --git a/src/test/test-colocation-strict-together3/cmdlist b/src/test/test-colocation-strict-together3/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-together3/hardware_status b/src/test/test-colocation-strict-together3/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-together3/log.expect b/src/test/test-colocation-strict-together3/log.expect
new file mode 100644
index 0000000..4a54cb3
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/log.expect
@@ -0,0 +1,89 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:120002' on node 'node3'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node2'
+info 20 node1/crm: service 'fa:120002': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:106
+info 23 node2/lrm: service status vm:106 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service fa:120002
+info 25 node3/lrm: service status fa:120002 started
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'fa:120002': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'fa:120002': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'fa:120002' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'fa:120002': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service fa:120002
+warn 243 node2/lrm: unable to start service fa:120002
+warn 243 node2/lrm: restart policy: retry number 1 for service 'fa:120002'
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 243 node2/lrm: starting service vm:102
+info 243 node2/lrm: service status vm:102 started
+info 263 node2/lrm: starting service fa:120002
+warn 263 node2/lrm: unable to start service fa:120002
+err 263 node2/lrm: unable to start service fa:120002 on local node after 1 retries
+warn 280 node1/crm: starting service fa:120002 on node 'node2' failed, relocating service.
+warn 280 node1/crm: Start Error Recovery: Tried all available nodes for service 'fa:120002', retry start on current node. Tried nodes: node2
+info 283 node2/lrm: starting service fa:120002
+info 283 node2/lrm: service status fa:120002 started
+info 300 node1/crm: relocation policy successful for 'fa:120002' on node 'node2', failed nodes: node2
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-together3/manager_status b/src/test/test-colocation-strict-together3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-colocation-strict-together3/rules_config b/src/test/test-colocation-strict-together3/rules_config
new file mode 100644
index 0000000..46c00c8
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/rules_config
@@ -0,0 +1,3 @@
+colocation: vms-must-stick-together
+ services vm:101,vm:102,fa:120002
+ affinity together
diff --git a/src/test/test-colocation-strict-together3/service_config b/src/test/test-colocation-strict-together3/service_config
new file mode 100644
index 0000000..3ce5f27
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/service_config
@@ -0,0 +1,8 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "fa:120002": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-together4/README b/src/test/test-colocation-strict-together4/README
new file mode 100644
index 0000000..7ef7e69
--- /dev/null
+++ b/src/test/test-colocation-strict-together4/README
@@ -0,0 +1,11 @@
+Test whether a strict positive colocation rule of three services keeps the
+services together if one of them is manually migrated to another node, i.e.,
+whether the other services are migrated to the same target node as well.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept together
+- vm:101, vm:102, and vm:103 are all currently running on node1
+
+The expected outcome is:
+- As vm:101 is migrated to node2, vm:102 and vm:103 are migrated to node2 as
+  well, as a side effect of following the positive colocation rule.
diff --git a/src/test/test-colocation-strict-together4/cmdlist b/src/test/test-colocation-strict-together4/cmdlist
new file mode 100644
index 0000000..2e420cc
--- /dev/null
+++ b/src/test/test-colocation-strict-together4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "service vm:101 migrate node2" ]
+]
diff --git a/src/test/test-colocation-strict-together4/hardware_status b/src/test/test-colocation-strict-together4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-together4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-together4/log.expect b/src/test/test-colocation-strict-together4/log.expect
new file mode 100644
index 0000000..545f4eb
--- /dev/null
+++ b/src/test/test-colocation-strict-together4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:101 migrate node2
+info 120 node1/crm: got crm command: migrate vm:101 node2
+info 120 node1/crm: crm command 'migrate vm:101 node2' - migrate positively colocated service 'vm:102' to 'node2'
+info 120 node1/crm: crm command 'migrate vm:101 node2' - migrate positively colocated service 'vm:103' to 'node2'
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:102' to node 'node2'
+info 120 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:103' to node 'node2'
+info 120 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 121 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:103 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:103 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node2)
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: starting service vm:101
+info 143 node2/lrm: service status vm:101 started
+info 143 node2/lrm: starting service vm:102
+info 143 node2/lrm: service status vm:102 started
+info 143 node2/lrm: starting service vm:103
+info 143 node2/lrm: service status vm:103 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-together4/manager_status b/src/test/test-colocation-strict-together4/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-together4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-together4/rules_config b/src/test/test-colocation-strict-together4/rules_config
new file mode 100644
index 0000000..22ffa1e
--- /dev/null
+++ b/src/test/test-colocation-strict-together4/rules_config
@@ -0,0 +1,3 @@
+colocation: vms-must-stick-together
+ services vm:101,vm:102,vm:103
+ affinity together
diff --git a/src/test/test-colocation-strict-together4/service_config b/src/test/test-colocation-strict-together4/service_config
new file mode 100644
index 0000000..57e3579
--- /dev/null
+++ b/src/test/test-colocation-strict-together4/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-together5/README b/src/test/test-colocation-strict-together5/README
new file mode 100644
index 0000000..22d5883
--- /dev/null
+++ b/src/test/test-colocation-strict-together5/README
@@ -0,0 +1,19 @@
+Test whether multiple connected positive colocation rules keep the services
+together if one of the services is manually migrated to another node, i.e.,
+whether all of them are migrated to the same target node as well.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept together
+- vm:103, vm:104, and vm:105 must be kept together
+- vm:105, vm:106, and vm:107 must be kept together
+- vm:105, vm:108, and vm:109 must be kept together
+- So essentially, vm:101 through vm:109 must be kept together
+- vm:101 through vm:109 are all on node1
+
+The expected outcome is:
+- As vm:103 is migrated to node2, all of vm:101 through vm:109 are migrated to
+ node2 as well, as these all must be kept together
+- As vm:101 is migrated to node3, all of vm:101 through vm:109 are migrated to
+ node3 as well, as these all must be kept together
+- As vm:109 is migrated to node1, all of vm:101 through vm:109 are migrated to
+ node1 as well, as these all must be kept together
diff --git a/src/test/test-colocation-strict-together5/cmdlist b/src/test/test-colocation-strict-together5/cmdlist
new file mode 100644
index 0000000..85c33d0
--- /dev/null
+++ b/src/test/test-colocation-strict-together5/cmdlist
@@ -0,0 +1,8 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "service vm:103 migrate node2" ],
+ [ "delay 100" ],
+ [ "service vm:101 migrate node3" ],
+ [ "delay 100" ],
+ [ "service vm:109 migrate node1" ]
+]
diff --git a/src/test/test-colocation-strict-together5/hardware_status b/src/test/test-colocation-strict-together5/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-together5/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-together5/log.expect b/src/test/test-colocation-strict-together5/log.expect
new file mode 100644
index 0000000..4f5a0e6
--- /dev/null
+++ b/src/test/test-colocation-strict-together5/log.expect
@@ -0,0 +1,281 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node1'
+info 20 node1/crm: adding new service 'vm:107' on node 'node1'
+info 20 node1/crm: adding new service 'vm:108' on node 'node1'
+info 20 node1/crm: adding new service 'vm:109' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:108': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:109': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 21 node1/lrm: starting service vm:106
+info 21 node1/lrm: service status vm:106 started
+info 21 node1/lrm: starting service vm:107
+info 21 node1/lrm: service status vm:107 started
+info 21 node1/lrm: starting service vm:108
+info 21 node1/lrm: service status vm:108 started
+info 21 node1/lrm: starting service vm:109
+info 21 node1/lrm: service status vm:109 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:103 migrate node2
+info 120 node1/crm: got crm command: migrate vm:103 node2
+info 120 node1/crm: crm command 'migrate vm:103 node2' - migrate positively colocated service 'vm:101' to 'node2'
+info 120 node1/crm: crm command 'migrate vm:103 node2' - migrate positively colocated service 'vm:102' to 'node2'
+info 120 node1/crm: crm command 'migrate vm:103 node2' - migrate positively colocated service 'vm:104' to 'node2'
+info 120 node1/crm: crm command 'migrate vm:103 node2' - migrate positively colocated service 'vm:105' to 'node2'
+info 120 node1/crm: crm command 'migrate vm:103 node2' - migrate positively colocated service 'vm:106' to 'node2'
+info 120 node1/crm: crm command 'migrate vm:103 node2' - migrate positively colocated service 'vm:107' to 'node2'
+info 120 node1/crm: crm command 'migrate vm:103 node2' - migrate positively colocated service 'vm:108' to 'node2'
+info 120 node1/crm: crm command 'migrate vm:103 node2' - migrate positively colocated service 'vm:109' to 'node2'
+info 120 node1/crm: migrate service 'vm:101' to node 'node2'
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:102' to node 'node2'
+info 120 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:103' to node 'node2'
+info 120 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:104' to node 'node2'
+info 120 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:105' to node 'node2'
+info 120 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:106' to node 'node2'
+info 120 node1/crm: service 'vm:106': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:107' to node 'node2'
+info 120 node1/crm: service 'vm:107': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:108' to node 'node2'
+info 120 node1/crm: service 'vm:108': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:109' to node 'node2'
+info 120 node1/crm: service 'vm:109': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 121 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:103 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:103 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:104 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:104 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:105 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:105 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:106 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:106 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:107 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:107 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:108 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:108 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:109 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:109 - end migrate to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:106': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:107': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:108': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:109': state changed from 'migrate' to 'started' (node = node2)
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: starting service vm:101
+info 143 node2/lrm: service status vm:101 started
+info 143 node2/lrm: starting service vm:102
+info 143 node2/lrm: service status vm:102 started
+info 143 node2/lrm: starting service vm:103
+info 143 node2/lrm: service status vm:103 started
+info 143 node2/lrm: starting service vm:104
+info 143 node2/lrm: service status vm:104 started
+info 143 node2/lrm: starting service vm:105
+info 143 node2/lrm: service status vm:105 started
+info 143 node2/lrm: starting service vm:106
+info 143 node2/lrm: service status vm:106 started
+info 143 node2/lrm: starting service vm:107
+info 143 node2/lrm: service status vm:107 started
+info 143 node2/lrm: starting service vm:108
+info 143 node2/lrm: service status vm:108 started
+info 143 node2/lrm: starting service vm:109
+info 143 node2/lrm: service status vm:109 started
+info 220 cmdlist: execute delay 100
+info 400 cmdlist: execute service vm:101 migrate node3
+info 400 node1/crm: got crm command: migrate vm:101 node3
+info 400 node1/crm: crm command 'migrate vm:101 node3' - migrate positively colocated service 'vm:102' to 'node3'
+info 400 node1/crm: crm command 'migrate vm:101 node3' - migrate positively colocated service 'vm:103' to 'node3'
+info 400 node1/crm: crm command 'migrate vm:101 node3' - migrate positively colocated service 'vm:104' to 'node3'
+info 400 node1/crm: crm command 'migrate vm:101 node3' - migrate positively colocated service 'vm:105' to 'node3'
+info 400 node1/crm: crm command 'migrate vm:101 node3' - migrate positively colocated service 'vm:106' to 'node3'
+info 400 node1/crm: crm command 'migrate vm:101 node3' - migrate positively colocated service 'vm:107' to 'node3'
+info 400 node1/crm: crm command 'migrate vm:101 node3' - migrate positively colocated service 'vm:108' to 'node3'
+info 400 node1/crm: crm command 'migrate vm:101 node3' - migrate positively colocated service 'vm:109' to 'node3'
+info 400 node1/crm: migrate service 'vm:101' to node 'node3'
+info 400 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 400 node1/crm: migrate service 'vm:102' to node 'node3'
+info 400 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 400 node1/crm: migrate service 'vm:103' to node 'node3'
+info 400 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 400 node1/crm: migrate service 'vm:104' to node 'node3'
+info 400 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 400 node1/crm: migrate service 'vm:105' to node 'node3'
+info 400 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 400 node1/crm: migrate service 'vm:106' to node 'node3'
+info 400 node1/crm: service 'vm:106': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 400 node1/crm: migrate service 'vm:107' to node 'node3'
+info 400 node1/crm: service 'vm:107': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 400 node1/crm: migrate service 'vm:108' to node 'node3'
+info 400 node1/crm: service 'vm:108': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 400 node1/crm: migrate service 'vm:109' to node 'node3'
+info 400 node1/crm: service 'vm:109': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 403 node2/lrm: service vm:101 - start migrate to node 'node3'
+info 403 node2/lrm: service vm:101 - end migrate to node 'node3'
+info 403 node2/lrm: service vm:102 - start migrate to node 'node3'
+info 403 node2/lrm: service vm:102 - end migrate to node 'node3'
+info 403 node2/lrm: service vm:103 - start migrate to node 'node3'
+info 403 node2/lrm: service vm:103 - end migrate to node 'node3'
+info 403 node2/lrm: service vm:104 - start migrate to node 'node3'
+info 403 node2/lrm: service vm:104 - end migrate to node 'node3'
+info 403 node2/lrm: service vm:105 - start migrate to node 'node3'
+info 403 node2/lrm: service vm:105 - end migrate to node 'node3'
+info 403 node2/lrm: service vm:106 - start migrate to node 'node3'
+info 403 node2/lrm: service vm:106 - end migrate to node 'node3'
+info 403 node2/lrm: service vm:107 - start migrate to node 'node3'
+info 403 node2/lrm: service vm:107 - end migrate to node 'node3'
+info 403 node2/lrm: service vm:108 - start migrate to node 'node3'
+info 403 node2/lrm: service vm:108 - end migrate to node 'node3'
+info 403 node2/lrm: service vm:109 - start migrate to node 'node3'
+info 403 node2/lrm: service vm:109 - end migrate to node 'node3'
+info 420 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 420 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 420 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 420 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node3)
+info 420 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node3)
+info 420 node1/crm: service 'vm:106': state changed from 'migrate' to 'started' (node = node3)
+info 420 node1/crm: service 'vm:107': state changed from 'migrate' to 'started' (node = node3)
+info 420 node1/crm: service 'vm:108': state changed from 'migrate' to 'started' (node = node3)
+info 420 node1/crm: service 'vm:109': state changed from 'migrate' to 'started' (node = node3)
+info 425 node3/lrm: got lock 'ha_agent_node3_lock'
+info 425 node3/lrm: status change wait_for_agent_lock => active
+info 425 node3/lrm: starting service vm:101
+info 425 node3/lrm: service status vm:101 started
+info 425 node3/lrm: starting service vm:102
+info 425 node3/lrm: service status vm:102 started
+info 425 node3/lrm: starting service vm:103
+info 425 node3/lrm: service status vm:103 started
+info 425 node3/lrm: starting service vm:104
+info 425 node3/lrm: service status vm:104 started
+info 425 node3/lrm: starting service vm:105
+info 425 node3/lrm: service status vm:105 started
+info 425 node3/lrm: starting service vm:106
+info 425 node3/lrm: service status vm:106 started
+info 425 node3/lrm: starting service vm:107
+info 425 node3/lrm: service status vm:107 started
+info 425 node3/lrm: starting service vm:108
+info 425 node3/lrm: service status vm:108 started
+info 425 node3/lrm: starting service vm:109
+info 425 node3/lrm: service status vm:109 started
+info 500 cmdlist: execute delay 100
+info 680 cmdlist: execute service vm:109 migrate node1
+info 680 node1/crm: got crm command: migrate vm:109 node1
+info 680 node1/crm: crm command 'migrate vm:109 node1' - migrate positively colocated service 'vm:101' to 'node1'
+info 680 node1/crm: crm command 'migrate vm:109 node1' - migrate positively colocated service 'vm:102' to 'node1'
+info 680 node1/crm: crm command 'migrate vm:109 node1' - migrate positively colocated service 'vm:103' to 'node1'
+info 680 node1/crm: crm command 'migrate vm:109 node1' - migrate positively colocated service 'vm:104' to 'node1'
+info 680 node1/crm: crm command 'migrate vm:109 node1' - migrate positively colocated service 'vm:105' to 'node1'
+info 680 node1/crm: crm command 'migrate vm:109 node1' - migrate positively colocated service 'vm:106' to 'node1'
+info 680 node1/crm: crm command 'migrate vm:109 node1' - migrate positively colocated service 'vm:107' to 'node1'
+info 680 node1/crm: crm command 'migrate vm:109 node1' - migrate positively colocated service 'vm:108' to 'node1'
+info 680 node1/crm: migrate service 'vm:101' to node 'node1'
+info 680 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 680 node1/crm: migrate service 'vm:102' to node 'node1'
+info 680 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 680 node1/crm: migrate service 'vm:103' to node 'node1'
+info 680 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 680 node1/crm: migrate service 'vm:104' to node 'node1'
+info 680 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 680 node1/crm: migrate service 'vm:105' to node 'node1'
+info 680 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 680 node1/crm: migrate service 'vm:106' to node 'node1'
+info 680 node1/crm: service 'vm:106': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 680 node1/crm: migrate service 'vm:107' to node 'node1'
+info 680 node1/crm: service 'vm:107': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 680 node1/crm: migrate service 'vm:108' to node 'node1'
+info 680 node1/crm: service 'vm:108': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 680 node1/crm: migrate service 'vm:109' to node 'node1'
+info 680 node1/crm: service 'vm:109': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 685 node3/lrm: service vm:101 - start migrate to node 'node1'
+info 685 node3/lrm: service vm:101 - end migrate to node 'node1'
+info 685 node3/lrm: service vm:102 - start migrate to node 'node1'
+info 685 node3/lrm: service vm:102 - end migrate to node 'node1'
+info 685 node3/lrm: service vm:103 - start migrate to node 'node1'
+info 685 node3/lrm: service vm:103 - end migrate to node 'node1'
+info 685 node3/lrm: service vm:104 - start migrate to node 'node1'
+info 685 node3/lrm: service vm:104 - end migrate to node 'node1'
+info 685 node3/lrm: service vm:105 - start migrate to node 'node1'
+info 685 node3/lrm: service vm:105 - end migrate to node 'node1'
+info 685 node3/lrm: service vm:106 - start migrate to node 'node1'
+info 685 node3/lrm: service vm:106 - end migrate to node 'node1'
+info 685 node3/lrm: service vm:107 - start migrate to node 'node1'
+info 685 node3/lrm: service vm:107 - end migrate to node 'node1'
+info 685 node3/lrm: service vm:108 - start migrate to node 'node1'
+info 685 node3/lrm: service vm:108 - end migrate to node 'node1'
+info 685 node3/lrm: service vm:109 - start migrate to node 'node1'
+info 685 node3/lrm: service vm:109 - end migrate to node 'node1'
+info 700 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 700 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
+info 700 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 700 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node1)
+info 700 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node1)
+info 700 node1/crm: service 'vm:106': state changed from 'migrate' to 'started' (node = node1)
+info 700 node1/crm: service 'vm:107': state changed from 'migrate' to 'started' (node = node1)
+info 700 node1/crm: service 'vm:108': state changed from 'migrate' to 'started' (node = node1)
+info 700 node1/crm: service 'vm:109': state changed from 'migrate' to 'started' (node = node1)
+info 701 node1/lrm: starting service vm:101
+info 701 node1/lrm: service status vm:101 started
+info 701 node1/lrm: starting service vm:102
+info 701 node1/lrm: service status vm:102 started
+info 701 node1/lrm: starting service vm:103
+info 701 node1/lrm: service status vm:103 started
+info 701 node1/lrm: starting service vm:104
+info 701 node1/lrm: service status vm:104 started
+info 701 node1/lrm: starting service vm:105
+info 701 node1/lrm: service status vm:105 started
+info 701 node1/lrm: starting service vm:106
+info 701 node1/lrm: service status vm:106 started
+info 701 node1/lrm: starting service vm:107
+info 701 node1/lrm: service status vm:107 started
+info 701 node1/lrm: starting service vm:108
+info 701 node1/lrm: service status vm:108 started
+info 701 node1/lrm: starting service vm:109
+info 701 node1/lrm: service status vm:109 started
+info 1280 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-together5/manager_status b/src/test/test-colocation-strict-together5/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-together5/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-together5/rules_config b/src/test/test-colocation-strict-together5/rules_config
new file mode 100644
index 0000000..481bce5
--- /dev/null
+++ b/src/test/test-colocation-strict-together5/rules_config
@@ -0,0 +1,15 @@
+colocation: vms-must-stick-together1
+ services vm:101,vm:102,vm:103
+ affinity together
+
+colocation: vms-must-stick-together2
+ services vm:103,vm:104,vm:105
+ affinity together
+
+colocation: vms-must-stick-together3
+ services vm:105,vm:106,vm:107
+ affinity together
+
+colocation: vms-must-stick-together4
+ services vm:105,vm:108,vm:109
+ affinity together
diff --git a/src/test/test-colocation-strict-together5/service_config b/src/test/test-colocation-strict-together5/service_config
new file mode 100644
index 0000000..48db7b1
--- /dev/null
+++ b/src/test/test-colocation-strict-together5/service_config
@@ -0,0 +1,11 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node1", "state": "started" },
+ "vm:107": { "node": "node1", "state": "started" },
+ "vm:108": { "node": "node1", "state": "started" },
+ "vm:109": { "node": "node1", "state": "started" }
+}
--
2.39.5
* [pve-devel] [PATCH ha-manager v2 19/26] test: ha tester: add test cases in more complex scenarios
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (21 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 18/26] test: ha tester: add test cases for strict positive " Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 20/26] test: add test cases for rules config Daniel Kral
` (19 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add test cases where colocation rules are used with the static
utilization scheduler and the rebalance-on-start option enabled. These
verify the behavior in the following scenarios:
- 7 services with intertwined colocation rules in a 3 node cluster;
  1 node failing
- 3 neg. colocated services in a 3 node cluster, where the rules are
  stated in pairwise form (see the sketch below); 1 node failing
- 5 neg. colocated services in a 5 node cluster; nodes failing
  consecutively, one after the other
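
For reference, "pairwise form" means that the mutual exclusion of the three
services is expressed as three two-service rules rather than a single
three-service rule. A rough sketch of such a rules_config is shown below; the
rule names are placeholders rather than the ones used in the test fixture, and
the negative affinity value is assumed to be 'separate', mirroring the
'together' value used by the positive colocation rules:

colocation: keep-apart-101-102
	services vm:101,vm:102
	affinity separate

colocation: keep-apart-101-103
	services vm:101,vm:103
	affinity separate

colocation: keep-apart-102-103
	services vm:102,vm:103
	affinity separate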
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- changed intransitive to pairwise
- added dummy services in second test case to check whether
colocation rules are applied during rebalance
- changed third test case to check for consecutive node fails and
that with each failed node the colocation rules are applied
correctly
.../test-crs-static-rebalance-coloc1/README | 26 ++
.../test-crs-static-rebalance-coloc1/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 5 +
.../log.expect | 120 ++++++++
.../manager_status | 1 +
.../rules_config | 19 ++
.../service_config | 10 +
.../static_service_stats | 10 +
.../test-crs-static-rebalance-coloc2/README | 20 ++
.../test-crs-static-rebalance-coloc2/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 5 +
.../log.expect | 174 +++++++++++
.../manager_status | 1 +
.../rules_config | 11 +
.../service_config | 14 +
.../static_service_stats | 14 +
.../test-crs-static-rebalance-coloc3/README | 22 ++
.../test-crs-static-rebalance-coloc3/cmdlist | 22 ++
.../datacenter.cfg | 6 +
.../hardware_status | 7 +
.../log.expect | 272 ++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 9 +
.../static_service_stats | 9 +
27 files changed, 801 insertions(+)
create mode 100644 src/test/test-crs-static-rebalance-coloc1/README
create mode 100644 src/test/test-crs-static-rebalance-coloc1/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc1/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc1/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc1/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc1/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc1/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc1/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc1/static_service_stats
create mode 100644 src/test/test-crs-static-rebalance-coloc2/README
create mode 100644 src/test/test-crs-static-rebalance-coloc2/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc2/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc2/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc2/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc2/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc2/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc2/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc2/static_service_stats
create mode 100644 src/test/test-crs-static-rebalance-coloc3/README
create mode 100644 src/test/test-crs-static-rebalance-coloc3/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc3/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc3/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc3/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc3/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc3/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc3/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc3/static_service_stats
diff --git a/src/test/test-crs-static-rebalance-coloc1/README b/src/test/test-crs-static-rebalance-coloc1/README
new file mode 100644
index 0000000..0685189
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/README
@@ -0,0 +1,26 @@
+Test whether a mixed set of strict colocation rules, in conjunction with the
+static load scheduler with auto-rebalancing enabled, is applied correctly on
+service start and in case of a subsequent failover.
+
+The test scenario is:
+- vm:101 and vm:102 are non-colocated services
+- Services that must be kept together:
+ - vm:102 and vm:107
+ - vm:104, vm:106, and vm:108
+- Services that must be kept separate:
+ - vm:103, vm:104, and vm:105
+ - vm:103, vm:106, and vm:107
+ - vm:107 and vm:108
+- Therefore, there are consistent interdependencies between the positive and
+ negative colocation rules' service members
+- vm:101 and vm:102 are currently assigned to node1 and node2 respectively
+- vm:103 through vm:108 are currently assigned to node3
+
+The expected outcome is:
+- vm:101, vm:102, and vm:103 should be started on node1, node2, and node3
+  respectively, as there is nothing running on those nodes yet
+- vm:104, vm:106, and vm:108 should all be assigned to the same node, which
+  will be node1, since it has the most resources left for vm:104
+- vm:105 and vm:107 should both be assigned to the same node, which will be
+  node2, since neither can be assigned to the other nodes because of the
+  colocation constraints
diff --git a/src/test/test-crs-static-rebalance-coloc1/cmdlist b/src/test/test-crs-static-rebalance-coloc1/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-crs-static-rebalance-coloc1/datacenter.cfg b/src/test/test-crs-static-rebalance-coloc1/datacenter.cfg
new file mode 100644
index 0000000..f2671a5
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-rebalance-on-start": 1
+ }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc1/hardware_status b/src/test/test-crs-static-rebalance-coloc1/hardware_status
new file mode 100644
index 0000000..84484af
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc1/log.expect b/src/test/test-crs-static-rebalance-coloc1/log.expect
new file mode 100644
index 0000000..cdd2497
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/log.expect
@@ -0,0 +1,120 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: adding new service 'vm:108' on node 'node3'
+info 20 node1/crm: service vm:101: re-balance selected current node node1 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service vm:102: re-balance selected current node node2 for startup
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service vm:103: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service vm:104: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:105: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 20 node1/crm: service vm:106: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:107: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 20 node1/crm: service vm:108: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:108': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 25 node3/lrm: service vm:104 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:104 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:105 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:105 - end relocate to node 'node2'
+info 25 node3/lrm: service vm:106 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:106 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:107 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:107 - end relocate to node 'node2'
+info 25 node3/lrm: service vm:108 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:108 - end relocate to node 'node1'
+info 40 node1/crm: service 'vm:104': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:105': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:106': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:107': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:108': state changed from 'request_start_balance' to 'started' (node = node1)
+info 41 node1/lrm: starting service vm:104
+info 41 node1/lrm: service status vm:104 started
+info 41 node1/lrm: starting service vm:106
+info 41 node1/lrm: service status vm:106 started
+info 41 node1/lrm: starting service vm:108
+info 41 node1/lrm: service status vm:108 started
+info 43 node2/lrm: starting service vm:105
+info 43 node2/lrm: service status vm:105 started
+info 43 node2/lrm: starting service vm:107
+info 43 node2/lrm: service status vm:107 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+err 240 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 260 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 280 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 300 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 320 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 340 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 360 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 380 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 400 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 420 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 440 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 460 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 480 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 500 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 520 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 540 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 560 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 580 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 600 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 620 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 640 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 660 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 680 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 700 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-rebalance-coloc1/manager_status b/src/test/test-crs-static-rebalance-coloc1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-rebalance-coloc1/rules_config b/src/test/test-crs-static-rebalance-coloc1/rules_config
new file mode 100644
index 0000000..3e6ebf2
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/rules_config
@@ -0,0 +1,19 @@
+colocation: vms-must-stick-together1
+ services vm:102,vm:107
+ affinity together
+
+colocation: vms-must-stick-together2
+ services vm:104,vm:106,vm:108
+ affinity together
+
+colocation: vms-must-stay-apart1
+ services vm:103,vm:104,vm:105
+ affinity separate
+
+colocation: vms-must-stay-apart2
+ services vm:103,vm:106,vm:107
+ affinity separate
+
+colocation: vms-must-stay-apart3
+ services vm:107,vm:108
+ affinity separate
diff --git a/src/test/test-crs-static-rebalance-coloc1/service_config b/src/test/test-crs-static-rebalance-coloc1/service_config
new file mode 100644
index 0000000..02e4a07
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/service_config
@@ -0,0 +1,10 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node3", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" },
+ "vm:108": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc1/static_service_stats b/src/test/test-crs-static-rebalance-coloc1/static_service_stats
new file mode 100644
index 0000000..c6472ca
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/static_service_stats
@@ -0,0 +1,10 @@
+{
+ "vm:101": { "maxcpu": 8, "maxmem": 16000000000 },
+ "vm:102": { "maxcpu": 4, "maxmem": 24000000000 },
+ "vm:103": { "maxcpu": 2, "maxmem": 32000000000 },
+ "vm:104": { "maxcpu": 4, "maxmem": 48000000000 },
+ "vm:105": { "maxcpu": 8, "maxmem": 16000000000 },
+ "vm:106": { "maxcpu": 4, "maxmem": 32000000000 },
+ "vm:107": { "maxcpu": 2, "maxmem": 64000000000 },
+ "vm:108": { "maxcpu": 8, "maxmem": 48000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc2/README b/src/test/test-crs-static-rebalance-coloc2/README
new file mode 100644
index 0000000..c335752
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/README
@@ -0,0 +1,20 @@
+Test whether a set of pairwise strict negative colocation rules, i.e. negative
+colocation relations a<->b, b<->c, and a<->c, in conjunction with the static
+load scheduler with auto-rebalancing, is applied correctly on service start
+and in case of a subsequent failover.
+
+The test scenario is:
+- vm:100 and vm:200 must be kept separate
+- vm:200 and vm:300 must be kept separate
+- vm:100 and vm:300 must be kept separate
+- Therefore, vm:100, vm:200, and vm:300 must be kept separate
+- The services' static usage stats are chosen so that during rebalancing vm:300
+ will need to select a less than ideal node according to the static usage
+ scheduler, i.e. node1 being the ideal one, to test whether the colocation
+ rule still applies correctly
+
+The expected outcome is:
+- vm:100, vm:200, and vm:300 should be started on node1, node2, and node3
+  respectively, just as if the three negative colocation rules had been stated
+  as a single negative colocation rule (as sketched below)
+- As node3 fails, vm:300 cannot be recovered
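For reference, the equivalent single rule would look roughly like the following
in the same rules_config syntax (a sketch only; the rule ID is a placeholder):

  colocation: very-lonely-services
      services vm:100,vm:200,vm:300
      affinity separate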
diff --git a/src/test/test-crs-static-rebalance-coloc2/cmdlist b/src/test/test-crs-static-rebalance-coloc2/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-crs-static-rebalance-coloc2/datacenter.cfg b/src/test/test-crs-static-rebalance-coloc2/datacenter.cfg
new file mode 100644
index 0000000..f2671a5
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-rebalance-on-start": 1
+ }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc2/hardware_status b/src/test/test-crs-static-rebalance-coloc2/hardware_status
new file mode 100644
index 0000000..84484af
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc2/log.expect b/src/test/test-crs-static-rebalance-coloc2/log.expect
new file mode 100644
index 0000000..a7e5c8e
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/log.expect
@@ -0,0 +1,174 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:100' on node 'node1'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:200' on node 'node1'
+info 20 node1/crm: adding new service 'vm:201' on node 'node1'
+info 20 node1/crm: adding new service 'vm:202' on node 'node1'
+info 20 node1/crm: adding new service 'vm:203' on node 'node1'
+info 20 node1/crm: adding new service 'vm:300' on node 'node1'
+info 20 node1/crm: adding new service 'vm:301' on node 'node1'
+info 20 node1/crm: adding new service 'vm:302' on node 'node1'
+info 20 node1/crm: adding new service 'vm:303' on node 'node1'
+info 20 node1/crm: service vm:100: re-balance selected current node node1 for startup
+info 20 node1/crm: service 'vm:100': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service vm:101: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node2)
+info 20 node1/crm: service vm:102: re-balance selected new node node3 for startup
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node3)
+info 20 node1/crm: service vm:103: re-balance selected new node node3 for startup
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node3)
+info 20 node1/crm: service vm:200: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:200': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node2)
+info 20 node1/crm: service vm:201: re-balance selected new node node3 for startup
+info 20 node1/crm: service 'vm:201': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node3)
+info 20 node1/crm: service vm:202: re-balance selected new node node3 for startup
+info 20 node1/crm: service 'vm:202': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node3)
+info 20 node1/crm: service vm:203: re-balance selected current node node1 for startup
+info 20 node1/crm: service 'vm:203': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service vm:300: re-balance selected new node node3 for startup
+info 20 node1/crm: service 'vm:300': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node3)
+info 20 node1/crm: service vm:301: re-balance selected current node node1 for startup
+info 20 node1/crm: service 'vm:301': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service vm:302: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:302': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node2)
+info 20 node1/crm: service vm:303: re-balance selected current node node1 for startup
+info 20 node1/crm: service 'vm:303': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:100
+info 21 node1/lrm: service status vm:100 started
+info 21 node1/lrm: service vm:101 - start relocate to node 'node2'
+info 21 node1/lrm: service vm:101 - end relocate to node 'node2'
+info 21 node1/lrm: service vm:102 - start relocate to node 'node3'
+info 21 node1/lrm: service vm:102 - end relocate to node 'node3'
+info 21 node1/lrm: service vm:103 - start relocate to node 'node3'
+info 21 node1/lrm: service vm:103 - end relocate to node 'node3'
+info 21 node1/lrm: service vm:200 - start relocate to node 'node2'
+info 21 node1/lrm: service vm:200 - end relocate to node 'node2'
+info 21 node1/lrm: service vm:201 - start relocate to node 'node3'
+info 21 node1/lrm: service vm:201 - end relocate to node 'node3'
+info 21 node1/lrm: service vm:202 - start relocate to node 'node3'
+info 21 node1/lrm: service vm:202 - end relocate to node 'node3'
+info 21 node1/lrm: starting service vm:203
+info 21 node1/lrm: service status vm:203 started
+info 21 node1/lrm: service vm:300 - start relocate to node 'node3'
+info 21 node1/lrm: service vm:300 - end relocate to node 'node3'
+info 21 node1/lrm: starting service vm:301
+info 21 node1/lrm: service status vm:301 started
+info 21 node1/lrm: service vm:302 - start relocate to node 'node2'
+info 21 node1/lrm: service vm:302 - end relocate to node 'node2'
+info 21 node1/lrm: starting service vm:303
+info 21 node1/lrm: service status vm:303 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'vm:101': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:102': state changed from 'request_start_balance' to 'started' (node = node3)
+info 40 node1/crm: service 'vm:103': state changed from 'request_start_balance' to 'started' (node = node3)
+info 40 node1/crm: service 'vm:200': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:201': state changed from 'request_start_balance' to 'started' (node = node3)
+info 40 node1/crm: service 'vm:202': state changed from 'request_start_balance' to 'started' (node = node3)
+info 40 node1/crm: service 'vm:300': state changed from 'request_start_balance' to 'started' (node = node3)
+info 40 node1/crm: service 'vm:302': state changed from 'request_start_balance' to 'started' (node = node2)
+info 43 node2/lrm: got lock 'ha_agent_node2_lock'
+info 43 node2/lrm: status change wait_for_agent_lock => active
+info 43 node2/lrm: starting service vm:101
+info 43 node2/lrm: service status vm:101 started
+info 43 node2/lrm: starting service vm:200
+info 43 node2/lrm: service status vm:200 started
+info 43 node2/lrm: starting service vm:302
+info 43 node2/lrm: service status vm:302 started
+info 45 node3/lrm: got lock 'ha_agent_node3_lock'
+info 45 node3/lrm: status change wait_for_agent_lock => active
+info 45 node3/lrm: starting service vm:102
+info 45 node3/lrm: service status vm:102 started
+info 45 node3/lrm: starting service vm:103
+info 45 node3/lrm: service status vm:103 started
+info 45 node3/lrm: starting service vm:201
+info 45 node3/lrm: service status vm:201 started
+info 45 node3/lrm: starting service vm:202
+info 45 node3/lrm: service status vm:202 started
+info 45 node3/lrm: starting service vm:300
+info 45 node3/lrm: service status vm:300 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:201': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:202': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:300': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:201': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:202': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:300': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:201' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:201': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:202' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:202': state changed from 'recovery' to 'started' (node = node2)
+err 240 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 240 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+info 241 node1/lrm: starting service vm:102
+info 241 node1/lrm: service status vm:102 started
+info 243 node2/lrm: starting service vm:103
+info 243 node2/lrm: service status vm:103 started
+info 243 node2/lrm: starting service vm:201
+info 243 node2/lrm: service status vm:201 started
+info 243 node2/lrm: starting service vm:202
+info 243 node2/lrm: service status vm:202 started
+err 260 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 280 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 300 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 320 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 340 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 360 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 380 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 400 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 420 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 440 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 460 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 480 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 500 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 520 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 540 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 560 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 580 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 600 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 620 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 640 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 660 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 680 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+err 700 node1/crm: recovering service 'vm:300' from fenced node 'node3' failed, no recovery node found
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-rebalance-coloc2/manager_status b/src/test/test-crs-static-rebalance-coloc2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-rebalance-coloc2/rules_config b/src/test/test-crs-static-rebalance-coloc2/rules_config
new file mode 100644
index 0000000..ea1ec10
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/rules_config
@@ -0,0 +1,11 @@
+colocation: very-lonely-services1
+ services vm:100,vm:200
+ affinity separate
+
+colocation: very-lonely-services2
+ services vm:200,vm:300
+ affinity separate
+
+colocation: very-lonely-services3
+ services vm:100,vm:300
+ affinity separate
diff --git a/src/test/test-crs-static-rebalance-coloc2/service_config b/src/test/test-crs-static-rebalance-coloc2/service_config
new file mode 100644
index 0000000..0de367e
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/service_config
@@ -0,0 +1,14 @@
+{
+ "vm:100": { "node": "node1", "state": "started" },
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:200": { "node": "node1", "state": "started" },
+ "vm:201": { "node": "node1", "state": "started" },
+ "vm:202": { "node": "node1", "state": "started" },
+ "vm:203": { "node": "node1", "state": "started" },
+ "vm:300": { "node": "node1", "state": "started" },
+ "vm:301": { "node": "node1", "state": "started" },
+ "vm:302": { "node": "node1", "state": "started" },
+ "vm:303": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc2/static_service_stats b/src/test/test-crs-static-rebalance-coloc2/static_service_stats
new file mode 100644
index 0000000..3c7502e
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/static_service_stats
@@ -0,0 +1,14 @@
+{
+ "vm:100": { "maxcpu": 8, "maxmem": 16000000000 },
+ "vm:101": { "maxcpu": 4, "maxmem": 8000000000 },
+ "vm:102": { "maxcpu": 2, "maxmem": 8000000000 },
+ "vm:103": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:200": { "maxcpu": 4, "maxmem": 24000000000 },
+ "vm:201": { "maxcpu": 2, "maxmem": 8000000000 },
+ "vm:202": { "maxcpu": 4, "maxmem": 4000000000 },
+ "vm:203": { "maxcpu": 2, "maxmem": 8000000000 },
+ "vm:300": { "maxcpu": 6, "maxmem": 32000000000 },
+ "vm:301": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:302": { "maxcpu": 2, "maxmem": 8000000000 },
+ "vm:303": { "maxcpu": 4, "maxmem": 8000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc3/README b/src/test/test-crs-static-rebalance-coloc3/README
new file mode 100644
index 0000000..4e3a1ae
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/README
@@ -0,0 +1,22 @@
+Test whether a more complex set of strict negative colocation relations, i.e.
+pairwise negative colocation relations a<->b, b<->c, and a<->c, between 5
+services, in conjunction with the static load scheduler with auto-rebalancing,
+is applied correctly on service start and in case of a consecutive failover of
+all nodes, one after the other.
+
+The test scenario is:
+- vm:100, vm:200, vm:300, vm:400, and vm:500 must be kept separate
+- The services' static usage stats are chosen so that during rebalancing vm:300
+ and vm:500 will need to select a less than ideal node according to the static
+ usage scheduler, i.e. node2 and node3 being their ideal ones, to test whether
+ the colocation rule still applies correctly
+
+The expected outcome is:
+- vm:100, vm:200, vm:300, vm:400, and vm:500 should be started on node2, node1,
+ node4, node3, and node5 respectively
+- vm:400 and vm:500 are started on node3 and node5, instead of node2 and node3
+  as they would have been without the colocation rule
+- As node1, node2, node3, node4, and node5 fail consecutively, with each node
+  coming back online afterwards, vm:200, vm:100, vm:400, vm:300, and vm:500
+  will respectively be put in recovery during the failover, as there is no
+  other node left to accommodate them without violating the colocation rule.
diff --git a/src/test/test-crs-static-rebalance-coloc3/cmdlist b/src/test/test-crs-static-rebalance-coloc3/cmdlist
new file mode 100644
index 0000000..6665419
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/cmdlist
@@ -0,0 +1,22 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "power node4 on", "power node5 on" ],
+ [ "power node1 off" ],
+ [ "delay 100" ],
+ [ "power node1 on" ],
+ [ "delay 100" ],
+ [ "power node2 off" ],
+ [ "delay 100" ],
+ [ "power node2 on" ],
+ [ "delay 100" ],
+ [ "power node3 off" ],
+ [ "delay 100" ],
+ [ "power node3 on" ],
+ [ "delay 100" ],
+ [ "power node4 off" ],
+ [ "delay 100" ],
+ [ "power node4 on" ],
+ [ "delay 100" ],
+ [ "power node5 off" ],
+ [ "delay 100" ],
+ [ "power node5 on" ]
+]
diff --git a/src/test/test-crs-static-rebalance-coloc3/datacenter.cfg b/src/test/test-crs-static-rebalance-coloc3/datacenter.cfg
new file mode 100644
index 0000000..f2671a5
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-rebalance-on-start": 1
+ }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc3/hardware_status b/src/test/test-crs-static-rebalance-coloc3/hardware_status
new file mode 100644
index 0000000..b6dcb1a
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/hardware_status
@@ -0,0 +1,7 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 8, "memory": 48000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 32, "memory": 36000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 16, "memory": 24000000000 },
+ "node4": { "power": "off", "network": "off", "cpus": 32, "memory": 36000000000 },
+ "node5": { "power": "off", "network": "off", "cpus": 8, "memory": 48000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc3/log.expect b/src/test/test-crs-static-rebalance-coloc3/log.expect
new file mode 100644
index 0000000..4e87f03
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/log.expect
@@ -0,0 +1,272 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node4 on
+info 20 node4/crm: status change startup => wait_for_quorum
+info 20 node4/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node5 on
+info 20 node5/crm: status change startup => wait_for_quorum
+info 20 node5/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node4': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:100' on node 'node1'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:200' on node 'node1'
+info 20 node1/crm: adding new service 'vm:201' on node 'node1'
+info 20 node1/crm: adding new service 'vm:300' on node 'node1'
+info 20 node1/crm: adding new service 'vm:400' on node 'node1'
+info 20 node1/crm: adding new service 'vm:500' on node 'node1'
+info 20 node1/crm: service vm:100: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:100': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node2)
+info 20 node1/crm: service vm:101: re-balance selected new node node4 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node4)
+info 20 node1/crm: service vm:200: re-balance selected current node node1 for startup
+info 20 node1/crm: service 'vm:200': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service vm:201: re-balance selected new node node5 for startup
+info 20 node1/crm: service 'vm:201': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node5)
+info 20 node1/crm: service vm:300: re-balance selected new node node4 for startup
+info 20 node1/crm: service 'vm:300': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node4)
+info 20 node1/crm: service vm:400: re-balance selected new node node3 for startup
+info 20 node1/crm: service 'vm:400': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node3)
+info 20 node1/crm: service vm:500: re-balance selected new node node5 for startup
+info 20 node1/crm: service 'vm:500': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node5)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: service vm:100 - start relocate to node 'node2'
+info 21 node1/lrm: service vm:100 - end relocate to node 'node2'
+info 21 node1/lrm: service vm:101 - start relocate to node 'node4'
+info 21 node1/lrm: service vm:101 - end relocate to node 'node4'
+info 21 node1/lrm: starting service vm:200
+info 21 node1/lrm: service status vm:200 started
+info 21 node1/lrm: service vm:201 - start relocate to node 'node5'
+info 21 node1/lrm: service vm:201 - end relocate to node 'node5'
+info 21 node1/lrm: service vm:300 - start relocate to node 'node4'
+info 21 node1/lrm: service vm:300 - end relocate to node 'node4'
+info 21 node1/lrm: service vm:400 - start relocate to node 'node3'
+info 21 node1/lrm: service vm:400 - end relocate to node 'node3'
+info 21 node1/lrm: service vm:500 - start relocate to node 'node5'
+info 21 node1/lrm: service vm:500 - end relocate to node 'node5'
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 26 node4/crm: status change wait_for_quorum => slave
+info 28 node5/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'vm:100': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:101': state changed from 'request_start_balance' to 'started' (node = node4)
+info 40 node1/crm: service 'vm:201': state changed from 'request_start_balance' to 'started' (node = node5)
+info 40 node1/crm: service 'vm:300': state changed from 'request_start_balance' to 'started' (node = node4)
+info 40 node1/crm: service 'vm:400': state changed from 'request_start_balance' to 'started' (node = node3)
+info 40 node1/crm: service 'vm:500': state changed from 'request_start_balance' to 'started' (node = node5)
+info 43 node2/lrm: got lock 'ha_agent_node2_lock'
+info 43 node2/lrm: status change wait_for_agent_lock => active
+info 43 node2/lrm: starting service vm:100
+info 43 node2/lrm: service status vm:100 started
+info 45 node3/lrm: got lock 'ha_agent_node3_lock'
+info 45 node3/lrm: status change wait_for_agent_lock => active
+info 45 node3/lrm: starting service vm:400
+info 45 node3/lrm: service status vm:400 started
+info 47 node4/lrm: got lock 'ha_agent_node4_lock'
+info 47 node4/lrm: status change wait_for_agent_lock => active
+info 47 node4/lrm: starting service vm:101
+info 47 node4/lrm: service status vm:101 started
+info 47 node4/lrm: starting service vm:300
+info 47 node4/lrm: service status vm:300 started
+info 49 node5/lrm: got lock 'ha_agent_node5_lock'
+info 49 node5/lrm: status change wait_for_agent_lock => active
+info 49 node5/lrm: starting service vm:201
+info 49 node5/lrm: service status vm:201 started
+info 49 node5/lrm: starting service vm:500
+info 49 node5/lrm: service status vm:500 started
+info 120 cmdlist: execute power node1 off
+info 120 node1/crm: killed by poweroff
+info 120 node1/lrm: killed by poweroff
+info 220 cmdlist: execute delay 100
+info 222 node3/crm: got lock 'ha_manager_lock'
+info 222 node3/crm: status change slave => master
+info 222 node3/crm: using scheduler mode 'static'
+info 222 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 282 node3/crm: service 'vm:200': state changed from 'started' to 'fence'
+info 282 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 282 node3/crm: FENCE: Try to fence node 'node1'
+info 282 node3/crm: got lock 'ha_agent_node1_lock'
+info 282 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: service 'vm:200': state changed from 'fence' to 'recovery'
+err 282 node3/crm: recovering service 'vm:200' from fenced node 'node1' failed, no recovery node found
+err 302 node3/crm: recovering service 'vm:200' from fenced node 'node1' failed, no recovery node found
+err 322 node3/crm: recovering service 'vm:200' from fenced node 'node1' failed, no recovery node found
+err 342 node3/crm: recovering service 'vm:200' from fenced node 'node1' failed, no recovery node found
+err 362 node3/crm: recovering service 'vm:200' from fenced node 'node1' failed, no recovery node found
+err 382 node3/crm: recovering service 'vm:200' from fenced node 'node1' failed, no recovery node found
+info 400 cmdlist: execute power node1 on
+info 400 node1/crm: status change startup => wait_for_quorum
+info 400 node1/lrm: status change startup => wait_for_agent_lock
+info 400 node1/crm: status change wait_for_quorum => slave
+info 404 node3/crm: node 'node1': state changed from 'unknown' => 'online'
+info 404 node3/crm: recover service 'vm:200' to previous failed and fenced node 'node1' again
+info 404 node3/crm: service 'vm:200': state changed from 'recovery' to 'started' (node = node1)
+info 421 node1/lrm: got lock 'ha_agent_node1_lock'
+info 421 node1/lrm: status change wait_for_agent_lock => active
+info 421 node1/lrm: starting service vm:200
+info 421 node1/lrm: service status vm:200 started
+info 500 cmdlist: execute delay 100
+info 680 cmdlist: execute power node2 off
+info 680 node2/crm: killed by poweroff
+info 680 node2/lrm: killed by poweroff
+info 682 node3/crm: node 'node2': state changed from 'online' => 'unknown'
+info 742 node3/crm: service 'vm:100': state changed from 'started' to 'fence'
+info 742 node3/crm: node 'node2': state changed from 'unknown' => 'fence'
+emai 742 node3/crm: FENCE: Try to fence node 'node2'
+info 780 cmdlist: execute delay 100
+info 802 node3/crm: got lock 'ha_agent_node2_lock'
+info 802 node3/crm: fencing: acknowledged - got agent lock for node 'node2'
+info 802 node3/crm: node 'node2': state changed from 'fence' => 'unknown'
+emai 802 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node2'
+info 802 node3/crm: service 'vm:100': state changed from 'fence' to 'recovery'
+err 802 node3/crm: recovering service 'vm:100' from fenced node 'node2' failed, no recovery node found
+err 822 node3/crm: recovering service 'vm:100' from fenced node 'node2' failed, no recovery node found
+err 842 node3/crm: recovering service 'vm:100' from fenced node 'node2' failed, no recovery node found
+err 862 node3/crm: recovering service 'vm:100' from fenced node 'node2' failed, no recovery node found
+err 882 node3/crm: recovering service 'vm:100' from fenced node 'node2' failed, no recovery node found
+err 902 node3/crm: recovering service 'vm:100' from fenced node 'node2' failed, no recovery node found
+err 922 node3/crm: recovering service 'vm:100' from fenced node 'node2' failed, no recovery node found
+err 942 node3/crm: recovering service 'vm:100' from fenced node 'node2' failed, no recovery node found
+info 960 cmdlist: execute power node2 on
+info 960 node2/crm: status change startup => wait_for_quorum
+info 960 node2/lrm: status change startup => wait_for_agent_lock
+info 962 node2/crm: status change wait_for_quorum => slave
+info 963 node2/lrm: got lock 'ha_agent_node2_lock'
+info 963 node2/lrm: status change wait_for_agent_lock => active
+info 964 node3/crm: node 'node2': state changed from 'unknown' => 'online'
+info 964 node3/crm: recover service 'vm:100' to previous failed and fenced node 'node2' again
+info 964 node3/crm: service 'vm:100': state changed from 'recovery' to 'started' (node = node2)
+info 983 node2/lrm: starting service vm:100
+info 983 node2/lrm: service status vm:100 started
+info 1060 cmdlist: execute delay 100
+info 1240 cmdlist: execute power node3 off
+info 1240 node3/crm: killed by poweroff
+info 1240 node3/lrm: killed by poweroff
+info 1340 cmdlist: execute delay 100
+info 1346 node5/crm: got lock 'ha_manager_lock'
+info 1346 node5/crm: status change slave => master
+info 1346 node5/crm: using scheduler mode 'static'
+info 1346 node5/crm: node 'node3': state changed from 'online' => 'unknown'
+info 1406 node5/crm: service 'vm:400': state changed from 'started' to 'fence'
+info 1406 node5/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 1406 node5/crm: FENCE: Try to fence node 'node3'
+info 1406 node5/crm: got lock 'ha_agent_node3_lock'
+info 1406 node5/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 1406 node5/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 1406 node5/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 1406 node5/crm: service 'vm:400': state changed from 'fence' to 'recovery'
+err 1406 node5/crm: recovering service 'vm:400' from fenced node 'node3' failed, no recovery node found
+err 1426 node5/crm: recovering service 'vm:400' from fenced node 'node3' failed, no recovery node found
+err 1446 node5/crm: recovering service 'vm:400' from fenced node 'node3' failed, no recovery node found
+err 1466 node5/crm: recovering service 'vm:400' from fenced node 'node3' failed, no recovery node found
+err 1486 node5/crm: recovering service 'vm:400' from fenced node 'node3' failed, no recovery node found
+err 1506 node5/crm: recovering service 'vm:400' from fenced node 'node3' failed, no recovery node found
+info 1520 cmdlist: execute power node3 on
+info 1520 node3/crm: status change startup => wait_for_quorum
+info 1520 node3/lrm: status change startup => wait_for_agent_lock
+info 1524 node3/crm: status change wait_for_quorum => slave
+info 1528 node5/crm: node 'node3': state changed from 'unknown' => 'online'
+info 1528 node5/crm: recover service 'vm:400' to previous failed and fenced node 'node3' again
+info 1528 node5/crm: service 'vm:400': state changed from 'recovery' to 'started' (node = node3)
+info 1545 node3/lrm: got lock 'ha_agent_node3_lock'
+info 1545 node3/lrm: status change wait_for_agent_lock => active
+info 1545 node3/lrm: starting service vm:400
+info 1545 node3/lrm: service status vm:400 started
+info 1620 cmdlist: execute delay 100
+info 1800 cmdlist: execute power node4 off
+info 1800 node4/crm: killed by poweroff
+info 1800 node4/lrm: killed by poweroff
+info 1806 node5/crm: node 'node4': state changed from 'online' => 'unknown'
+info 1866 node5/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 1866 node5/crm: service 'vm:300': state changed from 'started' to 'fence'
+info 1866 node5/crm: node 'node4': state changed from 'unknown' => 'fence'
+emai 1866 node5/crm: FENCE: Try to fence node 'node4'
+info 1900 cmdlist: execute delay 100
+info 1926 node5/crm: got lock 'ha_agent_node4_lock'
+info 1926 node5/crm: fencing: acknowledged - got agent lock for node 'node4'
+info 1926 node5/crm: node 'node4': state changed from 'fence' => 'unknown'
+emai 1926 node5/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node4'
+info 1926 node5/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 1926 node5/crm: service 'vm:300': state changed from 'fence' to 'recovery'
+info 1926 node5/crm: recover service 'vm:101' from fenced node 'node4' to node 'node2'
+info 1926 node5/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+err 1926 node5/crm: recovering service 'vm:300' from fenced node 'node4' failed, no recovery node found
+err 1926 node5/crm: recovering service 'vm:300' from fenced node 'node4' failed, no recovery node found
+info 1943 node2/lrm: starting service vm:101
+info 1943 node2/lrm: service status vm:101 started
+err 1946 node5/crm: recovering service 'vm:300' from fenced node 'node4' failed, no recovery node found
+err 1966 node5/crm: recovering service 'vm:300' from fenced node 'node4' failed, no recovery node found
+err 1986 node5/crm: recovering service 'vm:300' from fenced node 'node4' failed, no recovery node found
+err 2006 node5/crm: recovering service 'vm:300' from fenced node 'node4' failed, no recovery node found
+err 2026 node5/crm: recovering service 'vm:300' from fenced node 'node4' failed, no recovery node found
+err 2046 node5/crm: recovering service 'vm:300' from fenced node 'node4' failed, no recovery node found
+err 2066 node5/crm: recovering service 'vm:300' from fenced node 'node4' failed, no recovery node found
+info 2080 cmdlist: execute power node4 on
+info 2080 node4/crm: status change startup => wait_for_quorum
+info 2080 node4/lrm: status change startup => wait_for_agent_lock
+info 2086 node4/crm: status change wait_for_quorum => slave
+info 2087 node4/lrm: got lock 'ha_agent_node4_lock'
+info 2087 node4/lrm: status change wait_for_agent_lock => active
+info 2088 node5/crm: node 'node4': state changed from 'unknown' => 'online'
+info 2088 node5/crm: recover service 'vm:300' to previous failed and fenced node 'node4' again
+info 2088 node5/crm: service 'vm:300': state changed from 'recovery' to 'started' (node = node4)
+info 2107 node4/lrm: starting service vm:300
+info 2107 node4/lrm: service status vm:300 started
+info 2180 cmdlist: execute delay 100
+info 2360 cmdlist: execute power node5 off
+info 2360 node5/crm: killed by poweroff
+info 2360 node5/lrm: killed by poweroff
+info 2460 cmdlist: execute delay 100
+info 2480 node1/crm: got lock 'ha_manager_lock'
+info 2480 node1/crm: status change slave => master
+info 2480 node1/crm: using scheduler mode 'static'
+info 2480 node1/crm: node 'node5': state changed from 'online' => 'unknown'
+info 2540 node1/crm: service 'vm:201': state changed from 'started' to 'fence'
+info 2540 node1/crm: service 'vm:500': state changed from 'started' to 'fence'
+info 2540 node1/crm: node 'node5': state changed from 'unknown' => 'fence'
+emai 2540 node1/crm: FENCE: Try to fence node 'node5'
+info 2540 node1/crm: got lock 'ha_agent_node5_lock'
+info 2540 node1/crm: fencing: acknowledged - got agent lock for node 'node5'
+info 2540 node1/crm: node 'node5': state changed from 'fence' => 'unknown'
+emai 2540 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node5'
+info 2540 node1/crm: service 'vm:201': state changed from 'fence' to 'recovery'
+info 2540 node1/crm: service 'vm:500': state changed from 'fence' to 'recovery'
+info 2540 node1/crm: recover service 'vm:201' from fenced node 'node5' to node 'node2'
+info 2540 node1/crm: service 'vm:201': state changed from 'recovery' to 'started' (node = node2)
+err 2540 node1/crm: recovering service 'vm:500' from fenced node 'node5' failed, no recovery node found
+err 2540 node1/crm: recovering service 'vm:500' from fenced node 'node5' failed, no recovery node found
+info 2543 node2/lrm: starting service vm:201
+info 2543 node2/lrm: service status vm:201 started
+err 2560 node1/crm: recovering service 'vm:500' from fenced node 'node5' failed, no recovery node found
+err 2580 node1/crm: recovering service 'vm:500' from fenced node 'node5' failed, no recovery node found
+err 2600 node1/crm: recovering service 'vm:500' from fenced node 'node5' failed, no recovery node found
+err 2620 node1/crm: recovering service 'vm:500' from fenced node 'node5' failed, no recovery node found
+info 2640 cmdlist: execute power node5 on
+info 2640 node5/crm: status change startup => wait_for_quorum
+info 2640 node5/lrm: status change startup => wait_for_agent_lock
+info 2640 node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info 2640 node1/crm: recover service 'vm:500' to previous failed and fenced node 'node5' again
+info 2640 node1/crm: service 'vm:500': state changed from 'recovery' to 'started' (node = node5)
+info 2648 node5/crm: status change wait_for_quorum => slave
+info 2669 node5/lrm: got lock 'ha_agent_node5_lock'
+info 2669 node5/lrm: status change wait_for_agent_lock => active
+info 2669 node5/lrm: starting service vm:500
+info 2669 node5/lrm: service status vm:500 started
+info 3240 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-rebalance-coloc3/manager_status b/src/test/test-crs-static-rebalance-coloc3/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-rebalance-coloc3/rules_config b/src/test/test-crs-static-rebalance-coloc3/rules_config
new file mode 100644
index 0000000..f2646fc
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/rules_config
@@ -0,0 +1,3 @@
+colocation: keep-them-apart
+ services vm:100,vm:200,vm:300,vm:400,vm:500
+ affinity separate
diff --git a/src/test/test-crs-static-rebalance-coloc3/service_config b/src/test/test-crs-static-rebalance-coloc3/service_config
new file mode 100644
index 0000000..86dc27d
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:100": { "node": "node1", "state": "started" },
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:200": { "node": "node1", "state": "started" },
+ "vm:201": { "node": "node1", "state": "started" },
+ "vm:300": { "node": "node1", "state": "started" },
+ "vm:400": { "node": "node1", "state": "started" },
+ "vm:500": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc3/static_service_stats b/src/test/test-crs-static-rebalance-coloc3/static_service_stats
new file mode 100644
index 0000000..755282b
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:100": { "maxcpu": 16, "maxmem": 16000000000 },
+ "vm:101": { "maxcpu": 4, "maxmem": 8000000000 },
+ "vm:200": { "maxcpu": 2, "maxmem": 48000000000 },
+ "vm:201": { "maxcpu": 4, "maxmem": 8000000000 },
+ "vm:300": { "maxcpu": 8, "maxmem": 32000000000 },
+ "vm:400": { "maxcpu": 32, "maxmem": 32000000000 },
+ "vm:500": { "maxcpu": 16, "maxmem": 8000000000 }
+}
--
2.39.5
* [pve-devel] [PATCH ha-manager v2 20/26] test: add test cases for rules config
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (22 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 19/26] test: ha tester: add test cases in more complex scenarios Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 21/26] manager: handle negative colocations with too many services Daniel Kral
` (18 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add test cases to verify that the rule checkers correctly identify and
remove ill-defined location and colocation rules from the rule set:
- Setting defaults when reading location and colocation rules
- Dropping location rules that specify the same service multiple times
- Dropping colocation rules that state that two or more services are
to be kept together and separate at the same time
- Dropping colocation rules that cannot be fulfilled because of the
constraints of the location rules of their services
- Dropping colocation rules that specify fewer than two services (a
short example follows below)
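As a brief illustration of the last check, the ineffective-colocation-rules
test case added below feeds the checker a colocation rule that references
only a single service:

    colocation: lonely-service1
        services vm:101
        affinity together

and expects it to be dropped with a log line of the form:

    Drop rule 'lonely-service1', because rule is ineffective as there are less than two services.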
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- dropped connected positive colocation rules check
- renamed illdefined-colocations to ineffective-colocation-rules
- renamed inner-inconsistent-colocations to
inconsistent-colocation-rules
- introduced rule set tests for default values
- introduced rule set tests for duplicate service reference in
location rules
- introduced rule set tests for multi-priority location with
colocation rules
.gitignore | 1 +
src/test/Makefile | 4 +-
.../defaults-for-colocation-rules.cfg | 10 ++
.../defaults-for-colocation-rules.cfg.expect | 29 ++++
.../defaults-for-location-rules.cfg | 16 +++
.../defaults-for-location-rules.cfg.expect | 49 +++++++
.../duplicate-service-in-location-rules.cfg | 31 +++++
...icate-service-in-location-rules.cfg.expect | 66 +++++++++
.../inconsistent-colocation-rules.cfg | 11 ++
.../inconsistent-colocation-rules.cfg.expect | 11 ++
...inconsistent-location-colocation-rules.cfg | 54 ++++++++
...stent-location-colocation-rules.cfg.expect | 130 ++++++++++++++++++
.../ineffective-colocation-rules.cfg | 7 +
.../ineffective-colocation-rules.cfg.expect | 9 ++
...ulti-priority-location-with-colocation.cfg | 19 +++
...iority-location-with-colocation.cfg.expect | 47 +++++++
src/test/test_rules_config.pl | 102 ++++++++++++++
17 files changed, 595 insertions(+), 1 deletion(-)
create mode 100644 src/test/rules_cfgs/defaults-for-colocation-rules.cfg
create mode 100644 src/test/rules_cfgs/defaults-for-colocation-rules.cfg.expect
create mode 100644 src/test/rules_cfgs/defaults-for-location-rules.cfg
create mode 100644 src/test/rules_cfgs/defaults-for-location-rules.cfg.expect
create mode 100644 src/test/rules_cfgs/duplicate-service-in-location-rules.cfg
create mode 100644 src/test/rules_cfgs/duplicate-service-in-location-rules.cfg.expect
create mode 100644 src/test/rules_cfgs/inconsistent-colocation-rules.cfg
create mode 100644 src/test/rules_cfgs/inconsistent-colocation-rules.cfg.expect
create mode 100644 src/test/rules_cfgs/inconsistent-location-colocation-rules.cfg
create mode 100644 src/test/rules_cfgs/inconsistent-location-colocation-rules.cfg.expect
create mode 100644 src/test/rules_cfgs/ineffective-colocation-rules.cfg
create mode 100644 src/test/rules_cfgs/ineffective-colocation-rules.cfg.expect
create mode 100644 src/test/rules_cfgs/multi-priority-location-with-colocation.cfg
create mode 100644 src/test/rules_cfgs/multi-priority-location-with-colocation.cfg.expect
create mode 100755 src/test/test_rules_config.pl
diff --git a/.gitignore b/.gitignore
index c35280e..35de63f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,4 @@
/src/test/test-*/status/*
/src/test/fence_cfgs/*.cfg.commands
/src/test/fence_cfgs/*.cfg.write
+/src/test/rules_cfgs/*.cfg.output
diff --git a/src/test/Makefile b/src/test/Makefile
index e54959f..6da9e10 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -5,6 +5,7 @@ all:
test:
@echo "-- start regression tests --"
./test_failover1.pl
+ ./test_rules_config.pl
./ha-tester.pl
./test_fence_config.pl
@echo "-- end regression tests (success) --"
@@ -12,4 +13,5 @@ test:
.PHONY: clean
clean:
rm -rf *~ test-*/log test-*/*~ test-*/status \
- fence_cfgs/*.cfg.commands fence_cfgs/*.write
+ fence_cfgs/*.cfg.commands fence_cfgs/*.write \
+ rules_cfgs/*.cfg.output
diff --git a/src/test/rules_cfgs/defaults-for-colocation-rules.cfg b/src/test/rules_cfgs/defaults-for-colocation-rules.cfg
new file mode 100644
index 0000000..8e68030
--- /dev/null
+++ b/src/test/rules_cfgs/defaults-for-colocation-rules.cfg
@@ -0,0 +1,10 @@
+# Case 1: Colocation rules are enabled by default, so set it so if it isn't yet.
+colocation: colocation-defaults
+ services vm:101,vm:102
+ affinity separate
+
+# Case 2: Colocation rule is disabled, it shouldn't be enabled afterwards.
+colocation: colocation-disabled
+ services vm:201,vm:202
+ affinity separate
+ state disabled
diff --git a/src/test/rules_cfgs/defaults-for-colocation-rules.cfg.expect b/src/test/rules_cfgs/defaults-for-colocation-rules.cfg.expect
new file mode 100644
index 0000000..faafec8
--- /dev/null
+++ b/src/test/rules_cfgs/defaults-for-colocation-rules.cfg.expect
@@ -0,0 +1,29 @@
+--- Log ---
+--- Config ---
+$VAR1 = {
+ 'digest' => '692081df1aaf03092f67cd415f5c66222b1d55e2',
+ 'ids' => {
+ 'colocation-defaults' => {
+ 'affinity' => 'separate',
+ 'services' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1
+ },
+ 'state' => 'enabled',
+ 'type' => 'colocation'
+ },
+ 'colocation-disabled' => {
+ 'affinity' => 'separate',
+ 'services' => {
+ 'vm:201' => 1,
+ 'vm:202' => 1
+ },
+ 'state' => 'disabled',
+ 'type' => 'colocation'
+ }
+ },
+ 'order' => {
+ 'colocation-defaults' => 1,
+ 'colocation-disabled' => 2
+ }
+ };
diff --git a/src/test/rules_cfgs/defaults-for-location-rules.cfg b/src/test/rules_cfgs/defaults-for-location-rules.cfg
new file mode 100644
index 0000000..728558d
--- /dev/null
+++ b/src/test/rules_cfgs/defaults-for-location-rules.cfg
@@ -0,0 +1,16 @@
+# Case 1: Location rules are enabled and loose by default, so set it so if it isn't yet.
+location: location-defaults
+ services vm:101
+ nodes node1
+
+# Case 2: Location rule is disabled, it shouldn't be enabled afterwards.
+location: location-disabled
+ services vm:102
+ nodes node2
+ state disabled
+
+# Case 3: Location rule is set to strict, so it shouldn't be loose afterwards.
+location: location-strict
+ services vm:103
+ nodes node3
+ strict 1
diff --git a/src/test/rules_cfgs/defaults-for-location-rules.cfg.expect b/src/test/rules_cfgs/defaults-for-location-rules.cfg.expect
new file mode 100644
index 0000000..d534deb
--- /dev/null
+++ b/src/test/rules_cfgs/defaults-for-location-rules.cfg.expect
@@ -0,0 +1,49 @@
+--- Log ---
+--- Config ---
+$VAR1 = {
+ 'digest' => '27c1704d1a5497b314f8f5717b633184c1a1aacf',
+ 'ids' => {
+ 'location-defaults' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ }
+ },
+ 'services' => {
+ 'vm:101' => 1
+ },
+ 'state' => 'enabled',
+ 'type' => 'location'
+ },
+ 'location-disabled' => {
+ 'nodes' => {
+ 'node2' => {
+ 'priority' => 0
+ }
+ },
+ 'services' => {
+ 'vm:102' => 1
+ },
+ 'state' => 'disabled',
+ 'type' => 'location'
+ },
+ 'location-strict' => {
+ 'nodes' => {
+ 'node3' => {
+ 'priority' => 0
+ }
+ },
+ 'services' => {
+ 'vm:103' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 1,
+ 'type' => 'location'
+ }
+ },
+ 'order' => {
+ 'location-defaults' => 1,
+ 'location-disabled' => 2,
+ 'location-strict' => 3
+ }
+ };
diff --git a/src/test/rules_cfgs/duplicate-service-in-location-rules.cfg b/src/test/rules_cfgs/duplicate-service-in-location-rules.cfg
new file mode 100644
index 0000000..7409bd7
--- /dev/null
+++ b/src/test/rules_cfgs/duplicate-service-in-location-rules.cfg
@@ -0,0 +1,31 @@
+# Case 1: Do not remove two location rules, which do not share services.
+location: no-same-service1
+ services vm:101,vm:102,vm:103
+ nodes node1,node2:2
+ strict 0
+
+location: no-same-service2
+ services vm:104,vm:105
+ nodes node1,node2:2
+ strict 0
+
+location: no-same-service3
+ services vm:106
+ nodes node1,node2:2
+ strict 1
+
+# Case 2: Remove location rules, which share the same service between them.
+location: same-service1
+ services vm:201
+ nodes node1,node2:2
+ strict 0
+
+location: same-service2
+ services vm:201,vm:202
+ nodes node3
+ strict 1
+
+location: same-service3
+ services vm:201,vm:203,vm:204
+ nodes node1:2,node3:3
+ strict 0
diff --git a/src/test/rules_cfgs/duplicate-service-in-location-rules.cfg.expect b/src/test/rules_cfgs/duplicate-service-in-location-rules.cfg.expect
new file mode 100644
index 0000000..39b95bd
--- /dev/null
+++ b/src/test/rules_cfgs/duplicate-service-in-location-rules.cfg.expect
@@ -0,0 +1,66 @@
+--- Log ---
+Drop rule 'same-service1', because service 'vm:201' is already used in another location rule.
+Drop rule 'same-service2', because service 'vm:201' is already used in another location rule.
+Drop rule 'same-service3', because service 'vm:201' is already used in another location rule.
+--- Config ---
+$VAR1 = {
+ 'digest' => '07ee7a87159672339144fb9d1021b54905e09632',
+ 'ids' => {
+ 'no-same-service1' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'services' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ 'vm:103' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 0,
+ 'type' => 'location'
+ },
+ 'no-same-service2' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'services' => {
+ 'vm:104' => 1,
+ 'vm:105' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 0,
+ 'type' => 'location'
+ },
+ 'no-same-service3' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'services' => {
+ 'vm:106' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 1,
+ 'type' => 'location'
+ }
+ },
+ 'order' => {
+ 'no-same-service1' => 1,
+ 'no-same-service2' => 2,
+ 'no-same-service3' => 3
+ }
+ };
diff --git a/src/test/rules_cfgs/inconsistent-colocation-rules.cfg b/src/test/rules_cfgs/inconsistent-colocation-rules.cfg
new file mode 100644
index 0000000..3199bfb
--- /dev/null
+++ b/src/test/rules_cfgs/inconsistent-colocation-rules.cfg
@@ -0,0 +1,11 @@
+colocation: keep-apart1
+ services vm:102,vm:103
+ affinity separate
+
+colocation: keep-apart2
+ services vm:102,vm:104,vm:106
+ affinity separate
+
+colocation: stick-together1
+ services vm:101,vm:102,vm:103,vm:104,vm:106
+ affinity together
diff --git a/src/test/rules_cfgs/inconsistent-colocation-rules.cfg.expect b/src/test/rules_cfgs/inconsistent-colocation-rules.cfg.expect
new file mode 100644
index 0000000..b1989a8
--- /dev/null
+++ b/src/test/rules_cfgs/inconsistent-colocation-rules.cfg.expect
@@ -0,0 +1,11 @@
+--- Log ---
+Drop rule 'keep-apart1', because rule shares two or more services with 'stick-together1'.
+Drop rule 'keep-apart2', because rule shares two or more services with 'stick-together1'.
+Drop rule 'stick-together1', because rule shares two or more services with 'keep-apart1'.
+Drop rule 'stick-together1', because rule shares two or more services with 'keep-apart2'.
+--- Config ---
+$VAR1 = {
+ 'digest' => '469bc45c05ffbb123a277fc0fda48e0132fc9046',
+ 'ids' => {},
+ 'order' => {}
+ };
diff --git a/src/test/rules_cfgs/inconsistent-location-colocation-rules.cfg b/src/test/rules_cfgs/inconsistent-location-colocation-rules.cfg
new file mode 100644
index 0000000..ed6b82d
--- /dev/null
+++ b/src/test/rules_cfgs/inconsistent-location-colocation-rules.cfg
@@ -0,0 +1,54 @@
+# Case 1: Remove no positive colocation rule, where there is exactly one node to keep them together.
+location: vm101-vm102-must-be-on-node1
+ services vm:101,vm:102
+ nodes node1
+ strict 1
+
+colocation: vm101-vm102-must-be-kept-together
+ services vm:101,vm:102
+ affinity together
+
+# Case 2: Remove no negative colocation rule, where there are exactly enough nodes available to keep them apart.
+location: vm201-must-be-on-node1
+ services vm:201
+ nodes node1
+ strict 1
+
+location: vm202-must-be-on-node2
+ services vm:202
+ nodes node2
+ strict 1
+
+colocation: vm201-vm202-must-be-kept-separate
+ services vm:201,vm:202
+ affinity separate
+
+# Case 1: Remove the positive colocation rules, where two services are restricted to a different node.
+location: vm301-must-be-on-node1
+ services vm:301
+ nodes node1
+ strict 1
+
+location: vm301-must-be-on-node2
+ services vm:302
+ nodes node2
+ strict 1
+
+colocation: vm301-vm302-must-be-kept-together
+ services vm:301,vm:302
+ affinity together
+
+# Case 2: Remove the negative colocation rule, where two services are restricted to less nodes than needed to keep them apart.
+location: vm401-must-be-on-node1
+ services vm:401
+ nodes node1
+ strict 1
+
+location: vm402-must-be-on-node1
+ services vm:402
+ nodes node1
+ strict 1
+
+colocation: vm401-vm402-must-be-kept-separate
+ services vm:401,vm:402
+ affinity separate
diff --git a/src/test/rules_cfgs/inconsistent-location-colocation-rules.cfg.expect b/src/test/rules_cfgs/inconsistent-location-colocation-rules.cfg.expect
new file mode 100644
index 0000000..bed4668
--- /dev/null
+++ b/src/test/rules_cfgs/inconsistent-location-colocation-rules.cfg.expect
@@ -0,0 +1,130 @@
+--- Log ---
+Drop rule 'vm301-vm302-must-be-kept-together', because two or more services are restricted to different nodes.
+Drop rule 'vm401-vm402-must-be-kept-separate', because two or more services are restricted to less nodes than available to the services.
+--- Config ---
+$VAR1 = {
+ 'digest' => '27e76e13c20a1d11c5cbcf14434bcd54655fdd40',
+ 'ids' => {
+ 'vm101-vm102-must-be-kept-together' => {
+ 'affinity' => 'together',
+ 'services' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1
+ },
+ 'state' => 'enabled',
+ 'type' => 'colocation'
+ },
+ 'vm101-vm102-must-be-on-node1' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ }
+ },
+ 'services' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 1,
+ 'type' => 'location'
+ },
+ 'vm201-must-be-on-node1' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ }
+ },
+ 'services' => {
+ 'vm:201' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 1,
+ 'type' => 'location'
+ },
+ 'vm201-vm202-must-be-kept-separate' => {
+ 'affinity' => 'separate',
+ 'services' => {
+ 'vm:201' => 1,
+ 'vm:202' => 1
+ },
+ 'state' => 'enabled',
+ 'type' => 'colocation'
+ },
+ 'vm202-must-be-on-node2' => {
+ 'nodes' => {
+ 'node2' => {
+ 'priority' => 0
+ }
+ },
+ 'services' => {
+ 'vm:202' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 1,
+ 'type' => 'location'
+ },
+ 'vm301-must-be-on-node1' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ }
+ },
+ 'services' => {
+ 'vm:301' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 1,
+ 'type' => 'location'
+ },
+ 'vm301-must-be-on-node2' => {
+ 'nodes' => {
+ 'node2' => {
+ 'priority' => 0
+ }
+ },
+ 'services' => {
+ 'vm:302' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 1,
+ 'type' => 'location'
+ },
+ 'vm401-must-be-on-node1' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ }
+ },
+ 'services' => {
+ 'vm:401' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 1,
+ 'type' => 'location'
+ },
+ 'vm402-must-be-on-node1' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 0
+ }
+ },
+ 'services' => {
+ 'vm:402' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 1,
+ 'type' => 'location'
+ }
+ },
+ 'order' => {
+ 'vm101-vm102-must-be-kept-together' => 2,
+ 'vm101-vm102-must-be-on-node1' => 1,
+ 'vm201-must-be-on-node1' => 3,
+ 'vm201-vm202-must-be-kept-separate' => 5,
+ 'vm202-must-be-on-node2' => 4,
+ 'vm301-must-be-on-node1' => 6,
+ 'vm301-must-be-on-node2' => 7,
+ 'vm401-must-be-on-node1' => 9,
+ 'vm402-must-be-on-node1' => 10
+ }
+ };
diff --git a/src/test/rules_cfgs/ineffective-colocation-rules.cfg b/src/test/rules_cfgs/ineffective-colocation-rules.cfg
new file mode 100644
index 0000000..4f2338e
--- /dev/null
+++ b/src/test/rules_cfgs/ineffective-colocation-rules.cfg
@@ -0,0 +1,7 @@
+colocation: lonely-service1
+ services vm:101
+ affinity together
+
+colocation: lonely-service2
+ services vm:101
+ affinity separate
diff --git a/src/test/rules_cfgs/ineffective-colocation-rules.cfg.expect b/src/test/rules_cfgs/ineffective-colocation-rules.cfg.expect
new file mode 100644
index 0000000..3741ba7
--- /dev/null
+++ b/src/test/rules_cfgs/ineffective-colocation-rules.cfg.expect
@@ -0,0 +1,9 @@
+--- Log ---
+Drop rule 'lonely-service1', because rule is ineffective as there are less than two services.
+Drop rule 'lonely-service2', because rule is ineffective as there are less than two services.
+--- Config ---
+$VAR1 = {
+ 'digest' => '47bdd78898e4193aa113ab05b0fd7aaaeb08109d',
+ 'ids' => {},
+ 'order' => {}
+ };
diff --git a/src/test/rules_cfgs/multi-priority-location-with-colocation.cfg b/src/test/rules_cfgs/multi-priority-location-with-colocation.cfg
new file mode 100644
index 0000000..889fe47
--- /dev/null
+++ b/src/test/rules_cfgs/multi-priority-location-with-colocation.cfg
@@ -0,0 +1,19 @@
+# Case 1: Remove colocation rules, where there is a loose location rule with multiple priority groups set for the nodes.
+location: vm101-vm102-should-be-on-node1-or-node2
+ services vm:101,vm:102
+ nodes node1:1,node2:2
+ strict 0
+
+colocation: vm101-vm102-must-be-kept-separate
+ services vm:101,vm:102
+ affinity separate
+
+# Case 2: Remove colocation rules, where there is a strict location rule with multiple priority groups set for the nodes.
+location: vm201-vm202-must-be-on-node1-or-node2
+ services vm:201,vm:202
+ nodes node1:1,node2:2
+ strict 1
+
+colocation: vm201-vm202-must-be-kept-together
+ services vm:201,vm:202
+ affinity together
diff --git a/src/test/rules_cfgs/multi-priority-location-with-colocation.cfg.expect b/src/test/rules_cfgs/multi-priority-location-with-colocation.cfg.expect
new file mode 100644
index 0000000..47c7af5
--- /dev/null
+++ b/src/test/rules_cfgs/multi-priority-location-with-colocation.cfg.expect
@@ -0,0 +1,47 @@
+--- Log ---
+Drop rule 'vm101-vm102-must-be-kept-separate', because services are in location rules with multiple priorities.
+Drop rule 'vm201-vm202-must-be-kept-together', because services are in location rules with multiple priorities.
+--- Config ---
+$VAR1 = {
+ 'digest' => '30292457018bf1ae4fcf9bea92199233c877bf28',
+ 'ids' => {
+ 'vm101-vm102-should-be-on-node1-or-node2' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 1
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'services' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 0,
+ 'type' => 'location'
+ },
+ 'vm201-vm202-must-be-on-node1-or-node2' => {
+ 'nodes' => {
+ 'node1' => {
+ 'priority' => 1
+ },
+ 'node2' => {
+ 'priority' => 2
+ }
+ },
+ 'services' => {
+ 'vm:201' => 1,
+ 'vm:202' => 1
+ },
+ 'state' => 'enabled',
+ 'strict' => 1,
+ 'type' => 'location'
+ }
+ },
+ 'order' => {
+ 'vm101-vm102-should-be-on-node1-or-node2' => 1,
+ 'vm201-vm202-must-be-on-node1-or-node2' => 3
+ }
+ };
diff --git a/src/test/test_rules_config.pl b/src/test/test_rules_config.pl
new file mode 100755
index 0000000..6e6b7de
--- /dev/null
+++ b/src/test/test_rules_config.pl
@@ -0,0 +1,102 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+use Getopt::Long;
+
+use lib qw(..);
+
+use Test::More;
+use Test::MockModule;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+use PVE::HA::Rules::Location;
+use PVE::HA::Rules::Colocation;
+
+PVE::HA::Rules::Location->register();
+PVE::HA::Rules::Colocation->register();
+
+PVE::HA::Rules->init(property_isolation => 1);
+
+my $opt_nodiff;
+
+if (!GetOptions("nodiff" => \$opt_nodiff)) {
+ print "usage: $0 [test.cfg] [--nodiff]\n";
+ exit -1;
+}
+
+sub _log {
+ my ($fh, $source, $message) = @_;
+
+ chomp $message;
+ $message = "[$source] $message" if $source;
+
+ print "$message\n";
+
+ $fh->print("$message\n");
+ $fh->flush();
+}
+
+sub check_cfg {
+ my ($cfg_fn, $outfile) = @_;
+
+ my $raw = PVE::Tools::file_get_contents($cfg_fn);
+
+ open(my $LOG, '>', "$outfile");
+ select($LOG);
+ $| = 1;
+
+ print "--- Log ---\n";
+ my $cfg = PVE::HA::Rules->parse_config($cfg_fn, $raw);
+ PVE::HA::Rules->set_rule_defaults($_) for values %{ $cfg->{ids} };
+ my $messages = PVE::HA::Rules->canonicalize($cfg);
+ print $_ for @$messages;
+ print "--- Config ---\n";
+ {
+ local $Data::Dumper::Sortkeys = 1;
+ print Dumper($cfg);
+ }
+
+ select(STDOUT);
+}
+
+sub run_test {
+ my ($cfg_fn) = @_;
+
+ print "* check: $cfg_fn\n";
+
+ my $outfile = "$cfg_fn.output";
+ my $expect = "$cfg_fn.expect";
+
+ eval { check_cfg($cfg_fn, $outfile); };
+ if (my $err = $@) {
+ die "Test '$cfg_fn' failed:\n$err\n";
+ }
+
+ return if $opt_nodiff;
+
+ my $res;
+
+ if (-f $expect) {
+ my $cmd = ['diff', '-u', $expect, $outfile];
+ $res = system(@$cmd);
+ die "test '$cfg_fn' failed\n" if $res != 0;
+ } else {
+ $res = system('cp', $outfile, $expect);
+ die "test '$cfg_fn' failed\n" if $res != 0;
+ }
+
+ print "* end rules test: $cfg_fn (success)\n\n";
+}
+
+# exec tests
+
+if (my $testcfg = shift) {
+ run_test($testcfg);
+} else {
+ for my $cfg (<rules_cfgs/*cfg>) {
+ run_test($cfg);
+ }
+}
--
2.39.5
* [pve-devel] [PATCH ha-manager v2 21/26] manager: handle negative colocations with too many services
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (23 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 20/26] test: add test cases for rules config Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-07-01 12:11 ` Michael Köppl
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 22/26] config: prune services from rules if services are deleted from config Daniel Kral
` (17 subsequent siblings)
42 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
select_service_node(...) in 'none' mode will usually return no node
only if negative colocation rules specify more services than there are
nodes available. In that case, the services cannot be separated as
there are no nodes left, so they are put into the error state for now.
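For instance, in the test case added in this patch, five services bound by a
strict negative colocation rule all start out on node1 of a three-node
cluster; two of them can still be migrated away, while the remaining three
are put into the error state with log lines of the form:

    err ... node1/crm: service 'vm:103' cannot run on 'node1', but no recovery node found
    info ... node1/crm: service 'vm:103': state changed from 'started' to 'error'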
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
This is not ideal and I'd rather make this be dropped in the
check_feasibility(...) part, but then we'd need to introduce more state
to the check helpers or make a direct call to
PVE::Cluster::get_nodelist(...).
changes since v1:
- NEW!
src/PVE/HA/Manager.pm | 13 +++++
.../test-colocation-strict-separate9/README | 14 +++++
.../test-colocation-strict-separate9/cmdlist | 3 +
.../hardware_status | 5 ++
.../log.expect | 57 +++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 7 +++
8 files changed, 103 insertions(+)
create mode 100644 src/test/test-colocation-strict-separate9/README
create mode 100644 src/test/test-colocation-strict-separate9/cmdlist
create mode 100644 src/test/test-colocation-strict-separate9/hardware_status
create mode 100644 src/test/test-colocation-strict-separate9/log.expect
create mode 100644 src/test/test-colocation-strict-separate9/manager_status
create mode 100644 src/test/test-colocation-strict-separate9/rules_config
create mode 100644 src/test/test-colocation-strict-separate9/service_config
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 66e5710..59b2998 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -1092,6 +1092,19 @@ sub next_state_started {
);
delete $sd->{maintenance_node};
}
+ } elsif ($select_mode eq 'none' && !defined($node)) {
+ # Having no node here means that the service is started but cannot find any
+ # node it is allowed to run on, e.g. added negative colocation rule, while the
+ # nodes aren't separated yet.
+ # TODO Could be made impossible by a dynamic check to drop negative colocation
+ # rules which have defined more services than available nodes
+ $haenv->log(
+ 'err',
+ "service '$sid' cannot run on '$sd->{node}', but no recovery node found",
+ );
+
+ # TODO Should this really move the service to the error state?
+ $change_service_state->($self, $sid, 'error');
}
# ensure service get started again if it went unexpected down
diff --git a/src/test/test-colocation-strict-separate9/README b/src/test/test-colocation-strict-separate9/README
new file mode 100644
index 0000000..85494dd
--- /dev/null
+++ b/src/test/test-colocation-strict-separate9/README
@@ -0,0 +1,14 @@
+Test whether a strict negative colocation rule among five services on a
+three-node cluster puts the services remaining on the same node into the
+error state, as there are not enough nodes to separate all of them and it is
+also not clear which of the remaining services is more important to run.
+
+The test scenario is:
+- vm:101 through vm:105 must be kept separate
+- vm:101 through vm:105 are all running on node1
+
+The expected outcome is:
+- As the cluster comes up, vm:101 and vm:102 are migrated to node2 and node3
+- vm:103, vm:104, and vm:105 will be put in error state as there are not
+  enough nodes left to separate them and it is also not clear which service
+  is more important to run on the only node left.
diff --git a/src/test/test-colocation-strict-separate9/cmdlist b/src/test/test-colocation-strict-separate9/cmdlist
new file mode 100644
index 0000000..3bfad44
--- /dev/null
+++ b/src/test/test-colocation-strict-separate9/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"]
+]
diff --git a/src/test/test-colocation-strict-separate9/hardware_status b/src/test/test-colocation-strict-separate9/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate9/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate9/log.expect b/src/test/test-colocation-strict-separate9/log.expect
new file mode 100644
index 0000000..efe85a2
--- /dev/null
+++ b/src/test/test-colocation-strict-separate9/log.expect
@@ -0,0 +1,57 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 20 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 20 node1/crm: migrate service 'vm:102' to node 'node3' (running)
+info 20 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+err 20 node1/crm: service 'vm:103' cannot run on 'node1', but no recovery node found
+info 20 node1/crm: service 'vm:103': state changed from 'started' to 'error'
+err 20 node1/crm: service 'vm:104' cannot run on 'node1', but no recovery node found
+info 20 node1/crm: service 'vm:104': state changed from 'started' to 'error'
+err 20 node1/crm: service 'vm:105' cannot run on 'node1', but no recovery node found
+info 20 node1/crm: service 'vm:105': state changed from 'started' to 'error'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 21 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 21 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 21 node1/lrm: service vm:102 - end migrate to node 'node3'
+err 21 node1/lrm: service vm:103 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
+err 21 node1/lrm: service vm:104 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
+err 21 node1/lrm: service vm:105 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 43 node2/lrm: got lock 'ha_agent_node2_lock'
+info 43 node2/lrm: status change wait_for_agent_lock => active
+info 43 node2/lrm: starting service vm:101
+info 43 node2/lrm: service status vm:101 started
+info 45 node3/lrm: got lock 'ha_agent_node3_lock'
+info 45 node3/lrm: status change wait_for_agent_lock => active
+info 45 node3/lrm: starting service vm:102
+info 45 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate9/manager_status b/src/test/test-colocation-strict-separate9/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate9/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate9/rules_config b/src/test/test-colocation-strict-separate9/rules_config
new file mode 100644
index 0000000..478d70b
--- /dev/null
+++ b/src/test/test-colocation-strict-separate9/rules_config
@@ -0,0 +1,3 @@
+colocation: lonely-must-too-many-vms-be
+ services vm:101,vm:102,vm:103,vm:104,vm:105
+ affinity separate
diff --git a/src/test/test-colocation-strict-separate9/service_config b/src/test/test-colocation-strict-separate9/service_config
new file mode 100644
index 0000000..a1d61f5
--- /dev/null
+++ b/src/test/test-colocation-strict-separate9/service_config
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" }
+}
--
2.39.5
* Re: [pve-devel] [PATCH ha-manager v2 21/26] manager: handle negative colocations with too many services
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 21/26] manager: handle negative colocations with too many services Daniel Kral
@ 2025-07-01 12:11 ` Michael Köppl
2025-07-01 12:23 ` Daniel Kral
0 siblings, 1 reply; 70+ messages in thread
From: Michael Köppl @ 2025-07-01 12:11 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
On 6/20/25 16:31, Daniel Kral wrote:
> select_service_node(...) in 'none' mode will usually return no node
> only if negative colocation rules specify more services than there are
> nodes available. In that case, the services cannot be separated as
> there are no nodes left, so they are put into the error state for now.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> This is not ideal and I'd rather make this be dropped in the
> check_feasibility(...) part, but then we'd need to introduce more state
> to the check helpers or make a direct call to
This also affects cases where it is not entirely clear why a service is
put into the error state. One such case is having a "together" colocation
rule for VMs 100 and 101 and also defining a location rule that says
that VM 100 has to be on a specific node A. VM 100 will then go into an
error state. From the user's perspective, it is not really transparent
why this happens. I may just have made a wrong assumption about this,
but I would have expected VM 100 to be migrated and, due to the
colocation rule, VM 101 to be migrated to the specified node A as well,
which is what would happen if they were migrated manually.
As discussed off-list, one approach to solve this could be to ask users
to create a location rule for each service involved in the "together"
colocation rule upon its creation. As an example:
- 100 has a location rule defined for node A
- User tries to create a colocation rule for 100 and 101
- Dialog asks user to first create a location rule for 101 and node A
With a large number of services this could become tedious, but it would
make combining location and colocation rules for the scenario described
above more explicit and reduce complexity in resolving and applying the
rules.
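To make that concrete, here is a minimal sketch of the rules config the
proposed workflow would end up with (rule names and the node name 'nodeA'
are placeholders, the syntax follows the rules_config files in this series):

    location: vm100-must-be-on-nodeA
        services vm:100
        nodes nodeA
        strict 1

    location: vm101-must-be-on-nodeA
        services vm:101
        nodes nodeA
        strict 1

    colocation: vm100-vm101-stick-together
        services vm:100,vm:101
        affinity together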
> PVE::Cluster::get_nodelist(...).
* Re: [pve-devel] [PATCH ha-manager v2 21/26] manager: handle negative colocations with too many services
2025-07-01 12:11 ` Michael Köppl
@ 2025-07-01 12:23 ` Daniel Kral
0 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-07-01 12:23 UTC (permalink / raw)
To: Michael Köppl, Proxmox VE development discussion
On 7/1/25 14:11, Michael Köppl wrote:
> On 6/20/25 16:31, Daniel Kral wrote:
>> select_service_node(...) in 'none' mode will usually return no node
>> only if negative colocation rules specify more services than there are
>> nodes available. In that case, the services cannot be separated as
>> there are no nodes left, so they are put into the error state for now.
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> This is not ideal and I'd rather make this be dropped in the
>> check_feasibility(...) part, but then we'd need to introduce more state
>> to the check helpers or make a direct call to
>
> This also affects cases where it is not entirely clear why a service is
> put into the error state. One such case is having a "together" colocation
> rule for VMs 100 and 101 and also defining a location rule that says
> that VM 100 has to be on a specific node A. VM 100 will then go into an
> error state. From the user's perspective, it is not really transparent
> why this happens. I may just have made a wrong assumption about this,
> but I would have expected VM 100 to be migrated and, due to the
> colocation rule, VM 101 to be migrated to the specified node A as well,
> which is what would happen if they were migrated manually.
>
> As discussed off-list, one approach to solve this could be to ask users
> to create a location rule for each service involved in the "together"
> colocation rule upon its creation. As an example:
>
> - 100 has a location rule defined for node A
> - User tries to create a colocation rule for 100 and 101
> - Dialog asks user to first create a location rule for 101 and node A
>
> With a large number of services this could become tedious, but it would
> make combining location and colocation rules for the scenario described
> above more explicit and reduce complexity in resolving and applying the
> rules.
Right, as already anticipated and discussed off-list, moving the service
into the error state creates more trouble than necessary and is also
confusing to and unwanted by end users. I'll remove that in v3 as well.
I'd also rather restrict these combinations more in advance (i.e., in a
rule checker), so that users need to specify the node affinity for _all_
services that are in a positive service affinity rule, as otherwise it
is rather ambiguous what is to be done. More on that in my self-reply
for ha-manager patch #15.
We can still remove that restriction later and do some inference, but as
already discussed off-list, I think that would be rather confusing with
an increasing number of rules. Removing the ambiguity from the start,
for both the user and the HA Manager, is a benefit IMO.
* [pve-devel] [PATCH ha-manager v2 22/26] config: prune services from rules if services are deleted from config
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (24 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 21/26] manager: handle negative colocations with too many services Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 23/26] api: introduce ha rules api endpoints Daniel Kral
` (16 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Remove a service from any rule that references it when the service is
removed by delete_service_from_config(...), which is called by the
service's delete API endpoint and possibly by external callers, e.g. if
the service is removed externally.
If all of a rule's services have been removed, the rule itself must be
removed as well, as it would otherwise result in an erroneous rules
config, which would become user-visible at the next read and parse of
the rules config.
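As a sketch of the intended behavior (rule and service names are only
illustrative): given a rule such as

    colocation: stick-together
        services vm:101,vm:102
        affinity together

deleting vm:101 from the HA resources leaves the rule with only vm:102, and
deleting vm:102 afterwards removes the rule 'stick-together' from the rules
config entirely.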
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/HA/Config.pm | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 2b3d726..3442d31 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -362,6 +362,25 @@ sub delete_service_from_config {
"delete resource failed",
);
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = read_rules_config();
+
+ return if !defined($rules->{ids});
+
+ for my $ruleid (keys %{ $rules->{ids} }) {
+ my $rule_services = $rules->{ids}->{$ruleid}->{services} // {};
+
+ delete $rule_services->{$sid};
+
+ delete $rules->{ids}->{$ruleid} if !%$rule_services;
+ }
+
+ write_rules_config($rules);
+ },
+ "delete resource from rules failed",
+ );
+
return !!$res;
}
--
2.39.5
* [pve-devel] [PATCH ha-manager v2 23/26] api: introduce ha rules api endpoints
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (25 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 22/26] config: prune services from rules if services are deleted from config Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-07-04 14:16 ` Michael Köppl
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 24/26] cli: expose ha rules api endpoints to ha-manager cli Daniel Kral
` (15 subsequent siblings)
42 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add CRUD API endpoints for HA rules, which assert that the given
properties for a rule are valid and will not make the existing rule set
infeasible.
Disallowing changes via the API that would make this or other rules
infeasible makes it safer for users of the HA Manager, as they cannot
accidentally disrupt the behavior that other rules already enforce.
This obviously cannot safeguard against manual changes to the rules
config file itself, but manual changes that result in infeasible rules
will be dropped with a log message on the next canonicalize(...) call by
the HA Manager anyway.
The use-location-rules feature flag controls whether location rules are
allowed to be created or modified.
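Assuming the new handler ends up mounted at /cluster/ha/rules (the API tree
and CLI wiring are not part of this patch) and that the create schema exposes
the rule properties used above, creating a rule could look roughly like this
sketch:

    # create a negative colocation rule for two managed services
    pvesh create /cluster/ha/rules --rule keep-apart \
        --type colocation --services vm:101,vm:102 --affinity separate

A request that would make the rule set infeasible, e.g. adding a 'together'
rule over the same two services, is rejected with HTTP 400 and per-property
error messages instead of being written to the rules config.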
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
debian/pve-ha-manager.install | 1 +
src/PVE/API2/HA/Makefile | 2 +-
src/PVE/API2/HA/Rules.pm | 409 ++++++++++++++++++++++++++++++++++
src/PVE/HA/Config.pm | 6 +
4 files changed, 417 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/API2/HA/Rules.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index e83c0de..d273959 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -16,6 +16,7 @@
/usr/share/man/man8/pve-ha-lrm.8.gz
/usr/share/perl5/PVE/API2/HA/Groups.pm
/usr/share/perl5/PVE/API2/HA/Resources.pm
+/usr/share/perl5/PVE/API2/HA/Rules.pm
/usr/share/perl5/PVE/API2/HA/Status.pm
/usr/share/perl5/PVE/CLI/ha_manager.pm
/usr/share/perl5/PVE/HA/CRM.pm
diff --git a/src/PVE/API2/HA/Makefile b/src/PVE/API2/HA/Makefile
index 5686efc..86c1013 100644
--- a/src/PVE/API2/HA/Makefile
+++ b/src/PVE/API2/HA/Makefile
@@ -1,4 +1,4 @@
-SOURCES=Resources.pm Groups.pm Status.pm
+SOURCES=Resources.pm Groups.pm Rules.pm Status.pm
.PHONY: install
install:
diff --git a/src/PVE/API2/HA/Rules.pm b/src/PVE/API2/HA/Rules.pm
new file mode 100644
index 0000000..e5d6817
--- /dev/null
+++ b/src/PVE/API2/HA/Rules.pm
@@ -0,0 +1,409 @@
+package PVE::API2::HA::Rules;
+
+use strict;
+use warnings;
+
+use HTTP::Status qw(:constants);
+
+use Storable qw(dclone);
+
+use PVE::Cluster qw(cfs_read_file);
+use PVE::Exception;
+use PVE::Tools qw(extract_param);
+use PVE::JSONSchema qw(get_standard_option);
+
+use PVE::HA::Config;
+use PVE::HA::Groups;
+use PVE::HA::Rules;
+use PVE::HA::Rules::Location;
+
+use base qw(PVE::RESTHandler);
+
+my $get_api_ha_rule = sub {
+ my ($rules, $ruleid, $rule_errors) = @_;
+
+ die "no such ha rule '$ruleid'\n" if !$rules->{ids}->{$ruleid};
+
+ my $rule_cfg = dclone($rules->{ids}->{$ruleid});
+
+ $rule_cfg->{rule} = $ruleid;
+ $rule_cfg->{digest} = $rules->{digest};
+ $rule_cfg->{order} = $rules->{order}->{$ruleid};
+
+ # set optional rule parameter's default values
+ PVE::HA::Rules->set_rule_defaults($rule_cfg);
+
+ if ($rule_cfg->{services}) {
+ $rule_cfg->{services} =
+ PVE::HA::Rules->encode_value($rule_cfg->{type}, 'services', $rule_cfg->{services});
+ }
+
+ if ($rule_cfg->{nodes}) {
+ $rule_cfg->{nodes} =
+ PVE::HA::Rules->encode_value($rule_cfg->{type}, 'nodes', $rule_cfg->{nodes});
+ }
+
+ if ($rule_errors) {
+ $rule_cfg->{state} = 'contradictory';
+ $rule_cfg->{errors} = $rule_errors;
+ }
+
+ return $rule_cfg;
+};
+
+my $verify_rule_type_is_allowed = sub {
+ my ($type, $noerr) = @_;
+
+ return 1 if $type ne 'location' || PVE::HA::Config::is_ha_location_enabled();
+
+ die "location rules are disabled in the datacenter config\n" if !$noerr;
+ return 0;
+};
+
+my $assert_services_are_configured = sub {
+ my ($services) = @_;
+
+ my $unconfigured_services = [];
+
+ for my $service (sort keys %$services) {
+ push @$unconfigured_services, $service
+ if !PVE::HA::Config::service_is_configured($service);
+ }
+
+ die "cannot use unmanaged service(s) " . join(', ', @$unconfigured_services) . ".\n"
+ if @$unconfigured_services;
+};
+
+my $assert_nodes_do_exist = sub {
+ my ($nodes) = @_;
+
+ my $nonexistant_nodes = [];
+
+ for my $node (sort keys %$nodes) {
+ push @$nonexistant_nodes, $node
+ if !PVE::Cluster::check_node_exists($node, 1);
+ }
+
+ die "cannot use non-existant node(s) " . join(', ', @$nonexistant_nodes) . ".\n"
+ if @$nonexistant_nodes;
+};
+
+my $check_feasibility = sub {
+ my ($rules) = @_;
+
+ $rules = dclone($rules);
+
+ # set optional rule parameter's default values
+ for my $rule (values %{ $rules->{ids} }) {
+ PVE::HA::Rules->set_rule_defaults($rule);
+ }
+
+ # TODO PVE 10: Remove group migration when HA groups have been fully migrated to location rules
+ if (!PVE::HA::Config::is_ha_location_enabled()) {
+ my $groups = PVE::HA::Config::read_group_config();
+ my $services = PVE::HA::Config::read_and_check_resources_config();
+
+ PVE::HA::Rules::Location::delete_location_rules($rules);
+ PVE::HA::Groups::migrate_groups_to_rules($rules, $groups, $services);
+ }
+
+ return PVE::HA::Rules->check_feasibility($rules);
+};
+
+my $assert_feasibility = sub {
+ my ($rules, $ruleid) = @_;
+
+ my $global_errors = $check_feasibility->($rules);
+ my $rule_errors = $global_errors->{$ruleid};
+
+ return if !$rule_errors;
+
+ # stringify error messages
+ for my $opt (keys %$rule_errors) {
+ $rule_errors->{$opt} = join(', ', @{ $rule_errors->{$opt} });
+ }
+
+ my $param = {
+ code => HTTP_BAD_REQUEST,
+ errors => $rule_errors,
+ };
+
+ my $exc = PVE::Exception->new("Rule '$ruleid' is invalid.\n", %$param);
+
+ my ($pkg, $filename, $line) = caller;
+
+ $exc->{filename} = $filename;
+ $exc->{line} = $line;
+
+ die $exc;
+};
+
+__PACKAGE__->register_method({
+ name => 'index',
+ path => '',
+ method => 'GET',
+ description => "Get HA rules.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Audit']],
+ },
+ parameters => {
+ additionalProperties => 0,
+ properties => {
+ type => {
+ type => 'string',
+ description => "Limit the returned list to the specified rule type.",
+ enum => PVE::HA::Rules->lookup_types(),
+ optional => 1,
+ },
+ state => {
+ type => 'string',
+ description => "Limit the returned list to the specified rule state.",
+ enum => ['enabled', 'disabled'],
+ optional => 1,
+ },
+ service => {
+ type => 'string',
+ description =>
+ "Limit the returned list to rules affecting the specified service.",
+ completion => \&PVE::HA::Tools::complete_sid,
+ optional => 1,
+ },
+ },
+ },
+ returns => {
+ type => 'array',
+ items => {
+ type => 'object',
+ properties => {
+ rule => { type => 'string' },
+ },
+ links => [{ rel => 'child', href => '{rule}' }],
+ },
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $type = extract_param($param, 'type');
+ my $state = extract_param($param, 'state');
+ my $service = extract_param($param, 'service');
+
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ my $global_errors = $check_feasibility->($rules);
+
+ my $res = [];
+
+ PVE::HA::Rules::foreach_rule(
+ $rules,
+ sub {
+ my ($rule, $ruleid) = @_;
+
+ my $rule_errors = $global_errors->{$ruleid};
+ my $rule_cfg = $get_api_ha_rule->($rules, $ruleid, $rule_errors);
+
+ # skip rule types which are not allowed
+ return if !$verify_rule_type_is_allowed->($rule_cfg->{type}, 1);
+
+ push @$res, $rule_cfg;
+ },
+ {
+ type => $type,
+ state => $state,
+ sid => $service,
+ },
+ );
+
+ return $res;
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'read_rule',
+ method => 'GET',
+ path => '{rule}',
+ description => "Read HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Audit']],
+ },
+ parameters => {
+ additionalProperties => 0,
+ properties => {
+ rule => get_standard_option(
+ 'pve-ha-rule-id',
+ { completion => \&PVE::HA::Tools::complete_rule },
+ ),
+ },
+ },
+ returns => {
+ type => 'object',
+ properties => {
+ rule => get_standard_option('pve-ha-rule-id'),
+ type => {
+ type => 'string',
+ },
+ },
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $ruleid = extract_param($param, 'rule');
+
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ my $global_errors = $check_feasibility->($rules);
+ my $rule_errors = $global_errors->{$ruleid};
+
+ return $get_api_ha_rule->($rules, $ruleid, $rule_errors);
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'create_rule',
+ method => 'POST',
+ path => '',
+ description => "Create HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ protected => 1,
+ parameters => PVE::HA::Rules->createSchema(),
+ returns => {
+ type => 'null',
+ },
+ code => sub {
+ my ($param) = @_;
+
+ PVE::Cluster::check_cfs_quorum();
+ mkdir("/etc/pve/ha");
+
+ my $type = extract_param($param, 'type');
+ my $ruleid = extract_param($param, 'rule');
+
+ my $plugin = PVE::HA::Rules->lookup($type);
+
+ my $opts = $plugin->check_config($ruleid, $param, 1, 1);
+
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ die "HA rule '$ruleid' already defined\n" if $rules->{ids}->{$ruleid};
+
+ $verify_rule_type_is_allowed->($type);
+ $assert_services_are_configured->($opts->{services});
+ $assert_nodes_do_exist->($opts->{nodes}) if $opts->{nodes};
+
+ my $maxorder = (sort { $a <=> $b } values %{ $rules->{order} })[-1] || 0;
+
+ $rules->{order}->{$ruleid} = ++$maxorder;
+ $rules->{ids}->{$ruleid} = $opts;
+
+ $assert_feasibility->($rules, $ruleid);
+
+ PVE::HA::Config::write_rules_config($rules);
+ },
+ "create ha rule failed",
+ );
+
+ return undef;
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'update_rule',
+ method => 'PUT',
+ path => '{rule}',
+ description => "Update HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ protected => 1,
+ parameters => PVE::HA::Rules->updateSchema(),
+ returns => {
+ type => 'null',
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $ruleid = extract_param($param, 'rule');
+ my $digest = extract_param($param, 'digest');
+ my $delete = extract_param($param, 'delete');
+
+ if ($delete) {
+ $delete = [PVE::Tools::split_list($delete)];
+ }
+
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ PVE::SectionConfig::assert_if_modified($rules, $digest);
+
+ my $rule = $rules->{ids}->{$ruleid} || die "HA rule '$ruleid' does not exist\n";
+
+ my $type = $rule->{type};
+ my $plugin = PVE::HA::Rules->lookup($type);
+ my $opts = $plugin->check_config($ruleid, $param, 0, 1);
+
+ $verify_rule_type_is_allowed->($type);
+ $assert_services_are_configured->($opts->{services});
+ $assert_nodes_do_exist->($opts->{nodes}) if $opts->{nodes};
+
+ my $options = $plugin->private()->{options}->{$type};
+ PVE::SectionConfig::delete_from_config($rule, $options, $opts, $delete);
+
+ $rule->{$_} = $opts->{$_} for keys $opts->%*;
+
+ $assert_feasibility->($rules, $ruleid);
+
+ PVE::HA::Config::write_rules_config($rules);
+ },
+ "update HA rules failed",
+ );
+
+ return undef;
+ },
+});
+
+__PACKAGE__->register_method({
+ name => 'delete_rule',
+ method => 'DELETE',
+ path => '{rule}',
+ description => "Delete HA rule.",
+ permissions => {
+ check => ['perm', '/', ['Sys.Console']],
+ },
+ protected => 1,
+ parameters => {
+ additionalProperties => 0,
+ properties => {
+ rule => get_standard_option(
+ 'pve-ha-rule-id',
+ { completion => \&PVE::HA::Tools::complete_rule },
+ ),
+ },
+ },
+ returns => {
+ type => 'null',
+ },
+ code => sub {
+ my ($param) = @_;
+
+ my $ruleid = extract_param($param, 'rule');
+
+ PVE::HA::Config::lock_ha_domain(
+ sub {
+ my $rules = PVE::HA::Config::read_rules_config();
+
+ delete $rules->{ids}->{$ruleid};
+
+ PVE::HA::Config::write_rules_config($rules);
+ },
+ "delete ha rule failed",
+ );
+
+ return undef;
+ },
+});
+
+1;
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 3442d31..de0fcec 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -427,4 +427,10 @@ sub get_service_status {
return $status;
}
+sub is_ha_location_enabled {
+ my $datacenter_cfg = eval { cfs_read_file('datacenter.cfg') } // {};
+
+ return $datacenter_cfg->{ha}->{'use-location-rules'};
+}
+
1;
--
2.39.5
* Re: [pve-devel] [PATCH ha-manager v2 23/26] api: introduce ha rules api endpoints
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 23/26] api: introduce ha rules api endpoints Daniel Kral
@ 2025-07-04 14:16 ` Michael Köppl
0 siblings, 0 replies; 70+ messages in thread
From: Michael Köppl @ 2025-07-04 14:16 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
On 6/20/25 16:31, Daniel Kral wrote:
> +my $check_feasibility = sub {
> + my ($rules) = @_;
> +
> + $rules = dclone($rules);
> +
> + # set optional rule parameter's default values
> + for my $rule (values %{ $rules->{ids} }) {
> + PVE::HA::Rules->set_rule_defaults($rule);
> + }
> +
> + # TODO PVE 10: Remove group migration when HA groups have been fully migrated to location rules
> + if (!PVE::HA::Config::is_ha_location_enabled()) {
> + my $groups = PVE::HA::Config::read_group_config();
> + my $services = PVE::HA::Config::read_and_check_resources_config();
> +
> + PVE::HA::Rules::Location::delete_location_rules($rules);
> + PVE::HA::Groups::migrate_groups_to_rules($rules, $groups, $services);
> + }
> +
> + return PVE::HA::Rules->check_feasibility($rules);
> +};
> +
> +my $assert_feasibility = sub {
> + my ($rules, $ruleid) = @_;
> +
> + my $global_errors = $check_feasibility->($rules);
> + my $rule_errors = $global_errors->{$ruleid};
> +
> + return if !$rule_errors;
Consider the following scenario: I have a colocation rule with vm:100
and vm:101 together and a location rule for vm:100 and node1. I would
still be able to add another location rule for vm:101 and node2 because
$global_errors would contain an error for the colocation rule, but
$rule_errors would still be empty as it's strictly speaking not an error
for the location rule I just created and its ID is not contained in
$global_errors.
From a technical standpoint this makes sense, but from a user's
perspective I found this a bit confusing: whether or not I received an
error dialog upon creation (which also stopped me from creating a
contradictory rule) depended on the order in which I created said rules.
I understand that, at the moment, location rules trump colocation rules
in this regard and as long as the colocation rule is disabled and
displays a warning as a result of such a scenario, I don't see a problem
with it. Just wanted to note this as a potential future improvement.
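For reference, a minimal rules.cfg reproducing the scenario above could look
like this (rule IDs are made up):

  colocation: keep_together
      affinity together
      services vm:100,vm:101

  location: vm100_on_node1
      services vm:100
      nodes node1

  location: vm101_on_node2
      services vm:101
      nodes node2

Here the second location rule is accepted even though it contradicts the
colocation rule, since the resulting feasibility error is keyed on the
colocation rule's ID only.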
* [pve-devel] [PATCH ha-manager v2 24/26] cli: expose ha rules api endpoints to ha-manager cli
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (26 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 23/26] api: introduce ha rules api endpoints Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 25/26] api: groups, services: assert use-location-rules feature flag Daniel Kral
` (14 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Expose the HA rules API endpoints through the CLI in its own subcommand.
The names of the sub-subcommands are chosen to be consistent with the
other commands provided by the ha-manager CLI for services and groups.
The properties specified for the 'rules config' command are chosen to
reflect the columns shown for HA rules in the web interface.
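For illustration, the resulting CLI calls would look roughly like this (rule
IDs and options are just examples):

  # ha-manager rules add colocation keep_together --affinity together --services vm:100,vm:200
  # ha-manager rules config
  # ha-manager rules set colocation keep_together --services vm:100,vm:200,vm:300
  # ha-manager rules remove keep_together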
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/CLI/ha_manager.pm | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/src/PVE/CLI/ha_manager.pm b/src/PVE/CLI/ha_manager.pm
index ca230f2..564ac96 100644
--- a/src/PVE/CLI/ha_manager.pm
+++ b/src/PVE/CLI/ha_manager.pm
@@ -17,6 +17,7 @@ use PVE::HA::Env::PVE2;
use PVE::HA::Tools;
use PVE::API2::HA::Resources;
use PVE::API2::HA::Groups;
+use PVE::API2::HA::Rules;
use PVE::API2::HA::Status;
use base qw(PVE::CLIHandler);
@@ -199,6 +200,37 @@ our $cmddef = {
groupremove => ["PVE::API2::HA::Groups", 'delete', ['group']],
groupset => ["PVE::API2::HA::Groups", 'update', ['group']],
+ rules => {
+ list => [
+ 'PVE::API2::HA::Rules',
+ 'index',
+ [],
+ {},
+ sub {
+ my ($data, $schema, $options) = @_;
+ PVE::CLIFormatter::print_api_result($data, $schema, undef, $options);
+ },
+ $PVE::RESTHandler::standard_output_options,
+ ],
+ config => [
+ 'PVE::API2::HA::Rules',
+ 'index',
+ ['rule'],
+ {},
+ sub {
+ my ($data, $schema, $options) = @_;
+ my $props_to_print = [
+ 'rule', 'type', 'state', 'affinity', 'strict', 'services', 'nodes',
+ ];
+ PVE::CLIFormatter::print_api_result($data, $schema, $props_to_print, $options);
+ },
+ $PVE::RESTHandler::standard_output_options,
+ ],
+ add => ['PVE::API2::HA::Rules', 'create_rule', ['type', 'rule']],
+ remove => ['PVE::API2::HA::Rules', 'delete_rule', ['rule']],
+ set => ['PVE::API2::HA::Rules', 'update_rule', ['type', 'rule']],
+ },
+
add => ["PVE::API2::HA::Resources", 'create', ['sid']],
remove => ["PVE::API2::HA::Resources", 'delete', ['sid']],
set => ["PVE::API2::HA::Resources", 'update', ['sid']],
--
2.39.5
* [pve-devel] [PATCH ha-manager v2 25/26] api: groups, services: assert use-location-rules feature flag
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (27 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 24/26] cli: expose ha rules api endpoints to ha-manager cli Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 26/26] api: services: check for colocations for service motions Daniel Kral
` (13 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Assert whether certain properties are allowed to be passed for the HA
groups and HA services API endpoints depending on whether the
use-location-rules feature flag is enabled or disabled.
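For context, the feature flag lives in the `ha` property string of the
datacenter config; a minimal example to enable it (sketch only):

  ha: use-location-rules=1

With the flag enabled, passing 'group' to the resources endpoints is rejected;
with it disabled, 'failback' is rejected instead.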
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
src/PVE/API2/HA/Groups.pm | 20 ++++++++++++++++++++
src/PVE/API2/HA/Resources.pm | 30 ++++++++++++++++++++++++++----
src/PVE/API2/HA/Status.pm | 6 +++++-
3 files changed, 51 insertions(+), 5 deletions(-)
diff --git a/src/PVE/API2/HA/Groups.pm b/src/PVE/API2/HA/Groups.pm
index 32350df..4dcb458 100644
--- a/src/PVE/API2/HA/Groups.pm
+++ b/src/PVE/API2/HA/Groups.pm
@@ -32,6 +32,15 @@ my $api_copy_config = sub {
return $group_cfg;
};
+my $verify_group_api_call_is_allowed = sub {
+ my ($noerr) = @_;
+
+ return 1 if !PVE::HA::Config::is_ha_location_enabled();
+
+ die "ha groups are not allowed because location rules are enabled\n" if !$noerr;
+ return 0;
+};
+
__PACKAGE__->register_method({
name => 'index',
path => '',
@@ -55,6 +64,9 @@ __PACKAGE__->register_method({
code => sub {
my ($param) = @_;
+ # return empty list instead of errors
+ return [] if !$verify_group_api_call_is_allowed->(1);
+
my $cfg = PVE::HA::Config::read_group_config();
my $res = [];
@@ -89,6 +101,8 @@ __PACKAGE__->register_method({
code => sub {
my ($param) = @_;
+ $verify_group_api_call_is_allowed->();
+
my $cfg = PVE::HA::Config::read_group_config();
return &$api_copy_config($cfg, $param->{group});
@@ -109,6 +123,8 @@ __PACKAGE__->register_method({
code => sub {
my ($param) = @_;
+ $verify_group_api_call_is_allowed->();
+
# create /etc/pve/ha directory
PVE::Cluster::check_cfs_quorum();
mkdir("/etc/pve/ha");
@@ -160,6 +176,8 @@ __PACKAGE__->register_method({
code => sub {
my ($param) = @_;
+ $verify_group_api_call_is_allowed->();
+
my $digest = extract_param($param, 'digest');
my $delete = extract_param($param, 'delete');
@@ -233,6 +251,8 @@ __PACKAGE__->register_method({
code => sub {
my ($param) = @_;
+ $verify_group_api_call_is_allowed->();
+
my $group = extract_param($param, 'group');
PVE::HA::Config::lock_ha_domain(
diff --git a/src/PVE/API2/HA/Resources.pm b/src/PVE/API2/HA/Resources.pm
index 5916204..f41fa2f 100644
--- a/src/PVE/API2/HA/Resources.pm
+++ b/src/PVE/API2/HA/Resources.pm
@@ -5,7 +5,7 @@ use warnings;
use PVE::SafeSyslog;
use PVE::Tools qw(extract_param);
-use PVE::Cluster;
+use PVE::Cluster qw(cfs_read_file);
use PVE::HA::Config;
use PVE::HA::Resources;
use HTTP::Status qw(:constants);
@@ -22,7 +22,7 @@ use base qw(PVE::RESTHandler);
my $resource_type_enum = PVE::HA::Resources->lookup_types();
my $api_copy_config = sub {
- my ($cfg, $sid) = @_;
+ my ($cfg, $sid, $remove_group) = @_;
die "no such resource '$sid'\n" if !$cfg->{ids}->{$sid};
@@ -30,9 +30,23 @@ my $api_copy_config = sub {
$scfg->{sid} = $sid;
$scfg->{digest} = $cfg->{digest};
+ delete $scfg->{group} if $remove_group;
+
return $scfg;
};
+my $assert_service_params_are_allowed = sub {
+ my ($param) = @_;
+
+ my $use_location_rules = PVE::HA::Config::is_ha_location_enabled();
+
+ die "'group' is not allowed because location rules are enabled in datacenter config\n"
+ if defined($param->{group}) && $use_location_rules;
+
+ die "'failback' is not allowed because location rules are disabled in datacenter config\n",
+ if defined($param->{failback}) && !$use_location_rules;
+};
+
sub check_service_state {
my ($sid, $req_state) = @_;
@@ -78,9 +92,11 @@ __PACKAGE__->register_method({
my $cfg = PVE::HA::Config::read_resources_config();
my $groups = PVE::HA::Config::read_group_config();
+ my $use_location_rules = PVE::HA::Config::is_ha_location_enabled();
+
my $res = [];
foreach my $sid (keys %{ $cfg->{ids} }) {
- my $scfg = &$api_copy_config($cfg, $sid);
+ my $scfg = &$api_copy_config($cfg, $sid, $use_location_rules);
next if $param->{type} && $param->{type} ne $scfg->{type};
if ($scfg->{group} && !$groups->{ids}->{ $scfg->{group} }) {
$scfg->{errors}->{group} = "group '$scfg->{group}' does not exist";
@@ -154,7 +170,9 @@ __PACKAGE__->register_method({
my $sid = PVE::HA::Config::parse_sid($param->{sid});
- return &$api_copy_config($cfg, $sid);
+ my $use_location_rules = PVE::HA::Config::is_ha_location_enabled();
+
+ return &$api_copy_config($cfg, $sid, $use_location_rules);
},
});
@@ -188,6 +206,8 @@ __PACKAGE__->register_method({
$plugin->exists($name);
+ $assert_service_params_are_allowed->($param);
+
my $opts = $plugin->check_config($sid, $param, 1, 1);
PVE::HA::Config::lock_ha_domain(
@@ -235,6 +255,8 @@ __PACKAGE__->register_method({
die "types does not match\n" if $param_type ne $type;
}
+ $assert_service_params_are_allowed->($param);
+
if (my $group = $param->{group}) {
my $group_cfg = PVE::HA::Config::read_group_config();
diff --git a/src/PVE/API2/HA/Status.pm b/src/PVE/API2/HA/Status.pm
index 1547e0e..eba3876 100644
--- a/src/PVE/API2/HA/Status.pm
+++ b/src/PVE/API2/HA/Status.pm
@@ -241,6 +241,8 @@ __PACKAGE__->register_method({
}
}
+ my $use_location_rules = PVE::HA::Config::is_ha_location_enabled();
+
my $add_service = sub {
my ($sid, $sc, $ss) = @_;
@@ -260,7 +262,9 @@ __PACKAGE__->register_method({
# also return common resource attributes
if (defined($sc)) {
$data->{request_state} = $sc->{state};
- foreach my $key (qw(group max_restart max_relocate comment)) {
+ my @attributes = qw(max_restart max_relocate comment);
+ push @attributes, 'group' if !$use_location_rules;
+ foreach my $key (@attributes) {
$data->{$key} = $sc->{$key} if defined($sc->{$key});
}
}
--
2.39.5
* [pve-devel] [PATCH ha-manager v2 26/26] api: services: check for colocations for service motions
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (28 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 25/26] api: groups, services: assert use-location-rules feature flag Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH docs v2 1/5] ha: config: add section about ha rules Daniel Kral
` (12 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
The HA Manager already handles positive and negative colocations for
individual service migration, but the information about these is only
sent to the HA environment's logger, i.e., in production these messages
end up in the HA Manager node's syslog.
Therefore, add checks when migrating/relocating services through their
respective API endpoints to give users information about side-effects,
i.e., positively colocated services, which are migrated together with
the service to the requested target node, and blockers, i.e., negatively
colocated services, which are on the requested target node.
get_service_motion_info(...) is also callable from other packages to get
a listing of all allowed and disallowed nodes with respect to the HA
Colocation rules, e.g., for a migration precondition check.
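As a rough sketch, a precondition check in another package could use it like
this (hypothetical caller code, not part of this patch):

  use PVE::HA::Config;

  # ask the HA stack which nodes vm:100 may move to w.r.t. colocation rules
  my ($allowed_nodes, $disallowed_nodes) =
      PVE::HA::Config::get_service_motion_info('vm:100');

  if (my $blockers = $disallowed_nodes->{'node2'}) {
      # negatively colocated services already on (or targeting) node2
      die "cannot move vm:100 to node2, blocked by: " . join(', ', @$blockers) . "\n";
  } elsif (my $comigrated = $allowed_nodes->{'node2'}) {
      # positively colocated services that would have to move to node2 as well
      print "moving vm:100 also moves: " . join(', ', @$comigrated) . "\n" if @$comigrated;
  }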
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
This patch is still more of a draft of how I think this should work,
i.e., that users get notified and not only the admin through the
HA Manager node's syslog. I wrote get_service_motion_info(...) roughly
so that it can also be called by the precondition checks in qemu-server
and pve-container at a later point to easily gather allowed and
disallowed nodes.
I'd also introduce a --force flag for the ha-manager migrate/relocate
CLI endpoints so that the caller must confirm that the side effects
should really be applied.
changes since v1:
- NEW!
src/PVE/API2/HA/Resources.pm | 78 +++++++++++++++++++++++++++++++++---
src/PVE/CLI/ha_manager.pm | 38 +++++++++++++++++-
src/PVE/HA/Config.pm | 60 +++++++++++++++++++++++++++
3 files changed, 168 insertions(+), 8 deletions(-)
diff --git a/src/PVE/API2/HA/Resources.pm b/src/PVE/API2/HA/Resources.pm
index f41fa2f..d217bb8 100644
--- a/src/PVE/API2/HA/Resources.pm
+++ b/src/PVE/API2/HA/Resources.pm
@@ -59,6 +59,14 @@ sub check_service_state {
}
}
+sub check_service_motion {
+ my ($sid, $req_node) = @_;
+
+ my ($allowed_nodes, $disallowed_nodes) = PVE::HA::Config::get_service_motion_info($sid);
+
+ return ($allowed_nodes->{$req_node}, $disallowed_nodes->{$req_node});
+}
+
__PACKAGE__->register_method({
name => 'index',
path => '',
@@ -331,19 +339,48 @@ __PACKAGE__->register_method({
),
},
},
- returns => { type => 'null' },
+ returns => {
+ type => 'object',
+ properties => {
+ 'requested-node' => {
+ description => "Node, which was requested to be migrated to.",
+ type => 'string',
+ optional => 0,
+ },
+ 'side-effects' => {
+ description => "Positively colocated HA resources, which are"
+ . " relocated to the same requested target node.",
+ type => 'array',
+ optional => 1,
+ },
+ },
+ },
code => sub {
my ($param) = @_;
+ my $result = {};
+
my ($sid, $type, $name) = PVE::HA::Config::parse_sid(extract_param($param, 'sid'));
+ my $req_node = extract_param($param, 'node');
PVE::HA::Config::service_is_ha_managed($sid);
check_service_state($sid);
- PVE::HA::Config::queue_crm_commands("migrate $sid $param->{node}");
+ my ($side_effects, $blockers) = check_service_motion($sid, $req_node);
- return undef;
+
+ if (defined($blockers)) {
+ die "cannot migrate '$sid' to '$req_node' - negatively colocated service(s) "
+ . join(', ', @$blockers)
+ . " on target '$req_node'\n";
+ }
+
+ PVE::HA::Config::queue_crm_commands("migrate $sid $req_node");
+ $result->{'requested-node'} = $req_node;
+ $result->{'side-effects'} = $side_effects if @$side_effects;
+
+ return $result;
},
});
@@ -373,19 +410,48 @@ __PACKAGE__->register_method({
),
},
},
- returns => { type => 'null' },
+ returns => {
+ type => 'object',
+ properties => {
+ 'requested-node' => {
+ description => "Node, which was requested to be relocated to.",
+ type => 'string',
+ optional => 0,
+ },
+ 'side-effects' => {
+ description => "Positively colocated HA resources, which are"
+ . " relocated to the same requested target node.",
+ type => 'array',
+ optional => 1,
+ },
+ },
+ },
code => sub {
my ($param) = @_;
+ my $result = {};
+
my ($sid, $type, $name) = PVE::HA::Config::parse_sid(extract_param($param, 'sid'));
+ my $req_node = extract_param($param, 'node');
PVE::HA::Config::service_is_ha_managed($sid);
check_service_state($sid);
- PVE::HA::Config::queue_crm_commands("relocate $sid $param->{node}");
+ my ($side_effects, $blockers) = check_service_motion($sid, $req_node);
- return undef;
+
+ if (defined($blockers)) {
+ die "cannot relocate '$sid' to '$req_node' - negatively colocated service(s) "
+ . join(', ', @$blockers)
+ . " on target '$req_node'\n";
+ }
+
+ PVE::HA::Config::queue_crm_commands("relocate $sid $req_node");
+ $result->{'requested-node'} = $req_node;
+ $result->{'side-effects'} = $side_effects if @$side_effects;
+
+ return $result;
},
});
diff --git a/src/PVE/CLI/ha_manager.pm b/src/PVE/CLI/ha_manager.pm
index 564ac96..e34c8eb 100644
--- a/src/PVE/CLI/ha_manager.pm
+++ b/src/PVE/CLI/ha_manager.pm
@@ -239,8 +239,42 @@ our $cmddef = {
relocate => { alias => 'crm-command relocate' },
'crm-command' => {
- migrate => ["PVE::API2::HA::Resources", 'migrate', ['sid', 'node']],
- relocate => ["PVE::API2::HA::Resources", 'relocate', ['sid', 'node']],
+ migrate => [
+ "PVE::API2::HA::Resources",
+ 'migrate',
+ ['sid', 'node'],
+ {},
+ sub {
+ my ($result) = @_;
+
+ if ($result->{'side-effects'}) {
+ my $req_node = $result->{'requested-node'};
+
+ for my $csid ($result->{'side-effects'}->@*) {
+ print
+ "also migrate positive colocated service '$csid' to '$req_node'\n";
+ }
+ }
+ },
+ ],
+ relocate => [
+ "PVE::API2::HA::Resources",
+ 'relocate',
+ ['sid', 'node'],
+ {},
+ sub {
+ my ($result) = @_;
+
+ if ($result->{'side-effects'}) {
+ my $req_node = $result->{'requested-node'};
+
+ for my $csid ($result->{'side-effects'}->@*) {
+ print
+ "also relocate positive colocated service '$csid' to '$req_node'\n";
+ }
+ }
+ },
+ ],
stop => [__PACKAGE__, 'stop', ['sid', 'timeout']],
'node-maintenance' => {
enable => [__PACKAGE__, 'node-maintenance-set', ['node'], { disable => 0 }],
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index de0fcec..c9172a5 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -8,6 +8,7 @@ use JSON;
use PVE::HA::Tools;
use PVE::HA::Groups;
use PVE::HA::Rules;
+use PVE::HA::Rules::Colocation qw(get_colocated_services);
use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
use PVE::HA::Resources;
@@ -223,6 +224,24 @@ sub read_and_check_rules_config {
return $rules;
}
+sub read_and_check_full_rules_config {
+
+ my $rules = read_and_check_rules_config();
+
+ # TODO PVE 10: Remove group migration when HA groups have been fully migrated to location rules
+ if (!is_ha_location_enabled()) {
+ my $groups = read_group_config();
+ my $services = read_and_check_resources_config();
+
+ PVE::HA::Rules::Location::delete_location_rules($rules);
+ PVE::HA::Groups::migrate_groups_to_rules($rules, $groups, $services);
+ }
+
+ PVE::HA::Rules->canonicalize($rules);
+
+ return $rules;
+}
+
sub write_rules_config {
my ($cfg) = @_;
@@ -345,6 +364,47 @@ sub service_is_configured {
return 0;
}
+sub get_service_motion_info {
+ my ($sid) = @_;
+
+ my $services = read_resources_config();
+
+ my $allowed_nodes = {};
+ my $disallowed_nodes = {};
+
+ if (&$service_check_ha_state($services, $sid)) {
+ my $manager_status = read_manager_status();
+ my $ss = $manager_status->{service_status};
+ my $ns = $manager_status->{node_status};
+
+ my $rules = read_and_check_full_rules_config();
+ my ($together, $separate) = get_colocated_services($rules, $sid);
+
+ for my $node (keys %$ns) {
+ next if $ns->{$node} ne 'online';
+
+ for my $csid (sort keys %$separate) {
+ next if $ss->{$csid}->{node} && $ss->{$csid}->{node} ne $node;
+ next if $ss->{$csid}->{target} && $ss->{$csid}->{target} ne $node;
+
+ push @{ $disallowed_nodes->{$node} }, $csid;
+ }
+
+ next if $disallowed_nodes->{$node};
+
+ $allowed_nodes->{$node} = [];
+ for my $csid (sort keys %$together) {
+ next if $ss->{$csid}->{node} && $ss->{$csid}->{node} eq $node;
+ next if $ss->{$csid}->{target} && $ss->{$csid}->{target} eq $node;
+
+ push @{ $allowed_nodes->{$node} }, $csid;
+ }
+ }
+ }
+
+ return ($allowed_nodes, $disallowed_nodes);
+}
+
# graceful, as long as locking + cfs_write works
sub delete_service_from_config {
my ($sid) = @_;
--
2.39.5
* [pve-devel] [PATCH docs v2 1/5] ha: config: add section about ha rules
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (29 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH ha-manager v2 26/26] api: services: check for colocations for service motions Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH docs v2 2/5] update static files to include ha rules api endpoints Daniel Kral
` (11 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add a section about how to create and modify ha rules, describing their
use cases and documenting their common and plugin-specific properties.
As of now, HA Location rules are controlled by the 'use-location-rules'
feature flag in the datacenter config; when enabled, they replace HA Groups.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
There are still a few things missing here, e.g., what happens during
migration, and a note that "connected" positive colocation rules are
handled as one positive colocation.
changes since v1:
- NEW!
Makefile | 3 +
gen-ha-rules-colocation-opts.pl | 20 +++
gen-ha-rules-location-opts.pl | 20 +++
gen-ha-rules-opts.pl | 17 +++
ha-manager.adoc | 231 +++++++++++++++++++++++++++++++-
ha-rules-colocation-opts.adoc | 8 ++
ha-rules-location-opts.adoc | 14 ++
ha-rules-opts.adoc | 12 ++
8 files changed, 318 insertions(+), 7 deletions(-)
create mode 100755 gen-ha-rules-colocation-opts.pl
create mode 100755 gen-ha-rules-location-opts.pl
create mode 100755 gen-ha-rules-opts.pl
create mode 100644 ha-rules-colocation-opts.adoc
create mode 100644 ha-rules-location-opts.adoc
create mode 100644 ha-rules-opts.adoc
diff --git a/Makefile b/Makefile
index f30d77a..b6da924 100644
--- a/Makefile
+++ b/Makefile
@@ -49,6 +49,9 @@ GEN_DEB_SOURCES= \
GEN_SCRIPTS= \
gen-ha-groups-opts.pl \
gen-ha-resources-opts.pl \
+ gen-ha-rules-opts.pl \
+ gen-ha-rules-colocation-opts.pl \
+ gen-ha-rules-location-opts.pl \
gen-datacenter.cfg.5-opts.pl \
gen-pct.conf.5-opts.pl \
gen-pct-network-opts.pl \
diff --git a/gen-ha-rules-colocation-opts.pl b/gen-ha-rules-colocation-opts.pl
new file mode 100755
index 0000000..203cbb6
--- /dev/null
+++ b/gen-ha-rules-colocation-opts.pl
@@ -0,0 +1,20 @@
+#!/usr/bin/perl
+
+use lib '.';
+use strict;
+use warnings;
+use PVE::RESTHandler;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+use PVE::HA::Rules::Colocation;
+
+my $private = PVE::HA::Rules::private();
+my $colocation_props = PVE::HA::Rules::Colocation::properties();
+my $properties = {
+ services => $private->{propertyList}->{services},
+ $colocation_props->%*,
+};
+
+print PVE::RESTHandler::dump_properties($properties);
diff --git a/gen-ha-rules-location-opts.pl b/gen-ha-rules-location-opts.pl
new file mode 100755
index 0000000..0564385
--- /dev/null
+++ b/gen-ha-rules-location-opts.pl
@@ -0,0 +1,20 @@
+#!/usr/bin/perl
+
+use lib '.';
+use strict;
+use warnings;
+use PVE::RESTHandler;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+use PVE::HA::Rules::Location;
+
+my $private = PVE::HA::Rules::private();
+my $location_props = PVE::HA::Rules::Location::properties();
+my $properties = {
+ services => $private->{propertyList}->{services},
+ $location_props->%*,
+};
+
+print PVE::RESTHandler::dump_properties($properties);
diff --git a/gen-ha-rules-opts.pl b/gen-ha-rules-opts.pl
new file mode 100755
index 0000000..012cb1a
--- /dev/null
+++ b/gen-ha-rules-opts.pl
@@ -0,0 +1,17 @@
+#!/usr/bin/perl
+
+use lib '.';
+use strict;
+use warnings;
+use PVE::RESTHandler;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+
+my $private = PVE::HA::Rules::private();
+my $properties = $private->{propertyList};
+delete $properties->{type};
+delete $properties->{rule};
+
+print PVE::RESTHandler::dump_properties($properties);
diff --git a/ha-manager.adoc b/ha-manager.adoc
index 3d6fc4a..12f73f1 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -670,6 +670,205 @@ up online again to investigate the cause of failure and check if it runs
stably again. Setting the `nofailback` flag prevents the recovered services from
moving straight back to the fenced node.
+[[ha_manager_rules]]
+Rules
+~~~~~
+
+HA rules are used to put certain constraints on HA-managed resources. They are
+defined in the HA rules configuration file `/etc/pve/ha/rules.cfg`.
+
+----
+<type>: <rule>
+ services <services_list>
+ <property> <value>
+ ...
+----
+
+include::ha-rules-opts.adoc[]
+
+.Available HA Rule Types
+[width="100%",cols="1,3",options="header"]
+|===========================================================
+| HA Rule Type | Description
+| `location` | Places affinity from one or more HA resources to one or more
+nodes.
+| `colocation` | Places affinity between two or more HA resources. The affinity
+`separate` specifies that HA resources are to be kept on separate nodes, while
+the affinity `together` specifies that HA resources are to be kept on the same
+node.
+|===========================================================
+
+[[ha_manager_location_rules]]
+Location
+^^^^^^^^
+
+NOTE: HA Location rules are equivalent to HA Groups and will replace them in an
+upcoming major release. They can be used by enabling the `use-location-rules`
+option in the web interface under __Datacenter -> Options -> HA__ or in the
+`datacenter.cfg`. The HA Groups will not be used anymore if Location rules are
+enabled.
+
+A common requirement is that a HA resource should run on a specific node.
+Usually, the resource is able to run on any cluster node, so you can define a
+location rule to make the resource `vm:100` prefer the node `node1`:
+
+----
+# ha-manager rules add location vm100_prefers_node1 --services vm:100 --nodes node1
+----
+
+By default, location rules are not strict, i.e., if none of the specified nodes
+are available, the resource can be moved to other nodes. In the previous example,
+the rule can be modified to restrict the resource `vm:100` to run only on `node1`:
+
+----
+# ha-manager rules set location vm100_prefers_node1 --strict 1
+----
+
+For bigger clusters, it makes sense to define a more detailed failover behavior.
+For example, the resources `vm:200` and `ct:300` should run on `node1`. If
+`node1` becomes unavailable, the resources should be distributed on `node2` and
+`node3`. If `node2` and `node3` are also unavailable, the resources should run
+on `node4`.
+
+To implement this behavior in a location rule, nodes can be paired with
+priorities to order the preference for nodes. If two or more nodes have the same
+priority, the resources can run on any of them. For the above example, `node1`
+gets the highest priority, `node2` and `node3` get the same priority, and at
+last `node4` gets the lowest priority, which can be omitted to default to `0`:
+
+----
+# ha-manager rules add location priority_cascade \
+ --services vm:200,ct:300 --nodes "node1:2,node2:1,node3:1,node4"
+----
+
+The above commands created the following rules in the rules configuration file:
+
+.Location Rules Configuration Example (`/etc/pve/ha/rules.cfg`)
+----
+location: vm100_prefers_node1
+ services vm:100
+ nodes node1
+ strict 1
+
+location: priority_cascade
+ services vm:200,ct:300
+ nodes node1:2,node2:1,node3:1,node4
+----
+
+Location Rule Properties
+++++++++++++++++++++++++
+
+include::ha-rules-location-opts.adoc[]
+
+[[ha_manager_colocation_rules]]
+Colocation
+^^^^^^^^^^
+
+A common requirement is that two or more HA resources should run on either the
+same node, or should be distributed on separate nodes. These are also commonly
+called "Affinity/Anti-Affinity" constraints.
+
+For example, suppose there is a lot of communication traffic between the
+HA resources `vm:100` and `vm:200`, say a web server communicating with a
+database server. If those HA resources are on separate nodes, this could
+potentially result in a higher latency and unnecessary network load. Colocation
+rules with the affinity `together` implement this constraint to colocate the
+HA resources on the same node:
+
+----
+# ha-manager rules add colocation keep_together \
+ --affinity together --services vm:100,vm:200
+----
+
+However, suppose there are computationally expensive and/or distributed
+programs running on the HA resources `vm:200` and `ct:300`, say sharded
+database instances. In that case, running them on the same node could
+potentially result in pressure on the hardware resources of the node and will
+slow down the operations of these HA resources. Colocation rules with the
+affinity `separate` implement this constraint to colocate the HA resources on
+separate nodes:
+
+----
+# ha-manager rules add colocation keep_separate \
+ --affinity separate --services vm:200,ct:300
+----
+
+Unlike HA location rules, colocation rules are strict by default, i.e., if
+the constraints imposed by the colocation rules cannot be met, the HA Manager
+will put the affected HA resources in recovery state in case of a failover or
+in error state elsewhere.
+
+The above commands created the following rules in the rules configuration file:
+
+.Colocation Rules Configuration Example (`/etc/pve/ha/rules.cfg`)
+----
+colocation: keep_together
+ affinity together
+ services vm:100,vm:200
+
+colocation: keep_separate
+ affinity separate
+ services vm:200,ct:300
+----
+
+Colocation Rule Properties
+++++++++++++++++++++++++++
+
+include::ha-rules-colocation-opts.adoc[]
+
+[[ha_manager_rule_conflicts]]
+Rule Conflicts
+~~~~~~~~~~~~~~
+
+HA rules can impose rather complex constraints on the HA resources. To
+ensure that a new or modified HA rule does not introduce uncertainty
+into the HA stack's CRS scheduler, HA rules are tested for feasibility
+before they are applied. If a rule fails any of these tests, the rule is
+disabled until the conflict is resolved.
+
+Currently, HA rules are checked for the following feasibility tests:
+
+* A HA resource can only be referenced by a single HA location rule in
+ total. If two or more HA location rules specify the same HA service,
+ these HA location rules will be disabled.
+
+* A HA colocation rule must specify at least two HA resources to be
+ feasible. If a HA colocation rule specifies only one HA resource,
+ the HA colocation rule will be disabled.
+
+* A positive HA colocation rule cannot specify the same two or more HA
+ resources as a negative HA colocation rule. That is, two or more HA
+ resources cannot be kept together and separate at the same time. If
+ any pair of positive and negative HA colocation rules specifies the
+ same two or more HA resources, both HA colocation rules will be
+ disabled.
+
+* A HA resource, which is already constrained by a HA group or a HA
+ location rule, can only be referenced by a HA colocation rule, if the
+ HA group or HA location rule only uses a single priority group,
+ i.e., the specified nodes all have the same priority. If one of the HA
+ resources of a HA colocation rule is constrained by a HA group or HA
+ location rule with multiple priorities, the HA colocation rule will be
+ disabled.
+
+* A HA resource, which is already constrained by a HA group or a HA
+ location rule, can only be referenced by a positive HA colocation
+ rule, if the HA group or HA location rule specifies at least one
+ common node, where the other positively colocated HA resources are
+ also allowed to run on. Otherwise, the positively colocated HA
+ resources could only run on separate nodes. In other words, if two or
+ more HA resources of a positive HA colocation rule are constrained to
+ different nodes, the positive HA colocation rule will be disabled.
+
+* A HA resource, which is already constrained by a HA group or a HA
+ location rule, can only be referenced by a negative HA colocation
+ rule, if the HA group or HA location rule specifies at least one node,
+ where the other negatively colocated HA resources are not allowed to
+ run on. Otherwise, the negatively colocated HA resources do not have
+ enough nodes to be separated on. In other words, if two or more HA
+ resources of a negative HA colocation rule are constrained to less
+ nodes than needed to separate them on, the negative HA colocation rule
+ will be disabled.
[[ha_manager_fencing]]
Fencing
@@ -752,14 +951,15 @@ After a node failed and its fencing was successful, the CRM tries to
move services from the failed node to nodes which are still online.
The selection of nodes, on which those services gets recovered, is
-influenced by the resource `group` settings, the list of currently active
-nodes, and their respective active service count.
+influenced by the resource `group` settings, HA rules the service is
+specified in, the list of currently active nodes, and their respective
+active service count.
The CRM first builds a set out of the intersection between user selected
nodes (from `group` setting) and available nodes. It then chooses the
-subset of nodes with the highest priority, and finally select the node
-with the lowest active service count. This minimizes the possibility
-of an overloaded node.
+subset of nodes with the highest priority, applies the colocation rules,
+and finally selects the node with the lowest active service count. This
+minimizes the possibility of an overloaded node.
CAUTION: On node failure, the CRM distributes services to the
remaining nodes. This increases the service count on those nodes, and
@@ -874,8 +1074,8 @@ You can use the manual maintenance mode to mark the node as unavailable for HA
operation, prompting all services managed by HA to migrate to other nodes.
The target nodes for these migrations are selected from the other currently
-available nodes, and determined by the HA group configuration and the configured
-cluster resource scheduler (CRS) mode.
+available nodes, and determined by the HA group configuration, HA rules
+configuration, and the configured cluster resource scheduler (CRS) mode.
During each migration, the original node will be recorded in the HA managers'
state, so that the service can be moved back again automatically once the
maintenance mode is disabled and the node is back online.
@@ -1092,6 +1292,23 @@ The CRS is currently used at the following scheduling points:
new target node for the HA services in that group, matching the adapted
priority constraints.
+- HA rule config changes (always active). If a rule imposes different
+ constraints on the services, the HA stack will use the CRS algorithm to find
+ a new target node for the HA services affected by these rules depending on
+ the type of the new rules:
+
+** Location rules: Identical to HA group config changes.
+
+** Positive Colocation rules (`together`): If a positive colocation rule is
+ created or services are added to an existing positive colocation rule, the
+ HA stack will use the CRS algorithm to ensure that these positively
+ colocated services are moved to a common node.
+
+** Negative Colocation rules (`separate`): If a negative colocation rule is
+ created or services are added to an existing negative colocation rule, the
+ HA stack will use the CRS algorithm to ensure that these negatively
+ colocated services are moved to separate nodes.
+
- HA service stopped -> start transition (opt-in). Requesting that a stopped
service should be started is a good opportunity to check for the best suited
node as per the CRS algorithm, as moving stopped services is cheaper to do
diff --git a/ha-rules-colocation-opts.adoc b/ha-rules-colocation-opts.adoc
new file mode 100644
index 0000000..4340187
--- /dev/null
+++ b/ha-rules-colocation-opts.adoc
@@ -0,0 +1,8 @@
+`affinity`: `<separate | together>` ::
+
+Describes whether the services are supposed to be kept on separate nodes, or are supposed to be kept together on the same node.
+
+`services`: `<type>:<name>{,<type>:<name>}*` ::
+
+List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).
+
diff --git a/ha-rules-location-opts.adoc b/ha-rules-location-opts.adoc
new file mode 100644
index 0000000..603f8db
--- /dev/null
+++ b/ha-rules-location-opts.adoc
@@ -0,0 +1,14 @@
+`nodes`: `<node>[:<pri>]{,<node>[:<pri>]}*` ::
+
+List of cluster node members, where a priority can be given to each node. A resource bound to a group will run on the available nodes with the highest priority. If there are more nodes in the highest priority class, the services will get distributed to those nodes. The priorities have a relative meaning only. The higher the number, the higher the priority.
+
+`services`: `<type>:<name>{,<type>:<name>}*` ::
+
+List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).
+
+`strict`: `<boolean>` ('default =' `0`)::
+
+Describes whether the location rule is mandatory or optional.
+A mandatory location rule makes services be restricted to the defined nodes. If none of the nodes are available, the service will be stopped.
+An optional location rule makes services prefer to be on the defined nodes. If none of the nodes are available, the service may run on any other node.
+
diff --git a/ha-rules-opts.adoc b/ha-rules-opts.adoc
new file mode 100644
index 0000000..58c1bd7
--- /dev/null
+++ b/ha-rules-opts.adoc
@@ -0,0 +1,12 @@
+`comment`: `<string>` ::
+
+HA rule description.
+
+`services`: `<type>:<name>{,<type>:<name>}*` ::
+
+List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).
+
+`state`: `<disabled | enabled>` ('default =' `enabled`)::
+
+State of the HA rule.
+
--
2.39.5
* [pve-devel] [PATCH docs v2 2/5] update static files to include ha rules api endpoints
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (30 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH docs v2 1/5] ha: config: add section about ha rules Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH docs v2 3/5] update static files to include use-location-rules feature flag Daniel Kral
` (10 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
This patch is more of a showcase of how the static files changed.
changes since v1:
- NEW!
api-viewer/apidata.js | 363 +++++++++++++++++++++++++++++++++++++
ha-manager.1-synopsis.adoc | 138 ++++++++++++++
2 files changed, 501 insertions(+)
diff --git a/api-viewer/apidata.js b/api-viewer/apidata.js
index 57f5942..9f0f28a 100644
--- a/api-viewer/apidata.js
+++ b/api-viewer/apidata.js
@@ -7805,6 +7805,369 @@ const apiSchema = [
"path" : "/cluster/ha/groups",
"text" : "groups"
},
+ {
+ "children" : [
+ {
+ "info" : {
+ "DELETE" : {
+ "allowtoken" : 1,
+ "description" : "Delete HA rule.",
+ "method" : "DELETE",
+ "name" : "delete_rule",
+ "parameters" : {
+ "additionalProperties" : 0,
+ "properties" : {
+ "rule" : {
+ "description" : "HA rule identifier.",
+ "format" : "pve-configid",
+ "type" : "string",
+ "typetext" : "<string>"
+ }
+ }
+ },
+ "permissions" : {
+ "check" : [
+ "perm",
+ "/",
+ [
+ "Sys.Console"
+ ]
+ ]
+ },
+ "protected" : 1,
+ "returns" : {
+ "type" : "null"
+ }
+ },
+ "GET" : {
+ "allowtoken" : 1,
+ "description" : "Read HA rule.",
+ "method" : "GET",
+ "name" : "read_rule",
+ "parameters" : {
+ "additionalProperties" : 0,
+ "properties" : {
+ "rule" : {
+ "description" : "HA rule identifier.",
+ "format" : "pve-configid",
+ "type" : "string",
+ "typetext" : "<string>"
+ }
+ }
+ },
+ "permissions" : {
+ "check" : [
+ "perm",
+ "/",
+ [
+ "Sys.Audit"
+ ]
+ ]
+ },
+ "returns" : {
+ "properties" : {
+ "rule" : {
+ "description" : "HA rule identifier.",
+ "format" : "pve-configid",
+ "type" : "string"
+ },
+ "type" : {
+ "type" : "string"
+ }
+ },
+ "type" : "object"
+ }
+ },
+ "PUT" : {
+ "allowtoken" : 1,
+ "description" : "Update HA rule.",
+ "method" : "PUT",
+ "name" : "update_rule",
+ "parameters" : {
+ "additionalProperties" : 0,
+ "properties" : {
+ "affinity" : {
+ "description" : "Describes whether the services are supposed to be kept on separate nodes, or are supposed to be kept together on the same node.",
+ "enum" : [
+ "separate",
+ "together"
+ ],
+ "instance-types" : [
+ "colocation"
+ ],
+ "optional" : 1,
+ "type" : "string",
+ "type-property" : "type"
+ },
+ "comment" : {
+ "description" : "HA rule description.",
+ "maxLength" : 4096,
+ "optional" : 1,
+ "type" : "string",
+ "typetext" : "<string>"
+ },
+ "delete" : {
+ "description" : "A list of settings you want to delete.",
+ "format" : "pve-configid-list",
+ "maxLength" : 4096,
+ "optional" : 1,
+ "type" : "string",
+ "typetext" : "<string>"
+ },
+ "digest" : {
+ "description" : "Prevent changes if current configuration file has a different digest. This can be used to prevent concurrent modifications.",
+ "maxLength" : 64,
+ "optional" : 1,
+ "type" : "string",
+ "typetext" : "<string>"
+ },
+ "nodes" : {
+ "description" : "List of cluster node names with optional priority.",
+ "format" : "pve-ha-group-node-list",
+ "instance-types" : [
+ "location"
+ ],
+ "optional" : 1,
+ "type" : "string",
+ "type-property" : "type",
+ "typetext" : "<node>[:<pri>]{,<node>[:<pri>]}*",
+ "verbose_description" : "List of cluster node members, where a priority can be given to each node. A resource bound to a group will run on the available nodes with the highest priority. If there are more nodes in the highest priority class, the services will get distributed to those nodes. The priorities have a relative meaning only. The higher the number, the higher the priority."
+ },
+ "rule" : {
+ "description" : "HA rule identifier.",
+ "format" : "pve-configid",
+ "optional" : 0,
+ "type" : "string",
+ "typetext" : "<string>"
+ },
+ "services" : {
+ "description" : "List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).",
+ "format" : "pve-ha-resource-id-list",
+ "optional" : 1,
+ "type" : "string",
+ "typetext" : "<type>:<name>{,<type>:<name>}*"
+ },
+ "state" : {
+ "default" : "enabled",
+ "description" : "State of the HA rule.",
+ "enum" : [
+ "enabled",
+ "disabled"
+ ],
+ "optional" : 1,
+ "type" : "string"
+ },
+ "strict" : {
+ "default" : 0,
+ "description" : "Describes whether the location rule is mandatory or optional.",
+ "instance-types" : [
+ "location"
+ ],
+ "optional" : 1,
+ "type" : "boolean",
+ "type-property" : "type",
+ "typetext" : "<boolean>",
+ "verbose_description" : "Describes whether the location rule is mandatory or optional.\nA mandatory location rule makes services be restricted to the defined nodes. If none of the nodes are available, the service will be stopped.\nAn optional location rule makes services prefer to be on the defined nodes. If none of the nodes are available, the service may run on any other node."
+ },
+ "type" : {
+ "description" : "HA rule type.",
+ "enum" : [
+ "colocation",
+ "location"
+ ],
+ "type" : "string"
+ }
+ },
+ "type" : "object"
+ },
+ "permissions" : {
+ "check" : [
+ "perm",
+ "/",
+ [
+ "Sys.Console"
+ ]
+ ]
+ },
+ "protected" : 1,
+ "returns" : {
+ "type" : "null"
+ }
+ }
+ },
+ "leaf" : 1,
+ "path" : "/cluster/ha/rules/{rule}",
+ "text" : "{rule}"
+ }
+ ],
+ "info" : {
+ "GET" : {
+ "allowtoken" : 1,
+ "description" : "Get HA rules.",
+ "method" : "GET",
+ "name" : "index",
+ "parameters" : {
+ "additionalProperties" : 0,
+ "properties" : {
+ "service" : {
+ "description" : "Limit the returned list to rules affecting the specified service.",
+ "optional" : 1,
+ "type" : "string",
+ "typetext" : "<string>"
+ },
+ "state" : {
+ "description" : "Limit the returned list to the specified rule state.",
+ "enum" : [
+ "enabled",
+ "disabled"
+ ],
+ "optional" : 1,
+ "type" : "string"
+ },
+ "type" : {
+ "description" : "Limit the returned list to the specified rule type.",
+ "enum" : [
+ "colocation",
+ "location"
+ ],
+ "optional" : 1,
+ "type" : "string"
+ }
+ }
+ },
+ "permissions" : {
+ "check" : [
+ "perm",
+ "/",
+ [
+ "Sys.Audit"
+ ]
+ ]
+ },
+ "returns" : {
+ "items" : {
+ "links" : [
+ {
+ "href" : "{rule}",
+ "rel" : "child"
+ }
+ ],
+ "properties" : {
+ "rule" : {
+ "type" : "string"
+ }
+ },
+ "type" : "object"
+ },
+ "type" : "array"
+ }
+ },
+ "POST" : {
+ "allowtoken" : 1,
+ "description" : "Create HA rule.",
+ "method" : "POST",
+ "name" : "create_rule",
+ "parameters" : {
+ "additionalProperties" : 0,
+ "properties" : {
+ "affinity" : {
+ "description" : "Describes whether the services are supposed to be kept on separate nodes, or are supposed to be kept together on the same node.",
+ "enum" : [
+ "separate",
+ "together"
+ ],
+ "instance-types" : [
+ "colocation"
+ ],
+ "optional" : 1,
+ "type" : "string",
+ "type-property" : "type"
+ },
+ "comment" : {
+ "description" : "HA rule description.",
+ "maxLength" : 4096,
+ "optional" : 1,
+ "type" : "string",
+ "typetext" : "<string>"
+ },
+ "nodes" : {
+ "description" : "List of cluster node names with optional priority.",
+ "format" : "pve-ha-group-node-list",
+ "instance-types" : [
+ "location"
+ ],
+ "optional" : 1,
+ "type" : "string",
+ "type-property" : "type",
+ "typetext" : "<node>[:<pri>]{,<node>[:<pri>]}*",
+ "verbose_description" : "List of cluster node members, where a priority can be given to each node. A resource bound to a group will run on the available nodes with the highest priority. If there are more nodes in the highest priority class, the services will get distributed to those nodes. The priorities have a relative meaning only. The higher the number, the higher the priority."
+ },
+ "rule" : {
+ "description" : "HA rule identifier.",
+ "format" : "pve-configid",
+ "optional" : 0,
+ "type" : "string",
+ "typetext" : "<string>"
+ },
+ "services" : {
+ "description" : "List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).",
+ "format" : "pve-ha-resource-id-list",
+ "optional" : 0,
+ "type" : "string",
+ "typetext" : "<type>:<name>{,<type>:<name>}*"
+ },
+ "state" : {
+ "default" : "enabled",
+ "description" : "State of the HA rule.",
+ "enum" : [
+ "enabled",
+ "disabled"
+ ],
+ "optional" : 1,
+ "type" : "string"
+ },
+ "strict" : {
+ "default" : 0,
+ "description" : "Describes whether the location rule is mandatory or optional.",
+ "instance-types" : [
+ "location"
+ ],
+ "optional" : 1,
+ "type" : "boolean",
+ "type-property" : "type",
+ "typetext" : "<boolean>",
+ "verbose_description" : "Describes whether the location rule is mandatory or optional.\nA mandatory location rule makes services be restricted to the defined nodes. If none of the nodes are available, the service will be stopped.\nAn optional location rule makes services prefer to be on the defined nodes. If none of the nodes are available, the service may run on any other node."
+ },
+ "type" : {
+ "description" : "HA rule type.",
+ "enum" : [
+ "colocation",
+ "location"
+ ],
+ "type" : "string"
+ }
+ },
+ "type" : "object"
+ },
+ "permissions" : {
+ "check" : [
+ "perm",
+ "/",
+ [
+ "Sys.Console"
+ ]
+ ]
+ },
+ "protected" : 1,
+ "returns" : {
+ "type" : "null"
+ }
+ }
+ },
+ "leaf" : 0,
+ "path" : "/cluster/ha/rules",
+ "text" : "rules"
+ },
{
"children" : [
{
diff --git a/ha-manager.1-synopsis.adoc b/ha-manager.1-synopsis.adoc
index 0e4c5ab..540c7ca 100644
--- a/ha-manager.1-synopsis.adoc
+++ b/ha-manager.1-synopsis.adoc
@@ -193,6 +193,144 @@ Delete resource configuration.
HA resource ID. This consists of a resource type followed by a resource specific name, separated with colon (example: vm:100 / ct:100). For virtual machines and containers, you can simply use the VM or CT id as a shortcut (example: 100).
+*ha-manager rules add* `<type> <rule> --services <string>` `[OPTIONS]`
+
+Create HA rule.
+
+`<type>`: `<colocation | location>` ::
+
+HA rule type.
+
+`<rule>`: `<string>` ::
+
+HA rule identifier.
+
+`--comment` `<string>` ::
+
+HA rule description.
+
+`--services` `<type>:<name>{,<type>:<name>}*` ::
+
+List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).
+
+`--state` `<disabled | enabled>` ('default =' `enabled`)::
+
+State of the HA rule.
+
+
+
+
+`Conditional options:`
+
+`[type=colocation]` ;;
+
+`--affinity` `<separate | together>` ::
+
+Describes whether the services are supposed to be kept on separate nodes, or are supposed to be kept together on the same node.
+
+`[type=location]` ;;
+
+`--nodes` `<node>[:<pri>]{,<node>[:<pri>]}*` ::
+
+List of cluster node names with optional priority.
+
+`--strict` `<boolean>` ('default =' `0`)::
+
+Describes whether the location rule is mandatory or optional.
+
+*ha-manager rules config* `[OPTIONS]` `[FORMAT_OPTIONS]`
+
+Get HA rules.
+
+`--service` `<string>` ::
+
+Limit the returned list to rules affecting the specified service.
+
+`--state` `<disabled | enabled>` ::
+
+Limit the returned list to the specified rule state.
+
+`--type` `<colocation | location>` ::
+
+Limit the returned list to the specified rule type.
+
+*ha-manager rules list* `[OPTIONS]` `[FORMAT_OPTIONS]`
+
+Get HA rules.
+
+`--service` `<string>` ::
+
+Limit the returned list to rules affecting the specified service.
+
+`--state` `<disabled | enabled>` ::
+
+Limit the returned list to the specified rule state.
+
+`--type` `<colocation | location>` ::
+
+Limit the returned list to the specified rule type.
+
+*ha-manager rules remove* `<rule>`
+
+Delete HA rule.
+
+`<rule>`: `<string>` ::
+
+HA rule identifier.
+
+*ha-manager rules set* `<type> <rule>` `[OPTIONS]`
+
+Update HA rule.
+
+`<type>`: `<colocation | location>` ::
+
+HA rule type.
+
+`<rule>`: `<string>` ::
+
+HA rule identifier.
+
+`--comment` `<string>` ::
+
+HA rule description.
+
+`--delete` `<string>` ::
+
+A list of settings you want to delete.
+
+`--digest` `<string>` ::
+
+Prevent changes if current configuration file has a different digest. This can be used to prevent concurrent modifications.
+
+`--services` `<type>:<name>{,<type>:<name>}*` ::
+
+List of HA resource IDs. This consists of a list of resource types followed by a resource specific name separated with a colon (example: vm:100,ct:101).
+
+`--state` `<disabled | enabled>` ('default =' `enabled`)::
+
+State of the HA rule.
+
+
+
+
+`Conditional options:`
+
+`[type=colocation]` ;;
+
+`--affinity` `<separate | together>` ::
+
+Describes whether the services are supposed to be kept on separate nodes, or are supposed to be kept together on the same node.
+
+`[type=location]` ;;
+
+`--nodes` `<node>[:<pri>]{,<node>[:<pri>]}*` ::
+
+List of cluster node names with optional priority.
+
+`--strict` `<boolean>` ('default =' `0`)::
+
+Describes whether the location rule is mandatory or optional.
+
*ha-manager set* `<sid>` `[OPTIONS]`
Update resource configuration.
--
2.39.5
* [pve-devel] [PATCH docs v2 3/5] update static files to include use-location-rules feature flag
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (31 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH docs v2 2/5] update static files to include ha rules api endpoints Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH docs v2 4/5] update static files to include ha resources failback flag Daniel Kral
` (9 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
This patch is more of a showcase of how the static files changed.
changes since v1:
- NEW!
api-viewer/apidata.js | 9 ++++++++-
datacenter.cfg.5-opts.adoc | 6 +++++-
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/api-viewer/apidata.js b/api-viewer/apidata.js
index 9f0f28a..c52afc8 100644
--- a/api-viewer/apidata.js
+++ b/api-viewer/apidata.js
@@ -14800,13 +14800,20 @@ const apiSchema = [
"conditional",
"migrate"
],
+ "optional" : 1,
"type" : "string",
"verbose_description" : "Describes the policy for handling HA services on poweroff or reboot of a node. Freeze will always freeze services which are still located on the node on shutdown, those services won't be recovered by the HA manager. Failover will not mark the services as frozen and thus the services will get recovered to other nodes, if the shutdown node does not come up again quickly (< 1min). 'conditional' chooses automatically depending on the type of shutdown, i.e., on a reboot the service will be frozen but on a poweroff the service will stay as is, and thus get recovered after about 2 minutes. Migrate will try to move all running services to another node when a reboot or shutdown was triggered. The poweroff process will only continue once no running services are located on the node anymore. If the node comes up again, the service will be moved back to the previously powered-off node, at least if no other migration, reloaction or recover
y took place."
+ },
+ "use-location-rules" : {
+ "default" : 0,
+ "description" : "Whether HA Location rules should be used instead of HA groups.",
+ "optional" : 1,
+ "type" : "boolean"
}
},
"optional" : 1,
"type" : "string",
- "typetext" : "shutdown_policy=<enum>"
+ "typetext" : "[shutdown_policy=<enum>] [,use-location-rules=<1|0>]"
},
"http_proxy" : {
"description" : "Specify external http proxy which is used for downloads (example: 'http://username:password@host:port/')",
diff --git a/datacenter.cfg.5-opts.adoc b/datacenter.cfg.5-opts.adoc
index 7a42b12..8001870 100644
--- a/datacenter.cfg.5-opts.adoc
+++ b/datacenter.cfg.5-opts.adoc
@@ -56,7 +56,7 @@ Set the fencing mode of the HA cluster. Hardware mode needs a valid configuratio
+
WARNING: 'hardware' and 'both' are EXPERIMENTAL & WIP
-`ha`: `shutdown_policy=<enum>` ::
+`ha`: `[shutdown_policy=<enum>] [,use-location-rules=<1|0>]` ::
Cluster wide HA settings.
@@ -64,6 +64,10 @@ Cluster wide HA settings.
Describes the policy for handling HA services on poweroff or reboot of a node. Freeze will always freeze services which are still located on the node on shutdown, those services won't be recovered by the HA manager. Failover will not mark the services as frozen and thus the services will get recovered to other nodes, if the shutdown node does not come up again quickly (< 1min). 'conditional' chooses automatically depending on the type of shutdown, i.e., on a reboot the service will be frozen but on a poweroff the service will stay as is, and thus get recovered after about 2 minutes. Migrate will try to move all running services to another node when a reboot or shutdown was triggered. The poweroff process will only continue once no running services are located on the node anymore. If the node comes up again, the service will be moved back to the previously powered-off node, at least if no other migration, relocation or recovery took place.
+`use-location-rules`=`<boolean>` ('default =' `0`);;
+
+Whether HA Location rules should be used instead of HA groups.
+
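As a rough illustration (not part of this patch), the combined property string in /etc/pve/datacenter.cfg could then look like this, with the values chosen purely as an example:

    ha: shutdown_policy=migrate,use-location-rules=1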
`http_proxy`: `http://.*` ::
Specify external http proxy which is used for downloads (example: 'http://username:password@host:port/')
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH docs v2 4/5] update static files to include ha resources failback flag
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (32 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH docs v2 3/5] update static files to include use-location-rules feature flag Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH docs v2 5/5] update static files to include ha service motion return value schema Daniel Kral
` (8 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
This patch is more of a showcase of how the static files changed.
changes since v1:
- NEW!
api-viewer/apidata.js | 14 ++++++++++++++
ha-manager.1-synopsis.adoc | 8 ++++++++
ha-resources-opts.adoc | 4 ++++
3 files changed, 26 insertions(+)
diff --git a/api-viewer/apidata.js b/api-viewer/apidata.js
index c52afc8..bbfc7b7 100644
--- a/api-viewer/apidata.js
+++ b/api-viewer/apidata.js
@@ -7350,6 +7350,13 @@ const apiSchema = [
"type" : "string",
"typetext" : "<string>"
},
+ "failback" : {
+ "default" : 1,
+ "description" : "Automatically migrate service to the node with the highest priority according to their location rules, if a node with a higher priority than the current node comes online, or migrate to the node, which doesn't violate any colocation rule.",
+ "optional" : 1,
+ "type" : "boolean",
+ "typetext" : "<boolean>"
+ },
"group" : {
"description" : "The HA group identifier.",
"format" : "pve-configid",
@@ -7478,6 +7485,13 @@ const apiSchema = [
"type" : "string",
"typetext" : "<string>"
},
+ "failback" : {
+ "default" : 1,
+ "description" : "Automatically migrate service to the node with the highest priority according to their location rules, if a node with a higher priority than the current node comes online, or migrate to the node, which doesn't violate any colocation rule.",
+ "optional" : 1,
+ "type" : "boolean",
+ "typetext" : "<boolean>"
+ },
"group" : {
"description" : "The HA group identifier.",
"format" : "pve-configid",
diff --git a/ha-manager.1-synopsis.adoc b/ha-manager.1-synopsis.adoc
index 540c7ca..5ea3160 100644
--- a/ha-manager.1-synopsis.adoc
+++ b/ha-manager.1-synopsis.adoc
@@ -12,6 +12,10 @@ HA resource ID. This consists of a resource type followed by a resource specific
Description.
+`--failback` `<boolean>` ('default =' `1`)::
+
+Automatically migrate the service to the node with the highest priority according to its location rules, if a node with a higher priority than the current node comes online, or migrate to a node which doesn't violate any colocation rule.
+
`--group` `<string>` ::
The HA group identifier.
@@ -351,6 +355,10 @@ A list of settings you want to delete.
Prevent changes if current configuration file has a different digest. This can be used to prevent concurrent modifications.
+`--failback` `<boolean>` ('default =' `1`)::
+
+Automatically migrate the service to the node with the highest priority according to its location rules, if a node with a higher priority than the current node comes online, or migrate to a node which doesn't violate any colocation rule.
+
`--group` `<string>` ::
The HA group identifier.
diff --git a/ha-resources-opts.adoc b/ha-resources-opts.adoc
index 29a4479..6caebae 100644
--- a/ha-resources-opts.adoc
+++ b/ha-resources-opts.adoc
@@ -2,6 +2,10 @@
Description.
+`failback`: `<boolean>` ('default =' `1`)::
+
+Automatically migrate the service to the node with the highest priority according to its location rules, if a node with a higher priority than the current node comes online, or migrate to a node which doesn't violate any colocation rule.
+
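A minimal sketch of how the new option might appear in an HA resource entry of the usual section-config format (the VMID and the other properties are illustrative, not taken from this patch):

    vm: 100
        state started
        failback 0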
`group`: `<string>` ::
The HA group identifier.
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH docs v2 5/5] update static files to include ha service motion return value schema
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (33 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH docs v2 4/5] update static files to include ha resources failback flag Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 1/5] api: ha: add ha rules api endpoints Daniel Kral
` (7 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
This patch is more of a showcase of how the static files changed.
changes since v1:
- NEW!
api-viewer/apidata.js | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/api-viewer/apidata.js b/api-viewer/apidata.js
index bbfc7b7..e07e3a5 100644
--- a/api-viewer/apidata.js
+++ b/api-viewer/apidata.js
@@ -7159,7 +7159,19 @@ const apiSchema = [
},
"protected" : 1,
"returns" : {
- "type" : "null"
+ "properties" : {
+ "requested-node" : {
+ "description" : "Node, which was requested to be migrated to.",
+ "optional" : 0,
+ "type" : "string"
+ },
+ "side-effects" : {
+ "description" : "Positively colocated HA resources, which are relocated to the same requested target node.",
+ "optional" : 1,
+ "type" : "array"
+ }
+ },
+ "type" : "object"
}
}
},
@@ -7202,7 +7214,19 @@ const apiSchema = [
},
"protected" : 1,
"returns" : {
- "type" : "null"
+ "properties" : {
+ "requested-node" : {
+ "description" : "Node, which was requested to be relocated to.",
+ "optional" : 0,
+ "type" : "string"
+ },
+ "side-effects" : {
+ "description" : "Positively colocated HA resources, which are relocated to the same requested target node.",
+ "optional" : 1,
+ "type" : "array"
+ }
+ },
+ "type" : "object"
}
}
},
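To make the schema change more tangible, a response body matching the new return schema might look roughly like this (the node name is invented for illustration, and the item format of 'side-effects' is not spelled out in this hunk, so it is left as a placeholder):

    {
        "requested-node": "node2",
        "side-effects": [ ... ]
    }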
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH manager v2 1/5] api: ha: add ha rules api endpoints
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (34 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH docs v2 5/5] update static files to include ha service motion return value schema Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 2/5] ui: add use-location-rules feature flag Daniel Kral
` (6 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
PVE/API2/HAConfig.pm | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/PVE/API2/HAConfig.pm b/PVE/API2/HAConfig.pm
index 35f49cbb..d29211fb 100644
--- a/PVE/API2/HAConfig.pm
+++ b/PVE/API2/HAConfig.pm
@@ -12,6 +12,7 @@ use PVE::JSONSchema qw(get_standard_option);
use PVE::Exception qw(raise_param_exc);
use PVE::API2::HA::Resources;
use PVE::API2::HA::Groups;
+use PVE::API2::HA::Rules;
use PVE::API2::HA::Status;
use base qw(PVE::RESTHandler);
@@ -26,6 +27,11 @@ __PACKAGE__->register_method({
path => 'groups',
});
+__PACKAGE__->register_method({
+ subclass => "PVE::API2::HA::Rules",
+ path => 'rules',
+});
+
__PACKAGE__->register_method({
subclass => "PVE::API2::HA::Status",
path => 'status',
@@ -57,7 +63,7 @@ __PACKAGE__->register_method({
my ($param) = @_;
my $res = [
- { id => 'status' }, { id => 'resources' }, { id => 'groups' },
+ { id => 'status' }, { id => 'resources' }, { id => 'groups' }, { id => 'rules' },
];
return $res;
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH manager v2 2/5] ui: add use-location-rules feature flag
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (35 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 1/5] api: ha: add ha rules api endpoints Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 3/5] ui: ha: hide ha groups if use-location-rules is enabled Daniel Kral
` (5 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add a 'use-location-rules' feature flag to the datacenter options input
panel to control whether the HA Manager, API endpoints, and web interface
use and show HA Groups (disabled) or HA Location rules (enabled).
The util helper is used in the following patches to make existing and new
behavior act correctly.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
I'm not that happy with the many calls that are made to the
getHALocationFeatureStatus(...) helper function and in general the calls
to the API endpoint. I'd like some more feedback on how we could handle the
migration part better (for this and the following patches).
changes since v1:
- NEW!
www/manager6/Utils.js | 5 +++++
www/manager6/dc/OptionView.js | 13 +++++++++++++
2 files changed, 18 insertions(+)
diff --git a/www/manager6/Utils.js b/www/manager6/Utils.js
index 29334111..6f58fd20 100644
--- a/www/manager6/Utils.js
+++ b/www/manager6/Utils.js
@@ -45,6 +45,11 @@ Ext.define('PVE.Utils', {
return levelMap;
},
+ getHALocationFeatureStatus: async function () {
+ let { result } = await Proxmox.Async.api2({ url: '/cluster/options' });
+ return result?.data?.ha?.['use-location-rules'] === 1;
+ },
+
kvm_ostypes: {
Linux: [
{ desc: '6.x - 2.6 Kernel', val: 'l26' },
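A minimal usage sketch of the helper added above, mirroring how later patches in this series consume it (the log statement is purely illustrative):

    // resolves to true when the 'use-location-rules' datacenter option is set to 1
    PVE.Utils.getHALocationFeatureStatus().then((isHALocationEnabled) => {
        console.log('HA Location rules enabled:', isHALocationEnabled);
    });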
diff --git a/www/manager6/dc/OptionView.js b/www/manager6/dc/OptionView.js
index 20d74b6f..68309e39 100644
--- a/www/manager6/dc/OptionView.js
+++ b/www/manager6/dc/OptionView.js
@@ -148,6 +148,19 @@ Ext.define('PVE.dc.OptionView', {
],
defaultValue: '__default__',
},
+ {
+ xtype: 'proxmoxcheckbox',
+ name: 'use-location-rules',
+ fieldLabel: gettext('Use HA Location rules'),
+ boxLabel: gettext('Replace HA Groups with HA Location rules'),
+ value: 0,
+ },
+ {
+ xtype: 'box',
+ html:
+ `<span class='pmx-hint'>${gettext('Note:')}</span> ` +
+ gettext('HA Groups need to be manually migrated to HA Location rules.'),
+ },
],
});
me.add_inputpanel_row('crs', gettext('Cluster Resource Scheduling'), {
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH manager v2 3/5] ui: ha: hide ha groups if use-location-rules is enabled
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (36 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 2/5] ui: add use-location-rules feature flag Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 4/5] ui: ha: adapt resources components " Daniel Kral
` (4 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Remove the HA Groups entry from the datacenter's config tabs if the
use-location-rules feature flag is enabled.
As changing the use-location-rules feature flag doesn't automatically
reload the web interface, show an explanatory empty-state message if the
HA Groups page is still open.
Remove the 'ha-groups' entry from the state provider, as the ha-groups page
only exists conditionally now and the StateProvider expects all entries
to exist at all times.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
www/manager6/StateProvider.js | 1 -
www/manager6/Workspace.js | 20 ++++++++++++++++++++
www/manager6/dc/Config.js | 24 ++++++++++++++----------
www/manager6/ha/Groups.js | 6 ++++++
4 files changed, 40 insertions(+), 11 deletions(-)
diff --git a/www/manager6/StateProvider.js b/www/manager6/StateProvider.js
index 5137ee55..d8686014 100644
--- a/www/manager6/StateProvider.js
+++ b/www/manager6/StateProvider.js
@@ -54,7 +54,6 @@ Ext.define('PVE.StateProvider', {
system: 50,
monitor: 49,
'ha-fencing': 48,
- 'ha-groups': 47,
'ha-resources': 46,
'ceph-log': 45,
'ceph-crushmap': 44,
diff --git a/www/manager6/Workspace.js b/www/manager6/Workspace.js
index e6b18bf7..f9680429 100644
--- a/www/manager6/Workspace.js
+++ b/www/manager6/Workspace.js
@@ -164,6 +164,26 @@ Ext.define('PVE.StdWorkspace', {
PVE.UIOptions.update();
+ Proxmox.Utils.API2Request({
+ url: '/cluster/options',
+ method: 'GET',
+ success: function (response) {
+ let dcConfig = response.result?.data ?? {};
+ PVE.HALocationEnabled = dcConfig?.ha?.['use-location-rules'] === 1;
+
+ // remove HA Groups menu item if HA Location rules are enabled
+ if (PVE.HALocationEnabled) {
+ let haMenu = Ext.ComponentQuery.query('treelistitem[text="HA"]')[0];
+ let haGroupsMenu = Object.values(haMenu?.itemMap ?? {}).find(
+ (element) => element.getText() === 'Groups',
+ );
+ if (haGroupsMenu) {
+ haGroupsMenu.addCls('x-hidden-display');
+ }
+ }
+ },
+ });
+
Proxmox.Utils.API2Request({
url: '/cluster/sdn',
method: 'GET',
diff --git a/www/manager6/dc/Config.js b/www/manager6/dc/Config.js
index 6173a9b2..7e39c85f 100644
--- a/www/manager6/dc/Config.js
+++ b/www/manager6/dc/Config.js
@@ -169,21 +169,25 @@ Ext.define('PVE.dc.Config', {
iconCls: 'fa fa-heartbeat',
itemId: 'ha',
},
- {
+ );
+
+ if (!PVE.HALocationEnabled) {
+ me.items.push({
title: gettext('Groups'),
groups: ['ha'],
xtype: 'pveHAGroupsView',
iconCls: 'fa fa-object-group',
itemId: 'ha-groups',
- },
- {
- title: gettext('Fencing'),
- groups: ['ha'],
- iconCls: 'fa fa-bolt',
- xtype: 'pveFencingView',
- itemId: 'ha-fencing',
- },
- );
+ });
+ }
+
+ me.items.push({
+ title: gettext('Fencing'),
+ groups: ['ha'],
+ iconCls: 'fa fa-bolt',
+ xtype: 'pveFencingView',
+ itemId: 'ha-fencing',
+ });
// always show on initial load, will be hidden later if the SDN API calls don't exist,
// else it won't be shown at first if the user initially loads with DC selected
if (PVE.SDNInfo || PVE.SDNInfo === undefined) {
diff --git a/www/manager6/ha/Groups.js b/www/manager6/ha/Groups.js
index 6b4958f0..4aad0dda 100644
--- a/www/manager6/ha/Groups.js
+++ b/www/manager6/ha/Groups.js
@@ -112,6 +112,12 @@ Ext.define('PVE.ha.GroupsView', {
},
});
+ PVE.Utils.getHALocationFeatureStatus().then((isHALocationEnabled) => {
+ if (isHALocationEnabled) {
+ me.emptyText = gettext('HA Location rules are used instead of HA Groups');
+ }
+ });
+
me.callParent();
},
});
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH manager v2 4/5] ui: ha: adapt resources components if use-location-rules is enabled
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (37 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 3/5] ui: ha: hide ha groups if use-location-rules is enabled Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 5/5] ui: ha: add ha rules components and menu entry Daniel Kral
` (3 subsequent siblings)
42 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Remove the group selector from the Resources grid view and edit window
and replace it with the 'failback' field if the use-location-rules
feature flag is enabled.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
www/manager6/ha/ResourceEdit.js | 27 ++++++++++++++++++++++-----
www/manager6/ha/Resources.js | 7 +++++++
2 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/www/manager6/ha/ResourceEdit.js b/www/manager6/ha/ResourceEdit.js
index 1048ccca..f900db62 100644
--- a/www/manager6/ha/ResourceEdit.js
+++ b/www/manager6/ha/ResourceEdit.js
@@ -12,6 +12,7 @@ Ext.define('PVE.ha.VMResourceInputPanel', {
delete values.vmid;
PVE.Utils.delete_if_default(values, 'group', '', me.isCreate);
+ PVE.Utils.delete_if_default(values, 'failback', '1', me.isCreate);
PVE.Utils.delete_if_default(values, 'max_restart', '1', me.isCreate);
PVE.Utils.delete_if_default(values, 'max_relocate', '1', me.isCreate);
@@ -109,11 +110,6 @@ Ext.define('PVE.ha.VMResourceInputPanel', {
];
me.column2 = [
- {
- xtype: 'pveHAGroupSelector',
- name: 'group',
- fieldLabel: gettext('Group'),
- },
{
xtype: 'proxmoxKVComboBox',
name: 'state',
@@ -138,6 +134,26 @@ Ext.define('PVE.ha.VMResourceInputPanel', {
disabledHint,
];
+ if (me.showGroups) {
+ me.column2.unshift({
+ xtype: 'pveHAGroupSelector',
+ name: 'group',
+ fieldLabel: gettext('Group'),
+ });
+ } else {
+ me.column2.push({
+ xtype: 'proxmoxcheckbox',
+ name: 'failback',
+ fieldLabel: gettext('Failback'),
+ autoEl: {
+ tag: 'div',
+ 'data-qtip': gettext('Enable if service should be on highest priority node.'),
+ },
+ uncheckedValue: 0,
+ value: 1,
+ });
+ }
+
me.columnB = [
{
xtype: 'textfield',
@@ -177,6 +193,7 @@ Ext.define('PVE.ha.VMResourceEdit', {
isCreate: me.isCreate,
vmid: me.vmid,
guestType: me.guestType,
+ showGroups: me.showGroups,
});
Ext.apply(me, {
diff --git a/www/manager6/ha/Resources.js b/www/manager6/ha/Resources.js
index e8e53b3b..6f09a714 100644
--- a/www/manager6/ha/Resources.js
+++ b/www/manager6/ha/Resources.js
@@ -14,6 +14,11 @@ Ext.define('PVE.ha.ResourcesView', {
throw 'no store given';
}
+ me.showGroups = true;
+ PVE.Utils.getHALocationFeatureStatus().then((isHALocationEnabled) => {
+ me.showGroups = !isHALocationEnabled;
+ });
+
Proxmox.Utils.monStoreErrors(me, me.rstore);
let store = Ext.create('Proxmox.data.DiffStore', {
rstore: me.rstore,
@@ -42,6 +47,7 @@ Ext.define('PVE.ha.ResourcesView', {
destroy: () => me.rstore.load(),
},
autoShow: true,
+ showGroups: me.showGroups,
});
};
@@ -63,6 +69,7 @@ Ext.define('PVE.ha.ResourcesView', {
destroy: () => me.rstore.load(),
},
autoShow: true,
+ showGroups: me.showGroups,
});
},
},
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* [pve-devel] [PATCH manager v2 5/5] ui: ha: add ha rules components and menu entry
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (38 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 4/5] ui: ha: adapt resources components " Daniel Kral
@ 2025-06-20 14:31 ` Daniel Kral
2025-06-30 15:09 ` Michael Köppl
2025-07-01 14:38 ` Michael Köppl
2025-06-20 15:43 ` [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (2 subsequent siblings)
42 siblings, 2 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 14:31 UTC (permalink / raw)
To: pve-devel
Add components for basic CRUD operations on the HA rules and for viewing
potential errors of contradictory HA rules, which are currently only
possible by manually editing the file.
The feature flag 'use-location-rules' controls whether location rules
can be created from the web interface. Location rules are not removed if
the flag is unset as the API is expected to remove these entries.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
- NEW!
www/manager6/Makefile | 7 +
www/manager6/dc/Config.js | 23 +-
www/manager6/ha/RuleEdit.js | 149 +++++++++++++
www/manager6/ha/RuleErrorsModal.js | 50 +++++
www/manager6/ha/Rules.js | 228 ++++++++++++++++++++
www/manager6/ha/rules/ColocationRuleEdit.js | 24 +++
www/manager6/ha/rules/ColocationRules.js | 31 +++
www/manager6/ha/rules/LocationRuleEdit.js | 145 +++++++++++++
www/manager6/ha/rules/LocationRules.js | 36 ++++
9 files changed, 686 insertions(+), 7 deletions(-)
create mode 100644 www/manager6/ha/RuleEdit.js
create mode 100644 www/manager6/ha/RuleErrorsModal.js
create mode 100644 www/manager6/ha/Rules.js
create mode 100644 www/manager6/ha/rules/ColocationRuleEdit.js
create mode 100644 www/manager6/ha/rules/ColocationRules.js
create mode 100644 www/manager6/ha/rules/LocationRuleEdit.js
create mode 100644 www/manager6/ha/rules/LocationRules.js
diff --git a/www/manager6/Makefile b/www/manager6/Makefile
index ca641e34..636d8edb 100644
--- a/www/manager6/Makefile
+++ b/www/manager6/Makefile
@@ -147,8 +147,15 @@ JSSRC= \
ha/Groups.js \
ha/ResourceEdit.js \
ha/Resources.js \
+ ha/RuleEdit.js \
+ ha/RuleErrorsModal.js \
+ ha/Rules.js \
ha/Status.js \
ha/StatusView.js \
+ ha/rules/ColocationRuleEdit.js \
+ ha/rules/ColocationRules.js \
+ ha/rules/LocationRuleEdit.js \
+ ha/rules/LocationRules.js \
dc/ACLView.js \
dc/ACMEClusterView.js \
dc/AuthEditBase.js \
diff --git a/www/manager6/dc/Config.js b/www/manager6/dc/Config.js
index 7e39c85f..690213fb 100644
--- a/www/manager6/dc/Config.js
+++ b/www/manager6/dc/Config.js
@@ -181,13 +181,22 @@ Ext.define('PVE.dc.Config', {
});
}
- me.items.push({
- title: gettext('Fencing'),
- groups: ['ha'],
- iconCls: 'fa fa-bolt',
- xtype: 'pveFencingView',
- itemId: 'ha-fencing',
- });
+ me.items.push(
+ {
+ title: gettext('Rules'),
+ groups: ['ha'],
+ xtype: 'pveHARulesView',
+ iconCls: 'fa fa-gears',
+ itemId: 'ha-rules',
+ },
+ {
+ title: gettext('Fencing'),
+ groups: ['ha'],
+ iconCls: 'fa fa-bolt',
+ xtype: 'pveFencingView',
+ itemId: 'ha-fencing',
+ },
+ );
// always show on initial load, will be hidden later if the SDN API calls don't exist,
// else it won't be shown at first if the user initially loads with DC selected
if (PVE.SDNInfo || PVE.SDNInfo === undefined) {
diff --git a/www/manager6/ha/RuleEdit.js b/www/manager6/ha/RuleEdit.js
new file mode 100644
index 00000000..a6c2a7d2
--- /dev/null
+++ b/www/manager6/ha/RuleEdit.js
@@ -0,0 +1,149 @@
+Ext.define('PVE.ha.RuleInputPanel', {
+ extend: 'Proxmox.panel.InputPanel',
+
+ onlineHelp: 'ha_manager_rules',
+
+ formatServiceListString: function (services) {
+ let me = this;
+
+ return services.map((vmid) => {
+ if (me.servicesStore.getById(`qemu/${vmid}`)) {
+ return `vm:${vmid}`;
+ } else if (me.servicesStore.getById(`lxc/${vmid}`)) {
+ return `ct:${vmid}`;
+ } else {
+ Ext.Msg.alert(gettext('Error'), `Could not find resource type for ${vmid}`);
+ throw `Unknown resource type: ${vmid}`;
+ }
+ });
+ },
+
+ onGetValues: function (values) {
+ let me = this;
+
+ values.type = me.ruleType;
+
+ if (!me.isCreate) {
+ delete values.rule;
+ }
+
+ if (!values.enabled) {
+ values.state = 'disabled';
+ } else {
+ values.state = 'enabled';
+ }
+ delete values.enabled;
+
+ values.services = me.formatServiceListString(values.services);
+
+ return values;
+ },
+
+ initComponent: function () {
+ let me = this;
+
+ let servicesStore = Ext.create('Ext.data.Store', {
+ model: 'PVEResources',
+ autoLoad: true,
+ sorters: 'vmid',
+ filters: [
+ {
+ property: 'type',
+ value: /lxc|qemu/,
+ },
+ {
+ property: 'hastate',
+ operator: '!=',
+ value: 'unmanaged',
+ },
+ ],
+ });
+
+ Ext.apply(me, {
+ servicesStore: servicesStore,
+ });
+
+ me.column1.unshift(
+ {
+ xtype: me.isCreate ? 'textfield' : 'displayfield',
+ name: 'rule',
+ value: me.ruleId || '',
+ fieldLabel: 'ID',
+ allowBlank: false,
+ },
+ {
+ xtype: 'vmComboSelector',
+ name: 'services',
+ fieldLabel: gettext('Services'),
+ store: me.servicesStore,
+ allowBlank: false,
+ autoSelect: false,
+ multiSelect: true,
+ validateExists: true,
+ },
+ );
+
+ me.column2 = me.column2 ?? [];
+
+ me.column2.unshift({
+ xtype: 'proxmoxcheckbox',
+ name: 'enabled',
+ fieldLabel: gettext('Enable'),
+ uncheckedValue: 0,
+ defaultValue: 1,
+ checked: true,
+ });
+
+ me.callParent();
+ },
+});
+
+Ext.define('PVE.ha.RuleEdit', {
+ extend: 'Proxmox.window.Edit',
+
+ defaultFocus: undefined, // prevent the vmComboSelector from being expanded when focusing the window
+
+ initComponent: function () {
+ let me = this;
+
+ me.isCreate = !me.ruleId;
+
+ if (me.isCreate) {
+ me.url = '/api2/extjs/cluster/ha/rules';
+ me.method = 'POST';
+ } else {
+ me.url = `/api2/extjs/cluster/ha/rules/${me.ruleId}`;
+ me.method = 'PUT';
+ }
+
+ let inputPanel = Ext.create(me.panelType, {
+ ruleId: me.ruleId,
+ ruleType: me.ruleType,
+ isCreate: me.isCreate,
+ });
+
+ Ext.apply(me, {
+ subject: me.panelName,
+ isAdd: true,
+ items: [inputPanel],
+ });
+
+ me.callParent();
+
+ if (!me.isCreate) {
+ me.load({
+ success: (response, options) => {
+ let values = response.result.data;
+
+ values.services = values.services
+ .split(',')
+ .map((service) => service.split(':')[1]);
+
+ values.enabled = values.state === 'enabled';
+
+ inputPanel.setValues(values);
+ },
+ });
+ }
+ },
+});
diff --git a/www/manager6/ha/RuleErrorsModal.js b/www/manager6/ha/RuleErrorsModal.js
new file mode 100644
index 00000000..aac1ef87
--- /dev/null
+++ b/www/manager6/ha/RuleErrorsModal.js
@@ -0,0 +1,50 @@
+Ext.define('PVE.ha.RuleErrorsModal', {
+ extend: 'Ext.window.Window',
+ alias: ['widget.pveHARulesErrorsModal'],
+ mixins: ['Proxmox.Mixin.CBind'],
+
+ modal: true,
+ scrollable: true,
+ resizable: false,
+
+ title: gettext('Rule errors'),
+
+ initComponent: function () {
+ let me = this;
+
+ let renderHARuleErrors = (errors) => {
+ if (!errors) {
+ return gettext('HA Rule has no errors.');
+ }
+
+ let errorListItemsHtml = '';
+
+ for (let [opt, messages] of Object.entries(errors)) {
+ errorListItemsHtml += messages
+ .map((message) => `<li>${Ext.htmlEncode(`${opt}: ${message}`)}</li>`)
+ .join('');
+ }
+
+ return `<div>
+ <p>${gettext('The HA rule has the following errors:')}</p>
+ <ul>${errorListItemsHtml}</ul>
+ </div>`;
+ };
+
+ Ext.apply(me, {
+ modal: true,
+ border: false,
+ layout: 'fit',
+ items: [
+ {
+ xtype: 'displayfield',
+ padding: 20,
+ scrollable: true,
+ value: renderHARuleErrors(me.errors),
+ },
+ ],
+ });
+
+ me.callParent();
+ },
+});
diff --git a/www/manager6/ha/Rules.js b/www/manager6/ha/Rules.js
new file mode 100644
index 00000000..d69aa3b2
--- /dev/null
+++ b/www/manager6/ha/Rules.js
@@ -0,0 +1,228 @@
+Ext.define('PVE.ha.RulesBaseView', {
+ extend: 'Ext.grid.GridPanel',
+
+ initComponent: function () {
+ let me = this;
+
+ if (!me.ruleType) {
+ throw 'no rule type given';
+ }
+
+ let store = new Ext.data.Store({
+ model: 'pve-ha-rules',
+ autoLoad: true,
+ filters: [
+ {
+ property: 'type',
+ value: me.ruleType,
+ },
+ ],
+ });
+
+ let reloadStore = () => store.load();
+
+ let sm = Ext.create('Ext.selection.RowModel', {});
+
+ let createRuleEditWindow = (ruleId) => {
+ if (!me.inputPanel) {
+ throw `no editor registered for ha rule type: ${me.ruleType}`;
+ }
+
+ Ext.create('PVE.ha.RuleEdit', {
+ panelType: `PVE.ha.rules.${me.inputPanel}`,
+ panelName: me.ruleTitle,
+ ruleType: me.ruleType,
+ ruleId: ruleId,
+ autoShow: true,
+ listeners: {
+ destroy: reloadStore,
+ },
+ });
+ };
+
+ let runEditor = () => {
+ let rec = sm.getSelection()[0];
+ if (!rec) {
+ return;
+ }
+ let { rule } = rec.data;
+ createRuleEditWindow(rule);
+ };
+
+ let editButton = Ext.create('Proxmox.button.Button', {
+ text: gettext('Edit'),
+ disabled: true,
+ selModel: sm,
+ handler: runEditor,
+ });
+
+ let removeButton = Ext.create('Proxmox.button.StdRemoveButton', {
+ selModel: sm,
+ baseurl: '/cluster/ha/rules/',
+ callback: reloadStore,
+ });
+
+ Ext.apply(me, {
+ store: store,
+ selModel: sm,
+ viewConfig: {
+ trackOver: false,
+ },
+ emptyText: Ext.String.format(gettext('No {0} rules configured.'), me.ruleTitle),
+ tbar: [
+ {
+ text: gettext('Add'),
+ handler: () => createRuleEditWindow(),
+ },
+ editButton,
+ removeButton,
+ ],
+ listeners: {
+ activate: reloadStore,
+ itemdblclick: runEditor,
+ },
+ });
+
+ me.columns.unshift(
+ {
+ header: gettext('State'),
+ xtype: 'actioncolumn',
+ width: 25,
+ align: 'center',
+ dataIndex: 'state',
+ items: [
+ {
+ isActionDisabled: (table, rowIndex, colIndex, item, { data }) =>
+ data.state !== 'contradictory',
+ handler: (table, rowIndex, colIndex, item, event, { data }) => {
+ Ext.create('PVE.ha.RuleErrorsModal', {
+ autoShow: true,
+ errors: data.errors ?? {},
+ });
+ },
+ getTip: (value) => {
+ switch (value) {
+ case 'contradictory':
+ return gettext('Errors');
+ case 'disabled':
+ return gettext('Disabled');
+ default:
+ return gettext('Enabled');
+ }
+ },
+ getClass: (value) => {
+ let iconName = 'check';
+
+ if (value === 'contradictory') {
+ iconName = 'exclamation-triangle';
+ } else if (value === 'disabled') {
+ iconName = 'minus';
+ }
+
+ return `fa fa-${iconName}`;
+ },
+ },
+ ],
+ },
+ {
+ header: gettext('Rule'),
+ width: 200,
+ dataIndex: 'rule',
+ },
+ );
+
+ me.columns.push({
+ header: gettext('Comment'),
+ flex: 1,
+ renderer: Ext.String.htmlEncode,
+ dataIndex: 'comment',
+ });
+
+ me.callParent();
+ },
+});
+
+Ext.define(
+ 'PVE.ha.RulesView',
+ {
+ extend: 'Ext.panel.Panel',
+ alias: 'widget.pveHARulesView',
+ mixins: ['Proxmox.Mixin.CBind'],
+
+ onlineHelp: 'ha_manager_rules',
+
+ layout: {
+ type: 'vbox',
+ align: 'stretch',
+ },
+
+ viewModel: {
+ data: {
+ isHALocationEnabled: false,
+ },
+ formulas: {
+ showHALocation: (get) => get('isHALocationEnabled'),
+ },
+ },
+
+ items: [
+ {
+ title: gettext('HA Location'),
+ xtype: 'pveHALocationRulesView',
+ flex: 1,
+ border: 0,
+ bind: {
+ hidden: '{!isHALocationEnabled}',
+ },
+ },
+ {
+ xtype: 'splitter',
+ collapsible: false,
+ performCollapse: false,
+ },
+ {
+ title: gettext('HA Colocation'),
+ xtype: 'pveHAColocationRulesView',
+ flex: 1,
+ border: 0,
+ },
+ ],
+
+ initComponent: function () {
+ let me = this;
+
+ let viewModel = me.getViewModel();
+
+ PVE.Utils.getHALocationFeatureStatus().then((isHALocationEnabled) => {
+ viewModel.set('isHALocationEnabled', isHALocationEnabled);
+ });
+
+ me.callParent();
+ },
+ },
+ function () {
+ Ext.define('pve-ha-rules', {
+ extend: 'Ext.data.Model',
+ fields: [
+ 'rule',
+ 'type',
+ 'nodes',
+ 'state',
+ 'digest',
+ 'comment',
+ 'affinity',
+ 'services',
+ 'conflicts',
+ {
+ name: 'strict',
+ type: 'boolean',
+ },
+ ],
+ proxy: {
+ type: 'proxmox',
+ url: '/api2/json/cluster/ha/rules',
+ },
+ idProperty: 'rule',
+ });
+ },
+);
diff --git a/www/manager6/ha/rules/ColocationRuleEdit.js b/www/manager6/ha/rules/ColocationRuleEdit.js
new file mode 100644
index 00000000..d8c5223c
--- /dev/null
+++ b/www/manager6/ha/rules/ColocationRuleEdit.js
@@ -0,0 +1,24 @@
+Ext.define('PVE.ha.rules.ColocationInputPanel', {
+ extend: 'PVE.ha.RuleInputPanel',
+
+ initComponent: function () {
+ let me = this;
+
+ me.column1 = [];
+
+ me.column2 = [
+ {
+ xtype: 'proxmoxKVComboBox',
+ name: 'affinity',
+ fieldLabel: gettext('Affinity'),
+ allowBlank: false,
+ comboItems: [
+ ['separate', gettext('Keep separate')],
+ ['together', gettext('Keep together')],
+ ],
+ },
+ ];
+
+ me.callParent();
+ },
+});
diff --git a/www/manager6/ha/rules/ColocationRules.js b/www/manager6/ha/rules/ColocationRules.js
new file mode 100644
index 00000000..f8c410de
--- /dev/null
+++ b/www/manager6/ha/rules/ColocationRules.js
@@ -0,0 +1,31 @@
+Ext.define('PVE.ha.ColocationRulesView', {
+ extend: 'PVE.ha.RulesBaseView',
+ alias: 'widget.pveHAColocationRulesView',
+
+ title: gettext('HA Colocation'),
+ ruleType: 'colocation',
+ inputPanel: 'ColocationInputPanel',
+ faIcon: 'link',
+
+ stateful: true,
+ stateId: 'grid-ha-colocation-rules',
+
+ initComponent: function () {
+ let me = this;
+
+ me.columns = [
+ {
+ header: gettext('Affinity'),
+ flex: 1,
+ dataIndex: 'affinity',
+ },
+ {
+ header: gettext('Services'),
+ flex: 1,
+ dataIndex: 'services',
+ },
+ ];
+
+ me.callParent();
+ },
+});
diff --git a/www/manager6/ha/rules/LocationRuleEdit.js b/www/manager6/ha/rules/LocationRuleEdit.js
new file mode 100644
index 00000000..cd540a18
--- /dev/null
+++ b/www/manager6/ha/rules/LocationRuleEdit.js
@@ -0,0 +1,145 @@
+Ext.define('PVE.ha.rules.LocationInputPanel', {
+ extend: 'PVE.ha.RuleInputPanel',
+
+ initComponent: function () {
+ let me = this;
+
+ me.column1 = [
+ {
+ xtype: 'proxmoxcheckbox',
+ name: 'strict',
+ fieldLabel: gettext('Strict'),
+ autoEl: {
+ tag: 'div',
+ 'data-qtip': gettext('Enable if the services must be restricted to the nodes.'),
+ },
+ uncheckedValue: 0,
+ defaultValue: 0,
+ },
+ ];
+
+ /* TODO Code copied from GroupEdit, should be factored out into a shared component */
+ let update_nodefield, update_node_selection;
+
+ let sm = Ext.create('Ext.selection.CheckboxModel', {
+ mode: 'SIMPLE',
+ listeners: {
+ selectionchange: function (model, selected) {
+ update_nodefield(selected);
+ },
+ },
+ });
+
+ let store = Ext.create('Ext.data.Store', {
+ fields: ['node', 'mem', 'cpu', 'priority'],
+ data: PVE.data.ResourceStore.getNodes(), // use already cached data to avoid an API call
+ proxy: {
+ type: 'memory',
+ reader: { type: 'json' },
+ },
+ sorters: [
+ {
+ property: 'node',
+ direction: 'ASC',
+ },
+ ],
+ });
+
+ var nodegrid = Ext.createWidget('grid', {
+ store: store,
+ border: true,
+ height: 300,
+ selModel: sm,
+ columns: [
+ {
+ header: gettext('Node'),
+ flex: 1,
+ dataIndex: 'node',
+ },
+ {
+ header: gettext('Memory usage') + ' %',
+ renderer: PVE.Utils.render_mem_usage_percent,
+ sortable: true,
+ width: 150,
+ dataIndex: 'mem',
+ },
+ {
+ header: gettext('CPU usage'),
+ renderer: Proxmox.Utils.render_cpu,
+ sortable: true,
+ width: 150,
+ dataIndex: 'cpu',
+ },
+ {
+ header: gettext('Priority'),
+ xtype: 'widgetcolumn',
+ dataIndex: 'priority',
+ sortable: true,
+ stopSelection: true,
+ widget: {
+ xtype: 'proxmoxintegerfield',
+ minValue: 0,
+ maxValue: 1000,
+ isFormField: false,
+ listeners: {
+ change: function (numberfield, value, old_value) {
+ let record = numberfield.getWidgetRecord();
+ record.set('priority', value);
+ update_nodefield(sm.getSelection());
+ record.commit();
+ },
+ },
+ },
+ },
+ ],
+ });
+
+ let nodefield = Ext.create('Ext.form.field.Hidden', {
+ name: 'nodes',
+ value: '',
+ listeners: {
+ change: function (field, value) {
+ update_node_selection(value);
+ },
+ },
+ isValid: function () {
+ let value = this.getValue();
+ return value && value.length !== 0;
+ },
+ });
+
+ update_node_selection = function (string) {
+ sm.deselectAll(true);
+
+ string.split(',').forEach(function (e, idx, array) {
+ let [node, priority] = e.split(':');
+ store.each(function (record) {
+ if (record.get('node') === node) {
+ sm.select(record, true);
+ record.set('priority', priority);
+ record.commit();
+ }
+ });
+ });
+ nodegrid.reconfigure(store);
+ };
+
+ update_nodefield = function (selected) {
+ let nodes = selected
+ .map(({ data }) => data.node + (data.priority ? `:${data.priority}` : ''))
+ .join(',');
+
+ // nodefield change listener calls us again, which results in an
+ // endless recursion; suspend the event temporarily to avoid this
+ nodefield.suspendEvent('change');
+ nodefield.setValue(nodes);
+ nodefield.resumeEvent('change');
+ };
+
+ me.column2 = [nodefield];
+
+ me.columnB = [nodegrid];
+
+ me.callParent();
+ },
+});
diff --git a/www/manager6/ha/rules/LocationRules.js b/www/manager6/ha/rules/LocationRules.js
new file mode 100644
index 00000000..6201a5bf
--- /dev/null
+++ b/www/manager6/ha/rules/LocationRules.js
@@ -0,0 +1,36 @@
+Ext.define('PVE.ha.LocationRulesView', {
+ extend: 'PVE.ha.RulesBaseView',
+ alias: 'widget.pveHALocationRulesView',
+
+ ruleType: 'location',
+ ruleTitle: gettext('HA Location'),
+ inputPanel: 'LocationInputPanel',
+ faIcon: 'map-pin',
+
+ stateful: true,
+ stateId: 'grid-ha-location-rules',
+
+ initComponent: function () {
+ let me = this;
+
+ me.columns = [
+ {
+ header: gettext('Strict'),
+ width: 50,
+ dataIndex: 'strict',
+ },
+ {
+ header: gettext('Services'),
+ flex: 1,
+ dataIndex: 'services',
+ },
+ {
+ header: gettext('Nodes'),
+ flex: 1,
+ dataIndex: 'nodes',
+ },
+ ];
+
+ me.callParent();
+ },
+});
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [PATCH manager v2 5/5] ui: ha: add ha rules components and menu entry
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 5/5] ui: ha: add ha rules components and menu entry Daniel Kral
@ 2025-06-30 15:09 ` Michael Köppl
2025-07-01 14:38 ` Michael Köppl
1 sibling, 0 replies; 70+ messages in thread
From: Michael Köppl @ 2025-06-30 15:09 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
On 6/20/25 16:31, Daniel Kral wrote:
> diff --git a/www/manager6/ha/Rules.js b/www/manager6/ha/Rules.js
> new file mode 100644
> index 00000000..d69aa3b2
> --- /dev/null
> +++ b/www/manager6/ha/Rules.js
> @@ -0,0 +1,228 @@
> +Ext.define('PVE.ha.RulesBaseView', {
> + extend: 'Ext.grid.GridPanel',
> +
> + initComponent: function () {
> + let me = this;
> +
> + if (!me.ruleType) {
> + throw 'no rule type given';
> + }
> +
> + let store = new Ext.data.Store({
> + model: 'pve-ha-rules',
> + autoLoad: true,
> + filters: [
> + {
> + property: 'type',
> + value: me.ruleType,
> + },
> + ],
> + });
> +
> + let reloadStore = () => store.load();
> +
> + let sm = Ext.create('Ext.selection.RowModel', {});
> +
> + let createRuleEditWindow = (ruleId) => {
> + if (!me.inputPanel) {
> + throw `no editor registered for ha rule type: ${me.ruleType}`;
> + }
> +
> + Ext.create('PVE.ha.RuleEdit', {
> + panelType: `PVE.ha.rules.${me.inputPanel}`,
> + panelName: me.ruleTitle,
> + ruleType: me.ruleType,
> + ruleId: ruleId,
> + autoShow: true,
> + listeners: {
> + destroy: reloadStore,
> + },
> + });
> + };
> +
> + let runEditor = () => {
> + let rec = sm.getSelection()[0];
> + if (!rec) {
> + return;
> + }
> + let { rule } = rec.data;
> + createRuleEditWindow(rule);
> + };
> +
> + let editButton = Ext.create('Proxmox.button.Button', {
> + text: gettext('Edit'),
> + disabled: true,
> + selModel: sm,
> + handler: runEditor,
> + });
> +
> + let removeButton = Ext.create('Proxmox.button.StdRemoveButton', {
> + selModel: sm,
> + baseurl: '/cluster/ha/rules/',
> + callback: reloadStore,
> + });
> +
> + Ext.apply(me, {
> + store: store,
> + selModel: sm,
> + viewConfig: {
> + trackOver: false,
> + },
> + emptyText: Ext.String.format(gettext('No {0} rules configured.'), me.ruleTitle),
> + tbar: [
> + {
> + text: gettext('Add'),
> + handler: () => createRuleEditWindow(),
> + },
> + editButton,
> + removeButton,
> + ],
> + listeners: {
> + activate: reloadStore,
> + itemdblclick: runEditor,
> + },
> + });
> +
> + me.columns.unshift(
> + {
> + header: gettext('State'),
> + xtype: 'actioncolumn',
> + width: 25,
This is very narrow. Prior to resizing the column, I only see "S". A
width of 65 works quite nicely.
> + align: 'center',
> + dataIndex: 'state',
> + items: [
> + {
> + isActionDisabled: (table, rowIndex, colIndex, item, { data }) =>
> + data.state !== 'contradictory',
> + handler: (table, rowIndex, colIndex, item, event, { data }) => {
> + Ext.create('PVE.ha.RuleErrorsModal', {
> + autoShow: true,
> + errors: data.errors ?? {},
> + });
> + },
> + getTip: (value) => {
> + switch (value) {
> + case 'contradictory':
> + return gettext('Errors');
> + case 'disabled':
> + return gettext('Disabled');
> + default:
> + return gettext('Enabled');
> + }
> + },
> + getClass: (value) => {
> + let iconName = 'check';
> +
> + if (value === 'contradictory') {
> + iconName = 'exclamation-triangle';
> + } else if (value === 'disabled') {
> + iconName = 'minus';
> + }
> +
> + return `fa fa-${iconName}`;
> + },
> + },
> + ],
> + },
> + {
> + header: gettext('Rule'),
> + width: 200,
> + dataIndex: 'rule',
> + },
> + );
> +
> + me.columns.push({
> + header: gettext('Comment'),
> + flex: 1,
> + renderer: Ext.String.htmlEncode,
> + dataIndex: 'comment',
> + });
> +
> + me.callParent();
> + },
> +});
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [PATCH manager v2 5/5] ui: ha: add ha rules components and menu entry
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 5/5] ui: ha: add ha rules components and menu entry Daniel Kral
2025-06-30 15:09 ` Michael Köppl
@ 2025-07-01 14:38 ` Michael Köppl
1 sibling, 0 replies; 70+ messages in thread
From: Michael Köppl @ 2025-07-01 14:38 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
On 6/20/25 16:31, Daniel Kral wrote:
> +
> + me.columns.unshift(
> + {
> + header: gettext('State'),
> + xtype: 'actioncolumn',
> + width: 25,
> + align: 'center',
> + dataIndex: 'state',
> + items: [
> + {
> + isActionDisabled: (table, rowIndex, colIndex, item, { data }) =>
> + data.state !== 'contradictory',
> + handler: (table, rowIndex, colIndex, item, event, { data }) => {
> + Ext.create('PVE.ha.RuleErrorsModal', {
> + autoShow: true,
> + errors: data.errors ?? {},
> + });
Even though not super important to this series, I think you could avoid
defining a new modal here by doing something like this:
let listItems = Object.entries(data.errors ?? {})
.flatMap(([opt, messages]) => messages.map(message =>
`<li>${Ext.htmlEncode(`${opt}: ${message}`)}</li>`))
.join('');
Ext.Msg.show({
title: gettext("Rule errors"),
icon: Ext.Msg.WARNING,
msg: `<div>
<p>${gettext('The HA rule has the following errors:')}</p>
<ul style="list-style-position: inside; padding-left: 0;">
${listItems}
</ul>
</div>`,
});
Just a suggestion, of course. If you plan to keep the RuleErrorsModal,
the padding makes it seem a bit off compared to other dialogs throughout
the manager.
> + },
> + getTip: (value) => {
> + switch (value) {
> + case 'contradictory':
> + return gettext('Errors');
> + case 'disabled':
> + return gettext('Disabled');
> + default:
> + return gettext('Enabled');
> + }
> + },
> + getClass: (value) => {
> + let iconName = 'check';
> +
> + if (value === 'contradictory') {
> + iconName = 'exclamation-triangle';
> + } else if (value === 'disabled') {
> + iconName = 'minus';
> + }
> +
> + return `fa fa-${iconName}`;
> + },
> + },
> + ],
> + },
> + {
> + header: gettext('Rule'),
> + width: 200,
> + dataIndex: 'rule',
> + },
> + );
> +
> + me.columns.push({
> + header: gettext('Comment'),
> + flex: 1,
> + renderer: Ext.String.htmlEncode,
> + dataIndex: 'comment',
> + });
> +
> + me.callParent();
> + },
> +});
> +
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (39 preceding siblings ...)
2025-06-20 14:31 ` [pve-devel] [PATCH manager v2 5/5] ui: ha: add ha rules components and menu entry Daniel Kral
@ 2025-06-20 15:43 ` Daniel Kral
2025-06-20 17:11 ` Jillian Morgan
2025-06-23 8:11 ` DERUMIER, Alexandre via pve-devel
[not found] ` <bf973ec4e8c52a10535ed35ad64bf0ec8d1ad37d.camel@groupe-cyllene.com>
42 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-20 15:43 UTC (permalink / raw)
To: pve-devel
On 6/20/25 16:31, Daniel Kral wrote:
> Changelog
> ---------
Just noticed that I missed one detail that might be beneficial to know,
so following the patch changes is easier:
- migrate ha groups internally in the HA Manager to ha location rules,
so that internally these can already be replaced; the test cases in
ha-manager patch #09 (stuck in moderator review because it became
quite large) are there to ensure that the migration produces the same
result for the migrated location rules
On 6/20/25 16:31, Daniel Kral wrote:
> TODO
> ----
>
> There are some things left to be done or discussed for a proper patch
> series:
Also other small things to point out:
- Add missing comment field in rule edit dialog
- Since ha location rules were designed so that these will never be
dropped by the rule checks (this is because there was no notion of
dropping ha groups), location rules are the only rules that can
introduce conflicts, e.g. introducing another priority group in the
location rule or restricting colocated services too much.
What should we do here? Allow dropping location rules when these are
not automatically migrated from groups? Or show a confirmation dialog
when creating these, so that users are warned? Both of them would
introduce some more complexity/state in how rules are checked. For
now, these conflicts are created silently.
- Reload both ha location and ha colocation rules if one of them gets
changed (e.g. when a location rule is added that creates a conflict in
ha colocation rules, then it will only show the conflict on the next
reload).
>
> - Implement check which does not allow negative colocation rules with
> more services than nodes, because these cannot be applied. Or should
> we just fail the remaining services which cannot be separated to any
> node since these do not have anywhere to go?
>
> - How can the migration process from HA groups to HA location rules be
> improved? Add a 'Migrate' button to the HA Groups page and then
> auto-toggle the use-location-rules feature flag? Should the
> use-location-rules feature flag even be user-toggleable?
Another point here for the migration of HA groups to HA location rules
is how we would name these new location rules. In the auto-migration, the
code currently prefixes the group name with `_group_` so that they
cannot conflict with config keys, as these cannot start with an
underscore. If we introduce a manual "Migrate" button, then we'd need to
handle name collisions with both already existing HA location rules
(especially if we allow switching back and forth) and existing HA
colocation rules.
>
> - Add web interface and/or CLI facing messages about the HA service
> migration blockers and side-effects. The rough idea would be like the
> following (feedback highly appreciated!):
>
> - For the web interface, I'd make these visible through the already
> existing precondition checks (which need to also be added for
> containers, as there is no existing API endpoint there).
> Side-effects would be 'warning' items, which just state that some
> positively colocated service is migrated with them (the 'Migrate'
> button then is the confirmation for that). Blockers would be
> 'error' items, which state that a negatively colocated service is
> on the requested target node and therefore the migration is
> blocked because of that.
>
> - For bulk migrations in the web interface, these are still visible
> through the console that is popped up afterwards, which should
> print the messages from the migrate/relocate crm-command API
> endpoints.
>
> - For the CLI, I'd add another 'force' flag or something similar. If
> there are side-effects and the force flag is not set, then no
> migration happens at all, but the user gets a list of the
> migrations that will be done and should confirm by making another
> call to 'migrate'/'relocate' with the force flag set to confirm
> these choices.
>
> - Add more user documentation (especially about conflicts, migrations,
> restrictions and failover scenario handling)
>
> - Add mixed test cases with HA location and HA colocation rules
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
2025-06-20 15:43 ` [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
@ 2025-06-20 17:11 ` Jillian Morgan
2025-06-20 17:45 ` DERUMIER, Alexandre via pve-devel
[not found] ` <476c41123dced9d560dfbf27640ef8705fd90f11.camel@groupe-cyllene.com>
0 siblings, 2 replies; 70+ messages in thread
From: Jillian Morgan @ 2025-06-20 17:11 UTC (permalink / raw)
To: Proxmox VE development discussion
Daniel,
Firstly I want to say thank you very, very, very much! This extensive work
obviously took a lot of time and effort. I feel like one of my Top-5 gripes
with Proxmox (after moving from oVirt) will finally be resolved by this new
feature.
Next, however, I would like to add my two cents to the discussion over the
nomenclature being chosen, since it seems to be not-quite set in stone yet.
Here are my thoughts:
1) Having "location" and "colocation" rules is, I think, going to be
unnecessarily confusing for people. While it isn't too complicated to glean
the distinction once having read the descriptions of them (and I had to go
read the descriptions), they don't convey immediately how they
differentiate themselves from each other. I think the concepts are better
described by something like "host-service affinity" (for positive or
negative affinity between service(s) and specific host(s)/Resource Pools),
and "service-service affinity" (for positive or negative affinity between
multiple services, where any relationship to specific hosts is
inconsequential or specifically undesirable).
2) Your own discussion seems to refer to "affinity" quite regularly, so
calling the rules by some other names in the documentation/CLI/UI seems to
be a choice made to try to 'simplify' the concept for an audience that
doesn't need the concept to be simplified, and in fact probably just
confuses things.
3) Despite your feeling otherwise, I believe that naming them "affinity"
rules will be very well understood by anyone who has worked with other
cluster systems, basic system administration (CPU pinning, for example), or
any other sort of computer science or engineering background. I think the
number of people coming into the world of Proxmox with zero prior
experience in the field is probably very low, and they would be well-served
to learn the word "affinity", since that is what's most commonly used in
the industry.
4) Similarly, your own discussion refers to "positive" and "negative"
affinity, yet a choice was made to identify these in the rule configuration
as "together" and "separate", which while relatively clear, feels entirely
contrived (as well as upsetting the language part of my brain by being
adverbs when adjectives are warranted) since affinity (in computing /
resource scheduling contexts at least) is very commonly described as
positive and negative.
Happy to discuss further, or be pointed to prior debates over this that
I've likely missed.
And, of course, I'd happily suffer the cringe-worthy nomenclature to have
the feature sooner rather than later! Just saying: Thumbs Up!
--
Jillian Morgan (she/her)
Systems & Networking Specialist
Primordial Software Group & I.T. Consultancy
https://www.primordial.ca
On Fri, Jun 20, 2025 at 11:43 AM Daniel Kral <d.kral@proxmox.com> wrote:
> On 6/20/25 16:31, Daniel Kral wrote:
> > Changelog
> > ---------
>
> Just noticed that I missed one detail that might be beneficial to know,
> so following the patch changes is easier:
>
> - migrate ha groups internally in the HA Manager to ha location rules,
> so that internally these can already be replaced; the test cases in
> ha-manager patch #09 (stuck in moderator review because it became
> quite large) are there to ensure that the migration produces the same
> result for the migrated location rules
>
> On 6/20/25 16:31, Daniel Kral wrote:
> > TODO
> > ----
> >
> > There are some things left to be done or discussed for a proper patch
> > series:
>
> Also other small things to point out:
>
> - Add missing comment field in rule edit dialog
>
> - Since ha location rules were designed so that these will never be
> dropped by the rule checks (this is because there was no notion of
> dropping ha groups), location rules are the only rules that can
> introduce conflicts, e.g. introducing another priority group in the
> location rule or restricting colocated services too much.
>
> What should we do here? Allow dropping location rules when these are
> not automatically migrated from groups? Or show a confirmation dialog
> when creating these, so that users are warned? Both of them would
> introduce some more complexity/state in how rules are checked. For
> now, these conflicts are created silently.
>
> - Reload both ha location and ha colocation rules if one of them gets
> changed (e.g. when a location rule is added that creates a conflict in
> ha colocation rules, then it will only show the conflict on the next
> reload).
>
> >
> > - Implement a check which does not allow negative colocation rules with
> > more services than nodes, because these cannot be applied (a feasibility
> > sketch follows after this quoted list). Or should we just fail the
> > remaining services which cannot be separated onto any node, since they
> > have nowhere to go?
> >
> > - How can the migration process from HA groups to HA location rules be
> > improved? Add a 'Migrate' button to the HA Groups page and then
> > auto-toggle the use-location-rules feature flag? Should the
> > use-location-rules feature flag even be user-toggleable?
>
> Another point here for the migration of HA groups to HA location rules
> is how we would name these new location rules. In the auto-migration the
> code currently prefixes the group name with `_group_`, so that they
> cannot conflict with config keys, as these cannot start with an
> underscore. If we introduce a manual "Migrate" button, then we'd need to
> handle name collisions with both already existing HA location rules
> (especially if we allow switching back and forth) and existing HA
> colocation rules.
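The naming question above can be illustrated with a short, hedged Perl
sketch: the `_group_` prefix as described for the auto-migration, plus a
purely hypothetical numeric suffix to sidestep collisions for a manual
"Migrate" path (how collisions would actually be handled is exactly the
open question here).

    use strict;
    use warnings;

    # Derive a rule ID for a migrated HA group. "_group_<name>" cannot clash
    # with user-created rule IDs, since config keys cannot start with an
    # underscore. The suffix loop is a made-up strategy to avoid collisions
    # with already existing rules, e.g. after switching back and forth;
    # purely illustrative.
    sub migrated_rule_id {
        my ($group_name, $existing_rule_ids) = @_;

        my $base = "_group_${group_name}";
        return $base if !$existing_rule_ids->{$base};

        my $i = 1;
        $i++ while $existing_rule_ids->{"${base}-${i}"};
        return "${base}-${i}";
    }

    my $existing = { '_group_prod' => 1 };
    print migrated_rule_id('prod', $existing), "\n"; # _group_prod-1
    print migrated_rule_id('dev', $existing), "\n";  # _group_dev
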
>
> >
> > - Add web interface and/or CLI facing messages about the HA service
> > migration blockers and side-effects. The rough idea would be like the
> > following (feedback highly appreciated!):
> >
> > - For the web interface, I'd make these visible through the already
> > existing precondition checks (which also need to be added for
> > containers, as there is no existing API endpoint there).
> > Side-effects would be 'warning' items, which just state that some
> > positively colocated service will be migrated along with it (the
> > 'Migrate' button then is the confirmation for that). Blockers would
> > be 'error' items, which state that a negatively colocated service is
> > on the requested target node and that the migration is therefore
> > blocked.
> >
> > - For bulk migrations in the web interface, these are still visible
> > through the console that is popped up afterwards, which should
> > print the messages from the migrate/relocate crm-command API
> > endpoints.
> >
> > - For the CLI, I'd add another 'force' flag or something similar. If
> > there are side-effects and the force flag is not set, then no
> > migration happens at all, but the user gets a list of the
> > migrations that will be done and should confirm by making another
> > call to 'migrate'/'relocate' with the force flag set to confirm
> > these choices.
> >
> > - Add more user documentation (especially about conflicts, migrations,
> > restrictions and failover scenario handling)
> >
> > - Add mixed test cases with HA location and HA colocation rules
>
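Regarding the check against negative colocation rules with more services
than nodes quoted above, here is a minimal Perl sketch of such a feasibility
test (the rule layout and the helper name are assumptions, not ha-manager
code):

    use strict;
    use warnings;

    # A negative colocation (anti-affinity) rule needs at least as many nodes
    # as services, since every service must end up on its own node.
    sub negative_colocation_is_feasible {
        my ($rule, $node_count) = @_;

        my $service_count = scalar keys %{ $rule->{services} };
        return $service_count <= $node_count;
    }

    my $rule = {
        affinity => 'negative',
        services => { 'vm:101' => 1, 'vm:102' => 1, 'vm:103' => 1, 'vm:104' => 1 },
    };

    # Four services cannot be kept apart on a three-node cluster.
    print "infeasible\n" if !negative_colocation_is_feasible($rule, 3);
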
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
2025-06-20 17:11 ` Jillian Morgan
@ 2025-06-20 17:45 ` DERUMIER, Alexandre via pve-devel
[not found] ` <476c41123dced9d560dfbf27640ef8705fd90f11.camel@groupe-cyllene.com>
1 sibling, 0 replies; 70+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-06-20 17:45 UTC (permalink / raw)
To: pve-devel; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 14153 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
Date: Fri, 20 Jun 2025 17:45:47 +0000
Message-ID: <476c41123dced9d560dfbf27640ef8705fd90f11.camel@groupe-cyllene.com>
>>1) Having "location" and "colocation" rules is, I think, going to be
>>unnecessarily confusing for people. While it isn't too complicated to
>>glean
>>the distinction once having read the descriptions of them (and I had
>>to go
>>read the descriptions), they don't convey immediately how they
>>differentiate themselves from each other. I think the concepts are
>>better
>>described by something like "host-service affinity" (for positive or
>>negative affinity between service(s) and specific host(s)/Resource
>>Pools),
>>and "service-service affinity" (for positive or negative affinity
>>between
>>multiple services (where any relationship to specific hosts are
>>inconsequential or specifically undesirable).
Hi, I had already said the same as a comment on the v1 patch.
I don't care personally, but all my customers come from VMware, XCP-ng,
or cloud providers such as EC2 or GCP, and everybody in the industry has
been using "affinity/anti-affinity" for hosts/VMs for 20 years, so I'm
pretty sure that if they read the docs and some whitepaper/benchmark
comparison on the net (not even talking about ChatGPT lol), they'll
think that the feature is not available.
Alexandre
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
[parent not found: <476c41123dced9d560dfbf27640ef8705fd90f11.camel@groupe-cyllene.com>]
* Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
[not found] ` <476c41123dced9d560dfbf27640ef8705fd90f11.camel@groupe-cyllene.com>
@ 2025-06-23 15:36 ` Thomas Lamprecht
2025-06-24 8:48 ` Daniel Kral
0 siblings, 1 reply; 70+ messages in thread
From: Thomas Lamprecht @ 2025-06-23 15:36 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel
On 20.06.25 at 19:45, DERUMIER, Alexandre wrote:
>>> 1) Having "location" and "colocation" rules is, I think, going to be
>>> unnecessarily confusing for people. While it isn't too complicated to
>>> glean
>>> the distinction once having read the descriptions of them (and I had
>>> to go
>>> read the descriptions), they don't convey immediately how they
>>> differentiate themselves from each other. I think the concepts are
>>> better
>>> described by something like "host-service affinity" (for positive or
>>> negative affinity between service(s) and specific host(s)/Resource
>>> Pools),
>>> and "service-service affinity" (for positive or negative affinity
>>> between
>>> multiple services (where any relationship to specific hosts are
>>> inconsequential or specifically undesirable).
>
> Hi, I had already said the same as comment of the v1 patch,
>
> I don't care personally, but all my customers coming from vmware, xcp-
> ng, or cloud provider with ec2 or gcp, everybody in the industry is
> using "affinity/antifinity host/vms" since 20years , and I'm pretty
> sure that if they read the doc and some whitepaper/benchmark
> comparaison on the net (not even talking about chatgpt lol), they'll
> think that the feature is not available.
IIRC Daniel took that nomenclature from pacemaker, albeit I mentioned
that I really would not use that complex (!) project as an example to
follow; the PVE HA manager exists explicitly because that is rather
confusing and hard to configure for simple(r) use cases.
Anyhow, the names can be changed rather easily, and the input from you
two certainly adds some additional weight to the "affinity" and
"anti-affinity" terminology, so thanks for chiming in.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
2025-06-23 15:36 ` Thomas Lamprecht
@ 2025-06-24 8:48 ` Daniel Kral
2025-06-27 12:23 ` Friedrich Weber
0 siblings, 1 reply; 70+ messages in thread
From: Daniel Kral @ 2025-06-24 8:48 UTC (permalink / raw)
To: Proxmox VE development discussion, Thomas Lamprecht, DERUMIER,
Alexandre, Jillian Morgan
On 6/23/25 17:36, Thomas Lamprecht wrote:
> On 20.06.25 at 19:45, DERUMIER, Alexandre wrote:
>>>> 1) Having "location" and "colocation" rules is, I think, going to be
>>>> unnecessarily confusing for people. While it isn't too complicated to
>>>> glean
>>>> the distinction once having read the descriptions of them (and I had
>>>> to go
>>>> read the descriptions), they don't convey immediately how they
>>>> differentiate themselves from each other. I think the concepts are
>>>> better
>>>> described by something like "host-service affinity" (for positive or
>>>> negative affinity between service(s) and specific host(s)/Resource
>>>> Pools),
>>>> and "service-service affinity" (for positive or negative affinity
>>>> between
>>>> multiple services (where any relationship to specific hosts are
>>>> inconsequential or specifically undesirable).
>>
>> Hi, I had already said the same as comment of the v1 patch,
>>
>> I don't care personally, but all my customers coming from vmware, xcp-
>> ng, or cloud provider with ec2 or gcp, everybody in the industry is
>> using "affinity/antifinity host/vms" since 20years , and I'm pretty
>> sure that if they read the doc and some whitepaper/benchmark
>> comparaison on the net (not even talking about chatgpt lol), they'll
>> think that the feature is not available.
>
> IIRC Daniel took that nomenclature from pacemaker, albeit I mentioned
> that I really would not use that complex (!) project as example to
> follow, the PVE HA manager exists explicitly due to that being rather
> confusing and hard to configure for simple(r) use cases.
>
> Anyhow, the names can be changed rather easily, and the input of you
> two certainly puts some additional weight for the "affinity" and
> "anti-affinity" terminology, so thanks for chiming in.
Correct, I got those from pacemaker, but I don't have any hard feelings
about changing them and will do so happily for the patch series,
especially as it helps users grasp the concepts more quickly without
needing to consult the documentation just to understand the names.
If it's not too much of a burden on the developer side, I'd stick to
"location" and "colocation" (positive/negative) in the code itself, as
short names are always a benefit there IMO (with a note on what they are
called on the user-facing side), but I have no hard feelings about
changing them there too if it's confusing otherwise.
If the following names are fine with everyone as well, I'd change the rule
names from/to:
"location" => "Service-Host Affinity"
"colocation" => "Service-Service Affinity"
and for colocation rules from/to:
"together" => "positive"
"separate" => "negative"
as suggested by @Jillian Morgan, but I'm very open to feedback on that,
especially if there's a good way to integrate the "affinity" and
"anti-affinity" terminology here; "Service-Host Affinity" rules don't
have an anti-affinity variant yet (but it could be a future addition if
there are user requests for that).
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
2025-06-24 8:48 ` Daniel Kral
@ 2025-06-27 12:23 ` Friedrich Weber
2025-06-27 12:41 ` Daniel Kral
0 siblings, 1 reply; 70+ messages in thread
From: Friedrich Weber @ 2025-06-27 12:23 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral, Thomas Lamprecht,
DERUMIER, Alexandre, Jillian Morgan
Hi, as Daniel and I talked a bit off-list about the naming aspects, I'm
chiming in too.
On 24/06/2025 10:48, Daniel Kral wrote:
> On 6/23/25 17:36, Thomas Lamprecht wrote:
>> On 20.06.25 at 19:45, DERUMIER, Alexandre wrote:
>>>>> 1) Having "location" and "colocation" rules is, I think, going to be
>>>>> unnecessarily confusing for people. While it isn't too complicated to
>>>>> glean
>>>>> the distinction once having read the descriptions of them (and I had
>>>>> to go
>>>>> read the descriptions), they don't convey immediately how they
>>>>> differentiate themselves from each other. I think the concepts are
>>>>> better
>>>>> described by something like "host-service affinity" (for positive or
>>>>> negative affinity between service(s) and specific host(s)/Resource
>>>>> Pools),
>>>>> and "service-service affinity" (for positive or negative affinity
>>>>> between
>>>>> multiple services (where any relationship to specific hosts are
>>>>> inconsequential or specifically undesirable).
>>>
>>> Hi, I had already said the same as comment of the v1 patch,
>>>
>>> I don't care personally, but all my customers coming from vmware, xcp-
>>> ng, or cloud provider with ec2 or gcp, everybody in the industry is
>>> using "affinity/antifinity host/vms" since 20years , and I'm pretty
>>> sure that if they read the doc and some whitepaper/benchmark
>>> comparaison on the net (not even talking about chatgpt lol), they'll
>>> think that the feature is not available.
>>
>> IIRC Daniel took that nomenclature from pacemaker, albeit I mentioned
>> that I really would not use that complex (!) project as example to
>> follow, the PVE HA manager exists explicitly due to that being rather
>> confusing and hard to configure for simple(r) use cases.
>>
>> Anyhow, the names can be changed rather easily, and the input of you
>> two certainly puts some additional weight for the "affinity" and
>> "anti-affinity" terminology, so thanks for chiming in.
>
> Correct, I got those from pacemaker, but I don't have any hard feelings
> changing them and will do so happily for the patch series, especially as
> it helps users grasp the concepts quicker without needing to consult the
> documentation just for understanding the names.
>
> If it's not too much burden on the developer-side, I'd stick to
> "location" and "colocation" (positive/negative) in the code itself, as
> there short names are always a benefit IMO (with a notice what they're
> referred to on the user-facing side), but no hard feelings to change
> them there too if it's confusing otherwise.
I think, if it's not too awkward, it would be nicer to use the same
nomenclature in the user-facing interfaces (docs, config, cli, ...) and
in the internal code -- one never knows if some internal names "leak" to
the outside and may cause confusion.
> If the following names are good to all as well, I'd change the rule
> names from/to:
>
> "location" => "Service-Host Affinity"
> "colocation" => "Service-Service Affinity"
I'm not a huge fan of the "colocation" naming, especially because
"negative colocation" sounds like an oxymoron to me (because of the
association "co" = together but "negative" = not together), but that
might just be me.
Since the proposed "Service-Host Affinity" and "Service-Service
Affinity" are quite long: What about shortening those to "Host Affinity"
and "Service Affinity"? Since affinity rules are defined for HA
services, it should be clear that the subject is the "Service" in both
cases. Well, unless with this naming one could get the impression that
"host affinity" rules are defined for hosts, and "service affinity"
rules are defined for services, which would be wrong ...
And one last thought, I'd replace "Host Affinity" with "Node Affinity",
since I think in a cluster context we refer to the cluster hosts as
"nodes" much more often.
>
> and for colocation rules from/to:
>
> "together" => "positive"
> "separate" => "negative"
>
> as suggested by @Jillian Morgan, but I'm very open for feedback on that.
> Especially if there's a good way to integrate the "affinity" and "anti-
> affinity" terminology here, but "Service-Host Affinity" rules don't have
> that yet (but could be a future addition if there are user requests for
> that).
>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
2025-06-27 12:23 ` Friedrich Weber
@ 2025-06-27 12:41 ` Daniel Kral
0 siblings, 0 replies; 70+ messages in thread
From: Daniel Kral @ 2025-06-27 12:41 UTC (permalink / raw)
To: Friedrich Weber, Proxmox VE development discussion,
Thomas Lamprecht, DERUMIER, Alexandre, Jillian Morgan
On 6/27/25 14:23, Friedrich Weber wrote:
> Hi, as Daniel and I talked a bit off-list about the naming aspects, I'm
> chiming in too.
>
> On 24/06/2025 10:48, Daniel Kral wrote:
>> If it's not too much burden on the developer-side, I'd stick to
>> "location" and "colocation" (positive/negative) in the code itself, as
>> there short names are always a benefit IMO (with a notice what they're
>> referred to on the user-facing side), but no hard feelings to change
>> them there too if it's confusing otherwise.
>
> I think, if it's not too awkward, it would be nicer to use the same
> nomenclature in the user-facing interfaces (docs, config, cli, ...) and
> in the internal code -- one never knows if some internal names "leak" to
> the outside and may cause confusion.
Right, I see that now too, and with the shorter names you proposed
below, I can see the same names being used in the code as well.
The only thing I also pointed out in our brief discussion is that I
liked the shortness of "positively colocated services" and "negatively
colocated services" for services that are in a "colocation" relationship
(which are declared by the colocation rules).
But I'll find another nice brief equivalent description for these with
the new names as well, e.g., "services in positive affinity with service
..." and "services in negative affinity with service ...". That could
also work for the node affinity: "services in affinity to node ...".
>
>> If the following names are good to all as well, I'd change the rule
>> names from/to:
>>
>> "location" => "Service-Host Affinity"
>> "colocation" => "Service-Service Affinity"
>
> I'm not a huge fan of the "colocation" naming, especially because
> "negative colocation" sounds like an oxymoron to me (because of the
> association "co" = together but "negative" = not together), but that
> might just be me.
>
> Since the proposed "Service-Host Affinity" and "Service-Service
> Affinity" are quite long: What about shortening those to "Host Affinity"
> and "Service Affinity"? Since affinity rules are defined for HA
> services, it should be clear that the subject is the "Service" in both
> cases. Well, unless with this naming one could get the impression that
> "host affinity" rules are defined for hosts, and "service affinity"
> rules are defined for services, which would be wrong ...
>
> And one last thought, I'd replace "Host Affinity" with "Node Affinity",
> since I think in a cluster context we refer to the cluster hosts as
> "nodes" much more often.
As already mentioned, I think those are even better names, as they
quickly describe what they do while being short enough to mention
them anywhere. Another point is that "HA Service" and "HA Resource" are
synonyms, but "HA Resource" is mentioned much more often
and is also the first name used in the documentation, so I'd go for
- "HA Node Affinity (Rules)"
- "HA Resource Affinity (Rules)"
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
2025-06-20 14:31 [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
` (40 preceding siblings ...)
2025-06-20 15:43 ` [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules Daniel Kral
@ 2025-06-23 8:11 ` DERUMIER, Alexandre via pve-devel
[not found] ` <bf973ec4e8c52a10535ed35ad64bf0ec8d1ad37d.camel@groupe-cyllene.com>
42 siblings, 0 replies; 70+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-06-23 8:11 UTC (permalink / raw)
To: pve-devel, d.kral; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 12874 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "d.kral@proxmox.com" <d.kral@proxmox.com>
Subject: Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
Date: Mon, 23 Jun 2025 08:11:50 +0000
Message-ID: <bf973ec4e8c52a10535ed35ad64bf0ec8d1ad37d.camel@groupe-cyllene.com>
Hi Daniel,
Thanks for your hard work on this.
I don't know if it's the best place, but one thing currently missing is
resource affinity. For example:
if a VM uses a specific storage, it needs to run on a node where that
storage is present.
Same for the number of cores of a VM (the host's number of cores needs
to be >= the VM's cores).
I don't know if it would be possible to reuse your rules framework and
add some kind of implicit rules (and maybe make them dynamic if the VM
config changes)?
With the current HA, for example, you set up local ZFS mirroring between
two nodes, and if the VM is migrated to the wrong node, the VM is
completely stuck, and the only way to fix it is to disable HA and move
the VM config file manually to the correct node.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
[parent not found: <bf973ec4e8c52a10535ed35ad64bf0ec8d1ad37d.camel@groupe-cyllene.com>]
* Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
[not found] ` <bf973ec4e8c52a10535ed35ad64bf0ec8d1ad37d.camel@groupe-cyllene.com>
@ 2025-06-23 15:28 ` Thomas Lamprecht
2025-06-23 23:21 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 70+ messages in thread
From: Thomas Lamprecht @ 2025-06-23 15:28 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel, d.kral
On 23.06.25 at 10:11, DERUMIER, Alexandre wrote:
> I don't known if it's the best place, but 1 thing missing currently,
> if ressource affinity, like for example,
>
> if a vm use a specific storage, it need to run on a node where the
> storage is present.
> Same for the number of cores of vm (the host numbers of cores need to
> be >= than the vm cores).
>
> I don't known if it could be possible to reuse your rules framework,
> and add some kind of implicit rules. (and maybe them dynamic if vm
> config change) ?
>
>
> With current HA for example, you setup local zfs mirroring between 2
> nodes, and if the vm is migrated to the wrong node, the vm is
> completly stuck, and the only way to fix it to disable HA and move the
> vm config file manually to the correct node.
These are definitely good points and wanted features, but I do
not think that we need to have them in the first applied version;
IMO the series is already big enough as is.
So, it's probably easier to add them later on, as auto-generated
constraints derived from the VM config and updated if that config
changes; as we already track the "config version" in pmxcfs anyway,
it should be possible to detect quite cheaply whether something
might have changed.
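As a rough sketch of such an auto-generated constraint (the data structures
and the helper are invented here for illustration, not pve-ha-manager code):
keep only the nodes that provide every storage referenced by the VM config.

    use strict;
    use warnings;

    # Derive the set of nodes a VM could run on from the storages it uses.
    sub allowed_nodes_for_vm {
        my ($vm_storages, $storages_per_node) = @_;

        my $allowed = {};
        for my $node (keys %$storages_per_node) {
            my $ok = 1;
            for my $storage (@$vm_storages) {
                $ok = 0 if !$storages_per_node->{$node}{$storage};
            }
            $allowed->{$node} = 1 if $ok;
        }
        return $allowed;
    }

    my $storages_per_node = {
        node1 => { 'local-zfs' => 1, 'ceph' => 1 },
        node2 => { 'ceph' => 1 },
        node3 => { 'local-zfs' => 1 },
    };

    # A VM with a disk on 'local-zfs' could only run on node1 and node3.
    my $allowed = allowed_nodes_for_vm([ 'local-zfs' ], $storages_per_node);
    print join(', ', sort keys %$allowed), "\n"; # node1, node3
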
This would also benefit from re-evaluating whether we can transform the
HA stack into a general resource manager (which it is basically already
prepared for), with HA+fencing for specific guests as an opt-in extra
feature, as that would better fit the balancing feature, which should
take all guests into account, and would also allow setting affinity
constraints on guests that are not HA-managed.
But again, I would do that on top of this series, it should not
really change anything for the implementation here.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
2025-06-23 15:28 ` Thomas Lamprecht
@ 2025-06-23 23:21 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 70+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-06-23 23:21 UTC (permalink / raw)
To: pve-devel, d.kral, t.lamprecht; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 13132 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "d.kral@proxmox.com" <d.kral@proxmox.com>, "t.lamprecht@proxmox.com" <t.lamprecht@proxmox.com>
Subject: Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
Date: Mon, 23 Jun 2025 23:21:52 +0000
Message-ID: <748914432fb985d537c63a06f2ac3a0b9a70675a.camel@groupe-cyllene.com>
>>But again, I would do that on top of this series, it should not
>>really change anything for the implementation here.
Yes sure, no rush, we still have a workaround by manually setting host
colocation rules.
I asked to be sure that this case could be implemented later without
needing to rewrite all the code.
Thanks!
Alexandre
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 70+ messages in thread