* [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
@ 2025-03-25 15:12 Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH cluster 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
` (18 more replies)
0 siblings, 19 replies; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
This RFC patch series is a draft implementation that allows users to
specify colocation rules (or affinity/anti-affinity) for the HA
Manager, so that two or more services are either kept together or kept
apart on service recovery or, if auto-rebalancing is enabled, on
service start.
I chose the name "colocation" over affinity/anti-affinity, since it
more concisely conveys that this is about co-locating services with
each other, in contrast to locating services on nodes. I have no hard
feelings about changing it, though (same for any other names in this
series).
Many thanks to @Thomas, @Fiona, @Friedrich, and @Hannes Duerr for the
discussions about this feature off-list!
Recap: HA groups
----------------
The HA Manager currently allows a service to be assigned to one HA
group, which essentially implements an affinity to a set of nodes. This
affinity can either be unrestricted or restricted, where only the
former allows recovery to nodes outside of the HA group's nodes if the
group's nodes are currently unavailable.
This allows users to constrain the set of nodes that can be selected as
the start and/or recovery node. Furthermore, each node in an HA group
can have an individual priority, which further constrains the set of
possible recovery nodes to the online group members with the highest
priority.
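
For reference, an HA group entry in groups.cfg with node priorities and
the restricted flag looks roughly like the following (the group and
node names are made up):

group: db-group
    nodes node1:2,node2:2,node3:1
    restricted 1
    nofailback 0

Services assigned to db-group may only run on node1, node2 and node3,
and recovery prefers the higher-priority node1 and node2 over node3.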
Introduction
------------
Colocation is the concept of an inter-service affinity relationship,
which can either be positive (keep services together) or negative (keep
services apart). This is in contrast to the service-node affinity
relationship implemented by HA groups.
In addition to the positive-negative dimension, there's also the
mandatory-optional axis. Currently, this is a binary setting that
decides whether a service failing to meet the colocation relationship
is
- (1) kept in recovery for a mandatory colocation rule, or
- (2) migrated anyway, ignoring an optional colocation rule.
Motivation
----------
There are many different use cases for colocation, but two simple
examples that come to mind are:
- Two or more services need to communicate with each other very
frequently. To reduce the communication path length and therefore
hopefully the latency, keep them together on one node.
- Two or more services need a lot of computational resources and will
therefore consume much of the assigned node's resource capacity. To
reduce resource starvation and memory stalls, keep them separate on
multiple nodes, so that each of them has enough resources for itself.
And some more concrete use cases from current HA Manager users:
- "For example: let's say we have three DB VMs (DB nodes in a cluster)
which we want to run on ANY PVE host, but we don't want them to be on
the same host." [0]
- "An example is: When Server from the DMZ zone start on the same host
like the firewall or another example the application servers start on
the same host like the sql server. Services they depend on each other
have short latency to each other." [1]
HA Rules
--------
To implement colocation, this patch series introduces HA rules, which
allow users to specify colocation requirements on services. These are
implemented with the widely used section config, where each type of
rule is an individual plugin (for now only 'colocation').
This introduces some small initial complexity for testing the
satisfiability of the rules, but it keeps the constraint interface
extensible and hopefully allows easier reasoning about the node
selection process with the added constraint rules in the future.
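
Roughly, the wiring follows the existing section config pattern
(condensed from the patches below; only the colocation plugin exists
for now):

use PVE::HA::Rules;
use PVE::HA::Rules::Colocation;

# register the rule type plugin(s) once, then init the section config
PVE::HA::Rules::Colocation->register();
PVE::HA::Rules->init();

# afterwards the rules are read like the other HA config files
my $rules = PVE::HA::Config::read_rules_config(); # parses ha/rules.cfg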
Colocation Rules
----------------
The two properties of colocation rules, as described in the
introduction, are rather straightforward. A typical colocation rule in
the config would look like the following:

colocation: some-lonely-services
    services vm:101,vm:103,ct:909
    affinity separate
    strict 1
This means that the three services vm:101, vm:103 and ct:909 must be
kept separate on different nodes. I'm very keen on naming suggestions,
since I think there could be a better word than 'affinity' here. I
played around with 'keep-services', since then it would always read
something like 'keep-services separate', which is very declarative, but
this might suggest to too many users that this is a binary option (I
mean, it is, but not with the values 0 and 1).
Satisfiability and Inference
----------------------------
Since rules allow more complexity, it is necessary to check whether
rules can be (1) satisfied, (2) simplified, and (3) used to infer other
constraints. There's a static part (i.e. the configuration file) and a
dynamic part (i.e. when deciding the next node) to this.
| Satisfiability
----------
Statically, colocation rules currently must satisfy:
- Two or more services must not be in both a positive and negative
colocation rule.
- Two or more services in a positive colocation rule must not be in
restricted HA groups with disjoint node sets.
- Two or more services in a negative colocation rule, which are in
restricted HA groups, must have at least as many statically available
nodes as node-restricted services.
The first is obvious. The second asserts that there is at least one
common node that the services can be recovered to. The third asserts
that there are enough nodes to select from when recovering the services
that are restricted to a set of nodes.
Of course, it also doesn't make sense to have e.g. three services in a
negative colocation relation if there are only three cluster nodes, as
a failover would then have no node left to go to, but the static part
is only a best effort to reduce obvious misconfigurations.
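
As a small, made-up example for the third check: assume vm:101, vm:102
and vm:103 are all assigned to a restricted HA group that only contains
node1 and node2, and are part of a strict negative colocation rule:

group: only-two
    nodes node1,node2
    restricted 1

colocation: keep-apart
    services vm:101,vm:102,vm:103
    affinity separate
    strict 1

Here, three node-restricted services would have to be spread over only
two statically available nodes, so the rule cannot be satisfied and is
dropped by the static checks.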
| Canonicalization
----------
Additionally, colocation rules are currently simplified as follows:
- If there are multiple positive colocation rules with common services
and the same strictness, these are merged into a single positive
colocation rule.
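
For example (with made-up rule names), the following two strict
positive rules share vm:102 and would therefore be merged into a single
rule that keeps vm:101, vm:102 and vm:103 together:

colocation: keep-together1
    services vm:101,vm:102
    affinity together
    strict 1

colocation: keep-together2
    services vm:102,vm:103
    affinity together
    strict 1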
| Inference rules
----------
There are currently no inference rules implemented in this RFC, but
there could be potential to further simplify some code paths in the
future, e.g. a positive colocation rule where one service is part of a
restricted HA group effectively makes the other services in the
positive colocation rule part of this HA group as well.
I leave this open for discussion here.
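
To illustrate the idea with a made-up example:

group: db-nodes
    nodes node1,node2
    restricted 1

colocation: keep-together
    services vm:101,vm:102
    affinity together
    strict 1

If vm:101 is assigned to the restricted group db-nodes, vm:102 would
effectively be limited to node1 and node2 as well, even though it is
not assigned to that group itself.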
Special negative colocation scenarios
-------------------------------------
Just to be aware of it: there is a distinction between the following
two sets of negative colocation rules:
colocation: separate-vms
    services vm:101,vm:102,vm:103
    affinity separate
    strict 1

and

colocation: separate-vms1
    services vm:101,vm:102
    affinity separate
    strict 1

colocation: separate-vms2
    services vm:102,vm:103
    affinity separate
    strict 1
The first keeps all three services separate from each other, while the
second only keeps the services of each pair separate from each other,
so vm:101 and vm:103 might still be migrated to the same node.
Test cases
----------
The test cases are quite straightforward, and I designed them so they
would fail without the colocation rules applied. This can be verified
by removing the `apply_colocation_rules(...)` call from the
`select_service_node()` body.
They are not completely exhaustive, and I didn't implement test cases
with HA groups yet (both for the ha-tester and the rules config tests),
but those would be implemented post-RFC.
Also, the loose tests are complete copies of their strict counterparts,
where only the expected log differs and the rules are changed from
'strict 1' to 'strict 0'.
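
For orientation, each ha-tester test directory added in this series
contains the usual simulator files plus the new rules_config, roughly:

test-colocation-strict-separate1/
    README            # short description of the tested scenario
    cmdlist           # simulated events, e.g. node failures
    hardware_status   # cluster node definitions
    service_config    # HA services and their assigned nodes
    rules_config      # the colocation rules under test
    manager_status    # initial manager state
    log.expect        # expected manager/CRM/LRM log output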
TODO
----
- WebGUI Integration
- User Documentation
- Add test cases with HA groups and more complex scenarios
- CLI / API endpoints for CRUD and maybe verification
- Cleanup the `select_service_node` signature into two structs as
suggested by @Thomas in [3]
Additional and/or future ideas
------------------------------
- Transforming HA groups to location rules (see comment below).
- Make recomputing the online node usage more granular.
- Add information about overall free node resources to improve the
decision heuristic when recovering services to nodes.
- Improve recovery node selection for optional positive colocation.
Correlated with the idea about free node resources above.
- When deciding the recovery node for positively colocated services,
account for the needed resources of all to-be-migrated services rather
than just the first one. This is a non-trivial problem, as we currently
solve this as an online bin covering problem, i.e. selecting a node for
each service alone instead of selecting for all services together.
- When migrating a service manually, migrate the colocated services too.
But this would also mean that we need to check whether a migration is
legal according to the colocation rules, which we do not do yet for HA
groups.
- Dynamic colocation rule health statistics (e.g. warn on the
satisfiability of a colocation rule), e.g. in the WebGUI and/or API.
- Property for mandatory colocation rules to specify whether all
services should be stopped if the rule cannot be satisfied.
Comment about HA groups -> Location Rules
-----------------------------------------
This part is not really part of the patch series, but it is still worth
an on-list discussion.
I'd like to suggest also transforming the existing HA groups into
location rules, if the rule concept turns out to be a good fit for the
colocation feature in the HA Manager, as HA groups seem to integrate
quite easily into this concept.
This would make service-node relationships a little more flexible for
users, and we'd be able to have both configurable / visible in the same
WebUI view, API endpoint, and configuration file. Also, some code paths
could be a little more concise, e.g. when checking changes to
constraints and when canonicalizing the rules config.
The how should be rather straightforward for the obvious use cases:
- Services in unrestricted HA groups -> Location rules with the nodes
of the HA group; we could either split each node priority group into
separate location rules (each with its own score / weight) or keep the
input format of HA groups with a list of `<node>(:<priority>)` in each
rule
- Services in restricted HA groups -> Same as above, but also using
either `+inf` for a mandatory location rule or a `strict` property,
depending on how we decide on the colocation rule properties (see the
rough sketch below)
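
Purely to illustrate the mapping (the syntax is hypothetical and
completely up for discussion), a restricted HA group like

group: db-nodes
    nodes node1:2,node2:1
    restricted 1

could end up as a location rule along the lines of

location: db-nodes
    services vm:101
    nodes node1:2,node2:1
    strict 1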
This would allow most of the use cases of HA groups to be easily
migrated to location rules. We could also keep the inference of the
'default group' for unrestricted HA groups (any available node is added
as a group member with priority -1).
The only thing I'm unsure about is how we would migrate the
`nofailback` option, since it operates on the group level. If we keep
the `<node>(:<priority>)` syntax and restrict each service to at most
one location rule, it would be easy to keep the same flag. If we go
with multiple location rules per service, each with a score or weight
(for the priority), then we wouldn't be able to keep this flag anymore.
I think we could keep the semantics if we move this flag to the service
config, but I'm thankful for any comments on this.
[0] https://clusterlabs.org/projects/pacemaker/doc/3.0/Pacemaker_Explained/html/constraints.html#colocation-properties
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=5260
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=5332
[3] https://lore.proxmox.com/pve-devel/c8fa7b8c-fb37-5389-1302-2002780d4ee2@proxmox.com/
Diffstat
--------
pve-cluster:
Daniel Kral (1):
cfs: add 'ha/rules.cfg' to observed files
src/PVE/Cluster.pm | 1 +
src/pmxcfs/status.c | 1 +
2 files changed, 2 insertions(+)
pve-ha-manager:
Daniel Kral (15):
ignore output of fence config tests in tree
tools: add hash set helper subroutines
usage: add get_service_node and pin_service_node methods
add rules section config base plugin
rules: add colocation rule plugin
config, env, hw: add rules read and parse methods
manager: read and update rules config
manager: factor out prioritized nodes in select_service_node
manager: apply colocation rules when selecting service nodes
sim: resources: add option to limit start and migrate tries to node
test: ha tester: add test cases for strict negative colocation rules
test: ha tester: add test cases for strict positive colocation rules
test: ha tester: add test cases for loose colocation rules
test: ha tester: add test cases in more complex scenarios
test: add test cases for rules config
.gitignore | 3 +
debian/pve-ha-manager.install | 2 +
src/PVE/HA/Config.pm | 12 +
src/PVE/HA/Env.pm | 6 +
src/PVE/HA/Env/PVE2.pm | 13 +
src/PVE/HA/Makefile | 3 +-
src/PVE/HA/Manager.pm | 235 ++++++++++-
src/PVE/HA/Rules.pm | 118 ++++++
src/PVE/HA/Rules/Colocation.pm | 391 ++++++++++++++++++
src/PVE/HA/Rules/Makefile | 6 +
src/PVE/HA/Sim/Env.pm | 15 +
src/PVE/HA/Sim/Hardware.pm | 15 +
src/PVE/HA/Sim/Resources/VirtFail.pm | 37 +-
src/PVE/HA/Tools.pm | 53 +++
src/PVE/HA/Usage.pm | 12 +
src/PVE/HA/Usage/Basic.pm | 15 +
src/PVE/HA/Usage/Static.pm | 14 +
src/test/Makefile | 4 +-
.../connected-positive-colocations.cfg | 34 ++
.../connected-positive-colocations.cfg.expect | 54 +++
.../rules_cfgs/illdefined-colocations.cfg | 9 +
.../illdefined-colocations.cfg.expect | 12 +
.../inner-inconsistent-colocations.cfg | 14 +
.../inner-inconsistent-colocations.cfg.expect | 13 +
.../test-colocation-loose-separate1/README | 13 +
.../test-colocation-loose-separate1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 60 +++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-loose-separate4/README | 17 +
.../test-colocation-loose-separate4/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 73 ++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-loose-together1/README | 11 +
.../test-colocation-loose-together1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 66 +++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-loose-together3/README | 16 +
.../test-colocation-loose-together3/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 93 +++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 8 +
.../test-colocation-strict-separate1/README | 13 +
.../test-colocation-strict-separate1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 60 +++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-strict-separate2/README | 15 +
.../test-colocation-strict-separate2/cmdlist | 4 +
.../hardware_status | 7 +
.../log.expect | 90 ++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 10 +
.../test-colocation-strict-separate3/README | 16 +
.../test-colocation-strict-separate3/cmdlist | 4 +
.../hardware_status | 7 +
.../log.expect | 110 +++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 10 +
.../test-colocation-strict-separate4/README | 17 +
.../test-colocation-strict-separate4/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 69 ++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-strict-separate5/README | 11 +
.../test-colocation-strict-separate5/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 56 +++
.../manager_status | 1 +
.../rules_config | 9 +
.../service_config | 5 +
.../test-colocation-strict-together1/README | 11 +
.../test-colocation-strict-together1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 66 +++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-strict-together2/README | 11 +
.../test-colocation-strict-together2/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 80 ++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 8 +
.../test-colocation-strict-together3/README | 17 +
.../test-colocation-strict-together3/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 89 ++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 8 +
.../test-crs-static-rebalance-coloc1/README | 26 ++
.../test-crs-static-rebalance-coloc1/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 5 +
.../log.expect | 120 ++++++
.../manager_status | 1 +
.../rules_config | 24 ++
.../service_config | 10 +
.../static_service_stats | 10 +
.../test-crs-static-rebalance-coloc2/README | 16 +
.../test-crs-static-rebalance-coloc2/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 5 +
.../log.expect | 86 ++++
.../manager_status | 1 +
.../rules_config | 14 +
.../service_config | 5 +
.../static_service_stats | 5 +
.../test-crs-static-rebalance-coloc3/README | 14 +
.../test-crs-static-rebalance-coloc3/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 7 +
.../log.expect | 156 +++++++
.../manager_status | 1 +
.../rules_config | 49 +++
.../service_config | 7 +
.../static_service_stats | 5 +
src/test/test_failover1.pl | 4 +-
src/test/test_rules_config.pl | 100 +++++
137 files changed, 3113 insertions(+), 20 deletions(-)
create mode 100644 src/PVE/HA/Rules.pm
create mode 100644 src/PVE/HA/Rules/Colocation.pm
create mode 100644 src/PVE/HA/Rules/Makefile
create mode 100644 src/test/rules_cfgs/connected-positive-colocations.cfg
create mode 100644 src/test/rules_cfgs/connected-positive-colocations.cfg.expect
create mode 100644 src/test/rules_cfgs/illdefined-colocations.cfg
create mode 100644 src/test/rules_cfgs/illdefined-colocations.cfg.expect
create mode 100644 src/test/rules_cfgs/inner-inconsistent-colocations.cfg
create mode 100644 src/test/rules_cfgs/inner-inconsistent-colocations.cfg.expect
create mode 100644 src/test/test-colocation-loose-separate1/README
create mode 100644 src/test/test-colocation-loose-separate1/cmdlist
create mode 100644 src/test/test-colocation-loose-separate1/hardware_status
create mode 100644 src/test/test-colocation-loose-separate1/log.expect
create mode 100644 src/test/test-colocation-loose-separate1/manager_status
create mode 100644 src/test/test-colocation-loose-separate1/rules_config
create mode 100644 src/test/test-colocation-loose-separate1/service_config
create mode 100644 src/test/test-colocation-loose-separate4/README
create mode 100644 src/test/test-colocation-loose-separate4/cmdlist
create mode 100644 src/test/test-colocation-loose-separate4/hardware_status
create mode 100644 src/test/test-colocation-loose-separate4/log.expect
create mode 100644 src/test/test-colocation-loose-separate4/manager_status
create mode 100644 src/test/test-colocation-loose-separate4/rules_config
create mode 100644 src/test/test-colocation-loose-separate4/service_config
create mode 100644 src/test/test-colocation-loose-together1/README
create mode 100644 src/test/test-colocation-loose-together1/cmdlist
create mode 100644 src/test/test-colocation-loose-together1/hardware_status
create mode 100644 src/test/test-colocation-loose-together1/log.expect
create mode 100644 src/test/test-colocation-loose-together1/manager_status
create mode 100644 src/test/test-colocation-loose-together1/rules_config
create mode 100644 src/test/test-colocation-loose-together1/service_config
create mode 100644 src/test/test-colocation-loose-together3/README
create mode 100644 src/test/test-colocation-loose-together3/cmdlist
create mode 100644 src/test/test-colocation-loose-together3/hardware_status
create mode 100644 src/test/test-colocation-loose-together3/log.expect
create mode 100644 src/test/test-colocation-loose-together3/manager_status
create mode 100644 src/test/test-colocation-loose-together3/rules_config
create mode 100644 src/test/test-colocation-loose-together3/service_config
create mode 100644 src/test/test-colocation-strict-separate1/README
create mode 100644 src/test/test-colocation-strict-separate1/cmdlist
create mode 100644 src/test/test-colocation-strict-separate1/hardware_status
create mode 100644 src/test/test-colocation-strict-separate1/log.expect
create mode 100644 src/test/test-colocation-strict-separate1/manager_status
create mode 100644 src/test/test-colocation-strict-separate1/rules_config
create mode 100644 src/test/test-colocation-strict-separate1/service_config
create mode 100644 src/test/test-colocation-strict-separate2/README
create mode 100644 src/test/test-colocation-strict-separate2/cmdlist
create mode 100644 src/test/test-colocation-strict-separate2/hardware_status
create mode 100644 src/test/test-colocation-strict-separate2/log.expect
create mode 100644 src/test/test-colocation-strict-separate2/manager_status
create mode 100644 src/test/test-colocation-strict-separate2/rules_config
create mode 100644 src/test/test-colocation-strict-separate2/service_config
create mode 100644 src/test/test-colocation-strict-separate3/README
create mode 100644 src/test/test-colocation-strict-separate3/cmdlist
create mode 100644 src/test/test-colocation-strict-separate3/hardware_status
create mode 100644 src/test/test-colocation-strict-separate3/log.expect
create mode 100644 src/test/test-colocation-strict-separate3/manager_status
create mode 100644 src/test/test-colocation-strict-separate3/rules_config
create mode 100644 src/test/test-colocation-strict-separate3/service_config
create mode 100644 src/test/test-colocation-strict-separate4/README
create mode 100644 src/test/test-colocation-strict-separate4/cmdlist
create mode 100644 src/test/test-colocation-strict-separate4/hardware_status
create mode 100644 src/test/test-colocation-strict-separate4/log.expect
create mode 100644 src/test/test-colocation-strict-separate4/manager_status
create mode 100644 src/test/test-colocation-strict-separate4/rules_config
create mode 100644 src/test/test-colocation-strict-separate4/service_config
create mode 100644 src/test/test-colocation-strict-separate5/README
create mode 100644 src/test/test-colocation-strict-separate5/cmdlist
create mode 100644 src/test/test-colocation-strict-separate5/hardware_status
create mode 100644 src/test/test-colocation-strict-separate5/log.expect
create mode 100644 src/test/test-colocation-strict-separate5/manager_status
create mode 100644 src/test/test-colocation-strict-separate5/rules_config
create mode 100644 src/test/test-colocation-strict-separate5/service_config
create mode 100644 src/test/test-colocation-strict-together1/README
create mode 100644 src/test/test-colocation-strict-together1/cmdlist
create mode 100644 src/test/test-colocation-strict-together1/hardware_status
create mode 100644 src/test/test-colocation-strict-together1/log.expect
create mode 100644 src/test/test-colocation-strict-together1/manager_status
create mode 100644 src/test/test-colocation-strict-together1/rules_config
create mode 100644 src/test/test-colocation-strict-together1/service_config
create mode 100644 src/test/test-colocation-strict-together2/README
create mode 100644 src/test/test-colocation-strict-together2/cmdlist
create mode 100644 src/test/test-colocation-strict-together2/hardware_status
create mode 100644 src/test/test-colocation-strict-together2/log.expect
create mode 100644 src/test/test-colocation-strict-together2/manager_status
create mode 100644 src/test/test-colocation-strict-together2/rules_config
create mode 100644 src/test/test-colocation-strict-together2/service_config
create mode 100644 src/test/test-colocation-strict-together3/README
create mode 100644 src/test/test-colocation-strict-together3/cmdlist
create mode 100644 src/test/test-colocation-strict-together3/hardware_status
create mode 100644 src/test/test-colocation-strict-together3/log.expect
create mode 100644 src/test/test-colocation-strict-together3/manager_status
create mode 100644 src/test/test-colocation-strict-together3/rules_config
create mode 100644 src/test/test-colocation-strict-together3/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc1/README
create mode 100644 src/test/test-crs-static-rebalance-coloc1/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc1/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc1/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc1/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc1/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc1/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc1/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc1/static_service_stats
create mode 100644 src/test/test-crs-static-rebalance-coloc2/README
create mode 100644 src/test/test-crs-static-rebalance-coloc2/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc2/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc2/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc2/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc2/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc2/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc2/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc2/static_service_stats
create mode 100644 src/test/test-crs-static-rebalance-coloc3/README
create mode 100644 src/test/test-crs-static-rebalance-coloc3/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc3/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc3/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc3/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc3/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc3/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc3/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc3/static_service_stats
create mode 100755 src/test/test_rules_config.pl
Summary over all repositories:
139 files changed, 3115 insertions(+), 20 deletions(-)
--
Generated by git-murpp 0.8.0
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* [pve-devel] [PATCH cluster 1/1] cfs: add 'ha/rules.cfg' to observed files
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 01/15] ignore output of fence config tests in tree Daniel Kral
` (17 subsequent siblings)
18 siblings, 0 replies; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/Cluster.pm | 1 +
src/pmxcfs/status.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
index e0e3ee9..afbb36f 100644
--- a/src/PVE/Cluster.pm
+++ b/src/PVE/Cluster.pm
@@ -69,6 +69,7 @@ my $observed = {
'ha/crm_commands' => 1,
'ha/manager_status' => 1,
'ha/resources.cfg' => 1,
+ 'ha/rules.cfg' => 1,
'ha/groups.cfg' => 1,
'ha/fence.cfg' => 1,
'status.cfg' => 1,
diff --git a/src/pmxcfs/status.c b/src/pmxcfs/status.c
index ff5fcc4..cee0c57 100644
--- a/src/pmxcfs/status.c
+++ b/src/pmxcfs/status.c
@@ -97,6 +97,7 @@ static memdb_change_t memdb_change_array[] = {
{ .path = "ha/crm_commands" },
{ .path = "ha/manager_status" },
{ .path = "ha/resources.cfg" },
+ { .path = "ha/rules.cfg" },
{ .path = "ha/groups.cfg" },
{ .path = "ha/fence.cfg" },
{ .path = "status.cfg" },
--
2.39.5
* [pve-devel] [PATCH ha-manager 01/15] ignore output of fence config tests in tree
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH cluster 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-03-25 17:49 ` [pve-devel] applied: " Thomas Lamprecht
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines Daniel Kral
` (16 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
.gitignore | 2 ++
1 file changed, 2 insertions(+)
diff --git a/.gitignore b/.gitignore
index 5b748c4..c35280e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,3 +4,5 @@
*.buildinfo
*.tar.gz
/src/test/test-*/status/*
+/src/test/fence_cfgs/*.cfg.commands
+/src/test/fence_cfgs/*.cfg.write
--
2.39.5
* [pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH cluster 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 01/15] ignore output of fence config tests in tree Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-03-25 17:53 ` Thomas Lamprecht
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 03/15] usage: add get_service_node and pin_service_node methods Daniel Kral
` (15 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add helper subroutines that implement basic set operations on hash
sets, i.e. hashes with elements set to a true value, e.g. 1.
These will be used for various tasks in the HA Manager colocation rules,
e.g. for verifying the satisfiability of the rules or applying the
colocation rules on the allowed set of nodes.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
If they're useful somewhere else, I can move them to PVE::Tools
post-RFC, but it'd probably be useful to prefix them with `hash_` there.
AFAICS there weren't any other helpers for this with a quick grep over
all projects and `PVE::Tools::array_intersect()` wasn't what I needed.
src/PVE/HA/Tools.pm | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
index 0f9e9a5..fc3282c 100644
--- a/src/PVE/HA/Tools.pm
+++ b/src/PVE/HA/Tools.pm
@@ -115,6 +115,48 @@ sub write_json_to_file {
PVE::Tools::file_set_contents($filename, $raw);
}
+sub is_disjoint {
+ my ($hash1, $hash2) = @_;
+
+ for my $key (keys %$hash1) {
+ return 0 if exists($hash2->{$key});
+ }
+
+ return 1;
+};
+
+sub intersect {
+ my ($hash1, $hash2) = @_;
+
+ my $result = { map { $_ => $hash2->{$_} } keys %$hash1 };
+
+ for my $key (keys %$result) {
+ delete $result->{$key} if !defined($result->{$key});
+ }
+
+ return $result;
+};
+
+sub set_difference {
+ my ($hash1, $hash2) = @_;
+
+ my $result = { map { $_ => 1 } keys %$hash1 };
+
+ for my $key (keys %$result) {
+ delete $result->{$key} if defined($hash2->{$key});
+ }
+
+ return $result;
+};
+
+sub union {
+ my ($hash1, $hash2) = @_;
+
+ my $result = { map { $_ => 1 } keys %$hash1, keys %$hash2 };
+
+ return $result;
+};
+
sub count_fenced_services {
my ($ss, $node) = @_;
--
2.39.5
* [pve-devel] [PATCH ha-manager 03/15] usage: add get_service_node and pin_service_node methods
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (2 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-24 12:29 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin Daniel Kral
` (14 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add methods get_service_node() and pin_service_node() to the Usage class
to retrieve and pin the current node of a specific service.
This is used to retrieve the current node of a service for colocation
rules inside of select_service_node(), where there is currently no
access to the global services state.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
For me this is more of a temporary change, since I don't think putting
this information here is very useful in the future. It was more of a
workaround for the moment, since `select_service_node()` doesn't have
access to the global service configuration data, which is needed here.
I would like to give `select_service_node()` the information from e.g.
$sc directly post-RFC.
src/PVE/HA/Usage.pm | 12 ++++++++++++
src/PVE/HA/Usage/Basic.pm | 15 +++++++++++++++
src/PVE/HA/Usage/Static.pm | 14 ++++++++++++++
3 files changed, 41 insertions(+)
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 66d9572..e4f86d7 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -27,6 +27,18 @@ sub list_nodes {
die "implement in subclass";
}
+sub get_service_node {
+ my ($self, $sid) = @_;
+
+ die "implement in subclass";
+}
+
+sub pin_service_node {
+ my ($self, $sid, $node) = @_;
+
+ die "implement in subclass";
+}
+
sub contains_node {
my ($self, $nodename) = @_;
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
index d6b3d6c..50d687b 100644
--- a/src/PVE/HA/Usage/Basic.pm
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -10,6 +10,7 @@ sub new {
return bless {
nodes => {},
+ services => {},
haenv => $haenv,
}, $class;
}
@@ -38,11 +39,25 @@ sub contains_node {
return defined($self->{nodes}->{$nodename});
}
+sub get_service_node {
+ my ($self, $sid) = @_;
+
+ return $self->{services}->{$sid};
+}
+
+sub pin_service_node {
+ my ($self, $sid, $node) = @_;
+
+ $self->{services}->{$sid} = $node;
+}
+
sub add_service_usage_to_node {
my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
if ($self->contains_node($nodename)) {
+ $self->{total}++;
$self->{nodes}->{$nodename}++;
+ $self->{services}->{$sid} = $nodename;
} else {
$self->{haenv}->log(
'warning',
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index 3d0af3a..8db9202 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -22,6 +22,7 @@ sub new {
'service-stats' => {},
haenv => $haenv,
scheduler => $scheduler,
+ 'service-nodes' => {},
'service-counts' => {}, # Service count on each node. Fallback if scoring calculation fails.
}, $class;
}
@@ -85,9 +86,22 @@ my sub get_service_usage {
return $service_stats;
}
+sub get_service_node {
+ my ($self, $sid) = @_;
+
+ return $self->{'service-nodes'}->{$sid};
+}
+
+sub pin_service_node {
+ my ($self, $sid, $node) = @_;
+
+ $self->{'service-nodes'}->{$sid} = $node;
+}
+
sub add_service_usage_to_node {
my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+ $self->{'service-nodes'}->{$sid} = $nodename;
$self->{'service-counts'}->{$nodename}++;
eval {
--
2.39.5
* [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (3 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 03/15] usage: add get_service_node and pin_service_node methods Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-24 13:03 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin Daniel Kral
` (13 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add a rules section config base plugin to allow users to specify
different kinds of rules in a single configuration file.
The interface is designed to allow sub plugins to implement their own
{decode,encode}_value() methods and also offer a canonicalized version
of their rules with canonicalize(), i.e. with any inconsistencies
removed and ambiguities resolved. There is also an are_satisfiable()
method in anticipation of verifying additions or changes to the rules
config via the API.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Makefile | 2 +-
src/PVE/HA/Rules.pm | 118 ++++++++++++++++++++++++++++++++++
src/PVE/HA/Tools.pm | 5 ++
4 files changed, 125 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Rules.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 0ffbd8d..9bbd375 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -32,6 +32,7 @@
/usr/share/perl5/PVE/HA/Resources.pm
/usr/share/perl5/PVE/HA/Resources/PVECT.pm
/usr/share/perl5/PVE/HA/Resources/PVEVM.pm
+/usr/share/perl5/PVE/HA/Rules.pm
/usr/share/perl5/PVE/HA/Tools.pm
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index 8c91b97..489cbc0 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -1,4 +1,4 @@
-SIM_SOURCES=CRM.pm Env.pm Groups.pm Resources.pm LRM.pm Manager.pm \
+SIM_SOURCES=CRM.pm Env.pm Groups.pm Rules.pm Resources.pm LRM.pm Manager.pm \
NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm Usage.pm
SOURCES=${SIM_SOURCES} Config.pm
diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
new file mode 100644
index 0000000..bff3375
--- /dev/null
+++ b/src/PVE/HA/Rules.pm
@@ -0,0 +1,118 @@
+package PVE::HA::Rules;
+
+use strict;
+use warnings;
+
+use PVE::JSONSchema qw(get_standard_option);
+use PVE::SectionConfig;
+use PVE::HA::Tools;
+
+use base qw(PVE::SectionConfig);
+
+# TODO Add descriptions, completions, etc.
+my $defaultData = {
+ propertyList => {
+ type => { description => "Rule type." },
+ ruleid => get_standard_option('pve-ha-rule-id'),
+ comment => {
+ type => 'string',
+ maxLength => 4096,
+ description => "Rule description.",
+ },
+ },
+};
+
+sub private {
+ return $defaultData;
+}
+
+sub options {
+ return {
+ type => { optional => 0 },
+ ruleid => { optional => 0 },
+ comment => { optional => 1 },
+ };
+};
+
+sub decode_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'comment') {
+ return PVE::Tools::decode_text($value);
+ }
+
+ my $plugin = __PACKAGE__->lookup($type);
+ return $plugin->decode_value($type, $key, $value);
+}
+
+sub encode_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'comment') {
+ return PVE::Tools::encode_text($value);
+ }
+
+ my $plugin = __PACKAGE__->lookup($type);
+ return $plugin->encode_value($type, $key, $value);
+}
+
+sub parse_section_header {
+ my ($class, $line) = @_;
+
+ if ($line =~ m/^(\S+):\s*(\S+)\s*$/) {
+ my ($type, $ruleid) = (lc($1), $2);
+ my $errmsg = undef; # set if you want to skip whole section
+ eval { PVE::JSONSchema::pve_verify_configid($ruleid); };
+ $errmsg = $@ if $@;
+ my $config = {}; # to return additional attributes
+ return ($type, $ruleid, $errmsg, $config);
+ }
+ return undef;
+}
+
+sub foreach_service_rule {
+ my ($rules, $func, $opts) = @_;
+
+ my $sid = $opts->{sid};
+ my $type = $opts->{type};
+
+ my @ruleids = sort {
+ $rules->{order}->{$a} <=> $rules->{order}->{$b}
+ } keys %{$rules->{ids}};
+
+ for my $ruleid (@ruleids) {
+ my $rule = $rules->{ids}->{$ruleid};
+
+ next if !$rule; # invalid rules are kept undef in section config, delete them
+ next if $type && $rule->{type} ne $type;
+ next if $sid && !defined($rule->{services}->{$sid});
+
+ $func->($rule, $ruleid);
+ }
+}
+
+sub canonicalize {
+ my ($class, $rules, $groups, $services) = @_;
+
+ die "implement in subclass";
+}
+
+sub are_satisfiable {
+ my ($class, $rules, $groups, $services) = @_;
+
+ die "implement in subclass";
+}
+
+sub checked_config {
+ my ($rules, $groups, $services) = @_;
+
+ my $types = __PACKAGE__->lookup_types();
+
+ for my $type (@$types) {
+ my $plugin = __PACKAGE__->lookup($type);
+
+ $plugin->canonicalize($rules, $groups, $services);
+ }
+}
+
+1;
diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
index fc3282c..35107c9 100644
--- a/src/PVE/HA/Tools.pm
+++ b/src/PVE/HA/Tools.pm
@@ -92,6 +92,11 @@ PVE::JSONSchema::register_standard_option('pve-ha-group-id', {
type => 'string', format => 'pve-configid',
});
+PVE::JSONSchema::register_standard_option('pve-ha-rule-id', {
+ description => "The HA rule identifier.",
+ type => 'string', format => 'pve-configid',
+});
+
sub read_json_from_file {
my ($filename, $default) = @_;
--
2.39.5
* [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (4 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-03 12:16 ` Fabian Grünbichler
2025-04-25 14:05 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 06/15] config, env, hw: add rules read and parse methods Daniel Kral
` (12 subsequent siblings)
18 siblings, 2 replies; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add the colocation rule plugin to allow users to specify inter-service
affinity constraints.
These colocation rules can either be positive (keeping services
together) or negative (keeping services separate). Their strictness can
also be specified as either a MUST or a SHOULD, where the former
specifies that any service for which the constraint cannot be applied
stays in recovery, while the latter specifies that any such service is
lifted from the constraint.
The initial implementation also implements four basic transformations:
colocation rules with too few services are dropped, transitive positive
colocation rules are merged, and rules with inter-rule inconsistencies,
as well as rules inconsistent with the location constraints specified
in HA groups, are dropped.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Makefile | 1 +
src/PVE/HA/Rules/Colocation.pm | 391 +++++++++++++++++++++++++++++++++
src/PVE/HA/Rules/Makefile | 6 +
src/PVE/HA/Tools.pm | 6 +
5 files changed, 405 insertions(+)
create mode 100644 src/PVE/HA/Rules/Colocation.pm
create mode 100644 src/PVE/HA/Rules/Makefile
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 9bbd375..89f9144 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -33,6 +33,7 @@
/usr/share/perl5/PVE/HA/Resources/PVECT.pm
/usr/share/perl5/PVE/HA/Resources/PVEVM.pm
/usr/share/perl5/PVE/HA/Rules.pm
+/usr/share/perl5/PVE/HA/Rules/Colocation.pm
/usr/share/perl5/PVE/HA/Tools.pm
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index 489cbc0..e386cbf 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -8,6 +8,7 @@ install:
install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA
for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/$$i; done
make -C Resources install
+ make -C Rules install
make -C Usage install
make -C Env install
diff --git a/src/PVE/HA/Rules/Colocation.pm b/src/PVE/HA/Rules/Colocation.pm
new file mode 100644
index 0000000..808d48e
--- /dev/null
+++ b/src/PVE/HA/Rules/Colocation.pm
@@ -0,0 +1,391 @@
+package PVE::HA::Rules::Colocation;
+
+use strict;
+use warnings;
+
+use Data::Dumper;
+
+use PVE::JSONSchema qw(get_standard_option);
+use PVE::HA::Tools;
+
+use base qw(PVE::HA::Rules);
+
+sub type {
+ return 'colocation';
+}
+
+sub properties {
+ return {
+ services => get_standard_option('pve-ha-resource-id-list'),
+ affinity => {
+ description => "Describes whether the services are supposed to be kept on separate"
+ . " nodes, or are supposed to be kept together on the same node.",
+ type => 'string',
+ enum => ['separate', 'together'],
+ optional => 0,
+ },
+ strict => {
+ description => "Describes whether the colocation rule is mandatory or optional.",
+ type => 'boolean',
+ optional => 0,
+ },
+ }
+}
+
+sub options {
+ return {
+ services => { optional => 0 },
+ strict => { optional => 0 },
+ affinity => { optional => 0 },
+ comment => { optional => 1 },
+ };
+};
+
+sub decode_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'services') {
+ my $res = {};
+
+ for my $service (PVE::Tools::split_list($value)) {
+ if (PVE::HA::Tools::pve_verify_ha_resource_id($service)) {
+ $res->{$service} = 1;
+ }
+ }
+
+ return $res;
+ }
+
+ return $value;
+}
+
+sub encode_value {
+ my ($class, $type, $key, $value) = @_;
+
+ if ($key eq 'services') {
+ PVE::HA::Tools::pve_verify_ha_resource_id($_) for (keys %$value);
+
+ return join(',', keys %$value);
+ }
+
+ return $value;
+}
+
+sub foreach_colocation_rule {
+ my ($rules, $func, $opts) = @_;
+
+ my $my_opts = { map { $_ => $opts->{$_} } keys %$opts };
+ $my_opts->{type} = 'colocation';
+
+ PVE::HA::Rules::foreach_service_rule($rules, $func, $my_opts);
+}
+
+sub split_colocation_rules {
+ my ($rules) = @_;
+
+ my $positive_ruleids = [];
+ my $negative_ruleids = [];
+
+ foreach_colocation_rule($rules, sub {
+ my ($rule, $ruleid) = @_;
+
+ my $ruleid_set = $rule->{affinity} eq 'together' ? $positive_ruleids : $negative_ruleids;
+ push @$ruleid_set, $ruleid;
+ });
+
+ return ($positive_ruleids, $negative_ruleids);
+}
+
+=head3 check_services_count($rules)
+
+Returns a list of conflicts caused by colocation rules, which do not have
+enough services in them, defined in C<$rules>.
+
+If there are no conflicts, the returned list is empty.
+
+=cut
+
+sub check_services_count {
+ my ($rules) = @_;
+
+ my $conflicts = [];
+
+ foreach_colocation_rule($rules, sub {
+ my ($rule, $ruleid) = @_;
+
+ push @$conflicts, $ruleid if (scalar(keys %{$rule->{services}}) < 2);
+ });
+
+ return $conflicts;
+}
+
+=head3 check_positive_intransitivity($rules)
+
+Returns a list of conflicts caused by transitive positive colocation rules
+defined in C<$rules>.
+
+Transitive positive colocation rules exist if there are at least two positive
+colocation rules with the same strictness, which share at least one common
+service. This means that these rules can be merged together.
+
+If there are no conflicts, the returned list is empty.
+
+=cut
+
+sub check_positive_intransitivity {
+ my ($rules) = @_;
+
+ my $conflicts = {};
+ my ($positive_ruleids) = split_colocation_rules($rules);
+
+ while (my $outerid = shift(@$positive_ruleids)) {
+ my $outer = $rules->{ids}->{$outerid};
+
+ for my $innerid (@$positive_ruleids) {
+ my $inner = $rules->{ids}->{$innerid};
+
+ next if $outerid eq $innerid;
+ next if $outer->{strict} != $inner->{strict};
+ next if PVE::HA::Tools::is_disjoint($outer->{services}, $inner->{services});
+
+ push @{$conflicts->{$outerid}}, $innerid;
+ }
+ }
+
+ return $conflicts;
+}
+
+=head3 check_inner_consistency($rules)
+
+Returns a list of conflicts caused by inconsistencies between positive and
+negative colocation rules defined in C<$rules>.
+
+Inner inconsistent colocation rules exist, if there are at least the same two
+services in a positive and a negative colocation relation, which is an
+impossible constraint as they are opposites of each other.
+
+If there are no conflicts, the returned list is empty.
+
+=cut
+
+sub check_inner_consistency {
+ my ($rules) = @_;
+
+ my $conflicts = [];
+ my ($positive_ruleids, $negative_ruleids) = split_colocation_rules($rules);
+
+ for my $outerid (@$positive_ruleids) {
+ my $outer = $rules->{ids}->{$outerid}->{services};
+
+ for my $innerid (@$negative_ruleids) {
+ my $inner = $rules->{ids}->{$innerid}->{services};
+
+ my $intersection = PVE::HA::Tools::intersect($outer, $inner);
+ next if scalar(keys %$intersection) < 2;
+
+ push @$conflicts, [$outerid, $innerid];
+ }
+ }
+
+ return $conflicts;
+}
+
+=head3 check_positive_group_consistency(...)
+
+Returns a list of conflicts caused by inconsistencies between positive
+colocation rules defined in C<$rules> and node restrictions defined in
+C<$groups> and C<$service>.
+
+A positive colocation rule inconsistency with groups exists, if at least two
+services in a positive colocation rule are restricted to disjoint sets of
+nodes, i.e. they are in restricted HA groups, which have a disjoint set of
+nodes.
+
+If there are no conflicts, the returned list is empty.
+
+=cut
+
+sub check_positive_group_consistency {
+ my ($rules, $groups, $services, $positive_ruleids, $conflicts) = @_;
+
+ for my $ruleid (@$positive_ruleids) {
+ my $rule_services = $rules->{ids}->{$ruleid}->{services};
+ my $nodes;
+
+ for my $sid (keys %$rule_services) {
+ my $groupid = $services->{$sid}->{group};
+ return if !$groupid;
+
+ my $group = $groups->{ids}->{$groupid};
+ return if !$group;
+ return if !$group->{restricted};
+
+ $nodes = { map { $_ => 1 } keys %{$group->{nodes}} } if !defined($nodes);
+ $nodes = PVE::HA::Tools::intersect($nodes, $group->{nodes});
+ }
+
+ if (defined($nodes) && scalar keys %$nodes < 1) {
+ push @$conflicts, ['positive', $ruleid];
+ }
+ }
+}
+
+=head3 check_negative_group_consistency(...)
+
+Returns a list of conflicts caused by inconsistencies between negative
+colocation rules defined in C<$rules> and node restrictions defined in
+C<$groups> and C<$service>.
+
+A negative colocation rule inconsistency with groups exists if at least
+two services in a negative colocation rule are restricted to fewer nodes
+in total than there are services in the rule, i.e. they are in restricted
+HA groups whose node sets' union has fewer elements than restricted services.
+
+If there are no conflicts, the returned list is empty.
+
+=cut
+
+sub check_negative_group_consistency {
+ my ($rules, $groups, $services, $negative_ruleids, $conflicts) = @_;
+
+ for my $ruleid (@$negative_ruleids) {
+ my $rule_services = $rules->{ids}->{$ruleid}->{services};
+ my $restricted_services = 0;
+ my $restricted_nodes;
+
+ for my $sid (keys %$rule_services) {
+ my $groupid = $services->{$sid}->{group};
+ return if !$groupid;
+
+ my $group = $groups->{ids}->{$groupid};
+ return if !$group;
+ return if !$group->{restricted};
+
+ $restricted_services++;
+
+ $restricted_nodes = {} if !defined($restricted_nodes);
+ $restricted_nodes = PVE::HA::Tools::union($restricted_nodes, $group->{nodes});
+ }
+
+ if (defined($restricted_nodes)
+ && scalar keys %$restricted_nodes < $restricted_services) {
+ push @$conflicts, ['negative', $ruleid];
+ }
+ }
+}
+
+sub check_consistency_with_groups {
+ my ($rules, $groups, $services) = @_;
+
+ my $conflicts = [];
+ my ($positive_ruleids, $negative_ruleids) = split_colocation_rules($rules);
+
+ check_positive_group_consistency($rules, $groups, $services, $positive_ruleids, $conflicts);
+ check_negative_group_consistency($rules, $groups, $services, $negative_ruleids, $conflicts);
+
+ return $conflicts;
+}
+
+sub canonicalize {
+ my ($class, $rules, $groups, $services) = @_;
+
+ my $illdefined_ruleids = check_services_count($rules);
+
+ for my $ruleid (@$illdefined_ruleids) {
+ print "Drop colocation rule '$ruleid', because it does not have enough services defined.\n";
+
+ delete $rules->{ids}->{$ruleid};
+ }
+
+ my $mergeable_positive_ruleids = check_positive_intransitivity($rules);
+
+ for my $outerid (sort keys %$mergeable_positive_ruleids) {
+ my $outer = $rules->{ids}->{$outerid};
+ my $innerids = $mergeable_positive_ruleids->{$outerid};
+
+ for my $innerid (@$innerids) {
+ my $inner = $rules->{ids}->{$innerid};
+
+ $outer->{services}->{$_} = 1 for (keys %{$inner->{services}});
+
+ print "Merge services of positive colocation rule '$innerid' into positive colocation"
+ . " rule '$outerid', because they share at least one service.\n";
+
+ delete $rules->{ids}->{$innerid};
+ }
+ }
+
+ my $inner_conflicts = check_inner_consistency($rules);
+
+ for my $conflict (@$inner_conflicts) {
+ my ($positiveid, $negativeid) = @$conflict;
+
+ print "Drop positive colocation rule '$positiveid' and negative colocation rule"
+ . " '$negativeid', because they share two or more services.\n";
+
+ delete $rules->{ids}->{$positiveid};
+ delete $rules->{ids}->{$negativeid};
+ }
+
+ my $group_conflicts = check_consistency_with_groups($rules, $groups, $services);
+
+ for my $conflict (@$group_conflicts) {
+ my ($type, $ruleid) = @$conflict;
+
+ if ($type eq 'positive') {
+ print "Drop positive colocation rule '$ruleid', because two or more services are"
+ . " restricted to different nodes.\n";
+ } elsif ($type eq 'negative') {
+ print "Drop negative colocation rule '$ruleid', because two or more services are"
+ . " restricted to less nodes than services.\n";
+ } else {
+ die "Invalid group conflict type $type\n";
+ }
+
+ delete $rules->{ids}->{$ruleid};
+ }
+}
+
+# TODO This will be used to verify modifications to the rules config over the API
+sub are_satisfiable {
+ my ($class, $rules, $groups, $services) = @_;
+
+ my $illdefined_ruleids = check_services_count($rules);
+
+ for my $ruleid (@$illdefined_ruleids) {
+ print "Colocation rule '$ruleid' does not have enough services defined.\n";
+ }
+
+ my $inner_conflicts = check_inner_consistency($rules);
+
+ for my $conflict (@$inner_conflicts) {
+ my ($positiveid, $negativeid) = @$conflict;
+
+ print "Positive colocation rule '$positiveid' is inconsistent with negative colocation rule"
+ . " '$negativeid', because they share two or more services between them.\n";
+ }
+
+ my $group_conflicts = check_consistency_with_groups($rules, $groups, $services);
+
+ for my $conflict (@$group_conflicts) {
+ my ($type, $ruleid) = @$conflict;
+
+ if ($type eq 'positive') {
+ print "Positive colocation rule '$ruleid' is unapplicable, because two or more services"
+ . " are restricted to different nodes.\n";
+ } elsif ($type eq 'negative') {
+ print "Negative colocation rule '$ruleid' is unapplicable, because two or more services"
+ . " are restricted to less nodes than services.\n";
+ } else {
+ die "Invalid group conflict type $type\n";
+ }
+ }
+
+ if (scalar(@$inner_conflicts) || scalar(@$group_conflicts)) {
+ return 0;
+ }
+
+ return 1;
+}
+
+1;
diff --git a/src/PVE/HA/Rules/Makefile b/src/PVE/HA/Rules/Makefile
new file mode 100644
index 0000000..8cb91ac
--- /dev/null
+++ b/src/PVE/HA/Rules/Makefile
@@ -0,0 +1,6 @@
+SOURCES=Colocation.pm
+
+.PHONY: install
+install:
+ install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA/Rules
+ for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/Rules/$$i; done
diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
index 35107c9..52251d7 100644
--- a/src/PVE/HA/Tools.pm
+++ b/src/PVE/HA/Tools.pm
@@ -46,6 +46,12 @@ PVE::JSONSchema::register_standard_option('pve-ha-resource-id', {
type => 'string', format => 'pve-ha-resource-id',
});
+PVE::JSONSchema::register_standard_option('pve-ha-resource-id-list', {
+ description => "List of HA resource IDs.",
+ typetext => "<type>:<name>{,<type>:<name>}*",
+ type => 'string', format => 'pve-ha-resource-id-list',
+});
+
PVE::JSONSchema::register_format('pve-ha-resource-or-vm-id', \&pve_verify_ha_resource_or_vm_id);
sub pve_verify_ha_resource_or_vm_id {
my ($sid, $noerr) = @_;
--
2.39.5
* [pve-devel] [PATCH ha-manager 06/15] config, env, hw: add rules read and parse methods
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (5 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-25 14:11 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 07/15] manager: read and update rules config Daniel Kral
` (11 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Adds methods to the HA environment to read and parse the rules
configuration file for the specific environment implementation.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Config.pm | 12 ++++++++++++
src/PVE/HA/Env.pm | 6 ++++++
src/PVE/HA/Env/PVE2.pm | 13 +++++++++++++
src/PVE/HA/Sim/Env.pm | 15 +++++++++++++++
src/PVE/HA/Sim/Hardware.pm | 15 +++++++++++++++
5 files changed, 61 insertions(+)
diff --git a/src/PVE/HA/Config.pm b/src/PVE/HA/Config.pm
index 129236d..99ae33a 100644
--- a/src/PVE/HA/Config.pm
+++ b/src/PVE/HA/Config.pm
@@ -7,12 +7,14 @@ use JSON;
use PVE::HA::Tools;
use PVE::HA::Groups;
+use PVE::HA::Rules;
use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
use PVE::HA::Resources;
my $manager_status_filename = "ha/manager_status";
my $ha_groups_config = "ha/groups.cfg";
my $ha_resources_config = "ha/resources.cfg";
+my $ha_rules_config = "ha/rules.cfg";
my $crm_commands_filename = "ha/crm_commands";
my $ha_fence_config = "ha/fence.cfg";
@@ -31,6 +33,11 @@ cfs_register_file(
sub { PVE::HA::Resources->parse_config(@_); },
sub { PVE::HA::Resources->write_config(@_); },
);
+cfs_register_file(
+ $ha_rules_config,
+ sub { PVE::HA::Rules->parse_config(@_); },
+ sub { PVE::HA::Rules->write_config(@_); },
+);
cfs_register_file($manager_status_filename, \&json_reader, \&json_writer);
cfs_register_file(
$ha_fence_config, \&PVE::HA::FenceConfig::parse_config, \&PVE::HA::FenceConfig::write_config);
@@ -193,6 +200,11 @@ sub parse_sid {
return wantarray ? ($sid, $type, $name) : $sid;
}
+sub read_rules_config {
+
+ return cfs_read_file($ha_rules_config);
+}
+
sub read_group_config {
return cfs_read_file($ha_groups_config);
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index bb28a75..bdcbed8 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -131,6 +131,12 @@ sub steal_service {
return $self->{plug}->steal_service($sid, $current_node, $new_node);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ return $self->{plug}->read_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 1de4b69..3157e56 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -28,6 +28,13 @@ PVE::HA::Resources::PVECT->register();
PVE::HA::Resources->init();
+use PVE::HA::Rules;
+use PVE::HA::Rules::Colocation;
+
+PVE::HA::Rules::Colocation->register();
+
+PVE::HA::Rules->init();
+
my $lockdir = "/etc/pve/priv/lock";
sub new {
@@ -188,6 +195,12 @@ sub steal_service {
$self->cluster_state_update();
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ return PVE::HA::Config::read_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index b2ab231..2f73859 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -20,6 +20,13 @@ PVE::HA::Sim::Resources::VirtFail->register();
PVE::HA::Resources->init();
+use PVE::HA::Rules;
+use PVE::HA::Rules::Colocation;
+
+PVE::HA::Rules::Colocation->register();
+
+PVE::HA::Rules->init();
+
sub new {
my ($this, $nodename, $hardware, $log_id) = @_;
@@ -245,6 +252,14 @@ sub exec_fence_agent {
return $self->{hardware}->exec_fence_agent($agent, $node, @param);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ $assert_cfs_can_rw->($self);
+
+ return $self->{hardware}->read_rules_config();
+}
+
sub read_group_config {
my ($self) = @_;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 859e0a3..24bc8b9 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -28,6 +28,7 @@ my $watchdog_timeout = 60;
# $testdir/cmdlist Command list for simulation
# $testdir/hardware_status Hardware description (number of nodes, ...)
# $testdir/manager_status CRM status (start with {})
+# $testdir/rules_config Constraints / Rules configuration
# $testdir/service_config Service configuration
# $testdir/static_service_stats Static service usage information (cpu, memory)
# $testdir/groups HA groups configuration
@@ -319,6 +320,16 @@ sub read_crm_commands {
return $self->global_lock($code);
}
+sub read_rules_config {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/rules_config";
+ my $raw = '';
+ $raw = PVE::Tools::file_get_contents($filename) if -f $filename;
+
+ return PVE::HA::Rules->parse_config($filename, $raw);
+}
+
sub read_group_config {
my ($self) = @_;
@@ -391,6 +402,10 @@ sub new {
# copy initial configuartion
copy("$testdir/manager_status", "$statusdir/manager_status"); # optional
+ if (-f "$testdir/rules_config") {
+ copy("$testdir/rules_config", "$statusdir/rules_config");
+ }
+
if (-f "$testdir/groups") {
copy("$testdir/groups", "$statusdir/groups");
} else {
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* [pve-devel] [PATCH ha-manager 07/15] manager: read and update rules config
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (6 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 06/15] config, env, hw: add rules read and parse methods Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-25 14:30 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 08/15] manager: factor out prioritized nodes in select_service_node Daniel Kral
` (10 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Read the rules configuration in each round and update the canonicalized
rules configuration only if there were any changes since the last round,
to avoid verifying the rule set more often than necessary.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
As noted inline already, there's a missing check for whether the service
configuration changed, which includes the HA group assignment (and is
only needed for that), since there's no digest as there is for groups/rules.
I was hesitant to change the structure of `%sc` or the return value of
`read_service_config()` as it's used quite often, and I didn't want to
create a sha1 digest here just for this check. This is another argument
for having all of these constraints in a single configuration file.
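For context, if such an ad-hoc digest were created anyway, it would roughly
boil down to something like the following sketch (purely illustrative, not
part of this patch; `$sc` is the hash returned by `read_service_config()`):

    use JSON;
    use Digest::SHA qw(sha1_hex);

    # hypothetical: digest over the canonically encoded service configuration
    my $sc_digest = sha1_hex(JSON->new->canonical(1)->encode($sc));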
src/PVE/HA/Manager.pm | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index d983672..7a8e7dc 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -11,6 +11,9 @@ use PVE::HA::NodeStatus;
use PVE::HA::Usage::Basic;
use PVE::HA::Usage::Static;
+use PVE::HA::Rules;
+use PVE::HA::Rules::Colocation;
+
## Variable Name & Abbreviations Convention
#
# The HA stack has some variables it uses frequently and thus abbreviates it such that it may be
@@ -41,7 +44,12 @@ sub new {
my $class = ref($this) || $this;
- my $self = bless { haenv => $haenv, crs => {} }, $class;
+ my $self = bless {
+ haenv => $haenv,
+ crs => {},
+ last_rules_digest => '',
+ last_groups_digest => '',
+ }, $class;
my $old_ms = $haenv->read_manager_status();
@@ -497,6 +505,19 @@ sub manage {
delete $ss->{$sid};
}
+ my $new_rules = $haenv->read_rules_config();
+
+ # TODO We should also check for a service digest here, but we would've to
+ # calculate it here independently or also expose it through read_service_config()
+ if ($new_rules->{digest} ne $self->{last_rules_digest}
+ || $self->{groups}->{digest} ne $self->{last_groups_digest}) {
+ $self->{rules} = $new_rules;
+ PVE::HA::Rules::checked_config($self->{rules}, $self->{groups}, $sc);
+ }
+
+ $self->{last_rules_digest} = $self->{rules}->{digest};
+ $self->{last_groups_digest} = $self->{groups}->{digest};
+
$self->update_crm_commands();
for (;;) {
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* [pve-devel] [PATCH ha-manager 08/15] manager: factor out prioritized nodes in select_service_node
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (7 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 07/15] manager: read and update rules config Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-28 13:03 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes Daniel Kral
` (9 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Factor out the prioritized node hash set in select_service_node, as it
is used multiple times and a named variable makes the intent a little
clearer.
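In other words, the repeated `$pri_groups->{$top_pri}` dereferences are
replaced by a single named hash reference:

    my $pri_nodes = $pri_groups->{$top_pri};

    # later checks then read e.g. $pri_nodes->{$current_node} instead of
    # $pri_groups->{$top_pri}->{$current_node}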
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Manager.pm | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 7a8e7dc..8f2ab3d 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -175,23 +175,24 @@ sub select_service_node {
# select node from top priority node list
my $top_pri = $pri_list[0];
+ my $pri_nodes = $pri_groups->{$top_pri};
# try to avoid nodes where the service failed already if we want to relocate
if ($try_next) {
foreach my $node (@$tried_nodes) {
- delete $pri_groups->{$top_pri}->{$node};
+ delete $pri_nodes->{$node};
}
}
return $maintenance_fallback
- if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
+ if defined($maintenance_fallback) && $pri_nodes->{$maintenance_fallback};
- return $current_node if (!$try_next && !$best_scored) && $pri_groups->{$top_pri}->{$current_node};
+ return $current_node if (!$try_next && !$best_scored) && $pri_nodes->{$current_node};
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
$scores->{$a} <=> $scores->{$b} || $a cmp $b
- } keys %{$pri_groups->{$top_pri}};
+ } keys %$pri_nodes;
my $found;
for (my $i = scalar(@nodes) - 1; $i >= 0; $i--) {
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (8 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 08/15] manager: factor out prioritized nodes in select_service_node Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-03 12:17 ` Fabian Grünbichler
` (2 more replies)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 10/15] sim: resources: add option to limit start and migrate tries to node Daniel Kral
` (8 subsequent siblings)
18 siblings, 3 replies; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add a mechanism to the node selection subroutine that enforces the
colocation rules defined in the rules config.
The algorithm directly manipulates the set of nodes the service is
allowed to run on, depending on the type and strictness of the
colocation rules, if there are any.
To function correctly, it therefore relies on nodes having been removed
beforehand if they are unavailable (i.e. offline, unreachable, or
weren't able to start the service in previous tries) or if the service
is otherwise not allowed to run on them (i.e. HA group node
restrictions).
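As a rough illustration of the intended effect on the allowed node set (the
services and nodes here are hypothetical, the semantics follow the rules
described above):

    # allowed nodes after removing unavailable / group-restricted nodes:
    #   { node1 => 1, node2 => 1, node3 => 1 }
    #
    # strict 'together' rule with vm:103, which currently runs on node3:
    #   => only node3 remains: { node3 => 1 }
    #
    # strict 'separate' rule with vm:102, which currently runs on node2:
    #   => node2 is removed (already gone here), result stays { node3 => 1 }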
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Manager.pm | 203 ++++++++++++++++++++++++++++++++++++-
src/test/test_failover1.pl | 4 +-
2 files changed, 205 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 8f2ab3d..79b6555 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -157,8 +157,201 @@ sub get_node_priority_groups {
return ($pri_groups, $group_members);
}
+=head3 get_colocated_services($rules, $sid, $online_node_usage)
+
+Returns a hash map of all services, which are specified as being in a positive
+or negative colocation in C<$rules> with the given service with id C<$sid>.
+
+Each service entry consists of the type of colocation, strictness of colocation
+and the node the service is currently assigned to, if any, according to
+C<$online_node_usage>.
+
+For example, a service C<'vm:101'> being strictly colocated together (positive)
+with two other services C<'vm:102'> and C<'vm:103'>, and loosely colocated
+separate (negative) from another service C<'vm:104'>, results in the hash map:
+
+ {
+ 'vm:102' => {
+ affinity => 'together',
+ strict => 1,
+ node => 'node2'
+ },
+ 'vm:103' => {
+ affinity => 'together',
+ strict => 1,
+ node => 'node2'
+ },
+ 'vm:104' => {
+ affinity => 'separate',
+ strict => 0,
+ node => undef
+ }
+ }
+
+=cut
+
+sub get_colocated_services {
+ my ($rules, $sid, $online_node_usage) = @_;
+
+ my $services = {};
+
+ PVE::HA::Rules::Colocation::foreach_colocation_rule($rules, sub {
+ my ($rule) = @_;
+
+ for my $csid (sort keys %{$rule->{services}}) {
+ next if $csid eq $sid;
+
+ $services->{$csid} = {
+ node => $online_node_usage->get_service_node($csid),
+ affinity => $rule->{affinity},
+ strict => $rule->{strict},
+ };
+ }
+ }, {
+ sid => $sid,
+ });
+
+ return $services;
+}
+
+=head3 get_colocation_preference($rules, $sid, $online_node_usage)
+
+Returns a list of two hashes, where each is a hash map of the colocation
+preference of C<$sid>, according to the colocation rules in C<$rules> and the
+service locations in C<$online_node_usage>.
+
+The first hash is the positive colocation preference, where each element
+represents properties for how much C<$sid> prefers to be on the node.
+Currently, this is a binary C<$strict> field, which means either it should be
+there (C<0>) or must be there (C<1>).
+
+The second hash is the negative colocation preference, where each element
+represents properties for how much C<$sid> prefers not to be on the node.
+Currently, this is a binary C<$strict> field, which means either it should not
+be there (C<0>) or must not be there (C<1>).
+
+=cut
+
+sub get_colocation_preference {
+ my ($rules, $sid, $online_node_usage) = @_;
+
+ my $services = get_colocated_services($rules, $sid, $online_node_usage);
+
+ my $together = {};
+ my $separate = {};
+
+ for my $service (values %$services) {
+ my $node = $service->{node};
+
+ next if !$node;
+
+ my $node_set = $service->{affinity} eq 'together' ? $together : $separate;
+ $node_set->{$node}->{strict} = $node_set->{$node}->{strict} || $service->{strict};
+ }
+
+ return ($together, $separate);
+}
+
+=head3 apply_positive_colocation_rules($together, $allowed_nodes)
+
+Applies the positive colocation preference C<$together> on the allowed node
+hash set C<$allowed_nodes> directly.
+
+Positive colocation means keeping services together on a single node, and
+therefore minimizing the separation of services.
+
+The allowed node hash set C<$allowed_nodes> is expected to contain every node
+which is available to the service, i.e. each node is currently online, is
+available according to other location constraints, and the service has not
+failed running there yet.
+
+=cut
+
+sub apply_positive_colocation_rules {
+ my ($together, $allowed_nodes) = @_;
+
+ return if scalar(keys %$together) < 1;
+
+ my $mandatory_nodes = {};
+ my $possible_nodes = PVE::HA::Tools::intersect($allowed_nodes, $together);
+
+ for my $node (sort keys %$together) {
+ $mandatory_nodes->{$node} = 1 if $together->{$node}->{strict};
+ }
+
+ if (scalar keys %$mandatory_nodes) {
+ # limit to only the nodes the service must be on.
+ for my $node (keys %$allowed_nodes) {
+ next if exists($mandatory_nodes->{$node});
+
+ delete $allowed_nodes->{$node};
+ }
+ } elsif (scalar keys %$possible_nodes) {
+ # limit to the possible nodes the service should be on, if there are any.
+ for my $node (keys %$allowed_nodes) {
+ next if exists($possible_nodes->{$node});
+
+ delete $allowed_nodes->{$node};
+ }
+ }
+}
+
+=head3 apply_negative_colocation_rules($separate, $allowed_nodes)
+
+Applies the negative colocation preference C<$separate> on the allowed node
+hash set C<$allowed_nodes> directly.
+
+Negative colocation means keeping services separate on multiple nodes, and
+therefore maximizing the separation of services.
+
+The allowed node hash set C<$allowed_nodes> is expected to contain every node
+which is available to the service, i.e. each node is currently online, is
+available according to other location constraints, and the service has not
+failed running there yet.
+
+=cut
+
+sub apply_negative_colocation_rules {
+ my ($separate, $allowed_nodes) = @_;
+
+ return if scalar(keys %$separate) < 1;
+
+ my $mandatory_nodes = {};
+ my $possible_nodes = PVE::HA::Tools::set_difference($allowed_nodes, $separate);
+
+ for my $node (sort keys %$separate) {
+ $mandatory_nodes->{$node} = 1 if $separate->{$node}->{strict};
+ }
+
+ if (scalar keys %$mandatory_nodes) {
+ # remove the nodes the service must not be on.
+ for my $node (keys %$allowed_nodes) {
+ next if !exists($mandatory_nodes->{$node});
+
+ delete $allowed_nodes->{$node};
+ }
+ } elsif (scalar keys %$possible_nodes) {
+ # avoid the nodes the service should not be on, if there are any alternatives.
+ for my $node (keys %$allowed_nodes) {
+ next if exists($possible_nodes->{$node});
+
+ delete $allowed_nodes->{$node};
+ }
+ }
+}
+
+sub apply_colocation_rules {
+ my ($rules, $sid, $allowed_nodes, $online_node_usage) = @_;
+
+ my ($together, $separate) = get_colocation_preference($rules, $sid, $online_node_usage);
+
+ apply_positive_colocation_rules($together, $allowed_nodes);
+ apply_negative_colocation_rules($separate, $allowed_nodes);
+}
+
sub select_service_node {
- my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
+ # TODO Cleanup this signature post-RFC
+ my ($rules, $groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
my $group = get_service_group($groups, $online_node_usage, $service_conf);
@@ -189,6 +382,8 @@ sub select_service_node {
return $current_node if (!$try_next && !$best_scored) && $pri_nodes->{$current_node};
+ apply_colocation_rules($rules, $sid, $pri_nodes, $online_node_usage);
+
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
$scores->{$a} <=> $scores->{$b} || $a cmp $b
@@ -758,6 +953,7 @@ sub next_state_request_start {
if ($self->{crs}->{rebalance_on_request_start}) {
my $selected_node = select_service_node(
+ $self->{rules},
$self->{groups},
$self->{online_node_usage},
$sid,
@@ -771,6 +967,9 @@ sub next_state_request_start {
my $select_text = $selected_node ne $current_node ? 'new' : 'current';
$haenv->log('info', "service $sid: re-balance selected $select_text node $selected_node for startup");
+ # TODO It would be better if this information would be retrieved from $ss/$sd post-RFC
+ $self->{online_node_usage}->pin_service_node($sid, $selected_node);
+
if ($selected_node ne $current_node) {
$change_service_state->($self, $sid, 'request_start_balance', node => $current_node, target => $selected_node);
return;
@@ -898,6 +1097,7 @@ sub next_state_started {
}
my $node = select_service_node(
+ $self->{rules},
$self->{groups},
$self->{online_node_usage},
$sid,
@@ -1004,6 +1204,7 @@ sub next_state_recovery {
$self->recompute_online_node_usage(); # we want the most current node state
my $recovery_node = select_service_node(
+ $self->{rules},
$self->{groups},
$self->{online_node_usage},
$sid,
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 308eab3..4c84fbd 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -8,6 +8,8 @@ use PVE::HA::Groups;
use PVE::HA::Manager;
use PVE::HA::Usage::Basic;
+my $rules = {};
+
my $groups = PVE::HA::Groups->parse_config("groups.tmp", <<EOD);
group: prefer_node1
nodes node1
@@ -31,7 +33,7 @@ sub test {
my ($expected_node, $try_next) = @_;
my $node = PVE::HA::Manager::select_service_node
- ($groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next);
+ ($rules, $groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next);
my (undef, undef, $line) = caller();
die "unexpected result: $node != ${expected_node} at line $line\n"
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* [pve-devel] [PATCH ha-manager 10/15] sim: resources: add option to limit start and migrate tries to node
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (9 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-28 13:20 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 11/15] test: ha tester: add test cases for strict negative colocation rules Daniel Kral
` (7 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add an option to the VirtFail resource's name to allow the start and
migrate failure counts to only apply on a certain node number, assuming
a specific node naming scheme.
This allows a slightly more elaborate type of test, e.g. where a service
can start on any node except one specific node, which is exactly the
node it is expected to be started on after a migration.
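For example, the service ID used by a later test case in this series decodes
as follows:

    # fa:120001  =>  a=1, b=2, c=0, d=0, e=0, f=1
    #
    # with f=1, the start/migrate failure counts (b and c) only apply on
    # node1, i.e. the service starts fine on any other node, but its first
    # start attempts on node1 fail.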
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
src/PVE/HA/Sim/Resources/VirtFail.pm | 37 +++++++++++++++++++---------
1 file changed, 26 insertions(+), 11 deletions(-)
diff --git a/src/PVE/HA/Sim/Resources/VirtFail.pm b/src/PVE/HA/Sim/Resources/VirtFail.pm
index ce88391..fddecd6 100644
--- a/src/PVE/HA/Sim/Resources/VirtFail.pm
+++ b/src/PVE/HA/Sim/Resources/VirtFail.pm
@@ -10,25 +10,36 @@ use base qw(PVE::HA::Sim::Resources);
# To make it more interesting we can encode some behavior in the VMID
# with the following format, where fa: is the type and a, b, c, ...
# are digits in base 10, i.e. the full service ID would be:
-# fa:abcde
+# fa:abcdef
# And the digits after the fa: type prefix would mean:
# - a: no meaning but can be used for differentiating similar resources
# - b: how many tries are needed to start correctly (0 is normal behavior) (should be set)
# - c: how many tries are needed to migrate correctly (0 is normal behavior) (should be set)
# - d: should shutdown be successful (0 = yes, anything else no) (optional)
# - e: return value of $plugin->exists() defaults to 1 if not set (optional)
+# - f: limits the constraints of b and c to the node with this number, i.e. nodeX (0 = apply to all nodes) (optional)
my $decode_id = sub {
my $id = shift;
- my ($start, $migrate, $stop, $exists) = $id =~ /^\d(\d)(\d)(\d)?(\d)?/g;
+ my ($start, $migrate, $stop, $exists, $limit_to_node) = $id =~ /^\d(\d)(\d)(\d)?(\d)?(\d)?/g;
$start = 0 if !defined($start);
$migrate = 0 if !defined($migrate);
$stop = 0 if !defined($stop);
$exists = 1 if !defined($exists);
+ $limit_to_node = 0 if !defined($limit_to_node);
- return ($start, $migrate, $stop, $exists)
+ return ($start, $migrate, $stop, $exists, $limit_to_node);
+};
+
+my $should_retry_action = sub {
+ my ($haenv, $limit_to_node) = @_;
+
+ my ($node) = $haenv->nodename() =~ /^node(\d)/g;
+ $node = 0 if !defined($node);
+
+ return $limit_to_node == 0 || $limit_to_node == $node;
};
my $tries = {
@@ -53,12 +64,14 @@ sub exists {
sub start {
my ($class, $haenv, $id) = @_;
- my ($start_failure_count) = &$decode_id($id);
+ my ($start_failure_count, $limit_to_node) = (&$decode_id($id))[0,4];
- $tries->{start}->{$id} = 0 if !$tries->{start}->{$id};
- $tries->{start}->{$id}++;
+ if ($should_retry_action->($haenv, $limit_to_node)) {
+ $tries->{start}->{$id} = 0 if !$tries->{start}->{$id};
+ $tries->{start}->{$id}++;
- return if $start_failure_count >= $tries->{start}->{$id};
+ return if $start_failure_count >= $tries->{start}->{$id};
+ }
$tries->{start}->{$id} = 0; # reset counts
@@ -79,12 +92,14 @@ sub shutdown {
sub migrate {
my ($class, $haenv, $id, $target, $online) = @_;
- my (undef, $migrate_failure_count) = &$decode_id($id);
+ my ($migrate_failure_count, $limit_to_node) = (&$decode_id($id))[1,4];
- $tries->{migrate}->{$id} = 0 if !$tries->{migrate}->{$id};
- $tries->{migrate}->{$id}++;
+ if ($should_retry_action->($haenv, $limit_to_node)) {
+ $tries->{migrate}->{$id} = 0 if !$tries->{migrate}->{$id};
+ $tries->{migrate}->{$id}++;
- return if $migrate_failure_count >= $tries->{migrate}->{$id};
+ return if $migrate_failure_count >= $tries->{migrate}->{$id};
+ }
$tries->{migrate}->{$id} = 0; # reset counts
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* [pve-devel] [PATCH ha-manager 11/15] test: ha tester: add test cases for strict negative colocation rules
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (10 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 10/15] sim: resources: add option to limit start and migrate tries to node Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-28 13:44 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 12/15] test: ha tester: add test cases for strict positive " Daniel Kral
` (6 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add test cases for strict negative colocation rules, i.e. where services
must be kept on separate nodes. These verify the behavior of services
under strict negative colocation rules when the node of one or more of
these services fails, in the following scenarios:
- 2 neg. colocated services in a 3 node cluster; 1 node failing
- 3 neg. colocated services in a 5 node cluster; 1 node failing
- 3 neg. colocated services in a 5 node cluster; 2 nodes failing
- 2 neg. colocated services in a 3 node cluster; 1 node failing, but the
recovery node cannot start the service
- Pair of 2 neg. colocated services (with one common service in both) in
a 3 node cluster; 1 node failing
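All of these test cases configure the constraint with a strict negative
colocation rule in the test directory's rules_config, for example (taken from
the first test case below):

    colocation: lonely-must-vms-be
        services vm:101,vm:102
        affinity separate
        strict 1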
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
.../test-colocation-strict-separate1/README | 13 +++
.../test-colocation-strict-separate1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 60 ++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-strict-separate2/README | 15 +++
.../test-colocation-strict-separate2/cmdlist | 4 +
.../hardware_status | 7 ++
.../log.expect | 90 ++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 10 ++
.../test-colocation-strict-separate3/README | 16 +++
.../test-colocation-strict-separate3/cmdlist | 4 +
.../hardware_status | 7 ++
.../log.expect | 110 ++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 10 ++
.../test-colocation-strict-separate4/README | 17 +++
.../test-colocation-strict-separate4/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 69 +++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-strict-separate5/README | 11 ++
.../test-colocation-strict-separate5/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 56 +++++++++
.../manager_status | 1 +
.../rules_config | 9 ++
.../service_config | 5 +
35 files changed, 573 insertions(+)
create mode 100644 src/test/test-colocation-strict-separate1/README
create mode 100644 src/test/test-colocation-strict-separate1/cmdlist
create mode 100644 src/test/test-colocation-strict-separate1/hardware_status
create mode 100644 src/test/test-colocation-strict-separate1/log.expect
create mode 100644 src/test/test-colocation-strict-separate1/manager_status
create mode 100644 src/test/test-colocation-strict-separate1/rules_config
create mode 100644 src/test/test-colocation-strict-separate1/service_config
create mode 100644 src/test/test-colocation-strict-separate2/README
create mode 100644 src/test/test-colocation-strict-separate2/cmdlist
create mode 100644 src/test/test-colocation-strict-separate2/hardware_status
create mode 100644 src/test/test-colocation-strict-separate2/log.expect
create mode 100644 src/test/test-colocation-strict-separate2/manager_status
create mode 100644 src/test/test-colocation-strict-separate2/rules_config
create mode 100644 src/test/test-colocation-strict-separate2/service_config
create mode 100644 src/test/test-colocation-strict-separate3/README
create mode 100644 src/test/test-colocation-strict-separate3/cmdlist
create mode 100644 src/test/test-colocation-strict-separate3/hardware_status
create mode 100644 src/test/test-colocation-strict-separate3/log.expect
create mode 100644 src/test/test-colocation-strict-separate3/manager_status
create mode 100644 src/test/test-colocation-strict-separate3/rules_config
create mode 100644 src/test/test-colocation-strict-separate3/service_config
create mode 100644 src/test/test-colocation-strict-separate4/README
create mode 100644 src/test/test-colocation-strict-separate4/cmdlist
create mode 100644 src/test/test-colocation-strict-separate4/hardware_status
create mode 100644 src/test/test-colocation-strict-separate4/log.expect
create mode 100644 src/test/test-colocation-strict-separate4/manager_status
create mode 100644 src/test/test-colocation-strict-separate4/rules_config
create mode 100644 src/test/test-colocation-strict-separate4/service_config
create mode 100644 src/test/test-colocation-strict-separate5/README
create mode 100644 src/test/test-colocation-strict-separate5/cmdlist
create mode 100644 src/test/test-colocation-strict-separate5/hardware_status
create mode 100644 src/test/test-colocation-strict-separate5/log.expect
create mode 100644 src/test/test-colocation-strict-separate5/manager_status
create mode 100644 src/test/test-colocation-strict-separate5/rules_config
create mode 100644 src/test/test-colocation-strict-separate5/service_config
diff --git a/src/test/test-colocation-strict-separate1/README b/src/test/test-colocation-strict-separate1/README
new file mode 100644
index 0000000..5a03d99
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/README
@@ -0,0 +1,13 @@
+Test whether a strict negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other in case of a
+failover of its previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 must be kept separate
+- vm:101 and vm:102 are currently running on node2 and node3 respectively
+- node1 has a higher service count than node2 to test the colocation rule is
+ applied even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, vm:102 is migrated to node1; even though the utilization of
+ node1 is high already, the services must be kept separate
diff --git a/src/test/test-colocation-strict-separate1/cmdlist b/src/test/test-colocation-strict-separate1/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate1/hardware_status b/src/test/test-colocation-strict-separate1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate1/log.expect b/src/test/test-colocation-strict-separate1/log.expect
new file mode 100644
index 0000000..475db39
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/log.expect
@@ -0,0 +1,60 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:102
+info 241 node1/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate1/manager_status b/src/test/test-colocation-strict-separate1/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-colocation-strict-separate1/rules_config b/src/test/test-colocation-strict-separate1/rules_config
new file mode 100644
index 0000000..21c5608
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-must-vms-be
+ services vm:101,vm:102
+ affinity separate
+ strict 1
diff --git a/src/test/test-colocation-strict-separate1/service_config b/src/test/test-colocation-strict-separate1/service_config
new file mode 100644
index 0000000..6582e8c
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate2/README b/src/test/test-colocation-strict-separate2/README
new file mode 100644
index 0000000..f494d2b
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/README
@@ -0,0 +1,15 @@
+Test whether a strict negative colocation rule among three services makes one
+of the services migrate to a different node than the other services in case of
+a failover of the service's previously assigned node.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept separate
+- vm:101, vm:102, and vm:103 are on node3, node4, and node5 respectively
+- node1 and node2 each have a higher service count than node3, node4 and
+ node5 to test that the rule is applied even though the scheduler would prefer
+ the less utilized nodes node3, node4, or node5
+
+Therefore, the expected outcome is:
+- As node5 fails, vm:103 is migrated to node2; even though the utilization of
+ node2 is high already, the services must be kept separate; node2 is chosen
+ since node1 has one more service running on it
diff --git a/src/test/test-colocation-strict-separate2/cmdlist b/src/test/test-colocation-strict-separate2/cmdlist
new file mode 100644
index 0000000..89d09c9
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "power node4 on", "power node5 on" ],
+ [ "network node5 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate2/hardware_status b/src/test/test-colocation-strict-separate2/hardware_status
new file mode 100644
index 0000000..7b8e961
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/hardware_status
@@ -0,0 +1,7 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" },
+ "node4": { "power": "off", "network": "off" },
+ "node5": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate2/log.expect b/src/test/test-colocation-strict-separate2/log.expect
new file mode 100644
index 0000000..858d3c9
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/log.expect
@@ -0,0 +1,90 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node4 on
+info 20 node4/crm: status change startup => wait_for_quorum
+info 20 node4/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node5 on
+info 20 node5/crm: status change startup => wait_for_quorum
+info 20 node5/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node4': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node4'
+info 20 node1/crm: adding new service 'vm:103' on node 'node5'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node1'
+info 20 node1/crm: adding new service 'vm:107' on node 'node2'
+info 20 node1/crm: adding new service 'vm:108' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node4)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node5)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:108': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 21 node1/lrm: starting service vm:106
+info 21 node1/lrm: service status vm:106 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:107
+info 23 node2/lrm: service status vm:107 started
+info 23 node2/lrm: starting service vm:108
+info 23 node2/lrm: service status vm:108 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 26 node4/crm: status change wait_for_quorum => slave
+info 27 node4/lrm: got lock 'ha_agent_node4_lock'
+info 27 node4/lrm: status change wait_for_agent_lock => active
+info 27 node4/lrm: starting service vm:102
+info 27 node4/lrm: service status vm:102 started
+info 28 node5/crm: status change wait_for_quorum => slave
+info 29 node5/lrm: got lock 'ha_agent_node5_lock'
+info 29 node5/lrm: status change wait_for_agent_lock => active
+info 29 node5/lrm: starting service vm:103
+info 29 node5/lrm: service status vm:103 started
+info 120 cmdlist: execute network node5 off
+info 120 node1/crm: node 'node5': state changed from 'online' => 'unknown'
+info 128 node5/crm: status change slave => wait_for_quorum
+info 129 node5/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node5': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node5'
+info 170 watchdog: execute power node5 off
+info 169 node5/crm: killed by poweroff
+info 170 node5/lrm: killed by poweroff
+info 170 hardware: server 'node5' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node5_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node5'
+info 240 node1/crm: node 'node5': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node5'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node5' to node 'node2'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:103
+info 243 node2/lrm: service status vm:103 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate2/manager_status b/src/test/test-colocation-strict-separate2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate2/rules_config b/src/test/test-colocation-strict-separate2/rules_config
new file mode 100644
index 0000000..4167bab
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-must-vms-be
+ services vm:101,vm:102,vm:103
+ affinity separate
+ strict 1
diff --git a/src/test/test-colocation-strict-separate2/service_config b/src/test/test-colocation-strict-separate2/service_config
new file mode 100644
index 0000000..2c27816
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/service_config
@@ -0,0 +1,10 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node4", "state": "started" },
+ "vm:103": { "node": "node5", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node1", "state": "started" },
+ "vm:107": { "node": "node2", "state": "started" },
+ "vm:108": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate3/README b/src/test/test-colocation-strict-separate3/README
new file mode 100644
index 0000000..44d88ef
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/README
@@ -0,0 +1,16 @@
+Test whether a strict negative colocation rule among three services makes two
+of the services migrate to two different recovery nodes than the node of the
+third service in case of a failover of their two previously assigned nodes.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept separate
+- vm:101, vm:102, and vm:103 are respectively on node3, node4, and node5
+- node1 and node2 both have higher service counts than node3, node4 and node5
+ to test that the colocation rule is enforced even though the scheduler would
+ prefer the less utilized nodes node3, node4, or node5
+
+Therefore, the expected outcome is:
+- As node4 and node5 fail, vm:102 and vm:103 are migrated to node2 and node1
+ respectively; even though the utilization of node1 and node2 are high
+ already, the services must be kept separate; node2 is chosen first since
+ node1 has one more service running on it
diff --git a/src/test/test-colocation-strict-separate3/cmdlist b/src/test/test-colocation-strict-separate3/cmdlist
new file mode 100644
index 0000000..1934596
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "power node4 on", "power node5 on" ],
+ [ "network node4 off", "network node5 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate3/hardware_status b/src/test/test-colocation-strict-separate3/hardware_status
new file mode 100644
index 0000000..7b8e961
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/hardware_status
@@ -0,0 +1,7 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" },
+ "node4": { "power": "off", "network": "off" },
+ "node5": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate3/log.expect b/src/test/test-colocation-strict-separate3/log.expect
new file mode 100644
index 0000000..4acdcec
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/log.expect
@@ -0,0 +1,110 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node4 on
+info 20 node4/crm: status change startup => wait_for_quorum
+info 20 node4/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node5 on
+info 20 node5/crm: status change startup => wait_for_quorum
+info 20 node5/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node4': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node4'
+info 20 node1/crm: adding new service 'vm:103' on node 'node5'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node1'
+info 20 node1/crm: adding new service 'vm:107' on node 'node2'
+info 20 node1/crm: adding new service 'vm:108' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node4)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node5)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:108': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 21 node1/lrm: starting service vm:106
+info 21 node1/lrm: service status vm:106 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:107
+info 23 node2/lrm: service status vm:107 started
+info 23 node2/lrm: starting service vm:108
+info 23 node2/lrm: service status vm:108 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 26 node4/crm: status change wait_for_quorum => slave
+info 27 node4/lrm: got lock 'ha_agent_node4_lock'
+info 27 node4/lrm: status change wait_for_agent_lock => active
+info 27 node4/lrm: starting service vm:102
+info 27 node4/lrm: service status vm:102 started
+info 28 node5/crm: status change wait_for_quorum => slave
+info 29 node5/lrm: got lock 'ha_agent_node5_lock'
+info 29 node5/lrm: status change wait_for_agent_lock => active
+info 29 node5/lrm: starting service vm:103
+info 29 node5/lrm: service status vm:103 started
+info 120 cmdlist: execute network node4 off
+info 120 cmdlist: execute network node5 off
+info 120 node1/crm: node 'node4': state changed from 'online' => 'unknown'
+info 120 node1/crm: node 'node5': state changed from 'online' => 'unknown'
+info 126 node4/crm: status change slave => wait_for_quorum
+info 127 node4/lrm: status change active => lost_agent_lock
+info 128 node5/crm: status change slave => wait_for_quorum
+info 129 node5/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node4': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node4'
+info 160 node1/crm: node 'node5': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node5'
+info 168 watchdog: execute power node4 off
+info 167 node4/crm: killed by poweroff
+info 168 node4/lrm: killed by poweroff
+info 168 hardware: server 'node4' stopped by poweroff (watchdog)
+info 170 watchdog: execute power node5 off
+info 169 node5/crm: killed by poweroff
+info 170 node5/lrm: killed by poweroff
+info 170 hardware: server 'node5' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node4_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node4'
+info 240 node1/crm: node 'node4': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node4'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: got lock 'ha_agent_node5_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node5'
+info 240 node1/crm: node 'node5': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node5'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node4' to node 'node2'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node5' to node 'node1'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:103
+info 241 node1/lrm: service status vm:103 started
+info 243 node2/lrm: starting service vm:102
+info 243 node2/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate3/manager_status b/src/test/test-colocation-strict-separate3/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate3/rules_config b/src/test/test-colocation-strict-separate3/rules_config
new file mode 100644
index 0000000..4167bab
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-must-vms-be
+ services vm:101,vm:102,vm:103
+ affinity separate
+ strict 1
diff --git a/src/test/test-colocation-strict-separate3/service_config b/src/test/test-colocation-strict-separate3/service_config
new file mode 100644
index 0000000..2c27816
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/service_config
@@ -0,0 +1,10 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node4", "state": "started" },
+ "vm:103": { "node": "node5", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node1", "state": "started" },
+ "vm:107": { "node": "node2", "state": "started" },
+ "vm:108": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate4/README b/src/test/test-colocation-strict-separate4/README
new file mode 100644
index 0000000..31f127d
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/README
@@ -0,0 +1,17 @@
+Test whether a strict negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other service in
+case of a failover of the service's previously assigned node. As the service
+fails to start on the recovery node (e.g. insufficient resources), the failing
+service is kept on the recovery node.
+
+The test scenario is:
+- vm:101 and fa:120001 must be kept separate
+- vm:101 and fa:120001 are on node2 and node3 respectively
+- fa:120001 will fail to start on node1
+- node1 has a higher service count than node2 to test that the colocation rule
+  is applied even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, fa:120001 is migrated to node1
+- fa:120001 will stay in recovery, since it cannot be started on node1, but
+ cannot be relocated to another one either due to the strict colocation rule
diff --git a/src/test/test-colocation-strict-separate4/cmdlist b/src/test/test-colocation-strict-separate4/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate4/hardware_status b/src/test/test-colocation-strict-separate4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate4/log.expect b/src/test/test-colocation-strict-separate4/log.expect
new file mode 100644
index 0000000..f772ea8
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/log.expect
@@ -0,0 +1,69 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:120001' on node 'node3'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'fa:120001': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service fa:120001
+info 25 node3/lrm: service status fa:120001 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'fa:120001': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'fa:120001': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'fa:120001' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'fa:120001': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service fa:120001
+warn 241 node1/lrm: unable to start service fa:120001
+warn 241 node1/lrm: restart policy: retry number 1 for service 'fa:120001'
+info 261 node1/lrm: starting service fa:120001
+warn 261 node1/lrm: unable to start service fa:120001
+err 261 node1/lrm: unable to start service fa:120001 on local node after 1 retries
+warn 280 node1/crm: starting service fa:120001 on node 'node1' failed, relocating service.
+warn 280 node1/crm: Start Error Recovery: Tried all available nodes for service 'fa:120001', retry start on current node. Tried nodes: node1
+info 281 node1/lrm: starting service fa:120001
+info 281 node1/lrm: service status fa:120001 started
+info 300 node1/crm: relocation policy successful for 'fa:120001' on node 'node1', failed nodes: node1
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate4/manager_status b/src/test/test-colocation-strict-separate4/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate4/rules_config b/src/test/test-colocation-strict-separate4/rules_config
new file mode 100644
index 0000000..3db0056
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-must-vms-be
+ services vm:101,fa:120001
+ affinity separate
+ strict 1
diff --git a/src/test/test-colocation-strict-separate4/service_config b/src/test/test-colocation-strict-separate4/service_config
new file mode 100644
index 0000000..f53c2bc
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "fa:120001": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate5/README b/src/test/test-colocation-strict-separate5/README
new file mode 100644
index 0000000..4cdcbf5
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/README
@@ -0,0 +1,11 @@
+Test whether two pairwise strict negative colocation rules, i.e. where one
+service is in two separate non-colocation relationships with two other
+services, make one of the outer services migrate to the same node as the other
+outer service in case of a failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102, and vm:101 and vm:103 must each be kept separate
+- vm:101, vm:102, and vm:103 are respectively on node1, node2, and node3
+
+Therefore, the expected outcome is:
+- As node3 fails, vm:103 is migrated to node2 - the same as vm:102
diff --git a/src/test/test-colocation-strict-separate5/cmdlist b/src/test/test-colocation-strict-separate5/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate5/hardware_status b/src/test/test-colocation-strict-separate5/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate5/log.expect b/src/test/test-colocation-strict-separate5/log.expect
new file mode 100644
index 0000000..16156ad
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/log.expect
@@ -0,0 +1,56 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:103
+info 243 node2/lrm: service status vm:103 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate5/manager_status b/src/test/test-colocation-strict-separate5/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate5/rules_config b/src/test/test-colocation-strict-separate5/rules_config
new file mode 100644
index 0000000..f72fc66
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/rules_config
@@ -0,0 +1,9 @@
+colocation: lonely-must-some-vms-be1
+ services vm:101,vm:102
+ affinity separate
+ strict 1
+
+colocation: lonely-must-some-vms-be2
+ services vm:101,vm:103
+ affinity separate
+ strict 1
diff --git a/src/test/test-colocation-strict-separate5/service_config b/src/test/test-colocation-strict-separate5/service_config
new file mode 100644
index 0000000..4b26f6b
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" }
+}
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* [pve-devel] [PATCH ha-manager 12/15] test: ha tester: add test cases for strict positive colocation rules
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (11 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 11/15] test: ha tester: add test cases for strict negative colocation rules Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-28 13:51 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose " Daniel Kral
` (5 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add test cases for strict positive colocation rules, i.e. where services
must be kept together on the same node. These verify the behavior of such
services in case of a failover of their assigned node in the following
scenarios:
- 2 pos. colocated services in a 3 node cluster; 1 node failing
- 3 pos. colocated services in a 3 node cluster; 1 node failing
- 3 pos. colocated services in a 3 node cluster; 1 node failing, but the
recovery node cannot start one of the services
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
The first test case can probably be dropped, since the second test case
shows the exact same behavior, just with a third service added.
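For reference, each test directory's rules_config declares the rule under
test in the section config format introduced earlier in this series. A
minimal sketch of a strict positive colocation rule, mirroring the
fixtures below (rule id and service ids as used in these tests):

    colocation: vms-must-stick-together
        services vm:101,vm:102
        affinity together
        strict 1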
.../test-colocation-strict-together1/README | 11 +++
.../test-colocation-strict-together1/cmdlist | 4 +
.../hardware_status | 5 ++
.../log.expect | 66 ++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 ++
.../test-colocation-strict-together2/README | 11 +++
.../test-colocation-strict-together2/cmdlist | 4 +
.../hardware_status | 5 ++
.../log.expect | 80 +++++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 8 ++
.../test-colocation-strict-together3/README | 17 ++++
.../test-colocation-strict-together3/cmdlist | 4 +
.../hardware_status | 5 ++
.../log.expect | 89 +++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 8 ++
21 files changed, 338 insertions(+)
create mode 100644 src/test/test-colocation-strict-together1/README
create mode 100644 src/test/test-colocation-strict-together1/cmdlist
create mode 100644 src/test/test-colocation-strict-together1/hardware_status
create mode 100644 src/test/test-colocation-strict-together1/log.expect
create mode 100644 src/test/test-colocation-strict-together1/manager_status
create mode 100644 src/test/test-colocation-strict-together1/rules_config
create mode 100644 src/test/test-colocation-strict-together1/service_config
create mode 100644 src/test/test-colocation-strict-together2/README
create mode 100644 src/test/test-colocation-strict-together2/cmdlist
create mode 100644 src/test/test-colocation-strict-together2/hardware_status
create mode 100644 src/test/test-colocation-strict-together2/log.expect
create mode 100644 src/test/test-colocation-strict-together2/manager_status
create mode 100644 src/test/test-colocation-strict-together2/rules_config
create mode 100644 src/test/test-colocation-strict-together2/service_config
create mode 100644 src/test/test-colocation-strict-together3/README
create mode 100644 src/test/test-colocation-strict-together3/cmdlist
create mode 100644 src/test/test-colocation-strict-together3/hardware_status
create mode 100644 src/test/test-colocation-strict-together3/log.expect
create mode 100644 src/test/test-colocation-strict-together3/manager_status
create mode 100644 src/test/test-colocation-strict-together3/rules_config
create mode 100644 src/test/test-colocation-strict-together3/service_config
diff --git a/src/test/test-colocation-strict-together1/README b/src/test/test-colocation-strict-together1/README
new file mode 100644
index 0000000..ab8a7d5
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/README
@@ -0,0 +1,11 @@
+Test whether a strict positive colocation rule makes two services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 must be kept together
+- vm:101 and vm:102 are both currently running on node3
+- node1 and node2 have the same service count to test that the rule is applied
+  even though the services would usually be spread across both remaining nodes
+
+Therefore, the expected outcome is:
+- As node3 fails, both services are migrated to node1
diff --git a/src/test/test-colocation-strict-together1/cmdlist b/src/test/test-colocation-strict-together1/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-together1/hardware_status b/src/test/test-colocation-strict-together1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-together1/log.expect b/src/test/test-colocation-strict-together1/log.expect
new file mode 100644
index 0000000..7d43314
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:101
+info 241 node1/lrm: service status vm:101 started
+info 241 node1/lrm: starting service vm:102
+info 241 node1/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-together1/manager_status b/src/test/test-colocation-strict-together1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-together1/rules_config b/src/test/test-colocation-strict-together1/rules_config
new file mode 100644
index 0000000..e6bd30b
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/rules_config
@@ -0,0 +1,4 @@
+colocation: vms-must-stick-together
+ services vm:101,vm:102
+ affinity together
+ strict 1
diff --git a/src/test/test-colocation-strict-together1/service_config b/src/test/test-colocation-strict-together1/service_config
new file mode 100644
index 0000000..9fb091d
--- /dev/null
+++ b/src/test/test-colocation-strict-together1/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-together2/README b/src/test/test-colocation-strict-together2/README
new file mode 100644
index 0000000..c1abf68
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/README
@@ -0,0 +1,11 @@
+Test whether a strict positive colocation rule makes three services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept together
+- vm:101, vm:102, and vm:103 are all currently running on node3
+- node1 has a higher service count than node2 to test that the rule is applied
+  even though the services would usually be spread across both remaining nodes
+
+Therefore, the expected outcome is:
+- As node3 fails, all services are migrated to node2
diff --git a/src/test/test-colocation-strict-together2/cmdlist b/src/test/test-colocation-strict-together2/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-together2/hardware_status b/src/test/test-colocation-strict-together2/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-together2/log.expect b/src/test/test-colocation-strict-together2/log.expect
new file mode 100644
index 0000000..78f4d66
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/log.expect
@@ -0,0 +1,80 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:106
+info 23 node2/lrm: service status vm:106 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 243 node2/lrm: starting service vm:102
+info 243 node2/lrm: service status vm:102 started
+info 243 node2/lrm: starting service vm:103
+info 243 node2/lrm: service status vm:103 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-together2/manager_status b/src/test/test-colocation-strict-together2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-together2/rules_config b/src/test/test-colocation-strict-together2/rules_config
new file mode 100644
index 0000000..904dc1f
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/rules_config
@@ -0,0 +1,4 @@
+colocation: vms-must-stick-together
+ services vm:101,vm:102,vm:103
+ affinity together
+ strict 1
diff --git a/src/test/test-colocation-strict-together2/service_config b/src/test/test-colocation-strict-together2/service_config
new file mode 100644
index 0000000..fd4a87e
--- /dev/null
+++ b/src/test/test-colocation-strict-together2/service_config
@@ -0,0 +1,8 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-together3/README b/src/test/test-colocation-strict-together3/README
new file mode 100644
index 0000000..5332696
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/README
@@ -0,0 +1,17 @@
+Test whether a strict positive colocation rule makes three services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+If one of those fails to start on the recovery node (e.g. insufficient
+resources), the failing service will be kept on the recovery node.
+
+The test scenario is:
+- vm:101, vm:102, and fa:120002 must be kept together
+- vm:101, vm:102, and fa:120002 are all currently running on node3
+- fa:120002 will fail to start on node2
+- node1 has a higher service count than node2 to test that the rule is applied
+  even though the services would usually be spread across both remaining nodes
+
+Therefore, the expected outcome is:
+- As node3 fails, all services are migrated to node2
+- Two of those services will start successfully, but fa:120002 will stay in
+ recovery, since it cannot be started on this node, but cannot be relocated to
+ another one either due to the strict colocation rule
diff --git a/src/test/test-colocation-strict-together3/cmdlist b/src/test/test-colocation-strict-together3/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-together3/hardware_status b/src/test/test-colocation-strict-together3/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-together3/log.expect b/src/test/test-colocation-strict-together3/log.expect
new file mode 100644
index 0000000..4a54cb3
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/log.expect
@@ -0,0 +1,89 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:120002' on node 'node3'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node2'
+info 20 node1/crm: service 'fa:120002': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:106
+info 23 node2/lrm: service status vm:106 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service fa:120002
+info 25 node3/lrm: service status fa:120002 started
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'fa:120002': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'fa:120002': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'fa:120002' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'fa:120002': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service fa:120002
+warn 243 node2/lrm: unable to start service fa:120002
+warn 243 node2/lrm: restart policy: retry number 1 for service 'fa:120002'
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 243 node2/lrm: starting service vm:102
+info 243 node2/lrm: service status vm:102 started
+info 263 node2/lrm: starting service fa:120002
+warn 263 node2/lrm: unable to start service fa:120002
+err 263 node2/lrm: unable to start service fa:120002 on local node after 1 retries
+warn 280 node1/crm: starting service fa:120002 on node 'node2' failed, relocating service.
+warn 280 node1/crm: Start Error Recovery: Tried all available nodes for service 'fa:120002', retry start on current node. Tried nodes: node2
+info 283 node2/lrm: starting service fa:120002
+info 283 node2/lrm: service status fa:120002 started
+info 300 node1/crm: relocation policy successful for 'fa:120002' on node 'node2', failed nodes: node2
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-together3/manager_status b/src/test/test-colocation-strict-together3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-colocation-strict-together3/rules_config b/src/test/test-colocation-strict-together3/rules_config
new file mode 100644
index 0000000..5feafb5
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/rules_config
@@ -0,0 +1,4 @@
+colocation: vms-must-stick-together
+ services vm:101,vm:102,fa:120002
+ affinity together
+ strict 1
diff --git a/src/test/test-colocation-strict-together3/service_config b/src/test/test-colocation-strict-together3/service_config
new file mode 100644
index 0000000..3ce5f27
--- /dev/null
+++ b/src/test/test-colocation-strict-together3/service_config
@@ -0,0 +1,8 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "fa:120002": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node2", "state": "started" }
+}
--
2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* [pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose colocation rules
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (12 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 12/15] test: ha tester: add test cases for strict positive " Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-28 14:44 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 14/15] test: ha tester: add test cases in more complex scenarios Daniel Kral
` (4 subsequent siblings)
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add test cases for loose positive and negative colocation rules, i.e.
where services should be kept together on the same node or kept on
separate nodes. These are copies of their strict counterpart tests, but
verify the behavior when the colocation rule cannot be met, i.e. that
the rule is then not adhered to. The test scenarios are:
- 2 neg. colocated services in a 3 node cluster; 1 node failing
- 2 neg. colocated services in a 3 node cluster; 1 node failing, but the
recovery node cannot start the service
- 2 pos. colocated services in a 3 node cluster; 1 node failing
- 3 pos. colocated services in a 3 node cluster; 1 node failing, but the
recovery node cannot start one of the services
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
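For reference, the loose variants differ from their strict counterparts
only in the rule id and the strict flag. A minimal sketch of a loose
negative colocation rule, mirroring the loose-separate1 fixture below:

    colocation: lonely-should-vms-be
        services vm:101,vm:102
        affinity separate
        strict 0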
.../test-colocation-loose-separate1/README | 13 +++
.../test-colocation-loose-separate1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 60 ++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 ++
.../test-colocation-loose-separate4/README | 17 ++++
.../test-colocation-loose-separate4/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 73 +++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 ++
.../test-colocation-loose-together1/README | 11 +++
.../test-colocation-loose-together1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 66 +++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 ++
.../test-colocation-loose-together3/README | 16 ++++
.../test-colocation-loose-together3/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 93 +++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 8 ++
28 files changed, 431 insertions(+)
create mode 100644 src/test/test-colocation-loose-separate1/README
create mode 100644 src/test/test-colocation-loose-separate1/cmdlist
create mode 100644 src/test/test-colocation-loose-separate1/hardware_status
create mode 100644 src/test/test-colocation-loose-separate1/log.expect
create mode 100644 src/test/test-colocation-loose-separate1/manager_status
create mode 100644 src/test/test-colocation-loose-separate1/rules_config
create mode 100644 src/test/test-colocation-loose-separate1/service_config
create mode 100644 src/test/test-colocation-loose-separate4/README
create mode 100644 src/test/test-colocation-loose-separate4/cmdlist
create mode 100644 src/test/test-colocation-loose-separate4/hardware_status
create mode 100644 src/test/test-colocation-loose-separate4/log.expect
create mode 100644 src/test/test-colocation-loose-separate4/manager_status
create mode 100644 src/test/test-colocation-loose-separate4/rules_config
create mode 100644 src/test/test-colocation-loose-separate4/service_config
create mode 100644 src/test/test-colocation-loose-together1/README
create mode 100644 src/test/test-colocation-loose-together1/cmdlist
create mode 100644 src/test/test-colocation-loose-together1/hardware_status
create mode 100644 src/test/test-colocation-loose-together1/log.expect
create mode 100644 src/test/test-colocation-loose-together1/manager_status
create mode 100644 src/test/test-colocation-loose-together1/rules_config
create mode 100644 src/test/test-colocation-loose-together1/service_config
create mode 100644 src/test/test-colocation-loose-together3/README
create mode 100644 src/test/test-colocation-loose-together3/cmdlist
create mode 100644 src/test/test-colocation-loose-together3/hardware_status
create mode 100644 src/test/test-colocation-loose-together3/log.expect
create mode 100644 src/test/test-colocation-loose-together3/manager_status
create mode 100644 src/test/test-colocation-loose-together3/rules_config
create mode 100644 src/test/test-colocation-loose-together3/service_config
diff --git a/src/test/test-colocation-loose-separate1/README b/src/test/test-colocation-loose-separate1/README
new file mode 100644
index 0000000..ac7c395
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/README
@@ -0,0 +1,13 @@
+Test whether a loose negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other in case of a
+failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 should be kept separate
+- vm:101 and vm:102 are currently running on node2 and node3 respectively
+- node1 has a higher service count than node2 to test that the rule is applied
+ even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, vm:102 is migrated to node1; even though the utilization of
+  node1 is already high, the services are still kept separate
diff --git a/src/test/test-colocation-loose-separate1/cmdlist b/src/test/test-colocation-loose-separate1/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-separate1/hardware_status b/src/test/test-colocation-loose-separate1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-separate1/log.expect b/src/test/test-colocation-loose-separate1/log.expect
new file mode 100644
index 0000000..475db39
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/log.expect
@@ -0,0 +1,60 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:102
+info 241 node1/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-separate1/manager_status b/src/test/test-colocation-loose-separate1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-loose-separate1/rules_config b/src/test/test-colocation-loose-separate1/rules_config
new file mode 100644
index 0000000..5227309
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-should-vms-be
+ services vm:101,vm:102
+ affinity separate
+ strict 0
diff --git a/src/test/test-colocation-loose-separate1/service_config b/src/test/test-colocation-loose-separate1/service_config
new file mode 100644
index 0000000..6582e8c
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-loose-separate4/README b/src/test/test-colocation-loose-separate4/README
new file mode 100644
index 0000000..5b68cde
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/README
@@ -0,0 +1,17 @@
+Test whether a loose negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other service in
+case of a failover of the service's previously assigned node. As the service
+fails to start on the recovery node (e.g. insufficient resources), the failing
+service is relocated to another node, unlike with a strict colocation rule.
+
+The test scenario is:
+- vm:101 and fa:120001 should be kept separate
+- vm:101 and fa:120001 are on node2 and node3 respectively
+- fa:120001 will fail to start on node1
+- node1 has a higher service count than node2 to test that the colocation rule
+  is applied even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, fa:120001 is migrated to node1
+- fa:120001 will be relocated to another node, since it couldn't start on its
+ initial recovery node
diff --git a/src/test/test-colocation-loose-separate4/cmdlist b/src/test/test-colocation-loose-separate4/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-separate4/hardware_status b/src/test/test-colocation-loose-separate4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-separate4/log.expect b/src/test/test-colocation-loose-separate4/log.expect
new file mode 100644
index 0000000..bf70aca
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/log.expect
@@ -0,0 +1,73 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:120001' on node 'node3'
+info 20 node1/crm: adding new service 'vm:101' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'fa:120001': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:101
+info 23 node2/lrm: service status vm:101 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service fa:120001
+info 25 node3/lrm: service status fa:120001 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'fa:120001': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'fa:120001': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'fa:120001' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'fa:120001': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service fa:120001
+warn 241 node1/lrm: unable to start service fa:120001
+warn 241 node1/lrm: restart policy: retry number 1 for service 'fa:120001'
+info 261 node1/lrm: starting service fa:120001
+warn 261 node1/lrm: unable to start service fa:120001
+err 261 node1/lrm: unable to start service fa:120001 on local node after 1 retries
+warn 280 node1/crm: starting service fa:120001 on node 'node1' failed, relocating service.
+info 280 node1/crm: relocate service 'fa:120001' to node 'node2'
+info 280 node1/crm: service 'fa:120001': state changed from 'started' to 'relocate' (node = node1, target = node2)
+info 281 node1/lrm: service fa:120001 - start relocate to node 'node2'
+info 281 node1/lrm: service fa:120001 - end relocate to node 'node2'
+info 300 node1/crm: service 'fa:120001': state changed from 'relocate' to 'started' (node = node2)
+info 303 node2/lrm: starting service fa:120001
+info 303 node2/lrm: service status fa:120001 started
+info 320 node1/crm: relocation policy successful for 'fa:120001' on node 'node2', failed nodes: node1
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-separate4/manager_status b/src/test/test-colocation-loose-separate4/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-loose-separate4/rules_config b/src/test/test-colocation-loose-separate4/rules_config
new file mode 100644
index 0000000..8a4b869
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-should-vms-be
+ services vm:101,fa:120001
+ affinity separate
+ strict 0
diff --git a/src/test/test-colocation-loose-separate4/service_config b/src/test/test-colocation-loose-separate4/service_config
new file mode 100644
index 0000000..f53c2bc
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "fa:120001": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-loose-together1/README b/src/test/test-colocation-loose-together1/README
new file mode 100644
index 0000000..2f5aeec
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/README
@@ -0,0 +1,11 @@
+Test whether a loose positive colocation rule makes two services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 should be kept together
+- vm:101 and vm:102 are both currently running on node3
+- node1 and node2 have the same service count to test that the rule is applied
+  even though the services would usually be balanced between both remaining nodes
+
+Therefore, the expected outcome is:
+- As node3 fails, both services are migrated to node1
diff --git a/src/test/test-colocation-loose-together1/cmdlist b/src/test/test-colocation-loose-together1/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-together1/hardware_status b/src/test/test-colocation-loose-together1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-together1/log.expect b/src/test/test-colocation-loose-together1/log.expect
new file mode 100644
index 0000000..7d43314
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/log.expect
@@ -0,0 +1,66 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:101
+info 241 node1/lrm: service status vm:101 started
+info 241 node1/lrm: starting service vm:102
+info 241 node1/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-together1/manager_status b/src/test/test-colocation-loose-together1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-loose-together1/rules_config b/src/test/test-colocation-loose-together1/rules_config
new file mode 100644
index 0000000..37f6aab
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/rules_config
@@ -0,0 +1,4 @@
+colocation: vms-might-stick-together
+ services vm:101,vm:102
+ affinity together
+ strict 0
diff --git a/src/test/test-colocation-loose-together1/service_config b/src/test/test-colocation-loose-together1/service_config
new file mode 100644
index 0000000..9fb091d
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-loose-together3/README b/src/test/test-colocation-loose-together3/README
new file mode 100644
index 0000000..c2aebcf
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/README
@@ -0,0 +1,16 @@
+Test whether a loose positive colocation rule makes three services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+If one of those fails to start on the recovery node (e.g. insufficient
+resources), the failed service will be relocated to another node.
+
+The test scenario is:
+- vm:101, vm:102, and fa:120002 should be kept together
+- vm:101, vm:102, and fa:120002 are all currently running on node3
+- fa:120002 will fail to start on node2
+- node1 has a higher service count than node2 to test that the rule is applied
+  even though the services would usually be balanced between both remaining nodes
+
+Therefore, the expected outcome is:
+- As node3 fails, all services are migrated to node2
+- Two of those services will start successfully, but fa:120002 will be
+ relocated to another node, since it couldn't start on the same recovery node
diff --git a/src/test/test-colocation-loose-together3/cmdlist b/src/test/test-colocation-loose-together3/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-together3/hardware_status b/src/test/test-colocation-loose-together3/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-together3/log.expect b/src/test/test-colocation-loose-together3/log.expect
new file mode 100644
index 0000000..6ca8053
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/log.expect
@@ -0,0 +1,93 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:120002' on node 'node3'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node2'
+info 20 node1/crm: service 'fa:120002': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:106
+info 23 node2/lrm: service status vm:106 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service fa:120002
+info 25 node3/lrm: service status fa:120002 started
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
+info 25 node3/lrm: starting service vm:102
+info 25 node3/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'fa:120002': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'fa:120002': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'fa:120002' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'fa:120002': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:101': state changed from 'recovery' to 'started' (node = node2)
+info 240 node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service fa:120002
+warn 243 node2/lrm: unable to start service fa:120002
+warn 243 node2/lrm: restart policy: retry number 1 for service 'fa:120002'
+info 243 node2/lrm: starting service vm:101
+info 243 node2/lrm: service status vm:101 started
+info 243 node2/lrm: starting service vm:102
+info 243 node2/lrm: service status vm:102 started
+info 263 node2/lrm: starting service fa:120002
+warn 263 node2/lrm: unable to start service fa:120002
+err 263 node2/lrm: unable to start service fa:120002 on local node after 1 retries
+warn 280 node1/crm: starting service fa:120002 on node 'node2' failed, relocating service.
+info 280 node1/crm: relocate service 'fa:120002' to node 'node1'
+info 280 node1/crm: service 'fa:120002': state changed from 'started' to 'relocate' (node = node2, target = node1)
+info 283 node2/lrm: service fa:120002 - start relocate to node 'node1'
+info 283 node2/lrm: service fa:120002 - end relocate to node 'node1'
+info 300 node1/crm: service 'fa:120002': state changed from 'relocate' to 'started' (node = node1)
+info 301 node1/lrm: starting service fa:120002
+info 301 node1/lrm: service status fa:120002 started
+info 320 node1/crm: relocation policy successful for 'fa:120002' on node 'node1', failed nodes: node2
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-together3/manager_status b/src/test/test-colocation-loose-together3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-colocation-loose-together3/rules_config b/src/test/test-colocation-loose-together3/rules_config
new file mode 100644
index 0000000..b43c087
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/rules_config
@@ -0,0 +1,4 @@
+colocation: vms-might-stick-together
+ services vm:101,vm:102,fa:120002
+ affinity together
+ strict 0
diff --git a/src/test/test-colocation-loose-together3/service_config b/src/test/test-colocation-loose-together3/service_config
new file mode 100644
index 0000000..3ce5f27
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/service_config
@@ -0,0 +1,8 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "fa:120002": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" },
+ "vm:106": { "node": "node2", "state": "started" }
+}
--
2.39.5
* [pve-devel] [PATCH ha-manager 14/15] test: ha tester: add test cases in more complex scenarios
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (13 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose " Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-04-29 8:54 ` Fiona Ebner
2025-04-29 9:01 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 15/15] test: add test cases for rules config Daniel Kral
` (3 subsequent siblings)
18 siblings, 2 replies; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add test cases where colocation rules are used with the static
utilization scheduler and the rebalance on start option enabled. These
verify the behavior in the following scenarios:
- 7 services with intertwined colocation rules in a 3 node cluster;
  1 node failing
- 3 neg. colocated services in a 3 node cluster, where the rules are
  stated in an intransitive form; 1 node failing
- 5 neg. colocated services in a 5 node cluster, where the rules are
  stated in an intransitive form; 2 nodes failing
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
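As a quick illustration of the "intransitive" (pairwise) form mentioned above,
here is a minimal sketch in the same section-config syntax as the rules_config
fixtures below; the rule names and service IDs in this sketch are purely
illustrative and not taken from the test cases:

colocation: keep-apart-ab
    services vm:201,vm:202
    affinity separate
    strict 1

colocation: keep-apart-bc
    services vm:202,vm:203
    affinity separate
    strict 1

colocation: keep-apart-ac
    services vm:201,vm:203
    affinity separate
    strict 1

As stated in the test READMEs, the scheduler is expected to treat these
pairwise rules the same as a single rule covering all members:

colocation: keep-apart-abc
    services vm:201,vm:202,vm:203
    affinity separate
    strict 1
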
.../test-crs-static-rebalance-coloc1/README | 26 +++
.../test-crs-static-rebalance-coloc1/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 5 +
.../log.expect | 120 ++++++++++++++
.../manager_status | 1 +
.../rules_config | 24 +++
.../service_config | 10 ++
.../static_service_stats | 10 ++
.../test-crs-static-rebalance-coloc2/README | 16 ++
.../test-crs-static-rebalance-coloc2/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 5 +
.../log.expect | 86 ++++++++++
.../manager_status | 1 +
.../rules_config | 14 ++
.../service_config | 5 +
.../static_service_stats | 5 +
.../test-crs-static-rebalance-coloc3/README | 14 ++
.../test-crs-static-rebalance-coloc3/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 7 +
.../log.expect | 156 ++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 49 ++++++
.../service_config | 7 +
.../static_service_stats | 5 +
27 files changed, 597 insertions(+)
create mode 100644 src/test/test-crs-static-rebalance-coloc1/README
create mode 100644 src/test/test-crs-static-rebalance-coloc1/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc1/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc1/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc1/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc1/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc1/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc1/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc1/static_service_stats
create mode 100644 src/test/test-crs-static-rebalance-coloc2/README
create mode 100644 src/test/test-crs-static-rebalance-coloc2/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc2/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc2/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc2/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc2/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc2/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc2/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc2/static_service_stats
create mode 100644 src/test/test-crs-static-rebalance-coloc3/README
create mode 100644 src/test/test-crs-static-rebalance-coloc3/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc3/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc3/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc3/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc3/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc3/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc3/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc3/static_service_stats
diff --git a/src/test/test-crs-static-rebalance-coloc1/README b/src/test/test-crs-static-rebalance-coloc1/README
new file mode 100644
index 0000000..c709f45
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/README
@@ -0,0 +1,26 @@
+Test whether a mixed set of strict colocation rules, in conjunction with the
+static load scheduler with auto-rebalancing enabled, is applied correctly on
+service start and in case of a subsequent failover.
+
+The test scenario is:
+- vm:101 and vm:102 are not bound to each other by any colocation rule
+- Services that must be kept together:
+ - vm:102, and vm:107
+ - vm:104, vm:106, and vm:108
+- Services that must be kept separate:
+ - vm:103, vm:104, and vm:105
+ - vm:103, vm:106, and vm:107
+ - vm:107, and vm:108
+- Therefore, there are consistent interdependencies between the positive and
+ negative colocation rules' service members
+- vm:101 and vm:102 are currently assigned to node1 and node2 respectively
+- vm:103 through vm:108 are currently assigned to node3
+
+Therefore, the expected outcome is:
+- vm:101, vm:102, and vm:103 should be started on node1, node2, and node3
+  respectively, as there's nothing else running on them yet
+- vm:104, vm:106, and vm:108 should all be assigned to the same node, which
+  will be node1, since it has the most resources left for vm:104
+- vm:105 and vm:107 should both be assigned to the same node, which will be
+  node2, since neither can be assigned to the other nodes because of the
+  colocation constraints
diff --git a/src/test/test-crs-static-rebalance-coloc1/cmdlist b/src/test/test-crs-static-rebalance-coloc1/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-crs-static-rebalance-coloc1/datacenter.cfg b/src/test/test-crs-static-rebalance-coloc1/datacenter.cfg
new file mode 100644
index 0000000..f2671a5
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-rebalance-on-start": 1
+ }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc1/hardware_status b/src/test/test-crs-static-rebalance-coloc1/hardware_status
new file mode 100644
index 0000000..84484af
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc1/log.expect b/src/test/test-crs-static-rebalance-coloc1/log.expect
new file mode 100644
index 0000000..cdd2497
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/log.expect
@@ -0,0 +1,120 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: adding new service 'vm:108' on node 'node3'
+info 20 node1/crm: service vm:101: re-balance selected current node node1 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service vm:102: re-balance selected current node node2 for startup
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service vm:103: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service vm:104: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:105: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 20 node1/crm: service vm:106: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:107: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 20 node1/crm: service vm:108: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:108': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:102
+info 23 node2/lrm: service status vm:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 25 node3/lrm: service vm:104 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:104 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:105 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:105 - end relocate to node 'node2'
+info 25 node3/lrm: service vm:106 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:106 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:107 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:107 - end relocate to node 'node2'
+info 25 node3/lrm: service vm:108 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:108 - end relocate to node 'node1'
+info 40 node1/crm: service 'vm:104': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:105': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:106': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:107': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:108': state changed from 'request_start_balance' to 'started' (node = node1)
+info 41 node1/lrm: starting service vm:104
+info 41 node1/lrm: service status vm:104 started
+info 41 node1/lrm: starting service vm:106
+info 41 node1/lrm: service status vm:106 started
+info 41 node1/lrm: starting service vm:108
+info 41 node1/lrm: service status vm:108 started
+info 43 node2/lrm: starting service vm:105
+info 43 node2/lrm: service status vm:105 started
+info 43 node2/lrm: starting service vm:107
+info 43 node2/lrm: service status vm:107 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+err 240 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 260 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 280 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 300 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 320 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 340 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 360 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 380 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 400 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 420 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 440 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 460 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 480 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 500 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 520 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 540 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 560 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 580 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 600 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 620 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 640 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 660 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 680 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 700 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-rebalance-coloc1/manager_status b/src/test/test-crs-static-rebalance-coloc1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-rebalance-coloc1/rules_config b/src/test/test-crs-static-rebalance-coloc1/rules_config
new file mode 100644
index 0000000..129778f
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/rules_config
@@ -0,0 +1,24 @@
+colocation: vms-must-stick-together1
+ services vm:102,vm:107
+ affinity together
+ strict 1
+
+colocation: vms-must-stick-together2
+ services vm:104,vm:106,vm:108
+ affinity together
+ strict 1
+
+colocation: vms-must-stay-apart1
+ services vm:103,vm:104,vm:105
+ affinity separate
+ strict 1
+
+colocation: vms-must-stay-apart2
+ services vm:103,vm:106,vm:107
+ affinity separate
+ strict 1
+
+colocation: vms-must-stay-apart3
+ services vm:107,vm:108
+ affinity separate
+ strict 1
diff --git a/src/test/test-crs-static-rebalance-coloc1/service_config b/src/test/test-crs-static-rebalance-coloc1/service_config
new file mode 100644
index 0000000..02e4a07
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/service_config
@@ -0,0 +1,10 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node2", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node3", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" },
+ "vm:108": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc1/static_service_stats b/src/test/test-crs-static-rebalance-coloc1/static_service_stats
new file mode 100644
index 0000000..c6472ca
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc1/static_service_stats
@@ -0,0 +1,10 @@
+{
+ "vm:101": { "maxcpu": 8, "maxmem": 16000000000 },
+ "vm:102": { "maxcpu": 4, "maxmem": 24000000000 },
+ "vm:103": { "maxcpu": 2, "maxmem": 32000000000 },
+ "vm:104": { "maxcpu": 4, "maxmem": 48000000000 },
+ "vm:105": { "maxcpu": 8, "maxmem": 16000000000 },
+ "vm:106": { "maxcpu": 4, "maxmem": 32000000000 },
+ "vm:107": { "maxcpu": 2, "maxmem": 64000000000 },
+ "vm:108": { "maxcpu": 8, "maxmem": 48000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc2/README b/src/test/test-crs-static-rebalance-coloc2/README
new file mode 100644
index 0000000..1b788f8
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/README
@@ -0,0 +1,16 @@
+Test whether a set of transitive strict negative colocation rules, i.e. there
+are negative colocation relations a->b, b->c, and a->c, in conjunction with the
+static load scheduler with auto-rebalancing, is applied correctly on service
+start and in case of a subsequent failover.
+
+The test scenario is:
+- vm:101 and vm:102 must be kept separate
+- vm:102 and vm:103 must be kept separate
+- vm:101 and vm:103 must be kept separate
+- Therefore, vm:101, vm:102, and vm:103 must be kept separate
+
+Therefore, the expected outcome is:
+- vm:101, vm:102, and vm:103 should be started on node1, node2, and node3
+  respectively, just as if the three negative colocation rules had been
+  stated as a single negative colocation rule
+- As node3 fails, vm:103 cannot be recovered
diff --git a/src/test/test-crs-static-rebalance-coloc2/cmdlist b/src/test/test-crs-static-rebalance-coloc2/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-crs-static-rebalance-coloc2/datacenter.cfg b/src/test/test-crs-static-rebalance-coloc2/datacenter.cfg
new file mode 100644
index 0000000..f2671a5
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-rebalance-on-start": 1
+ }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc2/hardware_status b/src/test/test-crs-static-rebalance-coloc2/hardware_status
new file mode 100644
index 0000000..84484af
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc2/log.expect b/src/test/test-crs-static-rebalance-coloc2/log.expect
new file mode 100644
index 0000000..c59f286
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/log.expect
@@ -0,0 +1,86 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: service vm:101: re-balance selected current node node1 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service vm:102: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node2)
+info 20 node1/crm: service vm:103: re-balance selected new node node3 for startup
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: service vm:102 - start relocate to node 'node2'
+info 21 node1/lrm: service vm:102 - end relocate to node 'node2'
+info 21 node1/lrm: service vm:103 - start relocate to node 'node3'
+info 21 node1/lrm: service vm:103 - end relocate to node 'node3'
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'vm:102': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:103': state changed from 'request_start_balance' to 'started' (node = node3)
+info 43 node2/lrm: got lock 'ha_agent_node2_lock'
+info 43 node2/lrm: status change wait_for_agent_lock => active
+info 43 node2/lrm: starting service vm:102
+info 43 node2/lrm: service status vm:102 started
+info 45 node3/lrm: got lock 'ha_agent_node3_lock'
+info 45 node3/lrm: status change wait_for_agent_lock => active
+info 45 node3/lrm: starting service vm:103
+info 45 node3/lrm: service status vm:103 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+err 240 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 260 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 280 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 300 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 320 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 340 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 360 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 380 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 400 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 420 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 440 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 460 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 480 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 500 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 520 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 540 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 560 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 580 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 600 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 620 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 640 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 660 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 680 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+err 700 node1/crm: recovering service 'vm:103' from fenced node 'node3' failed, no recovery node found
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-rebalance-coloc2/manager_status b/src/test/test-crs-static-rebalance-coloc2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-rebalance-coloc2/rules_config b/src/test/test-crs-static-rebalance-coloc2/rules_config
new file mode 100644
index 0000000..1545064
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/rules_config
@@ -0,0 +1,14 @@
+colocation: very-lonely-services1
+ services vm:101,vm:102
+ affinity separate
+ strict 1
+
+colocation: very-lonely-services2
+ services vm:102,vm:103
+ affinity separate
+ strict 1
+
+colocation: very-lonely-services3
+ services vm:101,vm:103
+ affinity separate
+ strict 1
diff --git a/src/test/test-crs-static-rebalance-coloc2/service_config b/src/test/test-crs-static-rebalance-coloc2/service_config
new file mode 100644
index 0000000..57e3579
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc2/static_service_stats b/src/test/test-crs-static-rebalance-coloc2/static_service_stats
new file mode 100644
index 0000000..d9dc9e7
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc2/static_service_stats
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "maxcpu": 8, "maxmem": 16000000000 },
+ "vm:102": { "maxcpu": 4, "maxmem": 24000000000 },
+ "vm:103": { "maxcpu": 2, "maxmem": 32000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc3/README b/src/test/test-crs-static-rebalance-coloc3/README
new file mode 100644
index 0000000..e54a2d4
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/README
@@ -0,0 +1,14 @@
+Test whether a more complex set of transitive strict negative colocation rules,
+i.e. there are negative colocation relations a->b, b->c, and a->c, in conjunction
+with the static load scheduler with auto-rebalancing, is applied correctly on
+service start and in case of a subsequent failover.
+
+The test scenario is:
+- Essentially, all 10 strict negative colocation rules say that vm:101,
+  vm:102, vm:103, vm:104, and vm:105 must be kept separate
+
+Therefore, the expected outcome is:
+- vm:101, vm:102, vm:103, vm:104, and vm:105 should be started on node1, node2,
+  node3, node4, and node5 respectively, just as if the 10 negative colocation
+  rules had been stated as a single negative colocation rule
+- As node1 and node5 fail, vm:101 and vm:105 cannot be recovered
diff --git a/src/test/test-crs-static-rebalance-coloc3/cmdlist b/src/test/test-crs-static-rebalance-coloc3/cmdlist
new file mode 100644
index 0000000..a3d806d
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "power node4 on", "power node5 on" ],
+ [ "network node1 off", "network node5 off" ]
+]
diff --git a/src/test/test-crs-static-rebalance-coloc3/datacenter.cfg b/src/test/test-crs-static-rebalance-coloc3/datacenter.cfg
new file mode 100644
index 0000000..f2671a5
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-rebalance-on-start": 1
+ }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc3/hardware_status b/src/test/test-crs-static-rebalance-coloc3/hardware_status
new file mode 100644
index 0000000..511afb9
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/hardware_status
@@ -0,0 +1,7 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node4": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 },
+ "node5": { "power": "off", "network": "off", "cpus": 8, "memory": 112000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc3/log.expect b/src/test/test-crs-static-rebalance-coloc3/log.expect
new file mode 100644
index 0000000..ed36dbe
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/log.expect
@@ -0,0 +1,156 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node4 on
+info 20 node4/crm: status change startup => wait_for_quorum
+info 20 node4/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node5 on
+info 20 node5/crm: status change startup => wait_for_quorum
+info 20 node5/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node4': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: service vm:101: re-balance selected current node node1 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service vm:102: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node2)
+info 20 node1/crm: service vm:103: re-balance selected new node node3 for startup
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node3)
+info 20 node1/crm: service vm:104: re-balance selected new node node4 for startup
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node4)
+info 20 node1/crm: service vm:105: re-balance selected new node node5 for startup
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'request_start_balance' (node = node1, target = node5)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: service vm:102 - start relocate to node 'node2'
+info 21 node1/lrm: service vm:102 - end relocate to node 'node2'
+info 21 node1/lrm: service vm:103 - start relocate to node 'node3'
+info 21 node1/lrm: service vm:103 - end relocate to node 'node3'
+info 21 node1/lrm: service vm:104 - start relocate to node 'node4'
+info 21 node1/lrm: service vm:104 - end relocate to node 'node4'
+info 21 node1/lrm: service vm:105 - start relocate to node 'node5'
+info 21 node1/lrm: service vm:105 - end relocate to node 'node5'
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 26 node4/crm: status change wait_for_quorum => slave
+info 28 node5/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'vm:102': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:103': state changed from 'request_start_balance' to 'started' (node = node3)
+info 40 node1/crm: service 'vm:104': state changed from 'request_start_balance' to 'started' (node = node4)
+info 40 node1/crm: service 'vm:105': state changed from 'request_start_balance' to 'started' (node = node5)
+info 43 node2/lrm: got lock 'ha_agent_node2_lock'
+info 43 node2/lrm: status change wait_for_agent_lock => active
+info 43 node2/lrm: starting service vm:102
+info 43 node2/lrm: service status vm:102 started
+info 45 node3/lrm: got lock 'ha_agent_node3_lock'
+info 45 node3/lrm: status change wait_for_agent_lock => active
+info 45 node3/lrm: starting service vm:103
+info 45 node3/lrm: service status vm:103 started
+info 47 node4/lrm: got lock 'ha_agent_node4_lock'
+info 47 node4/lrm: status change wait_for_agent_lock => active
+info 47 node4/lrm: starting service vm:104
+info 47 node4/lrm: service status vm:104 started
+info 49 node5/lrm: got lock 'ha_agent_node5_lock'
+info 49 node5/lrm: status change wait_for_agent_lock => active
+info 49 node5/lrm: starting service vm:105
+info 49 node5/lrm: service status vm:105 started
+info 120 cmdlist: execute network node1 off
+info 120 cmdlist: execute network node5 off
+info 120 node1/crm: status change master => lost_manager_lock
+info 120 node1/crm: status change lost_manager_lock => wait_for_quorum
+info 121 node1/lrm: status change active => lost_agent_lock
+info 128 node5/crm: status change slave => wait_for_quorum
+info 129 node5/lrm: status change active => lost_agent_lock
+info 162 watchdog: execute power node1 off
+info 161 node1/crm: killed by poweroff
+info 162 node1/lrm: killed by poweroff
+info 162 hardware: server 'node1' stopped by poweroff (watchdog)
+info 170 watchdog: execute power node5 off
+info 169 node5/crm: killed by poweroff
+info 170 node5/lrm: killed by poweroff
+info 170 hardware: server 'node5' stopped by poweroff (watchdog)
+info 222 node3/crm: got lock 'ha_manager_lock'
+info 222 node3/crm: status change slave => master
+info 222 node3/crm: using scheduler mode 'static'
+info 222 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 222 node3/crm: node 'node5': state changed from 'online' => 'unknown'
+info 282 node3/crm: service 'vm:101': state changed from 'started' to 'fence'
+info 282 node3/crm: service 'vm:105': state changed from 'started' to 'fence'
+info 282 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 282 node3/crm: FENCE: Try to fence node 'node1'
+info 282 node3/crm: got lock 'ha_agent_node1_lock'
+info 282 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info 282 node3/crm: node 'node5': state changed from 'unknown' => 'fence'
+emai 282 node3/crm: FENCE: Try to fence node 'node5'
+info 282 node3/crm: got lock 'ha_agent_node5_lock'
+info 282 node3/crm: fencing: acknowledged - got agent lock for node 'node5'
+info 282 node3/crm: node 'node5': state changed from 'fence' => 'unknown'
+emai 282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node5'
+info 282 node3/crm: service 'vm:105': state changed from 'fence' to 'recovery'
+err 282 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 282 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 302 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 302 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 322 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 322 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 342 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 342 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 362 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 362 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 382 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 382 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 402 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 402 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 422 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 422 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 442 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 442 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 462 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 462 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 482 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 482 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 502 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 502 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 522 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 522 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 542 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 542 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 562 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 562 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 582 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 582 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 602 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 602 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 622 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 622 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 642 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 642 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 662 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 662 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 682 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 682 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+err 702 node3/crm: recovering service 'vm:101' from fenced node 'node1' failed, no recovery node found
+err 702 node3/crm: recovering service 'vm:105' from fenced node 'node5' failed, no recovery node found
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-rebalance-coloc3/manager_status b/src/test/test-crs-static-rebalance-coloc3/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-rebalance-coloc3/rules_config b/src/test/test-crs-static-rebalance-coloc3/rules_config
new file mode 100644
index 0000000..6047eff
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/rules_config
@@ -0,0 +1,49 @@
+colocation: very-lonely-service1
+ services vm:101,vm:102
+ affinity separate
+ strict 1
+
+colocation: very-lonely-service2
+ services vm:102,vm:103
+ affinity separate
+ strict 1
+
+colocation: very-lonely-service3
+ services vm:103,vm:104
+ affinity separate
+ strict 1
+
+colocation: very-lonely-service4
+ services vm:104,vm:105
+ affinity separate
+ strict 1
+
+colocation: very-lonely-service5
+ services vm:101,vm:103
+ affinity separate
+ strict 1
+
+colocation: very-lonely-service6
+ services vm:101,vm:104
+ affinity separate
+ strict 1
+
+colocation: very-lonely-service7
+ services vm:101,vm:105
+ affinity separate
+ strict 1
+
+colocation: very-lonely-service8
+ services vm:102,vm:104
+ affinity separate
+ strict 1
+
+colocation: very-lonely-service9
+ services vm:102,vm:105
+ affinity separate
+ strict 1
+
+colocation: very-lonely-service10
+ services vm:103,vm:105
+ affinity separate
+ strict 1
diff --git a/src/test/test-crs-static-rebalance-coloc3/service_config b/src/test/test-crs-static-rebalance-coloc3/service_config
new file mode 100644
index 0000000..a1d61f5
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/service_config
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" },
+ "vm:105": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-rebalance-coloc3/static_service_stats b/src/test/test-crs-static-rebalance-coloc3/static_service_stats
new file mode 100644
index 0000000..d9dc9e7
--- /dev/null
+++ b/src/test/test-crs-static-rebalance-coloc3/static_service_stats
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "maxcpu": 8, "maxmem": 16000000000 },
+ "vm:102": { "maxcpu": 4, "maxmem": 24000000000 },
+ "vm:103": { "maxcpu": 2, "maxmem": 32000000000 }
+}
--
2.39.5
* [pve-devel] [PATCH ha-manager 15/15] test: add test cases for rules config
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (14 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 14/15] test: ha tester: add test cases in more complex scenarios Daniel Kral
@ 2025-03-25 15:12 ` Daniel Kral
2025-03-25 16:47 ` [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (2 subsequent siblings)
18 siblings, 0 replies; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 15:12 UTC (permalink / raw)
To: pve-devel
Add test cases to verify the correct transformation of various types of
ill-defined colocation rules:
- Merging multiple, transitive positive colocation rules of the same
strictness level
- Dropping colocation rules with not enough defined services
- Dropping colocation rules which have inner conflicts
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
These aren't exhaustive yet, since there are no tests in conjunction with
HA groups so far.
.gitignore | 1 +
src/test/Makefile | 4 +-
.../connected-positive-colocations.cfg | 34 ++++++
.../connected-positive-colocations.cfg.expect | 54 ++++++++++
.../rules_cfgs/illdefined-colocations.cfg | 9 ++
.../illdefined-colocations.cfg.expect | 12 +++
.../inner-inconsistent-colocations.cfg | 14 +++
.../inner-inconsistent-colocations.cfg.expect | 13 +++
src/test/test_rules_config.pl | 100 ++++++++++++++++++
9 files changed, 240 insertions(+), 1 deletion(-)
create mode 100644 src/test/rules_cfgs/connected-positive-colocations.cfg
create mode 100644 src/test/rules_cfgs/connected-positive-colocations.cfg.expect
create mode 100644 src/test/rules_cfgs/illdefined-colocations.cfg
create mode 100644 src/test/rules_cfgs/illdefined-colocations.cfg.expect
create mode 100644 src/test/rules_cfgs/inner-inconsistent-colocations.cfg
create mode 100644 src/test/rules_cfgs/inner-inconsistent-colocations.cfg.expect
create mode 100755 src/test/test_rules_config.pl
diff --git a/.gitignore b/.gitignore
index c35280e..35de63f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,4 @@
/src/test/test-*/status/*
/src/test/fence_cfgs/*.cfg.commands
/src/test/fence_cfgs/*.cfg.write
+/src/test/rules_cfgs/*.cfg.output
diff --git a/src/test/Makefile b/src/test/Makefile
index e54959f..6da9e10 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -5,6 +5,7 @@ all:
test:
@echo "-- start regression tests --"
./test_failover1.pl
+ ./test_rules_config.pl
./ha-tester.pl
./test_fence_config.pl
@echo "-- end regression tests (success) --"
@@ -12,4 +13,5 @@ test:
.PHONY: clean
clean:
rm -rf *~ test-*/log test-*/*~ test-*/status \
- fence_cfgs/*.cfg.commands fence_cfgs/*.write
+ fence_cfgs/*.cfg.commands fence_cfgs/*.write \
+ rules_cfgs/*.cfg.output
diff --git a/src/test/rules_cfgs/connected-positive-colocations.cfg b/src/test/rules_cfgs/connected-positive-colocations.cfg
new file mode 100644
index 0000000..8cd6e0c
--- /dev/null
+++ b/src/test/rules_cfgs/connected-positive-colocations.cfg
@@ -0,0 +1,34 @@
+colocation: positive1
+ services vm:101,vm:106,vm:108
+ affinity together
+ strict 0
+
+colocation: positive2
+ services vm:106,vm:109
+ affinity together
+ strict 0
+
+colocation: positive3
+ services vm:107,vm:105
+ affinity together
+ strict 0
+
+colocation: positive4
+ services vm:101,vm:102,vm:103
+ affinity together
+ strict 0
+
+colocation: positive5
+ services vm:101,vm:104
+ affinity together
+ strict 1
+
+colocation: positive6
+ services vm:105,vm:110
+ affinity together
+ strict 0
+
+colocation: positive7
+ services vm:108,vm:104,vm:109
+ affinity together
+ strict 1
diff --git a/src/test/rules_cfgs/connected-positive-colocations.cfg.expect b/src/test/rules_cfgs/connected-positive-colocations.cfg.expect
new file mode 100644
index 0000000..f20a87e
--- /dev/null
+++ b/src/test/rules_cfgs/connected-positive-colocations.cfg.expect
@@ -0,0 +1,54 @@
+--- Log ---
+Merge services of positive colocation rule 'positive2' into positive colocation rule 'positive1', because they share at least one service.
+Merge services of positive colocation rule 'positive4' into positive colocation rule 'positive1', because they share at least one service.
+Merge services of positive colocation rule 'positive6' into positive colocation rule 'positive3', because they share at least one service.
+Merge services of positive colocation rule 'positive7' into positive colocation rule 'positive5', because they share at least one service.
+--- Config ---
+$VAR1 = {
+ 'digest' => '7781c41891832c7f955d835edcdc38fa6b673bea',
+ 'ids' => {
+ 'positive1' => {
+ 'affinity' => 'together',
+ 'services' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ 'vm:106' => 1,
+ 'vm:108' => 1,
+ 'vm:109' => 1
+ },
+ 'strict' => 0,
+ 'type' => 'colocation'
+ },
+ 'positive3' => {
+ 'affinity' => 'together',
+ 'services' => {
+ 'vm:105' => 1,
+ 'vm:107' => 1,
+ 'vm:110' => 1
+ },
+ 'strict' => 0,
+ 'type' => 'colocation'
+ },
+ 'positive5' => {
+ 'affinity' => 'together',
+ 'services' => {
+ 'vm:101' => 1,
+ 'vm:104' => 1,
+ 'vm:108' => 1,
+ 'vm:109' => 1
+ },
+ 'strict' => 1,
+ 'type' => 'colocation'
+ }
+ },
+ 'order' => {
+ 'positive1' => 1,
+ 'positive2' => 2,
+ 'positive3' => 3,
+ 'positive4' => 4,
+ 'positive5' => 5,
+ 'positive6' => 6,
+ 'positive7' => 7
+ }
+ };
diff --git a/src/test/rules_cfgs/illdefined-colocations.cfg b/src/test/rules_cfgs/illdefined-colocations.cfg
new file mode 100644
index 0000000..2a4bf9c
--- /dev/null
+++ b/src/test/rules_cfgs/illdefined-colocations.cfg
@@ -0,0 +1,9 @@
+colocation: lonely-service1
+ services vm:101
+ affinity together
+ strict 1
+
+colocation: lonely-service2
+ services vm:101
+ affinity separate
+ strict 1
diff --git a/src/test/rules_cfgs/illdefined-colocations.cfg.expect b/src/test/rules_cfgs/illdefined-colocations.cfg.expect
new file mode 100644
index 0000000..68ce44a
--- /dev/null
+++ b/src/test/rules_cfgs/illdefined-colocations.cfg.expect
@@ -0,0 +1,12 @@
+--- Log ---
+Drop colocation rule 'lonely-service1', because it does not have enough services defined.
+Drop colocation rule 'lonely-service2', because it does not have enough services defined.
+--- Config ---
+$VAR1 = {
+ 'digest' => 'd174e745359cbc8c2e0f950ab5a4d202ffaf15e2',
+ 'ids' => {},
+ 'order' => {
+ 'lonely-service1' => 1,
+ 'lonely-service2' => 2
+ }
+ };
diff --git a/src/test/rules_cfgs/inner-inconsistent-colocations.cfg b/src/test/rules_cfgs/inner-inconsistent-colocations.cfg
new file mode 100644
index 0000000..70ae352
--- /dev/null
+++ b/src/test/rules_cfgs/inner-inconsistent-colocations.cfg
@@ -0,0 +1,14 @@
+colocation: keep-apart1
+ services vm:102,vm:103
+ affinity separate
+ strict 1
+
+colocation: keep-apart2
+ services vm:102,vm:104,vm:106
+ affinity separate
+ strict 1
+
+colocation: stick-together1
+ services vm:101,vm:102,vm:103,vm:104,vm:106
+ affinity together
+ strict 1
diff --git a/src/test/rules_cfgs/inner-inconsistent-colocations.cfg.expect b/src/test/rules_cfgs/inner-inconsistent-colocations.cfg.expect
new file mode 100644
index 0000000..ea5b96b
--- /dev/null
+++ b/src/test/rules_cfgs/inner-inconsistent-colocations.cfg.expect
@@ -0,0 +1,13 @@
+--- Log ---
+Drop positive colocation rule 'stick-together1' and negative colocation rule 'keep-apart1', because they share two or more services.
+Drop positive colocation rule 'stick-together1' and negative colocation rule 'keep-apart2', because they share two or more services.
+--- Config ---
+$VAR1 = {
+ 'digest' => '1e6a049065bec399e5982d24eb348465eec8520b',
+ 'ids' => {},
+ 'order' => {
+ 'keep-apart1' => 1,
+ 'keep-apart2' => 2,
+ 'stick-together1' => 3
+ }
+ };
diff --git a/src/test/test_rules_config.pl b/src/test/test_rules_config.pl
new file mode 100755
index 0000000..0eb55c3
--- /dev/null
+++ b/src/test/test_rules_config.pl
@@ -0,0 +1,100 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+use Getopt::Long;
+
+use lib qw(..);
+
+use Test::More;
+use Test::MockModule;
+
+use Data::Dumper;
+
+use PVE::HA::Rules;
+use PVE::HA::Rules::Colocation;
+
+PVE::HA::Rules::Colocation->register();
+
+PVE::HA::Rules->init();
+
+my $opt_nodiff;
+
+if (!GetOptions ("nodiff" => \$opt_nodiff)) {
+ print "usage: $0 [test.cfg] [--nodiff]\n";
+ exit -1;
+}
+
+sub _log {
+ my ($fh, $source, $message) = @_;
+
+ chomp $message;
+ $message = "[$source] $message" if $source;
+
+ print "$message\n";
+
+ $fh->print("$message\n");
+ $fh->flush();
+};
+
+sub check_cfg {
+ my ($cfg_fn, $outfile) = @_;
+
+ my $raw = PVE::Tools::file_get_contents($cfg_fn);
+
+ open(my $LOG, '>', "$outfile");
+ select($LOG);
+ $| = 1;
+
+ print "--- Log ---\n";
+ my $cfg = PVE::HA::Rules->parse_config($cfg_fn, $raw);
+ PVE::HA::Rules::checked_config($cfg, {}, {});
+ print "--- Config ---\n";
+ {
+ local $Data::Dumper::Sortkeys = 1;
+ print Dumper($cfg);
+ }
+
+ select(STDOUT);
+}
+
+sub run_test {
+ my ($cfg_fn) = @_;
+
+ print "* check: $cfg_fn\n";
+
+ my $outfile = "$cfg_fn.output";
+ my $expect = "$cfg_fn.expect";
+
+ eval {
+ check_cfg($cfg_fn, $outfile);
+ };
+ if (my $err = $@) {
+ die "Test '$cfg_fn' failed:\n$err\n";
+ }
+
+ return if $opt_nodiff;
+
+ my $res;
+
+ if (-f $expect) {
+ my $cmd = ['diff', '-u', $expect, $outfile];
+ $res = system(@$cmd);
+ die "test '$cfg_fn' failed\n" if $res != 0;
+ } else {
+ $res = system('cp', $outfile, $expect);
+ die "test '$cfg_fn' failed\n" if $res != 0;
+ }
+
+ print "* end rules test: $cfg_fn (success)\n\n";
+}
+
+# exec tests
+
+if (my $testcfg = shift) {
+ run_test($testcfg);
+} else {
+ for my $cfg (<rules_cfgs/*cfg>) {
+ run_test($cfg);
+ }
+}
--
2.39.5
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (15 preceding siblings ...)
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 15/15] test: add test cases for rules config Daniel Kral
@ 2025-03-25 16:47 ` Daniel Kral
2025-04-24 10:12 ` Fiona Ebner
2025-04-01 1:50 ` DERUMIER, Alexandre
2025-04-24 10:12 ` Fiona Ebner
18 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-03-25 16:47 UTC (permalink / raw)
To: pve-devel
On 3/25/25 16:12, Daniel Kral wrote:
> Colocation Rules
> ----------------
>
> The two properties of colocation rules, as described in the
> introduction, are rather straightforward. A typical colocation rule
> inside of the config would look like the following:
>
> colocation: some-lonely-services
> services vm:101,vm:103,ct:909
> affinity separate
> strict 1
>
> This means that the three services vm:101, vm:103 and ct:909 must be
> kept separate on different nodes. I'm very keen on naming suggestions
> since I think there could be a better word than 'affinity' here. I
> played around with 'keep-services', since then it would always read
> something like 'keep-services separate', which is very declarative, but
> this might suggest that this is a binary option to too many users (I
> mean it is, but not with the values 0 and 1).
Just to document this: I've played around with using a score to decide
whether the colocation rule is positive or negative and how strict it is,
and to allow specifying how strongly an optional colocation rule is
desired to be met, much like pacemaker's version.
But in the end, I ditched the idea, since it didn't integrate well and
it was also not trivial to find a good scale for this weight value that
would correspond to something like the node priority in HA groups,
especially since we select a node for each service individually.
On 3/25/25 16:12, Daniel Kral wrote:
> [0] https://clusterlabs.org/projects/pacemaker/doc/3.0/Pacemaker_Explained/html/constraints.html#colocation-properties
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=5260
> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=5332
> [3] https://lore.proxmox.com/pve-devel/c8fa7b8c-fb37-5389-1302-2002780d4ee2@proxmox.com/
I forgot to update the footnotes here when sending this. The first
footnote pointed to the initial inspiration for a score-based colocation
rule, but as already said, that was dropped.
So the references for the two quotes from our Bugzilla, [0] and [1], map
to footnotes [1] and [2] here, respectively.
* [pve-devel] applied: [PATCH ha-manager 01/15] ignore output of fence config tests in tree
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 01/15] ignore output of fence config tests in tree Daniel Kral
@ 2025-03-25 17:49 ` Thomas Lamprecht
0 siblings, 0 replies; 67+ messages in thread
From: Thomas Lamprecht @ 2025-03-25 17:49 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> .gitignore | 2 ++
> 1 file changed, 2 insertions(+)
>
>
applied this one already, thanks!
* Re: [pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines Daniel Kral
@ 2025-03-25 17:53 ` Thomas Lamprecht
2025-04-03 12:16 ` Fabian Grünbichler
0 siblings, 1 reply; 67+ messages in thread
From: Thomas Lamprecht @ 2025-03-25 17:53 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> Implement helper subroutines, which implement basic set operations done
> on hash sets, i.e. hashes with elements set to a true value, e.g. 1.
>
> These will be used for various tasks in the HA Manager colocation rules,
> e.g. for verifying the satisfiability of the rules or applying the
> colocation rules on the allowed set of nodes.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> If they're useful somewhere else, I can move them to PVE::Tools
> post-RFC, but it'd be probably useful to prefix them with `hash_` there.
meh, not a big fan of growing the overly generic PVE::Tools more; if anything,
this should go into a dedicated module for hash/data structure helpers ...
> AFAICS there weren't any other helpers for this with a quick grep over
> all projects and `PVE::Tools::array_intersect()` wasn't what I needed.
... which those existing ones should then also move into, but that's out of
scope for this series.
>
> src/PVE/HA/Tools.pm | 42 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 42 insertions(+)
>
> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
> index 0f9e9a5..fc3282c 100644
> --- a/src/PVE/HA/Tools.pm
> +++ b/src/PVE/HA/Tools.pm
> @@ -115,6 +115,48 @@ sub write_json_to_file {
> PVE::Tools::file_set_contents($filename, $raw);
> }
>
> +sub is_disjoint {
IMO a bit too generic a name for being in a Tools-named module, maybe
prefix them all with hash_ or hashes_?
> + my ($hash1, $hash2) = @_;
> +
> + for my $key (keys %$hash1) {
> + return 0 if exists($hash2->{$key});
> + }
> +
> + return 1;
> +};
> +
> +sub intersect {
> + my ($hash1, $hash2) = @_;
> +
> + my $result = { map { $_ => $hash2->{$_} } keys %$hash1 };
> +
> + for my $key (keys %$result) {
> + delete $result->{$key} if !defined($result->{$key});
> + }
> +
> + return $result;
> +};
> +
> +sub set_difference {
> + my ($hash1, $hash2) = @_;
> +
> + my $result = { map { $_ => 1 } keys %$hash1 };
> +
> + for my $key (keys %$result) {
> + delete $result->{$key} if defined($hash2->{$key});
> + }
> +
> + return $result;
> +};
> +
> +sub union {
> + my ($hash1, $hash2) = @_;
> +
> + my $result = { map { $_ => 1 } keys %$hash1, keys %$hash2 };
> +
> + return $result;
> +};
> +
> sub count_fenced_services {
> my ($ss, $node) = @_;
>
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (16 preceding siblings ...)
2025-03-25 16:47 ` [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
@ 2025-04-01 1:50 ` DERUMIER, Alexandre
2025-04-01 9:39 ` Daniel Kral
2025-04-24 10:12 ` Fiona Ebner
18 siblings, 1 reply; 67+ messages in thread
From: DERUMIER, Alexandre @ 2025-04-01 1:50 UTC (permalink / raw)
To: pve-devel
Hi Daniel,
thanks for working on this !
>>I chose the name "colocation" in favor of affinity/anti-affinity,
>>since
>>it is a bit more concise that it is about co-locating services
>>between
>>each other in contrast to locating services on nodes, but no hard
>>feelings to change it (same for any other names in this series).
My 2 cents, but everybody in the industry is calling this
affinity/anti-affinity (VMware, Nutanix, Hyper-V, OpenStack, ...).
More precisely, VM affinity rules (vm<->vm) vs node affinity rules
(vm->node, the current HA group).
Personally I don't care, it's just a name ^_^ .
But I have a lot of customers asking "does Proxmox support
affinity/anti-affinity?", and if they are doing their own research, they
will think that it doesn't exist.
(Or, at a minimum, write somewhere in the docs something like "aka VM
affinity", or mention it in commercial presentations ^_^)
More serious question: I haven't read all the code yet, but how does
it play with the current TOPSIS placement algorithm?
>>Additional and/or future ideas
>>------------------------------
Small feature request from students && customers: a lot of them are
asking to be able to use VM tags in the colocation/affinity rules.
>>I'd like to suggest to also transform the existing HA groups to
>>location
>>rules, if the rule concept turns out to be a good fit for the
>>colocation
>>feature in the HA Manager, as HA groups seem to integrate quite
>>easily
>>into this concept.
I agree with that too
Thanks again !
Alexandre
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-04-01 1:50 ` DERUMIER, Alexandre
@ 2025-04-01 9:39 ` Daniel Kral
2025-04-01 11:05 ` DERUMIER, Alexandre via pve-devel
` (2 more replies)
0 siblings, 3 replies; 67+ messages in thread
From: Daniel Kral @ 2025-04-01 9:39 UTC (permalink / raw)
To: Proxmox VE development discussion, DERUMIER, Alexandre
On 4/1/25 03:50, DERUMIER, Alexandre wrote:
> My 2 cents, but everybody in the industry is calling this
> affinity/anti-affinity (VMware, Nutanix, Hyper-V, OpenStack, ...).
> More precisely, VM affinity rules (vm<->vm) vs node affinity rules
> (vm->node, the current HA group).
>
> Personally I don't care, it's just a name ^_^ .
>
> But I have a lot of customers asking "does Proxmox support
> affinity/anti-affinity?", and if they are doing their own research, they
> will think that it doesn't exist.
> (Or, at a minimum, write somewhere in the docs something like "aka VM
> affinity", or mention it in commercial presentations ^_^)
I see your point and also called it affinity/anti-affinity before, but
if we go for the HA Rules route here, it'd be really neat to have
"Location Rules" and "Colocation Rules" coexist in the end and clearly
show the distinction between them, as both are affinity rules, at least
for me.
I'd definitely make sure that it is clear from the release notes and
documentation that this adds the feature to assign affinity between
services, but let's wait for some other comments on this ;).
On 4/1/25 03:50, DERUMIER, Alexandre wrote:
> More serious question: I haven't read all the code yet, but how does
> it play with the current TOPSIS placement algorithm?
I currently implemented the colocation rules to put a constraint on
which nodes the manager can select from for the to-be-migrated service.
So if users use the static load scheduler (and the basic / service count
scheduler for that matter too), the colocation rules just make sure that
no recovery node is selected which would contradict them. So the TOPSIS
algorithm isn't changed at all.
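Roughly sketched in Perl (note: the helper and variable names below are
made up for illustration, this is not the actual code from the series),
the recovery node selection just works on a pre-filtered candidate set:

# sketch only: get_separated_services() and pick_recovery_node() are
# illustrative names, not the series' real API
my $allowed_nodes = { map { $_ => 1 } @$online_nodes };

# drop nodes that already run a service which must be kept apart from $sid
for my $other_sid (keys %{ get_separated_services($rules, $sid) }) {
    my $other_node = $services->{$other_sid}->{node};
    delete $allowed_nodes->{$other_node} if defined($other_node);
}

# the (unchanged) scheduler then only ranks the remaining candidates
my $recovery_node = pick_recovery_node($sid, [ sort keys %$allowed_nodes ]);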
There are two things that should/could be changed in the future (besides
the many future ideas that I pointed out already), which are
- (1) the schedulers will still consider all online nodes, i.e. even
though HA groups and/or colocation rules restrict the allowed nodes in
the end, the calculation is done for all nodes which could be
significant for larger clusters, and
- (2) the services are (generally) currently recovered one-by-one in a
best-fit fashion, i.e. there's no ordering by the services' needed
resources, etc. There could be some edge cases (e.g. think about a
failing node with a bunch of services to be kept together; these should
now be migrated to the same node, if possible, or put on the minimum
number of nodes), where the algorithm could find better solutions if it
either orders the to-be-recovered services, and/or the utilization
scheduler has knowledge about the 'keep together' colocations and
considers these (and all subsets) as a single service.
For the latter, the complexity explodes a bit and is harder to test for,
which is why I've gone for the current implementation, as it also
reduces the burden on users to think about what could happen with a
specific set of rules and already allows the notion of MUST/SHOULD. This
gives enough flexibility to improve the decision making of the scheduler
in the future.
On 4/1/25 03:50, DERUMIER, Alexandre wrote:
> Small feature request from students && customers: a lot of them are
> asking to be able to use VM tags in the colocation/affinity rules.
Good idea! We were thinking about this too and I forgot to add it to the
list, thanks for bringing it up again!
Yes, the idea would be to make pools and tags available as selectors for
rules here, so that changes can be made rather dynamically by just
adding a tag to a service.
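Just to illustrate the direction (the selector syntax below is purely
hypothetical and not part of this series), such a rule could then maybe
be written as:

colocation: keep-db-apart
    services tag:database
    affinity separate
    strict 1

with the tag being resolved to the current set of services when the
rules are verified.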
The only thing we have to consider here is that HA rules have a
verification phase and invalid rules will be dropped or modified to make
them applicable. Also, these external changes must be identified somehow
in the HA stack, as I want to keep the number of runs through the
verification code to a minimum, i.e. only when the configuration is
changed by the user. But that will be a discussion for another series ;).
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-04-01 9:39 ` Daniel Kral
@ 2025-04-01 11:05 ` DERUMIER, Alexandre via pve-devel
2025-04-03 12:26 ` Fabian Grünbichler
2025-04-24 10:12 ` Fiona Ebner
2 siblings, 0 replies; 67+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-04-01 11:05 UTC (permalink / raw)
To: pve-devel, d.kral; +Cc: DERUMIER, Alexandre
>>I currently implemented the colocation rules to put a constraint on
>>which nodes the manager can select from for the to-be-migrated service.
>>So if users use the static load scheduler (and the basic / service count
>>scheduler for that matter too), the colocation rules just make sure that
>>no recovery node is selected which would contradict them. So the TOPSIS
>>algorithm isn't changed at all.
Ah ok, got it, so it's a hard constraint (MUST) filtering the target
nodes.
>>There are two things that should/could be changed in the future (besides
>>the many future ideas that I pointed out already), which are
>>- (1) the schedulers will still consider all online nodes, i.e. even
>>though HA groups and/or colocation rules restrict the allowed nodes in
>>the end, the calculation is done for all nodes which could be
>>significant for larger clusters, and
>>- (2) the services are (generally) currently recovered one-by-one in a
>>best-fit fashion, i.e. there's no ordering by the services' needed
>>resources, etc. There could be some edge cases (e.g. think about a
>>failing node with a bunch of services to be kept together; these should
>>now be migrated to the same node, if possible, or put on the minimum
>>number of nodes), where the algorithm could find better solutions if it
>>either orders the to-be-recovered services, and/or the utilization
>>scheduler has knowledge about the 'keep together' colocations and
>>considers these (and all subsets) as a single service.
>>
>>For the latter, the complexity explodes a bit and is harder to test for,
>>which is why I've gone for the current implementation, as it also
>>reduces the burden on users to think about what could happen with a
>>specific set of rules and already allows the notion of MUST/SHOULD. This
>>gives enough flexibility to improve the decision making of the scheduler
>>in the future.
Yes, soft constraints (SHOULD) are not so easy indeed.
I remember having done some tests, feeding TOPSIS the number of
conflicting constraints per VM for each host, and migrating the VMs with
the most constraints first.
The results were not too bad, but this needs to be tested at scale.
Hard constraints are already a good step. (It should work for 90% of
people, who don't have 10000 constraints mixed together.)
On 4/1/25 03:50, DERUMIER, Alexandre wrote:
> Small feature request from students && customers: a lot of them are
> asking to be able to use VM tags in the colocation/affinity rules.
>>Good idea! We were thinking about this too and I forgot to add it to the
>>list, thanks for bringing it up again!
>>Yes, the idea would be to make pools and tags available as selectors for
>>rules here, so that changes can be made rather dynamically by just
>>adding a tag to a service.
could be perfect :)
>>The only thing we have to consider here is that HA rules have a
>>verification phase and invalid rules will be dropped or modified to make
>>them applicable. Also, these external changes must be identified somehow
>>in the HA stack, as I want to keep the number of runs through the
>>verification code to a minimum, i.e. only when the configuration is
>>changed by the user. But that will be a discussion for another series ;).
yes sure!
BTW, another improvement could be a hard constraint on storage
availability, as currently the HA stack moves the VM blindly, tries to
start it, and then moves the VM to another node where the storage is
available. The only workaround is to create HA server groups, but this
could be an improvement.
Same for the number of cores available on the host (the host's number of
cores must be greater than the VM's core count).
I'll try to take time to follow && test your patches!
Alexandre
* Re: [pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines
2025-03-25 17:53 ` Thomas Lamprecht
@ 2025-04-03 12:16 ` Fabian Grünbichler
2025-04-11 11:24 ` Daniel Kral
0 siblings, 1 reply; 67+ messages in thread
From: Fabian Grünbichler @ 2025-04-03 12:16 UTC (permalink / raw)
To: Daniel Kral, Proxmox VE development discussion
On March 25, 2025 6:53 pm, Thomas Lamprecht wrote:
> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>> Implement helper subroutines, which implement basic set operations done
>> on hash sets, i.e. hashes with elements set to a true value, e.g. 1.
>>
>> These will be used for various tasks in the HA Manager colocation rules,
>> e.g. for verifying the satisfiability of the rules or applying the
>> colocation rules on the allowed set of nodes.
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> If they're useful somewhere else, I can move them to PVE::Tools
>> post-RFC, but it'd be probably useful to prefix them with `hash_` there.
>
> meh, not a big fan of growing the overly generic PVE::Tools more, if, this
> should go into a dedicated module for hash/data structure helpers ...
>
>> AFAICS there weren't any other helpers for this with a quick grep over
>> all projects and `PVE::Tools::array_intersect()` wasn't what I needed.
>
> ... which those existing one should then also move into, but out of scope
> of this series.
>
>>
>> src/PVE/HA/Tools.pm | 42 ++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 42 insertions(+)
>>
>> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
>> index 0f9e9a5..fc3282c 100644
>> --- a/src/PVE/HA/Tools.pm
>> +++ b/src/PVE/HA/Tools.pm
>> @@ -115,6 +115,48 @@ sub write_json_to_file {
>> PVE::Tools::file_set_contents($filename, $raw);
>> }
>>
>> +sub is_disjoint {
>
> IMO a bit too generic name for being in a Tools named module, maybe
> prefix them all with hash_ or hashes_ ?
is_disjoint also only really makes sense as a name if you see it as an
operation *on* $hash1, rather than an operation involving both hashes..
i.e., in Rust
set1.is_disjoint(&set2);
makes sense..
in Perl
is_disjoint($set1, $set2)
reads weird, and should maybe be
check_disjoint($set1, $set2)
or something like that?
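to make that concrete, a possible shape (final name and module placement
are of course up for discussion, this is just a sketch) could be:

sub hash_set_disjoint {
    my ($set1, $set2) = @_;

    for my $key (keys %$set1) {
        return 0 if exists($set2->{$key});
    }

    return 1;
}

which would also cover the hash_/hashes_ prefix suggestion above and make
the hashes-as-sets assumption visible in the name.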
>
>> + my ($hash1, $hash2) = @_;
>> +
>> + for my $key (keys %$hash1) {
>> + return 0 if exists($hash2->{$key});
>> + }
>> +
>> + return 1;
>> +};
>> +
>> +sub intersect {
>> + my ($hash1, $hash2) = @_;
>> +
>> + my $result = { map { $_ => $hash2->{$_} } keys %$hash1 };
this is a bit dangerous if $hash2->{$key} is itself a reference? if I
later modify $result I'll modify $hash2.. I know the commit message says
that the hashes are all just of the form key => 1, but nothing here
tells me that a year later when I am looking for a generic hash
intersection helper ;) I think this should also be clearly mentioned in
the module, and ideally, also in the helper names (i.e., have "set"
there everywhere and a comment above each that it only works for
hashes-as-sets and not generic hashes).
wouldn't it be faster/simpler to iterate over either hash once?
my $result = {};
for my $key (keys %$hash1) {
$result->{$key} = 1 if $hash1->{$key} && $hash2->{$key};
}
return $result;
>> +
>> + for my $key (keys %$result) {
>> + delete $result->{$key} if !defined($result->{$key});
>> + }
>> +
>> + return $result;
>> +};
>> +
>> +sub set_difference {
>> + my ($hash1, $hash2) = @_;
>> +
>> + my $result = { map { $_ => 1 } keys %$hash1 };
if $hash1 is only of the form key => 1, then this is just
my $result = { %$hash1 };
>> +
>> + for my $key (keys %$result) {
>> + delete $result->{$key} if defined($hash2->{$key});
>> + }
>> +
but the whole thing can be
return { map { $hash2->{$_} ? () : ($_ => 1) } keys %$hash1 };
this transforms hash1 into its keys, and then returns either the empty
tuple if the key is set in $hash2, or ($key => 1) if it is not. the outer
{} then turn this sequence of tuples into a hash again, which skips the
empty tuples ;) can of course also be adapted to use the value from either
hash, check for definedness instead of truthiness, ..
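e.g. (assuming the key => 1 hash-set form from the commit message):

my $set1 = { 'vm:101' => 1, 'vm:102' => 1 };
my $set2 = { 'vm:102' => 1, 'vm:103' => 1 };

# set_difference($set1, $set2) then yields { 'vm:101' => 1 }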
>> + return $result;
>> +};
>> +
>> +sub union {
>> + my ($hash1, $hash2) = @_;
>> +
>> + my $result = { map { $_ => 1 } keys %$hash1, keys %$hash2 };
>> +
>> + return $result;
>> +};
>> +
>> sub count_fenced_services {
>> my ($ss, $node) = @_;
>>
>
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>
>
* Re: [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin Daniel Kral
@ 2025-04-03 12:16 ` Fabian Grünbichler
2025-04-11 11:04 ` Daniel Kral
2025-04-25 14:05 ` Fiona Ebner
1 sibling, 1 reply; 67+ messages in thread
From: Fabian Grünbichler @ 2025-04-03 12:16 UTC (permalink / raw)
To: Proxmox VE development discussion
On March 25, 2025 4:12 pm, Daniel Kral wrote:
> Add the colocation rule plugin to allow users to specify inter-service
> affinity constraints.
>
> These colocation rules can either be positive (keeping services
> together) or negative (keeping service separate). Their strictness can
> also be specified as either a MUST or a SHOULD, where the first
> specifies that any service the constraint cannot be applied for stays in
> recovery, while the latter specifies that that any service the
> constraint cannot be applied for is lifted from the constraint.
>
> The initial implementation also implements four basic transformations,
> where colocation rules with not enough services are dropped, transitive
> positive colocation rules are merged, and inter-colocation rule
> inconsistencies as well as colocation rule inconsistencies with respect
> to the location constraints specified in HA groups are dropped.
a high level question: there's a lot of loops and sorts over rules,
services and groups here - granted, that is all in memory, so it should be
reasonably fast, but do we have concerns here/should we look for further
optimization potential?
e.g. right now I count (coming in via canonicalize):
- check_services_count
-- sort of ruleids (foreach_colocation_rule)
-- loop over rules (foreach_colocation_rule)
--- keys on services of each rule
- loop over the results (should be empty)
- check_positive_intransitivity
-- sort of ruleids, 1x loop over rules (foreach_colocation_rule via split_colocation_rules)
-- loop over each unique pair of ruleids
--- is_disjoint on services of each pair (loop over service keys)
- loop over resulting ruleids (might be many!)
-- loop over mergeable rules for each merge target
--- loop over services of each mergeable rule
- check_inner_consistency
-- sort of ruleids, 1x loop over rules (foreach_colocation_rule via split_colocation_rules)
-- loop over positive rules
--- for every positive rule, loop over negative rules
---- for each pair of positive+negative rule, check service
intersections
- loop over resulting conflicts (should be empty)
- check_consistency_with_groups
-- sort of ruleids, 1x loop over rules (foreach_colocation_rule via split_colocation_rules)
-- loop over positive rules
--- loop over services
---- loop over nodes of service's group
-- loop over negative rules
--- loop over services
---- loop over nodes of service's group
- loop over resulting conflicts (should be empty)
possibly by splitting the rules (instead of just the IDs) once and keeping
a list of sorted rule IDs, we could save some overhead?
might not be worth it (yet) though, but something to keep in mind if the
rules are getting more complicated over time..
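e.g. something along these lines (just a sketch, reusing the structure of
$rules->{ids} from the patch) could do the split once up front and pass
the result to the individual checks:

# split the colocation rules a single time instead of re-sorting and
# re-filtering them in every check_* helper
my ($positive_rules, $negative_rules) = ({}, {});
for my $ruleid (sort keys %{ $rules->{ids} }) {
    my $rule = $rules->{ids}->{$ruleid};
    next if $rule->{type} ne 'colocation';

    my $target = $rule->{affinity} eq 'together' ? $positive_rules : $negative_rules;
    $target->{$ruleid} = $rule;
}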
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> debian/pve-ha-manager.install | 1 +
> src/PVE/HA/Makefile | 1 +
> src/PVE/HA/Rules/Colocation.pm | 391 +++++++++++++++++++++++++++++++++
> src/PVE/HA/Rules/Makefile | 6 +
> src/PVE/HA/Tools.pm | 6 +
> 5 files changed, 405 insertions(+)
> create mode 100644 src/PVE/HA/Rules/Colocation.pm
> create mode 100644 src/PVE/HA/Rules/Makefile
>
> diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
> index 9bbd375..89f9144 100644
> --- a/debian/pve-ha-manager.install
> +++ b/debian/pve-ha-manager.install
> @@ -33,6 +33,7 @@
> /usr/share/perl5/PVE/HA/Resources/PVECT.pm
> /usr/share/perl5/PVE/HA/Resources/PVEVM.pm
> /usr/share/perl5/PVE/HA/Rules.pm
> +/usr/share/perl5/PVE/HA/Rules/Colocation.pm
> /usr/share/perl5/PVE/HA/Tools.pm
> /usr/share/perl5/PVE/HA/Usage.pm
> /usr/share/perl5/PVE/HA/Usage/Basic.pm
> diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
> index 489cbc0..e386cbf 100644
> --- a/src/PVE/HA/Makefile
> +++ b/src/PVE/HA/Makefile
> @@ -8,6 +8,7 @@ install:
> install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA
> for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/$$i; done
> make -C Resources install
> + make -C Rules install
> make -C Usage install
> make -C Env install
>
> diff --git a/src/PVE/HA/Rules/Colocation.pm b/src/PVE/HA/Rules/Colocation.pm
> new file mode 100644
> index 0000000..808d48e
> --- /dev/null
> +++ b/src/PVE/HA/Rules/Colocation.pm
> @@ -0,0 +1,391 @@
> +package PVE::HA::Rules::Colocation;
> +
> +use strict;
> +use warnings;
> +
> +use Data::Dumper;
leftover dumper ;)
> +
> +use PVE::JSONSchema qw(get_standard_option);
> +use PVE::HA::Tools;
> +
> +use base qw(PVE::HA::Rules);
> +
> +sub type {
> + return 'colocation';
> +}
> +
> +sub properties {
> + return {
> + services => get_standard_option('pve-ha-resource-id-list'),
> + affinity => {
> + description => "Describes whether the services are supposed to be kept on separate"
> + . " nodes, or are supposed to be kept together on the same node.",
> + type => 'string',
> + enum => ['separate', 'together'],
> + optional => 0,
> + },
> + strict => {
> + description => "Describes whether the colocation rule is mandatory or optional.",
> + type => 'boolean',
> + optional => 0,
> + },
> + }
> +}
> +
> +sub options {
> + return {
> + services => { optional => 0 },
> + strict => { optional => 0 },
> + affinity => { optional => 0 },
> + comment => { optional => 1 },
> + };
> +};
> +
> +sub decode_value {
> + my ($class, $type, $key, $value) = @_;
> +
> + if ($key eq 'services') {
> + my $res = {};
> +
> + for my $service (PVE::Tools::split_list($value)) {
> + if (PVE::HA::Tools::pve_verify_ha_resource_id($service)) {
> + $res->{$service} = 1;
> + }
> + }
> +
> + return $res;
> + }
> +
> + return $value;
> +}
> +
> +sub encode_value {
> + my ($class, $type, $key, $value) = @_;
> +
> + if ($key eq 'services') {
> + PVE::HA::Tools::pve_verify_ha_resource_id($_) for (keys %$value);
> +
> + return join(',', keys %$value);
> + }
> +
> + return $value;
> +}
> +
> +sub foreach_colocation_rule {
> + my ($rules, $func, $opts) = @_;
> +
> + my $my_opts = { map { $_ => $opts->{$_} } keys %$opts };
why? if the caller doesn't want $opts to be modified, they could just
pass in a copy (or you could require it to be passed by value instead of
by reference?).
there's only a single caller that does (introduced by a later patch) and
that one constructs the hash reference right at the call site, so unless
I am missing something this seems a bit overkill..
> + $my_opts->{type} = 'colocation';
> +
> + PVE::HA::Rules::foreach_service_rule($rules, $func, $my_opts);
> +}
> +
> +sub split_colocation_rules {
> + my ($rules) = @_;
> +
> + my $positive_ruleids = [];
> + my $negative_ruleids = [];
> +
> + foreach_colocation_rule($rules, sub {
> + my ($rule, $ruleid) = @_;
> +
> + my $ruleid_set = $rule->{affinity} eq 'together' ? $positive_ruleids : $negative_ruleids;
> + push @$ruleid_set, $ruleid;
> + });
> +
> + return ($positive_ruleids, $negative_ruleids);
> +}
> +
> +=head3 check_service_count($rules)
> +
> +Returns a list of conflicts caused by colocation rules, which do not have
> +enough services in them, defined in C<$rules>.
> +
> +If there are no conflicts, the returned list is empty.
> +
> +=cut
> +
> +sub check_services_count {
> + my ($rules) = @_;
> +
> + my $conflicts = [];
> +
> + foreach_colocation_rule($rules, sub {
> + my ($rule, $ruleid) = @_;
> +
> + push @$conflicts, $ruleid if (scalar(keys %{$rule->{services}}) < 2);
> + });
> +
> + return $conflicts;
> +}
is this really an issue? a colocation rule with a single service is just
a nop? there's currently no cleanup AFAICT if a resource is removed, but
if we add that part (we maybe should?) then one can easily end up in a
situation where a rule temporarily contains a single or no service?
> +
> +=head3 check_positive_intransitivity($rules)
> +
> +Returns a list of conflicts caused by transitive positive colocation rules
> +defined in C<$rules>.
> +
> +Transitive positive colocation rules exist, if there are at least two positive
> +colocation rules with the same strictness, which put at least the same two
> +services in relation. This means, that these rules can be merged together.
> +
> +If there are no conflicts, the returned list is empty.
The terminology here is quite confusing - conflict meaning that two rules
are "transitive" and thus mergeable (which is good, because it makes
things easier to handle?) is quite weird, as "conflict" is a rather
negative term..
there's only a single call site in the same module, maybe we could just
rename this into "find_mergeable_positive_ruleids", similar to the
variable where the result is stored?
> +
> +=cut
> +
> +sub check_positive_intransitivity {
> + my ($rules) = @_;
> +
> + my $conflicts = {};
> + my ($positive_ruleids) = split_colocation_rules($rules);
> +
> + while (my $outerid = shift(@$positive_ruleids)) {
> + my $outer = $rules->{ids}->{$outerid};
> +
> + for my $innerid (@$positive_ruleids) {
so this is in practice a sort of "optimized" loop over all pairs of
rules - iterating over the positive rules twice, but skipping pairs that
were already visited by virtue of the shift on the outer loop..
might be worth a short note; together with the $inner and $outer
terminology, I was a bit confused at first..
> + my $inner = $rules->{ids}->{$innerid};
> +
> + next if $outerid eq $innerid;
> + next if $outer->{strict} != $inner->{strict};
> + next if PVE::HA::Tools::is_disjoint($outer->{services}, $inner->{services});
> +
> + push @{$conflicts->{$outerid}}, $innerid;
> + }
> + }
> +
> + return $conflicts;
> +}
> +
> +=head3 check_inner_consistency($rules)
> +
> +Returns a list of conflicts caused by inconsistencies between positive and
> +negative colocation rules defined in C<$rules>.
> +
> +Inner inconsistent colocation rules exist, if there are at least the same two
> +services in a positive and a negative colocation relation, which is an
> +impossible constraint as they are opposites of each other.
> +
> +If there are no conflicts, the returned list is empty.
here the conflicts and check terminology makes sense - we are checking
an invariant that must be satisfied after all :)
> +
> +=cut
> +
> +sub check_inner_consistency {
but 'inner' is a weird term since this is consistency between rules?
it basically checks that no pair of services should both be colocated
and not be colocated at the same time, but not sure how to encode that
concisely..
> + my ($rules) = @_;
> +
> + my $conflicts = [];
> + my ($positive_ruleids, $negative_ruleids) = split_colocation_rules($rules);
> +
> + for my $outerid (@$positive_ruleids) {
> + my $outer = $rules->{ids}->{$outerid}->{services};
s/outer/positive ?
> +
> + for my $innerid (@$negative_ruleids) {
> + my $inner = $rules->{ids}->{$innerid}->{services};
s/inner/negative ?
> +
> + my $intersection = PVE::HA::Tools::intersect($outer, $inner);
> + next if scalar(keys %$intersection < 2);
the scalar there is not needed, but the parentheses are in the wrong place
instead ;) it does work by accident though, because the result of keys
will be coerced to a scalar anyway, so you end up with the result of your
comparison wrapped by another call to scalar, i.e. either
1 or '' depending on whether the check was true or false..
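i.e., presumably the intended check was:

next if scalar(keys %$intersection) < 2;

(or simply: next if keys %$intersection < 2; - since the numeric
comparison already puts keys into scalar context)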
> +
> + push @$conflicts, [$outerid, $innerid];
> + }
> + }
> +
> + return $conflicts;
> +}
> +
> +=head3 check_positive_group_consistency(...)
> +
> +Returns a list of conflicts caused by inconsistencies between positive
> +colocation rules defined in C<$rules> and node restrictions defined in
> +C<$groups> and C<$service>.
services?
> +
> +A positive colocation rule inconsistency with groups exists, if at least two
> +services in a positive colocation rule are restricted to disjoint sets of
> +nodes, i.e. they are in restricted HA groups, which have a disjoint set of
> +nodes.
> +
> +If there are no conflicts, the returned list is empty.
> +
> +=cut
> +
> +sub check_positive_group_consistency {
> + my ($rules, $groups, $services, $positive_ruleids, $conflicts) = @_;
this could just get $positive_rules (filtered via grep) instead?
> +
> + for my $ruleid (@$positive_ruleids) {
> + my $rule_services = $rules->{ids}->{$ruleid}->{services};
and this could be
while (my ($ruleid, $rule) = each %$positive_rules) {
my $nodes;
..
}
> + my $nodes;
> +
> + for my $sid (keys %$rule_services) {
> + my $groupid = $services->{$sid}->{group};
> + return if !$groupid;
should this really be a return?
> +
> + my $group = $groups->{ids}->{$groupid};
> + return if !$group;
> + return if !$group->{restricted};
same here?
> +
> + $nodes = { map { $_ => 1 } keys %{$group->{nodes}} } if !defined($nodes);
isn't $group->{nodes} already a hash set of the desired format? so this could be
$nodes = { $group->{nodes}->%* };
?
> + $nodes = PVE::HA::Tools::intersect($nodes, $group->{nodes});
could add a break here with the same condition as below?
> + }
> +
> + if (defined($nodes) && scalar keys %$nodes < 1) {
> + push @$conflicts, ['positive', $ruleid];
> + }
> + }
> +}
> +
> +=head3 check_negative_group_consistency(...)
> +
> +Returns a list of conflicts caused by inconsistencies between negative
> +colocation rules defined in C<$rules> and node restrictions defined in
> +C<$groups> and C<$service>.
> +
> +A negative colocation rule inconsistency with groups exists, if at least two
> +services in a negative colocation rule are restricted to less nodes in total
> +than services in the rule, i.e. they are in restricted HA groups, where the
> +union of all restricted node sets have less elements than restricted services.
> +
> +If there are no conflicts, the returned list is empty.
> +
> +=cut
> +
> +sub check_negative_group_consistency {
> + my ($rules, $groups, $services, $negative_ruleids, $conflicts) = @_;
same question here
> +
> + for my $ruleid (@$negative_ruleids) {
> + my $rule_services = $rules->{ids}->{$ruleid}->{services};
> + my $restricted_services = 0;
> + my $restricted_nodes;
> +
> + for my $sid (keys %$rule_services) {
> + my $groupid = $services->{$sid}->{group};
> + return if !$groupid;
same question as above ;)
> +
> + my $group = $groups->{ids}->{$groupid};
> + return if !$group;
> + return if !$group->{restricted};
same here
> +
> + $restricted_services++;
> +
> + $restricted_nodes = {} if !defined($restricted_nodes);
> + $restricted_nodes = PVE::HA::Tools::union($restricted_nodes, $group->{nodes});
here as well - if restricted_services > restricted_nodes, haven't we
already found a violation of the invariant and should break, even if
another service that can run on 5 more new nodes would then be added in
the next iteration..
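i.e. something like this (sketch), right after the union:

    # the services seen so far already cannot be pairwise separated
    last if $restricted_services > scalar(keys %$restricted_nodes);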
> + }
> +
> + if (defined($restricted_nodes)
> + && scalar keys %$restricted_nodes < $restricted_services) {
> + push @$conflicts, ['negative', $ruleid];
> + }
> + }
> +}
> +
> +sub check_consistency_with_groups {
> + my ($rules, $groups, $services) = @_;
> +
> + my $conflicts = [];
> + my ($positive_ruleids, $negative_ruleids) = split_colocation_rules($rules);
> +
> + check_positive_group_consistency($rules, $groups, $services, $positive_ruleids, $conflicts);
> + check_negative_group_consistency($rules, $groups, $services, $negative_ruleids, $conflicts);
> +
> + return $conflicts;
> +}
> +
> +sub canonicalize {
> + my ($class, $rules, $groups, $services) = @_;
should this note that it will modify $rules in-place? this is only
called by PVE::HA::Rules::checked_config which also does not note that
and could be interpreted as "config is checked now" ;)
> +
> + my $illdefined_ruleids = check_services_count($rules);
> +
> + for my $ruleid (@$illdefined_ruleids) {
> + print "Drop colocation rule '$ruleid', because it does not have enough services defined.\n";
> +
> + delete $rules->{ids}->{$ruleid};
> + }
> +
> + my $mergeable_positive_ruleids = check_positive_intransitivity($rules);
> +
> + for my $outerid (sort keys %$mergeable_positive_ruleids) {
> + my $outer = $rules->{ids}->{$outerid};
> + my $innerids = $mergeable_positive_ruleids->{$outerid};
> +
> + for my $innerid (@$innerids) {
> + my $inner = $rules->{ids}->{$innerid};
> +
> + $outer->{services}->{$_} = 1 for (keys %{$inner->{services}});
> +
> + print "Merge services of positive colocation rule '$innerid' into positive colocation"
> + . " rule '$outerid', because they share at least one service.\n";
this is a bit confusing because it modifies the rule while continuing to
refer to it using the old name afterwards.. should we merge them and
give them a new name?
> +
> + delete $rules->{ids}->{$innerid};
> + }
> + }
> +
> + my $inner_conflicts = check_inner_consistency($rules);
> +
> + for my $conflict (@$inner_conflicts) {
> + my ($positiveid, $negativeid) = @$conflict;
> +
> + print "Drop positive colocation rule '$positiveid' and negative colocation rule"
> + . " '$negativeid', because they share two or more services.\n";
> +
> + delete $rules->{ids}->{$positiveid};
> + delete $rules->{ids}->{$negativeid};
> + }
> +
> + my $group_conflicts = check_consistency_with_groups($rules, $groups, $services);
> +
> + for my $conflict (@$group_conflicts) {
> + my ($type, $ruleid) = @$conflict;
> +
> + if ($type eq 'positive') {
> + print "Drop positive colocation rule '$ruleid', because two or more services are"
> + . " restricted to different nodes.\n";
> + } elsif ($type eq 'negative') {
> + print "Drop negative colocation rule '$ruleid', because two or more services are"
> + . " restricted to less nodes than services.\n";
> + } else {
> + die "Invalid group conflict type $type\n";
> + }
> +
> + delete $rules->{ids}->{$ruleid};
> + }
> +}
> +
> +# TODO This will be used to verify modifications to the rules config over the API
> +sub are_satisfiable {
this is basically canonicalize, but
- without deleting rules
- without the transitivity check
- with slightly adapted messages
should they be combined so that we have roughly the same logic when
doing changes via the API and when loading the rules for operations?
> + my ($class, $rules, $groups, $services) = @_;
> +
> + my $illdefined_ruleids = check_services_count($rules);
> +
> + for my $ruleid (@$illdefined_ruleids) {
> + print "Colocation rule '$ruleid' does not have enough services defined.\n";
> + }
> +
> + my $inner_conflicts = check_inner_consistency($rules);
> +
> + for my $conflict (@$inner_conflicts) {
> + my ($positiveid, $negativeid) = @$conflict;
> +
> + print "Positive colocation rule '$positiveid' is inconsistent with negative colocation rule"
> + . " '$negativeid', because they share two or more services between them.\n";
> + }
> +
> + my $group_conflicts = check_consistency_with_groups($rules, $groups, $services);
> +
> + for my $conflict (@$group_conflicts) {
> + my ($type, $ruleid) = @$conflict;
> +
> + if ($type eq 'positive') {
> + print "Positive colocation rule '$ruleid' is unapplicable, because two or more services"
> + . " are restricted to different nodes.\n";
> + } elsif ($type eq 'negative') {
> + print "Negative colocation rule '$ruleid' is unapplicable, because two or more services"
> + . " are restricted to less nodes than services.\n";
> + } else {
> + die "Invalid group conflict type $type\n";
> + }
> + }
> +
> + if (scalar(@$inner_conflicts) || scalar(@$group_conflicts)) {
> + return 0;
> + }
> +
> + return 1;
> +}
> +
> +1;
> diff --git a/src/PVE/HA/Rules/Makefile b/src/PVE/HA/Rules/Makefile
> new file mode 100644
> index 0000000..8cb91ac
> --- /dev/null
> +++ b/src/PVE/HA/Rules/Makefile
> @@ -0,0 +1,6 @@
> +SOURCES=Colocation.pm
> +
> +.PHONY: install
> +install:
> + install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA/Rules
> + for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/Rules/$$i; done
> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
> index 35107c9..52251d7 100644
> --- a/src/PVE/HA/Tools.pm
> +++ b/src/PVE/HA/Tools.pm
> @@ -46,6 +46,12 @@ PVE::JSONSchema::register_standard_option('pve-ha-resource-id', {
> type => 'string', format => 'pve-ha-resource-id',
> });
>
> +PVE::JSONSchema::register_standard_option('pve-ha-resource-id-list', {
> + description => "List of HA resource IDs.",
> + typetext => "<type>:<name>{,<type>:<name>}*",
> + type => 'string', format => 'pve-ha-resource-id-list',
> +});
> +
> PVE::JSONSchema::register_format('pve-ha-resource-or-vm-id', \&pve_verify_ha_resource_or_vm_id);
> sub pve_verify_ha_resource_or_vm_id {
> my ($sid, $noerr) = @_;
> --
> 2.39.5
>
>
>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes Daniel Kral
@ 2025-04-03 12:17 ` Fabian Grünbichler
2025-04-11 15:56 ` Daniel Kral
2025-04-28 12:26 ` Fiona Ebner
2025-04-30 11:09 ` Daniel Kral
2 siblings, 1 reply; 67+ messages in thread
From: Fabian Grünbichler @ 2025-04-03 12:17 UTC (permalink / raw)
To: Proxmox VE development discussion
On March 25, 2025 4:12 pm, Daniel Kral wrote:
> Add a mechanism to the node selection subroutine, which enforces the
> colocation rules defined in the rules config.
>
> The algorithm manipulates the set of nodes directly, which the service
> is allowed to run on, depending on the type and strictness of the
> colocation rules, if there are any.
shouldn't this first attempt to satisfy all rules, and if that fails,
retry with just the strict ones, or something similar? see comments
below (maybe I am missing/misunderstanding something)
>
> This makes it depend on the prior removal of any nodes, which are
> unavailable (i.e. offline, unreachable, or weren't able to start the
> service in previous tries) or are not allowed to be run on otherwise
> (i.e. HA group node restrictions) to function correctly.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> src/PVE/HA/Manager.pm | 203 ++++++++++++++++++++++++++++++++++++-
> src/test/test_failover1.pl | 4 +-
> 2 files changed, 205 insertions(+), 2 deletions(-)
>
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 8f2ab3d..79b6555 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -157,8 +157,201 @@ sub get_node_priority_groups {
> return ($pri_groups, $group_members);
> }
>
> +=head3 get_colocated_services($rules, $sid, $online_node_usage)
> +
> +Returns a hash map of all services, which are specified as being in a positive
> +or negative colocation in C<$rules> with the given service with id C<$sid>.
> +
> +Each service entry consists of the type of colocation, strictness of colocation
> +and the node the service is currently assigned to, if any, according to
> +C<$online_node_usage>.
> +
> +For example, a service C<'vm:101'> being strictly colocated together (positive)
> +with two other services C<'vm:102'> and C<'vm:103'> and loosely colocated
> +separate with another service C<'vm:104'> results in the hash map:
> +
> + {
> + 'vm:102' => {
> + affinity => 'together',
> + strict => 1,
> + node => 'node2'
> + },
> + 'vm:103' => {
> + affinity => 'together',
> + strict => 1,
> + node => 'node2'
> + },
> + 'vm:104' => {
> + affinity => 'separate',
> + strict => 0,
> + node => undef
> + }
> + }
> +
> +=cut
> +
> +sub get_colocated_services {
> + my ($rules, $sid, $online_node_usage) = @_;
> +
> + my $services = {};
> +
> + PVE::HA::Rules::Colocation::foreach_colocation_rule($rules, sub {
> + my ($rule) = @_;
> +
> + for my $csid (sort keys %{$rule->{services}}) {
> + next if $csid eq $sid;
> +
> + $services->{$csid} = {
> + node => $online_node_usage->get_service_node($csid),
> + affinity => $rule->{affinity},
> + strict => $rule->{strict},
> + };
> + }
> + }, {
> + sid => $sid,
> + });
> +
> + return $services;
> +}
> +
> +=head3 get_colocation_preference($rules, $sid, $online_node_usage)
> +
> +Returns a list of two hashes, where each is a hash map of the colocation
> +preference of C<$sid>, according to the colocation rules in C<$rules> and the
> +service locations in C<$online_node_usage>.
> +
> +The first hash is the positive colocation preference, where each element
> +represents properties for how much C<$sid> prefers to be on the node.
> +Currently, this is a binary C<$strict> field, which means either it should be
> +there (C<0>) or must be there (C<1>).
> +
> +The second hash is the negative colocation preference, where each element
> +represents properties for how much C<$sid> prefers not to be on the node.
> +Currently, this is a binary C<$strict> field, which means either it should not
> +be there (C<0>) or must not be there (C<1>).
> +
> +=cut
> +
> +sub get_colocation_preference {
> + my ($rules, $sid, $online_node_usage) = @_;
> +
> + my $services = get_colocated_services($rules, $sid, $online_node_usage);
> +
> + my $together = {};
> + my $separate = {};
> +
> + for my $service (values %$services) {
> + my $node = $service->{node};
> +
> + next if !$node;
> +
> + my $node_set = $service->{affinity} eq 'together' ? $together : $separate;
> + $node_set->{$node}->{strict} = $node_set->{$node}->{strict} || $service->{strict};
> + }
> +
> + return ($together, $separate);
> +}
> +
> +=head3 apply_positive_colocation_rules($together, $allowed_nodes)
> +
> +Applies the positive colocation preference C<$together> on the allowed node
> +hash set C<$allowed_nodes> directly.
> +
> +Positive colocation means keeping services together on a single node, and
> +therefore minimizing the separation of services.
> +
> +The allowed node hash set C<$allowed_nodes> is expected to contain any node,
> +which is available to the service, i.e. each node is currently online, is
> +available according to other location constraints, and the service has not
> +failed running there yet.
> +
> +=cut
> +
> +sub apply_positive_colocation_rules {
> + my ($together, $allowed_nodes) = @_;
> +
> + return if scalar(keys %$together) < 1;
> +
> + my $mandatory_nodes = {};
> + my $possible_nodes = PVE::HA::Tools::intersect($allowed_nodes, $together);
> +
> + for my $node (sort keys %$together) {
> + $mandatory_nodes->{$node} = 1 if $together->{$node}->{strict};
> + }
> +
> + if (scalar keys %$mandatory_nodes) {
> + # limit to only the nodes the service must be on.
> + for my $node (keys %$allowed_nodes) {
> + next if exists($mandatory_nodes->{$node});
> +
> + delete $allowed_nodes->{$node};
> + }
> + } elsif (scalar keys %$possible_nodes) {
I am not sure I follow this logic here.. if there are any strict
requirements, we only honor those.. if there are no strict requirements,
we only honor the non-strict ones?
> + # limit to the possible nodes the service should be on, if there are any.
> + for my $node (keys %$allowed_nodes) {
> + next if exists($possible_nodes->{$node});
> +
> + delete $allowed_nodes->{$node};
> + }
this is the same code twice, just operating on different hash
references, so could probably be a lot shorter. the next and delete
could also be combined (`delete .. if !...`).
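e.g. roughly (same names as in the patch, shown for the strict branch):

    for my $node (keys %$allowed_nodes) {
        delete $allowed_nodes->{$node} if !$mandatory_nodes->{$node};
    }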
> + }
> +}
> +
> +=head3 apply_negative_colocation_rules($separate, $allowed_nodes)
> +
> +Applies the negative colocation preference C<$separate> on the allowed node
> +hash set C<$allowed_nodes> directly.
> +
> +Negative colocation means keeping services separate on multiple nodes, and
> +therefore maximizing the separation of services.
> +
> +The allowed node hash set C<$allowed_nodes> is expected to contain any node,
> +which is available to the service, i.e. each node is currently online, is
> +available according to other location constraints, and the service has not
> +failed running there yet.
> +
> +=cut
> +
> +sub apply_negative_colocation_rules {
> + my ($separate, $allowed_nodes) = @_;
> +
> + return if scalar(keys %$separate) < 1;
> +
> + my $mandatory_nodes = {};
> + my $possible_nodes = PVE::HA::Tools::set_difference($allowed_nodes, $separate);
this is confusing or I misunderstand something here, see below..
> +
> + for my $node (sort keys %$separate) {
> + $mandatory_nodes->{$node} = 1 if $separate->{$node}->{strict};
> + }
> +
> + if (scalar keys %$mandatory_nodes) {
> + # limit to the nodes the service must not be on.
this is missing a not?
we are limiting to the nodes the service must not not be on :-P
should we rename mandatory_nodes to forbidden_nodes?
> + for my $node (keys %$allowed_nodes) {
this could just loop over the forbidden nodes and delete them from
allowed nodes?
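i.e. (with the current variable name):

    delete $allowed_nodes->{$_} for keys %$mandatory_nodes;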
> + next if !exists($mandatory_nodes->{$node});
> +
> + delete $allowed_nodes->{$node};
> + }
> + } elsif (scalar keys %$possible_nodes) {
similar to above - if we have strict exclusions, we honor them, but we
ignore the non-strict exclusions unless there are no strict ones?
> + # limit to the nodes the service should not be on, if any.
> + for my $node (keys %$allowed_nodes) {
> + next if exists($possible_nodes->{$node});
> +
> + delete $allowed_nodes->{$node};
> + }
> + }
> +}
> +
> +sub apply_colocation_rules {
> + my ($rules, $sid, $allowed_nodes, $online_node_usage) = @_;
> +
> + my ($together, $separate) = get_colocation_preference($rules, $sid, $online_node_usage);
> +
> + apply_positive_colocation_rules($together, $allowed_nodes);
> + apply_negative_colocation_rules($separate, $allowed_nodes);
> +}
> +
> sub select_service_node {
> - my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
> + # TODO Cleanup this signature post-RFC
> + my ($rules, $groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
>
> my $group = get_service_group($groups, $online_node_usage, $service_conf);
>
> @@ -189,6 +382,8 @@ sub select_service_node {
>
> return $current_node if (!$try_next && !$best_scored) && $pri_nodes->{$current_node};
>
> + apply_colocation_rules($rules, $sid, $pri_nodes, $online_node_usage);
> +
> my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
> my @nodes = sort {
> $scores->{$a} <=> $scores->{$b} || $a cmp $b
> @@ -758,6 +953,7 @@ sub next_state_request_start {
>
> if ($self->{crs}->{rebalance_on_request_start}) {
> my $selected_node = select_service_node(
> + $self->{rules},
> $self->{groups},
> $self->{online_node_usage},
> $sid,
> @@ -771,6 +967,9 @@ sub next_state_request_start {
> my $select_text = $selected_node ne $current_node ? 'new' : 'current';
> $haenv->log('info', "service $sid: re-balance selected $select_text node $selected_node for startup");
>
> + # TODO It would be better if this information would be retrieved from $ss/$sd post-RFC
> + $self->{online_node_usage}->pin_service_node($sid, $selected_node);
> +
> if ($selected_node ne $current_node) {
> $change_service_state->($self, $sid, 'request_start_balance', node => $current_node, target => $selected_node);
> return;
> @@ -898,6 +1097,7 @@ sub next_state_started {
> }
>
> my $node = select_service_node(
> + $self->{rules},
> $self->{groups},
> $self->{online_node_usage},
> $sid,
> @@ -1004,6 +1204,7 @@ sub next_state_recovery {
> $self->recompute_online_node_usage(); # we want the most current node state
>
> my $recovery_node = select_service_node(
> + $self->{rules},
> $self->{groups},
> $self->{online_node_usage},
> $sid,
> diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
> index 308eab3..4c84fbd 100755
> --- a/src/test/test_failover1.pl
> +++ b/src/test/test_failover1.pl
> @@ -8,6 +8,8 @@ use PVE::HA::Groups;
> use PVE::HA::Manager;
> use PVE::HA::Usage::Basic;
>
> +my $rules = {};
> +
> my $groups = PVE::HA::Groups->parse_config("groups.tmp", <<EOD);
> group: prefer_node1
> nodes node1
> @@ -31,7 +33,7 @@ sub test {
> my ($expected_node, $try_next) = @_;
>
> my $node = PVE::HA::Manager::select_service_node
> - ($groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next);
> + ($rules, $groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next);
>
> my (undef, undef, $line) = caller();
> die "unexpected result: $node != ${expected_node} at line $line\n"
> --
> 2.39.5
>
>
>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-04-01 9:39 ` Daniel Kral
2025-04-01 11:05 ` DERUMIER, Alexandre via pve-devel
@ 2025-04-03 12:26 ` Fabian Grünbichler
2025-04-24 10:12 ` Fiona Ebner
2 siblings, 0 replies; 67+ messages in thread
From: Fabian Grünbichler @ 2025-04-03 12:26 UTC (permalink / raw)
To: DERUMIER, Alexandre, Proxmox VE development discussion
On April 1, 2025 11:39 am, Daniel Kral wrote:
> On 4/1/25 03:50, DERUMIER, Alexandre wrote:
>> Small feature request from students && customers: they are a lot
>> asking to be able to use vm tags in the colocation/affinity
>
> Good idea! We were thinking about this too and I forgot to add it to the
> list, thanks for bringing it up again!
>
> Yes, the idea would be to make pools and tags available as selectors for
> rules here, so that the changes can be made rather dynamic by just
> adding a tag to a service.
>
> The only thing we have to consider here is that HA rules have some
> verification phase and invalid rules will be dropped or modified to make
> them applicable. Also these external changes must be identified somehow
> in the HA stack, as I want to keep the amount of runs through the
> verification code to a minimum, i.e. only when the configuration is
> changed by the user. But that will be a discussion for another series ;).
something to also consider is HA permissions:
https://bugzilla.proxmox.com/show_bug.cgi?id=4597
e.g., who is supposed to define (affinity or other) rules, who sees
them, what if there are conflicts, ..
what about conflicting requests? let's say we have a set of 5 VMs that
should run on the same node, but one is requested to be migrated to node
A, and a second one to node B? if a user doesn't see the rules for lack
of privileges, this could get rather confusing behaviour-wise in the end?
what about things like VMs X and Y needing to run together, but Z not
being allowed to run together with Y, and user A that only "sees" X
requesting X to be migrated to the node where Z is currently running?
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin
2025-04-03 12:16 ` Fabian Grünbichler
@ 2025-04-11 11:04 ` Daniel Kral
2025-04-25 14:06 ` Fiona Ebner
0 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-04-11 11:04 UTC (permalink / raw)
To: Proxmox VE development discussion, Fabian Grünbichler
Thanks for the review, Fabian!
Sorry for the wait, I was more focused on testing other patch series
which were already ready to merge for PVE 8.4 ;). But I'm going to be
working on this again now, so that it will be ready for the next release
or even before that :)
Thanks for the suggestions, I'll implement them shortly. I still have a
few questions/discussion points below.
On 4/3/25 14:16, Fabian Grünbichler wrote:
> On March 25, 2025 4:12 pm, Daniel Kral wrote:
>> Add the colocation rule plugin to allow users to specify inter-service
>> affinity constraints.
>>
>> These colocation rules can either be positive (keeping services
>> together) or negative (keeping service separate). Their strictness can
>> also be specified as either a MUST or a SHOULD, where the first
>> specifies that any service the constraint cannot be applied for stays in
>> recovery, while the latter specifies that that any service the
>> constraint cannot be applied for is lifted from the constraint.
>>
>> The initial implementation also implements four basic transformations,
>> where colocation rules with not enough services are dropped, transitive
>> positive colocation rules are merged, and inter-colocation rule
>> inconsistencies as well as colocation rule inconsistencies with respect
>> to the location constraints specified in HA groups are dropped.
>
> a high level question: theres a lot of loops and sorts over rules,
> services, groups here - granted that is all in memory, so should be
> reasonably fast, do we have concerns here/should we look for further
> optimization potential?
>
> e.g. right now I count (coming in via canonicalize):
>
> - check_services_count
> -- sort of ruleids (foreach_colocation_rule)
> -- loop over rules (foreach_colocation_rule)
> --- keys on services of each rule
> - loop over the results (should be empty)
> - check_positive_intransitivity
> -- sort of ruleids, 1x loop over rules (foreach_colocation_rule via split_colocation_rules)
> -- loop over each unique pair of ruleids
> --- is_disjoint on services of each pair (loop over service keys)
> - loop over resulting ruleids (might be many!)
> -- loop over mergeable rules for each merge target
> --- loop over services of each mergeable rule
> - check_inner_consistency
> -- sort of ruleids, 1x loop over rules (foreach_colocation_rule via split_colocation_rules)
> -- loop over positive rules
> --- for every positive rule, loop over negative rules
> ---- for each pair of positive+negative rule, check service
> intersections
> - loop over resulting conflicts (should be empty)
> - check_consistency_with_groups
> -- sort of ruleids, 1x loop over rules (foreach_colocation_rule via split_colocation_rules)
> -- loop over positive rules
> --- loop over services
> ---- loop over nodes of service's group
> -- loop over negative rules
> --- loop over services
> ---- loop over nodes of service's group
> - loop over resulting conflicts (should be empty)
>
> possibly splitting the rules (instead of just the IDs) once and keeping
> a list of sorted rule IDs we could save some overhead?
>
> might not be worth it (yet) though, but something to keep in mind if the
> rules are getting more complicated over time..
Thanks for the nice call graph!
I think it would be reasonable to do this already, especially to reduce
the code duplication between canonicalize() and are_satisfiable() you
already mentioned below.
I was thinking about something like $cmddef or another registry-type
structure, which has an entry for each checking subroutine and also a
handler for what to print/do for both canonicalize() as well as
are_satisfiable(). Then those would have to only iterate over the list
and call the subroutines.
For every checking subroutine, we could pass the whole of $rules, and a
rule type-specific variable, e.g. [$positive_ids, $negative_ids] here,
or as you already suggested below [$positive_rules, $negative_rules].
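As a very rough sketch of the registry structure I have in mind (all
names are placeholders, and the extra $groups/$services parameters are
left out for brevity):

    my $colocation_checks = [
        {
            check => \&check_services_count,
            canonicalize => sub {
                my ($rules, $ruleid) = @_;
                print "Drop colocation rule '$ruleid', because it does"
                    . " not have enough services defined.\n";
                delete $rules->{ids}->{$ruleid};
            },
            satisfiable => sub {
                my ($rules, $ruleid) = @_;
                print "Colocation rule '$ruleid' does not have enough"
                    . " services defined.\n";
            },
        },
        # ... one entry per checking subroutine, run in a fixed order
    ];

    sub run_colocation_checks {
        my ($rules, $mode) = @_; # 'canonicalize' or 'satisfiable'

        for my $entry (@$colocation_checks) {
            my $conflicts = $entry->{check}->($rules);
            $entry->{$mode}->($rules, $_) for @$conflicts;
        }
    }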
One small thing I haven't explicitly mentioned here before: at least the
check for mergeable positive colocation rules
(`check_positive_intransitivity`) and the check for inconsistency
between positive and negative colocation rules
(`check_inner_consistency`) do depend on each other somewhat, so the
order of these checks stays important here, as does writing the
modifications to $rules back correctly before the next check handler is
called.
I've written an example of why this is necessary in a comment below
`check_positive_intransitivity` and will document this more clearly in
the v1.
The only semi-blocker here is that check_consistency_with_groups(...)
also needs access to $groups and $services, but for the time being we
could just pass those two to every subroutine and ignore them where they
aren't needed.
Another approach could be to write the service group memberships into
$rules internally already and just work with the data from there, so
that transitioning from "HA Groups" to "HA Location Rules" could go more
smoothly in a future major version, if we want to do this in the end. Or
we could already allow creating location rules explicitly, which are
synchronized with HA groups.
>
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> debian/pve-ha-manager.install | 1 +
>> src/PVE/HA/Makefile | 1 +
>> src/PVE/HA/Rules/Colocation.pm | 391 +++++++++++++++++++++++++++++++++
>> src/PVE/HA/Rules/Makefile | 6 +
>> src/PVE/HA/Tools.pm | 6 +
>> 5 files changed, 405 insertions(+)
>> create mode 100644 src/PVE/HA/Rules/Colocation.pm
>> create mode 100644 src/PVE/HA/Rules/Makefile
>>
>> diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
>> index 9bbd375..89f9144 100644
>> --- a/debian/pve-ha-manager.install
>> +++ b/debian/pve-ha-manager.install
>> @@ -33,6 +33,7 @@
>> /usr/share/perl5/PVE/HA/Resources/PVECT.pm
>> /usr/share/perl5/PVE/HA/Resources/PVEVM.pm
>> /usr/share/perl5/PVE/HA/Rules.pm
>> +/usr/share/perl5/PVE/HA/Rules/Colocation.pm
>> /usr/share/perl5/PVE/HA/Tools.pm
>> /usr/share/perl5/PVE/HA/Usage.pm
>> /usr/share/perl5/PVE/HA/Usage/Basic.pm
>> diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
>> index 489cbc0..e386cbf 100644
>> --- a/src/PVE/HA/Makefile
>> +++ b/src/PVE/HA/Makefile
>> @@ -8,6 +8,7 @@ install:
>> install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA
>> for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/$$i; done
>> make -C Resources install
>> + make -C Rules install
>> make -C Usage install
>> make -C Env install
>>
>> diff --git a/src/PVE/HA/Rules/Colocation.pm b/src/PVE/HA/Rules/Colocation.pm
>> new file mode 100644
>> index 0000000..808d48e
>> --- /dev/null
>> +++ b/src/PVE/HA/Rules/Colocation.pm
>> @@ -0,0 +1,391 @@
>> +package PVE::HA::Rules::Colocation;
>> +
>> +use strict;
>> +use warnings;
>> +
>> +use Data::Dumper;
>
> leftover dumper ;)
>
>> +
>> +use PVE::JSONSchema qw(get_standard_option);
>> +use PVE::HA::Tools;
>> +
>> +use base qw(PVE::HA::Rules);
>> +
>> +sub type {
>> + return 'colocation';
>> +}
>> +
>> +sub properties {
>> + return {
>> + services => get_standard_option('pve-ha-resource-id-list'),
>> + affinity => {
>> + description => "Describes whether the services are supposed to be kept on separate"
>> + . " nodes, or are supposed to be kept together on the same node.",
>> + type => 'string',
>> + enum => ['separate', 'together'],
>> + optional => 0,
>> + },
>> + strict => {
>> + description => "Describes whether the colocation rule is mandatory or optional.",
>> + type => 'boolean',
>> + optional => 0,
>> + },
>> + }
>> +}
>> +
>> +sub options {
>> + return {
>> + services => { optional => 0 },
>> + strict => { optional => 0 },
>> + affinity => { optional => 0 },
>> + comment => { optional => 1 },
>> + };
>> +};
>> +
>> +sub decode_value {
>> + my ($class, $type, $key, $value) = @_;
>> +
>> + if ($key eq 'services') {
>> + my $res = {};
>> +
>> + for my $service (PVE::Tools::split_list($value)) {
>> + if (PVE::HA::Tools::pve_verify_ha_resource_id($service)) {
>> + $res->{$service} = 1;
>> + }
>> + }
>> +
>> + return $res;
>> + }
>> +
>> + return $value;
>> +}
>> +
>> +sub encode_value {
>> + my ($class, $type, $key, $value) = @_;
>> +
>> + if ($key eq 'services') {
>> + PVE::HA::Tools::pve_verify_ha_resource_id($_) for (keys %$value);
>> +
>> + return join(',', keys %$value);
>> + }
>> +
>> + return $value;
>> +}
>> +
>> +sub foreach_colocation_rule {
>> + my ($rules, $func, $opts) = @_;
>> +
>> + my $my_opts = { map { $_ => $opts->{$_} } keys %$opts };
>
> why? if the caller doesn't want $opts to be modified, they could just
> pass in a copy (or you could require it to be passed by value instead of
> by reference?).
>
> there's only a single caller that does (introduced by a later patch) and
> that one constructs the hash reference right at the call site, so unless
> I am missing something this seems a bit overkill..
Right, I didn't think about this clearly enough and this could very well
be just a direct write to the passed hash here.
Will change that in the v1!
>
>> + $my_opts->{type} = 'colocation';
>> +
>> + PVE::HA::Rules::foreach_service_rule($rules, $func, $my_opts);
>> +}
>> +
>> +sub split_colocation_rules {
>> + my ($rules) = @_;
>> +
>> + my $positive_ruleids = [];
>> + my $negative_ruleids = [];
>> +
>> + foreach_colocation_rule($rules, sub {
>> + my ($rule, $ruleid) = @_;
>> +
>> + my $ruleid_set = $rule->{affinity} eq 'together' ? $positive_ruleids : $negative_ruleids;
>> + push @$ruleid_set, $ruleid;
>> + });
>> +
>> + return ($positive_ruleids, $negative_ruleids);
>> +}
>> +
>> +=head3 check_service_count($rules)
>> +
>> +Returns a list of conflicts caused by colocation rules, which do not have
>> +enough services in them, defined in C<$rules>.
>> +
>> +If there are no conflicts, the returned list is empty.
>> +
>> +=cut
>> +
>> +sub check_services_count {
>> + my ($rules) = @_;
>> +
>> + my $conflicts = [];
>> +
>> + foreach_colocation_rule($rules, sub {
>> + my ($rule, $ruleid) = @_;
>> +
>> + push @$conflicts, $ruleid if (scalar(keys %{$rule->{services}}) < 2);
>> + });
>> +
>> + return $conflicts;
>> +}
>
> is this really an issue? a colocation rule with a single service is just
> a nop? there's currently no cleanup AFAICT if a resource is removed, but
You're right, AFAICS those are a noop when selecting the service node. I
guess I was a little pedantic / overprotective here about which rules
make sense in general instead of what the algorithm does in the end.
And good point about handling when resources are removed, adding that to
delete_service_from_config comes right on my TODO list for the v1!
> if we add that part (we maybe should?) then one can easily end up in a
> situation where a rule temporarily contains a single or no service?
Hm, yes, especially if we add pools/tags at a later point to select
services for the rule, then this could happen very easily. But as you
already mentioned, those two cases would be noops too.
Nevertheless, should we drop this? I think it could benefit users in
identifying that some rules might not do something they wanted and give
them a reason why, i.e. there's only one service in there, but at the
same time it could be a little noisy if there are a lot of affected rules.
>
>> +
>> +=head3 check_positive_intransitivity($rules)
>> +
>> +Returns a list of conflicts caused by transitive positive colocation rules
>> +defined in C<$rules>.
>> +
>> +Transitive positive colocation rules exist, if there are at least two positive
>> +colocation rules with the same strictness, which put at least the same two
>> +services in relation. This means, that these rules can be merged together.
>> +
>> +If there are no conflicts, the returned list is empty.
>
> The terminology here is quit confusing - conflict meaning that two rules
> are "transitive" and thus mergeable (which is good, cause it makes
> things easier to handle?) is quite weird, as "conflict" is a rather
> negative term..
>
> there's only a single call site in the same module, maybe we could just
> rename this into "find_mergeable_positive_ruleids", similar to the
> variable where the result is stored?
Yeah, I was probably too keen on the `$conflict = check_something(...)`
pattern here, but it would be much more readable with a simpler name.
I'll change that for the v1!
-----
Ad why: I'll also add some documentation about the rationale why this is
needed in the first place.
The main reason is that the latter rule check 'check_inner_consistency'
depends on the positive colocation rules having been merged already, as
it assumes that each positive colocation rule contains all services that
are positively colocated with each other. If that weren't the case, it
wouldn't detect that the following three rules are inconsistent with
each other:
colocation: stick-together1
services vm:101,vm:104
affinity together
strict 1
colocation: stick-together2
services vm:104,vm:102
affinity together
strict 1
colocation: keep-apart
services vm:101,vm:102,vm:103
affinity separate
strict 1
This reduces the complexity of the logic in 'check_inner_consistency' a
little, since it doesn't have to handle this special case:
'stick-together1' and 'stick-together2' are already merged into one rule
and it is easily apparent that vm 101 and vm 102 cannot be colocated and
non-colocated at the same time.
-----
Also, I was curious how that would work out for the case where a
negative colocation relation between three services is split into three
pairwise rules (essentially a dependency cycle). This should in theory
have the same semantics as the above rule set:
colocation: stick-together1
services vm:101,vm:104
affinity together
strict 1
colocation: stick-together2
services vm:104,vm:102
affinity together
strict 1
colocation: very-lonely-services1
services vm:101,vm:102
affinity separate
strict 1
colocation: very-lonely-services2
services vm:102,vm:103
affinity separate
strict 1
colocation: very-lonely-services3
services vm:101,vm:103
affinity separate
strict 1
Without the merge of positive rules, 'check_inner_consistency' would
again not detect the inconsistency here. But with the merge correctly
applied before checking the consistency, this would be resolved and the
effective rule set would be:
colocation: very-lonely-services2
services vm:102,vm:103
affinity separate
strict 1
colocation: very-lonely-services3
services vm:101,vm:103
affinity separate
strict 1
It could be argued that the negative colocation rules should be merged
in a similar manner here, as there's now an "effective" difference in
the semantics of the two rule sets: the negative colocation rules
between vm 101 and vm 103 and between vm 102 and vm 103 remain.
What do you think?
>
>> +
>> +=cut
>> +
>> +sub check_positive_intransitivity {
>> + my ($rules) = @_;
>> +
>> + my $conflicts = {};
>> + my ($positive_ruleids) = split_colocation_rules($rules);
>> +
>> + while (my $outerid = shift(@$positive_ruleids)) {
>> + my $outer = $rules->{ids}->{$outerid};
>> +
>> + for my $innerid (@$positive_ruleids) {
>
> so this is in practice a sort of "optimized" loop over all pairs of
> rules - iterating over the positive rules twice, but skipping pairs that
> were already visited by virtue of the shift on the outer loop..
>
> might be worth a short note, together with the $inner and $outer
> terminology I was a bit confused at first..
Sorry, I'll make that clearer in a comment above or with better naming
of the variables!
The `while(shift ...)` was motivated by not having to prune duplicates
afterwards and of course not having to check the same rules again, but
lacks a little in readability here.
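Maybe something along these lines for the comment (just a sketch):

    # iterate over every unique unordered pair of positive rules;
    # shifting the outer rule id off the list before the inner loop
    # ensures that each pair is only visited once
    while (my $outerid = shift(@$positive_ruleids)) {
        ...
    }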
>
>> + my $inner = $rules->{ids}->{$innerid};
>> +
>> + next if $outerid eq $innerid;
>> + next if $outer->{strict} != $inner->{strict};
>> + next if PVE::HA::Tools::is_disjoint($outer->{services}, $inner->{services});
>> +
>> + push @{$conflicts->{$outerid}}, $innerid;
>> + }
>> + }
>> +
>> + return $conflicts;
>> +}
>> +
>> +=head3 check_inner_consistency($rules)
>> +
>> +Returns a list of conflicts caused by inconsistencies between positive and
>> +negative colocation rules defined in C<$rules>.
>> +
>> +Inner inconsistent colocation rules exist, if there are at least the same two
>> +services in a positive and a negative colocation relation, which is an
>> +impossible constraint as they are opposites of each other.
>> +
>> +If there are no conflicts, the returned list is empty.
>
> here the conflicts and check terminology makes sense - we are checking
> an invariant that must be satisfied after all :)
>
>> +
>> +=cut
>> +
>> +sub check_inner_consistency {
>
> but 'inner' is a weird term since this is consistency between rules?
>
> it basically checks that no pair of services should both be colocated
> and not be colocated at the same time, but not sure how to encode that
> concisely..
Hm right, 'intra' wouldn't make this any simpler. I'll come up with a
better name for the next revision!
>
>> + my ($rules) = @_;
>> +
>> + my $conflicts = [];
>> + my ($positive_ruleids, $negative_ruleids) = split_colocation_rules($rules);
>> +
>> + for my $outerid (@$positive_ruleids) {
>> + my $outer = $rules->{ids}->{$outerid}->{services};
>
> s/outer/positive ?
ACK for this and all the following instances ;)
>
>> +
>> + for my $innerid (@$negative_ruleids) {
>> + my $inner = $rules->{ids}->{$innerid}->{services};
>
> s/inner/negative ?
>
>> +
>> + my $intersection = PVE::HA::Tools::intersect($outer, $inner);
>> + next if scalar(keys %$intersection < 2);
>
> the keys there is not needed, but the parentheses are in the wrong place
> instead ;) it does work by accident though, because the result of keys
> will be coerced to a scalar anyway, so you get the result of your
> comparison wrapped by another call to scalar, so you end up with either
> 1 or '' depending on whether the check was true or false..
Oh, what a lucky coincidence ;)! Thanks for catching that, I'll fix it!
>
>> +
>> + push @$conflicts, [$outerid, $innerid];
>> + }
>> + }
>> +
>> + return $conflicts;
>> +}
>> +
>> +=head3 check_positive_group_consistency(...)
>> +
>> +Returns a list of conflicts caused by inconsistencies between positive
>> +colocation rules defined in C<$rules> and node restrictions defined in
>> +C<$groups> and C<$service>.
>
> services?
ACK.
>
>> +
>> +A positive colocation rule inconsistency with groups exists, if at least two
>> +services in a positive colocation rule are restricted to disjoint sets of
>> +nodes, i.e. they are in restricted HA groups, which have a disjoint set of
>> +nodes.
>> +
>> +If there are no conflicts, the returned list is empty.
>> +
>> +=cut
>> +
>> +sub check_positive_group_consistency {
>> + my ($rules, $groups, $services, $positive_ruleids, $conflicts) = @_;
>
> this could just get $positive_rules (filtered via grep) instead?
>
>> +
>> + for my $ruleid (@$positive_ruleids) {
>> + my $rule_services = $rules->{ids}->{$ruleid}->{services};
>
> and this could be
>
> while (my ($ruleid, $rule) = each %$positive_rules) {
> my $nodes;
> ..
> }
Thanks for the suggestion here and above, will use that for the v1!
>
>> + my $nodes;
>> +
>> + for my $sid (keys %$rule_services) {
>> + my $groupid = $services->{$sid}->{group};
>> + return if !$groupid;
>
> should this really be a return?
Oops, no, that shouldn't be a return but a next, obviously. I forgot to
change them back after I moved them back from a handler (since there's
minimal duplicated code with the next subroutine). I'll change this and
all the other instances for the v1.
>
>> +
>> + my $group = $groups->{ids}->{$groupid};
>> + return if !$group;
>> + return if !$group->{restricted};
>
> same here?
>
>> +
>> + $nodes = { map { $_ => 1 } keys %{$group->{nodes}} } if !defined($nodes);
>
> isn't $group->{nodes} already a hash set of the desired format? so this could be
>
> $nodes = { $group->{nodes}->%* };
>
> ?
Right, yes it is!
I was still somewhat confused about what the ->%* operation exactly does
(or rather didn't really know that it existed before), but now I've
finally read up on postfix dereferencing ;).
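i.e., if I got it right, this just makes a shallow copy of the node hash
set:

    my $group = { nodes => { node1 => 1, node2 => 1 } };
    my $nodes = { $group->{nodes}->%* };   # copy of the top-level hash
    delete $nodes->{node1};                # $group->{nodes} stays intact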
>
>> + $nodes = PVE::HA::Tools::intersect($nodes, $group->{nodes});
>
> could add a break here with the same condition as below?
Right for this and the same comment for
`check_negative_group_consistency`, I'll definitely also add a comment
above that to make it clear why. Thanks for the suggestion!
>
>> + }
>> +
>> + if (defined($nodes) && scalar keys %$nodes < 1) {
>> + push @$conflicts, ['positive', $ruleid];
>> + }
>> + }
>> +}
>> +
>> +=head3 check_negative_group_consistency(...)
>> +
>> +Returns a list of conflicts caused by inconsistencies between negative
>> +colocation rules defined in C<$rules> and node restrictions defined in
>> +C<$groups> and C<$service>.
>> +
>> +A negative colocation rule inconsistency with groups exists, if at least two
>> +services in a negative colocation rule are restricted to less nodes in total
>> +than services in the rule, i.e. they are in restricted HA groups, where the
>> +union of all restricted node sets have less elements than restricted services.
>> +
>> +If there are no conflicts, the returned list is empty.
>> +
>> +=cut
>> +
>> +sub check_negative_group_consistency {
>> + my ($rules, $groups, $services, $negative_ruleids, $conflicts) = @_;
>
> same question here
>
>> +
>> + for my $ruleid (@$negative_ruleids) {
>> + my $rule_services = $rules->{ids}->{$ruleid}->{services};
>> + my $restricted_services = 0;
>> + my $restricted_nodes;
>> +
>> + for my $sid (keys %$rule_services) {
>> + my $groupid = $services->{$sid}->{group};
>> + return if !$groupid;
>
> same question as above ;)
>
>> +
>> + my $group = $groups->{ids}->{$groupid};
>> + return if !$group;
>> + return if !$group->{restricted};
>
> same here
>
>> +
>> + $restricted_services++;
>> +
>> + $restricted_nodes = {} if !defined($restricted_nodes);
>> + $restricted_nodes = PVE::HA::Tools::union($restricted_nodes, $group->{nodes});
>
> here as well - if restricted_services > restricted_nodes, haven't we
> already found a violation of the invariant and should break even if
> another service would then be added in the next iteration that can run
> on 5 move new nodes..
Thanks for catching this, will do that as already said in the above comment!
>
>> + }
>> +
>> + if (defined($restricted_nodes)
>> + && scalar keys %$restricted_nodes < $restricted_services) {
>> + push @$conflicts, ['negative', $ruleid];
>> + }
>> + }
>> +}
>> +
>> +sub check_consistency_with_groups {
>> + my ($rules, $groups, $services) = @_;
>> +
>> + my $conflicts = [];
>> + my ($positive_ruleids, $negative_ruleids) = split_colocation_rules($rules);
>> +
>> + check_positive_group_consistency($rules, $groups, $services, $positive_ruleids, $conflicts);
>> + check_negative_group_consistency($rules, $groups, $services, $negative_ruleids, $conflicts);
>> +
>> + return $conflicts;
>> +}
>> +
>> +sub canonicalize {
>> + my ($class, $rules, $groups, $services) = @_;
>
> should this note that it will modify $rules in-place? this is only
> called by PVE::HA::Rules::checked_config which also does not note that
> and could be interpreted as "config is checked now" ;)
Yes, it should really be pointed out by checked_config, but it doesn't
hurt at all to document it for both. checked_config could also have a
better name.
>
>> +
>> + my $illdefined_ruleids = check_services_count($rules);
>> +
>> + for my $ruleid (@$illdefined_ruleids) {
>> + print "Drop colocation rule '$ruleid', because it does not have enough services defined.\n";
>> +
>> + delete $rules->{ids}->{$ruleid};
>> + }
>> +
>> + my $mergeable_positive_ruleids = check_positive_intransitivity($rules);
>> +
>> + for my $outerid (sort keys %$mergeable_positive_ruleids) {
>> + my $outer = $rules->{ids}->{$outerid};
>> + my $innerids = $mergeable_positive_ruleids->{$outerid};
>> +
>> + for my $innerid (@$innerids) {
>> + my $inner = $rules->{ids}->{$innerid};
>> +
>> + $outer->{services}->{$_} = 1 for (keys %{$inner->{services}});
>> +
>> + print "Merge services of positive colocation rule '$innerid' into positive colocation"
>> + . " rule '$outerid', because they share at least one service.\n";
>
> this is a bit confusing because it modifies the rule while continuing to
> refer to it using the old name afterwards.. should we merge them and
> give them a new name?
Good call, I would go for just appending the name, but depending on how
many rules are affected this could get rather long... We could also just
use some temporary name at each new merge, but that could be harder to
follow if there are more than two merge actions.
I think I'll prefer just appending it, since it seems that Perl can
handle hash keys of 2**31 "fine" anyway :P, and hope for the best that
there won't be too many affected rules for users so that the key doesn't
grow that long. Or what do you think?
>
>> +
>> + delete $rules->{ids}->{$innerid};
>> + }
>> + }
>> +
>> + my $inner_conflicts = check_inner_consistency($rules);
>> +
>> + for my $conflict (@$inner_conflicts) {
>> + my ($positiveid, $negativeid) = @$conflict;
>> +
>> + print "Drop positive colocation rule '$positiveid' and negative colocation rule"
>> + . " '$negativeid', because they share two or more services.\n";
>> +
>> + delete $rules->{ids}->{$positiveid};
>> + delete $rules->{ids}->{$negativeid};
>> + }
>> +
>> + my $group_conflicts = check_consistency_with_groups($rules, $groups, $services);
>> +
>> + for my $conflict (@$group_conflicts) {
>> + my ($type, $ruleid) = @$conflict;
>> +
>> + if ($type eq 'positive') {
>> + print "Drop positive colocation rule '$ruleid', because two or more services are"
>> + . " restricted to different nodes.\n";
>> + } elsif ($type eq 'negative') {
>> + print "Drop negative colocation rule '$ruleid', because two or more services are"
>> + . " restricted to less nodes than services.\n";
>> + } else {
>> + die "Invalid group conflict type $type\n";
>> + }
>> +
>> + delete $rules->{ids}->{$ruleid};
>> + }
>> +}
>> +
>> +# TODO This will be used to verify modifications to the rules config over the API
>> +sub are_satisfiable {
>
> this is basically canonicalize, but
> - without deleting rules
> - without the transitivity check
> - with slightly adapted messages
>
> should they be combined so that we have roughly the same logic when
> doing changes via the API and when loading the rules for operations?
That would be much better, yes, as it's easy to miss adding it to both
and could become cumbersome if there are more checks needed in the future.
If there's nothing speaking against that, I would go for the structure I
have mentioned in the first inline comment to improve this, so that the
check routine and handlers for canonicalize() and are_satisfiable() are
closer together.
>
>> + my ($class, $rules, $groups, $services) = @_;
>> +
>> + my $illdefined_ruleids = check_services_count($rules);
>> +
>> + for my $ruleid (@$illdefined_ruleids) {
>> + print "Colocation rule '$ruleid' does not have enough services defined.\n";
>> + }
>> +
>> + my $inner_conflicts = check_inner_consistency($rules);
>> +
>> + for my $conflict (@$inner_conflicts) {
>> + my ($positiveid, $negativeid) = @$conflict;
>> +
>> + print "Positive colocation rule '$positiveid' is inconsistent with negative colocation rule"
>> + . " '$negativeid', because they share two or more services between them.\n";
>> + }
>> +
>> + my $group_conflicts = check_consistency_with_groups($rules, $groups, $services);
>> +
>> + for my $conflict (@$group_conflicts) {
>> + my ($type, $ruleid) = @$conflict;
>> +
>> + if ($type eq 'positive') {
>> + print "Positive colocation rule '$ruleid' is unapplicable, because two or more services"
>> + . " are restricted to different nodes.\n";
>> + } elsif ($type eq 'negative') {
>> + print "Negative colocation rule '$ruleid' is unapplicable, because two or more services"
>> + . " are restricted to less nodes than services.\n";
>> + } else {
>> + die "Invalid group conflict type $type\n";
>> + }
>> + }
>> +
>> + if (scalar(@$inner_conflicts) || scalar(@$group_conflicts)) {
>> + return 0;
>> + }
>> +
>> + return 1;
>> +}
>> +
>> +1;
>> diff --git a/src/PVE/HA/Rules/Makefile b/src/PVE/HA/Rules/Makefile
>> new file mode 100644
>> index 0000000..8cb91ac
>> --- /dev/null
>> +++ b/src/PVE/HA/Rules/Makefile
>> @@ -0,0 +1,6 @@
>> +SOURCES=Colocation.pm
>> +
>> +.PHONY: install
>> +install:
>> + install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA/Rules
>> + for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/Rules/$$i; done
>> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
>> index 35107c9..52251d7 100644
>> --- a/src/PVE/HA/Tools.pm
>> +++ b/src/PVE/HA/Tools.pm
>> @@ -46,6 +46,12 @@ PVE::JSONSchema::register_standard_option('pve-ha-resource-id', {
>> type => 'string', format => 'pve-ha-resource-id',
>> });
>>
>> +PVE::JSONSchema::register_standard_option('pve-ha-resource-id-list', {
>> + description => "List of HA resource IDs.",
>> + typetext => "<type>:<name>{,<type>:<name>}*",
>> + type => 'string', format => 'pve-ha-resource-id-list',
>> +});
>> +
>> PVE::JSONSchema::register_format('pve-ha-resource-or-vm-id', \&pve_verify_ha_resource_or_vm_id);
>> sub pve_verify_ha_resource_or_vm_id {
>> my ($sid, $noerr) = @_;
>> --
>> 2.39.5
>>
>>
>>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines
2025-04-03 12:16 ` Fabian Grünbichler
@ 2025-04-11 11:24 ` Daniel Kral
0 siblings, 0 replies; 67+ messages in thread
From: Daniel Kral @ 2025-04-11 11:24 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox VE development discussion
Thanks here for the feedback from both of you.
I agree with all the comments and will make the helpers more reusable so
that they can be moved to a new data structure/hash module in PVE::Tools.
On 4/3/25 14:16, Fabian Grünbichler wrote:
> On March 25, 2025 6:53 pm, Thomas Lamprecht wrote:
>> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>>> Implement helper subroutines, which implement basic set operations done
>>> on hash sets, i.e. hashes with elements set to a true value, e.g. 1.
>>>
>>> These will be used for various tasks in the HA Manager colocation rules,
>>> e.g. for verifying the satisfiability of the rules or applying the
>>> colocation rules on the allowed set of nodes.
>>>
>>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>>> ---
>>> If they're useful somewhere else, I can move them to PVE::Tools
>>> post-RFC, but it'd be probably useful to prefix them with `hash_` there.
>>
>> meh, not a big fan of growing the overly generic PVE::Tools more, if, this
>> should go into a dedicated module for hash/data structure helpers ...
>>
>>> AFAICS there weren't any other helpers for this with a quick grep over
>>> all projects and `PVE::Tools::array_intersect()` wasn't what I needed.
>>
>> ... which those existing one should then also move into, but out of scope
>> of this series.
>>
>>>
>>> src/PVE/HA/Tools.pm | 42 ++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 42 insertions(+)
>>>
>>> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
>>> index 0f9e9a5..fc3282c 100644
>>> --- a/src/PVE/HA/Tools.pm
>>> +++ b/src/PVE/HA/Tools.pm
>>> @@ -115,6 +115,48 @@ sub write_json_to_file {
>>> PVE::Tools::file_set_contents($filename, $raw);
>>> }
>>>
>>> +sub is_disjoint {
>>
>> IMO a bit too generic name for being in a Tools named module, maybe
>> prefix them all with hash_ or hashes_ ?
Yes, good call, I think I'll go for what Fabian mentioned below to
prefix them with hash_set_ / set_ or something similar.
And as we're working towards making those helpers more accessible for
other use cases, I'll also move them to a separate PVE::Tools::* module
as suggested above :)
>
> is_disjoint also only really makes sense as a name if you see it as an
> operation *on* $hash1, rather than an operation involving both hashes..
>
> i.e., in Rust
>
> set1.is_disjoint(&set2);
>
> makes sense..
>
> in Perl
>
> is_disjoint($set1, $set2)
>
> reads weird, and should maybe be
>
> check_disjoint($set1, $set2)
>
> or something like that?
Yes makes sense, I was going for `are_disjoint`, but both are fine for me.
>
>>
>>> + my ($hash1, $hash2) = @_;
>>> +
>>> + for my $key (keys %$hash1) {
>>> + return 0 if exists($hash2->{$key});
>>> + }
>>> +
>>> + return 1;
>>> +};
>>> +
>>> +sub intersect {
>>> + my ($hash1, $hash2) = @_;
>>> +
>>> + my $result = { map { $_ => $hash2->{$_} } keys %$hash1 };
>
> this is a bit dangerous if $hash2->{$key} is itself a reference? if I
> later modify $result I'll modify $hash2.. I know the commit message says
> that the hashes are all just of the form key => 1, but nothing here
> tells me that a year later when I am looking for a generic hash
> intersection helper ;) I think this should also be clearly mentioned in
> the module, and ideally, also in the helper names (i.e., have "set"
> there everywhere and a comment above each that it only works for
> hashes-as-sets and not generic hashes).
>
> wouldn't it be faster/simpler to iterate over either hash once?
>
> my $result = {};
> for my $key (keys %$hash1) {
> $result->{$key} = 1 if $hash1->{$key} && $hash2->{$key};
> }
> return $result;
I haven't thought too much about what { map {} } would cost here for the
RFC, but the above is both easier to read and also safer, so I'll adapt
the subroutine to the above, thanks :).
>
>
>>> +
>>> + for my $key (keys %$result) {
>>> + delete $result->{$key} if !defined($result->{$key});
>>> + }
>>> +
>>> + return $result;
>>> +};
>>> +
>>> +sub set_difference {
>>> + my ($hash1, $hash2) = @_;
>>> +
>>> + my $result = { map { $_ => 1 } keys %$hash1 };
>
> if $hash1 is only of the form key => 1, then this is just
>
> my $result = { %$hash1 };
But $result would then be a copy instead of a reference to %$hash1 here,
right? But only if there are no other references in there?
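i.e., if I understand it correctly, something like this (only a shallow,
one-level copy):

    my $hash1 = { a => 1, b => 1 };
    my $result = { %$hash1 };   # new top-level hash with the same keys
    delete $result->{a};        # leaves $hash1 untouched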
>
>>> +
>>> + for my $key (keys %$result) {
>>> + delete $result->{$key} if defined($hash2->{$key});
>>> + }
>>> +
>
> but the whole thing can be
>
> return { map { !$hash2->{$_} ? ($_ => 1) : () } keys %$hash1 };
>
> this transforms hash1 into its keys, and then returns either ($key => 1)
> if the key is not true in $hash2, or the empty tuple if it is. the outer {}
> then turn this sequence of tuples into a hash again, which skips empty
> tuples ;) can of course also be adapted to use the value from either
> hash, check for definedness instead of truthiness, ..
I'll have to check out more of the perldoc of the more common functions,
didn't know that map will skip empty lists here, thanks :)
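For example (just to convince myself here, with made-up sets):

my $set1 = { a => 1, b => 1, c => 1 };
my $set2 = { b => 1 };

# map flattens its results, so returning () for a key simply drops it
my $difference = { map { !$set2->{$_} ? ($_ => 1) : () } keys %$set1 };
# $difference is now { a => 1, c => 1 }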
>
>>> + return $result;
>>> +};
>>> +
>>> +sub union {
>>> + my ($hash1, $hash2) = @_;
>>> +
>>> + my $result = { map { $_ => 1 } keys %$hash1, keys %$hash2 };
>>> +
>>> + return $result;
>>> +};
>>> +
>>> sub count_fenced_services {
>>> my ($ss, $node) = @_;
>>>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-04-03 12:17 ` Fabian Grünbichler
@ 2025-04-11 15:56 ` Daniel Kral
2025-04-28 12:46 ` Fiona Ebner
0 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-04-11 15:56 UTC (permalink / raw)
To: Proxmox VE development discussion, Fabian Grünbichler
Thanks for taking the time here too!
I'm unsure whether the documentation wasn't clear enough or whether I'm
just blind to some details of how the division between strict/non-strict
should be, but I hope I could clarify some points about my understanding
here.
Please correct me in any case where the current implementation would
break user expectations; that's definitely not something I want ;).
I'll definitely take some time to improve the control flow and names of
variables/subroutines here to make it easier to understand, and add
examples of how the contents of $together and $separate look at
different stages.
The algorithm is online and quite dependent on many other things, e.g.
that $allowed_nodes already has those nodes removed that were already
tried and failed on, etc., so it's pretty dynamic here.
On 4/3/25 14:17, Fabian Grünbichler wrote:
> On March 25, 2025 4:12 pm, Daniel Kral wrote:
>> Add a mechanism to the node selection subroutine, which enforces the
>> colocation rules defined in the rules config.
>>
>> The algorithm manipulates the set of nodes directly, which the service
>> is allowed to run on, depending on the type and strictness of the
>> colocation rules, if there are any.
>
> shouldn't this first attempt to satisfy all rules, and if that fails,
> retry with just the strict ones, or something similar? see comments
> below (maybe I am missing/misunderstanding something)
Hm, I'm not sure if I can follow what you mean here.
I tried to come up with some scenarios where there could be conflicts
because of "loose" colocation rules being overshadowed by strict
colocation rules, but I'm currently not seeing that. But I've also been
mostly concerned with smaller clusters (3 to 5 nodes) for now, so I'll
take a closer look at larger applications/environments.
In general, when applying colocation rules, the logic is less concerned
with which rules specifically get applied, but rather with making sure
that none are violated.
This is also why a colocation rule containing only a single service
turns out to be a no-op, since it will never depend on the location of
another service (the rule will never add something to
$together/$separate, since there's only an entry there if other services
already have a node pinned to them).
I hope the comments below clarify this a little bit or make it clearer
where I'm missing something, so that the code/behavior/documentation can
be improved ;).
>
>>
>> This makes it depend on the prior removal of any nodes, which are
>> unavailable (i.e. offline, unreachable, or weren't able to start the
>> service in previous tries) or are not allowed to be run on otherwise
>> (i.e. HA group node restrictions) to function correctly.
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> src/PVE/HA/Manager.pm | 203 ++++++++++++++++++++++++++++++++++++-
>> src/test/test_failover1.pl | 4 +-
>> 2 files changed, 205 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
>> index 8f2ab3d..79b6555 100644
>> --- a/src/PVE/HA/Manager.pm
>> +++ b/src/PVE/HA/Manager.pm
>> @@ -157,8 +157,201 @@ sub get_node_priority_groups {
>> return ($pri_groups, $group_members);
>> }
>>
>> +=head3 get_colocated_services($rules, $sid, $online_node_usage)
>> +
>> +Returns a hash map of all services, which are specified as being in a positive
>> +or negative colocation in C<$rules> with the given service with id C<$sid>.
>> +
>> +Each service entry consists of the type of colocation, strictness of colocation
>> +and the node the service is currently assigned to, if any, according to
>> +C<$online_node_usage>.
>> +
>> +For example, a service C<'vm:101'> being strictly colocated together (positive)
>> +with two other services C<'vm:102'> and C<'vm:103'> and loosely colocated
>> +separate with another service C<'vm:104'> results in the hash map:
>> +
>> + {
>> + 'vm:102' => {
>> + affinity => 'together',
>> + strict => 1,
>> + node => 'node2'
>> + },
>> + 'vm:103' => {
>> + affinity => 'together',
>> + strict => 1,
>> + node => 'node2'
>> + },
>> + 'vm:104' => {
>> + affinity => 'separate',
>> + strict => 0,
>> + node => undef
>> + }
>> + }
>> +
>> +=cut
>> +
>> +sub get_colocated_services {
>> + my ($rules, $sid, $online_node_usage) = @_;
>> +
>> + my $services = {};
>> +
>> + PVE::HA::Rules::Colocation::foreach_colocation_rule($rules, sub {
>> + my ($rule) = @_;
>> +
>> + for my $csid (sort keys %{$rule->{services}}) {
>> + next if $csid eq $sid;
>> +
>> + $services->{$csid} = {
>> + node => $online_node_usage->get_service_node($csid),
>> + affinity => $rule->{affinity},
>> + strict => $rule->{strict},
>> + };
>> + }
>> + }, {
>> + sid => $sid,
>> + });
>> +
>> + return $services;
>> +}
>> +
>> +=head3 get_colocation_preference($rules, $sid, $online_node_usage)
>> +
>> +Returns a list of two hashes, where each is a hash map of the colocation
>> +preference of C<$sid>, according to the colocation rules in C<$rules> and the
>> +service locations in C<$online_node_usage>.
>> +
>> +The first hash is the positive colocation preference, where each element
>> +represents properties for how much C<$sid> prefers to be on the node.
>> +Currently, this is a binary C<$strict> field, which means either it should be
>> +there (C<0>) or must be there (C<1>).
>> +
>> +The second hash is the negative colocation preference, where each element
>> +represents properties for how much C<$sid> prefers not to be on the node.
>> +Currently, this is a binary C<$strict> field, which means either it should not
>> +be there (C<0>) or must not be there (C<1>).
>> +
>> +=cut
>> +
>> +sub get_colocation_preference {
>> + my ($rules, $sid, $online_node_usage) = @_;
>> +
>> + my $services = get_colocated_services($rules, $sid, $online_node_usage);
>> +
>> + my $together = {};
>> + my $separate = {};
>> +
>> + for my $service (values %$services) {
>> + my $node = $service->{node};
>> +
>> + next if !$node;
>> +
>> + my $node_set = $service->{affinity} eq 'together' ? $together : $separate;
>> + $node_set->{$node}->{strict} = $node_set->{$node}->{strict} || $service->{strict};
>> + }
>> +
>> + return ($together, $separate);
>> +}
>> +
>> +=head3 apply_positive_colocation_rules($together, $allowed_nodes)
>> +
>> +Applies the positive colocation preference C<$together> on the allowed node
>> +hash set C<$allowed_nodes> directly.
>> +
>> +Positive colocation means keeping services together on a single node, and
>> +therefore minimizing the separation of services.
>> +
>> +The allowed node hash set C<$allowed_nodes> is expected to contain any node,
>> +which is available to the service, i.e. each node is currently online, is
>> +available according to other location constraints, and the service has not
>> +failed running there yet.
>> +
>> +=cut
>> +
>> +sub apply_positive_colocation_rules {
>> + my ($together, $allowed_nodes) = @_;
>> +
>> + return if scalar(keys %$together) < 1;
>> +
>> + my $mandatory_nodes = {};
>> + my $possible_nodes = PVE::HA::Tools::intersect($allowed_nodes, $together);
>> +
>> + for my $node (sort keys %$together) {
>> + $mandatory_nodes->{$node} = 1 if $together->{$node}->{strict};
>> + }
>> +
>> + if (scalar keys %$mandatory_nodes) {
>> + # limit to only the nodes the service must be on.
>> + for my $node (keys %$allowed_nodes) {
>> + next if exists($mandatory_nodes->{$node});
>> +
>> + delete $allowed_nodes->{$node};
>> + }
>> + } elsif (scalar keys %$possible_nodes) {
>
> I am not sure I follow this logic here.. if there are any strict
> requirements, we only honor those.. if there are no strict requirements,
> we only honor the non-strict ones?
Please correct me if I'm wrong, but at least to my understanding this
seems right, because the nodes in $together are the nodes which other
co-located services are already running on.
If there is a co-located service already running somewhere and the
services MUST be kept together, then there will be an entry like 'node3'
=> { strict => 1 } in $together. AFAICS we can then ignore any
non-strict nodes here, because we already know where the service MUST run.
If there is a co-located service already running somewhere and the
services SHOULD be kept together, then there will be one or more
entries, e.g. $together = { 'node1' => { strict => 0 }, 'node2' => {
strict => 0 } };
If there is no co-located service already running somewhere, then
$together = {}; and this subroutine won't do anything to $allowed_nodes.
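To make the strict case a bit more concrete, here's a made-up example of
what the subroutine sees (node names are arbitrary):

# vm:102 (strictly colocated with the service we're placing) is already
# running on node3
my $together = { node3 => { strict => 1 } };
my $allowed_nodes = { node1 => 1, node2 => 1, node3 => 1 };

# apply_positive_colocation_rules($together, $allowed_nodes) then
# deletes node1 and node2, so only the node the service MUST run on is
# left: $allowed_nodes == { node3 => 1 }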
In theory, we could assume that %$mandatory_nodes always contains only
one node, because it is mandatory. But currently, we do not hinder users
from manually migrating against colocation rules (maybe we should?), and
rules could also suddenly change from non-strict to strict. We do not
auto-migrate if rules change (maybe we should?).
-----
On another note, intersect() is used here with $together (and
set_difference() with $separate below), which goes against what I said
in patch #5 about only using hash sets, but as it only relies on the
truthiness of the values anyway, it was fine here. I'll make that more
robust in a v1.
>
>> + # limit to the possible nodes the service should be on, if there are any.
>> + for my $node (keys %$allowed_nodes) {
>> + next if exists($possible_nodes->{$node});
>> +
>> + delete $allowed_nodes->{$node};
>> + }
>
> this is the same code twice, just operating on different hash
> references, so could probably be a lot shorter. the next and delete
> could also be combined (`delete .. if !...`).
Yes, I wanted to break it down more and will improve it, thanks for the
suggestion with the delete post-if!
I guess we can also move the definition + assignment of $possible_nodes
down here too, as it won't be needed for the $mandatory_nodes case,
depending on whether the general behavior stays the same.
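Roughly something like this is what I have in mind then (still operating
directly on the hash sets):

for my $node (keys %$allowed_nodes) {
    delete $allowed_nodes->{$node} if !exists($possible_nodes->{$node});
}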
>
>> + }
>> +}
>> +
>> +=head3 apply_negative_colocation_rules($separate, $allowed_nodes)
>> +
>> +Applies the negative colocation preference C<$separate> on the allowed node
>> +hash set C<$allowed_nodes> directly.
>> +
>> +Negative colocation means keeping services separate on multiple nodes, and
>> +therefore maximizing the separation of services.
>> +
>> +The allowed node hash set C<$allowed_nodes> is expected to contain any node,
>> +which is available to the service, i.e. each node is currently online, is
>> +available according to other location constraints, and the service has not
>> +failed running there yet.
>> +
>> +=cut
>> +
>> +sub apply_negative_colocation_rules {
>> + my ($separate, $allowed_nodes) = @_;
>> +
>> + return if scalar(keys %$separate) < 1;
>> +
>> + my $mandatory_nodes = {};
>> + my $possible_nodes = PVE::HA::Tools::set_difference($allowed_nodes, $separate);
>
> this is confusing or I misunderstand something here, see below..
>
>> +
>> + for my $node (sort keys %$separate) {
>> + $mandatory_nodes->{$node} = 1 if $separate->{$node}->{strict};
>> + }
>> +
>> + if (scalar keys %$mandatory_nodes) {
>> + # limit to the nodes the service must not be on.
>
> this is missing a not?
> we are limiting to the nodes the service must not not be on :-P
>
> should we rename mandatory_nodes to forbidden_nodes?
Good idea, yes this would be a much better fitting name. When I wrote
$mandatory_nodes as above, I was always thinking 'mandatory to not be
there'...
>
>> + for my $node (keys %$allowed_nodes) {
>
> this could just loop over the forbidden nodes and delete them from
> allowed nodes?
Yes, this should also be possible. I think I had a counterexample in an
earlier version where this didn't work, but now it should make sense.
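i.e. roughly (with the forbidden_nodes rename from above applied):

for my $node (keys %$forbidden_nodes) {
    delete $allowed_nodes->{$node};
}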
>
>> + next if !exists($mandatory_nodes->{$node});
>> +
>> + delete $allowed_nodes->{$node};
>> + }
>> + } elsif (scalar keys %$possible_nodes) {
>
> similar to above - if we have strict exclusions, we honor them, but we
> ignore the non-strict exclusions unless there are no strict ones?
Same principle as above, but now $separate holds all nodes that the
anti-colocated services are already running on, so we try not to
select a node from there.
>
>> + # limit to the nodes the service should not be on, if any.
>> + for my $node (keys %$allowed_nodes) {
>> + next if exists($possible_nodes->{$node});
>> +
>> + delete $allowed_nodes->{$node};
>> + }
>> + }
>> +}
>> +
>> +sub apply_colocation_rules {
>> + my ($rules, $sid, $allowed_nodes, $online_node_usage) = @_;
>> +
>> + my ($together, $separate) = get_colocation_preference($rules, $sid, $online_node_usage);
>> +
>> + apply_positive_colocation_rules($together, $allowed_nodes);
>> + apply_negative_colocation_rules($separate, $allowed_nodes);
>> +}
>> +
>> sub select_service_node {
>> - my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
>> + # TODO Cleanup this signature post-RFC
>> + my ($rules, $groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
>>
>> my $group = get_service_group($groups, $online_node_usage, $service_conf);
>>
>> @@ -189,6 +382,8 @@ sub select_service_node {
>>
>> return $current_node if (!$try_next && !$best_scored) && $pri_nodes->{$current_node};
>>
>> + apply_colocation_rules($rules, $sid, $pri_nodes, $online_node_usage);
>> +
>> my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
>> my @nodes = sort {
>> $scores->{$a} <=> $scores->{$b} || $a cmp $b
>> @@ -758,6 +953,7 @@ sub next_state_request_start {
>>
>> if ($self->{crs}->{rebalance_on_request_start}) {
>> my $selected_node = select_service_node(
>> + $self->{rules},
>> $self->{groups},
>> $self->{online_node_usage},
>> $sid,
>> @@ -771,6 +967,9 @@ sub next_state_request_start {
>> my $select_text = $selected_node ne $current_node ? 'new' : 'current';
>> $haenv->log('info', "service $sid: re-balance selected $select_text node $selected_node for startup");
>>
>> + # TODO It would be better if this information would be retrieved from $ss/$sd post-RFC
>> + $self->{online_node_usage}->pin_service_node($sid, $selected_node);
>> +
>> if ($selected_node ne $current_node) {
>> $change_service_state->($self, $sid, 'request_start_balance', node => $current_node, target => $selected_node);
>> return;
>> @@ -898,6 +1097,7 @@ sub next_state_started {
>> }
>>
>> my $node = select_service_node(
>> + $self->{rules},
>> $self->{groups},
>> $self->{online_node_usage},
>> $sid,
>> @@ -1004,6 +1204,7 @@ sub next_state_recovery {
>> $self->recompute_online_node_usage(); # we want the most current node state
>>
>> my $recovery_node = select_service_node(
>> + $self->{rules},
>> $self->{groups},
>> $self->{online_node_usage},
>> $sid,
>> diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
>> index 308eab3..4c84fbd 100755
>> --- a/src/test/test_failover1.pl
>> +++ b/src/test/test_failover1.pl
>> @@ -8,6 +8,8 @@ use PVE::HA::Groups;
>> use PVE::HA::Manager;
>> use PVE::HA::Usage::Basic;
>>
>> +my $rules = {};
>> +
>> my $groups = PVE::HA::Groups->parse_config("groups.tmp", <<EOD);
>> group: prefer_node1
>> nodes node1
>> @@ -31,7 +33,7 @@ sub test {
>> my ($expected_node, $try_next) = @_;
>>
>> my $node = PVE::HA::Manager::select_service_node
>> - ($groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next);
>> + ($rules, $groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next);
>>
>> my (undef, undef, $line) = caller();
>> die "unexpected result: $node != ${expected_node} at line $line\n"
>> --
>> 2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
` (17 preceding siblings ...)
2025-04-01 1:50 ` DERUMIER, Alexandre
@ 2025-04-24 10:12 ` Fiona Ebner
2025-04-25 8:36 ` Daniel Kral
18 siblings, 1 reply; 67+ messages in thread
From: Fiona Ebner @ 2025-04-24 10:12 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
On 25.03.25 at 16:12, Daniel Kral wrote:
> | Canonicalization
> ----------
>
> Additionally, colocation rules are currently simplified as follows:
>
> - If there are multiple positive colocation rules with common services
> and the same strictness, these are merged to a single positive
> colocation rule.
Do you intend to do that when writing the configuration file? I think
rules are better left unmerged from a user perspective. For example:
- services 1, 2 and 3 should strictly stay together, because of reason A
- services 1 and 3 should strictly stay together, because of different
reason B
Another scenario might be that the user is currently in the process of
editing some rules one-by-one and then it might also be surprising if
something is auto-merged.
You can of course always dynamically merge them when doing the
computation for the node selection.
In the same spirit, a comment field for each rule where the user can put
the reason might be nice to have.
Another question is if we should allow enabling/disabling rules.
Comment and enabling can of course always be added later. I'm just not
sure we should start out with the auto-merging of rules.
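Just to sketch what I mean by merging dynamically during the computation
(made-up rules, ignoring different strictness values for simplicity):

my @together_rules = (
    { services => { 'vm:101' => 1, 'vm:102' => 1, 'vm:103' => 1 } },
    { services => { 'vm:101' => 1, 'vm:103' => 1 } },
);

my @merged;
for my $rule (@together_rules) {
    my $services = { %{ $rule->{services} } };
    my @rest;
    for my $group (@merged) {
        if (grep { $services->{$_} } keys %$group) {
            $services = { %$services, %$group }; # union of the two sets
        } else {
            push @rest, $group;
        }
    }
    @merged = (@rest, $services);
}
# @merged now holds a single service set with vm:101, vm:102 and vm:103

The on-disk rules stay untouched; only the in-memory view used for node
selection would see the merged sets.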
> | Inference rules
> ----------
>
> There are currently no inference rules implemented for the RFC, but
> there could be potential to further simplify some code paths in the
> future, e.g. a positive colocation rule where one service is part of a
> restricted HA group makes the other services in the positive colocation
> rule a part of this HA group as well.
If the rule is strict. If we do this I think it should only happen
dynamically for the node selection too.
> Comment about HA groups -> Location Rules
> -----------------------------------------
>
> This part is not really part of the patch series, but still worth for an
> on-list discussion.
>
> I'd like to suggest to also transform the existing HA groups to location
> rules, if the rule concept turns out to be a good fit for the colocation
> feature in the HA Manager, as HA groups seem to integrate quite easily
> into this concept.
>
> This would make service-node relationships a little more flexible for
> users and we'd be able to have both configurable / visible in the same
> WebUI view, API endpoint, and configuration file. Also, some code paths
> could be a little more concise, e.g. checking changes to constraints and
> canonicalizing the rules config.
>
> The how should be rather straightforward for the obvious use cases:
>
> - Services in unrestricted HA groups -> Location rules with the nodes of
> the HA group; We could either split each node priority group into
> separate location rules (with each having their score / weight) or
> keep the input format of HA groups with a list of
> `<node>(:<priority>)` in each rule
>
> - Services in restricted HA groups -> Same as above, but also using
> either `+inf` for a mandatory location rule or `strict` property
> depending on how we decide on the colocation rule properties
I'd prefer having a 'strict' property, as that is orthogonal to the
priorities and that aligns it with what you propose for the colocation
rules.
> This would allow most of the use cases of HA groups to be easily
> migratable to location rules. We could also keep the inference of the
> 'default group' for unrestricted HA groups (any node that is available
> is added as a group member with priority -1).
Nodes can change, so adding them explicitly will mean it can get
outdated. This should be implicit/done dynamically.
> The only thing that I'm unsure about here is how we would migrate the
> `nofailback` option, since this operates on the group-level. If we keep
> the `<node>(:<priority>)` syntax and restrict that each service can only
> be part of one location rule, it'd be easy to have the same flag. If we
> go with multiple location rules per service and each having a score or
> weight (for the priority), then we wouldn't be able to have this flag
> anymore. I think we could keep the semantic if we move this flag to the
> service config, but I'm thankful for any comments on this.
My gut feeling is that going for a more direct mapping, i.e. each
location rule represents one HA group, is better. The nofailback flag
can still apply to a given location rule I think? For a given service,
if a higher-priority node is online for any location rule the service is
part of, with nofailback=0, it will get migrated to that higher-priority
node. It does make sense to have a given service be part of only one
location rule then though, since node priorities can conflict between rules.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-04-01 9:39 ` Daniel Kral
2025-04-01 11:05 ` DERUMIER, Alexandre via pve-devel
2025-04-03 12:26 ` Fabian Grünbichler
@ 2025-04-24 10:12 ` Fiona Ebner
2 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-24 10:12 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral, DERUMIER, Alexandre
On 01.04.25 at 11:39, Daniel Kral wrote:
> On 4/1/25 03:50, DERUMIER, Alexandre wrote:
>> my 2cents, but everybody in the industry is calling this
>> affinity/anti-affinity (vmware, nutanix, hyperv, openstack, ...).
>> More precisely, vm affinity rules (vm<->vm) vs node affinity rules
>> (vm->node , the current HA group)
>>
>> Personally I don't care, it's just a name ^_^ .
>>
>> But I have a lot of customers asking about "does proxmox support
>> affinity/anti-affinity". and if they are doing their own research, they
>> will think that it doesn't exist.
>> (or at minimum, write somewhere in the doc something like "aka vm
>> affinity" or in commercial presentation ^_^)
>
> I see your point and also called it affinity/anti-affinity before, but
> if we go for the HA Rules route here, it'd be really neat to have
> "Location Rules" and "Colocation Rules" in the end to coexist and
> clearly show the distinction between them, as both are affinity rules at
> least for me.
>
> I'd definitely make sure that it is clear from the release notes and
> documentation, that this adds the feature to assign affinity between
> services, but let's wait for some other comments on this ;).
In the UI/docs we can always be more descriptive and say things like
"(Anti-)Affinity Between Services" and "(Anti-)Affinity With Node",
while in the section config it's of course advantageous to have a single
word.
>
> On 4/1/25 03:50, DERUMIER, Alexandre wrote:
>> More serious question : Don't have read yet all the code, but how does
>> it play with the current topsis placement algorithm ?
>
> I currently implemented the colocation rules to put a constraint on
> which nodes the manager can select from for the to-be-migrated service.
>
> So if users use the static load scheduler (and the basic / service count
> scheduler for that matter too), the colocation rules just make sure that
> no recovery node is selected which would contradict the colocation rules.
> So the TOPSIS algorithm isn't changed at all.
>
> There are two things that should/could be changed in the future (besides
> the many future ideas that I pointed out already), which are
>
> - (1) the schedulers will still consider all online nodes, i.e. even
> though HA groups and/or colocation rules restrict the allowed nodes in
> the end, the calculation is done for all nodes which could be
> significant for larger clusters, and
>
> - (2) the services are (generally) currently recovered one-by-one in a
> best-fit fashion, i.e. there's no ordering by the services' needed
> resources, etc. There could be some edge cases (e.g. think about a
> failing node with a bunch of services to be kept together; these should
> now be migrated to the same node, if possible, or put on the minimum
> number of nodes), where the algorithm could find better solutions if it
> either orders the to-be-recovered services, and/or the utilization
> scheduler has knowledge about the 'keep together' colocations and
> considers these (and all subsets) as a single service.
Yes, a simple heuristic here could be to take the subsets of:
1. (strict?) 'keep together' services
2. single services that are not otherwise in a (strict?) 'keep
together' relation, consider each by itself a subset too
Then order the above subsets by their usage (ordering inside a subset
should not be that important) and then recover the services in that
order one-by-one (i.e. one-by-one for the first subset in the ordering,
then one-by-one for the second subset in the ordering, etc.). Even if
it's one-by-one that should mean keeping the (strict) 'keep together'
together, right?
Like that you get the heavy subsets out of the way first. This prevents
the otherwise likely scenario where too many small services are
recovered in a balanced fashion to other nodes (let's say nodes all end
up at 80% usage) and then there's no single node with the necessary
resources for a heavy service that is still to be recovered (e.g. one
that would need 30% usage on a node).
Can of course be done as a follow-up.
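A very rough sketch of that ordering (with made-up services and usage
numbers, just to illustrate the idea):

my $usage = {
    'vm:101' => 8, 'vm:102' => 4, 'vm:103' => 30, 'ct:200' => 2,
};

# sum up some usage estimate (e.g. static memory/CPU) per subset
my $subset_usage = sub {
    my ($subset) = @_;
    my $total = 0;
    $total += $usage->{$_} for @$subset;
    return $total;
};

# strict 'keep together' groups plus each remaining service as its own subset
my @subsets = (
    [ 'vm:101', 'vm:102' ],
    [ 'vm:103' ],
    [ 'ct:200' ],
);

my @recovery_order =
    map { @$_ }
    sort { $subset_usage->($b) <=> $subset_usage->($a) } @subsets;

# => vm:103 (30), then vm:101 + vm:102 (12), then ct:200 (2)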
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-03-25 16:47 ` [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
@ 2025-04-24 10:12 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-24 10:12 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
On 25.03.25 at 17:47, Daniel Kral wrote:
> On 3/25/25 16:12, Daniel Kral wrote:
>> Colocation Rules
>> ----------------
>>
>> The two properties of colocation rules, as described in the
>> introduction, are rather straightforward. A typical colocation rule
>> inside of the config would look like the following:
>>
>> colocation: some-lonely-services
>> services vm:101,vm:103,ct:909
>> affinity separate
>> strict 1
>>
>> This means that the three services vm:101, vm:103 and ct:909 must be
>> kept separate on different nodes. I'm very keen on naming suggestions
>> since I think there could be a better word than 'affinity' here. I
>> played around with 'keep-services', since then it would always read
>> something like 'keep-services separate', which is very declarative, but
>> this might suggest that this is a binary option to too many users (I
>> mean it is, but not with the values 0 and 1).
>
> Just to document this, I've played around with using a score to decide
> whether the colocation rule is positive/negative, how strict and to
> allow specifying a value on how much it is desired to meet the
> colocation rule in case of an optional colocation rule, much like
> pacemaker's version.
>
> But in the end, I ditched the idea, since it didn't integrate well and
> it was also not trivial to find a good scale for this weight value that
> would correspond similarly as the node priority in HA groups, for
> example, especially when we select for each service individually.
The node priority for HA groups is not a weight, but an ordering.
In any case, such a weight for colocation could still be added on top
later if we really want to.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [PATCH ha-manager 03/15] usage: add get_service_node and pin_service_node methods
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 03/15] usage: add get_service_node and pin_service_node methods Daniel Kral
@ 2025-04-24 12:29 ` Fiona Ebner
2025-04-25 7:39 ` Daniel Kral
0 siblings, 1 reply; 67+ messages in thread
From: Fiona Ebner @ 2025-04-24 12:29 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
On 25.03.25 at 16:12, Daniel Kral wrote:
> Add methods get_service_node() and pin_service_node() to the Usage class
> to retrieve and pin the current node of a specific service.
Hmm, not sure about calling it "pin", why not "set"?
>
> This is used to retrieve the current node of a service for colocation
> rules inside of select_service_node(), where there is currently no
> access to the global services state.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> For me this is more of a temporary change, since I don't think putting
> this information here is very useful in the future. It was more of a
> workaround for the moment, since `select_service_node()` doesn't have
> access to the global service configuration data, which is needed here.
>
> I would like to give `select_service_node()` the information from e.g.
> $sc directly post-RFC.
Yes, this sounds cleaner than essentially tracking the same things twice
in different places. Can't we do this as preparation to avoid such
temporary workarounds?
> src/PVE/HA/Usage.pm | 12 ++++++++++++
> src/PVE/HA/Usage/Basic.pm | 15 +++++++++++++++
> src/PVE/HA/Usage/Static.pm | 14 ++++++++++++++
> 3 files changed, 41 insertions(+)
>
> diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
> index 66d9572..e4f86d7 100644
> --- a/src/PVE/HA/Usage.pm
> +++ b/src/PVE/HA/Usage.pm
> @@ -27,6 +27,18 @@ sub list_nodes {
> die "implement in subclass";
> }
>
> +sub get_service_node {
> + my ($self, $sid) = @_;
> +
> + die "implement in subclass";
> +}
> +
> +sub pin_service_node {
> + my ($self, $sid, $node) = @_;
> +
> + die "implement in subclass";
> +}
> +
> sub contains_node {
> my ($self, $nodename) = @_;
>
> diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
> index d6b3d6c..50d687b 100644
> --- a/src/PVE/HA/Usage/Basic.pm
> +++ b/src/PVE/HA/Usage/Basic.pm
> @@ -10,6 +10,7 @@ sub new {
>
> return bless {
> nodes => {},
> + services => {},
I feel like it would be more natural to also use 'service-nodes' here
like you do for the 'static' plugin [continued below...]
> haenv => $haenv,
> }, $class;
> }
> @@ -38,11 +39,25 @@ sub contains_node {
> return defined($self->{nodes}->{$nodename});
> }
>
> +sub get_service_node {
> + my ($self, $sid) = @_;
> +
> + return $self->{services}->{$sid};
...because these kinds of expressions don't make it clear that this is a
node.
> +}
> +
> +sub pin_service_node {
> + my ($self, $sid, $node) = @_;
> +
> + $self->{services}->{$sid} = $node;
> +}
> +
> sub add_service_usage_to_node {
> my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
>
> if ($self->contains_node($nodename)) {
> + $self->{total}++;
> $self->{nodes}->{$nodename}++;
> + $self->{services}->{$sid} = $nodename;
> } else {
> $self->{haenv}->log(
> 'warning',
> diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
> index 3d0af3a..8db9202 100644
> --- a/src/PVE/HA/Usage/Static.pm
> +++ b/src/PVE/HA/Usage/Static.pm
> @@ -22,6 +22,7 @@ sub new {
> 'service-stats' => {},
> haenv => $haenv,
> scheduler => $scheduler,
> + 'service-nodes' => {},
> 'service-counts' => {}, # Service count on each node. Fallback if scoring calculation fails.
> }, $class;
> }
> @@ -85,9 +86,22 @@ my sub get_service_usage {
> return $service_stats;
> }
>
> +sub get_service_node {
> + my ($self, $sid) = @_;
> +
> + return $self->{'service-nodes'}->{$sid};
> +}
> +
> +sub pin_service_node {
> + my ($self, $sid, $node) = @_;
> +
> + $self->{'service-nodes'}->{$sid} = $node;
> +}
> +
> sub add_service_usage_to_node {
> my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
>
> + $self->{'service-nodes'}->{$sid} = $nodename;
This is why "pin" feels wrong to me. It will just get overwritten here
next time a usage calculation is made. Can that be problematic?
> $self->{'service-counts'}->{$nodename}++;
>
> eval {
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin Daniel Kral
@ 2025-04-24 13:03 ` Fiona Ebner
2025-04-25 8:29 ` Daniel Kral
0 siblings, 1 reply; 67+ messages in thread
From: Fiona Ebner @ 2025-04-24 13:03 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
On 25.03.25 at 16:12, Daniel Kral wrote:
> Add a rules section config base plugin to allow users to specify
> different kinds of rules in a single configuration file.
>
> The interface is designed to allow sub plugins to implement their own
> {decode,encode}_value() methods and also offer a canonicalized version
It's not "allow" them to implement, but actually requires them to
implement it. Otherwise, it would be infinite recursion.
> of their rules with canonicalize(), i.e. with any inconsistencies
> removed and ambiguities resolved. There is also a are_satisfiable()
> method for anticipation of the verification of additions or changes to
> the rules config via the API.
---snip 8<---
> diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
> new file mode 100644
> index 0000000..bff3375
> --- /dev/null
> +++ b/src/PVE/HA/Rules.pm
> @@ -0,0 +1,118 @@
> +package PVE::HA::Rules;
> +
> +use strict;
> +use warnings;
> +
> +use PVE::JSONSchema qw(get_standard_option);
> +use PVE::SectionConfig;
Missing include of PVE::Tools.
Nit: I'd put a blank here to separate modules from different packages
and modules from the same package.
> +use PVE::HA::Tools;
> +
> +use base qw(PVE::SectionConfig);
> +
> +# TODO Add descriptions, completions, etc.
> +my $defaultData = {
> + propertyList => {
> + type => { description => "Rule type." },
> + ruleid => get_standard_option('pve-ha-rule-id'),
> + comment => {
> + type => 'string',
> + maxLength => 4096,
> + description => "Rule description.",
> + },
Oh good, so there already is a comment property :)
---snip 8<---
> +sub foreach_service_rule {
> + my ($rules, $func, $opts) = @_;
> +
> + my $sid = $opts->{sid};
> + my $type = $opts->{type};
> +
> + my @ruleids = sort {
> + $rules->{order}->{$a} <=> $rules->{order}->{$b}
> + } keys %{$rules->{ids}};
> +
> + for my $ruleid (@ruleids) {
> + my $rule = $rules->{ids}->{$ruleid};
> +
> + next if !$rule; # invalid rules are kept undef in section config, delete them
s/delete/skip/ ?
> + next if $type && $rule->{type} ne $type;
> + next if $sid && !defined($rule->{services}->{$sid});
Style nit: I'd prefer defined($type) and defined($sid) in the above
expressions
> +
> + $func->($rule, $ruleid);
> + }
> +}
> +
> +sub canonicalize {
> + my ($class, $rules, $groups, $services) = @_;
> +
> + die "implement in subclass";
> +}
> +
> +sub are_satisfiable {
> + my ($class, $rules, $groups, $services) = @_;
> +
> + die "implement in subclass";
> +}
This might not be possible to implement in just the subclasses. E.g.
services 1 and 2 have strict colocation with each other, but 1 has
restricted location on node A and 2 has restricted location on node B.
I don't think it hurts to rather put the implementation here with
knowledge of all rule types and what inter-dependencies they entail. And
maybe have it be a function rather than a method then?
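For reference, in a rules config that conflict could look something like
this (assuming a location rule type analogous to the colocation one; the
syntax is purely illustrative):

colocation: keep-together
    services vm:101,vm:102
    affinity together
    strict 1

location: vm101-on-node1
    services vm:101
    nodes node1
    strict 1

location: vm102-on-node2
    services vm:102
    nodes node2
    strict 1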
> +sub checked_config {
> + my ($rules, $groups, $services) = @_;
> +
> + my $types = __PACKAGE__->lookup_types();
> +
> + for my $type (@$types) {
> + my $plugin = __PACKAGE__->lookup($type);
> +
> + $plugin->canonicalize($rules, $groups, $services);
Shouldn't we rather only pass the rules that belong to the specific
plugin rather than always all?
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [PATCH ha-manager 03/15] usage: add get_service_node and pin_service_node methods
2025-04-24 12:29 ` Fiona Ebner
@ 2025-04-25 7:39 ` Daniel Kral
0 siblings, 0 replies; 67+ messages in thread
From: Daniel Kral @ 2025-04-25 7:39 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 4/24/25 14:29, Fiona Ebner wrote:
> On 25.03.25 at 16:12, Daniel Kral wrote:
>> Add methods get_service_node() and pin_service_node() to the Usage class
>> to retrieve and pin the current node of a specific service.
>
> Hmm, not sure about calling it "pin", why not "set"?
>
>>
>> This is used to retrieve the current node of a service for colocation
>> rules inside of select_service_node(), where there is currently no
>> access to the global services state.
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> For me this is more of a temporary change, since I don't think putting
>> this information here is very useful in the future. It was more of a
>> workaround for the moment, since `select_service_node()` doesn't have
>> access to the global service configuration data, which is needed here.
>>
>> I would like to give `select_service_node()` the information from e.g.
>> $sc directly post-RFC.
>
> Yes, this sounds cleaner than essentially tracking the same things twice
> in different places. Can't we do this as preparation to avoid such
> temporary workarounds?
Yes, we can definitely do this, as I'm also not a fan of copying
information at all. I just did it here for the RFC as I wanted to focus
on implementing the core functionality first and making it pretty
afterwards.
So this patch will be dropped/changed in the next revision to
restructure the signature of select_service_node(...) so that it gets
more information about where the services are currently configured to
run.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin
2025-04-24 13:03 ` Fiona Ebner
@ 2025-04-25 8:29 ` Daniel Kral
2025-04-25 9:12 ` Fiona Ebner
0 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-04-25 8:29 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 4/24/25 15:03, Fiona Ebner wrote:
> On 25.03.25 at 16:12, Daniel Kral wrote:
>> Add a rules section config base plugin to allow users to specify
>> different kinds of rules in a single configuration file.
>>
>> The interface is designed to allow sub plugins to implement their own
>> {decode,encode}_value() methods and also offer a canonicalized version
>
> It's not "allow" them to implement, but actually requires them to
> implement it. Otherwise, it would be infinite recursion.
ACK will change the wording here.
>
>> of their rules with canonicalize(), i.e. with any inconsistencies
>> removed and ambiguities resolved. There is also a are_satisfiable()
>> method for anticipation of the verification of additions or changes to
>> the rules config via the API.
>
> ---snip 8<---
>
>> diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
>> new file mode 100644
>> index 0000000..bff3375
>> --- /dev/null
>> +++ b/src/PVE/HA/Rules.pm
>> @@ -0,0 +1,118 @@
>> +package PVE::HA::Rules;
>> +
>> +use strict;
>> +use warnings;
>> +
>> +use PVE::JSONSchema qw(get_standard_option);
>> +use PVE::SectionConfig;
>
> Missing include of PVE::Tools.
>
> Nit: I'd put a blank here to separate modules from different packages
> and modules from the same package.
ACK both.
>
>> +use PVE::HA::Tools;
>
>> +
>> +use base qw(PVE::SectionConfig);
>> +
>> +# TODO Add descriptions, completions, etc.
>> +my $defaultData = {
>> + propertyList => {
>> + type => { description => "Rule type." },
>> + ruleid => get_standard_option('pve-ha-rule-id'),
>> + comment => {
>> + type => 'string',
>> + maxLength => 4096,
>> + description => "Rule description.",
>> + },
>
> Oh good, so there already is a comment property :)
>
> ---snip 8<---
>
>> +sub foreach_service_rule {
>> + my ($rules, $func, $opts) = @_;
>> +
>> + my $sid = $opts->{sid};
>> + my $type = $opts->{type};
>> +
>> + my @ruleids = sort {
>> + $rules->{order}->{$a} <=> $rules->{order}->{$b}
>> + } keys %{$rules->{ids}};
>> +
>> + for my $ruleid (@ruleids) {
>> + my $rule = $rules->{ids}->{$ruleid};
>> +
>> + next if !$rule; # invalid rules are kept undef in section config, delete them
>
> s/delete/skip/ ?
ACK
>
>> + next if $type && $rule->{type} ne $type;
>> + next if $sid && !defined($rule->{services}->{$sid});
>
> Style nit: I'd prefer defined($type) and defined($sid) in the above
> expressions
ACK
>
>> +
>> + $func->($rule, $ruleid);
>> + }
>> +}
>> +
>> +sub canonicalize {
>> + my ($class, $rules, $groups, $services) = @_;
>> +
>> + die "implement in subclass";
>> +}
>> +
>> +sub are_satisfiable {
>> + my ($class, $rules, $groups, $services) = @_;
>> +
>> + die "implement in subclass";
>> +}
>
> This might not be possible to implement in just the subclasses. E.g.
> services 1 and 2 have strict colocation with each other, but 1 has
> restricted location on node A and 2 has restricted location on node B.
>
> I don't think it hurts to rather put the implementation here with
> knowledge of all rule types and what inter-dependencies they entail. And
> maybe have it be a function rather than a method then?
Yes, you're right, it would make more sense to have these be functions
rather than methods. In the current implementation it's rather confusing
and in the end $rules should consist of all types of rules, so $groups
and $services are hopefully not needed as separate parameters anymore
(The only usage for these is to check for HA group members).
What do you think about something like a
sub register_rule_check {
my ($class, $check_func, $canonicalize_func, $satisfiable_func) = @_;
}
in the base plugin, where each plugin can then register their checker
methods together with the behavior of what is done when running
canonicalize(...) and are_satisfiable(...)? These would then go through
every registered entry in the list, call $check_func and then either
$canonicalize_func or $satisfiable_func, respectively.
Another (simpler) option would be to just put all checker subroutines in
the base plugin, but that could get unmaintainable quite fast.
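Roughly, I was thinking of something like the following for the
registration approach (very much a sketch, names not final):

my @rule_checks;

sub register_rule_check {
    my ($class, $check_func, $canonicalize_func, $satisfiable_func) = @_;

    push @rule_checks, {
        check => $check_func,
        canonicalize => $canonicalize_func,
        satisfiable => $satisfiable_func,
    };
}

sub canonicalize {
    my ($class, $rules) = @_;

    for my $entry (@rule_checks) {
        # $check_func returns the ids of the rules violating the check,
        # which $canonicalize_func then resolves (e.g. by dropping them)
        my $conflicts = $entry->{check}->($rules);
        $entry->{canonicalize}->($rules, $conflicts) if scalar(@$conflicts);
    }
}

The base plugin would then just iterate over whatever the plugins
registered, without knowing about the specific rule types.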
>
>> +sub checked_config {
>> + my ($rules, $groups, $services) = @_;
>> +
>> + my $types = __PACKAGE__->lookup_types();
>> +
>> + for my $type (@$types) {
>> + my $plugin = __PACKAGE__->lookup($type);
>> +
>> + $plugin->canonicalize($rules, $groups, $services);
>
> Shouldn't we rather only pass the rules that belong to the specific
> plugin rather than always all?
As in the previous comment, I think it would be reasonable to pass all
types of rules, as there are some checks that require checking between
colocation and location rules, for example. But it would also make sense
to move these more general checks into the base plugin, so that the
checkers in the plugins only have to care about their own feasibility.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-04-24 10:12 ` Fiona Ebner
@ 2025-04-25 8:36 ` Daniel Kral
2025-04-25 12:25 ` Fiona Ebner
0 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-04-25 8:36 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
Thanks for the review, Fiona!
I have some comments left, one of them is about the last comment about
how to migrate HA groups to location rules to give a better illustration
why I'd like to allow multiple location rules in the end, hope we're
able to do this.
On 4/24/25 12:12, Fiona Ebner wrote:
> On 25.03.25 at 16:12, Daniel Kral wrote:
>> | Canonicalization
>> ----------
>>
>> Additionally, colocation rules are currently simplified as follows:
>>
>> - If there are multiple positive colocation rules with common services
>> and the same strictness, these are merged to a single positive
>> colocation rule.
>
> Do you intend to do that when writing the configuration file? I think
> rules are better left unmerged from a user perspective. For example:
>
> - services 1, 2 and 3 should strictly stay together, because of reason A
> - services 1 and 3 should strictly stay together, because of different
> reason B
>
> Another scenario might be that the user is currently in the process of
> editing some rules one-by-one and then it might also be surprising if
> something is auto-merged.
>
> You can of course always dynamically merge them when doing the
> computation for the node selection.
This is what I had in mind and I should have made the description for
that clearer here. It is only for computing the feasibility of the rules
when (1) creating, (2) updating, and (3) applying them.
As suggested by @Lukas off-list, I'll also try to make the check
selective, e.g. if the user has made an infeasible change to the config
manually by writing to the file and then wants to create another rule,
it should ignore the existing infeasible rules (as they'll be dropped
anyway) and only check whether the added / changed rule is infeasible.
But as you said, it must not change the user's configuration in the end
as that would be very confusing to the user.
>
> In the same spirit, a comment field for each rule where the user can put
> the reason might be nice to have.
>
> Another question is if we should allow enabling/disabling rules.
>
> Comment and enabling can of course always be added later. I'm just not
> sure we should start out with the auto-merging of rules.
Good idea, I think there are definitely use cases for enabling/disabling
the rules and it's easy to implement, will add that to v1 :).
>
>> | Inference rules
>> ----------
>>
>> There are currently no inference rules implemented for the RFC, but
>> there could be potential to further simplify some code paths in the
>> future, e.g. a positive colocation rule where one service is part of a
>> restricted HA group makes the other services in the positive colocation
>> rule a part of this HA group as well.
>
> If the rule is strict. If we do this I think it should only happen
> dynamically for the node selection too.
Yes, I'll take a closer look here, but I fully agree that this part
should also be done dynamically as the steps above.
I'll see if that could improve something and wouldn't be unnecessary
overhead that will be handled by the node selection in the end anyway.
Roughly speaking, I'd like the select_service_node(...) to mostly
consist of the steps (as already done now with HA groups):
apply_location_rules(...);
apply_colocation_rules(...);
$scores = score_nodes_to_start_service(...);
# select the best node according to utilization
>
>
>> Comment about HA groups -> Location Rules
>> -----------------------------------------
>>
>> This part is not really part of the patch series, but still worth for an
>> on-list discussion.
>>
>> I'd like to suggest to also transform the existing HA groups to location
>> rules, if the rule concept turns out to be a good fit for the colocation
>> feature in the HA Manager, as HA groups seem to integrate quite easily
>> into this concept.
>>
>> This would make service-node relationships a little more flexible for
>> users and we'd be able to have both configurable / visible in the same
>> WebUI view, API endpoint, and configuration file. Also, some code paths
>> could be a little more consise, e.g. checking changes to constraints and
>> canonicalizing the rules config.
>>
>> The how should be rather straightforward for the obvious use cases:
>>
>> - Services in unrestricted HA groups -> Location rules with the nodes of
>> the HA group; We could either split each node priority group into
>> separate location rules (with each having their score / weight) or
>> keep the input format of HA groups with a list of
>> `<node>(:<priority>)` in each rule
>>
>> - Services in restricted HA groups -> Same as above, but also using
>> either `+inf` for a mandatory location rule or `strict` property
>> depending on how we decide on the colocation rule properties
>
> I'd prefer having a 'strict' property, as that is orthogonal to the
> priorities and that aligns it with what you propose for the colocation
> rules.
ACK, I forgot to remove this bit as I dropped the idea of the flat
'score' property.
>
>> This would allow most of the use cases of HA groups to be easily
>> migratable to location rules. We could also keep the inference of the
>> 'default group' for unrestricted HA groups (any node that is available
>> is added as a group member with priority -1).
>
> Nodes can change, so adding them explicitly will mean it can get
> outdated. This should be implicit/done dynamically.
Yes, I should have stated this more clearly: it was meant to be
dynamically inferred instead of statically written to the config as this
would only clutter the config with "useless" rules for any service that
didn't have a HA group before ;).
>
>> The only thing that I'm unsure about here is how we would migrate the
>> `nofailback` option, since this operates on the group-level. If we keep
>> the `<node>(:<priority>)` syntax and restrict that each service can only
>> be part of one location rule, it'd be easy to have the same flag. If we
>> go with multiple location rules per service and each having a score or
>> weight (for the priority), then we wouldn't be able to have this flag
>> anymore. I think we could keep the semantic if we move this flag to the
>> service config, but I'm thankful for any comments on this.
> My gut feeling is that going for a more direct mapping, i.e. each
> location rule represents one HA group, is better. The nofailback flag
> can still apply to a given location rule I think? For a given service,
> if a higher-priority node is online for any location rule the service is
> part of, with nofailback=0, it will get migrated to that higher-priority
> node. It does make sense to have a given service be part of only one
> location rule then though, since node priorities can conflict between rules.
Yeah, I think this is the reasonable option too.
I briefly discussed this with @Fabian off-list and we also agreed that
it would be good to map HA groups 1:1 to location rules as much as
possible and keep the nofailback per location rule, as the behavior of
the HA group's nofailback could still be preserved - at least if there's
only a single location rule per service.
---
On the other hand, I'll have to take a closer look if we can do
something about the blockers when creating multiple location rules where
e.g. one has nofailback enabled and the other has not. As you already
said, they could easily conflict between rules...
My previous idea was to make location rules as flexible as possible, so
that it would theoretically not matter if one writes:
location: rule1
services: vm:101
nodes: node1:2,node2:1
strict: 1
or:
location: rule1
services: vm:101
nodes: node1
strict: 1
location: rule2
services: vm:101
nodes: node2
strict: 1
The order of which one is more important could be encoded in the order
in which they are defined (if one configures this in the config file
that's easy, and I'd add an API endpoint to realize this over the
API/WebGUI too), or, maybe even simpler to maintain: just another
property. But then the nofailback flag would have to be moved to some
other place...
Or it could still be allowed in location rules, but then either the more
detailed rule wins (e.g. one rule has node1 without a priority and the
other has node1 with a priority) or the first location rule with a
specific node wins and the other is ignored. But this is already getting
confusing while writing it out here...
I'd prefer users to write the former (and make this the dynamic
'canonical' form when selecting nodes), but as with colocation rules it
could make sense to separate them for specific reasons / use cases.
And another reason why it could still make sense to go that way is to
allow "negative" location rules at a later point, which makes sense in
larger environments, where it's easier to write opt-out rules than
opt-in rules, so I'd like to keep that path open for the future.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin
2025-04-25 8:29 ` Daniel Kral
@ 2025-04-25 9:12 ` Fiona Ebner
2025-04-25 13:30 ` Daniel Kral
0 siblings, 1 reply; 67+ messages in thread
From: Fiona Ebner @ 2025-04-25 9:12 UTC (permalink / raw)
To: Daniel Kral, Proxmox VE development discussion
On 25.04.25 at 10:29, Daniel Kral wrote:
> On 4/24/25 15:03, Fiona Ebner wrote:
>> On 25.03.25 at 16:12, Daniel Kral wrote:
>>> +
>>> + $func->($rule, $ruleid);
>>> + }
>>> +}
>>> +
>>> +sub canonicalize {
>>> + my ($class, $rules, $groups, $services) = @_;
>>> +
>>> + die "implement in subclass";
>>> +}
>>> +
>>> +sub are_satisfiable {
>>> + my ($class, $rules, $groups, $services) = @_;
>>> +
>>> + die "implement in subclass";
>>> +}
>>
>> This might not be possible to implement in just the subclasses. E.g.
>> services 1 and 2 have strict colocation with each other, but 1 has
>> restricted location on node A and 2 has restricted location on node B.
>>
>> I don't think it hurts to rather put the implementation here with
>> knowledge of all rule types and what inter-dependencies they entail. And
>> maybe have it be a function rather than a method then?
>
> Yes, you're right, it would make more sense to have these be functions
> rather than methods. In the current implementation it's rather confusing
> and in the end $rules should consist of all types of rules, so $groups
> and $services are hopefully not needed as separate parameters anymore
> (The only usage for these are to check for HA group members).
For canonicalize(), I don't think it's a hard requirement. Can still be
useful for further optimization of course.
> What do you think about something like a
>
> sub register_rule_check {
> my ($class, $check_func, $canonicalize_func, $satisfiable_func) = @_;
> }
>
> in the base plugin and then each plugin can register their checker
> methods with the behavior what is done when running canonicalize(...)
> and are_satisfiable(...)? These then have to go through every registered
> entry in the list and call $check_func and then either
> $canonicalize_func and $satisfiable_func.
I don't see how that would help with the scenario I described above
where the non-satisfiability can only be seen by knowing about
inter-dependencies between rules.
> Another (simpler) option would be to just put all checker subroutines in
> the base plugin, but that could get unmaintainable quite fast.
I think the helpers should go into the plugins. These can be designed to
take the constraints arising from the inter-dependency as arguments.
E.g. a helper in the location plugin, simply checking if the location
rules are satisfiable (no constraints) and returning the arising
services<->nodes constraints. A helper in the colocation plugin to check
if colocation rules are satisfiable given certain services<->nodes
constraints. The main function in the base plugin would just need to
call these two in order then.
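Roughly something like this in the base plugin (just a sketch, all helper
names are made up):

    sub are_satisfiable {
        my ($rules) = @_;

        # location plugin: are the location rules themselves satisfiable,
        # and which nodes is each service then allowed to run on?
        my ($ok, $service_nodes) =
            PVE::HA::Rules::Location::check_satisfiability($rules);
        return 0 if !$ok;

        # colocation plugin: are the colocation rules satisfiable under
        # the services<->nodes constraints derived above?
        return PVE::HA::Rules::Colocation::check_satisfiability(
            $rules, $service_nodes);
    }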
>>> +sub checked_config {
>>> + my ($rules, $groups, $services) = @_;
>>> +
>>> + my $types = __PACKAGE__->lookup_types();
>>> +
>>> + for my $type (@$types) {
>>> + my $plugin = __PACKAGE__->lookup($type);
>>> +
>>> + $plugin->canonicalize($rules, $groups, $services);
>>
>> Shouldn't we rather only pass the rules that belong to the specific
>> plugin rather than always all?
>
> As in the previous comment, I think it would be reasonable to pass all
> types of rules as there are some checks that require to check between
> colocation and location rules, for example. But it would also make sense
> to move these more general checks in the base plugin, so that the
> checkers in the plugins have to only care about their own feasibility.
Again, IMHO we could have the plugins implement suitable helper
functions, but put the logic that knows about inter-dependencies into
the base plugin itself. Otherwise, you essentially need every plugin to
care about all others, rather than having only the common base plugin
care about all.
So design the helpers in expectation of what inter-dependencies we need
to consider (this will of course change with future rules, but we are
flexible to adapt), but don't have the plugins be concerned with other
plugins directly, i.e. they don't need to know how the constraints arise
from other rule types.
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-04-25 8:36 ` Daniel Kral
@ 2025-04-25 12:25 ` Fiona Ebner
2025-04-25 13:25 ` Daniel Kral
0 siblings, 1 reply; 67+ messages in thread
From: Fiona Ebner @ 2025-04-25 12:25 UTC (permalink / raw)
To: Daniel Kral, Proxmox VE development discussion
Am 25.04.25 um 10:36 schrieb Daniel Kral:
> On 4/24/25 12:12, Fiona Ebner wrote:
>> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>>> | Canonicalization
>>> ----------
>>>
>>> Additionally, colocation rules are currently simplified as follows:
>>>
>>> - If there are multiple positive colocation rules with common services
>>> and the same strictness, these are merged to a single positive
>>> colocation rule.
>>
>> Do you intend to do that when writing the configuration file? I think
>> rules are better left unmerged from a user perspective. For example:
>>
>> - services 1, 2 and 3 should strictly stay together, because of reason A
>> - services 1 and 3 should strictly stay together, because of different
>> reason B
>>
>> Another scenario might be that the user is currently in the process of
>> editing some rules one-by-one and then it might also be surprising if
>> something is auto-merged.
>>
>> You can of course always dynamically merge them when doing the
>> computation for the node selection.
>
> This is what I had in mind and I should have made the description for
> that clearer here. It is only for computing the feasibility of the rules
> when (1) creating, (2) updating, and (3) applying them.
Okay, great :) Just wanted to make sure.
> As suggested by @Lukas off-list, I'll also try to make the check
> selective, e.g. the user has made an infeasible change to the config
> manually by writing to the file and then wants to create another rule.
> Here it should ignore the infeasible rules (as they'll be dropped
> anyway) and only check if the added rule / changed rule is infeasible.
How will you select the rule to drop? Applying the rules one-by-one to
find a first violation?
> But as you said, it must not change the user's configuration in the end
> as that would be very confusing to the user.
Okay, so dropping dynamically. I guess we could also disable such rules
explicitly/mark them as being in violation with other rules somehow:
Tri-state enabled/disabled/conflict status? Explicit field?
Something like that would make such rules easily visible and have the
configuration better reflect the actual status.
As discussed off-list now: we can try to re-enable conflicting rules
next time the rules are loaded.
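For illustration, a conflicting rule could then show up in the config
roughly like this ('state' being just a placeholder name):

    colocation: keep-apart
        services vm:101,vm:102,vm:103
        affinity separate
        strict 1
        state conflict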
>>> The only thing that I'm unsure about this, is how we would migrate the
>>> `nofailback` option, since this operates on the group-level. If we keep
>>> the `<node>(:<priority>)` syntax and restrict that each service can only
>>> be part of one location rule, it'd be easy to have the same flag. If we
>>> go with multiple location rules per service and each having a score or
>>> weight (for the priority), then we wouldn't be able to have this flag
>>> anymore. I think we could keep the semantic if we move this flag to the
>>> service config, but I'm thankful for any comments on this.
>> My gut feeling is that going for a more direct mapping, i.e. each
>> location rule represents one HA group, is better. The nofailback flag
>> can still apply to a given location rule I think? For a given service,
>> if a higher-priority node is online for any location rule the service is
>> part of, with nofailback=0, it will get migrated to that higher-priority
>> node. It does make sense to have a given service be part of only one
>> location rule then though, since node priorities can conflict between
>> rules.
>
> Yeah, I think this is the reasonable option too.
>
> I briefly discussed this with @Fabian off-list and we also agreed that
> it would be good to make location rules as close to a 1:1 mapping of HA
> groups as possible and keep the nofailback per location rule, as the
> behavior of the HA group's nofailback could still be preserved - at
> least if there's only a single location rule per service.
>
> ---
>
> On the other hand, I'll have to take a closer look if we can do
> something about the blockers when creating multiple location rules where
> e.g. one has nofailback enabled and the other has not. As you already
> said, they could easily conflict between rules...
>
> My previous idea was to make location rules as flexible as possible, so
> that it would theoretically not matter if one writes:
>
> location: rule1
> services: vm:101
> nodes: node1:2,node2:1
> strict: 1
> or:
>
> location: rule1
> services: vm:101
> nodes: node1
> strict: 1
>
> location: rule2
> services: vm:101
> nodes: node2
> strict: 1
>
> The order which one's more important could be encoded in the order which
> it is defined (if one configures this in the config it's easy, and I'd
> add an API endpoint to realize this over the API/WebGUI too), or maybe
> even simpler to maintain: just another property.
We cannot use just the order, because a user might want to give two
nodes the same priority. I'd also like to avoid an implicit
order-priority mapping.
> But then, the
> nofailback would have to be either moved to some other place...
> Or it is still allowed in location rules, but either the more detailed
> rule wins (e.g. one rule has node1 without a priority and the other does
> have node1 with a priority)
Maybe we should prohibit multiple rules with the same service-node pair?
Otherwise, my intuition says that all rules should be considered and the
rule with the highest node priority should win.
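For example (hypothetical, assuming we'd allow multiple location rules per
service at all):

    location: rule1
        services vm:101
        nodes node1:2,node2:1
        strict 1

    location: rule2
        services vm:101
        nodes node1
        strict 1

With the duplicate service-node pair prohibited, the second rule would
simply be rejected; otherwise node1 would count with priority 2 for
vm:101, since the highest priority wins.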
> or the first location rule with a specific
> node wins and the other is ignored. But this is already confusing when
> writing it out here...
>
> I'd prefer users to write the former (and make this the dynamic
> 'canonical' form when selecting nodes), but as with colocation rules it
> could make sense to separate them for specific reasons / use cases.
Fair point.
> And another reason why it could still make sense to go that way is to
> allow "negative" location rules at a later point, which makes sense in
> larger environments, where it's easier to write opt-out rules than opt-
> in rules, so I'd like to keep that path open for the future.
We also discussed this off list: Daniel convinced me that it would be
cleaner if the nofailback property would be associated to a given
service rather than a given location rule. And if we later support pools
as resources, the property should be associated to (certain or all)
services in that pool and defined in the resource config for the pool.
To avoid the double-negation with nofailback=0, it could also be renamed
to a positive property, below called "auto-elevate", just a working name.
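As a rough illustration, with "auto-elevate" as the working name, that
could end up in the resources config like this (syntax only for
illustration):

    vm: 101
        state started
        auto-elevate 1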
A small concern of mine was that this makes it impossible to have a
service that only "auto-elevates" to a specific node with a priority,
but not others. This is already not possible right now, and honestly,
that would be quite strange behavior and not supporting that is unlikely
to hurt real use cases.
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-04-25 12:25 ` Fiona Ebner
@ 2025-04-25 13:25 ` Daniel Kral
2025-04-25 13:58 ` Fiona Ebner
0 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-04-25 13:25 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 4/25/25 14:25, Fiona Ebner wrote:
> Am 25.04.25 um 10:36 schrieb Daniel Kral:
>> On 4/24/25 12:12, Fiona Ebner wrote:
>> As suggested by @Lukas off-list, I'll also try to make the check
>> selective, e.g. the user has made an infeasible change to the config
>> manually by writing to the file and then wants to create another rule.
>> Here it should ignore the infeasible rules (as they'll be dropped
>> anyway) and only check if the added rule / changed rule is infeasible.
>
> How will you select the rule to drop? Applying the rules one-by-one to
> find a first violation?
AFAICS we could use the same helpers to check whether the rules are
feasible, and only check whether the added / updated ruleid is one that
is causing these troubles. I guess this would be a reasonable option
without duplicating code, but still check against the whole config.
There's surely some optimization potential here, but then we would have
a larger problem at reloading the rule configuration for the manager
anyway. For the latter I could check for what size of a larger
configuration this could become an actual bottleneck.
For either adding a rule or updating a rule, we would just make the
change to the configuration in-memory and run the helper. Depending on
the result, we'd store the config or error out to the API user.
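In rough pseudo-Perl for the create path (the write helper and
check_feasibility() are made up here, the rest is just the usual locked
config update):

    PVE::HA::Config::lock_ha_domain(
        sub {
            my $rules = PVE::HA::Config::read_rules_config();

            # apply the requested change in-memory only
            $rules->{ids}->{$ruleid} = $new_rule;

            my $conflicts = PVE::HA::Rules::check_feasibility($rules);
            die "rule '$ruleid' conflicts with the existing rules\n"
                if $conflicts->{$ruleid};

            PVE::HA::Config::write_rules_config($rules);
        },
        "create ha rule failed",
    );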
>
>> But as you said, it must not change the user's configuration in the end
>> as that would be very confusing to the user.
>
> Okay, so dropping dynamically. I guess we could also disable such rules
> explicitly/mark them as being in violation with other rules somehow:
> Tri-state enabled/disabled/conflict status? Explicit field?
>
> Something like that would make such rules easily visible and have the
> configuration better reflect the actual status.
>
> As discussed off-list now: we can try to re-enable conflicting rules
> next time the rules are loaded.
Hm, there's three options now:
- Allowing conflicts over the create / update API and auto-resolving the
conflicts as soon as we're able to (e.g. on the load / save where the
rule becomes feasible again).
- Not allowing conflicts over the create / update API, but set the state
to 'conflict' if manual changes (or other circumstances) made the rules
be in conflict with one another.
- Having something like the SDN config, where there's a working
configuration and a "draft" configuration that needs to be applied. So
conflicts are allowed in drafts, but not in working configurations.
The SDN option seems too much for me here, but I just noticed some
similarity.
I guess one of the first two makes more sense. If there's no arguments
against this, I'd choose the second option as we can always allow
intentional conflicts later if there's user demand or we see other
reasons in that.
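For the second option, a rough sketch of what could happen whenever the
rules are loaded (check_feasibility() is again a made-up helper returning
the ids of conflicting rules):

    sub update_rule_states {
        my ($rules) = @_;

        my $conflicts = PVE::HA::Rules::check_feasibility($rules);

        for my $ruleid (keys %{$rules->{ids}}) {
            my $rule = $rules->{ids}->{$ruleid};
            next if ($rule->{state} // 'enabled') eq 'disabled';

            # mark conflicting rules and re-enable rules which have
            # become feasible again since the last load
            $rule->{state} = $conflicts->{$ruleid} ? 'conflict' : 'enabled';
        }
    }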
>
>>>> The only thing that I'm unsure about this, is how we would migrate the
>>>> `nofailback` option, since this operates on the group-level. If we keep
>>>> the `<node>(:<priority>)` syntax and restrict that each service can only
>>>> be part of one location rule, it'd be easy to have the same flag. If we
>>>> go with multiple location rules per service and each having a score or
>>>> weight (for the priority), then we wouldn't be able to have this flag
>>>> anymore. I think we could keep the semantic if we move this flag to the
>>>> service config, but I'm thankful for any comments on this.
>>> My gut feeling is that going for a more direct mapping, i.e. each
>>> location rule represents one HA group, is better. The nofailback flag
>>> can still apply to a given location rule I think? For a given service,
>>> if a higher-priority node is online for any location rule the service is
>>> part of, with nofailback=0, it will get migrated to that higher-priority
>>> node. It does make sense to have a given service be part of only one
>>> location rule then though, since node priorities can conflict between
>>> rules.
>>
>> Yeah, I think this is the reasonable option too.
>>
>> I briefly discussed this with @Fabian off-list and we also agreed that
>> it would be good to make location rules as close to a 1:1 mapping of HA
>> groups as possible and keep the nofailback per location rule, as the
>> behavior of the HA group's nofailback could still be preserved - at
>> least if there's only a single location rule per service.
>>
>> ---
>>
>> On the other hand, I'll have to take a closer look if we can do
>> something about the blockers when creating multiple location rules where
>> e.g. one has nofailback enabled and the other has not. As you already
>> said, they could easily conflict between rules...
>>
>> My previous idea was to make location rules as flexible as possible, so
>> that it would theoretically not matter if one writes:
>>
>> location: rule1
>> services: vm:101
>> nodes: node1:2,node2:1
>> strict: 1
>> or:
>>
>> location: rule1
>> services: vm:101
>> nodes: node1
>> strict: 1
>>
>> location: rule2
>> services: vm:101
>> nodes: node2
>> strict: 1
>>
>> The order which one's more important could be encoded in the order which
>> it is defined (if one configures this in the config it's easy, and I'd
>> add an API endpoint to realize this over the API/WebGUI too), or maybe
>> even simpler to maintain: just another property.
>
> We cannot use just the order, because a user might want to give two
> nodes the same priority. I'd also like to avoid an implicit
> order-priority mapping.
Right, good point!
>
>> But then, the
>> nofailback would have to be either moved to some other place...
>
>> Or it is still allowed in location rules, but either the more detailed
>> rule wins (e.g. one rule has node1 without a priority and the other does
>> have node1 with a priority)
>
> Maybe we should prohibit multiple rules with the same service-node pair?
> Otherwise, my intuition says that all rules should be considered and the
> rule with the highest node priority should win.
Yes, I think that would make the most sense, similar to disallowing users
from putting the same two or more services in multiple negative colocation
rules.
>
>> or the first location rule with a specific
>> node wins and the other is ignored. But this is already confusing when
>> writing it out here...
>>
>> I'd prefer users to write the former (and make this the dynamic
>> 'canonical' form when selecting nodes), but as with colocation rules it
>> could make sense to separate them for specific reasons / use cases.
>
> Fair point.
>
>> And another reason why it could still make sense to go that way is to
>> allow "negative" location rules at a later point, which makes sense in
>> larger environments, where it's easier to write opt-out rules than opt-
>> in rules, so I'd like to keep that path open for the future.
>
> We also discussed this off list: Daniel convinced me that it would be
> cleaner if the nofailback property would be associated to a given
> service rather than a given location rule. And if we later support pools
> as resources, the property should be associated to (certain or all)
> services in that pool and defined in the resource config for the pool.
>
> To avoid the double-negation with nofailback=0, it could also be renamed
> to a positive property, below called "auto-elevate", just a working name.
>
> A small concern of mine was that this makes it impossible to have a
> service that only "auto-elevates" to a specific node with a priority,
> but not others. This is already not possible right now, and honestly,
> that would be quite strange behavior and not supporting that is unlikely
> to hurt real use cases.
* Re: [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin
2025-04-25 9:12 ` Fiona Ebner
@ 2025-04-25 13:30 ` Daniel Kral
0 siblings, 0 replies; 67+ messages in thread
From: Daniel Kral @ 2025-04-25 13:30 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 4/25/25 11:12, Fiona Ebner wrote:
> Am 25.04.25 um 10:29 schrieb Daniel Kral:
>> On 4/24/25 15:03, Fiona Ebner wrote:
>>> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>>>> +
>>>> + $func->($rule, $ruleid);
>>>> + }
>>>> +}
>>>> +
>>>> +sub canonicalize {
>>>> + my ($class, $rules, $groups, $services) = @_;
>>>> +
>>>> + die "implement in subclass";
>>>> +}
>>>> +
>>>> +sub are_satisfiable {
>>>> + my ($class, $rules, $groups, $services) = @_;
>>>> +
>>>> + die "implement in subclass";
>>>> +}
>>>
>>> This might not be possible to implement in just the subclasses. E.g.
>>> services 1 and 2 have strict colocation with each other, but 1 has
>>> restricted location on node A and 2 has restricted location on node B.
>>>
>>> I don't think it hurts to rather put the implementation here with
>>> knowledge of all rule types and what inter-dependencies they entail. And
>>> maybe have it be a function rather than a method then?
>>
>> Yes, you're right, it would make more sense to have these be functions
>> rather than methods. In the current implementation it's rather confusing
>> and in the end $rules should consist of all types of rules, so $groups
>> and $services are hopefully not needed as separate parameters anymore
>> (The only usage for these are to check for HA group members).
>
> For canonicalize(), I don't think it's a hard requirement. Can still be
> useful for further optimization of course.
>
>> What do you think about something like a
>>
>> sub register_rule_check {
>> my ($class, $check_func, $canonicalize_func, $satisfiable_func) = @_;
>> }
>>
>> in the base plugin and then each plugin can register their checker
>> methods with the behavior of what is done when running canonicalize(...)
>> and are_satisfiable(...)? These would then have to go through every
>> registered entry in the list and call $check_func and then either
>> $canonicalize_func or $satisfiable_func.
>
> I don't see how that would help with the scenario I described above
> where the non-satisfiability can only be seen by knowing about
> inter-dependencies between rules.
>
>> Another (simpler) option would be to just put all checker subroutines in
>> the base plugin, but that could get unmaintainable quite fast.
>
> I think the helpers should go into the plugins. These can be designed to
> take the constraints arising from the inter-dependency as arguments.
> E.g. a helper in the location plugin, simply checking if the location
> rules are satisfiable (no constraints) and returning the arising
> services<->nodes constraints. A helper in the colocation plugin to check
> if colocation rules are satisfiable given certain services<->nodes
> constraints. The main function in the base plugin would just need to
> call these two in order then.
As discussed off-list, I'll take a closer look at how we can improve the
interface of the helpers between the Location and Colocation plugin here,
so that they are less coupled to one another.
Depending on how large the rule set can get, I could see some possible
improvements to factor out some of the common checks as pointed out by
Fabian and you on/off-list, so that they're only done once, but as
discussed off-list, I'll wait until their configuration variables are
settled (nofailback, enabled/disabled/conflict).
>
>>>> +sub checked_config {
>>>> + my ($rules, $groups, $services) = @_;
>>>> +
>>>> + my $types = __PACKAGE__->lookup_types();
>>>> +
>>>> + for my $type (@$types) {
>>>> + my $plugin = __PACKAGE__->lookup($type);
>>>> +
>>>> + $plugin->canonicalize($rules, $groups, $services);
>>>
>>> Shouldn't we rather only pass the rules that belong to the specific
>>> plugin rather than always all?
>>
>> As in the previous comment, I think it would be reasonable to pass all
>> types of rules as there are some checks that require to check between
>> colocation and location rules, for example. But it would also make sense
>> to move these more general checks in the base plugin, so that the
>> checkers in the plugins have to only care about their own feasibility.
>
> Again, IMHO we could have the plugins implement suitable helper
> functions, but put the logic that knows about inter-dependencies into
> the base plugin itself. Otherwise, you essentially need every plugin to
> care about all others, rather than having only the common base plugin
> care about all.
>
> So design the helpers in expectation of what inter-dependencies we need
> to consider (this will of course change with future rules, but we are
> flexible to adapt), but don't have the plugins be concerned with other
> plugins directly, i.e. they don't need to know how the constraints arise
> from other rule types.
Similar to the above, having a more decoupled interface or separating
check helpers into "plugin-local" and "global" (e.g. checking
inconsistent inter-dependencies between location and colocation rules)
makes sense here.
* Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
2025-04-25 13:25 ` Daniel Kral
@ 2025-04-25 13:58 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-25 13:58 UTC (permalink / raw)
To: Daniel Kral, Proxmox VE development discussion
Am 25.04.25 um 15:25 schrieb Daniel Kral:
> On 4/25/25 14:25, Fiona Ebner wrote:
>> Am 25.04.25 um 10:36 schrieb Daniel Kral:
>>> On 4/24/25 12:12, Fiona Ebner wrote:
>>> As suggested by @Lukas off-list, I'll also try to make the check
>>> selective, e.g. the user has made an infeasible change to the config
>>> manually by writing to the file and then wants to create another rule.
>>> Here it should ignore the infeasible rules (as they'll be dropped
>>> anyway) and only check if the added rule / changed rule is infeasible.
>>
>> How will you select the rule to drop? Applying the rules one-by-one to
>> find a first violation?
>
> AFAICS we could use the same helpers to check whether the rules are
> feasible, and only check whether the added / updated ruleid is one that
> is causing these troubles. I guess this would be a reasonable option
> without duplicating code, but still check against the whole config.
> There's surely some optimization potential here, but then we would have
> a larger problem at reloading the rule configuration for the manager
> anyway. For the latter I could check for what size of a larger
> configuration this could become an actual bottleneck.
>
> For either adding a rule or updating a rule, we would just make the
> change to the configuration in-memory and run the helper. Depending on
> the result, we'd store the config or error out to the API user.
ACK, I also don't think we need to worry too much about optimization
here yet.
>>> But as you said, it must not change the user's configuration in the end
>>> as that would be very confusing to the user.
>>
>> Okay, so dropping dynamically. I guess we could also disable such rules
>> explicitly/mark them as being in violation with other rules somehow:
>> Tri-state enabled/disabled/conflict status? Explicit field?
>>
>> Something like that would make such rules easily visible and have the
>> configuration better reflect the actual status.
>>
>> As discussed off-list now: we can try to re-enable conflicting rules
>> next time the rules are loaded.
>
> Hm, there's three options now:
>
> - Allowing conflicts over the create / update API and auto-resolving the
> conflicts as soon as we're able to (e.g. on the load / save where the
> rule becomes feasible again).
>
> - Not allowing conflicts over the create / update API, but set the state
> to 'conflict' if manual changes (or other circumstances) made the rules
> be in conflict with one another.
>
> - Having something like the SDN config, where there's a working
> configuration and a "draft" configuration that needs to be applied. So
> conflicts are allowed in drafts, but not in working configurations.
>
> The SDN option seems too much for me here, but I just noticed some
> similarity.
>
> I guess one of the first two makes more sense. If there's no arguments
> against this, I'd choose the second option as we can always allow
> intentional conflicts later if there's user demand or we see other
> reasons in that.
I do prefer the second option :)
* Re: [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin Daniel Kral
2025-04-03 12:16 ` Fabian Grünbichler
@ 2025-04-25 14:05 ` Fiona Ebner
2025-04-29 8:44 ` Daniel Kral
1 sibling, 1 reply; 67+ messages in thread
From: Fiona Ebner @ 2025-04-25 14:05 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Not much to add to Fabian's review :)
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> diff --git a/src/PVE/HA/Rules/Colocation.pm b/src/PVE/HA/Rules/Colocation.pm
> new file mode 100644
> index 0000000..808d48e
> --- /dev/null
> +++ b/src/PVE/HA/Rules/Colocation.pm
> @@ -0,0 +1,391 @@
> +package PVE::HA::Rules::Colocation;
> +
> +use strict;
> +use warnings;
> +
> +use Data::Dumper;
> +
> +use PVE::JSONSchema qw(get_standard_option);
Missing include of PVE::Tools.
Nit: I'd put a blank here to separate modules from different packages
and modules from the same package.
> +use PVE::HA::Tools;
> +
> +use base qw(PVE::HA::Rules);
> +
> +sub type {
> + return 'colocation';
> +}
> +
> +sub properties {
> + return {
> + services => get_standard_option('pve-ha-resource-id-list'),
> + affinity => {
> + description => "Describes whether the services are supposed to be kept on separate"
> + . " nodes, or are supposed to be kept together on the same node.",
> + type => 'string',
> + enum => ['separate', 'together'],
> + optional => 0,
> + },
> + strict => {
> + description => "Describes whether the colocation rule is mandatory or optional.",
> + type => 'boolean',
> + optional => 0,
> + },
> + }
Style nit: missing semicolon
Since we should move the property definitions to the base module once a
second plugin re-uses them later: should we already declare 'services'
and 'strict' in the base module to start out? Then we could implement
the encode/decode part for 'services' there already. Less moving around
or duplication later on.
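Rough sketch of what I mean, i.e. in the base module (untested, mostly
lifted from the hunk above):

    sub properties {
        return {
            services => get_standard_option('pve-ha-resource-id-list'),
            strict => {
                description => "Describes whether the rule is mandatory or optional.",
                type => 'boolean',
                optional => 0,
            },
        };
    }

    sub decode_value {
        my ($class, $type, $key, $value) = @_;

        if ($key eq 'services') {
            my $res = {};
            for my $service (PVE::Tools::split_list($value)) {
                $res->{$service} = 1
                    if PVE::HA::Tools::pve_verify_ha_resource_id($service);
            }
            return $res;
        }

        return $value;
    }

with encode_value() handled analogously, so the colocation plugin would
then only add its own 'affinity' property on top.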
> +}
> +
> +sub options {
> + return {
> + services => { optional => 0 },
> + strict => { optional => 0 },
> + affinity => { optional => 0 },
> + comment => { optional => 1 },
> + };
> +};
> +
> +sub decode_value {
> + my ($class, $type, $key, $value) = @_;
> +
> + if ($key eq 'services') {
> + my $res = {};
> +
> + for my $service (PVE::Tools::split_list($value)) {
> + if (PVE::HA::Tools::pve_verify_ha_resource_id($service)) {
> + $res->{$service} = 1;
> + }
> + }
> +
> + return $res;
> + }
> +
> + return $value;
> +}
> +
> +sub encode_value {
> + my ($class, $type, $key, $value) = @_;
> +
> + if ($key eq 'services') {
> + PVE::HA::Tools::pve_verify_ha_resource_id($_) for (keys %$value);
Style nit:
[I] febner@dev8 /usr/share/perl5/PVE> ag "for keys" | wc -l
28
[I] febner@dev8 /usr/share/perl5/PVE> ag "for \(keys" | wc -l
0
> +
> + return join(',', keys %$value);
> + }
> +
> + return $value;
> +}
> +
---snip 8<---
> +=head3 check_service_count($rules)
> +
> +Returns a list of conflicts caused by colocation rules, which do not have
> +enough services in them, defined in C<$rules>.
> +
> +If there are no conflicts, the returned list is empty.
> +
> +=cut
> +
> +sub check_services_count {
> + my ($rules) = @_;
> +
> + my $conflicts = [];
> +
> + foreach_colocation_rule($rules, sub {
> + my ($rule, $ruleid) = @_;
> +
> + push @$conflicts, $ruleid if (scalar(keys %{$rule->{services}}) < 2);
Style nit: parentheses for post-if
* Re: [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin
2025-04-11 11:04 ` Daniel Kral
@ 2025-04-25 14:06 ` Fiona Ebner
2025-04-29 8:37 ` Daniel Kral
0 siblings, 1 reply; 67+ messages in thread
From: Fiona Ebner @ 2025-04-25 14:06 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral, Fabian Grünbichler
Am 11.04.25 um 13:04 schrieb Daniel Kral:
> On 4/3/25 14:16, Fabian Grünbichler wrote:
>> On March 25, 2025 4:12 pm, Daniel Kral wrote:
>>> +sub check_services_count {
>>> + my ($rules) = @_;
>>> +
>>> + my $conflicts = [];
>>> +
>>> + foreach_colocation_rule($rules, sub {
>>> + my ($rule, $ruleid) = @_;
>>> +
>>> + push @$conflicts, $ruleid if (scalar(keys %{$rule->{services}})
>>> < 2);
>>> + });
>>> +
>>> + return $conflicts;
>>> +}
>>
>> is this really an issue? a colocation rule with a single service is just
>> a nop? there's currently no cleanup AFAICT if a resource is removed, but
>
> You're right, AFAICS those are a noop when selecting the service node. I
> guess I was a little pedantic / overprotective here about which rules
> make sense in general instead of what the algorithm does in the end.
>
> And good point about handling when resources are removed, adding that to
> delete_service_from_config comes right on my TODO list for the v1!
>
>> if we add that part (we maybe should?) then one can easily end up in a
>> situation where a rule temporarily contains a single or no service?
>
> Hm, yes, especially if we add pools/tags at a later point to select
> services for the rule, then this could happen very easily. But as you
> already mentioned, those two cases would be noops too.
>
> Nevertheless, should we drop this? I think it could benefit users in
> identifying that some rules might not do something they wanted and give
> them a reason why, i.e. there's only one service in there, but at the
> same time it could be a little noisy if there are a lot of affected rules.
I'd still keep rules that end up with only one service around, but maybe
(temporarily) disable them. And/or we could also add a special
"no-effect" marker like the "conflict" one proposed in my other answer?
Then a user could enable/make the rule effective again by adding a new
service to it.
>>> +
>>> +=head3 check_positive_intransitivity($rules)
>>> +
>>> +Returns a list of conflicts caused by transitive positive colocation
>>> rules
>>> +defined in C<$rules>.
>>> +
>>> +Transitive positive colocation rules exist, if there are at least
>>> two positive
>>> +colocation rules with the same strictness, which put at least the
>>> same two
>>> +services in relation. This means, that these rules can be merged
>>> together.
>>> +
>>> +If there are no conflicts, the returned list is empty.
>>
> The terminology here is quite confusing - conflict meaning that two rules
>> are "transitive" and thus mergeable (which is good, cause it makes
>> things easier to handle?) is quite weird, as "conflict" is a rather
>> negative term..
>>
>> there's only a single call site in the same module, maybe we could just
>> rename this into "find_mergeable_positive_ruleids", similar to the
>> variable where the result is stored?
>
> Yeah, I was probably too keen on the `$conflict = check_something(...)`
> pattern here, but it would be much more readable with a simpler name,
> I'll change that for the v1!
>
> -----
>
> Ad why: I'll also add some documentation about the rationale why this is
> needed in the first place.
>
> The main reason is that the later rule check 'check_inner_consistency'
> depends on the positive colocation rules having been merged already, as
> it assumes that each positive colocation rule contains all of the
> services which are positively colocated with one another. If that were
> not the case, it wouldn't detect that the
> following three rules are inconsistent with each other:
>
> colocation: stick-together1
> services vm:101,vm:104
> affinity together
> strict 1
>
> colocation: stick-together2
> services vm:104,vm:102
> affinity together
> strict 1
>
> colocation: keep-apart
> services vm:101,vm:102,vm:103
> affinity separate
> strict 1
>
> This reduces the complexity of the logic in 'check_inner_consistency' a
> little, as it doesn't have to handle this special case: 'stick-together1'
> and 'stick-together2' are already merged into one rule, and it is easily
> apparent that vm 101 and vm 102 cannot be colocated and non-colocated at
> the same time.
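For reference, merging the strict positive rules with common services as
described above could look roughly like this (just a sketch, loose rules
would be merged among themselves analogously):

    sub merge_positive_rules {
        my ($rules) = @_;

        my $changed = 1;
        while ($changed) {
            $changed = 0;

            my @ruleids = grep {
                $rules->{ids}->{$_}->{affinity} eq 'together'
                    && $rules->{ids}->{$_}->{strict}
            } sort keys %{$rules->{ids}};

            for my $id1 (@ruleids) {
                for my $id2 (@ruleids) {
                    next if $id1 eq $id2;

                    my $rule1 = $rules->{ids}->{$id1} // next;
                    my $rule2 = $rules->{ids}->{$id2} // next;

                    # only merge rules sharing at least one service
                    next if !grep { $rule1->{services}->{$_} }
                        keys %{$rule2->{services}};

                    # merge $id2 into $id1 and drop $id2
                    $rule1->{services}->{$_} = 1 for keys %{$rule2->{services}};
                    delete $rules->{ids}->{$id2};
                    $changed = 1;
                }
            }
        }
    }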
>
> -----
>
> Also, I was curious about how that would work out for the case where a
> negative colocation rule between three services is split into three
> pairwise rules (essentially a cyclic dependency). This should in theory
> have the same semantics as the above rule set:
>
> colocation: stick-together1
> services vm:101,vm:104
> affinity together
> strict 1
>
> colocation: stick-together2
> services vm:104,vm:102
> affinity together
> strict 1
>
> colocation: very-lonely-services1
> services vm:101,vm:102
> affinity separate
> strict 1
>
> colocation: very-lonely-services2
> services vm:102,vm:103
> affinity separate
> strict 1
>
> colocation: very-lonely-services3
> services vm:101,vm:103
> affinity separate
> strict 1
>
> Without the merge of positive rules, 'check_inner_consistency' would
> again not detect the inconsistency here. But with the merge correctly
> applied before checking the consistency, this would be resolved and the
> effective rule set would be:
I suppose the effective rule set would still also contain the two
'together' rules, right?
>
> colocation: very-lonely-services2
> services vm:102,vm:103
> affinity separate
> strict 1
>
> colocation: very-lonely-services3
> services vm:101,vm:103
> affinity separate
> strict 1
>
> It could be argued that the negative colocation rules should be merged
> in a similar manner here, as there's now an "effective" difference in
> the semantics of the two rule sets: the negative colocation rules
> between vm 101 and vm 103, and between vm 102 and vm 103, remain.
>
> What do you think?
I don't think there's a particular need to also merge negative rules
between services (when they form a complete graph). It won't make a
difference if there are no conflicts with positive rules, and in the edge
cases where there are conflicts (which usually get caught while editing
the rules), it's better to drop fewer rules, so not merging is an
advantage. Or do you have a particular advantage in favor of merging in
mind?
* Re: [pve-devel] [PATCH ha-manager 06/15] config, env, hw: add rules read and parse methods
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 06/15] config, env, hw: add rules read and parse methods Daniel Kral
@ 2025-04-25 14:11 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-25 14:11 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
> index 1de4b69..3157e56 100644
> --- a/src/PVE/HA/Env/PVE2.pm
> +++ b/src/PVE/HA/Env/PVE2.pm
> @@ -28,6 +28,13 @@ PVE::HA::Resources::PVECT->register();
>
> PVE::HA::Resources->init();
>
> +use PVE::HA::Rules;
> +use PVE::HA::Rules::Colocation;
Nit: use statements should come before other code
> +
> +PVE::HA::Rules::Colocation->register();
> +
> +PVE::HA::Rules->init();
> +
> my $lockdir = "/etc/pve/priv/lock";
>
> sub new {
> @@ -188,6 +195,12 @@ sub steal_service {
> $self->cluster_state_update();
> }
>
> +sub read_rules_config {
> + my ($self) = @_;
> +
> + return PVE::HA::Config::read_rules_config();
> +}
> +
> sub read_group_config {
> my ($self) = @_;
>
> diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
> index b2ab231..2f73859 100644
> --- a/src/PVE/HA/Sim/Env.pm
> +++ b/src/PVE/HA/Sim/Env.pm
> @@ -20,6 +20,13 @@ PVE::HA::Sim::Resources::VirtFail->register();
>
> PVE::HA::Resources->init();
>
> +use PVE::HA::Rules;
> +use PVE::HA::Rules::Colocation;
Same nit as above.
* Re: [pve-devel] [PATCH ha-manager 07/15] manager: read and update rules config
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 07/15] manager: read and update rules config Daniel Kral
@ 2025-04-25 14:30 ` Fiona Ebner
2025-04-29 8:04 ` Daniel Kral
0 siblings, 1 reply; 67+ messages in thread
From: Fiona Ebner @ 2025-04-25 14:30 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> Read the rules configuration in each round and update the canonicalized
> rules configuration if there were any changes since the last round to
> reduce the amount of times of verifying the rule set.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> As noted inline already, there's a missing check whether the service
> configuration changed, which includes the HA group assignment (and is
> only needed for these), since there's no digest as for groups/rules.
>
> I was hesitant to change the structure of `%sc` or the return value of
> `read_service_config()` as it's used quite often and didn't want to
> create a sha1 digest here just for this check. This is another plus
> point to have all of these constraints in a single configuration file.
>
> src/PVE/HA/Manager.pm | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index d983672..7a8e7dc 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -11,6 +11,9 @@ use PVE::HA::NodeStatus;
> use PVE::HA::Usage::Basic;
> use PVE::HA::Usage::Static;
>
> +use PVE::HA::Rules;
> +use PVE::HA::Rules::Colocation;
Nit: there should be no blank, should be ordered alphabetically
> +
> ## Variable Name & Abbreviations Convention
> #
> # The HA stack has some variables it uses frequently and thus abbreviates it such that it may be
> @@ -41,7 +44,12 @@ sub new {
>
> my $class = ref($this) || $this;
>
> - my $self = bless { haenv => $haenv, crs => {} }, $class;
> + my $self = bless {
> + haenv => $haenv,
> + crs => {},
> + last_rules_digest => '',
> + last_groups_digest => '',
> + }, $class;
>
> my $old_ms = $haenv->read_manager_status();
>
> @@ -497,6 +505,19 @@ sub manage {
> delete $ss->{$sid};
> }
>
> + my $new_rules = $haenv->read_rules_config();
> +
> + # TODO We should also check for a service digest here, but we would've to
> + # calculate it here independently or also expose it through read_service_config()
Ah, so read_and_check_resources_config() drops the digest and produces a
hash with the only keys being the $sids. Easiest is probably to add
wantarray in the relevant place(s) and return a list with the digest
second. And I guess we don't need that anymore after we migrated from HA
groups to location rules, since it's only used to get the groups? If
yes, then we should add a reminder to remove it again.
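E.g. something like this (just a sketch; $conf and $digest stand for
whatever read_and_check_resources_config() already has at hand):

    # in read_and_check_resources_config(), additionally return the
    # digest when the caller asks for a list:
    return wantarray ? ($conf, $digest) : $conf;

and, with the wrappers up to $haenv->read_service_config() passing that
through, the manager could then do:

    my ($sc, $sc_digest) = $haenv->read_service_config();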
> + if ($new_rules->{digest} ne $self->{last_rules_digest}
> + || $self->{groups}->{digest} ne $self->{last_groups_digest}) {
> + $self->{rules} = $new_rules;
> + PVE::HA::Rules::checked_config($self->{rules}, $self->{groups}, $sc);
Might not matter now, but I'd prefer: check first, then assign.
> + }
> +
> + $self->{last_rules_digest} = $self->{rules}->{digest};
> + $self->{last_groups_digest} = $self->{groups}->{digest};
> +
> $self->update_crm_commands();
>
> for (;;) {
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes Daniel Kral
2025-04-03 12:17 ` Fabian Grünbichler
@ 2025-04-28 12:26 ` Fiona Ebner
2025-04-28 14:33 ` Fiona Ebner
2025-04-29 9:50 ` Daniel Kral
2025-04-30 11:09 ` Daniel Kral
2 siblings, 2 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-28 12:26 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> Add a mechanism to the node selection subroutine, which enforces the
> colocation rules defined in the rules config.
>
> The algorithm manipulates the set of nodes directly, which the service
> is allowed to run on, depending on the type and strictness of the
> colocation rules, if there are any.
>
> This makes it depend on the prior removal of any nodes, which are
> unavailable (i.e. offline, unreachable, or weren't able to start the
> service in previous tries) or are not allowed to be run on otherwise
> (i.e. HA group node restrictions) to function correctly.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> src/PVE/HA/Manager.pm | 203 ++++++++++++++++++++++++++++++++++++-
> src/test/test_failover1.pl | 4 +-
> 2 files changed, 205 insertions(+), 2 deletions(-)
>
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 8f2ab3d..79b6555 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -157,8 +157,201 @@ sub get_node_priority_groups {
> return ($pri_groups, $group_members);
> }
>
I feel like these helper functions should rather go into the colocation
plugin or some other module to not bloat up Manager.pm more.
> +=head3 get_colocated_services($rules, $sid, $online_node_usage)
> +
> +Returns a hash map of all services, which are specified as being in a positive
> +or negative colocation in C<$rules> with the given service with id C<$sid>.
> +
> +Each service entry consists of the type of colocation, strictness of colocation
> +and the node the service is currently assigned to, if any, according to
> +C<$online_node_usage>.
> +
> +For example, a service C<'vm:101'> being strictly colocated together (positive)
> +with two other services C<'vm:102'> and C<'vm:103'> and loosely colocated
> +separate with another service C<'vm:104'> results in the hash map:
> +
> + {
> + 'vm:102' => {
> + affinity => 'together',
> + strict => 1,
> + node => 'node2'
> + },
> + 'vm:103' => {
> + affinity => 'together',
> + strict => 1,
> + node => 'node2'
> + },
> + 'vm:104' => {
> + affinity => 'separate',
> + strict => 0,
> + node => undef
Why is the node undef here?
> + }
> + }
> +
> +=cut
> +
> +sub get_colocated_services {
> + my ($rules, $sid, $online_node_usage) = @_;
> +
> + my $services = {};
> +
> + PVE::HA::Rules::Colocation::foreach_colocation_rule($rules, sub {
> + my ($rule) = @_;
> +
> + for my $csid (sort keys %{$rule->{services}}) {
> + next if $csid eq $sid;
> +
> + $services->{$csid} = {
> + node => $online_node_usage->get_service_node($csid),
> + affinity => $rule->{affinity},
> + strict => $rule->{strict},
> + };
> + }
> + }, {
> + sid => $sid,
> + });
> +
> + return $services;
> +}
> +
> +=head3 get_colocation_preference($rules, $sid, $online_node_usage)
> +
> +Returns a list of two hashes, where each is a hash map of the colocation
> +preference of C<$sid>, according to the colocation rules in C<$rules> and the
> +service locations in C<$online_node_usage>.
> +
> +The first hash is the positive colocation preference, where each element
> +represents properties for how much C<$sid> prefers to be on the node.
> +Currently, this is a binary C<$strict> field, which means either it should be
s/it/the service/
> +there (C<0>) or must be there (C<1>).
> +
> +The second hash is the negative colocation preference, where each element
> +represents properties for how much C<$sid> prefers not to be on the node.
> +Currently, this is a binary C<$strict> field, which means either it should not
s/it/the service/
> +be there (C<0>) or must not be there (C<1>).
> +
> +=cut
> +
> +sub get_colocation_preference {
> + my ($rules, $sid, $online_node_usage) = @_;
> +
> + my $services = get_colocated_services($rules, $sid, $online_node_usage);
The name $services is a bit too generic, maybe $colocation_per_service
or something?
Maybe it would be better to just merge this one and the helper above
into a single one? I.e. just handle the info while iterating the rules
directly instead of creating a novel temporary per-service
> data structure and iterating twice.
> +
> + my $together = {};
> + my $separate = {};
> +
> + for my $service (values %$services) {
> + my $node = $service->{node};
> +
> + next if !$node;
> +
> + my $node_set = $service->{affinity} eq 'together' ? $together : $separate;
> + $node_set->{$node}->{strict} = $node_set->{$node}->{strict} || $service->{strict};
> + }
> +
> + return ($together, $separate);
> +}
> +
> +=head3 apply_positive_colocation_rules($together, $allowed_nodes)
> +
> +Applies the positive colocation preference C<$together> on the allowed node
> +hash set C<$allowed_nodes> directly.
> +
> +Positive colocation means keeping services together on a single node, and
> +therefore minimizing the separation of services.
> +
> +The allowed node hash set C<$allowed_nodes> is expected to contain any node,
> +which is available to the service, i.e. each node is currently online, is
> +available according to other location constraints, and the service has not
> +failed running there yet.
> +
> +=cut
> +
> +sub apply_positive_colocation_rules {
> + my ($together, $allowed_nodes) = @_;
> +
> + return if scalar(keys %$together) < 1;
> +
> + my $mandatory_nodes = {};
> + my $possible_nodes = PVE::HA::Tools::intersect($allowed_nodes, $together);
> +
> + for my $node (sort keys %$together) {
> + $mandatory_nodes->{$node} = 1 if $together->{$node}->{strict};
> + }
> +
> + if (scalar keys %$mandatory_nodes) {
> + # limit to only the nodes the service must be on.
> + for my $node (keys %$allowed_nodes) {
> + next if exists($mandatory_nodes->{$node});
Style nit: I'd avoid using exists() if you explicitly expect a set
value. Otherwise, it can break because of accidental auto-vivification
in the future.
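I.e. simply (since the values are always set to 1 here):

    next if $mandatory_nodes->{$node};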
> +
> + delete $allowed_nodes->{$node};
> + }
> + } elsif (scalar keys %$possible_nodes) {
> + # limit to the possible nodes the service should be on, if there are any.
> + for my $node (keys %$allowed_nodes) {
> + next if exists($possible_nodes->{$node});
> +
> + delete $allowed_nodes->{$node};
This seems wrong. Non-strict rules should not limit the allowed nodes.
See below for more on this.
> + }
> + }
> +}
> +
> +=head3 apply_negative_colocation_rules($separate, $allowed_nodes)
> +
> +Applies the negative colocation preference C<$separate> on the allowed node
> +hash set C<$allowed_nodes> directly.
> +
> +Negative colocation means keeping services separate on multiple nodes, and
> +therefore maximizing the separation of services.
> +
> +The allowed node hash set C<$allowed_nodes> is expected to contain any node,
> +which is available to the service, i.e. each node is currently online, is
> +available according to other location constraints, and the service has not
> +failed running there yet.
> +
> +=cut
> +
> +sub apply_negative_colocation_rules {
> + my ($separate, $allowed_nodes) = @_;
> +
> + return if scalar(keys %$separate) < 1;
> +
> + my $mandatory_nodes = {};
> + my $possible_nodes = PVE::HA::Tools::set_difference($allowed_nodes, $separate);
> +
> + for my $node (sort keys %$separate) {
> + $mandatory_nodes->{$node} = 1 if $separate->{$node}->{strict};
> + }
> +
> + if (scalar keys %$mandatory_nodes) {
> + # limit to the nodes the service must not be on.
> + for my $node (keys %$allowed_nodes) {
> + next if !exists($mandatory_nodes->{$node});
> +
> + delete $allowed_nodes->{$node};
> + }
> + } elsif (scalar keys %$possible_nodes) {
> + # limit to the nodes the service should not be on, if any.
> + for my $node (keys %$allowed_nodes) {
> + next if exists($possible_nodes->{$node});
> +
> + delete $allowed_nodes->{$node};
> + }
> + }
> +}
> +
> +sub apply_colocation_rules {
> + my ($rules, $sid, $allowed_nodes, $online_node_usage) = @_;
> +
> + my ($together, $separate) = get_colocation_preference($rules, $sid, $online_node_usage);
> +
> + apply_positive_colocation_rules($together, $allowed_nodes);
> + apply_negative_colocation_rules($separate, $allowed_nodes);
I think there could be a problematic scenario with
* no strict positive rules, but loose positive rules
* strict negative rules
where apply_positive_colocation_rules() will limit $allowed_nodes in
such a way that the strict negative rules cannot be satisfied anymore
afterwards.
I feel like what we actually want from non-strict rules is not to limit
the allowed nodes at all, but only express preferences. After scoring,
we could:
1. always take a colocation preference node if present no matter what
the usage score is
2. have a threshold to not follow through, if there is a non-colocation
preference node with a much better usage score relatively
3. somehow massage it into the score itself. E.g. every node that would
be preferred by colocation gets a 0.5 multiplier score adjustment while
other scores are unchanged - remember that lower score is better.
4. [insert your suggestion here]
So to me it seems like there should be a helper that gives us:
1. list of nodes that satisfy strict rules - these we can then intersect
with the $pri_nodes
2. list of nodes that are preferred by non-strict rules - these we can
consider after scoring
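For option 3, a rough sketch of what I mean ($preferred_nodes being the
second list from such a helper, as a hash set):

    # nudge the scores instead of filtering the allowed nodes - lower
    # score is better, so halve it for colocation-preferred nodes
    for my $node (keys %$scores) {
        $scores->{$node} *= 0.5 if $preferred_nodes->{$node};
    }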
> +}
> +
> sub select_service_node {
> - my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
> + # TODO Cleanup this signature post-RFC
> + my ($rules, $groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
>
> my $group = get_service_group($groups, $online_node_usage, $service_conf);
>
> @@ -189,6 +382,8 @@ sub select_service_node {
>
> return $current_node if (!$try_next && !$best_scored) && $pri_nodes->{$current_node};
>
> + apply_colocation_rules($rules, $sid, $pri_nodes, $online_node_usage);
> +
> my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
> my @nodes = sort {
> $scores->{$a} <=> $scores->{$b} || $a cmp $b
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-04-11 15:56 ` Daniel Kral
@ 2025-04-28 12:46 ` Fiona Ebner
2025-04-29 9:07 ` Daniel Kral
0 siblings, 1 reply; 67+ messages in thread
From: Fiona Ebner @ 2025-04-28 12:46 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral, Fabian Grünbichler
Am 11.04.25 um 17:56 schrieb Daniel Kral:
> On 4/3/25 14:17, Fabian Grünbichler wrote:
>> On March 25, 2025 4:12 pm, Daniel Kral wrote:
>>> +sub apply_positive_colocation_rules {
>>> + my ($together, $allowed_nodes) = @_;
>>> +
>>> + return if scalar(keys %$together) < 1;
>>> +
>>> + my $mandatory_nodes = {};
>>> + my $possible_nodes = PVE::HA::Tools::intersect($allowed_nodes,
>>> $together);
>>> +
>>> + for my $node (sort keys %$together) {
>>> + $mandatory_nodes->{$node} = 1 if $together->{$node}->{strict};
>>> + }
>>> +
>>> + if (scalar keys %$mandatory_nodes) {
>>> + # limit to only the nodes the service must be on.
>>> + for my $node (keys %$allowed_nodes) {
>>> + next if exists($mandatory_nodes->{$node});
>>> +
>>> + delete $allowed_nodes->{$node};
>>> + }
>>> + } elsif (scalar keys %$possible_nodes) {
>>
>> I am not sure I follow this logic here.. if there are any strict
>> requirements, we only honor those.. if there are no strict requirements,
>> we only honor the non-strict ones?
>
> Please correct me if I'm wrong, but at least for my understanding this
> seems right, because the nodes in $together are the nodes, which other
> co-located services are already running on.
>
> If there is a co-located service already running somewhere and the
> services MUST be kept together, then there will be an entry like 'node3'
> => { strict => 1 } in $together. AFAICS we can then ignore any non-
> strict nodes here, because we already know where the service MUST run.
>
> If there is a co-located service already running somewhere and the
> services SHOULD be kept together, then there will be one or more
> entries, e.g. $together = { 'node1' => { strict => 0 }, 'node2' =>
> { strict => 0 } };
>
> If there is no co-located service already running somewhere, then
> $together = {}; and this subroutine won't do anything to $allowed_nodes.
>
> In theory, we could assume that %$mandatory_nodes has always only one
> node, because it is mandatory. But currently, we do not hinder users
> manually migrating against colocation rules (maybe we should?) or what
> if rules suddenly change from non-strict to strict. We do not auto-
> migrate if rules change (maybe we should?).
I feel like we should trigger auto-migration for strict colocation
rules. I.e. apply the rules earlier in select_service_node(), before the
"keep current node" early return.
With nofailback=0, we do not keep the current node when node priorities
change for HA groups or the service's group changes, so it feels
consistent to do the same for colocation rules. We'll need to be careful
not to get a "both services now migrate towards each other" switch-up
scenario of course.
We also don't hinder migrating against group priorities, where, with
nofailback=0, it will migrate straight back again. This can be improved
of course, but nothing new, so I'd consider it orthogonal to the
colocation implementation here.
* Re: [pve-devel] [PATCH ha-manager 08/15] manager: factor out prioritized nodes in select_service_node
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 08/15] manager: factor out prioritized nodes in select_service_node Daniel Kral
@ 2025-04-28 13:03 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-28 13:03 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> Factor out the prioritized node hash set in the select_service_node as
> it is used multiple times and makes the intent a little clearer.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
* Re: [pve-devel] [PATCH ha-manager 10/15] sim: resources: add option to limit start and migrate tries to node
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 10/15] sim: resources: add option to limit start and migrate tries to node Daniel Kral
@ 2025-04-28 13:20 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-28 13:20 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> Add an option to the VirtFail's name to allow the start and migrate fail
> counts to only apply on a certain node number with a specific naming
> scheme.
>
> This allows a slightly more elaborate test type, e.g. where a service
> can start on one node (or any other in that case), but fails to start on
> a specific node, which it is expected to start on after a migration.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
With some nits:
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> src/PVE/HA/Sim/Resources/VirtFail.pm | 37 +++++++++++++++++++---------
> 1 file changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/src/PVE/HA/Sim/Resources/VirtFail.pm b/src/PVE/HA/Sim/Resources/VirtFail.pm
> index ce88391..fddecd6 100644
> --- a/src/PVE/HA/Sim/Resources/VirtFail.pm
> +++ b/src/PVE/HA/Sim/Resources/VirtFail.pm
> @@ -10,25 +10,36 @@ use base qw(PVE::HA::Sim::Resources);
> # To make it more interesting we can encode some behavior in the VMID
> # with the following format, where fa: is the type and a, b, c, ...
> # are digits in base 10, i.e. the full service ID would be:
> -# fa:abcde
> +# fa:abcdef
> # And the digits after the fa: type prefix would mean:
> # - a: no meaning but can be used for differentiating similar resources
> # - b: how many tries are needed to start correctly (0 is normal behavior) (should be set)
> # - c: how many tries are needed to migrate correctly (0 is normal behavior) (should be set)
> # - d: should shutdown be successful (0 = yes, anything else no) (optional)
> # - e: return value of $plugin->exists() defaults to 1 if not set (optional)
> +# - f: limits the constraints of b and c to the nodeX (0 = apply to all nodes) (optional)
Requires us to have exactly this kind of node name for such tests, but
can be fine IMHO.
>
> my $decode_id = sub {
> my $id = shift;
>
> - my ($start, $migrate, $stop, $exists) = $id =~ /^\d(\d)(\d)(\d)?(\d)?/g;
> + my ($start, $migrate, $stop, $exists, $limit_to_node) = $id =~ /^\d(\d)(\d)(\d)?(\d)?(\d)?/g;
>
> $start = 0 if !defined($start);
> $migrate = 0 if !defined($migrate);
> $stop = 0 if !defined($stop);
> $exists = 1 if !defined($exists);
> + $limit_to_node = 0 if !defined($limit_to_node);
>
> - return ($start, $migrate, $stop, $exists)
> + return ($start, $migrate, $stop, $exists, $limit_to_node);
> +};
> +
> +my $should_retry_action = sub {
"action" feels a bit too general to me. It does not apply to all
actions. Also it determines whether the action itself should fail.
Retrying is then just the consequence.
> + my ($haenv, $limit_to_node) = @_;
> +
> + my ($node) = $haenv->nodename() =~ /^node(\d)/g;
No need for a regex, you could just check $limit_to_node == 0 early and
then compare with the exactly known value.
> + $node = 0 if !defined($node);
> +
> + return $limit_to_node == 0 || $limit_to_node == $node;
> };
>
> my $tries = {
> @@ -53,12 +64,14 @@ sub exists {
> sub start {
> my ($class, $haenv, $id) = @_;
>
> - my ($start_failure_count) = &$decode_id($id);
> + my ($start_failure_count, $limit_to_node) = (&$decode_id($id))[0,4];
Style nit: pre-existing, but you can go for $decode_id->()
>
> - $tries->{start}->{$id} = 0 if !$tries->{start}->{$id};
> - $tries->{start}->{$id}++;
> + if ($should_retry_action->($haenv, $limit_to_node)) {
> + $tries->{start}->{$id} = 0 if !$tries->{start}->{$id};
> + $tries->{start}->{$id}++;
>
> - return if $start_failure_count >= $tries->{start}->{$id};
> + return if $start_failure_count >= $tries->{start}->{$id};
> + }
>
> $tries->{start}->{$id} = 0; # reset counts
>
> @@ -79,12 +92,14 @@ sub shutdown {
> sub migrate {
> my ($class, $haenv, $id, $target, $online) = @_;
>
> - my (undef, $migrate_failure_count) = &$decode_id($id);
> + my ($migrate_failure_count, $limit_to_node) = (&$decode_id($id))[1,4];
Same as above
>
> - $tries->{migrate}->{$id} = 0 if !$tries->{migrate}->{$id};
> - $tries->{migrate}->{$id}++;
> + if ($should_retry_action->($haenv, $limit_to_node)) {
> + $tries->{migrate}->{$id} = 0 if !$tries->{migrate}->{$id};
> + $tries->{migrate}->{$id}++;
>
> - return if $migrate_failure_count >= $tries->{migrate}->{$id};
> + return if $migrate_failure_count >= $tries->{migrate}->{$id};
> + }
>
> $tries->{migrate}->{$id} = 0; # reset counts
>
* Re: [pve-devel] [PATCH ha-manager 11/15] test: ha tester: add test cases for strict negative colocation rules
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 11/15] test: ha tester: add test cases for strict negative colocation rules Daniel Kral
@ 2025-04-28 13:44 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-28 13:44 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> Add test cases for strict negative colocation rules, i.e. where services
> must be kept on separate nodes. These verify the behavior of the
> services in strict negative colocation rules in case of a failover of
> the node of one or more of these services in the following scenarios:
>
> - 2 neg. colocated services in a 3 node cluster; 1 node failing
> - 3 neg. colocated services in a 5 node cluster; 1 node failing
> - 3 neg. colocated services in a 5 node cluster; 2 nodes failing
> - 2 neg. colocated services in a 3 node cluster; 1 node failing, but the
> recovery node cannot start the service
> - Pair of 2 neg. colocated services (with one common service in both) in
> a 3 node cluster; 1 node failing
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Two very minor nits and a typo below:
> diff --git a/src/test/test-colocation-strict-separate2/README b/src/test/test-colocation-strict-separate2/README
> new file mode 100644
> index 0000000..f494d2b
> --- /dev/null
> +++ b/src/test/test-colocation-strict-separate2/README
> @@ -0,0 +1,15 @@
> +Test whether a strict negative colocation rule among three services makes one
> +of the services migrate to a different node than the other services in case of
> +a failover of the service's previously assigned node.
> +
> +The test scenario is:
> +- vm:101, vm:102, and vm:103 must be kept separate
> +- vm:101, vm:102, and vm:103 are on node3, node4, and node5 respectively
> +- node1 and node2 have each both higher service counts than node3, node4 and
> + node5 to test the rule is applied even though the scheduler would prefer the
> + less utilizied nodes node3, node4, or node5
s/utilizied/utilized/
Nit: I'd not list node5 in that sentence, because its service count is
not relevant as the failing node.
> diff --git a/src/test/test-colocation-strict-separate3/README b/src/test/test-colocation-strict-separate3/README
> new file mode 100644
> index 0000000..44d88ef
> --- /dev/null
> +++ b/src/test/test-colocation-strict-separate3/README
> @@ -0,0 +1,16 @@
> +Test whether a strict negative colocation rule among three services makes two
> +of the services migrate to two different recovery nodes than the node of the
> +third service in case of a failover of their two previously assigned nodes.
> +
> +The test scenario is:
> +- vm:101, vm:102, and vm:103 must be kept separate
> +- vm:101, vm:102, and vm:103 are respectively on node3, node4, and node5
> +- node1 and node2 have both higher service counts than node3, node4 and node5
> + to test the colocation rule is enforced even though the utilization would
> + prefer the other node3, node4, and node5
Nit: I'd not list node4 and node5 in that sentence, because their
service counts are not relevant since they fail.
* Re: [pve-devel] [PATCH ha-manager 12/15] test: ha tester: add test cases for strict positive colocation rules
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 12/15] test: ha tester: add test cases for strict positive " Daniel Kral
@ 2025-04-28 13:51 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-28 13:51 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> Add test cases for strict positive colocation rules, i.e. where services
> must be kept on the same node together. These verify the behavior of the
> services in strict positive colocation rules in case of a failover of
> their assigned nodes in the following scenarios:
>
> - 2 pos. colocated services in a 3 node cluster; 1 node failing
> - 3 pos. colocated services in a 3 node cluster; 1 node failing
> - 3 pos. colocated services in a 3 node cluster; 1 node failing, but the
> recovery node cannot start one of the services
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Again minor nits with the descriptions:
> diff --git a/src/test/test-colocation-strict-together2/README b/src/test/test-colocation-strict-together2/README
> new file mode 100644
> index 0000000..c1abf68
> --- /dev/null
> +++ b/src/test/test-colocation-strict-together2/README
> @@ -0,0 +1,11 @@
> +Test whether a strict positive colocation rule makes three services migrate to
> +the same recovery node in case of a failover of their previously assigned node.
> +
> +The test scenario is:
> +- vm:101, vm:102, and vm:103 must be kept together
> +- vm:101, vm:102, and vm:103 are all currently running on node3
> +- node1 has a higher service count than node2 to test that the rule is applied
> + even though it would be usually balanced between both remaining nodes
Nit: The balancing would also happen if the service count were the same
on the two nodes; the sentence makes it sound like it's a requirement
for this test.
> diff --git a/src/test/test-colocation-strict-together3/README b/src/test/test-colocation-strict-together3/README
> new file mode 100644
> index 0000000..5332696
> --- /dev/null
> +++ b/src/test/test-colocation-strict-together3/README
> @@ -0,0 +1,17 @@
> +Test whether a strict positive colocation rule makes three services migrate to
> +the same recovery node in case of a failover of their previously assigned node.
> +If one of those fail to start on the recovery node (e.g. insufficient
> +resources), the failing service will be kept on the recovery node.
> +
> +The test scenario is:
> +- vm:101, vm:102, and fa:120002 must be kept together
> +- vm:101, vm:102, and fa:120002 are all currently running on node3
> +- fa:120002 will fail to start on node2
> +- node1 has a higher service count than node2 to test that the rule is applied
> + even though it would be usually balanced between both remaining nodes
Nit: The balancing would also happen if the service count were the same
on the two nodes; the sentence makes it sound like it's a requirement
for this test. You do need it here, however, since the failure for the
'fa' service will happen on node2, so you should mention that instead.
> +
> +Therefore, the expected outcome is:
> +- As node3 fails, all services are migrated to node2
> +- Two of those services will start successfully, but fa:120002 will stay in
> + recovery, since it cannot be started on this node, but cannot be relocated to
> + another one either due to the strict colocation rule
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-04-28 12:26 ` Fiona Ebner
@ 2025-04-28 14:33 ` Fiona Ebner
2025-04-29 9:39 ` Daniel Kral
2025-04-29 9:50 ` Daniel Kral
1 sibling, 1 reply; 67+ messages in thread
From: Fiona Ebner @ 2025-04-28 14:33 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 28.04.25 um 14:26 schrieb Fiona Ebner:
> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>> +
>> + delete $allowed_nodes->{$node};
>> + }
>> + } elsif (scalar keys %$possible_nodes) {
>> + # limit to the possible nodes the service should be on, if there are any.
>> + for my $node (keys %$allowed_nodes) {
>> + next if exists($possible_nodes->{$node});
>> +
>> + delete $allowed_nodes->{$node};
>
> This seems wrong. Non-strict rules should not limit the allowed nodes.
> See below for more on this.
Ah, if there are no possible nodes at all, then the allowed nodes are
not modified at all. This is what makes the loose tests work. This
"secret" here really needs to be properly documented ;)
It still would be nice to think about which kind of interaction with
scoring we want exactly. Currently it's the number 1 I mentioned, i.e.
"prefer loose colocation over scoring no matter what". Can be fine to
start out too, just means we'd need to introduce an option/tunable if we
ever want to change it.
* Re: [pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose colocation rules
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose " Daniel Kral
@ 2025-04-28 14:44 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-28 14:44 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> Add test cases for loose positive and negative colocation rules, i.e.
> where services should be kept on the same node together or kept separate
> nodes. These are copies of their strict counterpart tests, but verify
> the behavior if the colocation rule cannot be met, i.e. not adhering to
> the colocation rule. The test scenarios are:
>
> - 2 neg. colocated services in a 3 node cluster; 1 node failing
> - 2 neg. colocated services in a 3 node cluster; 1 node failing, but the
> recovery node cannot start the service
> - 2 pos. colocated services in a 3 node cluster; 1 node failing
> - 3 pos. colocated services in a 3 node cluster; 1 node failing, but the
> recovery node cannot start one of the services
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
With the errors in the descriptions fixed:
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
> diff --git a/src/test/test-colocation-loose-separate4/README b/src/test/test-colocation-loose-separate4/README
Not sure it should be named the same number as the strict test just
because it's adapted from that.
> new file mode 100644
> index 0000000..5b68cde
> --- /dev/null
> +++ b/src/test/test-colocation-loose-separate4/README
> @@ -0,0 +1,17 @@
> +Test whether a loose negative colocation rule among two services makes one of
> +the services migrate to a different recovery node than the other service in
> +case of a failover of service's previously assigned node. As the service fails
> +to start on the recovery node (e.g. insufficient resources), the failing
> +service is kept on the recovery node.
The description here is wrong. It will be started on a different node
after the start failure.
> +
> +The test scenario is:
> +- vm:101 and fa:120001 should be kept separate
> +- vm:101 and fa:120001 are on node2 and node3 respectively
> +- fa:120001 will fail to start on node1
> +- node1 has a higher service count than node2 to test the colocation rule is
> + applied even though the scheduler would prefer the less utilized node
> +
> +Therefore, the expected outcome is:
> +- As node3 fails, fa:120001 is migrated to node1
> +- fa:120001 will be relocated to another node, since it couldn't start on its
> + initial recovery node
---snip 8<---
> diff --git a/src/test/test-colocation-loose-together1/README b/src/test/test-colocation-loose-together1/README
> new file mode 100644
> index 0000000..2f5aeec
> --- /dev/null
> +++ b/src/test/test-colocation-loose-together1/README
> @@ -0,0 +1,11 @@
> +Test whether a loose positive colocation rule makes two services migrate to
> +the same recovery node in case of a failover of their previously assigned node.
> +
> +The test scenario is:
> +- vm:101 and vm:102 should be kept together
> +- vm:101 and vm:102 are both currently running on node3
> +- node1 and node2 have the same service count to test that the rule is applied
> + even though it would be usually balanced between both remaining nodes
> +
> +Therefore, the expected outcome is:
> +- As node3 fails, both services are migrated to node2
It's actually node1
* Re: [pve-devel] [PATCH ha-manager 07/15] manager: read and update rules config
2025-04-25 14:30 ` Fiona Ebner
@ 2025-04-29 8:04 ` Daniel Kral
0 siblings, 0 replies; 67+ messages in thread
From: Daniel Kral @ 2025-04-29 8:04 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 4/25/25 16:30, Fiona Ebner wrote:
> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>> + my $new_rules = $haenv->read_rules_config();
>> +
>> + # TODO We should also check for a service digest here, but we would've to
>> + # calculate it here independently or also expose it through read_service_config()
>
> Ah, so read_and_check_resources_config() drops the digest and produces a
> hash with the only keys being the $sids. Easiest is probably to add
> wantarray in the relevant place(s) and return a list with the digest
> second. And I guess we don't need that anymore after we migrated from HA
> groups to location rules, since it's only used to get the groups? If
> yes, then we should add a reminder to remove it again.
Yes, the digest for services and HA groups shouldn't be needed anymore
as soon as we migrate the groups to be location rules. I'll expose it
through read_and_check_resources_config() as you suggested and add the
TODOs for removing it later.
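A minimal sketch of the wantarray approach (the body is illustrative
only, not the actual read_and_check_resources_config()):

    use Digest::SHA qw(sha1_hex);

    sub read_and_check_resources_config_sketch {
        my ($raw) = @_;

        my $digest = sha1_hex($raw // '');
        my $resources = {}; # stand-in for the actual parsing of $raw

        # existing callers keep getting just the config, callers that need
        # the digest ask for a list:
        #   my ($resources, $digest) = read_and_check_resources_config_sketch($raw);
        return wantarray ? ($resources, $digest) : $resources;
    }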
>
>> + if ($new_rules->{digest} ne $self->{last_rules_digest}
>> + || $self->{groups}->{digest} ne $self->{last_groups_digest}) {
>> + $self->{rules} = $new_rules;
>> + PVE::HA::Rules::checked_config($self->{rules}, $self->{groups}, $sc);
>
> Might not matter now, but I'd prefer: check first, then assign.
ACK, will do that!
>
>> + }
>> +
>> + $self->{last_rules_digest} = $self->{rules}->{digest};
>> + $self->{last_groups_digest} = $self->{groups}->{digest};
>> +
>> $self->update_crm_commands();
>>
>> for (;;) {
>
* Re: [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin
2025-04-25 14:06 ` Fiona Ebner
@ 2025-04-29 8:37 ` Daniel Kral
2025-04-29 9:15 ` Fiona Ebner
0 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-04-29 8:37 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion, Fabian Grünbichler
Agree with all here, but I propose to remove the merging for any
colocation rules in the last comment, if there's nothing speaking
against it.
On 4/25/25 16:06, Fiona Ebner wrote:
> Am 11.04.25 um 13:04 schrieb Daniel Kral:
>> On 4/3/25 14:16, Fabian Grünbichler wrote:
>>> On March 25, 2025 4:12 pm, Daniel Kral wrote:
>>>> +sub check_services_count {
>>>> + my ($rules) = @_;
>>>> +
>>>> + my $conflicts = [];
>>>> +
>>>> + foreach_colocation_rule($rules, sub {
>>>> + my ($rule, $ruleid) = @_;
>>>> +
>>>> + push @$conflicts, $ruleid if (scalar(keys %{$rule->{services}}) < 2);
>>>> + });
>>>> +
>>>> + return $conflicts;
>>>> +}
>>>
>>> is this really an issue? a colocation rule with a single service is just
>>> a nop? there's currently no cleanup AFAICT if a resource is removed, but
>>
>> You're right, AFAICS those are a noop when selecting the service node. I
>> guess I was a little pedantic / overprotective here about which rules
>> make sense in general instead of what the algorithm does in the end.
>>
>> And good point about handling when resources are removed, adding that to
>> delete_service_from_config comes right on my TODO list for the v1!
>>
>>> if we add that part (we maybe should?) then one can easily end up in a
>>> situation where a rule temporarily contains a single or no service?
>>
>> Hm, yes, especially if we add pools/tags at a later point to select
>> services for the rule, then this could happen very easily. But as you
>> already mentioned, those two cases would be noops too.
>>
>> Nevertheless, should we drop this? I think it could benefit users in
>> identifying that some rules might not do something they wanted and give
>> them a reason why, i.e. there's only one service in there, but at the
>> same time it could be a little noisy if there are a lot of affected rules.
>
> I'd still keep rules that end up with only one service around, but maybe
> (temporarily) disable them. And/or we could also add a special
> "no-effect" marker like the "conflict" one proposed in my other answer?
> Then a user could enable/make the rule effective again by adding a new
> service to it.
Yes, I think the 'conflict'/'ineffective' states are much more
user-friendly compared to silently dropping the rules and only making
these decisions explicit in the log. I'll add that to the v1!
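For illustration, flagging instead of dropping could look roughly like
this (the 'state' property and the exact config layout are just
illustrative, not part of this series):

    # sketch: flag conflicting / ineffective rules instead of deleting them
    for my $ruleid (@$conflicts) {
        $rules->{ids}->{$ruleid}->{state} = 'conflict'; # or 'ineffective'
    }

    # ... and skip flagged rules wherever they would be applied:
    # next if defined($rule->{state});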
>
>>>> +
>>>> +=head3 check_positive_intransitivity($rules)
>>>> +
>>>> +Returns a list of conflicts caused by transitive positive colocation
>>>> rules
>>>> +defined in C<$rules>.
>>>> +
>>>> +Transitive positive colocation rules exist, if there are at least
>>>> two positive
>>>> +colocation rules with the same strictness, which put at least the
>>>> same two
>>>> +services in relation. This means, that these rules can be merged
>>>> together.
>>>> +
>>>> +If there are no conflicts, the returned list is empty.
>>>
>>> The terminology here is quit confusing - conflict meaning that two rules
>>> are "transitive" and thus mergeable (which is good, cause it makes
>>> things easier to handle?) is quite weird, as "conflict" is a rather
>>> negative term..
>>>
>>> there's only a single call site in the same module, maybe we could just
>>> rename this into "find_mergeable_positive_ruleids", similar to the
>>> variable where the result is stored?
>>
>> Yeah, I was probably to keen on the `$conflict = check_something(...)`
>> pattern here, but it would be much more readable with a simpler name,
>> I'll change that for the v1!
>>
>> -----
>>
>> Ad why: I'll also add some documentation about the rationale why this is
>> needed in the first place.
>>
>> The main reason was because the latter rule check
>> 'check_inner_consistency' depends on the fact that the positive
>> colocation rules have been merged already, as it assumes that each
>> positive colocation rule has all of the services in there, which are
>> positively colocated. If it weren't so, it wouldn't detect that the
>> following three rules are inconsistent with each other:
>>
>> colocation: stick-together1
>> services vm:101,vm:104
>> affinity together
>> strict 1
>>
>> colocation: stick-together2
>> services vm:104,vm:102
>> affinity together
>> strict 1
>>
>> colocation: keep-apart
>> services vm:101,vm:102,vm:103
>> affinity separate
>> strict 1
>>
>> This reduces the complexity of the logic a little in
>> 'check_inner_consistency' as there it doesn't have to handle this
>> special case as 'stick-together1' and 'stick-together2' are already
>> merged in to one and it is easily apparent that vm 101 and vm 102 cannot
>> be colocated and non-colocated at the same time.
>>
>> -----
>>
>> Also, I was curious about how that would work out for the case where a
>> negative colocation rule was defined for three nodes with those rules
>> split into three rules (essentially a cycle dependence). This should in
>> theory have the same semantics as the above rule set:
>>
>> colocation: stick-together1
>> services vm:101,vm:104
>> affinity together
>> strict 1
>>
>> colocation: stick-together2
>> services vm:104,vm:102
>> affinity together
>> strict 1
>>
>> colocation: very-lonely-services1
>> services vm:101,vm:102
>> affinity separate
>> strict 1
>>
>> colocation: very-lonely-services2
>> services vm:102,vm:103
>> affinity separate
>> strict 1
>>
>> colocation: very-lonely-services3
>> services vm:101,vm:103
>> affinity separate
>> strict 1
>>
>> Without the merge of positive rules, 'check_inner_consistency' would
>> again not detect the inconsistency here. But with the merge correctly
>> applied before checking the consistency, this would be resolved and the
>> effective rule set would be:
>
> I suppose the effective rule set would still also contain the two
> 'together' rules, or?
No, here it would not. I figured it would be most fair or reasonable to
drop both rules if a positive and a negative colocation rule contradict
each other. Here the conflicts are
stick-together1 -- very-lonely-services1
stick-together2 -- very-lonely-services1
so all three of them will be dropped from the rule set.
Seeing this again here, such cases definitely benefit from the immediate
response with the 'conflict'/'ineffective' state to show users that
those won't be applied instead of only logging it.
>
>>
>> colocation: very-lonely-services2
>> services vm:102,vm:103
>> affinity separate
>> strict 1
>>
>> colocation: very-lonely-services3
>> services vm:101,vm:103
>> affinity separate
>> strict 1
>>
>> It could be argued, that the negative colocation rules should be merged
>> in a similar manner here, as there's now a "effective" difference in the
>> semantics of the above rule sets, as the negative colocation rule
>> between vm 101 and vm 103 and vm 102 and vm 103 remains.
>>
>> What do you think?
>
> I don't think there's a particular need to also merge negative rules
> between services (when they form a complete graph). It won't make a
> difference if there are no conflicts with positive rules and in edge
> cases when there are conflicts (which usually gets caught while editing
> the rules), it's better to drop fewer rules, so not merging is an
> advantage. Or do you have a particular advantage in favor of merging in
> mind?
Yes, I think so too.
There's quite the semantic difference between positive and negative
colocation rules here. "Connected" positive colocation relationships
(strict ones in particular) must be co-located in the end anyway, so it
makes sense to merge them. Negative colocation relationships must be
defined in a "circular" way and might just happen by coincidence for
small scenarios.
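(For reference, merging "connected" positive relationships boils down to
building the connected components over shared services, roughly like the
following sketch - made-up helper, strictness handling omitted:)

    sub merge_connected_positive_rules {
        my (@rules) = @_; # each rule: { services => { 'vm:101' => 1, ... } }

        my $group_of = {}; # service id => merged service set (hash ref)
        for my $rule (@rules) {
            my $merged = {};
            for my $sid (keys %{$rule->{services}}) {
                my $existing = $group_of->{$sid} // {};
                $merged->{$_} = 1 for (keys %$existing, $sid);
            }
            # point every member of the merged set to the same set
            $group_of->{$_} = $merged for keys %$merged;
        }

        my %seen; # deduplicate the remaining sets
        return [ grep { !$seen{$_}++ } values %$group_of ];
    }

With the stick-together1/stick-together2 rules above, this yields a
single set { vm:101, vm:102, vm:104 }, which is what
check_inner_consistency() can then compare against the negative rules.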
But one thing that just struck me: what if the user intentionally wrote
them as separate rules? Then it might be confusing that all rules are
dropped rather than just the minimal set that contradicts other rules...
Without the merging, check_inner_consistency() would only drop the
minimal set of rules that need to be dropped, as in the above example.
It would be a softer interpretation of the rules indeed, but it might
benefit the user in the end and make things easier to follow from the
user perspective. If there's no opposition to that, I'd tend to drop the
merging for any rules after all.
* Re: [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin
2025-04-25 14:05 ` Fiona Ebner
@ 2025-04-29 8:44 ` Daniel Kral
0 siblings, 0 replies; 67+ messages in thread
From: Daniel Kral @ 2025-04-29 8:44 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 4/25/25 16:05, Fiona Ebner wrote:
> Not much to add to Fabian's review :)
>
> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>> diff --git a/src/PVE/HA/Rules/Colocation.pm b/src/PVE/HA/Rules/Colocation.pm
>> new file mode 100644
>> index 0000000..808d48e
>> --- /dev/null
>> +++ b/src/PVE/HA/Rules/Colocation.pm
>> @@ -0,0 +1,391 @@
>> +package PVE::HA::Rules::Colocation;
>> +
>> +use strict;
>> +use warnings;
>> +
>> +use Data::Dumper;
>> +
>> +use PVE::JSONSchema qw(get_standard_option);
>
> Missing include of PVE::Tools.
>
> Nit: I'd put a blank here to separate modules from different packages
> and modules from the same package.
>
>> +use PVE::HA::Tools;
>> +
>> +use base qw(PVE::HA::Rules);
>> +
>> +sub type {
>> + return 'colocation';
>> +}
>> +
>> +sub properties {
>> + return {
>> + services => get_standard_option('pve-ha-resource-id-list'),
>> + affinity => {
>> + description => "Describes whether the services are supposed to be kept on separate"
>> + . " nodes, or are supposed to be kept together on the same node.",
>> + type => 'string',
>> + enum => ['separate', 'together'],
>> + optional => 0,
>> + },
>> + strict => {
>> + description => "Describes whether the colocation rule is mandatory or optional.",
>> + type => 'boolean',
>> + optional => 0,
>> + },
>> + }
>
> Style nit: missing semicolon
>
> Since we should move the property definitions to the base module once a
> second plugin re-uses them later: should we already declare 'services'
> and 'strict' in the base module to start out? Then we could implement
> the encode/decode part for 'services' there already. Less moving around
> or duplication later on.
Yes, especially as Fabian also agreed that it would make sense to allow
users to define location rules for multiple services in a single rule.
I'll start to use the isolated_properties option that @Dominik
implemented, so that other options can be separated and have
plugin-specific descriptions, etc., but 'services' can definitely live
with a more general description.
>
>> +}
>> +
>> +sub options {
>> + return {
>> + services => { optional => 0 },
>> + strict => { optional => 0 },
>> + affinity => { optional => 0 },
>> + comment => { optional => 1 },
>> + };
>> +};
>> +
>> +sub decode_value {
>> + my ($class, $type, $key, $value) = @_;
>> +
>> + if ($key eq 'services') {
>> + my $res = {};
>> +
>> + for my $service (PVE::Tools::split_list($value)) {
>> + if (PVE::HA::Tools::pve_verify_ha_resource_id($service)) {
>> + $res->{$service} = 1;
>> + }
>> + }
>> +
>> + return $res;
>> + }
>> +
>> + return $value;
>> +}
>> +
>> +sub encode_value {
>> + my ($class, $type, $key, $value) = @_;
>> +
>> + if ($key eq 'services') {
>> + PVE::HA::Tools::pve_verify_ha_resource_id($_) for (keys %$value);
>
> Style nit:
> [I] febner@dev8 /usr/share/perl5/PVE> ag "for keys" | wc -l
> 28
> [I] febner@dev8 /usr/share/perl5/PVE> ag "for \(keys" | wc -l
> 0
ACK, will change that :)
>
>> +
>> + return join(',', keys %$value);
>> + }
>> +
>> + return $value;
>> +}
>> +
>
> ---snip 8<---
>
>> +=head3 check_service_count($rules)
>> +
>> +Returns a list of conflicts caused by colocation rules, which do not have
>> +enough services in them, defined in C<$rules>.
>> +
>> +If there are no conflicts, the returned list is empty.
>> +
>> +=cut
>> +
>> +sub check_services_count {
>> + my ($rules) = @_;
>> +
>> + my $conflicts = [];
>> +
>> + foreach_colocation_rule($rules, sub {
>> + my ($rule, $ruleid) = @_;
>> +
>> + push @$conflicts, $ruleid if (scalar(keys %{$rule->{services}}) < 2);
>
> Style nit: parentheses for post-if
>
ACK, removed the outer parentheses
* Re: [pve-devel] [PATCH ha-manager 14/15] test: ha tester: add test cases in more complex scenarios
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 14/15] test: ha tester: add test cases in more complex scenarios Daniel Kral
@ 2025-04-29 8:54 ` Fiona Ebner
2025-04-29 9:01 ` Fiona Ebner
1 sibling, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-29 8:54 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> diff --git a/src/test/test-crs-static-rebalance-coloc1/README b/src/test/test-crs-static-rebalance-coloc1/README
> new file mode 100644
> index 0000000..c709f45
> --- /dev/null
> +++ b/src/test/test-crs-static-rebalance-coloc1/README
> @@ -0,0 +1,26 @@
> +Test whether a mixed set of strict colocation rules in conjunction with the
> +static load scheduler with auto-rebalancing are applied correctly on service
> +start enabled and in case of a subsequent failover.
> +
> +The test scenario is:
> +- vm:101 and vm:102 are non-colocated services
> +- Services that must be kept together:
> + - vm:102, and vm:107
Even if going for serial commas, AFAIK it's not allowed when there are
only two items listed.
> + - vm:104, vm:106, and vm:108
> +- Services that must be kept separate:
> + - vm:103, vm:104, and vm:105
> + - vm:103, vm:106, and vm:107
> + - vm:107, and vm:108
> +- Therefore, there are consistent interdependencies between the positive and
> + negative colocation rules' service members
> +- vm:101 and vm:102 are currently assigned to node1 and node2 respectively
> +- vm:103 through vm:108 are currently assigned to node3
> +
> +Therefore, the expected outcome is:
> +- vm:101, vm:102, vm:103 should be started on node1, node2, and node3
> + respectively, as there's nothing running on there yet
> +- vm:104, vm:106, and vm:108 should all be assigned on the same node, which
> + will be node1, since it has the most resources left for vm:104
> +- vm:105 and vm:107 should both be assigned on the same node, which will be
> + node2, since both cannot be assigned to the other nodes because of the
> + colocation constraints
Would be nice to have a final sentence for the last part of the test:
"As node3 fails, ..."
---snip 8<---
> diff --git a/src/test/test-crs-static-rebalance-coloc2/README b/src/test/test-crs-static-rebalance-coloc2/README
> new file mode 100644
> index 0000000..1b788f8
> --- /dev/null
> +++ b/src/test/test-crs-static-rebalance-coloc2/README
> @@ -0,0 +1,16 @@
> +Test whether a set of transitive strict negative colocation rules, i.e. there's
I don't like the use of "transitive" here, as that comes with
connotations that just don't apply in general here, but would prefer
"pairwise".
> +negative colocation relations a->b, b->c and a->c, in conjunction with the
The relations are symmetric, so I'd write a<->b, etc.
> +static load scheduler with auto-rebalancing are applied correctly on service
> +start and in case of a subsequent failover.
> +
> +The test scenario is:
> +- vm:101 and vm:102 must be kept separate
> +- vm:102 and vm:103 must be kept separate
> +- vm:101 and vm:103 must be kept separate
> +- Therefore, vm:101, vm:102, and vm:103 must be kept separate
> +
> +Therefore, the expected outcome is:
> +- vm:101, vm:102, and vm:103 should be started on node1, node2, and node3
> + respectively, just as if the three negative colocation rule would've been
> + stated in a single negative colocation rule
This would already happen with just rebalancing though. I.e. even if I
remove the colocation rules, the part of the test output before node3
fails looks exactly the same. You could add dummy services in between or
give the nodes rather large differences in available resources to make
the colocation rules actually matter for the test.
> +- As node3 fails, vm:103 cannot be recovered
---snip 8<---
> diff --git a/src/test/test-crs-static-rebalance-coloc3/README b/src/test/test-crs-static-rebalance-coloc3/README
> new file mode 100644
> index 0000000..e54a2d4
> --- /dev/null
> +++ b/src/test/test-crs-static-rebalance-coloc3/README
> @@ -0,0 +1,14 @@
> +Test whether a more complex set of transitive strict negative colocation rules,
> +i.e. there's negative colocation relations a->b, b->c and a->c, in conjunction
Same comments as above regarding the wording.
> +with the static load scheduler with auto-rebalancing are applied correctly on
> +service start and in case of a subsequent failover.
> +
> +The test scenario is:
> +- Essentially, all 10 strict negative colocation rules say that, vm:101,
> + vm:102, vm:103, vm:104, and vm:105 must be kept together
s/together/separate/
> +
> +Therefore, the expected outcome is:
> +- vm:101, vm:102, and vm:103 should be started on node1, node2, node3, node4,
> + and node5 respectively, just as if the 10 negative colocation rule would've
> + been stated in a single negative colocation rule
> +- As node1 and node5 fails, vm:101 and vm:105 cannot be recovered
Again, it seems like colocation rules don't actually matter for the
first half of the test.
---snip 8<---
> diff --git a/src/test/test-crs-static-rebalance-coloc3/static_service_stats b/src/test/test-crs-static-rebalance-coloc3/static_service_stats
> new file mode 100644
> index 0000000..d9dc9e7
> --- /dev/null
> +++ b/src/test/test-crs-static-rebalance-coloc3/static_service_stats
> @@ -0,0 +1,5 @@
> +{
> + "vm:101": { "maxcpu": 8, "maxmem": 16000000000 },
> + "vm:102": { "maxcpu": 4, "maxmem": 24000000000 },
> + "vm:103": { "maxcpu": 2, "maxmem": 32000000000 }
vm:104 and vm:105 are not defined here
> +}
* Re: [pve-devel] [PATCH ha-manager 14/15] test: ha tester: add test cases in more complex scenarios
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 14/15] test: ha tester: add test cases in more complex scenarios Daniel Kral
2025-04-29 8:54 ` Fiona Ebner
@ 2025-04-29 9:01 ` Fiona Ebner
1 sibling, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-29 9:01 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral
Am 25.03.25 um 16:12 schrieb Daniel Kral:
> diff --git a/src/test/test-crs-static-rebalance-coloc3/README b/src/test/test-crs-static-rebalance-coloc3/README
> new file mode 100644
> index 0000000..e54a2d4
> --- /dev/null
> +++ b/src/test/test-crs-static-rebalance-coloc3/README
> @@ -0,0 +1,14 @@
> +Test whether a more complex set of transitive strict negative colocation rules,
> +i.e. there's negative colocation relations a->b, b->c and a->c, in conjunction
> +with the static load scheduler with auto-rebalancing are applied correctly on
> +service start and in case of a subsequent failover.
> +
> +The test scenario is:
> +- Essentially, all 10 strict negative colocation rules say that, vm:101,
> + vm:102, vm:103, vm:104, and vm:105 must be kept together
> +
> +Therefore, the expected outcome is:
> +- vm:101, vm:102, and vm:103 should be started on node1, node2, node3, node4,
> + and node5 respectively, just as if the 10 negative colocation rule would've
> + been stated in a single negative colocation rule
> +- As node1 and node5 fails, vm:101 and vm:105 cannot be recovered
Orthogonal to my other reply, I kinda feel like the inverse test would
actually be more interesting. Have a single rule and turn off (and then
on again) each node in turn to see that all pairwise rules (that derive
from the common rule) are actually honored, i.e. no service can be
recovered.
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-04-28 12:46 ` Fiona Ebner
@ 2025-04-29 9:07 ` Daniel Kral
2025-04-29 9:22 ` Fiona Ebner
0 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-04-29 9:07 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion, Fabian Grünbichler
On 4/28/25 14:46, Fiona Ebner wrote:
> I feel like we should trigger auto-migration for strict colocation
> rules. I.e. apply the rules earlier in select_service_node(), before the
> "keep current node" early return.
>
> With nofailback=0, we do not keep the current node when node priorities
> change for HA groups or the service's group changes, so it feels
> consistent to do the same for colocation rules. We'll need to be careful
> not to get a "both services now migrate towards each other" switch-up
> scenario of course.
What scenario would that be? Or do you mean just disallowing migrations
of services that contradict the HA (colocation) rules?
>
> We also don't hinder migrating against group priorities, where, with
> nofailback=0, it will migrate straight back again. This can be improved
> of course, but nothing new, so I'd consider it orthogonal to the
> colocation implementation here.
Yes, it would improve UX to add migration blockers for these in the
future as the info could be exposed there without putting too much
dependency between pve-manager and pve-ha-manager.
I'll try to add the blockers for colocation rules for v1 or a follow-up.
* Re: [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin
2025-04-29 8:37 ` Daniel Kral
@ 2025-04-29 9:15 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-29 9:15 UTC (permalink / raw)
To: Daniel Kral, Proxmox VE development discussion, Fabian Grünbichler
Am 29.04.25 um 10:37 schrieb Daniel Kral:
> On 4/25/25 16:06, Fiona Ebner wrote:
>> Am 11.04.25 um 13:04 schrieb Daniel Kral:
>>> On 4/3/25 14:16, Fabian Grünbichler wrote:
>>>> On March 25, 2025 4:12 pm, Daniel Kral wrote:
>>> Also, I was curious about how that would work out for the case where a
>>> negative colocation rule was defined for three nodes with those rules
>>> split into three rules (essentially a cycle dependence). This should in
>>> theory have the same semantics as the above rule set:
>>>
>>> colocation: stick-together1
>>> services vm:101,vm:104
>>> affinity together
>>> strict 1
>>>
>>> colocation: stick-together2
>>> services vm:104,vm:102
>>> affinity together
>>> strict 1
>>>
>>> colocation: very-lonely-services1
>>> services vm:101,vm:102
>>> affinity separate
>>> strict 1
>>>
>>> colocation: very-lonely-services2
>>> services vm:102,vm:103
>>> affinity separate
>>> strict 1
>>>
>>> colocation: very-lonely-services3
>>> services vm:101,vm:103
>>> affinity separate
>>> strict 1
>>>
>>> Without the merge of positive rules, 'check_inner_consistency' would
>>> again not detect the inconsistency here. But with the merge correctly
>>> applied before checking the consistency, this would be resolved and the
>>> effective rule set would be:
>>
>> I suppose the effective rule set would still also contain the two
>> 'together' rules, or?
>
> No, here it would not. I found it would be most fair or reasonable that
> if a positive and a negative colocation rule contradict each other to
> drop both of them. Here the conflicts are
>
> stick-together1 -- very-lonely-services1
> stick-together2 -- very-lonely-services1
>
> so all three of them will be dropped from the rule set.
>
> Seeing this again here, such cases definitely benefit from the immediate
> response with the 'conflict'/'ineffective' state to show users that
> those won't be applied instead of only logging it.
I don't think dropping all conflicting rules is best. Say you have a
rule between 100 services and that conflicts with a rule with just 2
services. Dropping the latter only is much preferred then IMHO. In
general, I'd argue that the more rules we can still honor, the better
from a user perspective. I don't think it's worth going out of our way
and introducing much complexity to minimize the number of dropped rules
though, because conflicts are usually prevented while configuring
already.
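A rough sketch of such a tie-break (helper and rule structure made up
for illustration):

    # when two rules contradict each other, drop the one affecting fewer
    # services, so more of the configuration stays effective
    sub pick_rule_to_drop {
        my ($rule_a, $rule_b) = @_; # each: { services => { ... } }

        my ($count_a, $count_b) =
            map { scalar(keys %{$_->{services}}) } ($rule_a, $rule_b);

        return $count_a <= $count_b ? $rule_a : $rule_b;
    }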
>>> colocation: very-lonely-services2
>>> services vm:102,vm:103
>>> affinity separate
>>> strict 1
>>>
>>> colocation: very-lonely-services3
>>> services vm:101,vm:103
>>> affinity separate
>>> strict 1
>>>
>>> It could be argued, that the negative colocation rules should be merged
>>> in a similar manner here, as there's now a "effective" difference in the
>>> semantics of the above rule sets, as the negative colocation rule
>>> between vm 101 and vm 103 and vm 102 and vm 103 remains.
>>>
>>> What do you think?
>>
>> I don't think there's a particular need to also merge negative rules
>> between services (when they form a complete graph). It won't make a
>> difference if there are no conflicts with positive rules and in edge
>> cases when there are conflicts (which usually gets caught while editing
>> the rules), it's better to drop fewer rules, so not merging is an
>> advantage. Or do you have a particular advantage in favor of merging in
>> mind?
>
> Yes, I think so too.
>
> There's quite the semantic difference between positive and negative
> colocation rules here. "Connected" positive colocation relationships
> (strict ones in particular) must be co-located in the end anyway, so it
> makes sense to merge them. Negative colocation relationships must be
> defined in a "circular" way and might just happen by coincidence for
> small scenarios.
>
> But one thing that just struck me is that what if the user intentionally
> wrote them as separate rules? Then it might be confusing that all rules
> are dropped and not just the minimal amount that contradict other
> rules... Then check_inner_consistency() would just drop the minimal
> amount of rules that need to be dropped as in the above example.
>
> It would be a softer interpretation of the rules indeed, but it might
> benefit the user in the end and make things easier to follow from the
> user perspective. If there's no opposition to that, I'd tend to drop the
> merging for any rules after all.
Having conflicts is already a bit of an edge case, so I don't think we
need to go out of our way to avoid merging of positive rules. But if it
doesn't increase the complexity much, it's fine either way IMHO.
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-04-29 9:07 ` Daniel Kral
@ 2025-04-29 9:22 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-04-29 9:22 UTC (permalink / raw)
To: Daniel Kral, Proxmox VE development discussion, Fabian Grünbichler
Am 29.04.25 um 11:07 schrieb Daniel Kral:
> On 4/28/25 14:46, Fiona Ebner wrote:
>> I feel like we should trigger auto-migration for strict colocation
>> rules. I.e. apply the rules earlier in select_service_node(), before the
>> "keep current node" early return.
>>
>> With nofailback=0, we do not keep the current node when node priorities
>> change for HA groups or the service's group changes, so it feels
>> consistent to do the same for colocation rules. We'll need to be careful
>> not to get a "both services now migrate towards each other" switch-up
>> scenario of course.
>
> What scenario would that be? Or do you mean just disallowing migrating
> services contradicting the HA (colocation) rules?
I just meant we need to be careful when implementing if we want to apply
new rules directly/honor them while services are running. E.g. say a new
rule vm:101<->vm:102 is introduced, with 101 on node A and 102 on node
B. Then the HA manager should only issue a migration command 101 to B or
102 to A, but not both of course.
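One way to avoid that, sketched with made-up names (not part of this
series), is to derive a single canonical target node for the rule and
only issue migrations for the other members:

    # sketch: pick one deterministic target node for a strict 'together' rule
    sub together_target_node {
        my ($service_nodes) = @_; # e.g. { 'vm:101' => 'A', 'vm:102' => 'B' }

        my %count;
        $count{$_}++ for values %$service_nodes;

        # prefer the node already hosting the most members, break ties by name
        my ($target) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

        return $target;
    }

In the vm:101/vm:102 example both counts are 1, so the tie-break
deterministically picks node A and only vm:102 would get a migration
command.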
>> We also don't hinder migrating against group priorities, where, with
>> nofailback=0, it will migrate straight back again. This can be improved
>> of course, but nothing new, so I'd consider it orthogonal to the
>> colocation implementation here.
>
> Yes, it would improve UX to add migration blockers for these in the
> future as the info could be exposed there without putting too much
> dependency between pve-manager and pve-ha-manager.
>
> I'll try to add the blockers for colocation rules for v1 or a follow-up.
Might be better as a follow-up/separate series, to not blow up the
series here too much.
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-04-28 14:33 ` Fiona Ebner
@ 2025-04-29 9:39 ` Daniel Kral
0 siblings, 0 replies; 67+ messages in thread
From: Daniel Kral @ 2025-04-29 9:39 UTC (permalink / raw)
To: Proxmox VE development discussion
On 4/28/25 16:33, Fiona Ebner wrote:
> Am 28.04.25 um 14:26 schrieb Fiona Ebner:
>> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>>> +
>>> + delete $allowed_nodes->{$node};
>>> + }
>>> + } elsif (scalar keys %$possible_nodes) {
>>> + # limit to the possible nodes the service should be on, if there are any.
>>> + for my $node (keys %$allowed_nodes) {
>>> + next if exists($possible_nodes->{$node});
>>> +
>>> + delete $allowed_nodes->{$node};
>>
>> This seems wrong. Non-strict rules should not limit the allowed nodes.
>> See below for more on this.
>
> Ah, if there are no possible nodes at all, then the allowed nodes are
> not modified at all. This is what makes the loose tests work. This
> "secret" here really needs to be properly documented ;)
Yes definitely, I'm working on making the logic clearer here.
I think another improvement we could make here is that, along with
"merging" get_colocated_services(...) and get_colocation_preference(...),
the result could already be structured into a strict and a non-strict
part, so the apply_*_colocation_rules(...) helpers don't have to handle
that anymore and can just focus on applying them in the end.
If we go for that route, we could reduce
apply_positive_colocation_rules(...) to something like this:
sub apply_positive_colocation_rules {
    my ($together, $allowed_nodes) = @_;

    my $possible_nodes = { $together->{strict}->%* };

    # Consider loose nodes if there are no strict nodes
    $possible_nodes = PVE::HA::Tools::set_intersect($allowed_nodes, $together->{loose})
        if !%$possible_nodes;

    # If there are no strict nodes or the loose nodes would result in an
    # empty $allowed_nodes, apply nothing
    return if !%$possible_nodes;

    for my $node (keys %$allowed_nodes) {
        next if exists($possible_nodes->{$node});

        delete $allowed_nodes->{$node};
    }
}
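Assuming a set_intersect() helper with the obvious semantics, the
restructured helper would then behave like this (illustrative only):

    my $allowed_nodes = { node1 => 1, node2 => 1, node3 => 1 };
    my $together = {
        strict => {},             # no strict nodes ...
        loose  => { node2 => 1 }, # ... but a loose preference
    };

    apply_positive_colocation_rules($together, $allowed_nodes);
    # $allowed_nodes is now { node2 => 1 }; with an empty 'loose' set as
    # well, $allowed_nodes would have stayed untouched (the early return)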
>
> It still would be nice to think about which kind of interaction with
> scoring we want exactly. Currently it's the number 1 I mentioned, i.e.
> "prefer loose colocation over scoring no matter what". Can be fine to
> start out too, just means we'd need to introduce an option/tunable if we
> ever want to change it.
I'll see if I find a good indicator that is understandable for users and
results in something deterministic from our side to be testable.
But AFAICS the current config interface would need an extra tunable
anyway to express some kind of factor which controls how much the loose
colocation rule influences the scoring, so I think that would fit better
in a follow-up series - especially if users request it :).
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-04-28 12:26 ` Fiona Ebner
2025-04-28 14:33 ` Fiona Ebner
@ 2025-04-29 9:50 ` Daniel Kral
1 sibling, 0 replies; 67+ messages in thread
From: Daniel Kral @ 2025-04-29 9:50 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion
On 4/28/25 14:26, Fiona Ebner wrote:
> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>> Add a mechanism to the node selection subroutine, which enforces the
>> colocation rules defined in the rules config.
>>
>> The algorithm manipulates the set of nodes directly, which the service
>> is allowed to run on, depending on the type and strictness of the
>> colocation rules, if there are any.
>>
>> This makes it depend on the prior removal of any nodes, which are
>> unavailable (i.e. offline, unreachable, or weren't able to start the
>> service in previous tries) or are not allowed to be run on otherwise
>> (i.e. HA group node restrictions) to function correctly.
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> src/PVE/HA/Manager.pm | 203 ++++++++++++++++++++++++++++++++++++-
>> src/test/test_failover1.pl | 4 +-
>> 2 files changed, 205 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
>> index 8f2ab3d..79b6555 100644
>> --- a/src/PVE/HA/Manager.pm
>> +++ b/src/PVE/HA/Manager.pm
>> @@ -157,8 +157,201 @@ sub get_node_priority_groups {
>> return ($pri_groups, $group_members);
>> }
>>
>
> I feel like these helper functions should rather go into the colocation
> plugin or some other module to not bloat up Manager.pm more.
I agree, I'll move them either to Colocation.pm or some other module.
>
>> +=head3 get_colocated_services($rules, $sid, $online_node_usage)
>> +
>> +Returns a hash map of all services, which are specified as being in a positive
>> +or negative colocation in C<$rules> with the given service with id C<$sid>.
>> +
>> +Each service entry consists of the type of colocation, strictness of colocation
>> +and the node the service is currently assigned to, if any, according to
>> +C<$online_node_usage>.
>> +
>> +For example, a service C<'vm:101'> being strictly colocated together (positive)
>> +with two other services C<'vm:102'> and C<'vm:103'> and loosely colocated
>> +separate with another service C<'vm:104'> results in the hash map:
>> +
>> + {
>> + 'vm:102' => {
>> + affinity => 'together',
>> + strict => 1,
>> + node => 'node2'
>> + },
>> + 'vm:103' => {
>> + affinity => 'together',
>> + strict => 1,
>> + node => 'node2'
>> + },
>> + 'vm:104' => {
>> + affinity => 'separate',
>> + strict => 0,
>> + node => undef
>
> Why is the node undef here?
It's set to undef as vm:104 is not placed on a node yet. This could
happen if two nodes fail at exactly the same time and e.g. this helper
is called for selecting a service node for 'vm:101'.
I'll change the example to a more intuitive case, e.g. when a single
node failed and one of the positively colocated services has already
been assigned a node, but the current one and another one haven't yet
(i.e. the node is undef), and document here why.
But I'll go with the comment below and integrate this in the helper
below as it is the only user of this subroutine.
>
>> + }
>> + }
>> +
>> +=cut
>> +
>> +sub get_colocated_services {
>> + my ($rules, $sid, $online_node_usage) = @_;
>> +
>> + my $services = {};
>> +
>> + PVE::HA::Rules::Colocation::foreach_colocation_rule($rules, sub {
>> + my ($rule) = @_;
>> +
>> + for my $csid (sort keys %{$rule->{services}}) {
>> + next if $csid eq $sid;
>> +
>> + $services->{$csid} = {
>> + node => $online_node_usage->get_service_node($csid),
>> + affinity => $rule->{affinity},
>> + strict => $rule->{strict},
>> + };
>> + }
>> + }, {
>> + sid => $sid,
>> + });
>> +
>> + return $services;
>> +}
>> +
>> +=head3 get_colocation_preference($rules, $sid, $online_node_usage)
>> +
>> +Returns a list of two hashes, where each is a hash map of the colocation
>> +preference of C<$sid>, according to the colocation rules in C<$rules> and the
>> +service locations in C<$online_node_usage>.
>> +
>> +The first hash is the positive colocation preference, where each element
>> +represents properties for how much C<$sid> prefers to be on the node.
>> +Currently, this is a binary C<$strict> field, which means either it should be
>
> s/it/the service/
ACK
>
>> +there (C<0>) or must be there (C<1>).
>> +
>> +The second hash is the negative colocation preference, where each element
>> +represents properties for how much C<$sid> prefers not to be on the node.
>> +Currently, this is a binary C<$strict> field, which means either it should not
>
> s/it/the service/
ACK
>
>> +be there (C<0>) or must not be there (C<1>).
>> +
>> +=cut
>> +
>> +sub get_colocation_preference {
>> + my ($rules, $sid, $online_node_usage) = @_;
>> +
>> + my $services = get_colocated_services($rules, $sid, $online_node_usage);
>
> The name $services is a bit too generic, maybe $colocation_per_service
> or something?
>
> Maybe it would be better to just merge this one and the helper above
> into a single one? I.e. just handle the info while iterating the rules
> directly instead of creating a novel temporary per-service
> data-structure and iterate twice.
ACK, I'll do that.
I thought get_colocated_services(...) would get more users in the end,
but it seems like it won't, and if it does I can easily re-introduce it.
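Roughly, the merged helper could then look something like this (just a
sketch combining the two subroutines above, so names and details are
still up for change):

    sub get_colocation_preference {
        my ($rules, $sid, $online_node_usage) = @_;

        my $together = {};
        my $separate = {};

        PVE::HA::Rules::Colocation::foreach_colocation_rule($rules, sub {
            my ($rule) = @_;

            for my $csid (sort keys %{$rule->{services}}) {
                next if $csid eq $sid;

                # look up the node directly instead of building the
                # intermediate per-service hash map first
                my $node = $online_node_usage->get_service_node($csid);
                next if !$node;

                my $node_set = $rule->{affinity} eq 'together' ? $together : $separate;
                $node_set->{$node}->{strict} = $node_set->{$node}->{strict} || $rule->{strict};
            }
        }, {
            sid => $sid,
        });

        return ($together, $separate);
    }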
>
>> +
>> + my $together = {};
>> + my $separate = {};
>> +
>> + for my $service (values %$services) {
>> + my $node = $service->{node};
>> +
>> + next if !$node;
>> +
>> + my $node_set = $service->{affinity} eq 'together' ? $together : $separate;
>> + $node_set->{$node}->{strict} = $node_set->{$node}->{strict} || $service->{strict};
>> + }
>> +
>> + return ($together, $separate);
>> +}
>> +
>> +=head3 apply_positive_colocation_rules($together, $allowed_nodes)
>> +
>> +Applies the positive colocation preference C<$together> on the allowed node
>> +hash set C<$allowed_nodes> directly.
>> +
>> +Positive colocation means keeping services together on a single node, and
>> +therefore minimizing the separation of services.
>> +
>> +The allowed node hash set C<$allowed_nodes> is expected to contain any node,
>> +which is available to the service, i.e. each node is currently online, is
>> +available according to other location constraints, and the service has not
>> +failed running there yet.
>> +
>> +=cut
>> +
>> +sub apply_positive_colocation_rules {
>> + my ($together, $allowed_nodes) = @_;
>> +
>> + return if scalar(keys %$together) < 1;
>> +
>> + my $mandatory_nodes = {};
>> + my $possible_nodes = PVE::HA::Tools::intersect($allowed_nodes, $together);
>> +
>> + for my $node (sort keys %$together) {
>> + $mandatory_nodes->{$node} = 1 if $together->{$node}->{strict};
>> + }
>> +
>> + if (scalar keys %$mandatory_nodes) {
>> + # limit to only the nodes the service must be on.
>> + for my $node (keys %$allowed_nodes) {
>> + next if exists($mandatory_nodes->{$node});
>
> Style nit: I'd avoid using exists() if you explicitly expect a set
> value. Otherwise, it can break because of accidental auto-vivification
> in the future.
ACK, I already ran into this when restructuring this helper ;).
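As a small standalone illustration of why testing the value is more
robust than exists() when a hash is used as a set of explicitly true
values (the variable contents here are made up for the example):

    my $together = { node2 => {} };   # node2 is involved, but not strictly
    my $mandatory_nodes = {};
    $mandatory_nodes->{$_} = $together->{$_}->{strict} for keys %$together;

    print "mandatory\n" if exists($mandatory_nodes->{node2});   # true, although never marked strict
    print "mandatory\n" if $mandatory_nodes->{node2};           # false, as intended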
>
>> +
>> + delete $allowed_nodes->{$node};
>> + }
>> + } elsif (scalar keys %$possible_nodes) {
>> + # limit to the possible nodes the service should be on, if there are any.
>> + for my $node (keys %$allowed_nodes) {
>> + next if exists($possible_nodes->{$node});
>> +
>> + delete $allowed_nodes->{$node};
>
> This seems wrong. Non-strict rules should not limit the allowed nodes.
> See below for more on this.
Already addressed with the other thread :)
>
>> + }
>> + }
>> +}
>> +
>> +=head3 apply_negative_colocation_rules($separate, $allowed_nodes)
>> +
>> +Applies the negative colocation preference C<$separate> on the allowed node
>> +hash set C<$allowed_nodes> directly.
>> +
>> +Negative colocation means keeping services separate on multiple nodes, and
>> +therefore maximizing the separation of services.
>> +
>> +The allowed node hash set C<$allowed_nodes> is expected to contain any node,
>> +which is available to the service, i.e. each node is currently online, is
>> +available according to other location constraints, and the service has not
>> +failed running there yet.
>> +
>> +=cut
>> +
>> +sub apply_negative_colocation_rules {
>> + my ($separate, $allowed_nodes) = @_;
>> +
>> + return if scalar(keys %$separate) < 1;
>> +
>> + my $mandatory_nodes = {};
>> + my $possible_nodes = PVE::HA::Tools::set_difference($allowed_nodes, $separate);
>> +
>> + for my $node (sort keys %$separate) {
>> + $mandatory_nodes->{$node} = 1 if $separate->{$node}->{strict};
>> + }
>> +
>> + if (scalar keys %$mandatory_nodes) {
>> + # limit to the nodes the service must not be on.
>> + for my $node (keys %$allowed_nodes) {
>> + next if !exists($mandatory_nodes->{$node});
>> +
>> + delete $allowed_nodes->{$node};
>> + }
>> + } elsif (scalar keys %$possible_nodes) {
>> + # limit to the nodes the service should not be on, if any.
>> + for my $node (keys %$allowed_nodes) {
>> + next if exists($possible_nodes->{$node});
>> +
>> + delete $allowed_nodes->{$node};
>> + }
>> + }
>> +}
>> +
>> +sub apply_colocation_rules {
>> + my ($rules, $sid, $allowed_nodes, $online_node_usage) = @_;
>> +
>> + my ($together, $separate) = get_colocation_preference($rules, $sid, $online_node_usage);
>> +
>> + apply_positive_colocation_rules($together, $allowed_nodes);
>> + apply_negative_colocation_rules($separate, $allowed_nodes);
>
> I think there could be a problematic scenario with
> * no strict positive rules, but loose positive rules
> * strict negative rules
> where apply_positive_colocation_rules() will limit $allowed_nodes in
> such a way that the strict negative rules cannot be satisfied anymore
> afterwards.
>
> I feel like what we actually want from non-strict rules is not to limit
> the allowed nodes at all, but only express preferences. After scoring,
> we could:
> 1. always take a colocation preference node if present no matter what
> the usage score is
> 2. have a threshold to not follow through, if there is a non-colocation
> preference node with a much better usage score relatively
> 3. somehow massage it into the score itself. E.g. every node that would
> be preferred by colocation gets a 0.5 multiplier score adjustment while
> other scores are unchanged - remember that lower score is better.
> 4. [insert your suggestion here]
>
> So to me it seems like there should be a helper that gives us:
> 1. list of nodes that satisfy strict rules - these we can then intersect
> with the $pri_nodes
> 2. list of nodes that are preferred by non-strict rules - these we can
> consider after scoring
As mentioned in another reply, I think this should be addressed in a
separate follow-up, which hopefully gets some user feedback on what
their requirements are, especially since we want users to be able to
configure the threshold or score for how much the rule is preferred.
I'd also like this to be somewhat comparable to the HA group node
priorities, so that users can have a mental model of what these scores
mean.
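Just to illustrate option 3 from above (nothing I have implemented yet),
folding a loose colocation preference into the existing scoring in
select_service_node(...) could look roughly like this, where the
derivation of $preferred_nodes from the non-strict rules is only an
assumption for the sketch:

    my ($together, $separate) = get_colocation_preference($rules, $sid, $online_node_usage);

    # nodes that are preferred by loose (non-strict) positive rules
    my $preferred_nodes = {
        map { $_ => 1 } grep { !$together->{$_}->{strict} } keys %$together
    };

    my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);

    # lower score is better, so give colocation-preferred nodes a 0.5 multiplier
    for my $node (keys %$scores) {
        $scores->{$node} *= 0.5 if $preferred_nodes->{$node};
    }

    my @nodes = sort {
        $scores->{$a} <=> $scores->{$b} || $a cmp $b
    } keys %$scores;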
>
>> +}
>> +
>> sub select_service_node {
>> - my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
>> + # TODO Cleanup this signature post-RFC
>> + my ($rules, $groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
>>
>> my $group = get_service_group($groups, $online_node_usage, $service_conf);
>>
>> @@ -189,6 +382,8 @@ sub select_service_node {
>>
>> return $current_node if (!$try_next && !$best_scored) && $pri_nodes->{$current_node};
>>
>> + apply_colocation_rules($rules, $sid, $pri_nodes, $online_node_usage);
>> +
>> my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
>> my @nodes = sort {
>> $scores->{$a} <=> $scores->{$b} || $a cmp $b
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes Daniel Kral
2025-04-03 12:17 ` Fabian Grünbichler
2025-04-28 12:26 ` Fiona Ebner
@ 2025-04-30 11:09 ` Daniel Kral
2025-05-02 9:33 ` Fiona Ebner
2 siblings, 1 reply; 67+ messages in thread
From: Daniel Kral @ 2025-04-30 11:09 UTC (permalink / raw)
To: pve-devel
On 3/25/25 16:12, Daniel Kral wrote:
> sub select_service_node {
> - my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
> + # TODO Cleanup this signature post-RFC
> + my ($rules, $groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback, $best_scored) = @_;
I'm currently trying to clean up the helper's signature here, but doing
something like
    sub select_service_node {
        my ($service_info, $affinity_info, $try_next, $best_scored) = @_;

        my ($sid, $service_conf, $current_node) =
            $service_info->@{qw(sid config current_node)};
        my ($rules, $groups, $online_node_usage, $tried_nodes, $maintenance_fallback) =
            $affinity_info->@{qw(rules groups online_node_usage failed_nodes maintenance_node)};
would require us to create helper structures at all four call sites (one
of them is just the test case ./test_failover1.pl), or to introduce
another helper that creates them just to pass them here and immediately
destructure them in select_service_node(...):
    sub get_service_affinity_info {
        my ($self, $sid, $cd, $sd) = @_;

        my $service_info = {
            sid => $sid,
            config => $cd,
            current_node => $sd->{node},
        };

        my $affinity_info = {
            rules => $self->{rules},
            groups => $self->{groups},
            failed_nodes => $sd->{failed_nodes},
            maintenance_node => $sd->{maintenance_node},
            online_node_usage => $self->{online_node_usage},
        };

        return ($service_info, $affinity_info);
    };
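For context, a call site would then look roughly like this (again just a
sketch under the assumption above):

    # e.g. in next_state_started():
    my ($service_info, $affinity_info) = $self->get_service_affinity_info($sid, $cd, $sd);
    my $node = select_service_node($service_info, $affinity_info, $try_next, $best_scored);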
Also, the call site in next_state_recovery(...) does not pass
$sd->{failed_nodes}, $sd->{maintenance_node} and $best_scored to it.
AFAICS $sd->{failed_nodes} should be undef in next_state_recovery(...)
anyway, but I feel like I might have missed some states it could be in
there. And $sd->{maintenance_node} could be set at any time.
If there's nothing speaking against that, I'd prefer to elevate
select_service_node(...) to be a method, as it needs quite a lot of
state anyway, especially since we will need global information about
services other than just the current one in the future.
So, I'd do something like
    sub select_service_node {
        my ($self, $sid, $service_conf, $sd, $mode) = @_;

        my ($rules, $groups, $online_node_usage) =
            $self->@{qw(rules groups online_node_usage)};
        my ($current_node, $tried_nodes, $maintenance_fallback) =
            $self->@{qw(node failed_nodes maintenance_node)};
here. It's not as fancy in the sense that there's no well-defined
interface where one can immediately see what this helper needs (since it
has access to the whole $self), and it doesn't have the guarantees of a
standalone helper (that it won't touch $self), but I think it could be
better than creating helper structures which only pass a message that is
immediately destructured anyway. We could also just pass $self slightly
differently, but I don't see much difference there.
The $mode could then be an enumeration of e.g. whether $try_next (e.g.
'try_again') or $best_scored (e.g. 'rebalance') is used (and it can be
extended of course). Those are mutually exclusive in the three call
sites right now. If next_state_recovery(...) really does have states
where $tried_nodes is set (and $maintenance_node too), then we can also
introduce a 'recovery' mode, which will ignore them.
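Inside select_service_node(...), that mode would then just be translated
back into the existing boolean semantics, along these lines (mode names
as proposed above; 'none' is only a placeholder for the default case):

    die "unknown mode '$mode'\n" if !grep { $_ eq $mode } qw(none try_again rebalance);

    my $try_next = $mode eq 'try_again';
    my $best_scored = $mode eq 'rebalance';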
The names for $service_conf and $sd can also be improved, but I wanted
to introduce minimal changes to select_service_node(...) and stick with
the $sd name for the service data, as in other places of Manager.pm.
That's still just a work in progress and I'd very much appreciate some
feedback on whether either of the two options above is viable here. If
it helps, I'd send the result as a separate series in advance, which the
HA colocation series will then be based on, so we don't lose focus in
the HA colocation patch series.
CC'd @Fiona and @Fabian here, if you have any thoughts here :).
* Re: [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes
2025-04-30 11:09 ` Daniel Kral
@ 2025-05-02 9:33 ` Fiona Ebner
0 siblings, 0 replies; 67+ messages in thread
From: Fiona Ebner @ 2025-05-02 9:33 UTC (permalink / raw)
To: Proxmox VE development discussion
Am 30.04.25 um 13:09 schrieb Daniel Kral:
> On 3/25/25 16:12, Daniel Kral wrote:
>> sub select_service_node {
>> - my ($groups, $online_node_usage, $sid, $service_conf,
>> $current_node, $try_next, $tried_nodes, $maintenance_fallback,
>> $best_scored) = @_;
>> + # TODO Cleanup this signature post-RFC
>> + my ($rules, $groups, $online_node_usage, $sid, $service_conf,
>> $current_node, $try_next, $tried_nodes, $maintenance_fallback,
>> $best_scored) = @_;
>
> I'm currently trying to clean up the helper's signature here, but doing
> something like
>
> sub select_service_node {
> my ($service_info, $affinity_info, $try_next, $best_scored) = @_;
>
> my ($sid, $service_conf, $current_node) = $service_info->@{qw(sid
> config current_node)};
> my ($rules, $groups, $online_node_usage, $tried_nodes,
> $maintenance_fallback) =
> $affinity_info->@{qw(rules groups online_node_usage failed_nodes
> maintenance_node)};
>
> would require us to create helper structures on all four call sites (one
> of them is just the test case ./test_failover1.pl), or introduce another
> helper to just create them for passing it here and immediately de-
> structuring it in select_service_node(...):
>
> sub get_service_affinity_info {
> my ($self, $sid, $cd, $sd) = @_;
>
> my $service_info = {
> sid => $sid,
> config => $cd,
> current_node => $sd->{node},
> };
>
> my $affinity_info = {
> rules => $self->{rules},
> groups => $self->{groups},
> failed_nodes => $sd->{failed_nodes},
> maintenance_node => $sd->{maintenance_node},
> online_node_usage => $self->{online_node_usage},
> };
>
> return ($service_info, $affinity_info);
> };
>
> Also the call site in next_state_recovery(...) does not pass
> $sd->{failed_nodes}, $sd->{maintenance_node} and $best_scored to it.
> AFAICS $sd->{failed_nodes} should be undef in next_state_recovery(...)
> anyway, but I feel like I have missed some states it could be in there.
> And $sd->{maintenance_node} could be set anytime.
I think it makes sense to have it explicitly (rather than just
implicitly) opt out of $try_next, just like the caller for
rebalance_on_request_start. Without $try_next, the $tried_nodes
parameter does not have any effect (the caller for
rebalance_on_request_start passes it, but select_service_node() won't
read it if $try_next isn't set).
The caller in next_state_recovery() should also pass $best_scored IMHO,
so that it is fully aligned with the caller for
rebalance_on_request_start. It won't be an actual change result-wise,
because for recovery, the current node is not available, so it already
cannot be the result, but it makes sense semantically. We want the best
scored node for recovery. And having the two callers look the same is a
simplification too.
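As a rough illustration (not the actual code in the series), an aligned
caller in next_state_recovery() could then look something like this with
the current positional signature from the patch:

    my $recovery_node = select_service_node(
        $self->{rules},
        $self->{groups},
        $self->{online_node_usage},
        $sid,
        $cd,
        $sd->{node},
        0,      # $try_next: explicit opt-out, as for rebalance_on_request_start
        undef,  # $tried_nodes: has no effect without $try_next
        undef,  # $maintenance_fallback: not passed by the current caller
        1,      # $best_scored: we want the best-scored node for recovery
    );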
> If there's nothing speaking against that, I'd prefer to elevate
> select_service_node(...) to be a method as it needs quite a lot of state
> anyway, especially as we will need global information about other
> services than just the current one in the future anyway.
I don't have strong feelings about this, both approaches seem fine to me.
> So, I'd do something like
>
> sub select_service_node {
> my ($self, $sid, $service_conf, $sd, $mode) = @_;
>
> my ($rules, $groups, $online_node_usage) = $self->@{qw(rules groups
> online_node_usage)};
If we don't want to make it a method, we could still pass these ones
separately. After implementing location rules, $groups would be dropped
anyways.
> my ($current_node, $tried_nodes, $maintenance_fallback) =
>     $self->@{qw(node failed_nodes maintenance_node)};
It's $sd->... here.
> here. It's not fancy as in there's a well-defined interface one can
> immediately see what this helper needs (as it has access to the whole
> $self) and doesn't have the guarantees of a standalone helper (won't
> touch $self), but I think it could be better than creating helper
> structures which are only pass a message, which is immediately
> destructured anyway. We could also just pass $self slightly differently,
> but I don't see much difference there.
>
> The $mode could then be a enumeration of e.g. whether $try_next (e.g.
> 'try_again') or $best_scored (e.g. 'rebalance') is used (and can be
Having a mode sounds good to me. I don't think it should be called
'rebalance', the best-scored semantics should apply to recovery too, see
above.
> extended of course). Those are mutually exclusive in the three call
> sites right now. If next_state_recovery(...) really does have states
> where $tried_nodes is set (and $maintenance_node too), then we can also
> introduce a 'recovery' state, which will ignore them.
>
> The names for $service_conf and $sd can also be improved, but I wanted
> to introduce minimal change to select_service_node(...) as well as stay
> to the $sd name for the service data as in other places of the Manager.pm.
>
> That's still just a work in progress and I'd very appreciate some
> feedback if any of the two above are viable options here. If it helps
> any, I'd send the result as a separate series in advance which the HA
> colocation will then be based on, so we don't loose focus in the HA
> colocation patch series.
>
> CC'd @Fiona and @Fabian here, if you have any thoughts here :).
Thread overview: 67+ messages
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH cluster 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 01/15] ignore output of fence config tests in tree Daniel Kral
2025-03-25 17:49 ` [pve-devel] applied: " Thomas Lamprecht
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines Daniel Kral
2025-03-25 17:53 ` Thomas Lamprecht
2025-04-03 12:16 ` Fabian Grünbichler
2025-04-11 11:24 ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 03/15] usage: add get_service_node and pin_service_node methods Daniel Kral
2025-04-24 12:29 ` Fiona Ebner
2025-04-25 7:39 ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin Daniel Kral
2025-04-24 13:03 ` Fiona Ebner
2025-04-25 8:29 ` Daniel Kral
2025-04-25 9:12 ` Fiona Ebner
2025-04-25 13:30 ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin Daniel Kral
2025-04-03 12:16 ` Fabian Grünbichler
2025-04-11 11:04 ` Daniel Kral
2025-04-25 14:06 ` Fiona Ebner
2025-04-29 8:37 ` Daniel Kral
2025-04-29 9:15 ` Fiona Ebner
2025-04-25 14:05 ` Fiona Ebner
2025-04-29 8:44 ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 06/15] config, env, hw: add rules read and parse methods Daniel Kral
2025-04-25 14:11 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 07/15] manager: read and update rules config Daniel Kral
2025-04-25 14:30 ` Fiona Ebner
2025-04-29 8:04 ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 08/15] manager: factor out prioritized nodes in select_service_node Daniel Kral
2025-04-28 13:03 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes Daniel Kral
2025-04-03 12:17 ` Fabian Grünbichler
2025-04-11 15:56 ` Daniel Kral
2025-04-28 12:46 ` Fiona Ebner
2025-04-29 9:07 ` Daniel Kral
2025-04-29 9:22 ` Fiona Ebner
2025-04-28 12:26 ` Fiona Ebner
2025-04-28 14:33 ` Fiona Ebner
2025-04-29 9:39 ` Daniel Kral
2025-04-29 9:50 ` Daniel Kral
2025-04-30 11:09 ` Daniel Kral
2025-05-02 9:33 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 10/15] sim: resources: add option to limit start and migrate tries to node Daniel Kral
2025-04-28 13:20 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 11/15] test: ha tester: add test cases for strict negative colocation rules Daniel Kral
2025-04-28 13:44 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 12/15] test: ha tester: add test cases for strict positive " Daniel Kral
2025-04-28 13:51 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose " Daniel Kral
2025-04-28 14:44 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 14/15] test: ha tester: add test cases in more complex scenarios Daniel Kral
2025-04-29 8:54 ` Fiona Ebner
2025-04-29 9:01 ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 15/15] test: add test cases for rules config Daniel Kral
2025-03-25 16:47 ` [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
2025-04-24 10:12 ` Fiona Ebner
2025-04-01 1:50 ` DERUMIER, Alexandre
2025-04-01 9:39 ` Daniel Kral
2025-04-01 11:05 ` DERUMIER, Alexandre via pve-devel
2025-04-03 12:26 ` Fabian Grünbichler
2025-04-24 10:12 ` Fiona Ebner
2025-04-24 10:12 ` Fiona Ebner
2025-04-25 8:36 ` Daniel Kral
2025-04-25 12:25 ` Fiona Ebner
2025-04-25 13:25 ` Daniel Kral
2025-04-25 13:58 ` Fiona Ebner