From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH ha-manager v2 14/26] manager: apply colocation rules when selecting service nodes
Date: Fri, 20 Jun 2025 16:31:26 +0200
Message-ID: <20250620143148.218469-19-d.kral@proxmox.com>
In-Reply-To: <20250620143148.218469-1-d.kral@proxmox.com>

Add a mechanism to the node selection subroutine that enforces the
colocation rules defined in the rules config.

The algorithm modifies the set of nodes in place, so that the final set
contains only the nodes on which the colocation rules allow the service
to run, depending on the affinity type of the colocation rules.

The service's failback property also slightly changes meaning, because
it now controls how services behave for colocation rules as well, not
only for location rules.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
    - added documentation
    - moved apply helpers from Manager.pm into Colocation rule plugin
    - dropped loose colocations, so I could shrink down the apply
      helpers to only a few lines
    - dropped `get_colocated_services(...)` from this patch (which will
      be introduced in another version in a later patch), and merged its
      logic into `get_colocation_preference(...)`
    - fix bug where positively colocated services on different nodes
      (e.g. right after creating a rule for them) could still favor
      their current node, because multiple nodes are in the $together
      hash set in that case. For now, just select for all of them the
      node which is the most populated by the other positively
      colocated services; this can be improved in a follow-up, for
      example by checking which node has the resources for all of them
    - introduce `is_allowed_on_node(...)` helper to check in 'none' mode
      whether the current node is compliant with the colocation rules
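
For reviewers, the pruning behavior described in the last two points can
be sketched outside of the HA manager. This is a minimal illustration on
plain hash references, not the code from this patch: the names
may_stay_on_node, keep_majority_node and drop_separate_nodes are
hypothetical stand-ins for is_allowed_on_node and the two apply helpers,
and the explicit tie-break by node name is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the helpers in this patch, on plain hash refs.

# Mirrors is_allowed_on_node(): the node hosts a positively colocated
# service, or at least hosts no negatively colocated one.
sub may_stay_on_node {
    my ($together, $separate, $node) = @_;
    return $together->{$node} || !$separate->{$node};
}

# Keep only the node that hosts the most positively colocated services.
sub keep_majority_node {
    my ($together, $allowed_nodes) = @_;
    return if !%$together; # no positive colocation preference

    # highest count first; equal counts fall back to the node name (assumption)
    my ($majority_node) =
        sort { $together->{$b} <=> $together->{$a} || $a cmp $b } keys %$together;

    delete $allowed_nodes->{$_} for grep { $_ ne $majority_node } keys %$allowed_nodes;
    return;
}

# Drop every node that hosts a negatively colocated service.
sub drop_separate_nodes {
    my ($separate, $allowed_nodes) = @_;
    delete $allowed_nodes->{$_} for keys %$separate;
    return;
}

# Positively colocated services are split across node1 (1) and node2 (2).
my $pos_allowed = { map { $_ => 1 } qw(node1 node2 node3) };
keep_majority_node({ node1 => 1, node2 => 2 }, $pos_allowed);
print join(',', sort keys %$pos_allowed), "\n"; # node2

# Negatively colocated services run on node1 and node3.
my $neg_allowed = { map { $_ => 1 } qw(node1 node2 node3) };
drop_separate_nodes({ node1 => 1, node3 => 1 }, $neg_allowed);
print join(',', sort keys %$neg_allowed), "\n"; # node2

# The current node hosts a negatively colocated service, so the service moves.
print may_stay_on_node({}, { node1 => 1 }, 'node1') ? "stay\n" : "move\n"; # move
```

In the first case the split services all converge on node2, which is the
situation the bug fix above addresses.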

 src/PVE/HA/Manager.pm          |  15 +++-
 src/PVE/HA/Resources.pm        |   3 +-
 src/PVE/HA/Rules/Colocation.pm | 151 +++++++++++++++++++++++++++++++++
 3 files changed, 166 insertions(+), 3 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 4c7228e..a69898b 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -11,7 +11,8 @@ use PVE::HA::Tools ':exit_codes';
 use PVE::HA::NodeStatus;
 use PVE::HA::Rules;
 use PVE::HA::Rules::Location qw(get_location_preference);
-use PVE::HA::Rules::Colocation;
+use PVE::HA::Rules::Colocation
+    qw(get_colocation_preference apply_positive_colocation_rules apply_negative_colocation_rules);
 use PVE::HA::Usage::Basic;
 use PVE::HA::Usage::Static;
 
@@ -155,8 +156,15 @@ sub select_service_node {
 
     return undef if !%$pri_nodes;
 
+    my ($together, $separate) = get_colocation_preference($rules, $sid, $online_node_usage);
+
     # stay on current node if possible (avoids random migrations)
-    if ($mode eq 'none' && !$service_conf->{failback} && $allowed_nodes->{$current_node}) {
+    if (
+        $mode eq 'none'
+        && !$service_conf->{failback}
+        && $allowed_nodes->{$current_node}
+        && PVE::HA::Rules::Colocation::is_allowed_on_node($together, $separate, $current_node)
+    ) {
         return $current_node;
     }
 
@@ -167,6 +175,9 @@ sub select_service_node {
         }
     }
 
+    apply_positive_colocation_rules($together, $pri_nodes);
+    apply_negative_colocation_rules($separate, $pri_nodes);
+
     return $maintenance_fallback
         if defined($maintenance_fallback) && $pri_nodes->{$maintenance_fallback};
 
diff --git a/src/PVE/HA/Resources.pm b/src/PVE/HA/Resources.pm
index 90410a9..f8aad35 100644
--- a/src/PVE/HA/Resources.pm
+++ b/src/PVE/HA/Resources.pm
@@ -65,7 +65,8 @@ EODESC
         failback => {
             description => "Automatically migrate service to the node with the highest priority"
                 . " according to their location rules, if a node with a higher priority than the"
-                . " current node comes online.",
+                . " current node comes online, or to a node that does not violate any colocation"
+                . " rule.",
             type => 'boolean',
             optional => 1,
             default => 1,
diff --git a/src/PVE/HA/Rules/Colocation.pm b/src/PVE/HA/Rules/Colocation.pm
index 0539eb3..190478e 100644
--- a/src/PVE/HA/Rules/Colocation.pm
+++ b/src/PVE/HA/Rules/Colocation.pm
@@ -7,8 +7,15 @@ use PVE::HashTools;
 
 use PVE::HA::Rules;
 
+use base qw(Exporter);
 use base qw(PVE::HA::Rules);
 
+our @EXPORT_OK = qw(
+    get_colocation_preference
+    apply_positive_colocation_rules
+    apply_negative_colocation_rules
+);
+
 =head1 NAME
 
 PVE::HA::Rules::Colocation - Colocation Plugin for HA Rules
@@ -284,4 +291,148 @@ sub plugin_canonicalize {
     merge_connected_positive_colocation_rules($rules, $args->{positive_rules});
 }
 
+=head1 COLOCATION RULE HELPERS
+
+=cut
+
+=head3 get_colocation_preference($rules, $sid, $online_node_usage)
+
+Returns a list of two hash references, where the first describes the positive
+colocation preference and the second describes the negative colocation
+preference for C<$sid> according to the colocation rules in C<$rules> and the
+service locations in C<$online_node_usage>.
+
+For the positive colocation preference of a service C<$sid>, each element in the
+hash represents an online node, where other positively colocated services are
+already running, and how many of them. That is, each element represents a node,
+where the service must be.
+
+For the negative colocation preference of a service C<$sid>, each element in the
+hash represents an online node, where other negatively colocated services are
+already running. That is, each element represents a node, where the service must
+not be.
+
+For example, if three services that C<$sid> is positively colocated with are
+running on C<node2>, and two services that C<$sid> is negatively colocated with
+are running on C<node1> and C<node3>, the returned list of C<$together> and
+C<$separate> will be:
+
+    (
+        {
+            node2 => 3
+        },
+        {
+            node1 => 1,
+            node3 => 1
+        }
+    )
+
+=cut
+
+sub get_colocation_preference : prototype($$$) {
+    my ($rules, $sid, $online_node_usage) = @_;
+
+    my $together = {};
+    my $separate = {};
+
+    PVE::HA::Rules::foreach_rule(
+        $rules,
+        sub {
+            my ($rule) = @_;
+
+            for my $csid (keys %{ $rule->{services} }) {
+                next if $csid eq $sid;
+
+                my $nodes = $online_node_usage->get_service_nodes($csid);
+
+                next if !$nodes || !@$nodes; # skip services not assigned to any node
+
+                if ($rule->{affinity} eq 'together') {
+                    $together->{$_}++ for @$nodes;
+                } elsif ($rule->{affinity} eq 'separate') {
+                    $separate->{$_} = 1 for @$nodes;
+                } else {
+                    die "unimplemented colocation affinity type $rule->{affinity}\n";
+                }
+            }
+        },
+        {
+            sid => $sid,
+            type => 'colocation',
+            state => 'enabled',
+        },
+    );
+
+    return ($together, $separate);
+}
+
+=head3 is_allowed_on_node($together, $separate, $node)
+
+Checks whether the service may run on C<$node> according to the colocation
+preference hashes: either C<$together> lists C<$node> as a node to select, or
+C<$separate> does not list C<$node> as a node to avoid.
+
+=cut
+
+sub is_allowed_on_node : prototype($$$) {
+    my ($together, $separate, $node) = @_;
+
+    return $together->{$node} || !$separate->{$node};
+}
+
+=head3 apply_positive_colocation_rules($together, $allowed_nodes)
+
+Applies the positive colocation preference C<$together> on the allowed node
+hash set C<$allowed_nodes> by modifying it directly.
+
+Positive colocation means keeping services together on a single node and
+therefore minimizing the separation of services.
+
+The allowed node hash set C<$allowed_nodes> is expected to contain all nodes,
+which are available to the service this helper is called for, i.e. each node
+is currently online, available according to other location constraints, and the
+service has not failed running there yet.
+
+=cut
+
+sub apply_positive_colocation_rules : prototype($$) {
+    my ($together, $allowed_nodes) = @_;
+
+    my @possible_nodes = sort keys $together->%*
+        or return; # nothing to do if there are no positive colocation preferences
+
+    # select the most populated node as the target node for positive colocations
+    @possible_nodes = sort { $together->{$b} <=> $together->{$a} } @possible_nodes;
+    my $majority_node = $possible_nodes[0];
+
+    for my $node (keys %$allowed_nodes) {
+        delete $allowed_nodes->{$node} if $node ne $majority_node;
+    }
+}
+
+=head3 apply_negative_colocation_rules($separate, $allowed_nodes)
+
+Applies the negative colocation preference C<$separate> on the allowed node
+hash set C<$allowed_nodes> by modifying it directly.
+
+Negative colocation means keeping services separate on multiple nodes and
+therefore maximizing the separation of services.
+
+The allowed node hash set C<$allowed_nodes> is expected to contain all nodes,
+which are available to the service this helper is called for, i.e. each node
+is currently online, available according to other location constraints, and the
+service has not failed running there yet.
+
+=cut
+
+sub apply_negative_colocation_rules : prototype($$) {
+    my ($separate, $allowed_nodes) = @_;
+
+    my $forbidden_nodes = { $separate->%* };
+
+    for my $node (keys %$forbidden_nodes) {
+        delete $allowed_nodes->{$node};
+    }
+}
+
 1;
-- 
2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

