all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Daniel Kral <d.kral@proxmox.com>
To: Fiona Ebner <f.ebner@proxmox.com>,
	Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
Date: Fri, 25 Apr 2025 10:36:49 +0200	[thread overview]
Message-ID: <c93a04b3-b177-4bfc-8a8a-8e10fd635b15@proxmox.com> (raw)
In-Reply-To: <cab3e44f-1294-429d-8e06-b6743c3cb3a7@proxmox.com>

Thanks for the review, Fiona!

I have some comments left, one of them is about the last comment about 
how to migrate HA groups to location rules to give a better illustration 
why I'd like to allow multiple location rules in the end, hope we're 
able to do this.

On 4/24/25 12:12, Fiona Ebner wrote:
> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>> | Canonicalization
>> ----------
>>
>> Additionally, colocation rules are currently simplified as follows:
>>
>> - If there are multiple positive colocation rules with common services
>>    and the same strictness, these are merged to a single positive
>>    colocation rule.
> 
> Do you intend to do that when writing the configuration file? I think
> rules are better left unmerged from a user perspective. For example:
> 
> - services 1, 2 and 3 should strictly stay together, because of reason A
> - services 1 and 3 should strictly stay together, because of different
> reason B
> 
> Another scenario might be that the user is currently in the process of
> editing some rules one-by-one and then it might also be surprising if
> something is auto-merged.
> 
> You can of course always dynamically merge them when doing the
> computation for the node selection.

This is what I had in mind and I should have made the description for 
that clearer here. It is only for computing the feasibility of the rules 
when (1) creating, (2) updating, and (3) applying them.

As suggested by @Lukas off-list, I'll also try to make the check 
selective, e.g. the user has made an infeasible change to the config 
manually by writing to the file and then wants to create another rule. 
Here it should ignore the infeasible rules (as they'll be dropped 
anyway) and only check if the added rule / changed rule is infeasible.

But as you said, it must not change the user's configuration in the end 
as that would be very confusing to the user.

> 
> In the same spirit, a comment field for each rule where the user can put
> the reason might be nice to have.
> 
> Another question is if we should allow enabling/disabling rules.
> 
> Comment and enabling can of course always be added later. I'm just not
> sure we should start out with the auto-merging of rules.

Good idea, I think there are definitely use cases for enabling/disabling 
the rules and it's easy to implement, will add that to v1 :).

> 
>> | Inference rules
>> ----------
>>
>> There are currently no inference rules implemented for the RFC, but
>> there could be potential to further simplify some code paths in the
>> future, e.g. a positive colocation rule where one service is part of a
>> restricted HA group makes the other services in the positive colocation
>> rule a part of this HA group as well.
> 
> If the rule is strict. If we do this I think it should only happen
> dynamically for the node selection too.

Yes, I'll take a closer look here, but I fully agree that this part 
should also be done dynamically as the steps above.

I'll see if that could improve something and wouldn't be unnecessary 
overhead that will be handled by the node selection in the end anyway. 
Roughly speaking, I'd like the select_service_node(...) to mostly 
consist of the steps (as already done now with HA groups):

apply_location_rules(...);
apply_colocation_rules(...);

$scores = score_nodes_to_start_service(...);
/* select the best node according to utilization) */

> 
> 
>> Comment about HA groups -> Location Rules
>> -----------------------------------------
>>
>> This part is not really part of the patch series, but still worth for an
>> on-list discussion.
>>
>> I'd like to suggest to also transform the existing HA groups to location
>> rules, if the rule concept turns out to be a good fit for the colocation
>> feature in the HA Manager, as HA groups seem to integrate quite easily
>> into this concept.
>>
>> This would make service-node relationships a little more flexible for
>> users and we'd be able to have both configurable / visible in the same
>> WebUI view, API endpoint, and configuration file. Also, some code paths
>> could be a little more consise, e.g. checking changes to constraints and
>> canonicalizing the rules config.
>>
>> The how should be rather straightforward for the obvious use cases:
>>
>> - Services in unrestricted HA groups -> Location rules with the nodes of
>>    the HA group; We could either split each node priority group into
>>    separate location rules (with each having their score / weight) or
>>    keep the input format of HA groups with a list of
>>    `<node>(:<priority>)` in each rule
>>
>> - Services in restricted HA groups -> Same as above, but also using
>>    either `+inf` for a mandatory location rule or `strict` property
>>    depending on how we decide on the colocation rule properties
> 
> I'd prefer having a 'strict' property, as that is orthogonal to the
> priorities and that aligns it with what you propose for the colocation
> rules.

ACK, I forgot to remove this bit as I dropped the idea of the flat 
'score' property.

> 
>> This would allow most of the use cases of HA groups to be easily
>> migratable to location rules. We could also keep the inference of the
>> 'default group' for unrestricted HA groups (any node that is available
>> is added as a group member with priority -1).
> 
> Nodes can change, so adding them explicitly will mean it can get
> outdated. This should be implicit/done dynamically.

Yes, I should have stated this more clearly: it was meant to be 
dynamically inferred instead of statically written to the config as this 
would only clutter the config with "useless" rules for any service that 
didn't have a HA group before ;).

> 
>> The only thing that I'm unsure about this, is how we would migrate the
>> `nofailback` option, since this operates on the group-level. If we keep
>> the `<node>(:<priority>)` syntax and restrict that each service can only
>> be part of one location rule, it'd be easy to have the same flag. If we
>> go with multiple location rules per service and each having a score or
>> weight (for the priority), then we wouldn't be able to have this flag
>> anymore. I think we could keep the semantic if we move this flag to the
>> service config, but I'm thankful for any comments on this.
> My gut feeling is that going for a more direct mapping, i.e. each
> location rule represents one HA group, is better. The nofailback flag
> can still apply to a given location rule I think? For a given service,
> if a higher-priority node is online for any location rule the service is
> part of, with nofailback=0, it will get migrated to that higher-priority
> node. It does make sense to have a given service be part of only one
> location rule then though, since node priorities can conflict between rules.

Yeah, I think this is the reasonable option too.

I briefly discussed this with @Fabian off-list and we also agreed that 
it would be good to make location rules as 1:1 to location rules as 
possible and keep the nofailback per location rule, as the behavior of 
the HA group's nofailback could still be preserved - at least if there's 
only a single location rule per service at least.

---

On the other hand, I'll have to take a closer look if we can do 
something about the blockers when creating multiple location rules where 
e.g. one has nofailback enabled and the other has not. As you already 
said, they could easily conflict between rules...

My previous idea was to make location rules as flexible as possible, so 
that it would theoretically not matter if one writes:

location: rule1
     services: vm:101
     nodes: node1:2,node2:1
     strict: 1

or:

location: rule1
     services: vm:101
     nodes: node1
     strict: 1

location: rule2
     services: vm:101
     nodes: node2
     strict: 1

The order which one's more important could be encoded in the order which 
it is defined (if one configures this in the config it's easy, and I'd 
add an API endpoint to realize this over the API/WebGUI too), or maybe 
even simpler to maintain: just another property. But then, the 
nofailback would have to be either moved to some other place...

Or it is still allowed in location rules, but either the more detailed 
rule wins (e.g. one rule has node1 without a priority and the other does 
have node1 with a priority) or the first location rule with a specific 
node wins and the other is ignored. But this is already confusing when 
writing it out here...

I'd prefer users to write the former (and make this the dynamic 
'canonical' form when selecting nodes), but as with colocation rules it 
could make sense to separate them for specific reasons / use cases.

And another reason why it could still make sense to go that way is to 
allow "negative" location rules at a later point, which makes sense in 
larger environments, where it's easier to write opt-out rules than 
opt-in rules, so I'd like to keep that path open for the future.


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


  reply	other threads:[~2025-04-25  8:36 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-25 15:12 Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH cluster 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 01/15] ignore output of fence config tests in tree Daniel Kral
2025-03-25 17:49   ` [pve-devel] applied: " Thomas Lamprecht
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines Daniel Kral
2025-03-25 17:53   ` Thomas Lamprecht
2025-04-03 12:16     ` Fabian Grünbichler
2025-04-11 11:24       ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 03/15] usage: add get_service_node and pin_service_node methods Daniel Kral
2025-04-24 12:29   ` Fiona Ebner
2025-04-25  7:39     ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin Daniel Kral
2025-04-24 13:03   ` Fiona Ebner
2025-04-25  8:29     ` Daniel Kral
2025-04-25  9:12       ` Fiona Ebner
2025-04-25 13:30         ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin Daniel Kral
2025-04-03 12:16   ` Fabian Grünbichler
2025-04-11 11:04     ` Daniel Kral
2025-04-25 14:06       ` Fiona Ebner
2025-04-29  8:37         ` Daniel Kral
2025-04-29  9:15           ` Fiona Ebner
2025-05-07  8:41             ` Daniel Kral
2025-04-25 14:05   ` Fiona Ebner
2025-04-29  8:44     ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 06/15] config, env, hw: add rules read and parse methods Daniel Kral
2025-04-25 14:11   ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 07/15] manager: read and update rules config Daniel Kral
2025-04-25 14:30   ` Fiona Ebner
2025-04-29  8:04     ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 08/15] manager: factor out prioritized nodes in select_service_node Daniel Kral
2025-04-28 13:03   ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes Daniel Kral
2025-04-03 12:17   ` Fabian Grünbichler
2025-04-11 15:56     ` Daniel Kral
2025-04-28 12:46       ` Fiona Ebner
2025-04-29  9:07         ` Daniel Kral
2025-04-29  9:22           ` Fiona Ebner
2025-04-28 12:26   ` Fiona Ebner
2025-04-28 14:33     ` Fiona Ebner
2025-04-29  9:39       ` Daniel Kral
2025-04-29  9:50     ` Daniel Kral
2025-04-30 11:09   ` Daniel Kral
2025-05-02  9:33     ` Fiona Ebner
2025-05-07  8:31       ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 10/15] sim: resources: add option to limit start and migrate tries to node Daniel Kral
2025-04-28 13:20   ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 11/15] test: ha tester: add test cases for strict negative colocation rules Daniel Kral
2025-04-28 13:44   ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 12/15] test: ha tester: add test cases for strict positive " Daniel Kral
2025-04-28 13:51   ` Fiona Ebner
2025-05-09 11:22     ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose " Daniel Kral
2025-04-28 14:44   ` Fiona Ebner
2025-05-09 11:20     ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 14/15] test: ha tester: add test cases in more complex scenarios Daniel Kral
2025-04-29  8:54   ` Fiona Ebner
2025-04-29  9:01   ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 15/15] test: add test cases for rules config Daniel Kral
2025-03-25 16:47 ` [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
2025-04-24 10:12   ` Fiona Ebner
2025-04-01  1:50 ` DERUMIER, Alexandre
2025-04-01  9:39   ` Daniel Kral
2025-04-01 11:05     ` DERUMIER, Alexandre via pve-devel
2025-04-03 12:26     ` Fabian Grünbichler
2025-04-24 10:12     ` Fiona Ebner
2025-04-24 10:12 ` Fiona Ebner
2025-04-25  8:36   ` Daniel Kral [this message]
2025-04-25 12:25     ` Fiona Ebner
2025-04-25 13:25       ` Daniel Kral
2025-04-25 13:58         ` Fiona Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c93a04b3-b177-4bfc-8a8a-8e10fd635b15@proxmox.com \
    --to=d.kral@proxmox.com \
    --cc=f.ebner@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal