From: Daniel Kral <d.kral@proxmox.com>
To: "Fiona Ebner" <f.ebner@proxmox.com>,
	"Proxmox VE development discussion" <pve-devel@lists.proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin
Date: Tue, 29 Apr 2025 10:37:26 +0200
Message-ID: <bdea044b-68fb-45b8-b7c4-23b9fdd52d06@proxmox.com>
In-Reply-To: <9a47259a-2ffd-4e43-a45c-fa4f9a7c2ee8@proxmox.com>

I agree with everything here, but, as laid out in my last comment below, 
I propose removing the merging for any colocation rules, if nothing 
speaks against it.

On 4/25/25 16:06, Fiona Ebner wrote:
> Am 11.04.25 um 13:04 schrieb Daniel Kral:
>> On 4/3/25 14:16, Fabian Grünbichler wrote:
>>> On March 25, 2025 4:12 pm, Daniel Kral wrote:
>>>> +sub check_services_count {
>>>> +    my ($rules) = @_;
>>>> +
>>>> +    my $conflicts = [];
>>>> +
>>>> +    foreach_colocation_rule($rules, sub {
>>>> +        my ($rule, $ruleid) = @_;
>>>> +
>>>> +        push @$conflicts, $ruleid if (scalar(keys %{$rule->{services}}) < 2);
>>>> +    });
>>>> +
>>>> +    return $conflicts;
>>>> +}
>>>
>>> is this really an issue? a colocation rule with a single service is just
>>> a nop? there's currently no cleanup AFAICT if a resource is removed, but
>>
>> You're right, AFAICS those are a noop when selecting the service node. I
>> guess I was a little pedantic / overprotective here about which rules
>> make sense in general rather than what the algorithm does in the end.
>>
>> And good point about handling removed resources; adding that to
>> delete_service_from_config goes right on my TODO list for the v1!
>>
>>> if we add that part (we maybe should?) then one can easily end up in a
>>> situation where a rule temporarily contains a single or no service?
>>
>> Hm, yes, especially if we add pools/tags at a later point to select
>> services for the rule, then this could happen very easily. But as you
>> already mentioned, those two cases would be noops too.
>>
>> Nevertheless, should we drop this check? I think it could help users
>> identify that some rules might not do what they wanted and give them a
>> reason why (i.e. there's only one service in the rule), but at the same
>> time it could be a little noisy if there are a lot of affected rules.
> 
> I'd still keep rules that end up with only one service around, but maybe
> (temporarily) disable them. And/or we could also add a special
> "no-effect" marker like the "conflict" one proposed in my other answer?
> Then a user could enable/make the rule effective again by adding a new
> service to it.

Yes, I think the 'conflict'/'ineffective' states are much more 
user-friendly as opposed to dropping the rules and only making these 
decisions explicit in the log. I'll add that to the v1!
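
For illustration, a minimal sketch of what flagging such rules could 
look like (the 'state' field, the helper name, and the rules-hash 
layout are assumptions for this example, not the actual v1 code):

sub mark_ineffective_rules {
    my ($rules) = @_;

    for my $ruleid (keys %{$rules->{ids}}) {
        my $rule = $rules->{ids}->{$ruleid};
        next if !$rule->{services};

        # a colocation rule with fewer than two services cannot have any
        # effect, so flag it instead of dropping it; adding another
        # service later makes it effective again
        $rule->{state} = 'ineffective'
            if scalar(keys %{$rule->{services}}) < 2;
    }
}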

> 
>>>> +
>>>> +=head3 check_positive_intransitivity($rules)
>>>> +
>>>> +Returns a list of conflicts caused by transitive positive colocation rules
>>>> +defined in C<$rules>.
>>>> +
>>>> +Transitive positive colocation rules exist if there are at least two positive
>>>> +colocation rules with the same strictness, which put at least the same two
>>>> +services in relation. This means that these rules can be merged together.
>>>> +
>>>> +If there are no conflicts, the returned list is empty.
>>>
>>> The terminology here is quite confusing - conflict meaning that two rules
>>> are "transitive" and thus mergeable (which is good, because it makes
>>> things easier to handle?) is quite weird, as "conflict" is a rather
>>> negative term..
>>>
>>> there's only a single call site in the same module, maybe we could just
>>> rename this to "find_mergeable_positive_ruleids", similar to the
>>> variable where the result is stored?
>>
>> Yeah, I was probably too keen on the `$conflict = check_something(...)`
>> pattern here, but it would be much more readable with a simpler name;
>> I'll change that for the v1!
>>
>> -----
>>
>> As for the why: I'll also add some documentation about the rationale for
>> why this is needed in the first place.
>>
>> The main reason is that the later rule check
>> 'check_inner_consistency' depends on the positive colocation rules
>> having been merged already, as it assumes that each positive colocation
>> rule contains all of the services that are positively colocated with
>> one another. If that weren't so, it wouldn't detect that the following
>> three rules are inconsistent with each other:
>>
>> colocation: stick-together1
>>      services vm:101,vm:104
>>      affinity together
>>      strict 1
>>
>> colocation: stick-together2
>>      services vm:104,vm:102
>>      affinity together
>>      strict 1
>>
>> colocation: keep-apart
>>      services vm:101,vm:102,vm:103
>>      affinity separate
>>      strict 1
>>
>> This reduces the complexity of the logic in 'check_inner_consistency' a
>> little, as it doesn't have to handle this special case: 'stick-together1'
>> and 'stick-together2' are already merged into one rule, and it is then
>> easily apparent that vm 101 and vm 102 cannot be colocated and
>> non-colocated at the same time.
>>
>> -----
>>
>> Also, I was curious how that would work out for the case where a
>> negative colocation relation between three services is split into three
>> pairwise rules (essentially a dependency cycle). This should in theory
>> have the same semantics as the above rule set:
>>
>> colocation: stick-together1
>>      services vm:101,vm:104
>>      affinity together
>>      strict 1
>>
>> colocation: stick-together2
>>      services vm:104,vm:102
>>      affinity together
>>      strict 1
>>
>> colocation: very-lonely-services1
>>      services vm:101,vm:102
>>      affinity separate
>>      strict 1
>>
>> colocation: very-lonely-services2
>>      services vm:102,vm:103
>>      affinity separate
>>      strict 1
>>
>> colocation: very-lonely-services3
>>      services vm:101,vm:103
>>      affinity separate
>>      strict 1
>>
>> Without the merge of positive rules, 'check_inner_consistency' would
>> again not detect the inconsistency here. But with the merge correctly
>> applied before checking the consistency, this would be resolved and the
>> effective rule set would be:
> 
> I suppose the effective rule set would still also contain the two
> 'together' rules, no?

No, here it would not. I figured it would be fairest and most reasonable 
to drop both rules whenever a positive and a negative colocation rule 
contradict each other. Here the conflicts are

stick-together1 -- very-lonely-services1
stick-together2 -- very-lonely-services1

so all three of them will be dropped from the rule set.
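
For reference, that check boils down to something like the following (a 
simplified sketch under assumed data structures, not the exact code from 
the patch): a positive and a negative rule conflict if they share at 
least two services, since those services would have to be kept together 
and apart at the same time.

sub find_conflicting_pairs {
    my ($positive_rules, $negative_rules) = @_;

    my $conflicts = [];
    for my $pos (@$positive_rules) {
        for my $neg (@$negative_rules) {
            # count the services that appear in both rules
            my $shared = grep { exists $neg->{services}->{$_} }
                keys %{$pos->{services}};

            # two or more shared services cannot be kept together and
            # apart at the same time
            push @$conflicts, [ $pos->{id}, $neg->{id} ] if $shared >= 2;
        }
    }

    return $conflicts;
}

Applied to the example above, the merged 'stick-together' rule shares vm 
101 and vm 102 with 'very-lonely-services1', so that pair gets flagged.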

Seeing this again, such cases definitely benefit from immediate feedback 
via the 'conflict'/'ineffective' state, showing users that those rules 
won't be applied, instead of only logging it.

> 
>>
>> colocation: very-lonely-services2
>>      services vm:102,vm:103
>>      affinity separate
>>      strict 1
>>
>> colocation: very-lonely-services3
>>      services vm:101,vm:103
>>      affinity separate
>>      strict 1
>>
>> It could be argued that the negative colocation rules should be merged
>> in a similar manner here, as there's now an "effective" difference in
>> the semantics of the above rule sets: the negative colocation rules
>> between vm 102 and vm 103 and between vm 101 and vm 103 remain.
>>
>> What do you think?
> 
> I don't think there's a particular need to also merge negative rules
> between services (when they form a complete graph). It won't make a
> difference if there are no conflicts with positive rules, and in the
> edge cases where there are conflicts (which usually get caught while
> editing the rules), it's better to drop fewer rules, so not merging is
> an advantage. Or do you have a particular advantage in favor of merging
> in mind?

Yes, I think so too.

There's quite a semantic difference between positive and negative 
colocation rules here. Services in "connected" positive colocation 
relationships (strict ones in particular) must end up co-located anyway, 
so it makes sense to merge those rules. Negative colocation 
relationships, on the other hand, must be defined in a "circular" way to 
be equivalent, and such a cycle might just happen by coincidence in 
small scenarios.
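
To make the merging under discussion concrete: it essentially unions the 
service sets of positive rules that share at least one service (and have 
the same strictness) until nothing changes anymore. Roughly like this (a 
sketch under assumed data structures, not the patch's actual 
implementation):

sub merge_connected_positive_rules {
    # each rule is assumed to look like:
    # { services => { 'vm:101' => 1, ... }, strict => 0|1 }
    my (@rules) = @_;

    my $changed = 1;
    while ($changed) {
        $changed = 0;

        for (my $i = 0; $i < @rules; $i++) {
            for (my $j = $i + 1; $j < @rules; $j++) {
                next if $rules[$i]->{strict} != $rules[$j]->{strict};

                my ($svc_a, $svc_b) =
                    ($rules[$i]->{services}, $rules[$j]->{services});
                next if !grep { exists $svc_b->{$_} } keys %$svc_a;

                $svc_a->{$_} = 1 for keys %$svc_b; # union the service sets
                splice(@rules, $j, 1);   # drop the now-redundant rule
                $j--;                    # re-check the element that moved up
                $changed = 1;
            }
        }
    }

    return @rules;
}

With that, the two 'stick-together' rules above collapse into a single 
rule covering vm 101, vm 102 and vm 104.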

But one thing that just struck me: what if the user intentionally wrote 
them as separate rules? Then it might be confusing that all of them are 
dropped rather than just the minimal amount that contradict other 
rules... Without the merging, check_inner_consistency() would drop only 
that minimal amount, as in the above example.

It would be a softer interpretation of the rules indeed, but it might 
benefit users in the end and make things easier to follow from their 
perspective. If there's no opposition to that, I'd tend to drop the 
merging for any rules after all.


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

Thread overview: 71+ messages
2025-03-25 15:12 [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH cluster 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 01/15] ignore output of fence config tests in tree Daniel Kral
2025-03-25 17:49   ` [pve-devel] applied: " Thomas Lamprecht
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines Daniel Kral
2025-03-25 17:53   ` Thomas Lamprecht
2025-04-03 12:16     ` Fabian Grünbichler
2025-04-11 11:24       ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 03/15] usage: add get_service_node and pin_service_node methods Daniel Kral
2025-04-24 12:29   ` Fiona Ebner
2025-04-25  7:39     ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 04/15] add rules section config base plugin Daniel Kral
2025-04-24 13:03   ` Fiona Ebner
2025-04-25  8:29     ` Daniel Kral
2025-04-25  9:12       ` Fiona Ebner
2025-04-25 13:30         ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 05/15] rules: add colocation rule plugin Daniel Kral
2025-04-03 12:16   ` Fabian Grünbichler
2025-04-11 11:04     ` Daniel Kral
2025-04-25 14:06       ` Fiona Ebner
2025-04-29  8:37         ` Daniel Kral [this message]
2025-04-29  9:15           ` Fiona Ebner
2025-05-07  8:41             ` Daniel Kral
2025-04-25 14:05   ` Fiona Ebner
2025-04-29  8:44     ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 06/15] config, env, hw: add rules read and parse methods Daniel Kral
2025-04-25 14:11   ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 07/15] manager: read and update rules config Daniel Kral
2025-04-25 14:30   ` Fiona Ebner
2025-04-29  8:04     ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 08/15] manager: factor out prioritized nodes in select_service_node Daniel Kral
2025-04-28 13:03   ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 09/15] manager: apply colocation rules when selecting service nodes Daniel Kral
2025-04-03 12:17   ` Fabian Grünbichler
2025-04-11 15:56     ` Daniel Kral
2025-04-28 12:46       ` Fiona Ebner
2025-04-29  9:07         ` Daniel Kral
2025-04-29  9:22           ` Fiona Ebner
2025-04-28 12:26   ` Fiona Ebner
2025-04-28 14:33     ` Fiona Ebner
2025-04-29  9:39       ` Daniel Kral
2025-04-29  9:50     ` Daniel Kral
2025-04-30 11:09   ` Daniel Kral
2025-05-02  9:33     ` Fiona Ebner
2025-05-07  8:31       ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 10/15] sim: resources: add option to limit start and migrate tries to node Daniel Kral
2025-04-28 13:20   ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 11/15] test: ha tester: add test cases for strict negative colocation rules Daniel Kral
2025-04-28 13:44   ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 12/15] test: ha tester: add test cases for strict positive " Daniel Kral
2025-04-28 13:51   ` Fiona Ebner
2025-05-09 11:22     ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose " Daniel Kral
2025-04-28 14:44   ` Fiona Ebner
2025-05-09 11:20     ` Daniel Kral
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 14/15] test: ha tester: add test cases in more complex scenarios Daniel Kral
2025-04-29  8:54   ` Fiona Ebner
2025-04-29  9:01   ` Fiona Ebner
2025-03-25 15:12 ` [pve-devel] [PATCH ha-manager 15/15] test: add test cases for rules config Daniel Kral
2025-03-25 16:47 ` [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules Daniel Kral
2025-04-24 10:12   ` Fiona Ebner
2025-04-01  1:50 ` DERUMIER, Alexandre
2025-04-01  9:39   ` Daniel Kral
2025-04-01 11:05     ` DERUMIER, Alexandre via pve-devel
2025-04-03 12:26     ` Fabian Grünbichler
2025-04-24 10:12     ` Fiona Ebner
2025-04-24 10:12 ` Fiona Ebner
2025-04-25  8:36   ` Daniel Kral
2025-04-25 12:25     ` Fiona Ebner
2025-04-25 13:25       ` Daniel Kral
2025-04-25 13:58         ` Fiona Ebner
