From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id EB6181FF165
	for <inbox@lore.proxmox.com>; Thu, 24 Apr 2025 12:12:37 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 085053C41;
	Thu, 24 Apr 2025 12:12:38 +0200 (CEST)
Message-ID: <cab3e44f-1294-429d-8e06-b6743c3cb3a7@proxmox.com>
Date: Thu, 24 Apr 2025 12:12:04 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
From: Fiona Ebner <f.ebner@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Daniel Kral <d.kral@proxmox.com>
References: <20250325151254.193177-1-d.kral@proxmox.com>
Content-Language: en-US
In-Reply-To: <20250325151254.193177-1-d.kral@proxmox.com>
Subject: Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

On 25.03.25 at 16:12, Daniel Kral wrote:
> | Canonicalization
> ----------
> 
> Additionally, colocation rules are currently simplified as follows:
> 
> - If there are multiple positive colocation rules with common services
>   and the same strictness, these are merged to a single positive
>   colocation rule.

Do you intend to do that when writing the configuration file? I think
rules are better left unmerged from a user perspective. For example:

- services 1, 2 and 3 should strictly stay together, because of reason A
- services 1 and 3 should strictly stay together, because of different
reason B

Another scenario might be that the user is currently in the process of
editing some rules one-by-one and then it might also be surprising if
something is auto-merged.

You can of course always dynamically merge them when doing the
computation for the node selection.
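
To illustrate what I mean by merging dynamically, here is a minimal sketch
in Python rather than Perl, just to keep it short. The names
`ColocationRule` and `merge_for_selection` are made up for this mail and
are not part of the series. The configured rules stay exactly as the user
wrote them; only the node selection sees the merged view:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColocationRule:
    # Hypothetical in-memory form of one configured positive colocation rule.
    services: frozenset
    strict: bool
    comment: str = ""


def merge_for_selection(rules):
    """Merge rules with common services and the same strictness.

    Returns a dict mapping strictness to a list of disjoint service sets.
    The input rules are not modified, so the config file stays unmerged.
    """
    merged = {True: [], False: []}
    for rule in rules:
        combined = set(rule.services)
        remaining = []
        # Existing groups are pairwise disjoint, so one pass is enough.
        for group in merged[rule.strict]:
            if group & combined:
                combined |= group
            else:
                remaining.append(group)
        remaining.append(combined)
        merged[rule.strict] = remaining
    return merged


rules = [
    ColocationRule(frozenset({"vm:1", "vm:2", "vm:3"}), True, "reason A"),
    ColocationRule(frozenset({"vm:1", "vm:3"}), True, "reason B"),
    ColocationRule(frozenset({"vm:4", "vm:5"}), True),
]
merged = merge_for_selection(rules)
```

With the two example rules from above, the strict view collapses to one
set for services 1, 2 and 3, while both rules and their comments remain
separate entries in the configuration.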

In the same spirit, a comment field for each rule where the user can put
the reason might be nice to have.

Another question is whether we should allow enabling/disabling rules.

Comment and enabling can of course always be added later. I'm just not
sure we should start out with the auto-merging of rules.

> | Inference rules
> ----------
> 
> There are currently no inference rules implemented for the RFC, but
> there could be potential to further simplify some code paths in the
> future, e.g. a positive colocation rule where one service is part of a
> restricted HA group makes the other services in the positive colocation
> rule a part of this HA group as well.

Only if the rule is strict. If we do this, I think it should also only
happen dynamically during node selection.
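
As a rough sketch of such a dynamic inference (Python for brevity; the
function name `allowed_nodes` and the data layout are assumptions for
illustration, not anything from the series): the node restriction of one
service is applied to its strict colocation peers only while computing the
candidates, and nothing is written back to the group or rule config.

```python
def allowed_nodes(service, strict_groups, restricted_nodes, online_nodes):
    """Nodes a service may run on, derived at selection time.

    strict_groups:    list of sets of services that must stay together
                      (hypothetical, e.g. merged strict positive rules)
    restricted_nodes: dict mapping a service to the node set of its
                      restricted HA group, if it has one
    """
    peers = {service}
    for group in strict_groups:
        if service in group:
            peers = set(group)
            break

    allowed = set(online_nodes)
    # Every restricted peer narrows the candidates for the whole group.
    for peer in peers:
        if peer in restricted_nodes:
            allowed &= restricted_nodes[peer]
    return allowed


strict_groups = [{"vm:1", "vm:2"}]
restricted_nodes = {"vm:1": {"node1", "node2"}}
online_nodes = {"node1", "node2", "node3"}
```

Here `vm:2` would be limited to `node1` and `node2` because of its strict
peer `vm:1`, while an unrelated `vm:3` could still use all online nodes.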


> Comment about HA groups -> Location Rules
> -----------------------------------------
> 
> This part is not really part of the patch series, but still worth an
> on-list discussion.
> 
> I'd like to suggest to also transform the existing HA groups to location
> rules, if the rule concept turns out to be a good fit for the colocation
> feature in the HA Manager, as HA groups seem to integrate quite easily
> into this concept.
> 
> This would make service-node relationships a little more flexible for
> users and we'd be able to have both configurable / visible in the same
> WebUI view, API endpoint, and configuration file. Also, some code paths
> could be a little more concise, e.g. checking changes to constraints and
> canonicalizing the rules config.
> 
> The how should be rather straightforward for the obvious use cases:
> 
> - Services in unrestricted HA groups -> Location rules with the nodes of
>   the HA group; We could either split each node priority group into
>   separate location rules (with each having their score / weight) or
>   keep the input format of HA groups with a list of
>   `<node>(:<priority>)` in each rule
> 
> - Services in restricted HA groups -> Same as above, but also using
>   either `+inf` for a mandatory location rule or `strict` property
>   depending on how we decide on the colocation rule properties

I'd prefer having a 'strict' property, as that is orthogonal to the
priorities and that aligns it with what you propose for the colocation
rules.

> This would allow most of the use cases of HA groups to be easily
> migratable to location rules. We could also keep the inference of the
> 'default group' for unrestricted HA groups (any node that is available
> is added as a group member with priority -1).

Nodes can change, so adding them explicitly will mean it can get
outdated. This should be implicit/done dynamically.
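
Roughly like the following sketch, assuming a location rule stores only the
nodes the user listed plus a `strict` flag. This is Python for brevity, and
`effective_priorities` is a made-up name. The implicit priority -1 members
are computed from the currently online nodes each time instead of being
stored:

```python
def effective_priorities(rule_nodes, strict, online_nodes):
    """Node priorities as seen by the node selection.

    rule_nodes:   dict node -> priority, exactly as configured by the user
    strict:       True for what is a restricted HA group today
    online_nodes: set of nodes that are currently available
    """
    # Configured nodes that are offline cannot be selected.
    prios = {n: p for n, p in rule_nodes.items() if n in online_nodes}
    if not strict:
        # Implicit default group: every other online node with priority -1.
        for node in online_nodes:
            prios.setdefault(node, -1)
    return prios


configured = {"node1": 2, "node2": 1}
online = {"node2", "node3"}
```

A node added to the cluster later is picked up automatically for
non-strict rules, and a removed node leaves no stale entry behind.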

> The only thing I'm unsure about is how we would migrate the
> `nofailback` option, since this operates on the group-level. If we keep
> the `<node>(:<priority>)` syntax and restrict that each service can only
> be part of one location rule, it'd be easy to have the same flag. If we
> go with multiple location rules per service and each having a score or
> weight (for the priority), then we wouldn't be able to have this flag
> anymore. I think we could keep the semantic if we move this flag to the
> service config, but I'm thankful for any comments on this.

My gut feeling is that going for a more direct mapping, i.e. each
location rule represents one HA group, is better. The nofailback flag
can still apply to a given location rule I think? For a given service,
if a higher-priority node is online for any location rule the service is
part of, with nofailback=0, it will get migrated to that higher-priority
node. It does make sense to have a given service be part of only one
location rule then though, since node priorities can conflict between rules.
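
A sketch of how I picture the per-rule flag, with one location rule per
service. Again Python for brevity; `select_node` is a made-up name, and the
tie-breaking by node name is only an assumption to keep the example
deterministic, not how the manager actually picks among equal nodes:

```python
def select_node(prios, current_node, nofailback):
    """Pick the node for a service under a single location rule.

    prios:        dict node -> priority for the currently usable nodes
    current_node: node the service is running on right now
    nofailback:   per-rule flag, as for HA groups today
    """
    if not prios:
        return None
    best = max(prios.values())
    candidates = sorted(n for n, p in prios.items() if p == best)
    if current_node in candidates:
        return current_node
    # With nofailback set, stay put as long as the current node is usable.
    if nofailback and current_node in prios:
        return current_node
    return candidates[0]


prios = {"node1": 2, "node2": 1}
```

With `nofailback=0` a service on `node2` moves to `node1` once that node is
usable again; with `nofailback=1` it stays on `node2`.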

