From: Fiona Ebner <f.ebner@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Daniel Kral <d.kral@proxmox.com>,
	"DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
Date: Thu, 24 Apr 2025 12:12:06 +0200
Message-ID: <6e314cc8-918a-4a03-8638-eef351e78d19@proxmox.com>
In-Reply-To: <498c09ec-662b-451b-a4a8-0aa51bb575df@proxmox.com>

Am 01.04.25 um 11:39 schrieb Daniel Kral:
> On 4/1/25 03:50, DERUMIER, Alexandre wrote:
>> My 2 cents, but everybody in the industry is calling this
>> affinity/anti-affinity (vmware, nutanix, hyperv, openstack, ...).
>> More precisely, vm affinity rules (vm<->vm) vs. node affinity rules
>> (vm->node, the current HA group)
>>
>> Personally I don't care, it's just a name ^_^ .
>>
>> But I have a lot of customers asking "does proxmox support
>> affinity/anti-affinity?", and if they are doing their own research, they
>> will think that it doesn't exist.
>> (or at minimum, write somewhere in the doc something like "aka vm
>> affinity" or in commercial presentation ^_^)
> 
> I see your point and also called it affinity/anti-affinity before, but
> if we go for the HA Rules route here, it'd be really neat to have
> "Location Rules" and "Colocation Rules" in the end to coexist and
> clearly show the distinction between them, as both are affinity rules at
> least for me.
> 
> I'd definitely make sure that it is clear from the release notes and
> documentation, that this adds the feature to assign affinity between
> services, but let's wait for some other comments on this ;).

In the UI/docs we can always be more descriptive and say things like
"(Anti-)Affinity Between Services" and "(Anti-)Affinity With Node",
while in the section config it's of course advantageous to have a single
word.

> 
> On 4/1/25 03:50, DERUMIER, Alexandre wrote:
>> More serious question: I haven't read all the code yet, but how does
>> it play with the current TOPSIS placement algorithm?
> 
> I currently implemented the colocation rules to put a constraint on
> which nodes the manager can select from for the to-be-migrated service.
> 
> So if users use the static load scheduler (and the basic / service count
> scheduler for that matter too), the colocation rules just make sure that
> no recovery node is selected that contradicts the colocation rules. So
> the TOPSIS algorithm isn't changed at all.
> 
> There are two things that should/could be changed in the future (besides
> the many future ideas that I pointed out already), which are
> 
> - (1) the schedulers will still consider all online nodes, i.e. even
> though HA groups and/or colocation rules restrict the allowed nodes in
> the end, the calculation is done for all nodes which could be
> significant for larger clusters, and
> 
> - (2) the services are (generally) currently recovered one-by-one in a
> best-fit fashion, i.e. there's no ordering by the services' needed
> resources, etc. There could be some edge cases (e.g. think about a
> failing node with a bunch of services to be kept together; these should
> now be migrated to the same node, if possible, or be put on the
> minimum number of nodes), where the algorithm could find better
> solutions if it either orders the to-be-recovered services, and/or the
> utilization scheduler has knowledge about the 'keep together'
> colocations and considers these (and all subsets) as a single service.

Yes, a simple heuristic here could be to take the subsets of:
1. (strict?) 'keep together' services
2. single services that are not otherwise in a (strict?) 'keep
   together' relation, consider each by itself a subset too

Then order the above subsets by their usage (ordering inside a subset
should not be that important) and then recover the services in that
order one-by-one (i.e. one-by-one for the first subset in the ordering,
then one-by-one for the second subset in the ordering, etc.). Even if
it's one-by-one that should mean keeping the (strict) 'keep together'
together, right?
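
To make the ordering concrete, here is a rough sketch in Python (not the
Perl of ha-manager). Everything in it is hypothetical: the function name
`build_recovery_order`, the `usage` mapping, and the `keep_together`
list are made up for illustration. It also assumes that a subset's usage
is simply the sum of its members' usage and that the 'keep together'
groups do not overlap.

```python
def build_recovery_order(usage, keep_together):
    """Return service ids in recovery order, heaviest subset first.

    usage: dict mapping service id -> estimated usage (hypothetical unit)
    keep_together: list of disjoint sets of service ids that share a
        (strict) 'keep together' relation
    """
    # 1. Every 'keep together' group forms one subset.
    subsets = [sorted(group) for group in keep_together]
    grouped = {sid for group in keep_together for sid in group}
    # 2. Every remaining service is a subset by itself.
    subsets += [[sid] for sid in sorted(usage) if sid not in grouped]
    # Order subsets by usage, heaviest first.
    # Assumption: the usage of a subset is the sum of its members' usage.
    subsets.sort(key=lambda subset: sum(usage[sid] for sid in subset),
                 reverse=True)
    # Services are still recovered one-by-one, subset after subset.
    return [sid for subset in subsets for sid in subset]


usage = {"vm:101": 10, "vm:102": 15, "vm:103": 30, "vm:104": 5}
order = build_recovery_order(usage, keep_together=[{"vm:101", "vm:102"}])
print(order)  # ['vm:103', 'vm:101', 'vm:102', 'vm:104']
```

Because the members of a subset are adjacent in the resulting list,
recovering strictly in that order handles each 'keep together' group as
one consecutive run.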

Like that you get the heavy subsets out of the way first. This prevents
the otherwise likely scenario where too many small services are
recovered in a balanced fashion to other nodes (let's say nodes all end
up at 80% usage) and then there's no single node with the necessary
resources for a heavy service that is still to be recovered (e.g. one
that would need 30% usage on a node).

This can of course be done as a follow-up.


