From: "DERUMIER, Alexandre via pve-devel" <pve-devel@lists.proxmox.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>,
	"d.kral@proxmox.com" <d.kral@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
Date: Tue, 1 Apr 2025 11:05:57 +0000	[thread overview]
Message-ID: <mailman.411.1743505598.359.pve-devel@lists.proxmox.com> (raw)
In-Reply-To: <498c09ec-662b-451b-a4a8-0aa51bb575df@proxmox.com>


>>I currently implemented the colocation rules to put a constraint on
>>which nodes the manager can select from for the to-be-migrated
>>service.

>>So if users use the static load scheduler (and the basic / service
>>count scheduler for that matter too), the colocation rules just make
>>sure that no recovery node is selected, which contradicts the
>>colocation rules. So the TOPSIS algorithm isn't changed at all.

Ah ok, got it, so it's a hard constraint (MUST) that filters the target
nodes.
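
To check my understanding, here is a minimal Python sketch of "filter
first, score afterwards". It is not the ha-manager code (which is Perl);
the function names, the rule representation as sets of service IDs, and
the "lower score is better" convention are all assumptions made for
illustration.

```python
def allowed_nodes(service, online_nodes, placement, together, apart):
    """Return the nodes that satisfy every strict rule for `service`.

    placement: dict service -> node for services that are already placed
    together:  list of sets of services that MUST share a node
    apart:     list of sets of services that MUST NOT share a node
    (all hypothetical structures, not the real rules config format)
    """
    allowed = set(online_nodes)
    for group in together:
        if service in group:
            # Nodes already used by the other members of this group.
            used = {placement[s] for s in group
                    if s != service and s in placement}
            if used:
                allowed &= used
    for group in apart:
        if service in group:
            used = {placement[s] for s in group
                    if s != service and s in placement}
            allowed -= used
    return allowed


def select_node(service, online_nodes, placement, together, apart, score):
    """Pick the best-scoring allowed node, or None if no node is allowed."""
    candidates = allowed_nodes(service, online_nodes, placement,
                               together, apart)
    # The scoring function is untouched; it only sees fewer candidates.
    # sorted() makes the choice deterministic when scores tie.
    return min(sorted(candidates), key=score, default=None)
```

The point of the sketch is only that the scheduler's scoring stays as it
is and the rules shrink the candidate list it is applied to.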


>>There are two things that should/could be changed in the future
>>(besides the many future ideas that I pointed out already), which are

>>- (1) the schedulers will still consider all online nodes, i.e. even
>>though HA groups and/or colocation rules restrict the allowed nodes
>>in the end, the calculation is done for all nodes which could be
>>significant for larger clusters, and

>>- (2) the service (generally) are currently recovered one-by-one in a
>>best-fit fashion, i.e. there's no order on the service's needed
>>resources, etc. There could be some edge cases (e.g. think about a
>>failing node with a bunch of service to be kept together; these
>>should now be migrated to the same node, if possible, or put them on
>>the minimum amount of nodes), where the algorithm could find better
>>solutions if it either orders the to-be-recovered services, and/or
>>the utilization scheduler has knowledge about the 'keep together'
>>colocations and considers these (and all subsets) as a single
>>service.
>>
>>For the latter, the complexity explodes a bit and is harder to test
>>for, which is why I've gone for the current implementation, as it
>>also reduces the burden on users to think about what could happen
>>with a specific set of rules and already allows the notion of
>>MUST/SHOULD. This gives enough flexibility to improve the decision
>>making of the scheduler in the future.

Yes, soft constraints (SHOULD) are not so easy indeed.
I remember running some tests where I fed TOPSIS the number of
conflicting constraints per VM for each host and migrated the VMs with
the most constraints first.
The results were not too bad, but this needs to be tested at scale.
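
Roughly, the idea looked like the Python sketch below. This is a
reconstruction for illustration, not the code I tested: the names are
hypothetical, the rules are plain sets of service IDs, and a simple
(conflicts, load) comparison stands in for TOPSIS.

```python
def conflict_count(service, node, placement, apart):
    """Number of services on `node` that `service` should be kept apart from."""
    count = 0
    for group in apart:
        if service in group:
            count += sum(1 for s in group
                         if s != service and placement.get(s) == node)
    return count


def recovery_order(services, apart):
    """Order services so that those appearing in the most rules come first."""
    def rule_count(service):
        return sum(1 for group in apart if service in group)
    # sorted() is stable, so ties keep their original order.
    return sorted(services, key=rule_count, reverse=True)


def place_all(services, nodes, placement, apart, load):
    """Place the most-constrained services first, preferring few conflicts.

    The tuple (conflicts, load, name) is a stand-in for a real
    multi-criteria score such as TOPSIS.
    """
    placement = dict(placement)
    load = dict(load)
    for service in recovery_order(services, apart):
        best = min(nodes, key=lambda n: (
            conflict_count(service, n, placement, apart), load[n], n))
        placement[service] = best
        load[best] += 1
    return placement
```

Here the conflict count is a soft criterion: a node with conflicts can
still be chosen if every node has some, which is what makes it a SHOULD
rather than a MUST.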

Hard constraints are already a good step (they should work for 90% of
people, who don't have 10000 constraints mixed together).


On 4/1/25 03:50, DERUMIER, Alexandre wrote:
> Small feature request from students and customers: a lot of them are
> asking to be able to use VM tags in the colocation/affinity

>>Good idea! We were thinking about this too and I forgot to add it to
>>the list, thanks for bringing it up again!

>>Yes, the idea would be to make pools and tags available as selectors
>>for rules here, so that the changes can be made rather dynamic by
>>just adding a tag to a service.

That would be perfect :)
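
As a sketch of what a tag selector could mean in practice, the Python
below expands a rule into its concrete member services at evaluation
time. The rule layout (`services` and `tags` keys) and the function name
are assumptions for illustration, not a proposed config format.

```python
def resolve_rule(rule, service_tags):
    """Expand a rule's selectors into the concrete set of services.

    rule:         dict with optional 'services' (explicit IDs) and 'tags'
    service_tags: dict service -> set of tags currently assigned to it
    (both structures are hypothetical)
    """
    members = set(rule.get("services", ()))
    wanted = set(rule.get("tags", ()))
    for service, tags in service_tags.items():
        # A service joins the rule as soon as it carries any selected tag.
        if wanted & tags:
            members.add(service)
    return members
```

With something like this, tagging a new VM with `db` would pull it into
every rule that selects `db`, without touching the rule itself.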

>>The only thing we have to consider here is that HA rules have some
>>verification phase and invalid rules will be dropped or modified to
>>make them applicable. Also these external changes must be identified
>>somehow in the HA stack, as I want to keep the amount of runs through
>>the verification code to a minimum, i.e. only when the configuration
>>is changed by the user. But that will be a discussion for another
>>series ;).

yes sure!


BTW, another improvement could be a hard constraint on storage
availability, as currently the HA stack moves the VM blindly, tries to
start it, and only then moves it on to another node where the storage
is available.
The only workaround is to create an HA server group, but this could be
an improvement.

Same for the number of cores available on the host (the host's number
of cores must be greater than the VM's number of cores).
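
Both checks would fit the same "filter the target nodes" pattern. A
minimal Python sketch, assuming hypothetical per-node and per-VM
dictionaries with `storages` and `cores` fields (this is not an existing
ha-manager interface):

```python
def can_host(node, vm):
    """True if `node` offers every storage the VM needs and enough cores."""
    has_storage = set(vm["storages"]) <= set(node["storages"])
    # Strictly greater, as suggested in the mail; >= may be the better choice.
    has_cores = node["cores"] > vm["cores"]
    return has_storage and has_cores


def eligible_nodes(nodes, vm):
    """Names of the nodes that pass both hard checks, in input order."""
    return [name for name, node in nodes.items() if can_host(node, vm)]
```

For example, a VM that needs a `ceph` storage and 4 cores would only be
offered nodes that have `ceph` configured and more than 4 cores, so the
blind move-and-retry would not be needed.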


I'll try to find time to follow and test your patches!

Alexandre




