all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: "Michael Köppl" <m.koeppl@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Daniel Kral <d.kral@proxmox.com>
Subject: Re: [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules
Date: Tue, 22 Jul 2025 18:38:06 +0200	[thread overview]
Message-ID: <7dccc5ec-0e2e-427a-9261-f32af9da77c1@proxmox.com> (raw)
In-Reply-To: <20250704181659.465441-1-d.kral@proxmox.com>

I've had a look at this series without looking at the second series that
extends this (yet). As Dano noted, there are multiple TODOs for the
final ha-manager patch in this series that tries to persist the
migration from groups to rules. During my testing I quickly ran into
some problems with this patch. I noted some of the errors I encountered
and how they could be fixed in the respective patch.

With that said, I tested the following on a 3-node cluster:
- Migration of resource in when node with higher priority becomes
available (tested also that no migration occurs if nofailback is active)
- Stopping node the resource is on from outside correctly triggers
recovery to another node
- Resource is stopped if none of the nodes in the group restricted group
is available
- Same case as before, but node keeps running on node outside of the
group in case of unrestricted group
- Resource is migrated back once a node from its group comes online
again if nofailback is not checked
- Resource stays on current node even if its preferred node comes online
again if nofailback is checked

The in-memory migration seemed to work well, I did not really notice any
problems there.

The persistent migration failed due to the errors describes in
ha-manager 15/15. I applied the changes I proposed in my comments to
that patch and tested that the migration produced a correct rules.cfg. I
tested the various possible group configurations and checked that the
resulting rules.cfg contained the correct nodes and resources.

I noticed the following regarding persistent migration:
- The resources.cfg is empty after the migration. I have not yet been
able to find out why, but I'll try to send a fix for that.
- The nofailback flag is (due to above) not correctly migrated

The migrations based on the persisted rules seemed to work (apart from
failback). Of course, without the second series, the current state is
that the groups are migrated and then are simply gone and there is
nothing to replace them, so it's not usable with the persistent
migration. I'll have a look at the second series as well and at the
tests for this one.

On 7/4/25 20:16, Daniel Kral wrote:
> RFC v1: https://lore.proxmox.com/pve-devel/20250325151254.193177-1-d.kral@proxmox.com/
> RFC v2: https://lore.proxmox.com/pve-devel/20250620143148.218469-1-d.kral@proxmox.com/
> 
> I've separated the core HA Rules module and the transformation from HA
> groups to HA Node Affinity rules (formerly known as HA Location rules)
> in this patch series, to reduce the overhead for reviewers and strive
> for a better version history, as changing two things at a time is rather
> confusing.
> 
> The main things that have changed since the last version (v2):
> 
> - split up the patch series (ofc)
> 
> - rebased on newest available master
> 
> - renamed "HA Location Rule" to "HA Node Affinity Rule"
> 
> - renamed any reference of a 'HA service' to 'HA resource' (e.g. rules
>   property 'services' is now 'resources')
> 
> - converted tri-state property 'state' to a binary 'disable' flag on HA
>   rules and expose the 'contradictory' state with an 'errors' hash
> 
> - remove the "use-location-rules" feature flag and implement a more
>   straightforward ha groups migration (more on that below)
> 
> - remove any reference of ha groups from the web interface
> 
> 
> 
> As before, HA groups are migrated to HA node affinity rules in each HA
> Manager round where something has changed about the HA groups / HA
> resources config file, but these are now unconditionally done as soon as
> a HA Manager runs with that version. It will also try to persistently
> migrate these, but that will only be successful as soon as all other
> nodes are upgraded (i.e. every node can run at least the HA Manager
> version that can successfully parse and apply the HA rules).
> 
> 
> There are still some things left to do, which I didn't get the time to
> come around to do for this revision:
> 
> - Testing, testing, testing
> 
> - I've ran out of time on the persistent HA groups migration part, which
>   has at least the two TODOs, which are mentioned in the patch itself,
>   and I haven't tested them on any real PVE upgrade yet; It's more of a
>   draft on how the migration should potentially work
> 
> - Also, the last patch for the persistent HA groups migration part will
>   fail the tests but the two that have been added, because of the way
>   the other tests are designed; that should be abstracted away in the HA
>   environment, e.g., a routine "have_groups_been_migrated" for PVE2/Sim.
> 
> - There might be a bit too many in-memory group migrations on the HA
>   Rules API side now, but better safe then sorry, maybe they can be
>   removed later; however, these shouldn't overwrite the rules that come
>   from the config, I haven't checked on that yet
> 
> - Should the HA Groups API (and the HA Resources 'group' property in the
>   HA Resources API) be removed now? Or should these stay and uses of
>   them make auto-migrations to the HA Rules?
> 
> As in the previous revisions, I've run a
> 
>     git rebase master --exec 'make clean && make deb'
> 
> on the series, so the tests should work for every patch.


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


  parent reply	other threads:[~2025-07-22 16:37 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-04 18:16 Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH cluster v3 1/1] cfs: add 'ha/rules.cfg' to observed files Daniel Kral
2025-07-16 14:02   ` [pve-devel] applied: " Thomas Lamprecht
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 01/15] tree-wide: make arguments for select_service_node explicit Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 02/15] manager: improve signature of select_service_node Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 03/15] introduce rules base plugin Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 04/15] rules: introduce node affinity rule plugin Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 05/15] config, env, hw: add rules read and parse methods Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 06/15] config: delete services from rules if services are deleted from config Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 07/15] manager: read and update rules config Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 08/15] test: ha tester: add test cases for future node affinity rules Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 09/15] resources: introduce failback property in ha resource config Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 10/15] manager: migrate ha groups to node affinity rules in-memory Daniel Kral
2025-07-22 16:38   ` Michael Köppl
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 11/15] manager: apply node affinity rules when selecting service nodes Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 12/15] test: add test cases for rules config Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 13/15] api: introduce ha rules api endpoints Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH ha-manager v3 14/15] cli: expose ha rules api endpoints to ha-manager cli Daniel Kral
2025-07-04 18:16 ` [pve-devel] [RFC ha-manager v3 15/15] manager: persistently migrate ha groups to ha rules Daniel Kral
2025-07-22 16:38   ` Michael Köppl
2025-07-22 16:56     ` Michael Köppl
2025-07-04 18:16 ` [pve-devel] [PATCH docs v3 1/1] ha: add documentation about ha rules and ha node affinity rules Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 1/3] api: ha: add ha rules api endpoints Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 2/3] ui: ha: remove ha groups from ha resource components Daniel Kral
2025-07-04 18:16 ` [pve-devel] [PATCH manager v3 3/3] ui: ha: show failback flag in resources status view Daniel Kral
2025-07-22 16:38 ` Michael Köppl [this message]
2025-07-23 15:36   ` [pve-devel] [PATCH cluster/docs/ha-manager/manager v3 00/20] HA Rules Michael Köppl
2025-07-30 10:13 ` [pve-devel] superseded: " Daniel Kral

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7dccc5ec-0e2e-427a-9261-f32af9da77c1@proxmox.com \
    --to=m.koeppl@proxmox.com \
    --cc=d.kral@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal