From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH-SERIES ha-manager 0/7] improve handling of maintenance nodes
Date: Wed, 22 Apr 2026 12:00:18 +0200
Message-ID: <20260422100035.232716-1-d.kral@proxmox.com>

As reported in a recent Proxmox forum post [0], there are cases where HA
resources are either not moved away from maintenance nodes or not moved
back to them once these nodes leave maintenance mode.

Even though we cannot resolve all situations (for example, when affinity
rules constrain an HA resource so that it cannot be moved anywhere but
the maintenance node), this patch series improves the handling as
follows:

- log warnings if HA resources cannot be moved to a replacement node
- make HA resources with failback disabled move back to their previous
  maintenance node
- make HA resource bundles move back to their previous maintenance node
- try all available, effective priority classes for an HA resource while
  applying the negative resource affinity rules

The last change makes the system more consistent overall, but might
introduce some unintended node placements in highly constrained
scenarios, because the HA Manager currently resolves node placements
individually for each HA resource. This should be improved upon in a
future patch series (Bugzilla entry [1] might also be relevant).
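
To illustrate the idea behind the last change, here is a minimal sketch
in Python. It is not the ha-manager code (which is written in Perl), and
all names in it (priority_classes, pick_nodes, forbidden_nodes) are
hypothetical. It assumes that node affinity can be modeled as a map from
node name to priority, and that negative resource affinity boils down to
a set of nodes the HA resource must avoid:

```python
from collections import defaultdict


def priority_classes(node_priorities):
    """Group nodes by priority; return the groups, highest priority first."""
    by_priority = defaultdict(set)
    for node, priority in node_priorities.items():
        by_priority[priority].add(node)
    return [by_priority[p] for p in sorted(by_priority, reverse=True)]


def pick_nodes(node_priorities, online_nodes, forbidden_nodes):
    """Return the first priority class that still has usable nodes.

    forbidden_nodes is an assumption of this sketch: it stands in for
    the nodes excluded by negative resource affinity.
    """
    for nodes in priority_classes(node_priorities):
        candidates = (nodes & online_nodes) - forbidden_nodes
        if candidates:
            return candidates
    # No priority class has a usable node left.
    return set()


# node1 is in maintenance (not online), node2 is blocked by a negative
# resource affinity rule, so the lowest priority class is used.
placement = pick_nodes(
    {"node1": 3, "node2": 2, "node3": 1},
    online_nodes={"node2", "node3"},
    forbidden_nodes={"node2"},
)
```

In this sketch, looking only at the highest available priority class
(node2) would leave the HA resource without a candidate node, while
trying the classes in descending order still finds node3.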

[0] https://forum.proxmox.com/threads/182890/
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7475

Daniel Kral (7):
  manager: warn if HA resources cannot be moved away from maintenance
    node
  test: add test cases for node affinity rules with maintenance mode
  test: add test cases for resource affinity rules with maintenance mode
  manager: make HA resources without failback move back to maintenance
    node
  manager: make HA resource bundles move back to maintenance node
  make get_node_affinity return all priority classes sorted in
    descending order
  manager: try multiple priority classes when applying negative resource
    affinity

 src/PVE/HA/Manager.pm                         | 60 +++++++++++++---
 src/PVE/HA/Rules/NodeAffinity.pm              | 24 ++++---
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 48 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  3 +
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 48 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  3 +
 .../README                                    |  3 +
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 35 ++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 +
 .../README                                    |  3 +
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 35 ++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 +
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 48 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 +
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 48 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  3 +
 .../README                                    |  5 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 54 +++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  4 ++
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 47 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  5 ++
 .../README                                    |  4 ++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 67 ++++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  4 ++
 .../README                                    | 10 +++
 .../cmdlist                                   |  4 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 69 +++++++++++++++++++
 .../manager_status                            | 34 +++++++++
 .../rules_config                              |  3 +
 .../service_config                            |  5 ++
 .../README                                    |  9 +++
 .../cmdlist                                   |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 54 +++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  7 ++
 .../service_config                            |  4 ++
 .../README                                    |  7 +-
 .../log.expect                                | 16 +++--
 .../test-stale-maintenance-node/log.expect    |  3 +
 82 files changed, 922 insertions(+), 29 deletions(-)
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/README
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict1/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/README
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-nonstrict2/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict1/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict2/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict3/service_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/README
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/cmdlist
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/hardware_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/log.expect
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/manager_status
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/rules_config
 create mode 100644 src/test/test-node-affinity-maintenance-strict4/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative1/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-negative2/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive1/service_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/README
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/cmdlist
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/hardware_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/log.expect
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/manager_status
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/rules_config
 create mode 100644 src/test/test-resource-affinity-maintenance-strict-positive2/service_config
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/README
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/cmdlist
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/hardware_status
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/log.expect
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/manager_status
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/rules_config
 create mode 100644 src/test/test-resource-affinity-with-node-affinity-maintenance-strict-negative1/service_config

-- 
2.47.3





Thread overview: 8+ messages
2026-04-22 10:00 Daniel Kral [this message]
2026-04-22 10:00 ` [PATCH ha-manager 1/7] manager: warn if HA resources cannot be moved away from maintenance node Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 2/7] test: add test cases for node affinity rules with maintenance mode Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 3/7] test: add test cases for resource " Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 4/7] manager: make HA resources without failback move back to maintenance node Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 5/7] manager: make HA resource bundles " Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 6/7] make get_node_affinity return all priority classes sorted in descending order Daniel Kral
2026-04-22 10:00 ` [PATCH ha-manager 7/7] manager: try multiple priority classes when applying negative resource affinity Daniel Kral
