From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager v2 0/4] fix #2751: implement disarm/arm HA for safer cluster maintenance
Date: Sun, 22 Mar 2026 00:42:49 +0100
Message-ID: <20260321234350.2158438-1-t.lamprecht@proxmox.com>
The biggest change compared to v1 is how ignore mode handles the service
status: instead of clearing it entirely, the relevant parts of service
status are now preserved across the disarm/arm cycle. This allows
runtime state like maintenance_node to survive, so services correctly
migrate back to their original node after maintenance ends, even if the
disarm happened while maintenance was active. Thanks @Dominik R. for
noticing this.
To keep the preserved state clean, stale runtime data (failed_nodes,
cmd, target, ...) is pruned from service entries on disarm - both in
freeze and ignore mode - so the state machine starts fresh on re-arm.
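For illustration only, the pruning step could be sketched roughly as follows. This is Python rather than the Perl used by ha-manager; the function name `prune_stale_runtime_data`, the dict layout, and the sample service entry are assumptions, and the key list covers only the fields named above, not the elided remainder.

```python
# Hypothetical sketch, not the actual ha-manager (Perl) implementation.
# Only the stale keys explicitly named in the cover letter are listed here.
STALE_KEYS = ("failed_nodes", "cmd", "target")


def prune_stale_runtime_data(service_status):
    """Return a copy of the per-service status with stale runtime keys dropped.

    Everything else (e.g. 'state', 'node', 'maintenance_node') is preserved,
    so it survives the disarm/arm cycle.
    """
    return {
        sid: {key: val for key, val in entry.items() if key not in STALE_KEYS}
        for sid, entry in service_status.items()
    }


# Made-up example entry for demonstration purposes.
status = {
    "vm:100": {
        "state": "started",
        "node": "node1",
        "maintenance_node": "node2",
        "failed_nodes": ["node3"],
        "cmd": ["migrate", "node2"],
    },
}
pruned = prune_stale_runtime_data(status)
```

Returning a copy instead of mutating in place is merely a choice of this sketch; it keeps the original entry available for comparison.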
The status API overrides the displayed service state to 'ignore' during
disarm-ignore mode, while the internal state stays untouched for
seamless resume.
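A minimal sketch of such a display-only override, again in Python and not taken from the patches: the function name `displayed_state` and the `disarm_mode` parameter are hypothetical, chosen just to show that the stored entry is never modified.

```python
# Hypothetical sketch of a display-only override; names are assumptions.
def displayed_state(entry, disarm_mode):
    """Return the state to show for a service without touching the entry.

    While HA is disarmed in ignore mode, report 'ignore'; otherwise report
    the internally tracked state unchanged.
    """
    if disarm_mode == "ignore":
        return "ignore"
    return entry["state"]


entry = {"state": "started", "node": "node1"}
shown_while_disarmed = displayed_state(entry, "ignore")
shown_while_armed = displayed_state(entry, None)
```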
On arm-ha from ignore mode, the CRM now rechecks each resource's
previously recorded node against the service config, picking up any
manual migrations the admin performed while HA tracking was suspended.
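The recheck on re-arm can be pictured with the following rough Python sketch. It is not the CRM code: the function name `recheck_nodes`, the data layout, and the assumption that the service config carries the current node under a `node` key are all illustrative.

```python
# Hypothetical sketch of the node recheck on re-arm; not the CRM code.
def recheck_nodes(service_status, service_config):
    """Sync the recorded node of each service with the service config.

    Returns a dict mapping service ID to (old_node, new_node) for every
    service that was found on a different node than the one recorded.
    """
    moved = {}
    for sid, entry in service_status.items():
        cfg = service_config.get(sid)
        if cfg is None:
            continue  # service no longer configured, nothing to compare
        if cfg["node"] != entry["node"]:
            moved[sid] = (entry["node"], cfg["node"])
            entry["node"] = cfg["node"]
    return moved


# Made-up example: vm:101 was migrated by hand while HA was disarmed.
status = {
    "vm:100": {"state": "started", "node": "node1"},
    "vm:101": {"state": "started", "node": "node1"},
}
config = {
    "vm:100": {"node": "node1"},
    "vm:101": {"node": "node3"},
}
moved = recheck_nodes(status, config)
```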
Patch 1/4 is new: it adds a manual-migrate simulator command as a
preparatory patch, since it is independently useful for testing the
per-service 'ignored' state handling.
Previous discussion and v1:
https://lore.proxmox.com/pve-devel/20260309220128.973793-1-t.lamprecht@proxmox.com/
TBD:
- some more in-depth (real-world) testing
- UI integration
- maybe some more polishing
changes v1 -> v2:
- ignore mode: preserve relevant service status instead of clearing it,
  recheck node info on arm-ha for manual migrations [Dominik]
- prune stale runtime data from service entries on disarm for both modes
- add 'protected => 1' to both API endpoints [Dominik]
- split out manual-migrate sim command as preparatory patch
- various style, log level, and test improvements (see per-patch
  changelogs for details)
Thomas Lamprecht (4):
  sim: hardware: add manual-migrate command for ignored services
  api: status: add fencing status entry with armed/standby state
  fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
  api: status: add disarm-ha and arm-ha endpoints and CLI wiring
src/PVE/API2/HA/Status.pm | 143 ++++++++++++-
src/PVE/CLI/ha_manager.pm | 2 +
src/PVE/HA/CRM.pm | 33 ++-
src/PVE/HA/Config.pm | 5 +
src/PVE/HA/LRM.pm | 31 ++-
src/PVE/HA/Manager.pm | 197 ++++++++++++++++--
src/PVE/HA/Sim/Hardware.pm | 36 ++++
src/test/test-disarm-crm-stop1/README | 13 ++
src/test/test-disarm-crm-stop1/cmdlist | 6 +
.../test-disarm-crm-stop1/hardware_status | 5 +
src/test/test-disarm-crm-stop1/log.expect | 66 ++++++
src/test/test-disarm-crm-stop1/manager_status | 1 +
src/test/test-disarm-crm-stop1/service_config | 5 +
src/test/test-disarm-double1/cmdlist | 7 +
src/test/test-disarm-double1/hardware_status | 5 +
src/test/test-disarm-double1/log.expect | 53 +++++
src/test/test-disarm-double1/manager_status | 1 +
src/test/test-disarm-double1/service_config | 4 +
src/test/test-disarm-failing-service1/cmdlist | 6 +
.../hardware_status | 5 +
.../test-disarm-failing-service1/log.expect | 125 +++++++++++
.../manager_status | 1 +
.../service_config | 4 +
src/test/test-disarm-fence1/cmdlist | 9 +
src/test/test-disarm-fence1/hardware_status | 5 +
src/test/test-disarm-fence1/log.expect | 78 +++++++
src/test/test-disarm-fence1/manager_status | 1 +
src/test/test-disarm-fence1/service_config | 5 +
src/test/test-disarm-frozen1/README | 10 +
src/test/test-disarm-frozen1/cmdlist | 5 +
src/test/test-disarm-frozen1/hardware_status | 5 +
src/test/test-disarm-frozen1/log.expect | 59 ++++++
src/test/test-disarm-frozen1/manager_status | 1 +
src/test/test-disarm-frozen1/service_config | 5 +
src/test/test-disarm-ignored1/README | 10 +
src/test/test-disarm-ignored1/cmdlist | 5 +
src/test/test-disarm-ignored1/hardware_status | 5 +
src/test/test-disarm-ignored1/log.expect | 50 +++++
src/test/test-disarm-ignored1/manager_status | 1 +
src/test/test-disarm-ignored1/service_config | 5 +
src/test/test-disarm-ignored2/cmdlist | 6 +
src/test/test-disarm-ignored2/hardware_status | 5 +
src/test/test-disarm-ignored2/log.expect | 60 ++++++
src/test/test-disarm-ignored2/manager_status | 1 +
src/test/test-disarm-ignored2/service_config | 5 +
src/test/test-disarm-maintenance1/cmdlist | 7 +
.../test-disarm-maintenance1/hardware_status | 5 +
src/test/test-disarm-maintenance1/log.expect | 79 +++++++
.../test-disarm-maintenance1/manager_status | 1 +
.../test-disarm-maintenance1/service_config | 5 +
src/test/test-disarm-maintenance2/cmdlist | 7 +
.../test-disarm-maintenance2/hardware_status | 5 +
src/test/test-disarm-maintenance2/log.expect | 78 +++++++
.../test-disarm-maintenance2/manager_status | 1 +
.../test-disarm-maintenance2/service_config | 5 +
src/test/test-disarm-maintenance3/cmdlist | 8 +
.../test-disarm-maintenance3/hardware_status | 5 +
src/test/test-disarm-maintenance3/log.expect | 80 +++++++
.../test-disarm-maintenance3/manager_status | 1 +
.../test-disarm-maintenance3/service_config | 5 +
src/test/test-disarm-relocate1/README | 3 +
src/test/test-disarm-relocate1/cmdlist | 7 +
.../test-disarm-relocate1/hardware_status | 5 +
src/test/test-disarm-relocate1/log.expect | 51 +++++
src/test/test-disarm-relocate1/manager_status | 1 +
src/test/test-disarm-relocate1/service_config | 4 +
src/test/test-manual-migrate-ignored1/cmdlist | 7 +
.../hardware_status | 5 +
.../test-manual-migrate-ignored1/log.expect | 44 ++++
.../manager_status | 1 +
.../service_config | 5 +
71 files changed, 1481 insertions(+), 34 deletions(-)
create mode 100644 src/test/test-disarm-crm-stop1/README
create mode 100644 src/test/test-disarm-crm-stop1/cmdlist
create mode 100644 src/test/test-disarm-crm-stop1/hardware_status
create mode 100644 src/test/test-disarm-crm-stop1/log.expect
create mode 100644 src/test/test-disarm-crm-stop1/manager_status
create mode 100644 src/test/test-disarm-crm-stop1/service_config
create mode 100644 src/test/test-disarm-double1/cmdlist
create mode 100644 src/test/test-disarm-double1/hardware_status
create mode 100644 src/test/test-disarm-double1/log.expect
create mode 100644 src/test/test-disarm-double1/manager_status
create mode 100644 src/test/test-disarm-double1/service_config
create mode 100644 src/test/test-disarm-failing-service1/cmdlist
create mode 100644 src/test/test-disarm-failing-service1/hardware_status
create mode 100644 src/test/test-disarm-failing-service1/log.expect
create mode 100644 src/test/test-disarm-failing-service1/manager_status
create mode 100644 src/test/test-disarm-failing-service1/service_config
create mode 100644 src/test/test-disarm-fence1/cmdlist
create mode 100644 src/test/test-disarm-fence1/hardware_status
create mode 100644 src/test/test-disarm-fence1/log.expect
create mode 100644 src/test/test-disarm-fence1/manager_status
create mode 100644 src/test/test-disarm-fence1/service_config
create mode 100644 src/test/test-disarm-frozen1/README
create mode 100644 src/test/test-disarm-frozen1/cmdlist
create mode 100644 src/test/test-disarm-frozen1/hardware_status
create mode 100644 src/test/test-disarm-frozen1/log.expect
create mode 100644 src/test/test-disarm-frozen1/manager_status
create mode 100644 src/test/test-disarm-frozen1/service_config
create mode 100644 src/test/test-disarm-ignored1/README
create mode 100644 src/test/test-disarm-ignored1/cmdlist
create mode 100644 src/test/test-disarm-ignored1/hardware_status
create mode 100644 src/test/test-disarm-ignored1/log.expect
create mode 100644 src/test/test-disarm-ignored1/manager_status
create mode 100644 src/test/test-disarm-ignored1/service_config
create mode 100644 src/test/test-disarm-ignored2/cmdlist
create mode 100644 src/test/test-disarm-ignored2/hardware_status
create mode 100644 src/test/test-disarm-ignored2/log.expect
create mode 100644 src/test/test-disarm-ignored2/manager_status
create mode 100644 src/test/test-disarm-ignored2/service_config
create mode 100644 src/test/test-disarm-maintenance1/cmdlist
create mode 100644 src/test/test-disarm-maintenance1/hardware_status
create mode 100644 src/test/test-disarm-maintenance1/log.expect
create mode 100644 src/test/test-disarm-maintenance1/manager_status
create mode 100644 src/test/test-disarm-maintenance1/service_config
create mode 100644 src/test/test-disarm-maintenance2/cmdlist
create mode 100644 src/test/test-disarm-maintenance2/hardware_status
create mode 100644 src/test/test-disarm-maintenance2/log.expect
create mode 100644 src/test/test-disarm-maintenance2/manager_status
create mode 100644 src/test/test-disarm-maintenance2/service_config
create mode 100644 src/test/test-disarm-maintenance3/cmdlist
create mode 100644 src/test/test-disarm-maintenance3/hardware_status
create mode 100644 src/test/test-disarm-maintenance3/log.expect
create mode 100644 src/test/test-disarm-maintenance3/manager_status
create mode 100644 src/test/test-disarm-maintenance3/service_config
create mode 100644 src/test/test-disarm-relocate1/README
create mode 100644 src/test/test-disarm-relocate1/cmdlist
create mode 100644 src/test/test-disarm-relocate1/hardware_status
create mode 100644 src/test/test-disarm-relocate1/log.expect
create mode 100644 src/test/test-disarm-relocate1/manager_status
create mode 100644 src/test/test-disarm-relocate1/service_config
create mode 100644 src/test/test-manual-migrate-ignored1/cmdlist
create mode 100644 src/test/test-manual-migrate-ignored1/hardware_status
create mode 100644 src/test/test-manual-migrate-ignored1/log.expect
create mode 100644 src/test/test-manual-migrate-ignored1/manager_status
create mode 100644 src/test/test-manual-migrate-ignored1/service_config
--
2.47.3
Thread overview: 13+ messages
2026-03-21 23:42 Thomas Lamprecht [this message]
2026-03-21 23:42 ` [PATCH ha-manager v2 1/4] sim: hardware: add manual-migrate command for ignored services Thomas Lamprecht
2026-03-21 23:42 ` [PATCH ha-manager v2 2/4] api: status: add fencing status entry with armed/standby state Thomas Lamprecht
2026-03-21 23:42 ` [PATCH ha-manager v2 3/4] fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance Thomas Lamprecht
2026-03-23 13:04 ` Dominik Rusovac
2026-03-25 15:50 ` Fiona Ebner
2026-03-27 1:17 ` Thomas Lamprecht
2026-03-26 16:02 ` Daniel Kral
2026-03-26 23:15 ` Thomas Lamprecht
2026-03-27 10:21 ` Daniel Kral
2026-03-21 23:42 ` [PATCH ha-manager v2 4/4] api: status: add disarm-ha and arm-ha endpoints and CLI wiring Thomas Lamprecht
2026-03-23 13:05 ` [PATCH ha-manager v2 0/4] fix #2751: implement disarm/arm HA for safer cluster maintenance Dominik Rusovac
2026-03-25 12:06 ` applied: " Thomas Lamprecht