[PATCH ha-manager v2 0/4] fix #2751: implement disarm/arm HA for safer cluster maintenance
From: Thomas Lamprecht @ 2026-03-21 23:42 UTC
  To: pve-devel

The biggest change compared to v1 is how ignore mode handles the service
status: instead of clearing it entirely, the relevant parts of service
status are now preserved across the disarm/arm cycle. This allows
runtime state like maintenance_node to survive, so services correctly
migrate back to their original node after maintenance ends, even if the
disarm happened while maintenance was active. Thanks @Dominik R. for
noticing this.

To keep the preserved state clean, stale runtime data (failed_nodes,
cmd, target, ...) is pruned from service entries on disarm - both in
freeze and ignore mode - so the state machine starts fresh on re-arm.
The status API overrides the displayed service state to 'ignore' during
disarm-ignore mode, while the internal state stays untouched for
seamless resume.
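
As a rough illustration of the pruning and the display override described
above, here is a minimal Python sketch. The actual implementation is Perl
(src/PVE/HA/Manager.pm and src/PVE/API2/HA/Status.pm); the function names,
the dict layout and the exact set of pruned keys below are assumptions made
for illustration only, and the real key list is longer than the three named
in this mail.

```python
import copy

# Assumption: only the keys named in the cover letter; the real list
# continues ("...") and is not reproduced here.
STALE_KEYS = ("failed_nodes", "cmd", "target")


def prune_stale_runtime_data(service_status):
    """Return a copy of the service status without stale runtime keys.

    State that must survive the disarm/arm cycle (for example
    maintenance_node) is left untouched.
    """
    pruned = copy.deepcopy(service_status)
    for entry in pruned.values():
        for key in STALE_KEYS:
            entry.pop(key, None)
    return pruned


def displayed_state(entry, disarm_mode):
    """State reported to API clients; the stored entry is not modified."""
    if disarm_mode == "ignore":
        return "ignore"
    return entry["state"]


# Hypothetical example data, not taken from the test suite.
status = {
    "vm:100": {
        "state": "started",
        "node": "node1",
        "maintenance_node": "node2",
        "failed_nodes": ["node3"],
        "cmd": ["migrate", "node2"],
    }
}
pruned = prune_stale_runtime_data(status)
```

The point of keeping the override in the display path only is that the
stored state never has to be restored on re-arm, it was never changed.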

On arm-ha from ignore mode, the CRM now rechecks each resource's
previously recorded node against the service config, picking up any
manual migrations the admin performed while HA tracking was suspended.
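
The recheck on arm-ha can be pictured roughly as follows. This is a Python
sketch under stated assumptions, not the Perl code from the series: the
function name, the dict shapes and the handling of services missing from
the config are all hypothetical.

```python
def recheck_nodes(service_status, service_config):
    """Sync the node recorded in the preserved status with the config.

    Returns a list of (sid, old_node, new_node) tuples for every service
    whose configured node differs from the node recorded before disarm.
    """
    moved = []
    for sid, entry in service_status.items():
        cfg = service_config.get(sid)
        if cfg is None:
            # Assumption: services removed from the config are handled
            # elsewhere and are simply skipped in this sketch.
            continue
        if cfg["node"] != entry["node"]:
            moved.append((sid, entry["node"], cfg["node"]))
            entry["node"] = cfg["node"]
    return moved


# Hypothetical example: vm:101 was migrated by hand during disarm.
preserved = {
    "vm:100": {"state": "started", "node": "node1"},
    "vm:101": {"state": "started", "node": "node1"},
}
config = {
    "vm:100": {"node": "node1"},
    "vm:101": {"node": "node3"},
}
moved = recheck_nodes(preserved, config)
```

Only the node is updated here; everything else in the preserved entry stays
as it was at disarm time.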

Patch 1/4 is new and adds a manual-migrate simulator command as a
preparatory patch, since it is independently useful for testing the
per-service 'ignored' state handling.

Previous discussion and v1:
https://lore.proxmox.com/pve-devel/20260309220128.973793-1-t.lamprecht@proxmox.com/

TBD:
- some more in-depth (real-world) testing
- UI integration
- maybe some more polishing

changes v1 -> v2:
- ignore mode: preserve relevant service status instead of clearing it,
  recheck node info on arm-ha for manual migrations [Dominik]
- prune stale runtime data from service entries on disarm for both modes
- add 'protected => 1' to both API endpoints [Dominik]
- split out manual-migrate sim command as preparatory patch
- various style, log level, and test improvements (see per-patch
  changelogs for details)

Thomas Lamprecht (4):
  sim: hardware: add manual-migrate command for ignored services
  api: status: add fencing status entry with armed/standby state
  fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
  api: status: add disarm-ha and arm-ha endpoints and CLI wiring

 src/PVE/API2/HA/Status.pm                     | 143 ++++++++++++-
 src/PVE/CLI/ha_manager.pm                     |   2 +
 src/PVE/HA/CRM.pm                             |  33 ++-
 src/PVE/HA/Config.pm                          |   5 +
 src/PVE/HA/LRM.pm                             |  31 ++-
 src/PVE/HA/Manager.pm                         | 197 ++++++++++++++++--
 src/PVE/HA/Sim/Hardware.pm                    |  36 ++++
 src/test/test-disarm-crm-stop1/README         |  13 ++
 src/test/test-disarm-crm-stop1/cmdlist        |   6 +
 .../test-disarm-crm-stop1/hardware_status     |   5 +
 src/test/test-disarm-crm-stop1/log.expect     |  66 ++++++
 src/test/test-disarm-crm-stop1/manager_status |   1 +
 src/test/test-disarm-crm-stop1/service_config |   5 +
 src/test/test-disarm-double1/cmdlist          |   7 +
 src/test/test-disarm-double1/hardware_status  |   5 +
 src/test/test-disarm-double1/log.expect       |  53 +++++
 src/test/test-disarm-double1/manager_status   |   1 +
 src/test/test-disarm-double1/service_config   |   4 +
 src/test/test-disarm-failing-service1/cmdlist |   6 +
 .../hardware_status                           |   5 +
 .../test-disarm-failing-service1/log.expect   | 125 +++++++++++
 .../manager_status                            |   1 +
 .../service_config                            |   4 +
 src/test/test-disarm-fence1/cmdlist           |   9 +
 src/test/test-disarm-fence1/hardware_status   |   5 +
 src/test/test-disarm-fence1/log.expect        |  78 +++++++
 src/test/test-disarm-fence1/manager_status    |   1 +
 src/test/test-disarm-fence1/service_config    |   5 +
 src/test/test-disarm-frozen1/README           |  10 +
 src/test/test-disarm-frozen1/cmdlist          |   5 +
 src/test/test-disarm-frozen1/hardware_status  |   5 +
 src/test/test-disarm-frozen1/log.expect       |  59 ++++++
 src/test/test-disarm-frozen1/manager_status   |   1 +
 src/test/test-disarm-frozen1/service_config   |   5 +
 src/test/test-disarm-ignored1/README          |  10 +
 src/test/test-disarm-ignored1/cmdlist         |   5 +
 src/test/test-disarm-ignored1/hardware_status |   5 +
 src/test/test-disarm-ignored1/log.expect      |  50 +++++
 src/test/test-disarm-ignored1/manager_status  |   1 +
 src/test/test-disarm-ignored1/service_config  |   5 +
 src/test/test-disarm-ignored2/cmdlist         |   6 +
 src/test/test-disarm-ignored2/hardware_status |   5 +
 src/test/test-disarm-ignored2/log.expect      |  60 ++++++
 src/test/test-disarm-ignored2/manager_status  |   1 +
 src/test/test-disarm-ignored2/service_config  |   5 +
 src/test/test-disarm-maintenance1/cmdlist     |   7 +
 .../test-disarm-maintenance1/hardware_status  |   5 +
 src/test/test-disarm-maintenance1/log.expect  |  79 +++++++
 .../test-disarm-maintenance1/manager_status   |   1 +
 .../test-disarm-maintenance1/service_config   |   5 +
 src/test/test-disarm-maintenance2/cmdlist     |   7 +
 .../test-disarm-maintenance2/hardware_status  |   5 +
 src/test/test-disarm-maintenance2/log.expect  |  78 +++++++
 .../test-disarm-maintenance2/manager_status   |   1 +
 .../test-disarm-maintenance2/service_config   |   5 +
 src/test/test-disarm-maintenance3/cmdlist     |   8 +
 .../test-disarm-maintenance3/hardware_status  |   5 +
 src/test/test-disarm-maintenance3/log.expect  |  80 +++++++
 .../test-disarm-maintenance3/manager_status   |   1 +
 .../test-disarm-maintenance3/service_config   |   5 +
 src/test/test-disarm-relocate1/README         |   3 +
 src/test/test-disarm-relocate1/cmdlist        |   7 +
 .../test-disarm-relocate1/hardware_status     |   5 +
 src/test/test-disarm-relocate1/log.expect     |  51 +++++
 src/test/test-disarm-relocate1/manager_status |   1 +
 src/test/test-disarm-relocate1/service_config |   4 +
 src/test/test-manual-migrate-ignored1/cmdlist |   7 +
 .../hardware_status                           |   5 +
 .../test-manual-migrate-ignored1/log.expect   |  44 ++++
 .../manager_status                            |   1 +
 .../service_config                            |   5 +
 71 files changed, 1481 insertions(+), 34 deletions(-)
 create mode 100644 src/test/test-disarm-crm-stop1/README
 create mode 100644 src/test/test-disarm-crm-stop1/cmdlist
 create mode 100644 src/test/test-disarm-crm-stop1/hardware_status
 create mode 100644 src/test/test-disarm-crm-stop1/log.expect
 create mode 100644 src/test/test-disarm-crm-stop1/manager_status
 create mode 100644 src/test/test-disarm-crm-stop1/service_config
 create mode 100644 src/test/test-disarm-double1/cmdlist
 create mode 100644 src/test/test-disarm-double1/hardware_status
 create mode 100644 src/test/test-disarm-double1/log.expect
 create mode 100644 src/test/test-disarm-double1/manager_status
 create mode 100644 src/test/test-disarm-double1/service_config
 create mode 100644 src/test/test-disarm-failing-service1/cmdlist
 create mode 100644 src/test/test-disarm-failing-service1/hardware_status
 create mode 100644 src/test/test-disarm-failing-service1/log.expect
 create mode 100644 src/test/test-disarm-failing-service1/manager_status
 create mode 100644 src/test/test-disarm-failing-service1/service_config
 create mode 100644 src/test/test-disarm-fence1/cmdlist
 create mode 100644 src/test/test-disarm-fence1/hardware_status
 create mode 100644 src/test/test-disarm-fence1/log.expect
 create mode 100644 src/test/test-disarm-fence1/manager_status
 create mode 100644 src/test/test-disarm-fence1/service_config
 create mode 100644 src/test/test-disarm-frozen1/README
 create mode 100644 src/test/test-disarm-frozen1/cmdlist
 create mode 100644 src/test/test-disarm-frozen1/hardware_status
 create mode 100644 src/test/test-disarm-frozen1/log.expect
 create mode 100644 src/test/test-disarm-frozen1/manager_status
 create mode 100644 src/test/test-disarm-frozen1/service_config
 create mode 100644 src/test/test-disarm-ignored1/README
 create mode 100644 src/test/test-disarm-ignored1/cmdlist
 create mode 100644 src/test/test-disarm-ignored1/hardware_status
 create mode 100644 src/test/test-disarm-ignored1/log.expect
 create mode 100644 src/test/test-disarm-ignored1/manager_status
 create mode 100644 src/test/test-disarm-ignored1/service_config
 create mode 100644 src/test/test-disarm-ignored2/cmdlist
 create mode 100644 src/test/test-disarm-ignored2/hardware_status
 create mode 100644 src/test/test-disarm-ignored2/log.expect
 create mode 100644 src/test/test-disarm-ignored2/manager_status
 create mode 100644 src/test/test-disarm-ignored2/service_config
 create mode 100644 src/test/test-disarm-maintenance1/cmdlist
 create mode 100644 src/test/test-disarm-maintenance1/hardware_status
 create mode 100644 src/test/test-disarm-maintenance1/log.expect
 create mode 100644 src/test/test-disarm-maintenance1/manager_status
 create mode 100644 src/test/test-disarm-maintenance1/service_config
 create mode 100644 src/test/test-disarm-maintenance2/cmdlist
 create mode 100644 src/test/test-disarm-maintenance2/hardware_status
 create mode 100644 src/test/test-disarm-maintenance2/log.expect
 create mode 100644 src/test/test-disarm-maintenance2/manager_status
 create mode 100644 src/test/test-disarm-maintenance2/service_config
 create mode 100644 src/test/test-disarm-maintenance3/cmdlist
 create mode 100644 src/test/test-disarm-maintenance3/hardware_status
 create mode 100644 src/test/test-disarm-maintenance3/log.expect
 create mode 100644 src/test/test-disarm-maintenance3/manager_status
 create mode 100644 src/test/test-disarm-maintenance3/service_config
 create mode 100644 src/test/test-disarm-relocate1/README
 create mode 100644 src/test/test-disarm-relocate1/cmdlist
 create mode 100644 src/test/test-disarm-relocate1/hardware_status
 create mode 100644 src/test/test-disarm-relocate1/log.expect
 create mode 100644 src/test/test-disarm-relocate1/manager_status
 create mode 100644 src/test/test-disarm-relocate1/service_config
 create mode 100644 src/test/test-manual-migrate-ignored1/cmdlist
 create mode 100644 src/test/test-manual-migrate-ignored1/hardware_status
 create mode 100644 src/test/test-manual-migrate-ignored1/log.expect
 create mode 100644 src/test/test-manual-migrate-ignored1/manager_status
 create mode 100644 src/test/test-manual-migrate-ignored1/service_config

-- 
2.47.3






Thread overview: 13+ messages
2026-03-21 23:42 [PATCH ha-manager v2 0/4] fix #2751: implement disarm/arm HA for safer cluster maintenance Thomas Lamprecht
2026-03-21 23:42 ` [PATCH ha-manager v2 1/4] sim: hardware: add manual-migrate command for ignored services Thomas Lamprecht
2026-03-21 23:42 ` [PATCH ha-manager v2 2/4] api: status: add fencing status entry with armed/standby state Thomas Lamprecht
2026-03-21 23:42 ` [PATCH ha-manager v2 3/4] fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance Thomas Lamprecht
2026-03-23 13:04   ` Dominik Rusovac
2026-03-25 15:50   ` Fiona Ebner
2026-03-27  1:17     ` Thomas Lamprecht
2026-03-26 16:02   ` Daniel Kral
2026-03-26 23:15     ` Thomas Lamprecht
2026-03-27 10:21       ` Daniel Kral
2026-03-21 23:42 ` [PATCH ha-manager v2 4/4] api: status: add disarm-ha and arm-ha endpoints and CLI wiring Thomas Lamprecht
2026-03-23 13:05 ` [PATCH ha-manager v2 0/4] fix #2751: implement disarm/arm HA for safer cluster maintenance Dominik Rusovac
2026-03-25 12:06 ` applied: " Thomas Lamprecht
