From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager v2 0/4] fix #2751: implement disarm/arm HA for safer cluster maintenance
Date: Sun, 22 Mar 2026 00:42:49 +0100
Message-ID: <20260321234350.2158438-1-t.lamprecht@proxmox.com>

The biggest change compared to v1 is how ignore mode handles the service
status: instead of clearing it entirely, the relevant parts of service
status are now preserved across the disarm/arm cycle. This allows
runtime state like maintenance_node to survive, so services correctly
migrate back to their original node after maintenance ends, even if the
disarm happened while maintenance was active. Thanks @Dominik R. for
noticing this.

To keep the preserved state clean, stale runtime data (failed_nodes,
cmd, target, ...) is pruned from service entries on disarm, in both
freeze and ignore mode, so the state machine starts fresh on re-arm.
The status API overrides the displayed service state to 'ignore' during
disarm-ignore mode, while the internal state stays untouched for
seamless resume.
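
As a rough illustration of the pruning and display override described
above, here is a minimal sketch. It is written in Python for brevity and
is not the Perl code from the series. The dictionary layout, the
STALE_KEYS tuple and both function names are assumptions made for this
example. Only the key names failed_nodes, cmd, target and
maintenance_node come from the text above.

```python
import copy

# Hypothetical list of runtime-only keys. The cover letter names
# failed_nodes, cmd and target and elides the rest with "...".
STALE_KEYS = ("failed_nodes", "cmd", "target")


def prune_stale_runtime_data(service_status):
    """Return a copy of the service status without the stale runtime keys."""
    pruned = copy.deepcopy(service_status)
    for entry in pruned.values():
        for key in STALE_KEYS:
            entry.pop(key, None)
    return pruned


def displayed_state(entry, disarm_mode):
    """State as a status API might report it; the entry is not modified."""
    if disarm_mode == "ignore":
        return "ignore"
    return entry["state"]


# Example entry (assumed layout): maintenance_node must survive the disarm.
status = {
    "vm:100": {
        "state": "started",
        "node": "node1",
        "maintenance_node": "node2",
        "failed_nodes": ["node3"],
        "cmd": ["migrate", "node2"],
    }
}

pruned = prune_stale_runtime_data(status)
print(pruned["vm:100"])
print(displayed_state(pruned["vm:100"], "ignore"), pruned["vm:100"]["state"])
```

The point of the split is that the override only affects what is
reported, so the stored state can be resumed unchanged on re-arm.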

On arm-ha from ignore mode, the CRM now rechecks each resource's
previously recorded node against the resource's service config, picking
up any manual migrations the admin performed while HA tracking was
suspended.
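
The recheck on re-arm can be pictured with the following sketch, again
in Python rather than the project's Perl. The function name
recheck_nodes and the shape of both dictionaries are assumptions for
illustration, not the actual CRM code.

```python
def recheck_nodes(service_status, service_config):
    """Sync tracked nodes with the config; return (sid, old, new) changes."""
    changes = []
    for sid, entry in service_status.items():
        conf = service_config.get(sid)
        if conf is None:
            # Service is no longer configured; this sketch leaves it alone.
            continue
        if conf["node"] != entry["node"]:
            changes.append((sid, entry["node"], conf["node"]))
            entry["node"] = conf["node"]
    return changes


# vm:100 was moved by hand from node1 to node3 while HA was disarmed.
service_status = {
    "vm:100": {"state": "started", "node": "node1"},
    "vm:101": {"state": "started", "node": "node2"},
}
service_config = {
    "vm:100": {"node": "node3"},
    "vm:101": {"node": "node2"},
}

changes = recheck_nodes(service_status, service_config)
print(changes)
```

Only services whose configured node differs from the recorded one are
touched, so services that were not moved resume exactly where they were.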

Patch 1/4 is new: it adds a manual-migrate simulator command as a
preparatory patch, since it is independently useful for testing the
per-service 'ignored' state handling.

Previous discussion and v1:
https://lore.proxmox.com/pve-devel/20260309220128.973793-1-t.lamprecht@proxmox.com/

TBD:
- some more in-depth (real-world) testing
- UI integration
- maybe some more polishing

changes v1 -> v2:
- ignore mode: preserve relevant service status instead of clearing it,
  recheck node info on arm-ha for manual migrations [Dominik]
- prune stale runtime data from service entries on disarm for both modes
- add 'protected => 1' to both API endpoints [Dominik]
- split out manual-migrate sim command as preparatory patch
- various style, log level, and test improvements (see per-patch
  changelogs for details)

Thomas Lamprecht (4):
  sim: hardware: add manual-migrate command for ignored services
  api: status: add fencing status entry with armed/standby state
  fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
  api: status: add disarm-ha and arm-ha endpoints and CLI wiring

 src/PVE/API2/HA/Status.pm                     | 143 ++++++++++++-
 src/PVE/CLI/ha_manager.pm                     |   2 +
 src/PVE/HA/CRM.pm                             |  33 ++-
 src/PVE/HA/Config.pm                          |   5 +
 src/PVE/HA/LRM.pm                             |  31 ++-
 src/PVE/HA/Manager.pm                         | 197 ++++++++++++++++--
 src/PVE/HA/Sim/Hardware.pm                    |  36 ++++
 src/test/test-disarm-crm-stop1/README         |  13 ++
 src/test/test-disarm-crm-stop1/cmdlist        |   6 +
 .../test-disarm-crm-stop1/hardware_status     |   5 +
 src/test/test-disarm-crm-stop1/log.expect     |  66 ++++++
 src/test/test-disarm-crm-stop1/manager_status |   1 +
 src/test/test-disarm-crm-stop1/service_config |   5 +
 src/test/test-disarm-double1/cmdlist          |   7 +
 src/test/test-disarm-double1/hardware_status  |   5 +
 src/test/test-disarm-double1/log.expect       |  53 +++++
 src/test/test-disarm-double1/manager_status   |   1 +
 src/test/test-disarm-double1/service_config   |   4 +
 src/test/test-disarm-failing-service1/cmdlist |   6 +
 .../hardware_status                           |   5 +
 .../test-disarm-failing-service1/log.expect   | 125 +++++++++++
 .../manager_status                            |   1 +
 .../service_config                            |   4 +
 src/test/test-disarm-fence1/cmdlist           |   9 +
 src/test/test-disarm-fence1/hardware_status   |   5 +
 src/test/test-disarm-fence1/log.expect        |  78 +++++++
 src/test/test-disarm-fence1/manager_status    |   1 +
 src/test/test-disarm-fence1/service_config    |   5 +
 src/test/test-disarm-frozen1/README           |  10 +
 src/test/test-disarm-frozen1/cmdlist          |   5 +
 src/test/test-disarm-frozen1/hardware_status  |   5 +
 src/test/test-disarm-frozen1/log.expect       |  59 ++++++
 src/test/test-disarm-frozen1/manager_status   |   1 +
 src/test/test-disarm-frozen1/service_config   |   5 +
 src/test/test-disarm-ignored1/README          |  10 +
 src/test/test-disarm-ignored1/cmdlist         |   5 +
 src/test/test-disarm-ignored1/hardware_status |   5 +
 src/test/test-disarm-ignored1/log.expect      |  50 +++++
 src/test/test-disarm-ignored1/manager_status  |   1 +
 src/test/test-disarm-ignored1/service_config  |   5 +
 src/test/test-disarm-ignored2/cmdlist         |   6 +
 src/test/test-disarm-ignored2/hardware_status |   5 +
 src/test/test-disarm-ignored2/log.expect      |  60 ++++++
 src/test/test-disarm-ignored2/manager_status  |   1 +
 src/test/test-disarm-ignored2/service_config  |   5 +
 src/test/test-disarm-maintenance1/cmdlist     |   7 +
 .../test-disarm-maintenance1/hardware_status  |   5 +
 src/test/test-disarm-maintenance1/log.expect  |  79 +++++++
 .../test-disarm-maintenance1/manager_status   |   1 +
 .../test-disarm-maintenance1/service_config   |   5 +
 src/test/test-disarm-maintenance2/cmdlist     |   7 +
 .../test-disarm-maintenance2/hardware_status  |   5 +
 src/test/test-disarm-maintenance2/log.expect  |  78 +++++++
 .../test-disarm-maintenance2/manager_status   |   1 +
 .../test-disarm-maintenance2/service_config   |   5 +
 src/test/test-disarm-maintenance3/cmdlist     |   8 +
 .../test-disarm-maintenance3/hardware_status  |   5 +
 src/test/test-disarm-maintenance3/log.expect  |  80 +++++++
 .../test-disarm-maintenance3/manager_status   |   1 +
 .../test-disarm-maintenance3/service_config   |   5 +
 src/test/test-disarm-relocate1/README         |   3 +
 src/test/test-disarm-relocate1/cmdlist        |   7 +
 .../test-disarm-relocate1/hardware_status     |   5 +
 src/test/test-disarm-relocate1/log.expect     |  51 +++++
 src/test/test-disarm-relocate1/manager_status |   1 +
 src/test/test-disarm-relocate1/service_config |   4 +
 src/test/test-manual-migrate-ignored1/cmdlist |   7 +
 .../hardware_status                           |   5 +
 .../test-manual-migrate-ignored1/log.expect   |  44 ++++
 .../manager_status                            |   1 +
 .../service_config                            |   5 +
 71 files changed, 1481 insertions(+), 34 deletions(-)
 create mode 100644 src/test/test-disarm-crm-stop1/README
 create mode 100644 src/test/test-disarm-crm-stop1/cmdlist
 create mode 100644 src/test/test-disarm-crm-stop1/hardware_status
 create mode 100644 src/test/test-disarm-crm-stop1/log.expect
 create mode 100644 src/test/test-disarm-crm-stop1/manager_status
 create mode 100644 src/test/test-disarm-crm-stop1/service_config
 create mode 100644 src/test/test-disarm-double1/cmdlist
 create mode 100644 src/test/test-disarm-double1/hardware_status
 create mode 100644 src/test/test-disarm-double1/log.expect
 create mode 100644 src/test/test-disarm-double1/manager_status
 create mode 100644 src/test/test-disarm-double1/service_config
 create mode 100644 src/test/test-disarm-failing-service1/cmdlist
 create mode 100644 src/test/test-disarm-failing-service1/hardware_status
 create mode 100644 src/test/test-disarm-failing-service1/log.expect
 create mode 100644 src/test/test-disarm-failing-service1/manager_status
 create mode 100644 src/test/test-disarm-failing-service1/service_config
 create mode 100644 src/test/test-disarm-fence1/cmdlist
 create mode 100644 src/test/test-disarm-fence1/hardware_status
 create mode 100644 src/test/test-disarm-fence1/log.expect
 create mode 100644 src/test/test-disarm-fence1/manager_status
 create mode 100644 src/test/test-disarm-fence1/service_config
 create mode 100644 src/test/test-disarm-frozen1/README
 create mode 100644 src/test/test-disarm-frozen1/cmdlist
 create mode 100644 src/test/test-disarm-frozen1/hardware_status
 create mode 100644 src/test/test-disarm-frozen1/log.expect
 create mode 100644 src/test/test-disarm-frozen1/manager_status
 create mode 100644 src/test/test-disarm-frozen1/service_config
 create mode 100644 src/test/test-disarm-ignored1/README
 create mode 100644 src/test/test-disarm-ignored1/cmdlist
 create mode 100644 src/test/test-disarm-ignored1/hardware_status
 create mode 100644 src/test/test-disarm-ignored1/log.expect
 create mode 100644 src/test/test-disarm-ignored1/manager_status
 create mode 100644 src/test/test-disarm-ignored1/service_config
 create mode 100644 src/test/test-disarm-ignored2/cmdlist
 create mode 100644 src/test/test-disarm-ignored2/hardware_status
 create mode 100644 src/test/test-disarm-ignored2/log.expect
 create mode 100644 src/test/test-disarm-ignored2/manager_status
 create mode 100644 src/test/test-disarm-ignored2/service_config
 create mode 100644 src/test/test-disarm-maintenance1/cmdlist
 create mode 100644 src/test/test-disarm-maintenance1/hardware_status
 create mode 100644 src/test/test-disarm-maintenance1/log.expect
 create mode 100644 src/test/test-disarm-maintenance1/manager_status
 create mode 100644 src/test/test-disarm-maintenance1/service_config
 create mode 100644 src/test/test-disarm-maintenance2/cmdlist
 create mode 100644 src/test/test-disarm-maintenance2/hardware_status
 create mode 100644 src/test/test-disarm-maintenance2/log.expect
 create mode 100644 src/test/test-disarm-maintenance2/manager_status
 create mode 100644 src/test/test-disarm-maintenance2/service_config
 create mode 100644 src/test/test-disarm-maintenance3/cmdlist
 create mode 100644 src/test/test-disarm-maintenance3/hardware_status
 create mode 100644 src/test/test-disarm-maintenance3/log.expect
 create mode 100644 src/test/test-disarm-maintenance3/manager_status
 create mode 100644 src/test/test-disarm-maintenance3/service_config
 create mode 100644 src/test/test-disarm-relocate1/README
 create mode 100644 src/test/test-disarm-relocate1/cmdlist
 create mode 100644 src/test/test-disarm-relocate1/hardware_status
 create mode 100644 src/test/test-disarm-relocate1/log.expect
 create mode 100644 src/test/test-disarm-relocate1/manager_status
 create mode 100644 src/test/test-disarm-relocate1/service_config
 create mode 100644 src/test/test-manual-migrate-ignored1/cmdlist
 create mode 100644 src/test/test-manual-migrate-ignored1/hardware_status
 create mode 100644 src/test/test-manual-migrate-ignored1/log.expect
 create mode 100644 src/test/test-manual-migrate-ignored1/manager_status
 create mode 100644 src/test/test-manual-migrate-ignored1/service_config

-- 
2.47.3






Thread overview: 13+ messages
2026-03-21 23:42 Thomas Lamprecht [this message]
2026-03-21 23:42 ` [PATCH ha-manager v2 1/4] sim: hardware: add manual-migrate command for ignored services Thomas Lamprecht
2026-03-21 23:42 ` [PATCH ha-manager v2 2/4] api: status: add fencing status entry with armed/standby state Thomas Lamprecht
2026-03-21 23:42 ` [PATCH ha-manager v2 3/4] fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance Thomas Lamprecht
2026-03-23 13:04   ` Dominik Rusovac
2026-03-25 15:50   ` Fiona Ebner
2026-03-27  1:17     ` Thomas Lamprecht
2026-03-26 16:02   ` Daniel Kral
2026-03-26 23:15     ` Thomas Lamprecht
2026-03-27 10:21       ` Daniel Kral
2026-03-21 23:42 ` [PATCH ha-manager v2 4/4] api: status: add disarm-ha and arm-ha endpoints and CLI wiring Thomas Lamprecht
2026-03-23 13:05 ` [PATCH ha-manager v2 0/4] fix #2751: implement disarm/arm HA for safer cluster maintenance Dominik Rusovac
2026-03-25 12:06 ` applied: " Thomas Lamprecht
