public inbox for pve-devel@lists.proxmox.com
* [PATCH ha-manager 0/3] fix #2751: implement disarm/arm HA for safer cluster maintenance
@ 2026-03-09 21:57 Thomas Lamprecht
From: Thomas Lamprecht @ 2026-03-09 21:57 UTC
  To: pve-devel

Implements a new pair of crm-commands to disarm or arm the HA stack and
also integrates the fencing state into the HA status API endpoint.

Note that watchdog-mux will still keep the underlying /dev/watchdog open
in any case. Not every watchdog supports being closed gracefully via the
magic "V" byte, i.e., some trigger a reset in such cases, and we also
want to ensure that nobody else can grab the watchdog device. A hang of
watchdog-mux itself is basically impossible on any system, though: it
does zero disk IO and has trivial runtime needs.
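
For readers unfamiliar with the magic close: with the Linux watchdog
API, a driver that supports it only stops the timer on close() if the
last byte written before closing was "V" (and nowayout is not set);
otherwise the timer keeps running and the host resets once it expires.
Below is a minimal sketch of that sequence. The helper name magic_close
is made up for illustration and is not taken from watchdog-mux, which
deliberately keeps the device open instead.

```python
import os

# Magic byte defined by the Linux watchdog API for a graceful close.
MAGIC_CLOSE = b"V"


def magic_close(fd: int) -> None:
    """Write the magic byte to an open descriptor, then close it.

    Hypothetical helper: on a real /dev/watchdog descriptor this asks the
    driver to stop the timer, provided the driver supports magic close.
    """
    try:
        os.write(fd, MAGIC_CLOSE)
    finally:
        # Always release the descriptor, even if the write failed.
        os.close(fd)
```

The function takes a plain file descriptor so that it can be tried out
against a pipe or a regular file rather than a real watchdog device.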

Dominik R. wanted to pick this up recently. As I had already started a
WIP years ago, before some PVE release, and finishing it then got lost
in time, I now dug that out, polished it and tested it a bit. It would
still be good to recheck the approach and, naturally, also the whole
implementation.

btw. I also did some polishing of watchdog-mux, mostly to make its
behavior clearer, something I noticed when testing these patches.
Basically, it now logs when clients connect or gracefully disconnect and
when all clients have disconnected. Further, it now queries the PID of
the connecting client, saves it and includes it in log messages, which
makes it easier to trace issues back to the respective HA daemon. Those
patches are already applied; I mention them here for awareness.
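
The PID lookup described above could look roughly like the following on
Linux. This is a sketch only: it assumes the peer credentials are read
with SO_PEERCRED on a connected Unix socket, which this cover letter
does not state, and peer_pid is a hypothetical helper name.

```python
import socket
import struct

# Layout of struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid.
UCRED_FORMAT = "iII"


def peer_pid(conn: socket.socket) -> int:
    """Return the PID recorded for the peer of a connected Unix socket.

    Linux-only: SO_PEERCRED is not available on other platforms.
    """
    raw = conn.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    pid, _uid, _gid = struct.unpack(UCRED_FORMAT, raw)
    return pid


if __name__ == "__main__":
    # A socketpair stands in for an accepted client connection here.
    server_side, client_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    print(f"client connected, pid {peer_pid(server_side)}")
    server_side.close()
    client_side.close()
```

The kernel fills in these credentials at connect time, so a client
cannot spoof them, which is what makes the PID useful in log messages.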

TBD:
- more in-depth (real-world!) testing
- UI integration
- docs (got something started here, but can be finished once this is
  finalized)
- maybe some more polishing
- ...?

Should I become unresponsive here, feel free to take this over
(fine to either build/fix on top or become co-author). I just wanted to
get out what I have, so that this nice UX feature does not miss yet
another release.

Thomas Lamprecht (3):
  api: status: add fencing status entry with armed/standby state
  fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
  api: status: add disarm-ha and arm-ha endpoints and CLI wiring

 src/PVE/API2/HA/Status.pm                     | 142 +++++++++++++++++-
 src/PVE/CLI/ha_manager.pm                     |   2 +
 src/PVE/HA/CRM.pm                             |  33 +++-
 src/PVE/HA/Config.pm                          |   5 +
 src/PVE/HA/LRM.pm                             |  31 +++-
 src/PVE/HA/Manager.pm                         | 124 ++++++++++++++-
 src/PVE/HA/Sim/Hardware.pm                    |   4 +
 src/test/test-disarm-crm-stop1/README         |  13 ++
 src/test/test-disarm-crm-stop1/cmdlist        |   6 +
 .../test-disarm-crm-stop1/hardware_status     |   5 +
 src/test/test-disarm-crm-stop1/log.expect     |  66 ++++++++
 src/test/test-disarm-crm-stop1/manager_status |   1 +
 src/test/test-disarm-crm-stop1/service_config |   5 +
 src/test/test-disarm-fence1/cmdlist           |   9 ++
 src/test/test-disarm-fence1/hardware_status   |   5 +
 src/test/test-disarm-fence1/log.expect        |  78 ++++++++++
 src/test/test-disarm-fence1/manager_status    |   1 +
 src/test/test-disarm-fence1/service_config    |   5 +
 src/test/test-disarm-frozen1/README           |  10 ++
 src/test/test-disarm-frozen1/cmdlist          |   5 +
 src/test/test-disarm-frozen1/hardware_status  |   5 +
 src/test/test-disarm-frozen1/log.expect       |  59 ++++++++
 src/test/test-disarm-frozen1/manager_status   |   1 +
 src/test/test-disarm-frozen1/service_config   |   5 +
 src/test/test-disarm-ignored1/README          |  10 ++
 src/test/test-disarm-ignored1/cmdlist         |   5 +
 src/test/test-disarm-ignored1/hardware_status |   5 +
 src/test/test-disarm-ignored1/log.expect      |  60 ++++++++
 src/test/test-disarm-ignored1/manager_status  |   1 +
 src/test/test-disarm-ignored1/service_config  |   5 +
 src/test/test-disarm-maintenance1/cmdlist     |   7 +
 .../test-disarm-maintenance1/hardware_status  |   5 +
 src/test/test-disarm-maintenance1/log.expect  |  79 ++++++++++
 .../test-disarm-maintenance1/manager_status   |   1 +
 .../test-disarm-maintenance1/service_config   |   5 +
 src/test/test-disarm-relocate1/README         |   3 +
 src/test/test-disarm-relocate1/cmdlist        |   7 +
 .../test-disarm-relocate1/hardware_status     |   5 +
 src/test/test-disarm-relocate1/log.expect     |  51 +++++++
 src/test/test-disarm-relocate1/manager_status |   1 +
 src/test/test-disarm-relocate1/service_config |   4 +
 41 files changed, 861 insertions(+), 13 deletions(-)
 create mode 100644 src/test/test-disarm-crm-stop1/README
 create mode 100644 src/test/test-disarm-crm-stop1/cmdlist
 create mode 100644 src/test/test-disarm-crm-stop1/hardware_status
 create mode 100644 src/test/test-disarm-crm-stop1/log.expect
 create mode 100644 src/test/test-disarm-crm-stop1/manager_status
 create mode 100644 src/test/test-disarm-crm-stop1/service_config
 create mode 100644 src/test/test-disarm-fence1/cmdlist
 create mode 100644 src/test/test-disarm-fence1/hardware_status
 create mode 100644 src/test/test-disarm-fence1/log.expect
 create mode 100644 src/test/test-disarm-fence1/manager_status
 create mode 100644 src/test/test-disarm-fence1/service_config
 create mode 100644 src/test/test-disarm-frozen1/README
 create mode 100644 src/test/test-disarm-frozen1/cmdlist
 create mode 100644 src/test/test-disarm-frozen1/hardware_status
 create mode 100644 src/test/test-disarm-frozen1/log.expect
 create mode 100644 src/test/test-disarm-frozen1/manager_status
 create mode 100644 src/test/test-disarm-frozen1/service_config
 create mode 100644 src/test/test-disarm-ignored1/README
 create mode 100644 src/test/test-disarm-ignored1/cmdlist
 create mode 100644 src/test/test-disarm-ignored1/hardware_status
 create mode 100644 src/test/test-disarm-ignored1/log.expect
 create mode 100644 src/test/test-disarm-ignored1/manager_status
 create mode 100644 src/test/test-disarm-ignored1/service_config
 create mode 100644 src/test/test-disarm-maintenance1/cmdlist
 create mode 100644 src/test/test-disarm-maintenance1/hardware_status
 create mode 100644 src/test/test-disarm-maintenance1/log.expect
 create mode 100644 src/test/test-disarm-maintenance1/manager_status
 create mode 100644 src/test/test-disarm-maintenance1/service_config
 create mode 100644 src/test/test-disarm-relocate1/README
 create mode 100644 src/test/test-disarm-relocate1/cmdlist
 create mode 100644 src/test/test-disarm-relocate1/hardware_status
 create mode 100644 src/test/test-disarm-relocate1/log.expect
 create mode 100644 src/test/test-disarm-relocate1/manager_status
 create mode 100644 src/test/test-disarm-relocate1/service_config

-- 
2.47.3




