From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager 0/3] fix #2751: implement disarm/arm HA for safer cluster maintenance
Date: Mon, 9 Mar 2026 22:57:07 +0100
Message-ID: <20260309220128.973793-1-t.lamprecht@proxmox.com>
This series implements a new pair of crm-commands to disarm and arm the
HA stack and also exposes the fencing state in the HA status API endpoint.
Note that watchdog-mux still keeps the underlying /dev/watchdog open in
any case: not every watchdog supports being closed gracefully via the
magic "V" byte (some reset in such cases), and we also want to ensure
that nobody else can grab the watchdog device. A hang of watchdog-mux
itself is basically impossible on any system, though, as it does zero
disk IO and has trivial runtime needs.
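For readers unfamiliar with the magic "V" byte mentioned above: the Linux
watchdog driver API documents a "Magic Close" feature, where a supporting
driver only disables the watchdog on close if the character 'V' was written
to the device just before. Below is a minimal sketch of that sequence. It
is not taken from watchdog-mux, and the helper name watchdog_magic_close is
made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from watchdog-mux: try to disarm a Linux
 * watchdog by writing the magic 'V' byte right before closing the device.
 * Returns 0 on success. On a failed write it returns -1 and leaves fd open,
 * so the caller can keep feeding the watchdog instead of risking a reset.
 */
int watchdog_magic_close(int fd)
{
    const char magic = 'V';
    ssize_t n;

    assert(fd >= 0);

    /* Retry if a signal interrupted the write before the byte was sent. */
    do {
        n = write(fd, &magic, 1);
    } while (n < 0 && errno == EINTR);

    if (n != 1)
        return -1;

    /*
     * Whether the timer really stops depends on the driver and on its
     * "nowayout" setting; this function cannot verify that.
     */
    return close(fd);
}
```

The sketch only shows the happy path. Because the outcome depends on the
driver, never closing the device at all, as described above, avoids the
question entirely.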
Dominik R. wanted to pick this up recently. As I had already started a
WIP years ago, before some PVE release, and finishing it then got lost
over time, I now dug that out, polished it and tested it a bit. It would
still be good to recheck the approach and, naturally, also the whole
implementation.
Btw., I also did some polishing of watchdog-mux, mostly to make its
behavior clearer, something I noticed the need for while testing these
patches. Basically, it now logs when clients connect or gracefully
disconnect and when all clients have disconnected. Further, it now
queries the PID of the connecting client, saves it and includes it in
log messages, which makes it easier to trace issues back to the
respective HA daemon. Those patches were applied already, but I mention
them here for awareness.
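As an illustration of how a daemon can learn the PID of a connecting
client: on Linux, a common way for UNIX domain sockets is the SO_PEERCRED
socket option. This cover letter does not say which mechanism watchdog-mux
uses, so treat the following as a sketch under that assumption. The helper
name peer_pid is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for struct ucred with glibc */
#endif
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the PID of the process on the other end of
 * a connected UNIX domain socket, or -1 on error. The kernel records the
 * credentials at connect(2) or socketpair(2) time, so the value stays the
 * same even if the peer forks or changes its IDs later.
 */
pid_t peer_pid(int sockfd)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);

    assert(sockfd >= 0);

    if (getsockopt(sockfd, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0)
        return -1;
    if (len != sizeof(cred))
        return -1;

    return cred.pid;
}
```

A daemon could call such a helper once right after accept(2) and store the
result next to the client's file descriptor, so that later log messages can
name the client.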
TBD:
- more in-depth (real-world!) testing
- UI integration
- docs (got something started here, but can be finished once this is
  finalized)
- maybe some more polishing
- ...?
Should I become unresponsive here, feel free to take this over
(fine to either build/fix on top or become co-author). I just wanted to
get out what I have so that this nice UX feature does not miss yet
another release.
Thomas Lamprecht (3):
  api: status: add fencing status entry with armed/standby state
  fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
  api: status: add disarm-ha and arm-ha endpoints and CLI wiring
src/PVE/API2/HA/Status.pm | 142 +++++++++++++++++-
src/PVE/CLI/ha_manager.pm | 2 +
src/PVE/HA/CRM.pm | 33 +++-
src/PVE/HA/Config.pm | 5 +
src/PVE/HA/LRM.pm | 31 +++-
src/PVE/HA/Manager.pm | 124 ++++++++++++++-
src/PVE/HA/Sim/Hardware.pm | 4 +
src/test/test-disarm-crm-stop1/README | 13 ++
src/test/test-disarm-crm-stop1/cmdlist | 6 +
.../test-disarm-crm-stop1/hardware_status | 5 +
src/test/test-disarm-crm-stop1/log.expect | 66 ++++++++
src/test/test-disarm-crm-stop1/manager_status | 1 +
src/test/test-disarm-crm-stop1/service_config | 5 +
src/test/test-disarm-fence1/cmdlist | 9 ++
src/test/test-disarm-fence1/hardware_status | 5 +
src/test/test-disarm-fence1/log.expect | 78 ++++++++++
src/test/test-disarm-fence1/manager_status | 1 +
src/test/test-disarm-fence1/service_config | 5 +
src/test/test-disarm-frozen1/README | 10 ++
src/test/test-disarm-frozen1/cmdlist | 5 +
src/test/test-disarm-frozen1/hardware_status | 5 +
src/test/test-disarm-frozen1/log.expect | 59 ++++++++
src/test/test-disarm-frozen1/manager_status | 1 +
src/test/test-disarm-frozen1/service_config | 5 +
src/test/test-disarm-ignored1/README | 10 ++
src/test/test-disarm-ignored1/cmdlist | 5 +
src/test/test-disarm-ignored1/hardware_status | 5 +
src/test/test-disarm-ignored1/log.expect | 60 ++++++++
src/test/test-disarm-ignored1/manager_status | 1 +
src/test/test-disarm-ignored1/service_config | 5 +
src/test/test-disarm-maintenance1/cmdlist | 7 +
.../test-disarm-maintenance1/hardware_status | 5 +
src/test/test-disarm-maintenance1/log.expect | 79 ++++++++++
.../test-disarm-maintenance1/manager_status | 1 +
.../test-disarm-maintenance1/service_config | 5 +
src/test/test-disarm-relocate1/README | 3 +
src/test/test-disarm-relocate1/cmdlist | 7 +
.../test-disarm-relocate1/hardware_status | 5 +
src/test/test-disarm-relocate1/log.expect | 51 +++++++
src/test/test-disarm-relocate1/manager_status | 1 +
src/test/test-disarm-relocate1/service_config | 4 +
41 files changed, 861 insertions(+), 13 deletions(-)
create mode 100644 src/test/test-disarm-crm-stop1/README
create mode 100644 src/test/test-disarm-crm-stop1/cmdlist
create mode 100644 src/test/test-disarm-crm-stop1/hardware_status
create mode 100644 src/test/test-disarm-crm-stop1/log.expect
create mode 100644 src/test/test-disarm-crm-stop1/manager_status
create mode 100644 src/test/test-disarm-crm-stop1/service_config
create mode 100644 src/test/test-disarm-fence1/cmdlist
create mode 100644 src/test/test-disarm-fence1/hardware_status
create mode 100644 src/test/test-disarm-fence1/log.expect
create mode 100644 src/test/test-disarm-fence1/manager_status
create mode 100644 src/test/test-disarm-fence1/service_config
create mode 100644 src/test/test-disarm-frozen1/README
create mode 100644 src/test/test-disarm-frozen1/cmdlist
create mode 100644 src/test/test-disarm-frozen1/hardware_status
create mode 100644 src/test/test-disarm-frozen1/log.expect
create mode 100644 src/test/test-disarm-frozen1/manager_status
create mode 100644 src/test/test-disarm-frozen1/service_config
create mode 100644 src/test/test-disarm-ignored1/README
create mode 100644 src/test/test-disarm-ignored1/cmdlist
create mode 100644 src/test/test-disarm-ignored1/hardware_status
create mode 100644 src/test/test-disarm-ignored1/log.expect
create mode 100644 src/test/test-disarm-ignored1/manager_status
create mode 100644 src/test/test-disarm-ignored1/service_config
create mode 100644 src/test/test-disarm-maintenance1/cmdlist
create mode 100644 src/test/test-disarm-maintenance1/hardware_status
create mode 100644 src/test/test-disarm-maintenance1/log.expect
create mode 100644 src/test/test-disarm-maintenance1/manager_status
create mode 100644 src/test/test-disarm-maintenance1/service_config
create mode 100644 src/test/test-disarm-relocate1/README
create mode 100644 src/test/test-disarm-relocate1/cmdlist
create mode 100644 src/test/test-disarm-relocate1/hardware_status
create mode 100644 src/test/test-disarm-relocate1/log.expect
create mode 100644 src/test/test-disarm-relocate1/manager_status
create mode 100644 src/test/test-disarm-relocate1/service_config
--
2.47.3