From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Lamprecht
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager 0/3] fix #2751: implement disarm/arm HA for safer cluster maintenance
Date: Mon, 9 Mar 2026 22:57:07 +0100
Message-ID: <20260309220128.973793-1-t.lamprecht@proxmox.com>
X-Mailer: git-send-email 2.47.3
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-MailFrom: t.lamprecht@proxmox.com
List-Id: Proxmox VE development discussion

Implements a new pair of crm-commands to disarm or arm the HA stack and
also integrates the fencing state into the HA status API endpoint.

Note that watchdog-mux still keeps the underlying /dev/watchdog open in
any case: not every watchdog supports being closed gracefully via the
magic "V" byte (some trigger a reset in that case), and we also want to
ensure that nobody else can grab the watchdog device. A hang of
watchdog-mux itself is basically impossible on any system, though, as it
does zero disk IO and has trivial runtime needs.

Dominik R. recently wanted to pick this up. As I had already started a
WIP years ago, before some PVE release, and finishing it then got lost in
time, I now dug that out, polished it and tested it a bit. It would still
be good to recheck the approach and, naturally, the whole implementation.

Btw., I also did some polishing of watchdog-mux, mostly to make its
behavior clearer, something I noticed while testing these patches.
Basically, it now logs when clients connect or gracefully disconnect and
when all clients got disconnected. Further, it now queries the PID of the
connecting client, saves it and includes it in log messages, which makes
it easier to trace issues back to the respective HA daemon. Those patches
got applied already; I am just mentioning them for awareness.

TBD:
- more in-depth (real-world!) testing
- UI integration
- docs (got something started here, but can be finished once this is
  finalized)
- maybe some more polishing
- ...?

Should I become unresponsive here, feel free to take this over (fine to
either build/fix on top or become co-author); I just wanted to get out
what I have to avoid that this nice UX feature misses yet another
release.

Thomas Lamprecht (3):
  api: status: add fencing status entry with armed/standby state
  fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
  api: status: add disarm-ha and arm-ha endpoints and CLI wiring

 src/PVE/API2/HA/Status.pm                     | 142 +++++++++++++++++-
 src/PVE/CLI/ha_manager.pm                     |   2 +
 src/PVE/HA/CRM.pm                             |  33 +++-
 src/PVE/HA/Config.pm                          |   5 +
 src/PVE/HA/LRM.pm                             |  31 +++-
 src/PVE/HA/Manager.pm                         | 124 ++++++++++++++-
 src/PVE/HA/Sim/Hardware.pm                    |   4 +
 src/test/test-disarm-crm-stop1/README         |  13 ++
 src/test/test-disarm-crm-stop1/cmdlist        |   6 +
 .../test-disarm-crm-stop1/hardware_status     |   5 +
 src/test/test-disarm-crm-stop1/log.expect     |  66 ++++++++
 src/test/test-disarm-crm-stop1/manager_status |   1 +
 src/test/test-disarm-crm-stop1/service_config |   5 +
 src/test/test-disarm-fence1/cmdlist           |   9 ++
 src/test/test-disarm-fence1/hardware_status   |   5 +
 src/test/test-disarm-fence1/log.expect        |  78 ++++++++++
 src/test/test-disarm-fence1/manager_status    |   1 +
 src/test/test-disarm-fence1/service_config    |   5 +
 src/test/test-disarm-frozen1/README           |  10 ++
 src/test/test-disarm-frozen1/cmdlist          |   5 +
 src/test/test-disarm-frozen1/hardware_status  |   5 +
 src/test/test-disarm-frozen1/log.expect       |  59 ++++++++
 src/test/test-disarm-frozen1/manager_status   |   1 +
 src/test/test-disarm-frozen1/service_config   |   5 +
 src/test/test-disarm-ignored1/README          |  10 ++
 src/test/test-disarm-ignored1/cmdlist         |   5 +
 src/test/test-disarm-ignored1/hardware_status |   5 +
 src/test/test-disarm-ignored1/log.expect      |  60 ++++++++
 src/test/test-disarm-ignored1/manager_status  |   1 +
 src/test/test-disarm-ignored1/service_config  |   5 +
 src/test/test-disarm-maintenance1/cmdlist     |   7 +
 .../test-disarm-maintenance1/hardware_status  |   5 +
 src/test/test-disarm-maintenance1/log.expect  |  79 ++++++++++
 .../test-disarm-maintenance1/manager_status   |   1 +
 .../test-disarm-maintenance1/service_config   |   5 +
 src/test/test-disarm-relocate1/README         |   3 +
 src/test/test-disarm-relocate1/cmdlist        |   7 +
 .../test-disarm-relocate1/hardware_status     |   5 +
 src/test/test-disarm-relocate1/log.expect     |  51 +++++++
 src/test/test-disarm-relocate1/manager_status |   1 +
 src/test/test-disarm-relocate1/service_config |   4 +
 41 files changed, 861 insertions(+), 13 deletions(-)
 create mode 100644 src/test/test-disarm-crm-stop1/README
 create mode 100644 src/test/test-disarm-crm-stop1/cmdlist
 create mode 100644 src/test/test-disarm-crm-stop1/hardware_status
 create mode 100644 src/test/test-disarm-crm-stop1/log.expect
 create mode 100644 src/test/test-disarm-crm-stop1/manager_status
 create mode 100644 src/test/test-disarm-crm-stop1/service_config
 create mode 100644 src/test/test-disarm-fence1/cmdlist
 create mode 100644 src/test/test-disarm-fence1/hardware_status
 create mode 100644 src/test/test-disarm-fence1/log.expect
 create mode 100644 src/test/test-disarm-fence1/manager_status
 create mode 100644 src/test/test-disarm-fence1/service_config
 create mode 100644 src/test/test-disarm-frozen1/README
 create mode 100644 src/test/test-disarm-frozen1/cmdlist
 create mode 100644 src/test/test-disarm-frozen1/hardware_status
 create mode 100644 src/test/test-disarm-frozen1/log.expect
 create mode 100644 src/test/test-disarm-frozen1/manager_status
 create mode 100644 src/test/test-disarm-frozen1/service_config
 create mode 100644 src/test/test-disarm-ignored1/README
 create mode 100644 src/test/test-disarm-ignored1/cmdlist
 create mode 100644 src/test/test-disarm-ignored1/hardware_status
 create mode 100644 src/test/test-disarm-ignored1/log.expect
 create mode 100644 src/test/test-disarm-ignored1/manager_status
 create mode 100644 src/test/test-disarm-ignored1/service_config
 create mode 100644 src/test/test-disarm-maintenance1/cmdlist
 create mode 100644 src/test/test-disarm-maintenance1/hardware_status
 create mode 100644 src/test/test-disarm-maintenance1/log.expect
 create mode 100644 src/test/test-disarm-maintenance1/manager_status
 create mode 100644 src/test/test-disarm-maintenance1/service_config
 create mode 100644 src/test/test-disarm-relocate1/README
 create mode 100644 src/test/test-disarm-relocate1/cmdlist
 create mode 100644 src/test/test-disarm-relocate1/hardware_status
 create mode 100644 src/test/test-disarm-relocate1/log.expect
 create mode 100644 src/test/test-disarm-relocate1/manager_status
 create mode 100644 src/test/test-disarm-relocate1/service_config

-- 
2.47.3