From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 0EA6B1FF14B
	for ; Sun, 22 Mar 2026 00:44:25 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 760AD3749B;
	Sun, 22 Mar 2026 00:44:34 +0100 (CET)
From: Thomas Lamprecht
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager v2 0/4] fix #2751: implement disarm/arm HA for
 safer cluster maintenance
Date: Sun, 22 Mar 2026 00:42:49 +0100
Message-ID: <20260321234350.2158438-1-t.lamprecht@proxmox.com>
X-Mailer: git-send-email 2.47.3
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1774136591805
X-SPAM-LEVEL: Spam detection results:  0
	AWL                    -0.010 Adjusted score from AWL reputation of From: address
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING             0.1 Missing DMARC policy
	KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
	SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS               -0.001 SPF: sender matches SPF record
Message-ID-Hash: LU57X5LIAK6LZDINJ65QPXJY5SD3Z5EE
X-Message-ID-Hash: LU57X5LIAK6LZDINJ65QPXJY5SD3Z5EE
X-MailFrom: t.lamprecht@proxmox.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; emergency; member-moderation; nonmember-moderation;
 administrivia; implicit-dest; max-recipients; max-size; news-moderation;
 no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.10
Precedence: list
List-Id: Proxmox VE development discussion

The biggest change compared to v1 is how ignore mode handles the service
status: instead of clearing it entirely, the relevant parts of service
status are now preserved across the disarm/arm cycle.
This allows runtime state like maintenance_node to survive, so services
correctly migrate back to their original node after maintenance ends,
even if the disarm happened while maintenance was active. Thanks
@Dominik R. for noticing this.

To keep the preserved state clean, stale runtime data (failed_nodes, cmd,
target, ...) is pruned from service entries on disarm - both in freeze
and ignore mode - so the state machine starts fresh on re-arm.

The status API overrides the displayed service state to 'ignore' during
disarm-ignore mode, while the internal state stays untouched for seamless
resume.

On arm-ha from ignore mode, the CRM now rechecks each resource's
previously recorded node against the resource service config, picking up
any manual migrations the admin performed while HA tracking was
suspended.

Patch 1/4 is new and adds a manual-migrate simulator command as a
preparatory patch, since it is independently useful for testing the
per-service 'ignored' state handling.

Previous discussion and v1:
https://lore.proxmox.com/pve-devel/20260309220128.973793-1-t.lamprecht@proxmox.com/

TBD:
- some more in-depth (real-world) testing
- UI integration
- maybe some more polishing

changes v1 -> v2:
- ignore mode: preserve relevant service status instead of clearing it,
  recheck node info on arm-ha for manual migrations [Dominik]
- prune stale runtime data from service entries on disarm for both modes
- add 'protected => 1' to both API endpoints [Dominik]
- split out manual-migrate sim command as preparatory patch
- various style, log level, and test improvements (see per-patch
  changelogs for details)

Thomas Lamprecht (4):
  sim: hardware: add manual-migrate command for ignored services
  api: status: add fencing status entry with armed/standby state
  fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
  api: status: add disarm-ha and arm-ha endpoints and CLI wiring

 src/PVE/API2/HA/Status.pm                     | 143 ++++++++++++-
 src/PVE/CLI/ha_manager.pm                     |   2 +
 src/PVE/HA/CRM.pm                             |  33 ++-
 src/PVE/HA/Config.pm                          |   5 +
 src/PVE/HA/LRM.pm                             |  31 ++-
 src/PVE/HA/Manager.pm                         | 197 ++++++++++++++++--
 src/PVE/HA/Sim/Hardware.pm                    |  36 ++++
 src/test/test-disarm-crm-stop1/README         |  13 ++
 src/test/test-disarm-crm-stop1/cmdlist        |   6 +
 .../test-disarm-crm-stop1/hardware_status     |   5 +
 src/test/test-disarm-crm-stop1/log.expect     |  66 ++++++
 src/test/test-disarm-crm-stop1/manager_status |   1 +
 src/test/test-disarm-crm-stop1/service_config |   5 +
 src/test/test-disarm-double1/cmdlist          |   7 +
 src/test/test-disarm-double1/hardware_status  |   5 +
 src/test/test-disarm-double1/log.expect       |  53 +++++
 src/test/test-disarm-double1/manager_status   |   1 +
 src/test/test-disarm-double1/service_config   |   4 +
 src/test/test-disarm-failing-service1/cmdlist |   6 +
 .../hardware_status                           |   5 +
 .../test-disarm-failing-service1/log.expect   | 125 +++++++++++
 .../manager_status                            |   1 +
 .../service_config                            |   4 +
 src/test/test-disarm-fence1/cmdlist           |   9 +
 src/test/test-disarm-fence1/hardware_status   |   5 +
 src/test/test-disarm-fence1/log.expect        |  78 +++++++
 src/test/test-disarm-fence1/manager_status    |   1 +
 src/test/test-disarm-fence1/service_config    |   5 +
 src/test/test-disarm-frozen1/README           |  10 +
 src/test/test-disarm-frozen1/cmdlist          |   5 +
 src/test/test-disarm-frozen1/hardware_status  |   5 +
 src/test/test-disarm-frozen1/log.expect       |  59 ++++++
 src/test/test-disarm-frozen1/manager_status   |   1 +
 src/test/test-disarm-frozen1/service_config   |   5 +
 src/test/test-disarm-ignored1/README          |  10 +
 src/test/test-disarm-ignored1/cmdlist         |   5 +
 src/test/test-disarm-ignored1/hardware_status |   5 +
 src/test/test-disarm-ignored1/log.expect      |  50 +++++
 src/test/test-disarm-ignored1/manager_status  |   1 +
 src/test/test-disarm-ignored1/service_config  |   5 +
 src/test/test-disarm-ignored2/cmdlist         |   6 +
 src/test/test-disarm-ignored2/hardware_status |   5 +
 src/test/test-disarm-ignored2/log.expect      |  60 ++++++
 src/test/test-disarm-ignored2/manager_status  |   1 +
 src/test/test-disarm-ignored2/service_config  |   5 +
 src/test/test-disarm-maintenance1/cmdlist     |   7 +
 .../test-disarm-maintenance1/hardware_status  |   5 +
 src/test/test-disarm-maintenance1/log.expect  |  79 +++++++
 .../test-disarm-maintenance1/manager_status   |   1 +
 .../test-disarm-maintenance1/service_config   |   5 +
 src/test/test-disarm-maintenance2/cmdlist     |   7 +
 .../test-disarm-maintenance2/hardware_status  |   5 +
 src/test/test-disarm-maintenance2/log.expect  |  78 +++++++
 .../test-disarm-maintenance2/manager_status   |   1 +
 .../test-disarm-maintenance2/service_config   |   5 +
 src/test/test-disarm-maintenance3/cmdlist     |   8 +
 .../test-disarm-maintenance3/hardware_status  |   5 +
 src/test/test-disarm-maintenance3/log.expect  |  80 +++++++
 .../test-disarm-maintenance3/manager_status   |   1 +
 .../test-disarm-maintenance3/service_config   |   5 +
 src/test/test-disarm-relocate1/README         |   3 +
 src/test/test-disarm-relocate1/cmdlist        |   7 +
 .../test-disarm-relocate1/hardware_status     |   5 +
 src/test/test-disarm-relocate1/log.expect     |  51 +++++
 src/test/test-disarm-relocate1/manager_status |   1 +
 src/test/test-disarm-relocate1/service_config |   4 +
 src/test/test-manual-migrate-ignored1/cmdlist |   7 +
 .../hardware_status                           |   5 +
 .../test-manual-migrate-ignored1/log.expect   |  44 ++++
 .../manager_status                            |   1 +
 .../service_config                            |   5 +
 71 files changed, 1481 insertions(+), 34 deletions(-)
 create mode 100644 src/test/test-disarm-crm-stop1/README
 create mode 100644 src/test/test-disarm-crm-stop1/cmdlist
 create mode 100644 src/test/test-disarm-crm-stop1/hardware_status
 create mode 100644 src/test/test-disarm-crm-stop1/log.expect
 create mode 100644 src/test/test-disarm-crm-stop1/manager_status
 create mode 100644 src/test/test-disarm-crm-stop1/service_config
 create mode 100644 src/test/test-disarm-double1/cmdlist
 create mode 100644 src/test/test-disarm-double1/hardware_status
 create mode 100644 src/test/test-disarm-double1/log.expect
 create mode 100644 src/test/test-disarm-double1/manager_status
 create mode 100644 src/test/test-disarm-double1/service_config
 create mode 100644 src/test/test-disarm-failing-service1/cmdlist
 create mode 100644 src/test/test-disarm-failing-service1/hardware_status
 create mode 100644 src/test/test-disarm-failing-service1/log.expect
 create mode 100644 src/test/test-disarm-failing-service1/manager_status
 create mode 100644 src/test/test-disarm-failing-service1/service_config
 create mode 100644 src/test/test-disarm-fence1/cmdlist
 create mode 100644 src/test/test-disarm-fence1/hardware_status
 create mode 100644 src/test/test-disarm-fence1/log.expect
 create mode 100644 src/test/test-disarm-fence1/manager_status
 create mode 100644 src/test/test-disarm-fence1/service_config
 create mode 100644 src/test/test-disarm-frozen1/README
 create mode 100644 src/test/test-disarm-frozen1/cmdlist
 create mode 100644 src/test/test-disarm-frozen1/hardware_status
 create mode 100644 src/test/test-disarm-frozen1/log.expect
 create mode 100644 src/test/test-disarm-frozen1/manager_status
 create mode 100644 src/test/test-disarm-frozen1/service_config
 create mode 100644 src/test/test-disarm-ignored1/README
 create mode 100644 src/test/test-disarm-ignored1/cmdlist
 create mode 100644 src/test/test-disarm-ignored1/hardware_status
 create mode 100644 src/test/test-disarm-ignored1/log.expect
 create mode 100644 src/test/test-disarm-ignored1/manager_status
 create mode 100644 src/test/test-disarm-ignored1/service_config
 create mode 100644 src/test/test-disarm-ignored2/cmdlist
 create mode 100644 src/test/test-disarm-ignored2/hardware_status
 create mode 100644 src/test/test-disarm-ignored2/log.expect
 create mode 100644 src/test/test-disarm-ignored2/manager_status
 create mode 100644 src/test/test-disarm-ignored2/service_config
 create mode 100644 src/test/test-disarm-maintenance1/cmdlist
 create mode 100644 src/test/test-disarm-maintenance1/hardware_status
 create mode 100644 src/test/test-disarm-maintenance1/log.expect
 create mode 100644 src/test/test-disarm-maintenance1/manager_status
 create mode 100644 src/test/test-disarm-maintenance1/service_config
 create mode 100644 src/test/test-disarm-maintenance2/cmdlist
 create mode 100644 src/test/test-disarm-maintenance2/hardware_status
 create mode 100644 src/test/test-disarm-maintenance2/log.expect
 create mode 100644 src/test/test-disarm-maintenance2/manager_status
 create mode 100644 src/test/test-disarm-maintenance2/service_config
 create mode 100644 src/test/test-disarm-maintenance3/cmdlist
 create mode 100644 src/test/test-disarm-maintenance3/hardware_status
 create mode 100644 src/test/test-disarm-maintenance3/log.expect
 create mode 100644 src/test/test-disarm-maintenance3/manager_status
 create mode 100644 src/test/test-disarm-maintenance3/service_config
 create mode 100644 src/test/test-disarm-relocate1/README
 create mode 100644 src/test/test-disarm-relocate1/cmdlist
 create mode 100644 src/test/test-disarm-relocate1/hardware_status
 create mode 100644 src/test/test-disarm-relocate1/log.expect
 create mode 100644 src/test/test-disarm-relocate1/manager_status
 create mode 100644 src/test/test-disarm-relocate1/service_config
 create mode 100644 src/test/test-manual-migrate-ignored1/cmdlist
 create mode 100644 src/test/test-manual-migrate-ignored1/hardware_status
 create mode 100644 src/test/test-manual-migrate-ignored1/log.expect
 create mode 100644 src/test/test-manual-migrate-ignored1/manager_status
 create mode 100644 src/test/test-manual-migrate-ignored1/service_config

-- 
2.47.3
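
For readers who want a rough mental model of the v2 ignore-mode semantics
described in the cover letter (prune stale runtime data on disarm,
override only the displayed state, recheck the node on arm-ha), here is a
minimal Python sketch. It is not the ha-manager implementation, which is
Perl and lives in the patches themselves. The keys maintenance_node,
failed_nodes, cmd and target are named in the letter; the 'state' and
'node' keys, the dict layout, and all function names are assumptions made
up for illustration, and the list of pruned keys is incomplete because
the letter elides it with "...".

```python
from copy import deepcopy

# Stale runtime keys named in the cover letter; the full list is elided
# there ("..."), so this tuple is knowingly incomplete.
STALE_KEYS = ("failed_nodes", "cmd", "target")


def disarm(service_status):
    """Return a copy of the service status with stale runtime keys pruned.

    Everything else (for example maintenance_node) is preserved.
    """
    pruned = deepcopy(service_status)
    for entry in pruned.values():
        for key in STALE_KEYS:
            entry.pop(key, None)
    return pruned


def displayed_status(service_status, disarm_mode):
    """Build the status view; the internal state is never modified.

    In the hypothetical 'ignore' disarm mode every service is shown as
    'ignore', mirroring the override described for the status API.
    """
    view = deepcopy(service_status)
    if disarm_mode == "ignore":
        for entry in view.values():
            entry["state"] = "ignore"
    return view


def arm_from_ignore(service_status, service_config):
    """On re-arm, adopt the configured node where it differs.

    This models picking up manual migrations done while HA tracking was
    suspended; 'node' in both dicts is an assumed field name.
    """
    rearmed = deepcopy(service_status)
    for sid, entry in rearmed.items():
        configured = service_config.get(sid, {}).get("node")
        if configured is not None and configured != entry.get("node"):
            entry["node"] = configured
    return rearmed


example_status = {
    "vm:101": {
        "state": "started",
        "node": "node1",
        "maintenance_node": "node2",
        "failed_nodes": ["node3"],
        "cmd": ["migrate", "node1"],
    }
}
```

Calling disarm(example_status) drops failed_nodes and cmd but keeps
maintenance_node, and arm_from_ignore() with a service config that lists
a different node for vm:101 updates the tracked node accordingly. All
three helpers return copies, so the preserved state is left untouched
and can be resumed as-is.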