From: "Daniel Kral" <d.kral@proxmox.com>
To: "Thomas Lamprecht"
Subject: Re: [PATCH ha-manager v2 3/4] fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
Date: Fri, 27 Mar 2026 11:21:12 +0100
References: <20260321234350.2158438-1-t.lamprecht@proxmox.com> <20260321234350.2158438-4-t.lamprecht@proxmox.com>
List-Id: Proxmox VE development discussion

On Fri Mar 27, 2026 at 12:15 AM CET, Thomas Lamprecht wrote:
> On 26.03.26 at 17:02, Daniel Kral wrote:
>> Two other possible cases of 'implicitly transient' states might be:
>>
>> - adding a HA rule, which makes a HA resource in 'started' state be put
>>   in 'migrate' to another node when processing the select_service_node()
>>   in next_state_started().
>>
>> - the node of a HA resource is offline delayed in the same round as the
>>   disarm request. If none of the HA resources are in a transient state
>>   yet, the disarm request goes through, otherwise the affected HA
>>   resources might be put in 'fence'.
>>
>>
>> I haven't thought this through fully, but an option might be that we
>> only allow the FSM processing of the HA resources, which are in one of
>> these 4 transient states and don't process the others.
>>
>> E.g. breaking out the FSM transition loop into its own function and in
>> normal operation we iterate through all services in $ss, but for
>> deferred disarming we only iterate through the HA resources in transient
>> states, which should be resolved.
>
> I pushed a follow-up [0] that should deal with this, another look at
> that would be appreciated!
> FWIW, I mostly pushed directly as I still wanted to do a bump+test upload
> today, because if everything is good we get a small regression fix faster
> out to users, if not we can follow-up here and no harm done.
>
> [0]: https://git.proxmox.com/?p=pve-ha-manager.git;a=commitdiff;h=b6b025a268032ff5302bede1f5eb56247af13f21
>
> [...]
>

Thanks for the quick patch!

The changes look good to me too, and the test case captures the behavior well!
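
For illustration only, the idea quoted above (process every service in normal operation, but only the ones in transient states while a disarm is deferred) could be sketched roughly as follows. This is a minimal Python sketch, not the code from the follow-up commit and not the ha-manager API: `process_services`, `can_disarm`, the `step` callback and the shape of `ss` are all hypothetical, and the set of transient states is passed in by the caller because the four states are not listed in this mail.

```python
from typing import Callable, Dict, List, Set

# Hypothetical shape of the service status map: sid -> {"state": ...}
ServiceStatus = Dict[str, Dict[str, str]]


def process_services(
    ss: ServiceStatus,
    step: Callable[[str, Dict[str, str]], None],
    disarm_pending: bool,
    transient_states: Set[str],
) -> List[str]:
    """Run one FSM round and return the service IDs that were processed."""
    processed = []
    for sid in sorted(ss):
        sd = ss[sid]
        # While a disarm is deferred, leave settled services untouched so
        # that no new transient state can be entered in this round.
        if disarm_pending and sd["state"] not in transient_states:
            continue
        step(sid, sd)
        processed.append(sid)
    return processed


def can_disarm(ss: ServiceStatus, transient_states: Set[str]) -> bool:
    """The deferred disarm may go through once nothing is transient."""
    return all(sd["state"] not in transient_states for sd in ss.values())


# Usage with an assumed transient set built from states named in the thread.
transient = {"migrate", "fence"}
ss = {
    "vm:100": {"state": "started"},
    "vm:101": {"state": "migrate"},
}


def finish_migration(sid: str, sd: Dict[str, str]) -> None:
    # Stand-in for a real state handler: a migration simply completes.
    if sd["state"] == "migrate":
        sd["state"] = "started"


processed = process_services(ss, finish_migration, True, transient)
ready = can_disarm(ss, transient)
```

With a disarm pending, only `vm:101` is stepped in this round; `vm:100` stays in 'started' and cannot be pushed into a new transient state by, for example, a freshly added rule.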