From: "Daniel Kral"
To: "Fiona Ebner"
Subject: Re: [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed
Date: Wed, 20 May 2026 08:53:40 +0200
References: <20260519143842.382324-1-d.kral@proxmox.com> <20260519143842.382324-3-d.kral@proxmox.com> <5978c036-c864-413c-a4a7-6febe1b7f2b3@proxmox.com>
In-Reply-To: <5978c036-c864-413c-a4a7-6febe1b7f2b3@proxmox.com>

On Tue May 19, 2026 at 6:00 PM CEST, Fiona Ebner wrote:
> On 19.05.26 at 4:39 PM, Daniel Kral wrote:
>> If there are HA resources, which are in transient states that defer the
>> disarming process, but their LRMs are already in idle state and disarmed
>> mode, these LRMs will not properly resolve the transient states of these
>> HA resources as assumed by the HA Manager.
>>
>> For HA resources, which are still moving, this makes the HA Manager
>> stuck in a loop, which tries to defer the disarming process to wait for
>> a LRM response for these moving HA resources, which will never come as
>> the LRM is idle.
>>
>> Therefore allow the LRM to become active in disarm mode if there are any
>> HA resources on the LRM's node, which are in any of these transient
>> states, and make sure that the LRM only processes the disarm-deferring
>> HA resources while the LRM is active.
>>
>> Signed-off-by: Daniel Kral
>> ---
>>  src/PVE/HA/LRM.pm                         | 19 ++++++++++-
>>  src/PVE/HA/Manager.pm                     |  8 ++---
>>  src/PVE/HA/Tools.pm                       | 17 ++++++++++
>>  src/test/test-disarm-idle-lrm1/log.expect | 37 ++++++---------------
>>  src/test/test-disarm-idle-lrm2/log.expect | 39 +++++++----------------
>>  5 files changed, 58 insertions(+), 62 deletions(-)
>>
>> diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
>> index 426982cc..9100d611 100644
>> --- a/src/PVE/HA/LRM.pm
>> +++ b/src/PVE/HA/LRM.pm
>> @@ -312,6 +312,18 @@ sub active_service_count {
>>      return PVE::HA::Tools::count_active_services($ss, $nodename);
>>  }
>>
>> +# returns a truthy value if there are HA resources in transient states, which
>> +# need to be resolved, e.g. to complete the disarm procedure.
>> +sub has_disarm_deferred_services {
>
> Nit: I feel like the variables and functions should rather be named
> disarm_deferring rather than disarm_deferred

Thanks for the review, agree with all the renames!

[snip]

>> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
>> index 26629fb5..37b27e11 100644
>> --- a/src/PVE/HA/Tools.pm
>> +++ b/src/PVE/HA/Tools.pm
>> @@ -213,6 +213,23 @@ sub count_active_services {
>>      return $active_count;
>>  }
>>
>> +sub get_disarm_deferred_services {
>> +    my ($ss, $node) = @_;
>> +
>> +    my $deferred_sids = {};
>> +    my @deferrable_states = qw(fence recovery migrate relocate);
>
> Nit: disarm_deferring_states
>
>> +
>> +    for my $sid (keys %$ss) {
>> +        my ($state, $current_node, $target_node) = $ss->{$sid}->@{qw(state node target)};
>> +
>> +        next if $node && (!$current_node || $current_node ne $node);
>
> Just wondering: when does !$current_node happen?

AFAIK the only case where this can currently happen is if the HA
resource's guest doesn't exist in the cluster anymore according to the
pmxcfs' vmlist and isn't removed by the HA Manager anymore (as is the
case when the HA stack is in disarm mode).
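
To illustrate the !$current_node case, here is a minimal standalone
sketch of the helper with the suggested renames applied. It is only an
approximation for discussion, not the final patch: the name
get_disarm_deferring_services, the sample service status hash and the
node names are assumptions, and it needs Perl 5.24 or newer for the
postfix slice syntax.

```perl
use strict;
use warnings;

# Standalone copy of the helper from the quoted hunk, with the renames from
# the review applied (assumed names). No dependency on PVE::HA::Tools.
sub get_disarm_deferring_services {
    my ($ss, $node) = @_;

    my $deferring_sids = {};
    my @disarm_deferring_states = qw(fence recovery migrate relocate);

    for my $sid (keys %$ss) {
        my ($state, $current_node) = $ss->{$sid}->@{qw(state node)};

        # when filtering by node, skip entries without a node or on another node
        next if $node && (!$current_node || $current_node ne $node);

        $deferring_sids->{$sid} = 1 if grep { $state eq $_ } @disarm_deferring_states;
    }

    return $deferring_sids;
}

# Hypothetical service status: 'vm:103' has no node set, as could happen if
# its guest no longer shows up in the vmlist.
my $ss = {
    'vm:101' => { state => 'migrate', node => 'node1', target => 'node2' },
    'vm:102' => { state => 'started', node => 'node1' },
    'vm:103' => { state => 'migrate' },
    'vm:104' => { state => 'relocate', node => 'node2', target => 'node1' },
};

my $on_node1 = get_disarm_deferring_services($ss, 'node1');
my $all = get_disarm_deferring_services($ss);

print join(',', sort keys %$on_node1), "\n";
print join(',', sort keys %$all), "\n";
```

With a node filter, the entry without a node is skipped instead of
triggering an "uninitialized value" warning in the ne comparison;
without a filter, it is still reported as disarm-deferring.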
>
>> +
>> +        $deferred_sids->{$sid} = 1 if grep { $state eq $_ } @deferrable_states;
>> +    }
>> +
>> +    return $deferred_sids;
>> +}
>> +
>>  sub get_verbose_service_state {
>>      my ($service_state, $service_conf) = @_;
>>