From: Fiona Ebner <f.ebner@proxmox.com>
To: Daniel Kral <d.kral@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed
Date: Tue, 19 May 2026 18:00:31 +0200 [thread overview]
Message-ID: <5978c036-c864-413c-a4a7-6febe1b7f2b3@proxmox.com> (raw)
In-Reply-To: <20260519143842.382324-3-d.kral@proxmox.com>
Am 19.05.26 um 4:39 PM schrieb Daniel Kral:
> If there are HA resources, which are in transient states that defer the
> disarming process, but their LRMs are already in idle state and disarmed
> mode, these LRMs will not properly resolve the transient states of these
> HA resources as assumed by the HA Manager.
>
> For HA resources, which are still moving, this makes the HA Manager
> stuck in a loop, which tries to defer the disarming process to wait for
> a LRM response for these moving HA resources, which will never come as
> the LRM is idle.
>
> Therefore allow the LRM to become active in disarm mode if there are any
> HA resources on the LRM's node, which are in any of these transient
> states, and make sure that the LRM only processes the disarm-deferring
> HA resources while the LRM is active.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> src/PVE/HA/LRM.pm | 19 ++++++++++-
> src/PVE/HA/Manager.pm | 8 ++---
> src/PVE/HA/Tools.pm | 17 ++++++++++
> src/test/test-disarm-idle-lrm1/log.expect | 37 ++++++---------------
> src/test/test-disarm-idle-lrm2/log.expect | 39 +++++++----------------
> 5 files changed, 58 insertions(+), 62 deletions(-)
>
> diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
> index 426982cc..9100d611 100644
> --- a/src/PVE/HA/LRM.pm
> +++ b/src/PVE/HA/LRM.pm
> @@ -312,6 +312,18 @@ sub active_service_count {
> return PVE::HA::Tools::count_active_services($ss, $nodename);
> }
>
> +# returns a truthy value if there are HA resources in transient states, which
> +# need to be resolved, e.g. to complete the disarm procedure.
> +sub has_disarm_deferred_services {
Nit: I feel like the variables and functions should rather be named
disarm_deferring rather than disarm_deferred
> + my ($self) = @_;
> +
> + my $ss = $self->{service_status};
> + my $nodename = $self->{haenv}->nodename();
> + my $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss, $nodename);
> +
> + return %$deferred_sids;
> +}
> +
> my $wrote_lrm_status_at_startup = 0;
>
> sub do_one_iteration {
> @@ -371,7 +383,7 @@ sub work {
>
> my $service_count = $self->active_service_count();
>
> - if ($self->{mode} eq 'disarm') {
> + if ($self->{mode} eq 'disarm' && !$self->has_disarm_deferred_services()) {
> # stay idle while disarmed, don't acquire lock
> } elsif (!$fence_request && $service_count && $haenv->quorate()) {
> if ($self->get_protected_ha_agent_lock()) {
> @@ -709,12 +721,17 @@ sub manage_resources {
> my $nodename = $haenv->nodename();
>
> my $ss = $self->{service_status};
> + my $deferred_sids;
Nit: Here, a full $disarm_deferring_sids would provide the most context.
> + $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss, $nodename)
> + if $self->{mode} eq 'disarm';
>
> foreach my $sid (keys %{ $self->{restart_tries} }) {
> delete $self->{restart_tries}->{$sid} if !$ss->{$sid};
> }
>
> foreach my $sid (keys %$ss) {
> + next if $deferred_sids && !$deferred_sids->{$sid};
> +
> my $sd = $ss->{$sid};
> next if !$sd->{node} || !$sd->{uid};
> next if $sd->{node} ne $nodename;
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 9b901c4f..a2baf349 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -929,15 +929,13 @@ sub handle_disarm {
> }
>
> # defer disarm if any services are in a transient state that needs the state machine to resolve
> - my $deferred_sids = {};
> - for my $sid (sort keys %$ss) {
> + my $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss);
> + for my $sid (sort keys %$deferred_sids) {
> my $state = $ss->{$sid}->{state};
> if ($state eq 'fence' || $state eq 'recovery') {
> $haenv->log('warning', "deferring disarm - service '$sid' is in '$state' state");
> - $deferred_sids->{$sid} = 1;
> - } elsif ($state eq 'migrate' || $state eq 'relocate') {
> + } else {
> $haenv->log('info', "deferring disarm - service '$sid' is in '$state' state");
> - $deferred_sids->{$sid} = 1;
> }
> }
>
> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
> index 26629fb5..37b27e11 100644
> --- a/src/PVE/HA/Tools.pm
> +++ b/src/PVE/HA/Tools.pm
> @@ -213,6 +213,23 @@ sub count_active_services {
> return $active_count;
> }
>
> +sub get_disarm_deferred_services {
> + my ($ss, $node) = @_;
> +
> + my $deferred_sids = {};
> + my @deferrable_states = qw(fence recovery migrate relocate);
Nit: disarm_deferring_states
> +
> + for my $sid (keys %$ss) {
> + my ($state, $current_node, $target_node) = $ss->{$sid}->@{qw(state node target)};
> +
> + next if $node && (!$current_node || $current_node ne $node);
Just wondering: when does !$current_node happen?
> +
> + $deferred_sids->{$sid} = 1 if grep { $state eq $_ } @deferrable_states;
> + }
> +
> + return $deferred_sids;
> +}
> +
> sub get_verbose_service_state {
> my ($service_state, $service_conf) = @_;
>
next prev parent reply other threads:[~2026-05-19 16:01 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-19 14:38 [PATCH-SERIES ha-manager 0/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 14:38 ` [PATCH ha-manager 1/2] test: add disarm test cases for idle lrms with transient ha resources Daniel Kral
2026-05-19 14:38 ` [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 16:00 ` Fiona Ebner [this message]
2026-05-19 14:47 ` [PATCH-SERIES ha-manager 0/2] " Daniel Kral
2026-05-19 16:00 ` Fiona Ebner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5978c036-c864-413c-a4a7-6febe1b7f2b3@proxmox.com \
--to=f.ebner@proxmox.com \
--cc=d.kral@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox