public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Fiona Ebner <f.ebner@proxmox.com>
To: Daniel Kral <d.kral@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed
Date: Tue, 19 May 2026 18:00:31 +0200	[thread overview]
Message-ID: <5978c036-c864-413c-a4a7-6febe1b7f2b3@proxmox.com> (raw)
In-Reply-To: <20260519143842.382324-3-d.kral@proxmox.com>

Am 19.05.26 um 4:39 PM schrieb Daniel Kral:
> If there are HA resources, which are in transient states that defer the
> disarming process, but their LRMs are already in idle state and disarmed
> mode, these LRMs will not properly resolve the transient states of these
> HA resources as assumed by the HA Manager.
> 
> For HA resources, which are still moving, this makes the HA Manager
> stuck in a loop, which tries to defer the disarming process to wait for
> a LRM response for these moving HA resources, which will never come as
> the LRM is idle.
> 
> Therefore allow the LRM to become active in disarm mode if there are any
> HA resources on the LRM's node, which are in any of these transient
> states, and make sure that the LRM only processes the disarm-deferring
> HA resources while the LRM is active.
> 
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
>  src/PVE/HA/LRM.pm                         | 19 ++++++++++-
>  src/PVE/HA/Manager.pm                     |  8 ++---
>  src/PVE/HA/Tools.pm                       | 17 ++++++++++
>  src/test/test-disarm-idle-lrm1/log.expect | 37 ++++++---------------
>  src/test/test-disarm-idle-lrm2/log.expect | 39 +++++++----------------
>  5 files changed, 58 insertions(+), 62 deletions(-)
> 
> diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
> index 426982cc..9100d611 100644
> --- a/src/PVE/HA/LRM.pm
> +++ b/src/PVE/HA/LRM.pm
> @@ -312,6 +312,18 @@ sub active_service_count {
>      return PVE::HA::Tools::count_active_services($ss, $nodename);
>  }
>  
> +# returns a truthy value if there are HA resources in transient states, which
> +# need to be resolved, e.g. to complete the disarm procedure.
> +sub has_disarm_deferred_services {

Nit: I feel like the variables and functions should rather be named
disarm_deferring rather than disarm_deferred

> +    my ($self) = @_;
> +
> +    my $ss = $self->{service_status};
> +    my $nodename = $self->{haenv}->nodename();
> +    my $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss, $nodename);
> +
> +    return %$deferred_sids;
> +}
> +
>  my $wrote_lrm_status_at_startup = 0;
>  
>  sub do_one_iteration {
> @@ -371,7 +383,7 @@ sub work {
>  
>          my $service_count = $self->active_service_count();
>  
> -        if ($self->{mode} eq 'disarm') {
> +        if ($self->{mode} eq 'disarm' && !$self->has_disarm_deferred_services()) {
>              # stay idle while disarmed, don't acquire lock
>          } elsif (!$fence_request && $service_count && $haenv->quorate()) {
>              if ($self->get_protected_ha_agent_lock()) {
> @@ -709,12 +721,17 @@ sub manage_resources {
>      my $nodename = $haenv->nodename();
>  
>      my $ss = $self->{service_status};
> +    my $deferred_sids;

Nit: Here, a full $disarm_deferring_sids would provide the most context.

> +    $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss, $nodename)
> +        if $self->{mode} eq 'disarm';
>  
>      foreach my $sid (keys %{ $self->{restart_tries} }) {
>          delete $self->{restart_tries}->{$sid} if !$ss->{$sid};
>      }
>  
>      foreach my $sid (keys %$ss) {
> +        next if $deferred_sids && !$deferred_sids->{$sid};
> +
>          my $sd = $ss->{$sid};
>          next if !$sd->{node} || !$sd->{uid};
>          next if $sd->{node} ne $nodename;
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 9b901c4f..a2baf349 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -929,15 +929,13 @@ sub handle_disarm {
>      }
>  
>      # defer disarm if any services are in a transient state that needs the state machine to resolve
> -    my $deferred_sids = {};
> -    for my $sid (sort keys %$ss) {
> +    my $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss);
> +    for my $sid (sort keys %$deferred_sids) {
>          my $state = $ss->{$sid}->{state};
>          if ($state eq 'fence' || $state eq 'recovery') {
>              $haenv->log('warning', "deferring disarm - service '$sid' is in '$state' state");
> -            $deferred_sids->{$sid} = 1;
> -        } elsif ($state eq 'migrate' || $state eq 'relocate') {
> +        } else {
>              $haenv->log('info', "deferring disarm - service '$sid' is in '$state' state");
> -            $deferred_sids->{$sid} = 1;
>          }
>      }
>  
> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
> index 26629fb5..37b27e11 100644
> --- a/src/PVE/HA/Tools.pm
> +++ b/src/PVE/HA/Tools.pm
> @@ -213,6 +213,23 @@ sub count_active_services {
>      return $active_count;
>  }
>  
> +sub get_disarm_deferred_services {
> +    my ($ss, $node) = @_;
> +
> +    my $deferred_sids = {};
> +    my @deferrable_states = qw(fence recovery migrate relocate);

Nit: disarm_deferring_states

> +
> +    for my $sid (keys %$ss) {
> +        my ($state, $current_node, $target_node) = $ss->{$sid}->@{qw(state node target)};
> +
> +        next if $node && (!$current_node || $current_node ne $node);

Just wondering: when does !$current_node happen?

> +
> +        $deferred_sids->{$sid} = 1 if grep { $state eq $_ } @deferrable_states;
> +    }
> +
> +    return $deferred_sids;
> +}
> +
>  sub get_verbose_service_state {
>      my ($service_state, $service_conf) = @_;
>  




  reply	other threads:[~2026-05-19 16:01 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19 14:38 [PATCH-SERIES ha-manager 0/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 14:38 ` [PATCH ha-manager 1/2] test: add disarm test cases for idle lrms with transient ha resources Daniel Kral
2026-05-19 14:38 ` [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 16:00   ` Fiona Ebner [this message]
2026-05-19 14:47 ` [PATCH-SERIES ha-manager 0/2] " Daniel Kral
2026-05-19 16:00   ` Fiona Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5978c036-c864-413c-a4a7-6febe1b7f2b3@proxmox.com \
    --to=f.ebner@proxmox.com \
    --cc=d.kral@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal