all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Fiona Ebner <f.ebner@proxmox.com>
To: Daniel Kral <d.kral@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed
Date: Tue, 19 May 2026 18:00:31 +0200	[thread overview]
Message-ID: <5978c036-c864-413c-a4a7-6febe1b7f2b3@proxmox.com> (raw)
In-Reply-To: <20260519143842.382324-3-d.kral@proxmox.com>

Am 19.05.26 um 4:39 PM schrieb Daniel Kral:
> If there are HA resources, which are in transient states that defer the
> disarming process, but their LRMs are already in idle state and disarmed
> mode, these LRMs will not properly resolve the transient states of these
> HA resources as assumed by the HA Manager.
> 
> For HA resources, which are still moving, this makes the HA Manager
> stuck in a loop, which tries to defer the disarming process to wait for
> a LRM response for these moving HA resources, which will never come as
> the LRM is idle.
> 
> Therefore allow the LRM to become active in disarm mode if there are any
> HA resources on the LRM's node, which are in any of these transient
> states, and make sure that the LRM only processes the disarm-deferring
> HA resources while the LRM is active.
> 
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
>  src/PVE/HA/LRM.pm                         | 19 ++++++++++-
>  src/PVE/HA/Manager.pm                     |  8 ++---
>  src/PVE/HA/Tools.pm                       | 17 ++++++++++
>  src/test/test-disarm-idle-lrm1/log.expect | 37 ++++++---------------
>  src/test/test-disarm-idle-lrm2/log.expect | 39 +++++++----------------
>  5 files changed, 58 insertions(+), 62 deletions(-)
> 
> diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
> index 426982cc..9100d611 100644
> --- a/src/PVE/HA/LRM.pm
> +++ b/src/PVE/HA/LRM.pm
> @@ -312,6 +312,18 @@ sub active_service_count {
>      return PVE::HA::Tools::count_active_services($ss, $nodename);
>  }
>  
> +# returns a truthy value if there are HA resources in transient states, which
> +# need to be resolved, e.g. to complete the disarm procedure.
> +sub has_disarm_deferred_services {

Nit: I feel like the variables and functions should rather be named
disarm_deferring rather than disarm_deferred

> +    my ($self) = @_;
> +
> +    my $ss = $self->{service_status};
> +    my $nodename = $self->{haenv}->nodename();
> +    my $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss, $nodename);
> +
> +    return %$deferred_sids;
> +}
> +
>  my $wrote_lrm_status_at_startup = 0;
>  
>  sub do_one_iteration {
> @@ -371,7 +383,7 @@ sub work {
>  
>          my $service_count = $self->active_service_count();
>  
> -        if ($self->{mode} eq 'disarm') {
> +        if ($self->{mode} eq 'disarm' && !$self->has_disarm_deferred_services()) {
>              # stay idle while disarmed, don't acquire lock
>          } elsif (!$fence_request && $service_count && $haenv->quorate()) {
>              if ($self->get_protected_ha_agent_lock()) {
> @@ -709,12 +721,17 @@ sub manage_resources {
>      my $nodename = $haenv->nodename();
>  
>      my $ss = $self->{service_status};
> +    my $deferred_sids;

Nit: Here, a full $disarm_deferring_sids would provide the most context.

> +    $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss, $nodename)
> +        if $self->{mode} eq 'disarm';
>  
>      foreach my $sid (keys %{ $self->{restart_tries} }) {
>          delete $self->{restart_tries}->{$sid} if !$ss->{$sid};
>      }
>  
>      foreach my $sid (keys %$ss) {
> +        next if $deferred_sids && !$deferred_sids->{$sid};
> +
>          my $sd = $ss->{$sid};
>          next if !$sd->{node} || !$sd->{uid};
>          next if $sd->{node} ne $nodename;
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 9b901c4f..a2baf349 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -929,15 +929,13 @@ sub handle_disarm {
>      }
>  
>      # defer disarm if any services are in a transient state that needs the state machine to resolve
> -    my $deferred_sids = {};
> -    for my $sid (sort keys %$ss) {
> +    my $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss);
> +    for my $sid (sort keys %$deferred_sids) {
>          my $state = $ss->{$sid}->{state};
>          if ($state eq 'fence' || $state eq 'recovery') {
>              $haenv->log('warning', "deferring disarm - service '$sid' is in '$state' state");
> -            $deferred_sids->{$sid} = 1;
> -        } elsif ($state eq 'migrate' || $state eq 'relocate') {
> +        } else {
>              $haenv->log('info', "deferring disarm - service '$sid' is in '$state' state");
> -            $deferred_sids->{$sid} = 1;
>          }
>      }
>  
> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
> index 26629fb5..37b27e11 100644
> --- a/src/PVE/HA/Tools.pm
> +++ b/src/PVE/HA/Tools.pm
> @@ -213,6 +213,23 @@ sub count_active_services {
>      return $active_count;
>  }
>  
> +sub get_disarm_deferred_services {
> +    my ($ss, $node) = @_;
> +
> +    my $deferred_sids = {};
> +    my @deferrable_states = qw(fence recovery migrate relocate);

Nit: disarm_deferring_states

> +
> +    for my $sid (keys %$ss) {
> +        my ($state, $current_node, $target_node) = $ss->{$sid}->@{qw(state node target)};
> +
> +        next if $node && (!$current_node || $current_node ne $node);

Just wondering: when does !$current_node happen?

> +
> +        $deferred_sids->{$sid} = 1 if grep { $state eq $_ } @deferrable_states;
> +    }
> +
> +    return $deferred_sids;
> +}
> +
>  sub get_verbose_service_state {
>      my ($service_state, $service_conf) = @_;
>  




  reply	other threads:[~2026-05-19 16:01 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19 14:38 [PATCH-SERIES ha-manager 0/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 14:38 ` [PATCH ha-manager 1/2] test: add disarm test cases for idle lrms with transient ha resources Daniel Kral
2026-05-19 14:38 ` [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 16:00   ` Fiona Ebner [this message]
2026-05-19 14:47 ` [PATCH-SERIES ha-manager 0/2] " Daniel Kral
2026-05-19 16:00   ` Fiona Ebner
2026-05-19 20:11 ` applied: " Thomas Lamprecht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5978c036-c864-413c-a4a7-6febe1b7f2b3@proxmox.com \
    --to=f.ebner@proxmox.com \
    --cc=d.kral@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal