From mboxrd@z Thu Jan  1 00:00:00 1970
Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=UTF-8
Date: Thu, 26 Mar 2026 17:02:40 +0100
Message-Id: 
From: "Daniel Kral" 
To: "Thomas Lamprecht" ,
Subject: Re: [PATCH ha-manager v2 3/4] fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
X-Mailer: aerc 0.21.0-38-g7088c3642f2c-dirty
References: <20260321234350.2158438-1-t.lamprecht@proxmox.com> <20260321234350.2158438-4-t.lamprecht@proxmox.com>
In-Reply-To: <20260321234350.2158438-4-t.lamprecht@proxmox.com>
X-MailFrom: d.kral@proxmox.com
Precedence: list
List-Id: Proxmox VE development discussion

Nice work! While rebasing the load balancing patches on this, I had a few thoughts about the transient states, which I wanted to add below. Nothing really urgent, and I might have missed something, so correct me if I'm wrong.

One small UX nit: As the test case 'test-disarm-relocate1' documents, the 'disarm-ha' command takes priority over the migrate/relocate command. It is more of an edge case anyway that both a migration and a disarm-ha command are issued in the same HA Manager cycle, but of course it could still happen. Handling the disarming as an admin task is more important than a user-initiated migration request. It would still be great if this were relayed back to the user who made the migration request (who might only see the hamigrate task but no following qmigrate task), but this is more of a UX thing and should be handled e.g. as part of #6220 [0], so users can directly follow the HA request and the subsequent actions.
[0] https://bugzilla.proxmox.com/show_bug.cgi?id=6220

On Sun Mar 22, 2026 at 12:42 AM CET, Thomas Lamprecht wrote:
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index b1dbe6a..aa29858 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -631,6 +684,94 @@ sub try_persistent_group_migration {
>      }
>  }
>
> +sub handle_disarm {
> +    my ($self, $disarm, $ss, $lrm_modes) = @_;
> +
> +    my $haenv = $self->{haenv};
> +    my $ns = $self->{ns};
> +
> +    # defer disarm if any services are in a transient state that needs the state machine to resolve
> +    for my $sid (sort keys %$ss) {
> +        my $state = $ss->{$sid}->{state};
> +        if ($state eq 'fence' || $state eq 'recovery') {
> +            $haenv->log(
> +                'warn', "deferring disarm - service '$sid' is in '$state' state",
> +            );
> +            return 0; # let manage() continue so fence/recovery can progress
> +        }
> +        if ($state eq 'migrate' || $state eq 'relocate') {
> +            $haenv->log(
> +                'info', "deferring disarm - service '$sid' is in '$state' state",
> +            );
> +            return 0; # let manage() continue so migration can complete
> +        }
> +    }

Here the HA disarming is deferred, so that the HA Manager can continue processing the HA resources' FSM, if at least one of the HA resources is in one of the 4 transient states. This can have side effects for other HA resources as well, which currently aren't in one of these 4 transient states, but are implicitly in a transient state.

For example, take two HA resources in the current service state:

{
    ...
    "service_status": {
        "vm:101": {
            "state": "started",
            "node": "node1",
            "cmd": [ "migrate", "node2" ],
            "uid": "...",
        },
        "vm:102": {
            "state": "...",
            "node": "node1",
            "uid": "...",
        },
    }
}

If vm:102 is in state 'started', then the HA Manager will start to disarm the HA stack right away and drop vm:101's migration request in both disarm modes.
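For reference, the deferral check can be exercised standalone with a $ss hash shaped like the status above. This is just a minimal self-contained sketch of the same logic; find_transient_service() is a made-up helper name and not actual ha-manager code:

```perl
use strict;
use warnings;

# the 4 transient states that cause handle_disarm() to defer
my @transient = qw(fence recovery migrate relocate);

# return the first service in a transient state, or undef if there is none
sub find_transient_service {
    my ($ss) = @_;
    for my $sid (sort keys %$ss) {
        my $state = $ss->{$sid}->{state};
        return $sid if grep { $state eq $_ } @transient;
    }
    return undef;
}

# vm:101 has a pending migrate *command* but is still in 'started' state,
# so it does not count as transient yet - the disarm would go through
my $ss = {
    'vm:101' => { state => 'started', node => 'node1', cmd => [ 'migrate', 'node2' ] },
    'vm:102' => { state => 'started', node => 'node1' },
};
printf "deferring for: %s\n", find_transient_service($ss) // '(none)';

# with vm:102 in 'recovery', the whole disarm is deferred
$ss->{'vm:102'}->{state} = 'recovery';
printf "deferring for: %s\n", find_transient_service($ss) // '(none)';
```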
If vm:102 is in any of the 4 transient states from above though, then the HA Manager will defer the disarm and process all HA resources, which will cause vm:101 to be put into 'migrate' now. So the transient state of one HA resource might cause other HA resources to be put into a transient state as well, even though I would have expected the deferral here to only resolve the existing transient states.

This also means that the HA Manager will handle user requests for HA resources in the same HA Manager cycle as a disarm request, if there is at least one HA resource in one of the 4 transient states. This contradicts the rule that disarming takes priority over migrate/relocate requests.

Two other possible cases of 'implicitly transient' states might be:

- adding a HA rule, which makes a HA resource in 'started' state be put into 'migrate' to another node when select_service_node() is processed in next_state_started().
- the node of a HA resource going offline (delayed) in the same round as the disarm request. If none of the HA resources are in a transient state yet, the disarm request goes through; otherwise the affected HA resources might be put into 'fence'.

I haven't thought this through fully, but an option might be to only allow FSM processing for the HA resources which are in one of these 4 transient states and not process the others. E.g. break the FSM transition loop out into its own function: in normal operation we iterate through all services in $ss, but for a deferred disarm we only iterate through the HA resources in transient states, which should be resolved.

> +
> +    my $mode = $disarm->{mode};
> +
> +    # prune stale runtime data (failed_nodes, cmd, target, ...) so the state machine starts
> +    # fresh on re-arm; preserve maintenance_node for correct return behavior
> +    my %keep_keys = map { $_ => 1 } qw(state node uid maintenance_node);
> +
> +    if ($mode eq 'freeze') {
> +        for my $sid (sort keys %$ss) {
> +            my $sd = $ss->{$sid};
> +            my $state = $sd->{state};
> +            next if $state eq 'freeze'; # already frozen
> +            if (
> +                $state eq 'started'
> +                || $state eq 'stopped'
> +                || $state eq 'request_stop'
> +                || $state eq 'request_start'
> +                || $state eq 'request_start_balance'
> +                || $state eq 'error'
> +            ) {
> +                $haenv->log('info', "disarm: freezing service '$sid' (was '$state')");
> +                delete $sd->{$_} for grep { !$keep_keys{$_} } keys %$sd;
> +                $sd->{state} = 'freeze';
> +                $sd->{uid} = compute_new_uuid('freeze');
> +            }
> +        }
> +    } elsif ($mode eq 'ignore') {
> +        # keep $ss intact; the disarm flag in $ms causes service loops and vm_is_ha_managed()
> +        # to skip these services while disarmed
> +        for my $sid (sort keys %$ss) {
> +            my $sd = $ss->{$sid};
> +            delete $sd->{$_} for grep { !$keep_keys{$_} } keys %$sd;
> +        }
> +    }
> +
> +    # check if all online LRMs have entered disarm mode
> +    my $all_disarmed = 1;
> +    my $online_nodes = $ns->list_online_nodes();
> +
> +    for my $node (@$online_nodes) {
> +        my $lrm_mode = $lrm_modes->{$node} // 'unknown';
> +        if ($lrm_mode ne 'disarm') {
> +            $all_disarmed = 0;
> +            last;
> +        }
> +    }
> +
> +    if ($all_disarmed && $disarm->{state} ne 'disarmed') {
> +        $haenv->log('info', "all LRMs disarmed, HA stack is now fully disarmed");
> +        $disarm->{state} = 'disarmed';
> +    }
> +
> +    # once disarmed, stay disarmed - a returning node's LRM will catch up within one cycle
> +    $self->{all_lrms_disarmed} = $disarm->{state} eq 'disarmed';
> +
> +    $self->flush_master_status();
> +
> +    return 1;
> +}
> +
> +sub is_fully_disarmed {
> +    my ($self) = @_;
> +
> +    return $self->{all_lrms_disarmed};
> +}
> +
>  sub manage {
>      my ($self) = @_;
>
> @@ -713,6 +857,15 @@ sub manage {
>
>      $self->update_crm_commands();
>
> +    if (my $disarm = $ms->{disarm}) {
> +        if ($self->handle_disarm($disarm, $ss, $lrm_modes)) {
> +            return; # disarm active and progressing, skip normal service state machine
> +        }
> +        # disarm deferred (e.g. due to active fencing) - fall through to let it complete
> +    }

Regarding the load balancing patches, I would change this to

-    }
+    } else {
+        # load balance only if disarm is disabled, as during a deferred disarm
+        # the HA Manager should not introduce any new migrations
+        $self->load_balance();
+    }

there.

> +
> +    $self->{all_lrms_disarmed} = 0;
> +
>      for (;;) {
>          my $repeat = 0;
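To illustrate the "only process the transient HA resources" idea from further up: the per-service FSM loop could take an explicit list of sids, so a deferred disarm can restrict processing to the services that still need resolving. A rough sketch only; select_sids_to_process() and its shape are made up and not actual ha-manager code:

```perl
use strict;
use warnings;

# the 4 transient states that should still be driven forward during a deferred disarm
my @transient_states = qw(fence recovery migrate relocate);

# hypothetical helper: which sids should the FSM loop iterate over?
sub select_sids_to_process {
    my ($ss, $disarm_deferred) = @_;
    my @sids = sort keys %$ss;
    return @sids if !$disarm_deferred; # normal operation: all services

    # deferred disarm: only resolve the services already in a transient state,
    # so no new migrations/fences are introduced for the others
    return grep {
        my $state = $ss->{$_}->{state};
        scalar grep { $state eq $_ } @transient_states;
    } @sids;
}

my $ss = {
    'vm:101' => { state => 'started' },
    'vm:102' => { state => 'recovery' },
};
print join(',', select_sids_to_process($ss, 0)), "\n"; # all services
print join(',', select_sids_to_process($ss, 1)), "\n"; # only the transient one
```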