From: "Daniel Kral" <d.kral@proxmox.com>
To: "Daniel Kral" <d.kral@proxmox.com>,
	"Fiona Ebner" <f.ebner@proxmox.com>,
	"Thomas Lamprecht" <t.lamprecht@proxmox.com>,
	<pve-devel@lists.proxmox.com>
Subject: Re: [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed
Date: Wed, 20 May 2026 09:48:11 +0200
Message-ID: <DINC73WLZGVK.3OY7RC6PNXU45@proxmox.com>
In-Reply-To: <DINB1D9OP7XH.HMLQ6152CNIU@proxmox.com>

On Wed May 20, 2026 at 8:53 AM CEST, Daniel Kral wrote:
> On Tue May 19, 2026 at 6:00 PM CEST, Fiona Ebner wrote:
>> On 19.05.26 at 4:39 PM, Daniel Kral wrote:
>>> +
>>> +    for my $sid (keys %$ss) {
>>> +        my ($state, $current_node, $target_node) = $ss->{$sid}->@{qw(state node target)};
>>> +
>>> +        next if $node && (!$current_node || $current_node ne $node);
>>
>> Just wondering: when does !$current_node happen?
>
> AFAIK the only case where this can currently happen is if the HA
> resource's guest doesn't exist in the cluster anymore according to the
> pmxcfs' vmlist and isn't removed by HA Manager anymore (as is done when
> the HA stack is in disarm mode).

Sorry for the noise, I had another look: the HA Manager never removes HA
resources that have an undef node (e.g. if the VM was removed in a way
that bypasses the check that also prunes the HA resource from the
config), regardless of whether the HA stack is disarming or not:

    # jq '.service_status["vm:2000"]' /etc/pve/ha/manager_status
    {
      "node": "pve",
      "uid": "pHQkcW2HF1jeyQJ5JLb/8Q",
      "state": "stopped"
    }
    # mv /etc/pve/nodes/pve/qemu-server/2000.conf .
    # jq '.service_status["vm:2000"]' /etc/pve/ha/manager_status
    {
      "state": "stopped",
      "uid": "wtRkyVgpB7LcmCtqGBtf+w",
      "node": null
    }

While trying this out a few times, I noticed that this is also one cause
of undef node names being written to the manager_status. As there was
never a timestamp for the undef node entry, the HA Manager tried to
fence the VM, which broke quite a few assumptions in the HA Manager:

May 20 09:24:51 pve-2 pve-ha-crm[22795]: unable to score nodes according to dynamic usage for service 'vm:2000' - did not get dynamic service usage information for 'vm:2000'
May 20 09:24:51 pve-2 pve-ha-crm[22795]: Use of uninitialized value in numeric comparison (<=>) at /usr/share/perl5/PVE/HA/Manager.pm line 390.
May 20 09:24:51 pve-2 pve-ha-crm[22795]: Use of uninitialized value in numeric comparison (<=>) at /usr/share/perl5/PVE/HA/Manager.pm line 390.
May 20 09:24:51 pve-2 pve-ha-crm[22795]: Use of uninitialized value in numeric comparison (<=>) at /usr/share/perl5/PVE/HA/Manager.pm line 390.
May 20 09:24:51 pve-2 pve-ha-crm[22795]: Use of uninitialized value in numeric comparison (<=>) at /usr/share/perl5/PVE/HA/Manager.pm line 390.
May 20 09:24:51 pve-2 pve-ha-crm[22795]: Use of uninitialized value $current_node in string eq at /usr/share/perl5/PVE/HA/Manager.pm line 396.
May 20 09:24:51 pve-2 pve-ha-crm[22795]: Use of uninitialized value $current_node in string eq at /usr/share/perl5/PVE/HA/Manager.pm line 396.
May 20 09:24:51 pve-2 pve-ha-crm[22795]: Use of uninitialized value $current_node in string eq at /usr/share/perl5/PVE/HA/Manager.pm line 396.
May 20 09:24:51 pve-2 pve-ha-crm[22795]: Use of uninitialized value $fenced_node in concatenation (.) or string at /usr/share/perl5/PVE/HA/Manager.pm line 1663.
May 20 09:24:51 pve-2 pve-ha-crm[22795]: Use of uninitialized value $fenced_node in string eq at /usr/share/perl5/PVE/HA/Manager.pm line 1664.
May 20 09:24:51 pve-2 pve-ha-crm[22795]: recover service 'vm:2000' from fenced node '' to node 'pve'
May 20 09:24:51 pve-2 pve-ha-crm[22795]: got unexpected error - Configuration file 'nodes/pve-2/qemu-server/2000.conf' does not exist

This shouldn't happen under normal circumstances, though. I'll send a
patch that adds more defensive checks and/or removes the service status
entry if the HA resource's guest isn't in the vmlist anymore. For the
latter, I'll have to check whether that could cause any trouble.
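
To illustrate the kind of defensive check meant here, a minimal sketch
(not the actual patch): the helper name find_orphaned_sids is
hypothetical, and both the vmlist layout ({ ids => { $vmid => ... } })
and the 'type:vmid' SID format are assumptions for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the SIDs whose service status entry has no
# node, or whose guest is missing from the (assumed) vmlist layout.
sub find_orphaned_sids {
    my ($ss, $vmlist) = @_;

    my @orphaned;
    for my $sid (sort keys %$ss) {
        # assumption: SIDs look like 'vm:2000' or 'ct:100'
        my (undef, $vmid) = split(/:/, $sid, 2);

        my $in_vmlist = defined($vmid) && exists($vmlist->{ids}->{$vmid});
        push @orphaned, $sid if !defined($ss->{$sid}->{node}) || !$in_vmlist;
    }

    return \@orphaned;
}

# Example data modelled on the manager_status output above.
my $ss = {
    'vm:2000' => { state => 'stopped', node => undef },
    'vm:2001' => { state => 'started', node => 'pve' },
};
my $vmlist = { ids => { 2001 => { node => 'pve', type => 'qemu' } } };

print join(', ', find_orphaned_sids($ss, $vmlist)->@*), "\n";
```

Whether such entries should only be skipped or also be removed from the
service status is exactly the open question mentioned above.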

Furthermore, if the HA resource then gets fenced, the HA Manager will
acquire the lock for its own node, as get_ha_agent_lock($self, $node)
defaults to the current nodename if $node is undef.
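
The effect of such a default can be sketched as follows. This is only a
stand-in and not the real get_ha_agent_lock(): the function names and
the lock name format are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the defaulting behavior described above.
sub agent_lock_name {
    my ($own_nodename, $node) = @_;

    $node //= $own_nodename; # an undef $node silently becomes the own node

    return "ha_agent_${node}_lock"; # illustrative lock name format
}

# Stricter variant: refuse to guess when the caller has no node at all.
sub agent_lock_name_strict {
    my ($node) = @_;

    die "cannot determine lock name: node is undefined\n" if !defined($node);

    return "ha_agent_${node}_lock";
}

my $fenced_node = undef; # as for the 'vm:2000' entry with "node": null

# The default hides the problem and yields the lock of the own node.
print agent_lock_name('pve-2', $fenced_node), "\n";

# The strict variant surfaces the problem to the caller instead.
eval { agent_lock_name_strict($fenced_node) };
print "strict variant: $@" if $@;
```

With the default, the caller cannot tell that it asked for a lock
without a node; the strict variant would make the undef node visible
early.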

It might also be worth detecting in the HA Manager that HA resources
have changed nodes in between, as it currently doesn't update the node
at all if a resource is moved manually and is already present in the HA
Manager status... I'll look into it further, as I already wanted to
change this behavior slightly for a partial fix of e.g. fenced HA
resources that were migrated in the meantime [0].

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=6610
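
As a rough idea of what detecting a changed node could look like, here
is a sketch under the same assumptions as before: find_moved_sids is a
hypothetical name, and the vmlist layout and SID format are assumed, not
taken from the HA Manager code.

```perl
use strict;
use warnings;

# Hypothetical helper: report services whose recorded node differs from
# the node the guest currently lives on according to the assumed vmlist.
sub find_moved_sids {
    my ($ss, $vmlist) = @_;

    my $moved = {};
    for my $sid (sort keys %$ss) {
        my (undef, $vmid) = split(/:/, $sid, 2);
        next if !defined($vmid);

        my $entry = $vmlist->{ids}->{$vmid};
        my $recorded = $ss->{$sid}->{node};
        my $actual = $entry ? $entry->{node} : undef;

        # entries without a node on either side need separate handling
        next if !defined($recorded) || !defined($actual);

        $moved->{$sid} = { from => $recorded, to => $actual }
            if $recorded ne $actual;
    }

    return $moved;
}

my $ss = {
    'vm:2000' => { state => 'stopped', node => 'pve' },
    'vm:2001' => { state => 'started', node => 'pve' },
};
my $vmlist = {
    ids => {
        2000 => { node => 'pve-2', type => 'qemu' }, # moved manually
        2001 => { node => 'pve', type => 'qemu' },
    },
};

my $moved = find_moved_sids($ss, $vmlist);
print "$_: $moved->{$_}->{from} -> $moved->{$_}->{to}\n" for sort keys %$moved;
```

What the HA Manager should do with such a mismatch, e.g. for services in
a transient state, is not covered by this sketch.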

>
>>
>>> +
>>> +        $deferred_sids->{$sid} = 1 if grep { $state eq $_ } @deferrable_states;
>>> +    }
>>> +
>>> +    return $deferred_sids;
>>> +}
>>> +
>>>  sub get_verbose_service_state {
>>>      my ($service_state, $service_conf) = @_;
>>>  






Thread overview: 10+ messages
2026-05-19 14:38 [PATCH-SERIES ha-manager 0/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 14:38 ` [PATCH ha-manager 1/2] test: add disarm test cases for idle lrms with transient ha resources Daniel Kral
2026-05-19 14:38 ` [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 16:00   ` Fiona Ebner
2026-05-20  6:53     ` Daniel Kral
2026-05-20  7:48       ` Daniel Kral [this message]
2026-05-19 14:47 ` [PATCH-SERIES ha-manager 0/2] " Daniel Kral
2026-05-19 16:00   ` Fiona Ebner
2026-05-19 20:11 ` applied: " Thomas Lamprecht
2026-05-20  8:07   ` Daniel Kral
