From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
Fabian Ebner <f.ebner@proxmox.com>
Subject: [pve-devel] applied: [RFC ha-manager] manage: handle edge case where a node gets stuck in 'fence' state
Date: Wed, 19 Jan 2022 14:36:09 +0100
Message-ID: <74302f7d-706e-e2fa-edb6-d7d5cc4e8b85@proxmox.com>
In-Reply-To: <20211008125226.56551-1-f.ebner@proxmox.com>

On 08.10.21 14:52, Fabian Ebner wrote:
> If all services in 'fence' state are gone from a node (e.g. by
> removing the services) before fence_node() was successful, a node
> would get stuck in the 'fence' state. Avoid this by calling
> fence_node() if the node is in 'fence' state, regardless of service
> state.
>
> Reported in the community forum:
> https://forum.proxmox.com/threads/ha-migration-stuck-is-doing-nothing.94469/
>
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
>
> Not really sure if this is worth it, because it's a hard-to-reach edge
> case, but AFAICT there is no good way to get out of being stuck. What
> would work is either of:
> * Manually correcting the node state.
> * Adding a service to the stuck node and triggering a fence
> situation.
>
> An alternative would be to keep services in 'fence' state in the
> manager state, even if they were removed from the config. But the
> approach from this patch seemed a bit more robust: for example, it
> will fix an already existing stuck state, rather than just avoid
> creating one.
>
> src/PVE/HA/Manager.pm | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
>

applied, thanks!

As discussed off-list, I noticed a related issue in a derived edge case
that could cause trouble too. I spent some time coming up with two tests
that cover the situation you fixed as well as mine, slightly expanding the
capabilities of the test/simulation system along the way:

https://git.proxmox.com/?p=pve-ha-manager.git;a=commit;h=ca2e547a7662467f9a08c54fa15b46825e3702e6
https://git.proxmox.com/?p=pve-ha-manager.git;a=commit;h=30fc7ceedb7f3047659f22d063cc16c94c20dd7a
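
To illustrate the edge case from the commit message, here is a minimal
sketch in Python. It is a toy model only, not the Perl code in
src/PVE/HA/Manager.pm: the class ToyManager, its fields, the
handle_stuck_nodes switch and the 'unknown' follow-up state are all
assumptions made up for this example. It shows why a node can stay in
'fence' state once its services are gone, and how fencing based on the node
state alone gets it out again.

```python
from dataclasses import dataclass, field


@dataclass
class ToyManager:
    """Toy model of the manager state; hypothetical, not PVE::HA::Manager."""
    node_state: dict = field(default_factory=dict)     # node -> state name
    service_node: dict = field(default_factory=dict)   # service id -> node
    service_state: dict = field(default_factory=dict)  # service id -> state name
    fence_ok: set = field(default_factory=set)         # nodes where fencing can succeed

    def fence_node(self, node):
        """Hypothetical stand-in: on success the node leaves 'fence' state."""
        if node in self.fence_ok:
            self.node_state[node] = "unknown"  # assumed follow-up state
            return True
        return False

    def manage(self, handle_stuck_nodes):
        # Service-driven path: fencing is only attempted for nodes that
        # still have a service in 'fence' state.
        for sid, state in self.service_state.items():
            if state == "fence":
                self.fence_node(self.service_node[sid])

        # Node-driven path (the idea of the patch): also try to fence every
        # node that is itself in 'fence' state, regardless of its services.
        if handle_stuck_nodes:
            for node, state in list(self.node_state.items()):
                if state == "fence":
                    self.fence_node(node)


# A node in 'fence' state whose services were all removed from the config.
manager = ToyManager(node_state={"node1": "fence"}, fence_ok={"node1"})
manager.manage(handle_stuck_nodes=False)  # node1 stays in 'fence'
manager.manage(handle_stuck_nodes=True)   # node1 leaves 'fence'
```

Because the second loop looks only at the node state, it also recovers a
node that was already stuck before the change, which matches the robustness
argument made in the patch notes.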