From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
Fabian Ebner <f.ebner@proxmox.com>
Subject: [pve-devel] applied: [RFC ha-manager] manage: handle edge case where a node gets stuck in 'fence' state
Date: Wed, 19 Jan 2022 14:36:09 +0100
Message-ID: <74302f7d-706e-e2fa-edb6-d7d5cc4e8b85@proxmox.com>
In-Reply-To: <20211008125226.56551-1-f.ebner@proxmox.com>
On 08.10.21 14:52, Fabian Ebner wrote:
> If all services in 'fence' state are gone from a node (e.g. by
> removing the services) before fence_node() was successful, a node
> would get stuck in the 'fence' state. Avoid this by calling
> fence_node() if the node is in 'fence' state, regardless of service
> state.
>
> Reported in the community forum:
> https://forum.proxmox.com/threads/ha-migration-stuck-is-doing-nothing.94469/
>
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
>
> Not really sure if this is worth it, because it's a hard-to-reach edge
> case, but AFAICT there is no good way to get out of being stuck. What
> would work is either of:
> * Manually correcting the node state.
> * Adding a service to the stuck node and triggering a fence
> situation.
>
> An alternative would be to keep services in 'fence' state in the
> manager state, even if they were removed from the config. But the
> approach from this patch seemed a bit more robust: for example, it
> will fix an already existing stuck state, rather than just avoid
> creating one.
>
> src/PVE/HA/Manager.pm | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
>
applied, thanks!

As also discussed off-list, I noticed a related issue in a derived edge case
that could cause trouble too. I spent some time coming up with two tests
covering the situation you fixed as well as mine, slightly expanding the
capabilities of the test/simulation system.
https://git.proxmox.com/?p=pve-ha-manager.git;a=commit;h=ca2e547a7662467f9a08c54fa15b46825e3702e6
https://git.proxmox.com/?p=pve-ha-manager.git;a=commit;h=30fc7ceedb7f3047659f22d063cc16c94c20dd7a
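
For readers who want to see the edge case from the quoted commit message in
isolation, here is a minimal toy model in Python. It is only a sketch and not
the actual Manager.pm code: the class ToyManager, the fence_ok set, the
check_fenced_nodes switch, and the state name the node gets after a successful
fence ('unknown') are all assumptions made for illustration. It shows a node
staying in 'fence' state when its last fenced service is removed before
fence_node() succeeds, and how retrying fence_node() for every node in 'fence'
state, regardless of service state, resolves it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyManager:
    """Toy model of the manager loop; NOT the real PVE::HA::Manager."""
    node_state: dict  # node name -> state string, e.g. 'online' or 'fence'
    services: dict    # service id -> {'node': ..., 'state': ...}
    fence_ok: set = field(default_factory=set)  # nodes whose fencing would succeed

    def fence_node(self, node):
        # Hypothetical stand-in: fencing succeeds only once the node is in
        # fence_ok. The resulting state name 'unknown' is an assumption.
        if node in self.fence_ok:
            self.node_state[node] = "unknown"
            return True
        return False

    def step(self, check_fenced_nodes):
        # Service-driven path: only a service in 'fence' state triggers
        # fence_node() for its node.
        for sd in self.services.values():
            if sd["state"] == "fence" and self.fence_node(sd["node"]):
                sd["state"] = "recovery"  # toy placeholder for "hand over to recovery"

        # Node-driven path (the idea of the patch): also call fence_node()
        # for any node in 'fence' state, regardless of service state.
        if check_fenced_nodes:
            for node, state in list(self.node_state.items()):
                if state == "fence":
                    self.fence_node(node)


def simulate(check_fenced_nodes):
    mgr = ToyManager(
        node_state={"node1": "fence"},
        services={"vm:100": {"node": "node1", "state": "fence"}},
    )
    mgr.step(check_fenced_nodes)   # fencing does not succeed yet
    del mgr.services["vm:100"]     # service removed while node is in 'fence'
    mgr.fence_ok.add("node1")      # fencing would succeed from now on
    mgr.step(check_fenced_nodes)
    return mgr.node_state["node1"]


if __name__ == "__main__":
    print("without node check:", simulate(check_fenced_nodes=False))
    print("with node check:   ", simulate(check_fenced_nodes=True))
```

In this toy model the node remains in 'fence' without the node-driven check,
because nothing is left to trigger fence_node(), while with the check it
leaves the 'fence' state on the next step.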