public inbox for pve-devel@lists.proxmox.com
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Fabian Ebner <f.ebner@proxmox.com>
Subject: [pve-devel] applied: [RFC ha-manager] manage: handle edge case where a node gets stuck in 'fence' state
Date: Wed, 19 Jan 2022 14:36:09 +0100
Message-ID: <74302f7d-706e-e2fa-edb6-d7d5cc4e8b85@proxmox.com>
In-Reply-To: <20211008125226.56551-1-f.ebner@proxmox.com>

On 08.10.21 14:52, Fabian Ebner wrote:
> If all services in 'fence' state are gone from a node (e.g. by
> removing the services) before fence_node() was successful, a node
> would get stuck in the 'fence' state. Avoid this by calling
> fence_node() if the node is in 'fence' state, regardless of service
> state.
> 
> Reported in the community forum:
> https://forum.proxmox.com/threads/ha-migration-stuck-is-doing-nothing.94469/
> 
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
> 
> Not really sure if this is worth it, because it's a hard-to-reach edge
> case, but AFAICT there is no good way to get out of being stuck. What
> would work is either of:
>     * Manually correcting the node state.
>     * Adding a service to the stuck node and triggering a fence
>       situation.
> 
> An alternative would be to keep services in 'fence' state in the
> manager state, even if they were removed from the config. But the
> approach from this patch seemed a bit more robust: for example, it
> will fix an already existing stuck state, rather than just avoid
> creating one.
> 
>  src/PVE/HA/Manager.pm | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
>

applied, thanks!

As also discussed off-list, I noticed a related issue in a derived edge case
that could cause trouble too. I spent some time coming up with two tests
covering both the situation you fixed and mine, slightly expanding the
capabilities of the test/simulation system along the way.

https://git.proxmox.com/?p=pve-ha-manager.git;a=commit;h=ca2e547a7662467f9a08c54fa15b46825e3702e6
https://git.proxmox.com/?p=pve-ha-manager.git;a=commit;h=30fc7ceedb7f3047659f22d063cc16c94c20dd7a
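
For readers who do not have the manager loop in mind, the edge case from the
quoted patch description can be illustrated with a small toy model. This is
only a sketch in Python, not the Perl code from src/PVE/HA/Manager.pm: the
class `Manager`, the fields `node_status`, `services` and `fence_ready`, the
`handle_stuck_nodes` switch, and the assumption that a fenced node ends up in
an 'unknown' state are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Manager:
    """Toy model of an HA manager; every name here is hypothetical."""
    node_status: dict                               # node -> 'online' | 'fence' | 'unknown'
    services: dict = field(default_factory=dict)    # sid -> {'node': ..., 'state': ...}
    fence_ready: set = field(default_factory=set)   # nodes where fencing can succeed now

    def fence_node(self, node):
        """Try to fence `node`; on success it leaves the 'fence' state."""
        if node in self.fence_ready:
            self.node_status[node] = "unknown"      # assumed post-fence state
            return True
        return False

    def manage(self, handle_stuck_nodes=True):
        # Service-driven path: fencing is only attempted on behalf of
        # services that are themselves in 'fence' state.
        for sd in self.services.values():
            if sd["state"] == "fence" and self.fence_node(sd["node"]):
                sd["state"] = "recovery"
        # Node-driven path (the idea described in the patch): also call
        # fence_node() for nodes in 'fence' state, regardless of services.
        if handle_stuck_nodes:
            for node, state in list(self.node_status.items()):
                if state == "fence":
                    self.fence_node(node)


def simulate(handle_stuck_nodes):
    """Remove the only fenced service before fencing succeeds."""
    m = Manager(node_status={"node1": "fence"},
                services={"vm:100": {"node": "node1", "state": "fence"}})
    m.manage(handle_stuck_nodes)   # fencing is not possible yet
    del m.services["vm:100"]       # service removed from the config
    m.fence_ready.add("node1")     # fencing would succeed from now on
    m.manage(handle_stuck_nodes)
    return m.node_status["node1"]
```

In this model, `simulate(False)` leaves node1 in 'fence' because nothing
triggers fence_node() any more once the service is gone, while
`simulate(True)` lets the node leave the 'fence' state.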



