Re: [pve-devel] [PATCH qemu-server 1/1] qemu: add offline migration from dead node

From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Thomas Lamprecht <t.lamprecht@proxmox.com>,
	Dominik Csapak <d.csapak@proxmox.com>
Date: Tue, 1 Apr 2025 13:13:25 +0200 (CEST)
Message-ID: <135708611.3668.1743506005916@webmail.proxmox.com>
In-Reply-To: <6b44b21c-5399-47a5-8e06-3c1d3e1eaab9@proxmox.com>

> Thomas Lamprecht <t.lamprecht@proxmox.com> wrote on 01.04.2025 12:46 CEST:
> 
>  
> On 01.04.25 at 12:19, Dominik Csapak wrote:
> > While I also agree with everything said here, I have one counterpoint to offer:
> > 
> > In the case that such an operation is necessary (e.g. HA is not wanted/needed/possible
> > for whatever reason), the user will fall back to doing it manually (iow. 'mv source target'),
> > which is at least as dangerous as exposing it over the API, since
> > 
> > * now the admins sharing the system must share root@pam credentials (ssh/console access)
> >    (alternatively set up sudo, which has its own problems)
> 
> Setups with many admins already need to handle how they can log in as root, be
> it through a jump user (`doas` is an option if sudo is deemed too complex) or some
> identity provider (LDAP, OIDC, ... with PAM configuration), as root operations
> are required for other things too.

and the same feature on the API also requires root@pam anyway ;)
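The manual fallback Dominik describes ('mv source target') comes down to moving the guest config from the dead node's directory to another node's directory in /etc/pve. The recovery section of the admin guide describes exactly this. Below is a minimal Python sketch of that move. It assumes the `/etc/pve/nodes/<node>/qemu-server/<vmid>.conf` layout. `move_guest_config` and its `source_confirmed_off` flag are hypothetical names. The flag only records the admin's promise that the source node is really off, and that promise is exactly the part no code here can verify:

```python
import os
from pathlib import Path


def move_guest_config(pve_root: Path, vmid: int, source: str, target: str,
                      source_confirmed_off: bool) -> Path:
    """Hypothetical helper mirroring the manual 'mv source target' recovery.

    Assumed layout: <pve_root>/nodes/<node>/qemu-server/<vmid>.conf, as under
    /etc/pve; pve_root is a parameter so the sketch can run on a scratch dir.
    """
    if not source_confirmed_off:
        # the dangerous part: nothing here can check that the source node is
        # really powered off, the admin has to guarantee it
        raise RuntimeError(f"refusing to move VM {vmid}: {source} not confirmed off")
    src = pve_root / "nodes" / source / "qemu-server" / f"{vmid}.conf"
    dst = pve_root / "nodes" / target / "qemu-server" / src.name
    if not src.exists():
        raise FileNotFoundError(src)
    if dst.exists():
        # never overwrite an existing config on the target node
        raise FileExistsError(dst)
    os.rename(src, dst)
    return dst
```

The guard only makes the risky assumption explicit. The check it stands for still has to happen outside the tool.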

> [..]

> > * link to the doc section for it from the UI with a big caveat
> >    https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_recovery
> 
> As Fabian wrote, such disclaimers might be nice for shifting the blame
> but are not enough in practice for such an operation.
> 
> And Fabian's point wasn't that doing it on the CLI is less dangerous, it's
> about the same either way, but that exposing this as a well-integrated feature
> makes it seem much less dangerous to the user, especially to less experienced
> users, who should instead get stuck and ask some support channel for help.
> 
> That said, the actual first step to move this forward would IMO be to create
> extensive documentation/a how-to on how such situations can be resolved and what
> one needs to watch out for; a checklist style might be a good format.
> That alone should already help users a lot, and it would also make it much
> clearer what a more integrated (semi-automated) approach could look like,
> e.g. a check tool that helps with assessing the recovery depending on config,
> storage (types), network, mappings, ..., which would ensure that common
> issues/blockers are not missed and would help even experienced admins.
> If that cannot first be documented and then optionally turned into a
> hands-off evaluation checker tool, or if that's deemed not to help users, I
> really do not see how an API-integrated solution could do so without just
> hand-waving away all the actual, real issues that explain why this does not
> already exist.
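Thomas suggests a checklist that could later become a hands-off checker tool. Below is a minimal Python sketch of the kind of checks such a tool might run. It assumes the guest config is available as key/value pairs in the qemu-server config style (`scsi0`, `net0`, `hostpci0`, ...). `TargetNode` and `check_recovery` are hypothetical names. The sketch covers only three example blockers:

- a disk on storage not reachable from the target
- a missing bridge
- PCI passthrough without a matching mapping

```python
import re
from dataclasses import dataclass, field

DISK_KEY = re.compile(r"^(scsi|virtio|sata|ide|efidisk|tpmstate)\d+$")
NET_KEY = re.compile(r"^net\d+$")
PCI_KEY = re.compile(r"^hostpci\d+$")


@dataclass
class TargetNode:
    """Hypothetical description of the node the guest would be recovered to."""
    name: str
    storages: set[str]                      # storage IDs usable from this node
    bridges: set[str]                       # e.g. {"vmbr0"}
    pci_mappings: set[str] = field(default_factory=set)


def parse_opts(value: str) -> dict[str, str]:
    """Split 'a=1,b=2' style option strings into a dict."""
    return dict(kv.split("=", 1) for kv in value.split(",") if "=" in kv)


def check_recovery(guest_conf: dict[str, str], target: TargetNode) -> list[str]:
    """Return the blockers found; an empty list only means none of these checks failed."""
    problems = []
    for key, value in guest_conf.items():
        first = value.split(",", 1)[0]
        if DISK_KEY.match(key) and ":" in first:
            # volumes look like '<storage>:<volume>'; 'none' (empty CD) is skipped
            storage = first.split(":", 1)[0]
            if storage not in target.storages:
                problems.append(f"{key}: storage '{storage}' not available on {target.name}")
        elif NET_KEY.match(key):
            bridge = parse_opts(value).get("bridge")
            if bridge and bridge not in target.bridges:
                problems.append(f"{key}: bridge '{bridge}' missing on {target.name}")
        elif PCI_KEY.match(key):
            mapping = parse_opts(value).get("mapping")
            if mapping is None:
                problems.append(f"{key}: raw PCI passthrough is host-specific")
            elif mapping not in target.pci_mappings:
                problems.append(f"{key}: mapping '{mapping}' not defined for {target.name}")
    return problems
```

A real checker would need many more checks than these three, which is part of why documenting the checklist first seems the better order.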

(improving) such docs would be nice - we do have a little bit here:

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_recovering_moving_guests_from_failed_nodes 

the only way to technically improve what is possible IMHO would be to implement
some kind of reliable STONITH mechanism in addition to fencing, and base an
integrated "guest stealing" mechanism on that (with some additional component
that ensures that if the "shot" node comes back up right away, it won't do anything
with the "stolen" guest before the theft is over).

e.g., if you have a (set of) remote-manageable power strip(s) configured that
allows:
- removing all power from node
- query power state of a node

you could use that to reduce HA failover times (you can shoot the other node
whenever you need it fenced, without waiting for watchdog timeouts/...), and to
implement a guest stealing mechanism:
- put a file/entry in /etc/pve marking a guest as "currently being stolen"
- shoot the other node and verify it is down
- steal config
- remove marker file/entry
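The stealing steps above can be sketched in Python against a toy in-memory model of /etc/pve. Everything here is hypothetical:

- `PowerSwitch` stands in for whatever remote-manageable power strip would be configured.
- `FakeStrip` is a test double for it.
- `ClusterFS` stands in for the cluster file system.

No real locking or power control happens. The sketch only shows the ordering (mark, shoot, verify, move, unmark) and that stealing is refused when no STONITH device is configured:

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


class PowerSwitch(Protocol):
    """Hypothetical interface of a remote-manageable power strip."""

    def cut_power(self, node: str) -> None: ...
    def is_powered(self, node: str) -> bool: ...


@dataclass
class FakeStrip:
    """Test double; broken=True simulates a strip that ignores the command."""
    powered: set[str]
    broken: bool = False

    def cut_power(self, node: str) -> None:
        if not self.broken:
            self.powered.discard(node)

    def is_powered(self, node: str) -> bool:
        return node in self.powered


@dataclass
class ClusterFS:
    """Toy stand-in for /etc/pve: guest configs per node plus steal markers."""
    configs: dict[str, dict[int, str]] = field(default_factory=dict)
    steal_markers: set[int] = field(default_factory=set)


def steal_guest(fs: ClusterFS, stonith: Optional[PowerSwitch],
                vmid: int, source: str, target: str) -> None:
    if stonith is None:
        # no STONITH mechanism configured -> stealing is not available
        raise RuntimeError("stealing requires a configured STONITH device")
    # 1. mark the guest as "currently being stolen"; in a real cluster this
    #    is the step that would need the cluster-wide lock
    fs.steal_markers.add(vmid)
    # 2. shoot the other node and verify it is down
    stonith.cut_power(source)
    if stonith.is_powered(source):
        # abort and leave the marker in place for the admin to sort out
        raise RuntimeError(f"{source} still has power, aborting steal of {vmid}")
    # 3. steal the config
    conf = fs.configs[source].pop(vmid)
    fs.configs.setdefault(target, {})[vmid] = conf
    # 4. remove the marker
    fs.steal_markers.discard(vmid)
```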

no matter at which point after the shooting the other node comes back up, it
must first sync up /etc/pve, which means it can check for markers when locking a
VM. if a marker is found, it's not allowed to take the lock, else it can proceed
(checking doesn't require a cluster-wide lock, only setting the marker would).
if no marker is found, either the config is not there anymore (the theft is
complete), or the guest hasn't been stolen and can be locked and used normally.
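The check a node would run after coming back up can also be sketched in Python over a toy model of the synced /etc/pve. `SyncedClusterFS`, its `synced` flag and `try_lock_guest` are illustrative names, not qemu-server code. Only reading the marker is needed here, which is why no cluster-wide lock is taken:

```python
from dataclasses import dataclass, field


@dataclass
class SyncedClusterFS:
    """Toy view of /etc/pve as seen by a node that just came back up."""
    configs: dict[str, dict[int, str]] = field(default_factory=dict)
    steal_markers: set[int] = field(default_factory=set)
    synced: bool = False


def try_lock_guest(fs: SyncedClusterFS, node: str, vmid: int) -> bool:
    """Return True if `node` may lock guest `vmid` (hypothetical check)."""
    if not fs.synced:
        # before /etc/pve is synced the node cannot see any markers
        return False
    if vmid in fs.steal_markers:
        # theft in progress: the guest must not be touched
        return False
    # no marker: either the config already moved away (theft complete) or
    # the guest was never stolen and still belongs to this node
    return vmid in fs.configs.get(node, {})
```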

if no stonith mechanism is configured, stealing is not available.

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel