public inbox for pve-user@lists.proxmox.com
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Locking HA during UPS shutdown
Date: Thu, 10 Mar 2022 15:24:28 +0100	[thread overview]
Message-ID: <1646920824.w5mef4abey.astroid@nora.none> (raw)
In-Reply-To: <D4408778-64CB-4AB4-B0E9-7BF744B40C12@telehouse.solutions>

On March 10, 2022 2:48 pm, admins@telehouse.solutions wrote:
> That was actually really bad advice, as when the node initiates maintenance mode it will try to migrate the hosted VMs and eventually end up in the same lock loop.
> What you really need is to remove the started VMs from ha-manager, so that when the node initiates shutdown it will first do a regular shutdown, VM by VM.
> 
> So, do something like below as first command in your NUT command sequence:
> 
> for vmid in $(ha-manager status | awk '/started/ {print $2}' | sed 's/vm://'); do ha-manager remove "$vmid"; done

what you should do is just change the policy to freeze or fail-over 
before triggering the shutdown. and once power comes back up and your 
cluster has booted, switch it back to migrate.

that way, the shutdown will just stop and freeze the resources, similar 
to what happens when rebooting using the default conditional policy.

note that editing datacenter.cfg (where the shutdown_policy is 
configured) is currently not exposed in any CLI tool, but you can update 
it using pvesh or the API.
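for example, something along these lines (a sketch only - verify the 
exact option syntax against the /cluster/options API documentation for 
your PVE version before scripting it, and note it must run on a quorate 
cluster node):

```shell
# before triggering the NUT shutdown sequence: freeze instead of migrate
pvesh set /cluster/options --ha shutdown_policy=freeze

# after power is back and the cluster has booted and is quorate again:
pvesh set /cluster/options --ha shutdown_policy=migrate
```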

there is still one issue though - if the whole cluster is shut down at 
the same time, at some point during the shutdown a non-quorate partition 
will be all that's left, and at that point certain actions won't work 
anymore and the node will probably get fenced. fixing this effectively 
would require some sort of conditional delay at the right point in the 
shutdown sequence that waits for all guests on all nodes(!) to stop 
before proceeding with stopping the PVE services and corosync (nodes 
might still get fenced if they take too long shutting down after the 
last guest has exited, but that shouldn't cause many issues other than 
noise). one way to do this would be for your NUT script to set a flag 
file in /etc/pve, and some systemd service with the right Wants/After 
settings that blocks the shutdown if the flag file exists and any guests 
are still running. probably requires some tinkering, but can be safely 
tested in a virtual cluster before moving to production ;)
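a rough sketch of that idea - the unit name, the flag file path, and 
the exact pvesh JSON field layout are all assumptions to adapt, not 
tested values:

```
# /etc/systemd/system/ups-guest-wait.service (hypothetical name)
[Unit]
Description=On shutdown, wait until all guests cluster-wide have stopped
# started after these units on boot, therefore stopped before them on
# the way down - which is exactly when we want to block
After=pve-guests.service pve-cluster.service corosync.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
# ExecStop runs during shutdown; it only blocks if the NUT script has
# created the flag file, and then polls until no guest reports 'running'
ExecStop=/bin/sh -c 'test -e /etc/pve/ups-shutdown || exit 0; \
  while pvesh get /cluster/resources --type vm --output-format json \
    | grep -q "\"status\".*\"running\""; do sleep 5; done'

[Install]
WantedBy=multi-user.target
```

your NUT script would then touch /etc/pve/ups-shutdown before calling 
shutdown. keep in mind the poll itself needs a working quorum and API, 
which is exactly the fragile part described above - hence the advice to 
test in a virtual cluster first.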

this last problem is not related to HA though (other than HA introducing 
another source of trouble courtesy of fencing being active) - you will 
also potentially hit it with your approach. the 'stop all guests on 
node' logic that PVE has on shutdown is for shutting down one node 
without affecting quorum, it doesn't work reliably for full-cluster 
shutdowns (you might not see problems if timing works out, but it's 
based on chance).

an alternative approach would be to request all HA resources to be stopped 
or disabled (`ha-manager set .. --state ..`), wait for that to be done 
cluster-wide (e.g. by polling /cluster/resources API path), and then 
trigger the shutdown. the disadvantage of that is you have to remember 
the pre-shutdown state and restore it afterwards for each resource.
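a sketch of that sequence (untested - the ha-manager status output 
format and the /cluster/resources JSON layout should be verified on 
your version before relying on the parsing):

```shell
#!/bin/sh
# sketch: stop all HA resources, wait cluster-wide, then shut down
STATE=/root/ha-pre-shutdown.list

# 1. remember which resources are currently requested 'started'
ha-manager status | awk '$1 == "service" && /started/ {print $2}' > "$STATE"

# 2. request all of them to be stopped
while read -r sid; do
    ha-manager set "$sid" --state stopped
done < "$STATE"

# 3. wait until no guest anywhere in the cluster reports 'running'
while pvesh get /cluster/resources --type vm --output-format json \
        | grep -q '"status".*"running"'; do
    sleep 5
done

# 4. now trigger the actual shutdown; after the next boot, restore with:
#    while read -r sid; do ha-manager set "$sid" --state started; done < "$STATE"
```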

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_node_maintenance

>> On Mar 10, 2022, at 2:48 PM, admins@telehouse.solutions wrote:
>> 
>> I don’t remember - search the man pages of pvecm and the related pve[tab][tab] commands 
>> 
>>> On Mar 10, 2022, at 2:19 PM, Stefan Radman <stefan.radman@me.com> wrote:
>>> 
>>> Hi Sto
>>> 
>>> Thanks for the suggestions.
>>> 
>>> The second option is what I was looking for.
>>> 
>>> How do I initiate “pve node maintenance mode”?
>>> 
>>> The “Node Maintenance” paragraph in the HA documentation is quite brief and does not refer to any command or GUI component.
>>> 
>>> Thank you
>>> 
>>> Stefan
>>> 
>>> 
>>>> On Mar 10, 2022, at 14:50, admins@telehouse.solutions <mailto:admins@telehouse.solutions> wrote:
>>>> 
>>>> Hi, 
>>>> 
>>>> here are two ideas: shutdown sequence -and- command sequence
>>>> 1: shutdown sequence: you can achieve this by setting up NUT on each node to only monitor the UPS power, then configuring each node to shut itself down at a different UPS power level, e.g. node1 at 15% battery, node2 at 10% battery, and so on
>>>> 2: you can set up a command sequence that first puts the PVE node into maintenance mode and then executes shutdown -> this way HA will not try to migrate VMs to a node in maintenance, and the chance of all nodes going into maintenance in exactly the same second seems to be no risk at all.
>>>> 
>>>> hope that's helpful.
>>>> 
>>>> Regards,
>>>> Sto.
>>>> 
>>>>> On Mar 10, 2022, at 1:10 PM, Stefan Radman via pve-user <pve-user@lists.proxmox.com <mailto:pve-user@lists.proxmox.com>> wrote:
>>>>> 
>>>>> 
>>>>> From: Stefan Radman <stefan.radman@me.com <mailto:stefan.radman@me.com>>
>>>>> Subject: Locking HA during UPS shutdown
>>>>> Date: March 10, 2022 at 1:10:09 PM GMT+2
>>>>> To: PVE User List <pve-user@pve.proxmox.com <mailto:pve-user@pve.proxmox.com>>
>>>>> 
>>>>> 
>>>>> Hi 
>>>>> 
>>>>> I am configuring a 3 node PVE cluster with integrated Ceph storage.
>>>>> 
>>>>> It is powered by 2 UPS that are monitored by NUT (Network UPS Tools).
>>>>> 
>>>>> HA is configured with 3 groups:
>>>>> group pve1 nodes pve1:1,pve2,pve3
>>>>> group pve2 nodes pve1,pve2:1,pve3
>>>>> group pve3 nodes pve1,pve2,pve3:1
>>>>> 
>>>>> That will normally place the VMs in each group on the corresponding node, unless that node fails.
>>>>> 
>>>>> The cluster is configured to migrate VMs away from a node before shutting it down (Cluster=>Options=>HA Settings: shutdown_policy=migrate).
>>>>> 
>>>>> NUT is configured to shut down the servers once the last of the two UPS is running low on battery.
>>>>> 
>>>>> My problem:
>>>>> When NUT starts shutting down the 3 nodes, HA will first try to live-migrate them to another node.
>>>>> That live migration process gets stuck because all the nodes are shutting down simultaneously.
>>>>> It seems that the whole process runs into a timeout, finally “powers off” all the VMs and shuts down the nodes.
>>>>> 
>>>>> My question:
>>>>> Is there a way to “lock” or temporarily de-activate HA before shutting down a node to avoid that deadlock?
>>>>> 
>>>>> Thank you
>>>>> 
>>>>> Stefan
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> pve-user mailing list
>>>>> pve-user@lists.proxmox.com
>>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>> 
>>>> 
>>>> Best Regards,
>>>> 
>>>> Stoyan Stoyanov Sto | Solutions Manager
>>>> | Telehouse.Solutions | ICT Department
>>>> | phone/viber: +359 894774934
>>>> | telegram: @prostoSto
>>>> | skype: prosto.sto
>>>> | email: sto@telehouse.solutions
>>>> | website: www.telehouse.solutions
>>>> | address: Telepoint #2, Sofia, Bulgaria
>>>> 
>>>> Save paper. Don’t print
>>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 
> 





Thread overview: 5+ messages
     [not found] <mailman.34.1646911206.440.pve-user@lists.proxmox.com>
2022-03-10 11:50 ` admins
     [not found]   ` <AC07A433-7E37-420B-97E1-2314F97C022A@me.com>
2022-03-10 12:48     ` admins
2022-03-10 13:48       ` admins
2022-03-10 14:24         ` Fabian Grünbichler [this message]
2022-03-10 16:07           ` M. Lyakhovsky
