From: "M. Lyakhovsky" <markl17@gmail.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Locking HA during UPS shutdown
Date: Thu, 10 Mar 2022 11:07:46 -0500
Message-ID: <CAMah9rsPsuXP=aXhHhCeLbmXLRsw2Gok_yJB_adQ8s-UWY2Evg@mail.gmail.com>
In-Reply-To: <1646920824.w5mef4abey.astroid@nora.none>
Hi, I asked a question earlier and no one answered: I am not able to
load the lvm2 library, which I need for lvremove.
And please, can someone tell me how to enable WiFi by putting a string in
/etc/network/devices?
On Thu, Mar 10, 2022 at 9:30 AM Fabian Grünbichler <
f.gruenbichler@proxmox.com> wrote:
> On March 10, 2022 2:48 pm, admins@telehouse.solutions wrote:
> > That was actually really BAD ADVICE… when a node initiates
> maintenance mode it will try to migrate the hosted VMs … and eventually
> ends up in the same lock loop..
> > what you really need is to remove the started VMs from ha-manager, so
> that when the node initiates shutdown it will first do a regular shutdown,
> VM by VM.
> >
> > So, do something like the below as the first command in your NUT
> command sequence:
> >
> > for a in $(ha-manager status | grep started | awk '{print $2}' | sed
> 's/vm://g'); do ha-manager remove "$a"; done
>
> what you should do is just change the policy to freeze or fail-over
> before triggering the shutdown. and once power comes back up and your
> cluster has booted, switch it back to migrate.
>
> that way, the shutdown will just stop and freeze the resources, similar
> to what happens when rebooting using the default conditional policy.
>
> note that editing datacenter.cfg (where the shutdown_policy is
> configured) is currently not exposed in any CLI tool, but you can update
> it using pvesh or the API.
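>
> for example, something along these lines (untested sketch; check the
> datacenter.cfg man page for the exact property string your version
> expects):
>
>   # switch to freeze before triggering the UPS shutdown
>   pvesh set /cluster/options --ha shutdown_policy=freeze
>
>   # once power is back and the cluster is quorate again
>   pvesh set /cluster/options --ha shutdown_policy=migrate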
>
> there is still one issue though - if the whole cluster is shut down at
> the same time, at some point during the shutdown a non-quorate partition
> will be all that's left, and at that point certain actions won't work
> anymore and the node probably will get fenced. fixing this effectively
> would require some sort of conditional delay at the right point in the
> shutdown sequence that waits for all guests on all nodes(!) to stop
> before proceeding with stopping the PVE services and corosync (nodes
> still might get fenced if they take too long shutting down after the
> last guest has exited, but that shouldn't cause many issues other than
> noise). one way to do this would be for your NUT script to set a flag
> file in /etc/pve, and some systemd service with the right Wants/After
> settings that blocks the shutdown if the flag file exists and any guests
> are still running. probably requires some tinkering, but can be safely
> tested in a virtual cluster before moving to production ;)
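>
> a very rough sketch of such a unit (names, paths and timeouts are made
> up, untested):
>
>   # /etc/systemd/system/ups-wait-for-guests.service
>   [Unit]
>   Description=Block shutdown until all guests have stopped (UPS flag)
>   # started late => stopped early at shutdown, i.e. before the PVE
>   # services and corosync are taken down
>   After=pve-guests.service pve-cluster.service corosync.service
>   Wants=pve-guests.service
>
>   [Service]
>   Type=oneshot
>   RemainAfterExit=yes
>   ExecStart=/bin/true
>   # blocks the shutdown sequence until it returns
>   ExecStop=/usr/local/bin/ups-wait-for-guests.sh
>   TimeoutStopSec=30min
>
>   [Install]
>   WantedBy=multi-user.target
>
>   # /usr/local/bin/ups-wait-for-guests.sh (mark executable)
>   #!/bin/sh
>   # only block if the NUT script has set the flag file
>   [ -e /etc/pve/ups-shutdown-in-progress ] || exit 0
>   # wait for all guests on this node to stop; a cluster-wide check
>   # would poll /cluster/resources instead, while quorum still holds
>   while qm list | grep -q running || pct list | grep -q running; do
>       sleep 5
>   done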
>
> this last problem is not related to HA though (other than HA introducing
> another source of trouble courtesy of fencing being active) - you will
> also potentially hit it with your approach. the 'stop all guests on
> node' logic that PVE has on shutdown is for shutting down one node
> without affecting quorum; it doesn't work reliably for full-cluster
> shutdowns (you might not see problems if timing works out, but it's
> based on chance).
>
> an alternative approach would be to request all HA resources to be stopped
> or disabled (`ha-manager set .. --state ..`), wait for that to be done
> cluster-wide (e.g. by polling the /cluster/resources API path), and then
> trigger the shutdown. the disadvantage of that is you have to remember
> each resource's pre-shutdown state and restore it afterwards.
>
> https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_node_maintenance
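>
> a minimal sketch of that approach (parsing and file paths are
> illustrative only; assumes jq is installed):
>
>   #!/bin/sh
>   STATE=/root/ha-pre-shutdown.state
>   : > "$STATE"
>   # 'ha-manager status' lines look like: service vm:100 (node1, started)
>   # record each resource's current state, then request it to stop
>   ha-manager status \
>   | awk '$1 == "service" { gsub(/[(),]/, "", $4); print $2, $4 }' \
>   | while read -r sid state; do
>       echo "$sid $state" >> "$STATE"
>       ha-manager set "$sid" --state stopped
>   done
>   # poll the cluster-wide view until no guest is running anymore
>   while pvesh get /cluster/resources --type vm --output-format json \
>         | jq -e 'any(.[]; .status == "running")' >/dev/null; do
>       sleep 5
>   done
>
> after the next boot, the recorded states can be restored with a matching
> loop over that file (ha-manager set "$sid" --state "$state").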
>
> >> On Mar 10, 2022, at 2:48 PM, admins@telehouse.solutions wrote:
> >>
> >> I don’t remember; look through the man pages of pvecm and the
> pve[tab][tab] related commands
> >>
> >>> On Mar 10, 2022, at 2:19 PM, Stefan Radman <stefan.radman@me.com>
> wrote:
> >>>
> >>> Hi Sto
> >>>
> >>> Thanks for the suggestions.
> >>>
> >>> The second option is what I was looking for.
> >>>
> >>> How do I initiate “pve node maintenance mode”?
> >>>
> >>> The “Node Maintenance” paragraph in the HA documentation is quite
> brief and does not refer to any command or GUI component.
> >>>
> >>> Thank you
> >>>
> >>> Stefan
> >>>
> >>>
> >>>> On Mar 10, 2022, at 14:50, admins@telehouse.solutions wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> here are two ideas: a shutdown sequence -and- a command sequence
> >>>> 1: a shutdown sequence you can achieve by setting up NUT on each node
> to only monitor the UPS power, then configuring each node to shut itself
> down at a different UPS power level, e.g. node1 at 15% battery, node2 at
> 10% battery and so on (sketch below)
> >>>> 2: you can set up a command sequence that first puts the PVE node
> into maintenance mode and then executes the shutdown -> this way HA will
> not try to migrate VMs to a node in maintenance, and the chance of all
> nodes going into maintenance in exactly the same second is hardly a risk
> at all.
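> >>>>
> >>>> for idea 1, a sketch of staggered thresholds on node1 (names are
> >>>> examples; whether battery.charge.low can be overridden depends on
> >>>> the UPS driver):
> >>>>
> >>>>   # /etc/nut/ups.conf on node1 -- shuts down earlier than the rest
> >>>>   [myups]
> >>>>       driver = usbhid-ups
> >>>>       port = auto
> >>>>       override.battery.charge.low = 15
> >>>>
> >>>>   # node2 would use e.g. 10, node3 5, so the nodes power off one
> >>>>   # after another instead of all at once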
> >>>>
> >>>> hope that's helpful.
> >>>>
> >>>> Regards,
> >>>> Sto.
> >>>>
> >>>>> On Mar 10, 2022, at 1:10 PM, Stefan Radman via pve-user
> <pve-user@lists.proxmox.com> wrote:
> >>>>>
> >>>>>
> >>>>> From: Stefan Radman <stefan.radman@me.com>
> >>>>> Subject: Locking HA during UPS shutdown
> >>>>> Date: March 10, 2022 at 1:10:09 PM GMT+2
> >>>>> To: PVE User List <pve-user@pve.proxmox.com>
> >>>>>
> >>>>>
> >>>>> Hi
> >>>>>
> >>>>> I am configuring a 3 node PVE cluster with integrated Ceph storage.
> >>>>>
> >>>>> It is powered by two UPSes that are monitored by NUT (Network UPS Tools).
> >>>>>
> >>>>> HA is configured with 3 groups:
> >>>>> group pve1 nodes pve1:1,pve2,pve3
> >>>>> group pve2 nodes pve1,pve2:1,pve3
> >>>>> group pve3 nodes pve1,pve2,pve3:1
> >>>>>
> >>>>> That will normally place the VMs in each group on the corresponding
> node, unless that node fails.
> >>>>>
> >>>>> The cluster is configured to migrate VMs away from a node before
> shutting it down (Cluster=>Options=>HA Settings: shutdown_policy=migrate).
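> >>>>>
> >>>>> i.e., roughly this line in /etc/pve/datacenter.cfg (sketch):
> >>>>>
> >>>>>   ha: shutdown_policy=migrate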
> >>>>>
> >>>>> NUT is configured to shut down the servers once the last of the two
> UPS is running low on battery.
> >>>>>
> >>>>> My problem:
> >>>>> When NUT starts shutting down the 3 nodes, HA will first try to
> live-migrate their VMs to another node.
> >>>>> That live migration process gets stuck because all the nodes are
> shutting down simultaneously.
> >>>>> It seems that the whole process runs into a timeout, finally “powers
> off” all the VMs and shuts down the nodes.
> >>>>>
> >>>>> My question:
> >>>>> Is there a way to “lock” or temporarily de-activate HA before
> shutting down a node to avoid that deadlock?
> >>>>>
> >>>>> Thank you
> >>>>>
> >>>>> Stefan
> >>>>>
> >>>>
> >>>>
> >>>> Best Regards,
> >>>>
> >>>> Stoyan Stoyanov Sto | Solutions Manager
> >>>> Telehouse.Solutions | ICT Department
> >>>> phone/viber: +359 894774934 | telegram: @prostoSto | skype: prosto.sto
> >>>> email: sto@telehouse.solutions | website: www.telehouse.solutions
> >>>> address: Telepoint #2, Sofia, Bulgaria
> >>>
> >>
> >>
> >
> >
>
>
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>
--
Do have a Blessed Day