From: admins@telehouse.solutions
Date: Thu, 10 Mar 2022 15:48:53 +0200
To: Proxmox VE user list
Cc: Stefan Radman, PVE User List
Subject: Re: [PVE-User] Locking HA during UPS shutdown

That was actually really BAD ADVICE… when a node initiates maintenance mode it will try to migrate its hosted VMs… and eventually ends up in the same lock loop.

What you really need is to remove the started VMs from ha-manager, so that when the node initiates shutdown it first does a regular shutdown, VM by VM.

So, do something like the below as the first command in your NUT command sequence:

for a in $(ha-manager status | grep started | awk '{print $2}' | sed 's/vm://g'); do ha-manager remove "$a"; done

> On Mar 10, 2022, at 2:48 PM, admins@telehouse.solutions wrote:
>
> I don’t remember; search the man pages of pvecm and the other pve[tab][tab] related commands.
>
>> On Mar 10, 2022, at 2:19 PM, Stefan Radman wrote:
>>
>> Hi Sto
>>
>> Thanks for the suggestions.
>>
>> The second option is what I was looking for.
>>
>> How do I initiate “pve node maintenance mode”?
>>
>> The “Node Maintenance” paragraph in the HA documentation is quite brief and does not refer to any command or GUI component.
>>
>> Thank you
>>
>> Stefan
>>
>>> On Mar 10, 2022, at 14:50, admins@telehouse.solutions wrote:
>>>
>>> Hi,
>>>
>>> here are two ideas: a shutdown sequence and a command sequence.
>>> 1: A shutdown sequence can be achieved by setting NUT on each node to only monitor the UPS power, then configuring each node to shut itself down at a different UPS power level, e.g. node1 at 15% battery, node2 at 10% battery, and so on.
>>> 2: You can set a command sequence that first puts the PVE node into maintenance mode and then executes the shutdown. This way HA will not try to migrate VMs to a node in maintenance, and the chance of all nodes going into maintenance in exactly the same second does not seem to be a risk at all.
>>>
>>> Hope that’s helpful.
>>>
>>> Regards,
>>> Sto.
>>>
>>>> On Mar 10, 2022, at 1:10 PM, Stefan Radman via pve-user wrote:
>>>>
>>>> From: Stefan Radman
>>>> Subject: Locking HA during UPS shutdown
>>>> Date: March 10, 2022 at 1:10:09 PM GMT+2
>>>> To: PVE User List
>>>>
>>>> Hi
>>>>
>>>> I am configuring a 3 node PVE cluster with integrated Ceph storage.
>>>>
>>>> It is powered by 2 UPS that are monitored by NUT (Network UPS Tools).
>>>>
>>>> HA is configured with 3 groups:
>>>> group pve1 nodes pve1:1,pve2,pve3
>>>> group pve2 nodes pve1,pve2:1,pve3
>>>> group pve3 nodes pve1,pve2,pve3:1
>>>>
>>>> That will normally place the VMs in each group on the corresponding node, unless that node fails.
>>>>
>>>> The cluster is configured to migrate VMs away from a node before shutting it down (Cluster=>Options=>HA Settings: shutdown_policy=migrate).
>>>>
>>>> NUT is configured to shut down the servers once the last of the two UPS is running low on battery.
>>>>
>>>> My problem:
>>>> When NUT starts shutting down the 3 nodes, HA will first try to live-migrate the VMs to another node.
>>>> That live migration process gets stuck because all the nodes are shutting down simultaneously.
>>>> It seems that the whole process runs into a timeout, finally “powers off” all the VMs and shuts down the nodes.
>>>>
>>>> My question:
>>>> Is there a way to “lock” or temporarily de-activate HA before shutting down a node to avoid that deadlock?
>>>>
>>>> Thank you
>>>>
>>>> Stefan
>>>>
>>>> _______________________________________________
>>>> pve-user mailing list
>>>> pve-user@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>
>>> Best Regards,
>>>
>>> Stoyan Stoyanov Sto | Solutions Manager
>>> | Telehouse.Solutions | ICT Department
>>> | phone/viber: +359 894774934
>>> | telegram: @prostoSto
>>> | skype: prosto.sto
>>> | email: sto@telehouse.solutions
>>> | website: www.telehouse.solutions
>>> | address: Telepoint #2, Sofia, Bulgaria
>>>
>>> Save paper. Don’t print
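
The one-liner suggested in the reply at the top of this thread (remove the started resources from ha-manager before the node shuts down) could also be written as a small script. This is only a sketch under assumptions: it assumes `ha-manager status` prints resource lines of the form `service vm:100 (pve1, started)`, and the function names and the DRY_RUN variable are made up for illustration. It passes the full service ID (vm:100, ct:102) to `ha-manager remove`, so containers are handled as well as VMs.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, not part of PVE or NUT.

# Read `ha-manager status` output on stdin and print the service ID
# (e.g. vm:100 or ct:102) of every resource whose state is "started".
# Assumes lines of the form: service <sid> (<node>, <state>)
list_started_services() {
    awk '$1 == "service" && $NF == "started)" { print $2 }'
}

# Remove every started resource from HA management.
# With DRY_RUN=1 the commands are only printed, not executed.
remove_started_from_ha() {
    ha-manager status | list_started_services | while read -r sid; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ha-manager remove $sid"
        else
            ha-manager remove "$sid"
        fi
    done
}
```

To use it, call `remove_started_from_ha` as the first step of the NUT shutdown command sequence, ideally after a trial run with DRY_RUN=1. Keep in mind that removed resources are no longer HA-managed once power returns, so they would have to be added back (for example with `ha-manager add`) together with their group settings.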