From: admins@telehouse.solutions
Date: Thu, 10 Mar 2022 13:50:10 +0200
To: Proxmox VE user list
Cc: PVE User List, Stefan Radman
Subject: Re: [PVE-User] Locking HA during UPS shutdown

Hi,

here are two ideas: a shutdown sequence and a command sequence.

1: Shutdown sequence. You can achieve this by configuring NUT on each node to only monitor the UPS power, and then configuring each node to shut itself down at a different UPS battery level, e.g. node1 at 15% battery, node2 at 10% battery, and so on.

2: Command sequence. Set up a command sequence that first puts the PVE node into maintenance mode and then executes the shutdown. This way HA will not try to migrate VMs to a node that is in maintenance, and the chance of all nodes entering maintenance in exactly the same second seems negligible.

Hope that's helpful.

Regards,
Sto.

> On Mar 10, 2022, at 1:10 PM, Stefan Radman via pve-user wrote:
>
> From: Stefan Radman
> Subject: Locking HA during UPS shutdown
> Date: March 10, 2022 at 1:10:09 PM GMT+2
> To: PVE User List
>
> Hi
>
> I am configuring a 3-node PVE cluster with integrated Ceph storage.
>
> It is powered by 2 UPSes that are monitored by NUT (Network UPS Tools).
>
> HA is configured with 3 groups:
> group pve1 nodes pve1:1,pve2,pve3
> group pve2 nodes pve1,pve2:1,pve3
> group pve3 nodes pve1,pve2,pve3:1
>
> That will normally place the VMs in each group on the corresponding node, unless that node fails.
>
> The cluster is configured to migrate VMs away from a node before shutting it down (Cluster => Options => HA Settings: shutdown_policy=migrate).
>
> NUT is configured to shut down the servers once the last of the two UPSes is running low on battery.
>
> My problem:
> When NUT starts shutting down the 3 nodes, HA first tries to live-migrate their VMs to another node.
> That live migration process gets stuck because all the nodes are shutting down simultaneously.
> It seems that the whole process runs into a timeout, finally "powers off" all the VMs and shuts down the nodes.
>
> My question:
> Is there a way to "lock" or temporarily deactivate HA before shutting down a node to avoid that deadlock?
>
> Thank you
>
> Stefan

Best Regards,
Stoyan Stoyanov
Sto | Solutions Manager | Telehouse.Solutions | ICT Department
phone/viber: +359 894774934 | telegram: @prostoSto | skype: prosto.sto
email: sto@telehouse.solutions | website: www.telehouse.solutions
address: Telepoint #2, Sofia, Bulgaria
Save paper. Don't print
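As a rough illustration of the two ideas in the reply above, here is a minimal Python sketch. It is hypothetical glue code, not part of NUT or Proxmox VE. The node names follow Stefan's cluster (pve1 to pve3), the 5% threshold for pve3 is an assumed value, and the battery charge is passed in rather than read from the UPS (in a real setup it could come from NUT, for example the `battery.charge` variable reported by `upsc`). The maintenance command assumes an `ha-manager` that supports `crm-command node-maintenance enable`; check that your PVE version provides it before relying on it.

```python
from __future__ import annotations

import subprocess
from typing import Callable

# Hypothetical per-node thresholds in percent battery charge. The reply only
# suggests 15% and 10% for the first two nodes ("and so on"); 5% for pve3 is
# an assumed value for illustration.
SHUTDOWN_THRESHOLDS: dict[str, float] = {"pve1": 15, "pve2": 10, "pve3": 5}


def should_shut_down(node: str, battery_charge: float) -> bool:
    """Idea 1: each node has its own threshold, so nodes go down one by one."""
    return battery_charge <= SHUTDOWN_THRESHOLDS[node]


def shutdown_commands(node: str) -> list[list[str]]:
    """Idea 2: enter HA maintenance mode first, then power off.

    Assumes an ha-manager version that supports
    'crm-command node-maintenance enable <node>'; verify on your release.
    """
    return [
        ["ha-manager", "crm-command", "node-maintenance", "enable", node],
        ["shutdown", "-h", "now"],
    ]


def run_command(cmd: list[str]) -> None:
    """Execute a command for real; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


def handle_battery_reading(node: str, battery_charge: float,
                           run: Callable[[list[str]], None]) -> bool:
    """Run maintenance-then-shutdown once this node's own threshold is hit.

    `run` is injected so the sequence can be dry-run (e.g. with print).
    Returns True if the shutdown sequence was triggered.
    """
    if not should_shut_down(node, battery_charge):
        return False
    for cmd in shutdown_commands(node):
        run(cmd)
    return True


if __name__ == "__main__":
    # Dry run: print the commands pve2 would execute at 9.5% charge.
    handle_battery_reading("pve2", 9.5, run=lambda cmd: print(" ".join(cmd)))
```

As written, the script only prints the commands; passing `run=run_command` instead would execute them. How the script gets triggered from NUT is deliberately left open here.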