From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <54cb7acd-2586-aeae-d867-34b9e1f271ac@proxmox.com>
Date: Wed, 27 Apr 2022 14:00:12 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Thunderbird/100.0
Content-Language: en-US
To: Proxmox VE development discussion, Fabian Grünbichler
References: <20220427101955.3550677-1-f.gruenbichler@proxmox.com>
From: Thomas Lamprecht
In-Reply-To: <20220427101955.3550677-1-f.gruenbichler@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Subject: [pve-devel] applied: 
[PATCH ha-manager] lrm: fix getting stuck on restart
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion
X-List-Received-Date: Wed, 27 Apr 2022 12:00:44 -0000

On 27.04.22 12:19, Fabian Grünbichler wrote:
> run_workers is responsible for updating the state after workers have
> exited. if the current LRM state is 'active', but a shutdown_request was
> issued in 'restart' mode (like on package upgrades), this call is the
> only one made in the LRM work() loop.
>
> skipping it if there are active services means the following sequence of
> events effectively keeps the LRM from restarting or making any progress:
>
> - start HA migration on node A
> - reload LRM on node A while migration is still running
>
> even once the migration is finished, the service count is still >= 1
> since the LRM never calls run_workers (directly or via
> manage_resources), so the service having been migrated is never noticed.
>
> maintenance mode (i.e., rebooting the node with shutdown policy migrate)
> does call manage_resources and thus run_workers, and will proceed once
> the last worker has exited.
>
> reported by a user:
>
> https://forum.proxmox.com/threads/lrm-hangs-when-updating-while-migration-is-running.108628
>
> Signed-off-by: Fabian Grünbichler
> ---
> better viewed with -w ;)
>
>  src/PVE/HA/LRM.pm | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
>

good fix! applied, thanks!
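
For readers without the diff at hand, the sequence described in the quoted commit message can be illustrated with a toy model. This is only a sketch in Python, not the Perl code from src/PVE/HA/LRM.pm: the class `ToyLRM`, the helper `simulate`, the service ID `vm:100`, and the tick counts are all hypothetical, and only the names `run_workers` and `work` are borrowed from the commit message. It assumes that the active service count only drops once `run_workers` has noticed the exited worker, as the message states.

```python
class ToyLRM:
    """Toy model of an 'active' LRM with a pending restart request (hypothetical)."""

    def __init__(self, call_run_workers_first):
        # True models the behaviour described as the fix, False the stuck case.
        self.call_run_workers_first = call_run_workers_first
        # Hypothetical migration worker that needs 3 ticks to finish.
        self.worker_ticks_left = {"vm:100": 3}
        self.active_services = {"vm:100"}
        self.stopped = False

    def tick_workers(self):
        # Workers make progress on their own, independent of the LRM loop.
        for sid in self.worker_ticks_left:
            self.worker_ticks_left[sid] = max(0, self.worker_ticks_left[sid] - 1)

    def run_workers(self):
        # Notice exited workers, update the service state, and
        # return the number of workers that are still running.
        finished = [s for s, t in self.worker_ticks_left.items() if t == 0]
        for sid in finished:
            del self.worker_ticks_left[sid]
            self.active_services.discard(sid)
        return len(self.worker_ticks_left)

    def work(self):
        # One loop iteration while a restart was requested.
        if self.call_run_workers_first:
            # Always reap workers first, then check both counts.
            workers = self.run_workers()
            if not self.active_services and workers == 0:
                self.stopped = True
        else:
            # Stuck case: run_workers is skipped while services are active,
            # so the finished migration is never noticed.
            if not self.active_services:
                if self.run_workers() == 0:
                    self.stopped = True


def simulate(call_run_workers_first, iterations=10):
    """Run the toy loop and report whether the LRM managed to stop."""
    lrm = ToyLRM(call_run_workers_first)
    for _ in range(iterations):
        lrm.tick_workers()
        lrm.work()
        if lrm.stopped:
            break
    return lrm.stopped
```

In this model, `simulate(False)` returns `False` no matter how many iterations are allowed, since the service count never drops, while `simulate(True)` returns `True` once the hypothetical worker has exited.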