From: Friedrich Weber
To: pve-devel@lists.proxmox.com
Date: Thu, 19 Jan 2023 13:39:02 +0100
Message-Id: <20230119123902.745440-1-f.weber@proxmox.com>
Subject: [pve-devel] [RFC container] fix: shutdown: if lxc-stop fails, wait for socket closing with timeout
List-Id: Proxmox VE development discussion

When trying to shut down a hung container with `forceStop=0` (e.g. via
the Web UI), the shutdown task may run indefinitely while holding a lock
on the container config. The reason is that the shutdown subroutine
waits for the LXC command socket to close, even if the `lxc-stop`
command has failed due to a timeout. This prevents other tasks (such as
a stop task) from acquiring the lock. In order to stop the container,
the shutdown task has to be explicitly killed first, which is
inconvenient.

This occurs e.g. when trying to shut down a hung CentOS 7 container (with
systemd
---
I stumbled upon the hanging CentOS 7 container shutdown task while
looking into #4474. However, it is quite the edge case and only slightly
inconvenient, so I'm not sure whether it needs to be addressed -- and if
it needs to be addressed, I'm not sure whether the attached fix is the
way to go. :) So I'm submitting it as an RFC. Let me know what you think.

 src/PVE/LXC.pm | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index ce6d5a5..9b3cd64 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -2473,11 +2473,21 @@ sub vm_stop {
     }
 
     eval { run_command($cmd, timeout => $shutdown_timeout) };
+
+    my $result = 1;
+    my $wait = sub { $result = <$sock>; };
+
+    # Wait until the command socket is closed.
+    # In case the lxc-stop call failed, reading from the command socket may block forever,
+    # so read with another timeout to avoid freezing the shutdown task.
     if (my $err = $@) {
-        warn $@ if $@;
-    }
+        warn $err if $err;
 
-    my $result = <$sock>;
+        eval { PVE::Tools::run_with_timeout($shutdown_timeout, $wait); };
+        warn "read from command socket failed: $@" if $@;
+    } else {
+        $wait->();
+    }
 
     return if !defined $result; # monitor is gone and the ct has stopped.
     die "container did not stop\n";
-- 
2.30.2
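
For readers who want to try the blocking-read behaviour outside of PVE, here is a minimal standalone sketch. It is an illustration under assumptions, not the code from the patch: `read_with_timeout` is a hypothetical helper built on core Perl `alarm`, standing in for `PVE::Tools::run_with_timeout`, and a `socketpair` stands in for the LXC command socket. As in the patch, `$result` starts out as 1 so that it only becomes undefined when the read actually hits EOF.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper (NOT PVE::Tools::run_with_timeout): run $code and
# die with "timeout\n" if it has not returned after $timeout seconds.
sub read_with_timeout {
    my ($timeout, $code) = @_;
    my $result;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($timeout);
        $result = $code->();
        alarm(0); # finished in time, cancel the pending alarm
    };
    my $err = $@;
    alarm(0); # make sure no alarm is left pending on any path
    die $err if $err;
    return $result;
}

# A socket pair stands in for the command socket and the container monitor.
socketpair(my $sock, my $peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!\n";

# Case 1: the peer stays open and silent, so a plain <$sock> would block
# forever. With the timeout, the read is abandoned and $result stays 1.
my $result = 1;
eval { read_with_timeout(1, sub { $result = <$sock> }) };
warn "read from command socket failed: $@" if $@;
print "hung peer: ", (defined $result ? "did not stop" : "stopped"), "\n";

# Case 2: the peer closes its end, so the read returns undef (EOF) at once.
close($peer);
$result = 1;
eval { read_with_timeout(1, sub { $result = <$sock> }) };
print "closed peer: ", (defined $result ? "did not stop" : "stopped"), "\n";
```

The first case corresponds to the hung container: the read is given up after roughly one second instead of blocking the task. The second case corresponds to a container that stopped: the monitor closed the socket, the read hits EOF, and `$result` is undefined.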