From: Friedrich Weber
To: pve-devel@lists.proxmox.com
Date: Wed, 25 Jan 2023 14:07:49 +0100
Message-Id: <20230125130749.782566-1-f.weber@proxmox.com>
X-Mailer: git-send-email 2.30.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [pve-devel] [PATCH v2 container] fix: shutdown: if lxc-stop fails, wait for socket closing with timeout

When trying to shut down a hung container with `forceStop=0` (e.g. via
the Web UI), the shutdown task may run indefinitely while holding a lock
on the container config. The reason is that the shutdown subroutine
waits for the LXC command socket to close, even if the `lxc-stop`
command has failed due to a timeout. This prevents other tasks (such as
a stop task) from acquiring the lock. In order to stop the container,
the shutdown task has to be explicitly killed first, which is
inconvenient.

This occurs e.g. when trying to shut down a hung CentOS 7 container (with
systemd
---
changes since v1:
* wait for command socket closing with timeout using IO::Poll instead
  of `run_with_timeout`, as suggested by Wolfgang

 src/PVE/LXC.pm | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index ce6d5a5..7cf1dcf 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -13,6 +13,7 @@ use Cwd qw();
 use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY);
 use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED ENOSYS EEXIST);
 use IO::Socket::UNIX;
+use IO::Poll qw(POLLIN POLLHUP);

 use PVE::Exception qw(raise_perm_exc);
 use PVE::Storage;
@@ -2473,13 +2474,22 @@ sub vm_stop {
     }

     eval { run_command($cmd, timeout => $shutdown_timeout) };
+
+    # Wait until the command socket is closed.
+    # In case the lxc-stop call failed, reading from the command socket may block forever,
+    # so poll with another timeout to avoid freezing the shutdown task.
     if (my $err = $@) {
-	warn $@ if $@;
-    }
+	warn $err if $err;

-    my $result = <$sock>;
+	my $poll = IO::Poll->new();
+	$poll->mask($sock => POLLIN | POLLHUP); # watch for input and EOF events
+	$poll->poll($shutdown_timeout); # IO::Poll timeout is in seconds
+	return if ($poll->events($sock) & POLLHUP);
+    } else {
+	my $result = <$sock>;
+	return if !defined $result; # monitor is gone and the ct has stopped.
+    }

-    return if !defined $result; # monitor is gone and the ct has stopped.

     die "container did not stop\n";
 }
-- 
2.30.2
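
For illustration only, here is a minimal standalone sketch of the
poll-with-timeout technique the patch relies on. It is not part of the
patch: the helper name `wait_for_hangup`, the socketpair standing in for
the LXC command socket, and the 0.2 second timeout are all assumptions
made for the example. It also assumes Linux behaviour, where a closed
peer on an AF_UNIX stream socket is reported as POLLHUP.

```perl
use strict;
use warnings;
use IO::Poll qw(POLLIN POLLHUP);
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper (not in PVE::LXC): wait up to $timeout seconds for
# the peer of $sock to hang up. Returns 1 if a hang-up was seen, else 0.
sub wait_for_hangup {
    my ($sock, $timeout) = @_;

    my $poll = IO::Poll->new();
    $poll->mask($sock => POLLIN | POLLHUP); # watch for input and hang-up
    $poll->poll($timeout); # seconds; returns on an event or on timeout

    return ($poll->events($sock) & POLLHUP) ? 1 : 0;
}

# A connected socket pair stands in for the LXC command socket.
socketpair(my $ours, my $theirs, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!\n";

# Peer still open: poll runs into the timeout instead of blocking forever.
my $hup_before = wait_for_hangup($ours, 0.2);

# Peer closed: poll returns early and reports the hang-up.
close($theirs);
my $hup_after = wait_for_hangup($ours, 0.2);

print "hang-up before close: $hup_before, after close: $hup_after\n";
```

Unlike a plain `<$sock>` read, the first call returns once the timeout
expires even though nothing ever arrives on the socket, which is the
property the shutdown task needs when `lxc-stop` has already failed.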