public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH v2 container] fix: shutdown: if lxc-stop fails, wait for socket closing with timeout
@ 2023-01-25 13:07 Friedrich Weber
  2023-02-20 12:41 ` [pve-devel] applied: " Wolfgang Bumiller
  0 siblings, 1 reply; 2+ messages in thread
From: Friedrich Weber @ 2023-01-25 13:07 UTC
  To: pve-devel

When trying to shut down a hung container with `forceStop=0` (e.g. via
the Web UI), the shutdown task may run indefinitely while holding a lock
on the container config. The reason is that the shutdown subroutine
waits for the LXC command socket to close, even if the `lxc-stop`
command has failed due to a timeout. This prevents other tasks (such as
a stop task) from acquiring the lock. To stop the container, the
shutdown task has to be killed explicitly first, which is inconvenient.
This occurs e.g. when trying to shut down a hung CentOS 7 container
(with systemd <v232) in a cgroupv2 environment.

This fix imposes a timeout on the socket polling operation if the
`lxc-stop` command has failed. Behavior in case `lxc-stop` succeeds is
unchanged. This reintroduces some behavior from b1bad293. The poll
timeout is the given shutdown timeout, so in the scenario above the task
can take up to twice the shutdown timeout in total: once for `lxc-stop`
and once for the poll.

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---

 changes since v1:
 * wait for command socket closing with timeout using IO::Poll instead
   of `run_with_timeout`, as suggested by Wolfgang

 src/PVE/LXC.pm | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index ce6d5a5..7cf1dcf 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -13,6 +13,7 @@ use Cwd qw();
 use Fcntl qw(O_RDONLY O_WRONLY O_NOFOLLOW O_DIRECTORY);
 use Errno qw(ELOOP ENOTDIR EROFS ECONNREFUSED ENOSYS EEXIST);
 use IO::Socket::UNIX;
+use IO::Poll qw(POLLIN POLLHUP);
 
 use PVE::Exception qw(raise_perm_exc);
 use PVE::Storage;
@@ -2473,13 +2474,22 @@ sub vm_stop {
     }
 
     eval { run_command($cmd, timeout => $shutdown_timeout) };
+
+    # Wait until the command socket is closed.
+    # In case the lxc-stop call failed, reading from the command socket may block forever,
+    # so poll with another timeout to avoid freezing the shutdown task.
     if (my $err = $@) {
-	warn $@ if $@;
-    }
+	warn $err if $err;
 
-    my $result = <$sock>;
+	my $poll = IO::Poll->new();
+	$poll->mask($sock => POLLIN | POLLHUP); # watch for input and EOF events
+	$poll->poll($shutdown_timeout); # IO::Poll timeout is in seconds
+	return if ($poll->events($sock) & POLLHUP);
+    } else {
+	my $result = <$sock>;
+	return if !defined $result; # monitor is gone and the ct has stopped.
+    }
 
-    return if !defined $result; # monitor is gone and the ct has stopped.
     die "container did not stop\n";
 }
 
-- 
2.30.2
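
For readers who want to try the polling behavior outside of PVE, here is
a minimal standalone sketch. It is not part of the patch: the helper name
`wait_for_hangup` is hypothetical, and a local socket pair stands in for
the LXC command socket. It assumes Linux semantics, where closing one end
of a UNIX stream socket pair raises POLLHUP on the other end.

```perl
use strict;
use warnings;
use IO::Poll qw(POLLIN POLLHUP);
use IO::Socket;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper (not in PVE::LXC): wait up to $timeout seconds for the
# peer of $sock to hang up. Returns 1 if POLLHUP was seen, 0 otherwise.
sub wait_for_hangup {
    my ($sock, $timeout) = @_;

    my $poll = IO::Poll->new();
    $poll->mask($sock => POLLIN | POLLHUP); # watch for input and hangup
    $poll->poll($timeout);                  # IO::Poll timeout is in seconds

    return ($poll->events($sock) & POLLHUP) ? 1 : 0;
}

# A socket pair stands in for the command socket and the container monitor.
my ($ours, $theirs) = IO::Socket->socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!\n";

# Peer still open: the poll runs into the timeout instead of blocking forever.
my $still_open = wait_for_hangup($ours, 0.2);

# Peer closed: the poll reports the hangup.
close($theirs);
my $closed = wait_for_hangup($ours, 0.2);

print "open peer: $still_open, closed peer: $closed\n";
```

A plain `<$sock>` read in the first case would block until the peer
closes, which is the situation the patch avoids when `lxc-stop` has
already failed.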




