public inbox for pve-devel@lists.proxmox.com
From: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH qemu-server v4 3/6] await and kill lingering KVM thread when VM start reaches timeout
Date: Thu,  5 Jan 2023 11:08:34 +0100
Message-ID: <20230105100837.195520-4-d.tschlatscher@proxmox.com>
In-Reply-To: <20230105100837.195520-1-d.tschlatscher@proxmox.com>

In some cases, the VM start API method returned before the detached
KVM process had exited. This is especially problematic with HA,
because the HA manager assumes the VM started successfully, later
sees that it exited, and starts it again in an endless loop.

Another case arises when resuming a hibernated VM. Here, the QEMU
process attempts to load the whole vmstate into memory before
exiting. Depending on vmstate size, disk read speed, and similar
factors, this can take quite a while, and the VM cannot be started
normally during this time.

To work around this, this patch intercepts the error, checks whether
a corresponding KVM process is still running, and waits for it to
exit (killing it once a timeout is reached) before continuing.

Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
---
Changes from v3:
* Minor code cleanup of how "$pid" is used in if conditions, following
  Fabian's suggestion
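
Illustration only, not part of the patch: a minimal, self-contained sketch
of the wait-then-kill pattern used in the hunk below. process_running() and
wait_or_kill() are hypothetical names. process_running() stands in for
PVE::QemuServer::Helpers::vm_running_locally() and merely probes the PID
with signal 0, an assumption made so the example runs outside of PVE.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for vm_running_locally(): returns the PID if the
# process still exists, undef otherwise. Signal 0 only tests for existence;
# a process owned by another user (EPERM) would be reported as not running.
sub process_running {
    my ($pid) = @_;
    # If the PID is our own exited child, reap it so it does not linger as
    # a zombie. For a detached process this call is a harmless no-op.
    waitpid($pid, POSIX::WNOHANG());
    return kill(0, $pid) ? $pid : undef;
}

# Poll once per second until the process exits or $timeout seconds pass.
# If it is still alive afterwards, send SIGKILL.
# Returns 1 if the process exited on its own, 0 if it had to be killed.
sub wait_or_kill {
    my ($pid, $timeout) = @_;

    for (my $count = 0; $count < $timeout; $count++) {
        return 1 if !process_running($pid);
        sleep(1);
    }

    # Re-check right before killing, so a process that exited during the
    # last sleep is not signalled.
    if (process_running($pid)) {
        kill('KILL', $pid);
        return 0;
    }
    return 1;
}
```

Calling wait_or_kill($pid, 300) would mirror the 300 second timeout that
the patch hardcodes. Unlike this sketch, the patch looks the PID up again
via the VM ID and compares it with the original one before sending SIGKILL.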

 PVE/QemuServer.pm | 38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 2a4bc75..549d666 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5881,15 +5881,39 @@ sub vm_start_nolock {
 		$tpmpid = start_swtpm($storecfg, $vmid, $tpm, $migratedfrom);
 	    }
 
-	    my $exitcode = run_command($cmd, %run_params);
-	    if ($exitcode) {
-		if ($tpmpid) {
-		    warn "stopping swtpm instance (pid $tpmpid) due to QEMU startup error\n";
-		    kill 'TERM', $tpmpid;
+	    eval {
+		my $exitcode = run_command($cmd, %run_params);
+
+		if ($exitcode) {
+		    if ($tpmpid) {
+			log_warn "stopping swtpm instance (pid $tpmpid) due to QEMU startup error\n";
+			kill 'TERM', $tpmpid;
+		    }
+		    die "QEMU exited with code $exitcode\n";
 		}
-		die "QEMU exited with code $exitcode\n";
+	    };
+
+	    if (my $err = $@) {
+		if (my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+		    my $count = 0;
+		    my $timeout = 300;
+
+		    print "Waiting $timeout seconds for detached qemu process $pid to exit\n";
+		    while (($count < $timeout) &&
+			PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
+			$count++;
+			sleep(1);
+		    }
+
+		    if ($count >= $timeout) {
+			log_warn "Reached timeout. Terminating now with SIGKILL\n";
+			kill(9, $pid) if PVE::QemuServer::Helpers::vm_running_locally($vmid) eq $pid;
+		    }
+		}
+
+		die $err;
 	    }
-	};
+	}
     };
 
     if ($conf->{hugepages}) {
-- 
2.30.2




