public inbox for pve-devel@lists.proxmox.com
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server v3 3/5] await and kill lingering KVM thread when VM start reaches timeout
Date: Wed, 21 Dec 2022 12:14:41 +0100
Message-ID: <1671620408.1e4qgx6uw7.astroid@yuna.none>
In-Reply-To: <20221216133655.510957-4-d.tschlatscher@proxmox.com>

On December 16, 2022 2:36 pm, Daniel Tschlatscher wrote:
> In some cases the VM API start method would return before the detached
> KVM process would have exited. This is especially problematic with HA,
> because the HA manager would think the VM started successfully, later
> see that it exited and start it again in an endless loop.
> 
> Moreover, another case exists when resuming a hibernated VM. In this
> case, the qemu thread will attempt to load the whole vmstate into
> memory before exiting.
> Depending on vmstate size, disk read speed, and similar factors this
> can take quite a while though and it is not possible to start the VM
> normally during this time.
> 
> To get around this, this patch intercepts the error, looks whether a
> corresponding KVM thread is still running, and waits for/kills it,
> before continuing.
> 
> Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
> ---
> 
> Changes from v2:
> * Rebased to current master
> * Changed warn to use 'log_warn' instead
> * Reworded log message when waiting for lingering qemu process
> 
>  PVE/QemuServer.pm | 40 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 2adbe3a..f63dc3f 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -5884,15 +5884,41 @@ sub vm_start_nolock {
>  		$tpmpid = start_swtpm($storecfg, $vmid, $tpm, $migratedfrom);
>  	    }
>  
> -	    my $exitcode = run_command($cmd, %run_params);
> -	    if ($exitcode) {
> -		if ($tpmpid) {
> -		    warn "stopping swtpm instance (pid $tpmpid) due to QEMU startup error\n";
> -		    kill 'TERM', $tpmpid;
> +	    eval {
> +		my $exitcode = run_command($cmd, %run_params);
> +
> +		if ($exitcode) {
> +		    if ($tpmpid) {
> +			log_warn "stopping swtpm instance (pid $tpmpid) due to QEMU startup error\n";

this warn -> log_warn change seems to have slipped in; it's not really part of
this patch?

> +			kill 'TERM', $tpmpid;
> +		    }
> +		    die "QEMU exited with code $exitcode\n";
>  		}
> -		die "QEMU exited with code $exitcode\n";
> +	    };
> +
> +	    if (my $err = $@) {
> +		my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid);
> +
> +		if ($pid ne "") {

the declaration and the check can be combined:
if (my $pid = ...) {

}

(empty string evaluates to false in perl ;))
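
A minimal, self-contained sketch of that combined form. The helper
`vm_running_locally_stub` and the `%fake_running` table are hypothetical
stand-ins for PVE::QemuServer::Helpers::vm_running_locally, assumed here to
return a PID or an empty string as the patch's `ne ""` comparison implies:

```perl
use strict;
use warnings;

# Hypothetical stand-in for PVE::QemuServer::Helpers::vm_running_locally():
# returns a PID for a "running" VM and an empty string otherwise.
my %fake_running = (100 => 4242);

sub vm_running_locally_stub {
    my ($vmid) = @_;
    return $fake_running{$vmid} // '';
}

sub describe {
    my ($vmid) = @_;

    # Declare and test in one step; $pid is only visible inside the if block.
    # An empty string (and undef) is false, so no explicit 'ne ""' is needed.
    if (my $pid = vm_running_locally_stub($vmid)) {
        return "VM $vmid still running with pid $pid";
    }

    return "VM $vmid not running";
}

print describe(100), "\n";
print describe(101), "\n";
```

Besides being shorter, this keeps `$pid` out of scope for the code after the
block, so it cannot be used by accident once the check has failed.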

> +		    my $count = 0;
> +		    my $timeout = 300;
> +
> +		    print "Waiting $timeout seconds for detached qemu process $pid to exit\n";
> +		    while (($count < $timeout) &&
> +			PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
> +			$count++;
> +			sleep(1);
> +		    }
> +

either here

> +		    if ($count >= $timeout) {
> +			log_warn "Reached timeout. Terminating now with SIGKILL\n";

or here, recheck that the VM is still running and still has the same PID, and log
accordingly instead of KILLing if not.

the same is also true in _do_vm_stop
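
A rough sketch of such a recheck, not the actual qemu-server code. The
function `wait_or_kill`, the helper `make_poller`, and the callbacks are
hypothetical; `$get_pid` stands in for vm_running_locally($vmid) and
`$kill_cb` for kill(9, $pid), so the logic can run without a real VM:

```perl
use strict;
use warnings;

# Wait up to $timeout polls for the process to exit, then re-read the PID
# and only kill if the same process is still there.
sub wait_or_kill {
    my ($pid, $timeout, $get_pid, $kill_cb, $sleep_cb) = @_;

    my $count = 0;
    while ($count < $timeout && $get_pid->()) {
        $count++;
        $sleep_cb->(1);
    }

    # Recheck right before killing: the process may have exited in the
    # meantime, or the VM may have been started again with a new PID.
    my $current = $get_pid->();
    if (!$current) {
        return "exited";
    } elsif ($current != $pid) {
        return "pid-changed";
    }

    $kill_cb->($pid);
    return "killed";
}

# Hypothetical test helper: returns the given values one per call and then
# keeps repeating the last one.
sub make_poller {
    my @seq = @_;
    return sub { return @seq > 1 ? shift @seq : $seq[0]; };
}

my @killed;
my $result = wait_or_kill(
    4242, 3,
    make_poller(4242),           # process never exits on its own
    sub { push @killed, $_[0] }, # record instead of sending a signal
    sub { },                     # no real sleeping in the sketch
);
print "$result (@killed)\n";
```

The "exited" and "pid-changed" results are the cases that would be logged
instead of killed. A small window between the recheck and the kill remains,
but it is far narrower than the whole waiting period.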

> +			kill(9, $pid);
> +		    }
> +		}
> +
> +		die $err;
>  	    }
> -	};
> +	}
>      };
>  
>      if ($conf->{hugepages}) {
> -- 
> 2.30.2
> 
> 
> 