all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH qemu-server v3 3/5] await and kill lingering KVM thread when VM start reaches timeout
Date: Thu, 22 Dec 2022 13:58:29 +0100	[thread overview]
Message-ID: <a348bbc4-e5d1-cc6d-b5bc-1e51d5358f2d@proxmox.com> (raw)
In-Reply-To: <1671620408.1e4qgx6uw7.astroid@yuna.none>



On 12/21/22 12:14, Fabian Grünbichler wrote:
> On December 16, 2022 2:36 pm, Daniel Tschlatscher wrote:
>> In some cases the VM API start method would return before the detached
>> KVM process would have exited. This is especially problematic with HA,
>> because the HA manager would think the VM started successfully, later
>> see that it exited and start it again in an endless loop.
>>
>> Moreover, another case exists when resuming a hibernated VM. In this
>> case, the qemu thread will attempt to load the whole vmstate into
>> memory before exiting.
>> Depending on vmstate size, disk read speed, and similar factors this
>> can take quite a while though and it is not possible to start the VM
>> normally during this time.
>>
>> To get around this, this patch intercepts the error, looks whether a
>> corresponding KVM thread is still running, and waits for/kills it,
>> before continuing.
>>
>> Signed-off-by: Daniel Tschlatscher <d.tschlatscher@proxmox.com>
>> ---
>>
>> Changes from v2:
>> * Rebased to current master
>> * Changed warn to use 'log_warn' instead
>> * Reworded log message when waiting for lingering qemu process
>>
>>  PVE/QemuServer.pm | 40 +++++++++++++++++++++++++++++++++-------
>>  1 file changed, 33 insertions(+), 7 deletions(-)
>>
>> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
>> index 2adbe3a..f63dc3f 100644
>> --- a/PVE/QemuServer.pm
>> +++ b/PVE/QemuServer.pm
>> @@ -5884,15 +5884,41 @@ sub vm_start_nolock {
>>  		$tpmpid = start_swtpm($storecfg, $vmid, $tpm, $migratedfrom);
>>  	    }
>>  
>> -	    my $exitcode = run_command($cmd, %run_params);
>> -	    if ($exitcode) {
>> -		if ($tpmpid) {
>> -		    warn "stopping swtpm instance (pid $tpmpid) due to QEMU startup error\n";
>> -		    kill 'TERM', $tpmpid;
>> +	    eval {
>> +		my $exitcode = run_command($cmd, %run_params);
>> +
>> +		if ($exitcode) {
>> +		    if ($tpmpid) {
>> +			log_warn "stopping swtpm instance (pid $tpmpid) due to QEMU startup
> error\n";
> 
> this warn -> log_warn change kind of slipped in, it's not really part of this
> patch?

Because I changed this line anyway, I changed it to log_warn as it is
imported already and, as I understood, the preferable alternative
to calling 'warn'.
Sourcing this in it's own patch seems overkill to me, or would you
rather suggest something like this should be handled in, e.g. a
file-encompassing refactoring?

> 
>> +			kill 'TERM', $tpmpid;
>> +		    }
>> +		    die "QEMU exited with code $exitcode\n";
>>  		}
>> -		die "QEMU exited with code $exitcode\n";
>> +	    };
>> +
>> +	    if (my $err = $@) {
>> +		my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid);
>> +
>> +		if ($pid ne "") {
> 
> can be combined:
> if (my $pid = ...) {
> 
> }
> 
> (empty string evaluates to false in perl ;))

Thanks for the input!

> 
>> +		    my $count = 0;
>> +		    my $timeout = 300;
>> +
>> +		    print "Waiting $timeout seconds for detached qemu process $pid to exit\n";
>> +		    while (($count < $timeout) &&
>> +			PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
>> +			$count++;
>> +			sleep(1);
>> +		    }
>> +
> 
> either here
> 
>> +		    if ($count >= $timeout) {
>> +			log_warn "Reached timeout. Terminating now with SIGKILL\n";
> 
> or here, recheck that VM is still running and still has the same PID, and log
> accordingly instead of KILLing if not..
> 
> the same is also true in _do_vm_stop

Alright, I will look into it

> 
>> +			kill(9, $pid);
>> +		    }
>> +		}
>> +
>> +		die $err;
>>  	    }
>> -	};
>> +	}
>>      };
>>  
>>      if ($conf->{hugepages}) {
>> -- 
>> 2.30.2
>>
>>
>>
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>
>>
>>
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 




  reply	other threads:[~2022-12-22 12:58 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-16 13:36 [pve-devel] [PATCH common/qemu-server/manager v3] fix #3502: VM start timeout config parameter Daniel Tschlatscher
2022-12-16 13:36 ` [pve-devel] [PATCH common v3 1/5] VM start timeout config parameter in backend Daniel Tschlatscher
2022-12-16 13:36 ` [pve-devel] [PATCH qemu-server v3 2/5] expose VM start timeout config setting in API Daniel Tschlatscher
2022-12-16 13:36 ` [pve-devel] [PATCH qemu-server v3 3/5] await and kill lingering KVM thread when VM start reaches timeout Daniel Tschlatscher
2022-12-21 11:14   ` Fabian Grünbichler
2022-12-22 12:58     ` Daniel Tschlatscher [this message]
2022-12-22 13:20       ` Fabian Grünbichler
2022-12-22 14:22         ` Daniel Tschlatscher
2022-12-16 13:36 ` [pve-devel] [PATCH qemu-server v3 4/5] make the timeout value editable when the VM is locked Daniel Tschlatscher
2022-12-16 13:36 ` [pve-devel] [PATCH manager v3 5/5] VM start Timeout "Options" parameter in the GUI Daniel Tschlatscher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a348bbc4-e5d1-cc6d-b5bc-1e51d5358f2d@proxmox.com \
    --to=d.tschlatscher@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal