From: Daniel Tschlatscher
To: pve-devel@lists.proxmox.com
Date: Thu, 22 Dec 2022 13:58:29 +0100
Subject: Re: [pve-devel] [PATCH qemu-server v3 3/5] await and kill lingering KVM thread when VM start reaches timeout
References: <20221216133655.510957-1-d.tschlatscher@proxmox.com> <20221216133655.510957-4-d.tschlatscher@proxmox.com> <1671620408.1e4qgx6uw7.astroid@yuna.none>
In-Reply-To: <1671620408.1e4qgx6uw7.astroid@yuna.none>

On 12/21/22 12:14, Fabian Grünbichler wrote:
> On December 16, 2022 2:36 pm, Daniel Tschlatscher wrote:
>> In some cases the VM API start method would return before the detached
>> KVM process would have exited. This is especially problematic with HA,
>> because the HA manager would think the VM started successfully, later
>> see that it exited and start it again in an endless loop.
>>
>> Moreover, another case exists when resuming a hibernated VM. In this
>> case, the qemu thread will attempt to load the whole vmstate into
>> memory before exiting.
>> Depending on vmstate size, disk read speed, and similar factors this
>> can take quite a while though and it is not possible to start the VM
>> normally during this time.
>>
>> To get around this, this patch intercepts the error, looks whether a
>> corresponding KVM thread is still running, and waits for/kills it,
>> before continuing.
>>
>> Signed-off-by: Daniel Tschlatscher
>> ---
>>
>> Changes from v2:
>> * Rebased to current master
>> * Changed warn to use 'log_warn' instead
>> * Reworded log message when waiting for lingering qemu process
>>
>>  PVE/QemuServer.pm | 40 +++++++++++++++++++++++++++++++++-------
>>  1 file changed, 33 insertions(+), 7 deletions(-)
>>
>> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
>> index 2adbe3a..f63dc3f 100644
>> --- a/PVE/QemuServer.pm
>> +++ b/PVE/QemuServer.pm
>> @@ -5884,15 +5884,41 @@ sub vm_start_nolock {
>>                  $tpmpid = start_swtpm($storecfg, $vmid, $tpm, $migratedfrom);
>>              }
>>
>> -            my $exitcode = run_command($cmd, %run_params);
>> -            if ($exitcode) {
>> -                if ($tpmpid) {
>> -                    warn "stopping swtpm instance (pid $tpmpid) due to QEMU startup error\n";
>> -                    kill 'TERM', $tpmpid;
>> +            eval {
>> +                my $exitcode = run_command($cmd, %run_params);
>> +
>> +                if ($exitcode) {
>> +                    if ($tpmpid) {
>> +                        log_warn "stopping swtpm instance (pid $tpmpid) due to QEMU startup error\n";
>
> this warn -> log_warn change kind of slipped in, it's not really part of this
> patch?

Since I was touching this line anyway, I switched it to log_warn, which is
already imported and, as I understood it, the preferred alternative to calling
'warn'. Splitting this out into its own patch seems like overkill to me. Or
would you rather have changes like this handled in, e.g., a file-wide
refactoring?

>
>> +                        kill 'TERM', $tpmpid;
>> +                    }
>> +                    die "QEMU exited with code $exitcode\n";
>>                  }
>> -                die "QEMU exited with code $exitcode\n";
>> +            };
>> +
>> +            if (my $err = $@) {
>> +                my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid);
>> +
>> +                if ($pid ne "") {
>
> can be combined:
> if (my $pid = ...) {
>
> }
>
> (empty string evaluates to false in perl ;))

Thanks for the input!
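
To make sure I read the suggestion correctly, here is a minimal, standalone sketch of the combined form. The stub `vm_running_locally_stub` and the `%fake_running` table are made up for illustration and only stand in for PVE::QemuServer::Helpers::vm_running_locally, which I assume returns the PID or an empty string:

```perl
use strict;
use warnings;

# Hypothetical stand-in for PVE::QemuServer::Helpers::vm_running_locally():
# assumed to return the PID of a running VM, or an empty string otherwise.
my %fake_running = (100 => 4242);

sub vm_running_locally_stub {
    my ($vmid) = @_;
    return $fake_running{$vmid} // "";
}

# Declaring $pid inside the condition scopes it to the if block and skips the
# block for the empty string (and undef), since both are false in Perl.
sub describe_vm {
    my ($vmid) = @_;
    if (my $pid = vm_running_locally_stub($vmid)) {
        return "VM $vmid still running with pid $pid";
    }
    return "VM $vmid not running";
}

print describe_vm(100), "\n";
print describe_vm(101), "\n";
```

The string "0" would also count as false here, which should not matter as long as the helper never reports PID 0 for a VM.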
>
>> +                    my $count = 0;
>> +                    my $timeout = 300;
>> +
>> +                    print "Waiting $timeout seconds for detached qemu process $pid to exit\n";
>> +                    while (($count < $timeout) &&
>> +                        PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
>> +                        $count++;
>> +                        sleep(1);
>> +                    }
>> +
>
> either here
>
>> +                    if ($count >= $timeout) {
>> +                        log_warn "Reached timeout. Terminating now with SIGKILL\n";
>
> or here, recheck that VM is still running and still has the same PID, and log
> accordingly instead of KILLing if not..
>
> the same is also true in _do_vm_stop

Alright, I will look into it.

>
>> +                        kill(9, $pid);
>> +                    }
>> +                }
>> +
>> +                die $err;
>>              }
>> -        };
>> +        }
>>      };
>>
>>      if ($conf->{hugepages}) {
>> --
>> 2.30.2
>>
>>
>>
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
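
Regarding the re-check suggested above (only send SIGKILL if the VM is still running under the same PID), a rough sketch of how I understand it could look like the following. This is not the final patch: `wait_then_kill_lingering` is a hypothetical helper, and the lookup, kill and sleep callbacks are injected stand-ins for PVE::QemuServer::Helpers::vm_running_locally($vmid), kill(9, $pid) and sleep(1), so that the snippet runs without a PVE installation:

```perl
use strict;
use warnings;

# Hypothetical helper: wait up to $timeout iterations for the process seen at
# error time to exit, and only kill it if the VM still runs under that PID.
sub wait_then_kill_lingering {
    my ($pid, $timeout, $lookup, $kill, $sleep) = @_;
    $sleep //= sub { sleep(1) };

    my $count = 0;
    while ($count < $timeout) {
        my $current = $lookup->();
        return 'exited' if !$current;              # nothing left to clean up
        return 'pid-changed' if $current != $pid;  # different process, leave it alone
        $count++;
        $sleep->();
    }

    # Timeout reached: look the PID up once more right before killing.
    my $current = $lookup->();
    if ($current && $current == $pid) {
        $kill->($pid);
        return 'killed';
    }
    return $current ? 'pid-changed' : 'exited';
}

# Usage with fake callbacks: the process never exits, so it gets killed.
my @killed;
my $result = wait_then_kill_lingering(
    4242, 3, sub { 4242 }, sub { push @killed, $_[0] }, sub { },
);
print "$result (@killed)\n";
```

The caller would then log according to the returned status instead of unconditionally warning about SIGKILL. A small window between the last lookup and the kill remains, so this narrows the race rather than removing it.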