From: Fabian Grünbichler
To: Proxmox VE development discussion
Date: Thu, 08 Apr 2021 08:41:04 +0200
Message-Id: <1617863129.ab2wi267fv.astroid@nora.none>
In-Reply-To: <20210407142306.29851-1-d.whyte@proxmox.com>
References: <20210407142306.29851-1-d.whyte@proxmox.com>
Subject: Re: [pve-devel] [PATCH pve-manager] fix #3369: auto-start vms after failed pbs backup

On April 7, 2021 4:23 pm, Dylan Whyte wrote:
> Fixes an issue in which a VM fails to automatically restart after a
> failed stop-mode backup to pbs.
>
> Signed-off-by: Dylan Whyte
> ---
>  PVE/VZDump.pm | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> Notes:
>     1. The 1sec time delay was needed, as the check to see if the VM is running
>     was still true while this code was executed (although the vm was just
>     about to stop)
>
>     2. The previously used vm_status call just checks if a PID exists and
>     returns true if so. This also returns true when the VM is in "prelaunch"
>     state, hence PVE::QemuServer::vmstatus was used to see the exact state
>     and handle the situation accordingly. Otherwise, the VM gets stuck in
>     prelaunch state from time to time.
>
>
> diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
> index fb4c8bad..1bda1f15 100644
> --- a/PVE/VZDump.pm
> +++ b/PVE/VZDump.pm
> @@ -23,6 +23,7 @@ use PVE::VZDump::Common;
>  use PVE::VZDump::Plugin;
>  use PVE::Tools qw(extract_param split_list);
>  use PVE::API2Tools;
> +use PVE::QemuServer;
>
>  my @posix_filesystems = qw(ext3 ext4 nfs nfs4 reiserfs xfs);
>
> @@ -1039,10 +1040,17 @@ sub exec_backup_task {
>                  debugmsg ('info', "resume vm", $logfd);
>                  $plugin->resume_vm ($task, $vmid);
>              } else {
> -                my $running = $plugin->vm_status($vmid);
> -                if (!$running) {
> +                sleep(1);

I wonder where this second comes from? some kind of timeout in PBS code?

> +                my $vmstatus = PVE::QemuServer::vmstatus($vmid, 1);

we don't know this is a VM?

> +                my $stat = $vmstatus->{$vmid};
> +                my $status = $stat->{qmpstatus};
> +
> +                if ($status eq "stopped") {
> +                    $plugin->start_vm ($task, $vmid);
> +                    debugmsg ('info', "restarting vm", $logfd);
> +                } elsif ($status eq "prelaunch") {
> +                    $plugin->resume_vm ($task, $vmid);

this can occur if the
- VM was running at the start of the backup, but with stop mode
- a problem occurred while the VM is in the prelaunch state

normally, the qemu-server VZDump plugin handles resuming. but there are
two 'die' statements in archive_pbs that can trigger before resuming
happens, and restoring the power state does nothing if the VM is already
running. so either of those two should be fixed to handle the prelaunch
issue.

the prelaunch issue also seems to affect VMA, although it might be
harder to reliably trigger an error during the initial backup start
window.

>                      debugmsg ('info', "restarting vm", $logfd);
> -                    $plugin->start_vm ($task, $vmid);
>                  }
>              }
>              $self->run_hook_script ('post-restart', $task, $logfd);
> --
> 2.20.1