From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 9A7D11FF141
	for ; Fri, 13 Feb 2026 13:19:53 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 538892194;
	Fri, 13 Feb 2026 13:20:37 +0100 (CET)
Date: Fri, 13 Feb 2026 13:20:30 +0100
From: Fabian Grünbichler
Subject: Re: [PATCH qemu-server v2] fix #7119: qm cleanup: wait for process
	exiting for up to 30 seconds
To: Dominik Csapak , Fiona Ebner ,
	pve-devel@lists.proxmox.com
References: <20260210111612.2017883-1-d.csapak@proxmox.com>
	<7ee8d206-36fd-4ade-893b-c7c2222a8883@proxmox.com>
In-Reply-To: <7ee8d206-36fd-4ade-893b-c7c2222a8883@proxmox.com>
MIME-Version: 1.0
User-Agent: astroid/0.17.0 (https://github.com/astroidmail/astroid)
Message-Id: <1770985110.nme4v4xomn.astroid@yuna.none>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1770985229807
X-SPAM-LEVEL: Spam detection results: 0
	AWL 0.046 Adjusted score from AWL reputation of From: address
	BAYES_00 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING 0.1 Missing DMARC policy
	KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
	RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
	RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
	RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
	SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS -0.001 SPF: sender matches SPF record
Message-ID-Hash: X2OOJRQHMR7FM4ZN6RMOIBXROMPIVTGP
X-Message-ID-Hash: X2OOJRQHMR7FM4ZN6RMOIBXROMPIVTGP
X-MailFrom: f.gruenbichler@proxmox.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
	banned-address; emergency; member-moderation; nonmember-moderation;
	administrivia; implicit-dest; max-recipients; max-size; news-moderation;
	no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.10
Precedence: list
List-Id: Proxmox VE development discussion
List-Help:
List-Owner:
List-Post:
List-Subscribe:
List-Unsubscribe:

On February 13, 2026 1:14 pm, Fiona Ebner wrote:
> On 10.02.26 at 12:14 PM, Dominik Csapak wrote:
>> When qmeventd detects a vm exiting, it starts 'qm cleanup' to cleanup
>> files, executing hookscripts, etc.
>>
>> Since the vm process exits is sometimes not instant, wait up to 30
>> seconds here to start the cleanup process instead of immediately
>> aborting if the pid still exits. This prevented executing the hookscript
>> on the 'post-stop' phase.
>>
>> This can be easily reproduced by e.g. passing through a usb device,
>> which delays the qemu process exit for a few seconds.
>>
>> Signed-off-by: Dominik Csapak
>> ---
>> changes from v1:
>> * use correct while condition (time() is always >= $starttime)
>>
>> original comment:
>>
>> The 30 second timeout was arbitrarily chosen, but we could probably
>> start with something smaller, like 10 seconds? Could be adapted on
>> applying though.
>>
>> In my (short) tests the usb passthrough part only adds a single second,
>> but i can imagine different devices on other systems could block it for
>> much longer.
>>
>>  src/PVE/CLI/qm.pm | 13 ++++++++++++-
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm
>> index bdae9641..16875ed2 100755
>> --- a/src/PVE/CLI/qm.pm
>> +++ b/src/PVE/CLI/qm.pm
>> @@ -1101,8 +1101,19 @@ __PACKAGE__->register_method({
>>                  60,
>>                  sub {
>>                      my $conf = PVE::QemuConfig->load_config($vmid);
>> +
>> +                    # wait for some timeout until vm process exits, since this might not be instant
>
> s/timeout/time/
>
> Nit: s/vm/the QEMU/
>
> Maybe add "after the QMP 'SHUTDOWN' event"?
>
>> +                    my $timeout = 30;
>> +                    my $starttime = time();
>>                      my $pid = PVE::QemuServer::check_running($vmid);
>> -                    die "vm still running\n" if $pid;
>> +                    warn "vm still running - waiting up to $timeout seconds\n" if $pid;
>
> While we're at it, we could improve the message here. Something like
> 'QEMU process $pid for VM $vmid still running (or newly started)'
> Having the PID is nice info for developers/support engineers and the
> case where a new instance is started before the cleanup was done is also
> possible.
>
> In fact, the case with the new instance is easily triggered by 'stop'
> mode backups. Maybe we should fix that up first before adding a timeout
> here?
>
> Feb 13 13:09:48 pve9a1 qm[92975]: end task
> UPID:pve9a1:00016B30:000CDF80:698F1485:qmshutdown:102:root@pam: OK
> Feb 13 13:09:48 pve9a1 systemd[1]: Started 102.scope.
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: Starting cleanup for 102
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: trying to acquire lock...
> Feb 13 13:09:48 pve9a1 vzdump[92895]: VM 102 started with PID 93116.
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: OK
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: vm still running

Does this mean we should actually have some sort of mechanism similar to
the reboot flag to indicate a pending cleanup, and block/delay starts if
it is still set?
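
To make the question more concrete, here is a rough, untested sketch of
what such a pending-cleanup flag could look like. It is not existing
qemu-server code: all helper names, the flag file name and the flag
directory are made up for illustration (a temporary directory stands in
for wherever the reboot flag lives), and it ignores locking against the
VM config entirely.

```perl
use strict;
use warnings;

use File::Temp qw(tempdir);
use Time::HiRes qw(sleep time);

# Hypothetical flag directory; a temporary directory is used here so the
# sketch can run standalone. Real code would use a runtime directory.
my $flag_dir = tempdir(CLEANUP => 1);

# Hypothetical helper: path of the pending-cleanup flag for a VM.
sub cleanup_flag_path {
    my ($vmid) = @_;
    return "$flag_dir/$vmid.cleanup-pending";
}

# Assumption: set as soon as the QEMU exit is detected, before the
# cleanup itself is started.
sub set_cleanup_pending {
    my ($vmid) = @_;
    my $path = cleanup_flag_path($vmid);
    open(my $fh, '>', $path) or die "unable to create '$path' - $!\n";
    close($fh);
}

# Assumption: cleared at the end of the cleanup, whether it succeeded
# or not. A flag that is already gone is not treated as an error.
sub clear_cleanup_pending {
    my ($vmid) = @_;
    my $path = cleanup_flag_path($vmid);
    unlink($path) or $!{ENOENT} or die "unable to remove '$path' - $!\n";
}

# Assumption: called early in the start path. Delays the start while the
# flag is set and gives up after $timeout seconds.
sub wait_for_pending_cleanup {
    my ($vmid, $timeout) = @_;
    my $deadline = time() + $timeout;
    while (-e cleanup_flag_path($vmid)) {
        die "cleanup for VM $vmid still pending after $timeout seconds\n"
            if time() >= $deadline;
        sleep(0.1);
    }
    return 1;
}
```

With something along these lines, a 'stop' mode backup would call
wait_for_pending_cleanup() before starting the VM again, instead of
racing with the cleanup as in the log above. Whether the start should
be blocked or only delayed, and what happens to a stale flag after a
crash, would still need to be decided.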