Subject: Re: [PATCH qemu-server v2] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds
From: Dominik Csapak <d.csapak@proxmox.com>
To: Fiona Ebner, pve-devel@lists.proxmox.com
Date: Fri, 13 Feb 2026 13:22:19 +0100
Message-ID: <5246d854-03cf-4fe2-9f01-5dffa69aa96b@proxmox.com>
In-Reply-To: <7ee8d206-36fd-4ade-893b-c7c2222a8883@proxmox.com>
References: <20260210111612.2017883-1-d.csapak@proxmox.com> <7ee8d206-36fd-4ade-893b-c7c2222a8883@proxmox.com>

On 2/13/26 1:14 PM, Fiona Ebner wrote:
> On 10.02.26 at 12:14 PM, Dominik Csapak wrote:
>> When qmeventd detects a VM exiting, it starts 'qm cleanup' to clean up
>> files, execute hookscripts, etc.
>>
>> Since the VM process exit is sometimes not instant, wait here for up
>> to 30 seconds before starting the cleanup instead of immediately
>> aborting if the pid still exists. Aborting immediately prevented the
>> hookscript from being executed for the 'post-stop' phase.
>>
>> This can easily be reproduced by e.g. passing through a USB device,
>> which delays the QEMU process exit for a few seconds.
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> changes from v1:
>> * use correct while condition (time() is always >= $starttime)
>>
>> original comment:
>>
>> The 30 second timeout was arbitrarily chosen, but we could probably
>> start with something smaller, like 10 seconds? Could be adapted on
>> applying though.
>>
>> In my (short) tests the USB passthrough part only adds a single second,
>> but I can imagine different devices on other systems could block it for
>> much longer.
>>
>>  src/PVE/CLI/qm.pm | 13 ++++++++++++-
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm
>> index bdae9641..16875ed2 100755
>> --- a/src/PVE/CLI/qm.pm
>> +++ b/src/PVE/CLI/qm.pm
>> @@ -1101,8 +1101,19 @@ __PACKAGE__->register_method({
>>              60,
>>              sub {
>>                  my $conf = PVE::QemuConfig->load_config($vmid);
>> +
>> +                # wait for some timeout until vm process exits, since this might not be instant
>
> s/timeout/time/
>
> Nit: s/vm/the QEMU/
>
> Maybe add "after the QMP 'SHUTDOWN' event"?
>
>> +                my $timeout = 30;
>> +                my $starttime = time();
>>                  my $pid = PVE::QemuServer::check_running($vmid);
>> -                die "vm still running\n" if $pid;
>> +                warn "vm still running - waiting up to $timeout seconds\n" if $pid;
>
> While we're at it, we could improve the message here. Something like
> 'QEMU process $pid for VM $vmid still running (or newly started)'.
> Having the PID is nice info for developers/support engineers, and the
> case where a new instance is started before the cleanup was done is
> also possible.
>
> In fact, the case with the new instance is easily triggered by 'stop'
> mode backups. Maybe we should fix that up first before adding a timeout
> here?
>
> Feb 13 13:09:48 pve9a1 qm[92975]: end task UPID:pve9a1:00016B30:000CDF80:698F1485:qmshutdown:102:root@pam: OK
> Feb 13 13:09:48 pve9a1 systemd[1]: Started 102.scope.
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: Starting cleanup for 102
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: trying to acquire lock...
> Feb 13 13:09:48 pve9a1 vzdump[92895]: VM 102 started with PID 93116.
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: OK
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: vm still running
>

Sounds good. One possibility would be to do no cleanup at all when doing
a stop mode backup? We already know we'll need the resources
(pid/socket/etc. files, vGPUs, ...) again. Or is there some situation
where that might not be the case?

>
>> +
>> +                while ($pid && (time() - $starttime) < $timeout) {
>> +                    sleep(1);
>> +                    $pid = PVE::QemuServer::check_running($vmid);
>> +                }
>> +
>> +                die "vm still running - aborting cleanup\n" if $pid;
>>
>>                  # Rollback already does cleanup when preparing and afterwards temporarily drops the
>>                  # lock on the configuration file to rollback the volumes. Deactivating volumes here
>
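
For a potential v3, the waiting part with your comment and message
suggestions folded in could look roughly like this (just a sketch, exact
wording and timeout value are of course still open, and the stop-mode
backup case would still need to be handled separately):

                # wait for some time until the QEMU process exits after
                # the QMP 'SHUTDOWN' event, since this might not be instant
                my $timeout = 30;
                my $starttime = time();

                my $pid = PVE::QemuServer::check_running($vmid);
                warn "QEMU process $pid for VM $vmid still running (or newly started)"
                    . " - waiting up to $timeout seconds\n"
                    if $pid;

                while ($pid && (time() - $starttime) < $timeout) {
                    sleep(1);
                    $pid = PVE::QemuServer::check_running($vmid);
                }

                die "QEMU process for VM $vmid still running - aborting cleanup\n" if $pid;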