Subject: Re: [PATCH qemu-server v2] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds
From: Dominik Csapak <d.csapak@proxmox.com>
To: Fiona Ebner, Fabian Grünbichler, pve-devel@lists.proxmox.com
Date: Fri, 20 Feb 2026 15:51:11 +0100

On 2/20/26 3:30 PM, Fiona Ebner wrote:
> On 20.02.26 at 10:36 AM, Dominik Csapak wrote:
>> On 2/19/26 2:27 PM, Fiona Ebner wrote:
>>> On 19.02.26 at 11:15 AM, Dominik Csapak wrote:
>>>> On 2/16/26 10:15 AM, Fiona Ebner wrote:
>>>>> On 16.02.26 at 9:42 AM, Fabian Grünbichler wrote:
>>>>>> On February 13, 2026 2:16 pm, Fiona Ebner wrote:
>>>>>
>>>>> I guess the actual need is to have more consistent behavior.
>>>>>
>>>>
>>>> ok, so I think we'd need to:
>>>> * create a cleanup flag for each VM when qmeventd detects a VM
>>>>   shutting down (in /var/run/qemu-server/VMID.cleanup, possibly with
>>>>   a timestamp)
>>>> * remove that cleanup flag after cleanup (obviously)
>>>> * on start, check for that flag and block for some timeout before
>>>>   starting (e.g. check the timestamp in the flag; if it's older than
>>>>   some limit, start it regardless?)
>>>
>>> Sounds good to me.
>>>
>>> Unfortunately, something else: turns out that we kinda rely on
>>> qmeventd not doing the cleanup for the optimization of keeping the
>>> volumes active (i.e. $keepActive). And actually, the optimization
>>> applies randomly, depending on who wins the race.
>>>
>>> Output below with added log line
>>> "doing cleanup for $vmid with keepActive=$keepActive"
>>> in vm_stop_cleanup() to be able to see what happens.
>>>
>>> We try to use the optimization but qmeventd interferes:
>>>
>>>> Feb 19 14:09:43 pve9a1 vzdump[168878]: starting task UPID:pve9a1:000293AF:0017CFF8:69970B97:vzdump:102:root@pam:
>>>> Feb 19 14:09:43 pve9a1 vzdump[168879]: INFO: starting new backup job: vzdump 102 --storage pbs --mode stop
>>>> Feb 19 14:09:43 pve9a1 vzdump[168879]: INFO: Starting Backup of VM 102 (qemu)
>>>> Feb 19 14:09:44 pve9a1 qm[168960]: shutdown VM 102: UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam:
>>>> Feb 19 14:09:44 pve9a1 qm[168959]: starting task UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam:
>>>> Feb 19 14:09:47 pve9a1 qm[168960]: VM 102 qga command failed - VM 102 qga command 'guest-ping' failed - got timeout
>>>> Feb 19 14:09:50 pve9a1 qmeventd[166736]: read: Connection reset by peer
>>>> Feb 19 14:09:50 pve9a1 pvedaemon[166884]: end task UPID:pve9a1:000290CD:0017B515:69970B52:vncproxy:102:root@pam: OK
>>>> Feb 19 14:09:50 pve9a1 systemd[1]: 102.scope: Deactivated successfully.
>>>> Feb 19 14:09:50 pve9a1 systemd[1]: 102.scope: Consumed 41.780s CPU time, 1.9G memory peak.
>>>> Feb 19 14:09:51 pve9a1 qm[168960]: doing cleanup for 102 with keepActive=1
>>>> Feb 19 14:09:51 pve9a1 qm[168959]: end task UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam: OK
>>>> Feb 19 14:09:51 pve9a1 qmeventd[168986]: Starting cleanup for 102
>>>> Feb 19 14:09:51 pve9a1 qm[168986]: doing cleanup for 102 with keepActive=0
>>>> Feb 19 14:09:51 pve9a1 qmeventd[168986]: Finished cleanup for 102
>>>> Feb 19 14:09:51 pve9a1 systemd[1]: Started 102.scope.
>>>> Feb 19 14:09:51 pve9a1 vzdump[168879]: VM 102 started with PID 169021.
>>>
>>> We manage to get the optimization:
>>>
>>>> Feb 19 14:16:01 pve9a1 qm[174585]: shutdown VM 102: UPID:pve9a1:0002A9F9:0018636B:69970D11:qmshutdown:102:root@pam:
>>>> Feb 19 14:16:04 pve9a1 qm[174585]: VM 102 qga command failed - VM 102 qga command 'guest-ping' failed - got timeout
>>>> Feb 19 14:16:07 pve9a1 qmeventd[166736]: read: Connection reset by peer
>>>> Feb 19 14:16:07 pve9a1 systemd[1]: 102.scope: Deactivated successfully.
>>>> Feb 19 14:16:07 pve9a1 systemd[1]: 102.scope: Consumed 46.363s CPU time, 2G memory peak.
>>>> Feb 19 14:16:08 pve9a1 qm[174585]: doing cleanup for 102 with keepActive=1
>>>> Feb 19 14:16:08 pve9a1 qm[174582]: end task UPID:pve9a1:0002A9F9:0018636B:69970D11:qmshutdown:102:root@pam: OK
>>>> Feb 19 14:16:08 pve9a1 systemd[1]: Started 102.scope.
>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: Starting cleanup for 102
>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: trying to acquire lock...
>>>> Feb 19 14:16:08 pve9a1 vzdump[174326]: VM 102 started with PID 174718.
>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]:  OK
>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: vm still running
>>>
>>> For regular shutdown, we'll also do the cleanup twice.
>>>
>>> Maybe we also need a way to tell qmeventd that we already did the
>>> cleanup?
>>
>> ok well then i'd try to do something like this:
>>
>> in 'vm_stop' we'll create a cleanup flag with timestamp + state (e.g.
>> 'queued')
>>
>> in vm_stop_cleanup we change/create the flag with 'started' and clear
>> the flag after cleanup
>
> Why is the one in vm_stop needed? Is there any advantage over creating
> it directly in vm_stop_cleanup()?
>

After a bit of experimenting and re-reading the code, I think I can
simplify the logic:

At the beginning of vm_stop, we create the cleanup flag.

In 'qm cleanup', we only do the cleanup if the flag does not exist.

In 'vm_start', we clear the flag.

This should work because these parts are under a config lock anyway:
* from vm_stop to vm_stop_cleanup
* most of the 'qm cleanup' code
* vm_start

So we only really have to mark that the cleanup was done from the
vm_stop code path.

(We have to create the flag at the very beginning of vm_stop, because
then there is no race between vm_stop calling its cleanup and qmeventd
picking up the vanishing process.)

Does that make sense to you?

>> (if it's already there in the 'started' state within a time limit,
>> ignore it)
>>
>> in vm_start we block until the cleanup flag is gone or until some
>> timeout
>>
>> in 'qm cleanup' we only start it if the flag does not exist
>
> Hmm, it does also call vm_stop_cleanup(), so we could just re-use the
> check there for that part? I guess doing an early check doesn't hurt
> either, as long as we do call the post-stop hook.
>
>> I think this should make the behavior consistent?
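
For reference, a rough sketch of how the flag handling could look
(untested, and the helper names are made up purely for illustration;
only the /var/run/qemu-server/VMID.cleanup path and the timestamp idea
come from the earlier discussion):

use strict;
use warnings;

# hypothetical helpers - these do not exist in qemu-server (yet); the
# flag file lives under /var/run/qemu-server/<VMID>.cleanup as discussed
# above and contains a timestamp so stale flags can be detected

my $flag_dir = '/var/run/qemu-server';

sub cleanup_flag_path {
    my ($vmid) = @_;
    return "$flag_dir/$vmid.cleanup";
}

# called at the very beginning of vm_stop (under the config lock), i.e.
# before the QEMU process can vanish, so qmeventd cannot win the race
sub create_cleanup_flag {
    my ($vmid) = @_;
    open(my $fh, '>', cleanup_flag_path($vmid))
        or die "unable to create cleanup flag for VM $vmid - $!\n";
    print $fh time() . "\n";
    close($fh);
}

# called from 'qm cleanup': skip the cleanup if the vm_stop code path
# already claimed it by creating the flag
sub cleanup_already_claimed {
    my ($vmid) = @_;
    return -e cleanup_flag_path($vmid) ? 1 : 0;
}

# called from vm_start (and after a finished cleanup) to clear the flag
sub remove_cleanup_flag {
    my ($vmid) = @_;
    unlink(cleanup_flag_path($vmid));
}

Since all three call sites run under the per-VM config lock anyway, a
plain flag file without extra locking should be enough; the timestamp
is only there so a stale flag (e.g. from an interrupted cleanup) could
be ignored after some limit.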