From: Dominik Csapak <d.csapak@proxmox.com>
To: "Fiona Ebner" <f.ebner@proxmox.com>,
"Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
pve-devel@lists.proxmox.com
Subject: Re: [PATCH qemu-server v2] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds
Date: Fri, 20 Feb 2026 10:36:49 +0100 [thread overview]
Message-ID: <7bbce03b-d8d6-4459-876c-2a71257959a4@proxmox.com> (raw)
In-Reply-To: <e7a2a15b-e992-4824-8330-c89b2bbdc9f3@proxmox.com>
On 2/19/26 2:27 PM, Fiona Ebner wrote:
> Am 19.02.26 um 11:15 AM schrieb Dominik Csapak:
>> On 2/16/26 10:15 AM, Fiona Ebner wrote:
>>> Am 16.02.26 um 9:42 AM schrieb Fabian Grünbichler:
>>>> On February 13, 2026 2:16 pm, Fiona Ebner wrote:
>>>>> Am 13.02.26 um 1:20 PM schrieb Fabian Grünbichler:
>>>>>> On February 13, 2026 1:14 pm, Fiona Ebner wrote:
>>>>>>> Am 10.02.26 um 12:14 PM schrieb Dominik Csapak:
[snip]
>>>
>>> I guess the actual need is to have more consistent behavior.
>>>
>>
>> ok so i think we'd need to
>> * create a cleanup flag for each vm when qmevent detects a vm shutting
>> down (in /var/run/qemu-server/VMID.cleanup, possibly with timestamp)
>> * removing that cleanup flag after cleanup (obviously)
>> * on start, check for that flag and block for some timeout before
>> starting (e.g. check the timestamp in the flag if it's longer than some
>> time, start it regardless?)
>
> Sounds good to me.
>
> Unfortunately, something else: turns out that we kinda rely on qmeventd
> not doing the cleanup for the optimization with keeping the volumes
> active (i.e. $keepActive). And actually, the optimization applies
> randomly depending on who wins the race.
>
> Output below with added log line
> "doing cleanup for $vmid with keepActive=$keepActive"
> in vm_stop_cleanup() to be able to see what happens.
>
> We try to use the optimization but qmeventd interferes:
>
>> Feb 19 14:09:43 pve9a1 vzdump[168878]: <root@pam> starting task UPID:pve9a1:000293AF:0017CFF8:69970B97:vzdump:102:root@pam:
>> Feb 19 14:09:43 pve9a1 vzdump[168879]: INFO: starting new backup job: vzdump 102 --storage pbs --mode stop
>> Feb 19 14:09:43 pve9a1 vzdump[168879]: INFO: Starting Backup of VM 102 (qemu)
>> Feb 19 14:09:44 pve9a1 qm[168960]: shutdown VM 102: UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam:
>> Feb 19 14:09:44 pve9a1 qm[168959]: <root@pam> starting task UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam:
>> Feb 19 14:09:47 pve9a1 qm[168960]: VM 102 qga command failed - VM 102 qga command 'guest-ping' failed - got timeout
>> Feb 19 14:09:50 pve9a1 qmeventd[166736]: read: Connection reset by peer
>> Feb 19 14:09:50 pve9a1 pvedaemon[166884]: <root@pam> end task UPID:pve9a1:000290CD:0017B515:69970B52:vncproxy:102:root@pam: OK
>> Feb 19 14:09:50 pve9a1 systemd[1]: 102.scope: Deactivated successfully.
>> Feb 19 14:09:50 pve9a1 systemd[1]: 102.scope: Consumed 41.780s CPU time, 1.9G memory peak.
>> Feb 19 14:09:51 pve9a1 qm[168960]: doing cleanup for 102 with keepActive=1
>> Feb 19 14:09:51 pve9a1 qm[168959]: <root@pam> end task UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam: OK
>> Feb 19 14:09:51 pve9a1 qmeventd[168986]: Starting cleanup for 102
>> Feb 19 14:09:51 pve9a1 qm[168986]: doing cleanup for 102 with keepActive=0
>> Feb 19 14:09:51 pve9a1 qmeventd[168986]: Finished cleanup for 102
>> Feb 19 14:09:51 pve9a1 systemd[1]: Started 102.scope.
>> Feb 19 14:09:51 pve9a1 vzdump[168879]: VM 102 started with PID 169021.
>
> We manage to get the optimization:
>
>> Feb 19 14:16:01 pve9a1 qm[174585]: shutdown VM 102: UPID:pve9a1:0002A9F9:0018636B:69970D11:qmshutdown:102:root@pam:
>> Feb 19 14:16:04 pve9a1 qm[174585]: VM 102 qga command failed - VM 102 qga command 'guest-ping' failed - got timeout
>> Feb 19 14:16:07 pve9a1 qmeventd[166736]: read: Connection reset by peer
>> Feb 19 14:16:07 pve9a1 systemd[1]: 102.scope: Deactivated successfully.
>> Feb 19 14:16:07 pve9a1 systemd[1]: 102.scope: Consumed 46.363s CPU time, 2G memory peak.
>> Feb 19 14:16:08 pve9a1 qm[174585]: doing cleanup for 102 with keepActive=1
>> Feb 19 14:16:08 pve9a1 qm[174582]: <root@pam> end task UPID:pve9a1:0002A9F9:0018636B:69970D11:qmshutdown:102:root@pam: OK
>> Feb 19 14:16:08 pve9a1 systemd[1]: Started 102.scope.
>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: Starting cleanup for 102
>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: trying to acquire lock...
>> Feb 19 14:16:08 pve9a1 vzdump[174326]: VM 102 started with PID 174718.
>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: OK
>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: vm still running
>
> For regular shutdown, we'll also do the cleanup twice.
>
> Maybe we also need a way to tell qmeventd that we already did the cleanup?
ok well then i'd try to do something like this:
in
'vm_stop' we'll create a cleanup flag with timestamp + state (e.g. 'queued')
in vm_stop_cleanup we change/create the flag with
'started' and clear the flag after cleanup
(if it's here already in 'started' state within a timelimit, ignore it)
in vm_start we block until the cleanup flag is gone or until some timeout
in 'qm cleanup' we only start it if the flag does not exist
I think this should make the behavior consistent?
next prev parent reply other threads:[~2026-02-20 9:36 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-10 11:15 Dominik Csapak
2026-02-12 20:33 ` Benjamin McGuire
2026-02-13 11:40 ` Fabian Grünbichler
2026-02-13 12:14 ` Fiona Ebner
2026-02-13 12:20 ` Fabian Grünbichler
2026-02-13 13:16 ` Fiona Ebner
2026-02-16 8:42 ` Fabian Grünbichler
2026-02-16 9:15 ` Fiona Ebner
2026-02-19 10:15 ` Dominik Csapak
2026-02-19 13:27 ` Fiona Ebner
2026-02-20 9:36 ` Dominik Csapak [this message]
2026-02-20 14:30 ` Fiona Ebner
2026-02-20 14:51 ` Dominik Csapak
2026-02-13 12:22 ` Dominik Csapak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7bbce03b-d8d6-4459-876c-2a71257959a4@proxmox.com \
--to=d.csapak@proxmox.com \
--cc=f.ebner@proxmox.com \
--cc=f.gruenbichler@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox