public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Fiona Ebner <f.ebner@proxmox.com>
To: "Dominik Csapak" <d.csapak@proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
	pve-devel@lists.proxmox.com
Subject: Re: [PATCH qemu-server v2] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds
Date: Mon, 23 Feb 2026 11:27:27 +0100	[thread overview]
Message-ID: <14d42ac4-634a-468b-8b5e-4e32fa823564@proxmox.com> (raw)
In-Reply-To: <c1670bdd-807b-46e7-92fe-e8ecc866eea7@proxmox.com>

Am 20.02.26 um 3:51 PM schrieb Dominik Csapak:
> On 2/20/26 3:30 PM, Fiona Ebner wrote:
>> Am 20.02.26 um 10:36 AM schrieb Dominik Csapak:
>>> On 2/19/26 2:27 PM, Fiona Ebner wrote:
>>>> Am 19.02.26 um 11:15 AM schrieb Dominik Csapak:
>>>>> On 2/16/26 10:15 AM, Fiona Ebner wrote:
>>>>>> Am 16.02.26 um 9:42 AM schrieb Fabian Grünbichler:
>>>>>>> On February 13, 2026 2:16 pm, Fiona Ebner wrote:
>>>>>>
>>>>>> I guess the actual need is to have more consistent behavior.
>>>>>>
>>>>>
>>>>> ok so i think we'd need to
>>>>> * create a cleanup flag for each vm when qmevent detects a vm shutting
>>>>> down (in /var/run/qemu-server/VMID.cleanup, possibly with timestamp)
>>>>> * removing that cleanup flag after cleanup (obviously)
>>>>> * on start, check for that flag and block for some timeout before
>>>>> starting (e.g. check the timestamp in the flag if it's longer than
>>>>> some
>>>>> time, start it regardless?)
>>>>
>>>> Sounds good to me.
>>>>
>>>> Unfortunately, something else: turns out that we kinda rely on qmeventd
>>>> not doing the cleanup for the optimization with keeping the volumes
>>>> active (i.e. $keepActive). And actually, the optimization applies
>>>> randomly depending on who wins the race.
>>>>
>>>> Output below with added log line
>>>> "doing cleanup for $vmid with keepActive=$keepActive"
>>>> in vm_stop_cleanup() to be able to see what happens.
>>>>
>>>> We try to use the optimization but qmeventd interferes:
>>>>
>>>>> Feb 19 14:09:43 pve9a1 vzdump[168878]: <root@pam> starting task
>>>>> UPID:pve9a1:000293AF:0017CFF8:69970B97:vzdump:102:root@pam:
>>>>> Feb 19 14:09:43 pve9a1 vzdump[168879]: INFO: starting new backup job:
>>>>> vzdump 102 --storage pbs --mode stop
>>>>> Feb 19 14:09:43 pve9a1 vzdump[168879]: INFO: Starting Backup of VM
>>>>> 102 (qemu)
>>>>> Feb 19 14:09:44 pve9a1 qm[168960]: shutdown VM 102:
>>>>> UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam:
>>>>> Feb 19 14:09:44 pve9a1 qm[168959]: <root@pam> starting task
>>>>> UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam:
>>>>> Feb 19 14:09:47 pve9a1 qm[168960]: VM 102 qga command failed - VM 102
>>>>> qga command 'guest-ping' failed - got timeout
>>>>> Feb 19 14:09:50 pve9a1 qmeventd[166736]: read: Connection reset by
>>>>> peer
>>>>> Feb 19 14:09:50 pve9a1 pvedaemon[166884]: <root@pam> end task
>>>>> UPID:pve9a1:000290CD:0017B515:69970B52:vncproxy:102:root@pam: OK
>>>>> Feb 19 14:09:50 pve9a1 systemd[1]: 102.scope: Deactivated
>>>>> successfully.
>>>>> Feb 19 14:09:50 pve9a1 systemd[1]: 102.scope: Consumed 41.780s CPU
>>>>> time, 1.9G memory peak.
>>>>> Feb 19 14:09:51 pve9a1 qm[168960]: doing cleanup for 102 with
>>>>> keepActive=1
>>>>> Feb 19 14:09:51 pve9a1 qm[168959]: <root@pam> end task
>>>>> UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam: OK
>>>>> Feb 19 14:09:51 pve9a1 qmeventd[168986]: Starting cleanup for 102
>>>>> Feb 19 14:09:51 pve9a1 qm[168986]: doing cleanup for 102 with
>>>>> keepActive=0
>>>>> Feb 19 14:09:51 pve9a1 qmeventd[168986]: Finished cleanup for 102
>>>>> Feb 19 14:09:51 pve9a1 systemd[1]: Started 102.scope.
>>>>> Feb 19 14:09:51 pve9a1 vzdump[168879]: VM 102 started with PID 169021.
>>>>
>>>> We manage to get the optimization:
>>>>
>>>>> Feb 19 14:16:01 pve9a1 qm[174585]: shutdown VM 102:
>>>>> UPID:pve9a1:0002A9F9:0018636B:69970D11:qmshutdown:102:root@pam:
>>>>> Feb 19 14:16:04 pve9a1 qm[174585]: VM 102 qga command failed - VM 102
>>>>> qga command 'guest-ping' failed - got timeout
>>>>> Feb 19 14:16:07 pve9a1 qmeventd[166736]: read: Connection reset by
>>>>> peer
>>>>> Feb 19 14:16:07 pve9a1 systemd[1]: 102.scope: Deactivated
>>>>> successfully.
>>>>> Feb 19 14:16:07 pve9a1 systemd[1]: 102.scope: Consumed 46.363s CPU
>>>>> time, 2G memory peak.
>>>>> Feb 19 14:16:08 pve9a1 qm[174585]: doing cleanup for 102 with
>>>>> keepActive=1
>>>>> Feb 19 14:16:08 pve9a1 qm[174582]: <root@pam> end task
>>>>> UPID:pve9a1:0002A9F9:0018636B:69970D11:qmshutdown:102:root@pam: OK
>>>>> Feb 19 14:16:08 pve9a1 systemd[1]: Started 102.scope.
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: Starting cleanup for 102
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: trying to acquire lock...
>>>>> Feb 19 14:16:08 pve9a1 vzdump[174326]: VM 102 started with PID 174718.
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]:  OK
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: vm still running
>>>>
>>>> For regular shutdown, we'll also do the cleanup twice.

So, to expand on this, in the qm cleanup endpoint we have:

>                 if (!$clean || $guest) {
>                     # vm was shutdown from inside the guest or crashed, doing api cleanup
>                     PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0, 1);
>                 }

and the duplicate cleanup during shutdown happens when $guest evaluates
to true. We have $guest=1 when the shutdown was initiated via
'system_powerdown' and 'guest-shutdown' QMP commands and when initiated
from within the guest. We have $guest=0 with the 'quit' QMP command
which is used for hard stop.

It kinda does look like we wanted to avoid reaching vm_stop_cleanup() a
second time if not required, but we don't have the necessary information
to distinguish between guest-initiated shutdown from inside and
guest-initiated shutdown triggered from outside. I don't see a good way
to get that information from the top of my head.

That said, with the cleanup flag file, we won't even need to look at
$guest anymore.

>>>> Maybe we also need a way to tell qmeventd that we already did the
>>>> cleanup?
>>>
>>>
>>> ok well then i'd try to do something like this:
>>>
>>> in
>>>
>>> 'vm_stop' we'll create a cleanup flag with timestamp + state (e.g.
>>> 'queued')
>>>
>>> in vm_stop_cleanup we change/create the flag with
>>> 'started' and clear the flag after cleanup
>>
>> Why is the one in vm_stop needed? Is there any advantage over creating
>> it directly in vm_stop_cleanup()?
>>
> 
> after a bit of experimenting and re-reading the code, i think
> I can simplify the logic
> 
> at the beginning of vm_stop, we create the cleanup flag

You'll also need to create one in vm_reboot(), right?

> in 'qm cleanup', we only do the cleanup if the flag does not exist
> in 'vm_start' we clean the flag
> 
> this should work because these parts are under a config lock anyway:
> * from vm_stop to vm_stop_cleanup
> * most of the qm cleanup code
> * vm_start
> 
> so we only really have to mark that the cleanup was done from
> the vm_stop codepath
> 
> (we have to create the flag at the beginning of vm_stop, because
> then there is no race between calling it's cleanup and qmeventd
> picking up the vanishing process)
> 
> does that make sense to you?

Yes, sounds good :)

>>> (if it's here already in 'started' state within a timelimit, ignore it)
>>>
>>> in vm_start we block until the cleanup flag is gone or until some
>>> timeout
>>>
>>> in 'qm cleanup' we only start it if the flag does not exist
>>
>> Hmm, it does also call vm_stop_cleanup() so we could just re-use the
>> check there for that part? I guess doing an early check doesn't hurt
>> either, as long as we do call the post-stop hook.
>>
>>> I think this should make the behavior consistent?
>>
>>
> 





  parent reply	other threads:[~2026-02-23 10:27 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-10 11:15 Dominik Csapak
2026-02-12 20:33 ` Benjamin McGuire
2026-02-13 11:40 ` Fabian Grünbichler
2026-02-13 12:14 ` Fiona Ebner
2026-02-13 12:20   ` Fabian Grünbichler
2026-02-13 13:16     ` Fiona Ebner
2026-02-16  8:42       ` Fabian Grünbichler
2026-02-16  9:15         ` Fiona Ebner
2026-02-19 10:15           ` Dominik Csapak
2026-02-19 13:27             ` Fiona Ebner
2026-02-20  9:36               ` Dominik Csapak
2026-02-20 14:30                 ` Fiona Ebner
2026-02-20 14:51                   ` Dominik Csapak
2026-02-21  3:11                     ` Benjamin McGuire
2026-02-23  9:48                       ` Fiona Ebner
2026-02-23 10:27                     ` Fiona Ebner [this message]
2026-02-13 12:22   ` Dominik Csapak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14d42ac4-634a-468b-8b5e-4e32fa823564@proxmox.com \
    --to=f.ebner@proxmox.com \
    --cc=d.csapak@proxmox.com \
    --cc=f.gruenbichler@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal