From: Dominik Csapak <d.csapak@proxmox.com>
To: "Fiona Ebner" <f.ebner@proxmox.com>,
"Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
pve-devel@lists.proxmox.com
Subject: Re: [PATCH qemu-server v2] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds
Date: Thu, 19 Feb 2026 11:15:33 +0100 [thread overview]
Message-ID: <d022adf4-3bde-4440-b30e-8990592f13db@proxmox.com> (raw)
In-Reply-To: <38236a30-a249-4ebe-bf89-788d67f36bd1@proxmox.com>
On 2/16/26 10:15 AM, Fiona Ebner wrote:
> Am 16.02.26 um 9:42 AM schrieb Fabian Grünbichler:
>> On February 13, 2026 2:16 pm, Fiona Ebner wrote:
>>> Am 13.02.26 um 1:20 PM schrieb Fabian Grünbichler:
>>>> On February 13, 2026 1:14 pm, Fiona Ebner wrote:
>>>>> Am 10.02.26 um 12:14 PM schrieb Dominik Csapak:
>>>>>> + my $timeout = 30;
>>>>>> + my $starttime = time();
>>>>>> my $pid = PVE::QemuServer::check_running($vmid);
>>>>>> - die "vm still running\n" if $pid;
>>>>>> + warn "vm still running - waiting up to $timeout seconds\n" if $pid;
>>>>>
>>>>> While we're at it, we could improve the message here. Something like
>>>>> 'QEMU process $pid for VM $vmid still running (or newly started)'
>>>>> Having the PID is nice info for developers/support engineers and the
>>>>> case where a new instance is started before the cleanup was done is also
>>>>> possible.
>>>>>
>>>>> In fact, the case with the new instance is easily triggered by 'stop'
>>>>> mode backups. Maybe we should fix that up first before adding a timeout
>>>>> here?
>>>>>
>>>>> Feb 13 13:09:48 pve9a1 qm[92975]: <root@pam> end task
>>>>> UPID:pve9a1:00016B30:000CDF80:698F1485:qmshutdown:102:root@pam: OK
>>>>> Feb 13 13:09:48 pve9a1 systemd[1]: Started 102.scope.
>>>>> Feb 13 13:09:48 pve9a1 qmeventd[93079]: Starting cleanup for 102
>>>>> Feb 13 13:09:48 pve9a1 qmeventd[93079]: trying to acquire lock...
>>>>> Feb 13 13:09:48 pve9a1 vzdump[92895]: VM 102 started with PID 93116.
>>>>> Feb 13 13:09:48 pve9a1 qmeventd[93079]: OK
>>>>> Feb 13 13:09:48 pve9a1 qmeventd[93079]: vm still running
>>>>
>>>> does this mean we should actually have some sort of mechanism similar to
>>>> the reboot flag to indicate a pending cleanup, and block/delay starts if
>>>> it is still set?
>>>
>>> Blocking/delaying starts is not what happens for the reboot flag/file:
>>
>> that's not what I meant, the similarity was just "have a flag", not
>> "have a flag that behaves identical" ;)
>>
>> my proposal was:
>> - add a flag that indicates cleanup is pending (similar to reboot is
>> pending)
>> - *handle that flag* in the start flow to wait for the cleanup to be
>> done before starting
>
> Shouldn't we change the reboot flag to also do this?
>
>>>> Feb 13 14:00:16 pve9a1 qm[124470]: <root@pam> starting task UPID:pve9a1:0001E639:001180FE:698F2060:qmreboot:102:root@pam:
>>>> Feb 13 14:00:16 pve9a1 qm[124472]: <root@pam> starting task UPID:pve9a1:0001E63A:0011811E:698F2060:qmstart:102:root@pam:
>>>> Feb 13 14:00:16 pve9a1 qm[124474]: start VM 102: UPID:pve9a1:0001E63A:0011811E:698F2060:qmstart:102:root@pam:
>>>> [...]
>>>> Feb 13 14:00:22 pve9a1 systemd[1]: 102.scope: Deactivated successfully.
>>>> Feb 13 14:00:22 pve9a1 systemd[1]: 102.scope: Consumed 2min 3.333s CPU time, 2G memory peak.
>>>> Feb 13 14:00:23 pve9a1 qmeventd[124565]: Starting cleanup for 102
>>>> Feb 13 14:00:23 pve9a1 qmeventd[124565]: trying to acquire lock...
>>>> Feb 13 14:00:23 pve9a1 qm[124470]: <root@pam> end task UPID:pve9a1:0001E639:001180FE:698F2060:qmreboot:102:root@pam: OK
>>>> Feb 13 14:00:23 pve9a1 systemd[1]: Started 102.scope.
>>>> Feb 13 14:00:23 pve9a1 qm[124474]: VM 102 started with PID 124620.
>>>> Feb 13 14:00:23 pve9a1 qmeventd[124565]: OK
>>>> Feb 13 14:00:23 pve9a1 qmeventd[124565]: vm still running
>>>
>>> Currently, it's just indicating whether the cleanup handler should start
>>> the VM again afterwards.
>>>
>>> Am 13.02.26 um 1:22 PM schrieb Dominik Csapak:
>>>> Sounds good, one possibility would be to do no cleanup at all when doing
>>>> a stop mode backup?
>>>> We already know we'll need the resources (pid/socket/etc. files, vgpus,...) again?
>>>>
>>>> Or is there some situation where that might not be the case?
>>>
>>> We do it for reboot (if not another start task sneaks in like in my
>>> example above), and I don't see a good reason from the top of my head
>>> why 'stop' mode backup should behave differently from a reboot (for
>>> running VMs). It even applies pending changes just like a reboot right now.
>>
>> but what about external callers doing something like:
>>
>> - stop
>> - do whatever
>> - start
>>
>> in rapid (automated) succession? those would still (possibly) trigger
>> cleanup after "doing whatever" and starting the VM again already? and in
>> particular if we skip cleanup for "our" cases of stop;start it will be
>> easy to introduce sideeffects in cleanup that break such usage?
>
> I did not argue for skipping cleanup. I argued for being consistent with
> reboot where we (try to) do cleanup. I just wasn't sure it's really needed.
>
>>> I'm not sure if there is an actual need to do cleanup or if we could
>
> I guess the actual need is to have more consistent behavior.
>
ok so i think we'd need to
* create a cleanup flag for each vm when qmevent detects a vm shutting
down (in /var/run/qemu-server/VMID.cleanup, possibly with timestamp)
* removing that cleanup flag after cleanup (obviously)
* on start, check for that flag and block for some timeout before
starting (e.g. check the timestamp in the flag if it's longer than some
time, start it regardless?)
?
>>> also skip it when we are planning to spin up another instance right
>>> away. But we do it for reboot, so the "safe" variant is also doing it
>>> for 'stop' mode backup. History tells me it's been there since the
>>> reboot functionality was added:
>>> https://lists.proxmox.com/pipermail/pve-devel/2019-September/038988.html
>
>
next prev parent reply other threads:[~2026-02-19 10:14 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-10 11:15 Dominik Csapak
2026-02-12 20:33 ` Benjamin McGuire
2026-02-13 11:40 ` Fabian Grünbichler
2026-02-13 12:14 ` Fiona Ebner
2026-02-13 12:20 ` Fabian Grünbichler
2026-02-13 13:16 ` Fiona Ebner
2026-02-16 8:42 ` Fabian Grünbichler
2026-02-16 9:15 ` Fiona Ebner
2026-02-19 10:15 ` Dominik Csapak [this message]
2026-02-19 13:27 ` Fiona Ebner
2026-02-20 9:36 ` Dominik Csapak
2026-02-20 14:30 ` Fiona Ebner
2026-02-20 14:51 ` Dominik Csapak
2026-02-13 12:22 ` Dominik Csapak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d022adf4-3bde-4440-b30e-8990592f13db@proxmox.com \
--to=d.csapak@proxmox.com \
--cc=f.ebner@proxmox.com \
--cc=f.gruenbichler@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox