Subject: Re: [PATCH qemu-server v2] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds
From: Benjamin McGuire <jaminmc@gmail.com>
Date: Fri, 20 Feb 2026 22:11:47 -0500
To: Dominik Csapak
CC: pve-devel@lists.proxmox.com
References: <20260210111612.2017883-1-d.csapak@proxmox.com> <7ee8d206-36fd-4ade-893b-c7c2222a8883@proxmox.com> <1770985110.nme4v4xomn.astroid@yuna.none> <9d501c98-a85c-44d4-af0e-0301b203d691@proxmox.com> <1771231158.rte62d97r5.astroid@yuna.none> <38236a30-a249-4ebe-bf89-788d67f36bd1@proxmox.com> <7bbce03b-d8d6-4459-876c-2a71257959a4@proxmox.com> <8099db49-d35a-4ab1-9e33-c82689aee016@proxmox.com>
List-Id: Proxmox VE development discussion
I've come up with a different patch where the wait happens in the child process that runs the qm cleanup (with help from Cursor).

Bug #7119: When qmeventd notices a VM's QMP socket closing, it immediately forks and runs qm cleanup. But the QEMU process might not have completely exited yet; USB passthrough, for instance, can take a few seconds to tear down after the socket closes. This makes qm cleanup think the process is still running and bail out, skipping the post-stop hookscript and possibly leaving the cleanup incomplete.

To fix this, we can wait in the child process for the QEMU process to actually exit before running qm cleanup. This works with pidfd + poll(), which:

* Returns right away if the process is already dead (which is usually the case)
* Waits for up to 10 seconds if the teardown is slow (like with USB passthrough)
* Runs in the child process, so qmeventd's event loop isn't blocked or slowed down

For guest-initiated shutdowns (where terminate_client() wasn't called and no pidfd is stored in the client struct), we open a new pidfd using pidfd_open(). If pidfd_open() fails (the process is already dead, or the kernel doesn't support pidfds), we skip the wait and go ahead, the same as before this patch.

We set the client's pidfd to -1 before removing it from the hash table, to stop cleanup_client() from closing the fd twice.

This way, we avoid any Perl-side polling or sleeping while holding the VM config lock, which is how the v2 patch did it.

In my test, with my USB receiver passed through, it only needed to wait 1-2 seconds, so I made the timeout 10 seconds instead of 30. We could even make it 5, but slower machines may need more time for the USB teardown.
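To make the mechanism concrete outside of the qmeventd code, here is a minimal standalone sketch of the pidfd + poll() pattern (just an illustration of the idea, not part of the patch; wait_for_exit is a made-up name):

#include <poll.h>
#include <stdio.h>
#include <sys/pidfd.h> /* pidfd_open(), glibc >= 2.36 */
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for pid to exit; return immediately if it is
 * already gone. */
static void wait_for_exit(pid_t pid, int timeout_ms)
{
    int pidfd = pidfd_open(pid, 0);
    if (pidfd < 0) {
        /* e.g. ESRCH: process already fully gone (or no pidfd
         * support), so there is nothing to wait for */
        perror("pidfd_open");
        return;
    }
    /* a pidfd polls as readable once the process has terminated */
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) < 0) {
        perror("poll on pidfd");
    }
    close(pidfd);
}

Note that poll() returning 0 (timeout) is treated the same as success: we proceed with the cleanup anyway, exactly like the patch below does after its 10-second window.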
Logs from my test with the USB receiver passed through:

Feb 20 21:30:53 Froto qmeventd[3129101]: read: Connection reset by peer

This is where the QMP socket was closed by the guest-initiated shutdown.

Feb 20 21:30:55 Froto qmeventd[3135521]: Starting cleanup for 100

2 seconds later, the cleanup was started.

Feb 20 21:30:56 Froto qmeventd[3135521]: [VM 100 GPU Recovery] HOOK CALLED – VM: 100 Phase: post-stop Time: 2026-02-20 21:30:56

The hookscript was started 1 second later.

Feb 20 21:30:56 Froto qmeventd[3135521]: [VM 100 GPU Recovery] Starting GPU recovery (post-stop phase)
Feb 20 21:30:56 Froto qmeventd[3135521]: [VM 100 GPU Recovery] Detected PCI devices for recovery: 0000:03:00.0 0000:03:00.1 0000:12:00.6
Feb 20 21:30:56 Froto qmeventd[3135521]: [VM 100 GPU Recovery] Detected Vega 10 series GPU (video): 0000:03:00.0 — using light recovery (no remove/rescan)
Feb 20 21:30:56 Froto qmeventd[3135521]: [VM 100 GPU Recovery] → Attempt 1/12 (Vega light mode)
Feb 20 21:30:57 Froto qmeventd[3135521]: [VM 100 GPU Recovery] SUCCESS: 0000:03:00.0 recovered on attempt 1!
Feb 20 21:30:57 Froto qmeventd[3135521]: [VM 100 GPU Recovery] GPU recovery complete

Hookscript complete.

Feb 20 21:30:57 Froto qmeventd[3135521]: Finished cleanup for 100

Cleanup complete.

Also tested with no passthrough, and it still works as it should:

Feb 20 22:00:00 Froto qmeventd[3129101]: read: Connection reset by peer
Feb 20 22:00:00 Froto qmeventd[3166503]: Starting cleanup for 300
Feb 20 22:00:00 Froto qmeventd[3166503]: [VM 300 GPU Recovery] HOOK CALLED – VM: 300 Phase: post-stop Time: 2026-02-20 22:00:00
Feb 20 22:00:00 Froto qmeventd[3166503]: [VM 300 GPU Recovery] Starting GPU recovery (post-stop phase)
Feb 20 22:00:01 Froto qmeventd[3166503]: [VM 300 GPU Recovery] No PCI passthrough devices found.
Feb 20 22:00:01 Froto qmeventd[3166503]: Finished cleanup for 300

The patch:

diff --git a/src/qmeventd/qmeventd.c b/src/qmeventd/qmeventd.c
index 1d9eb74a..d9d26586 100644
--- a/src/qmeventd/qmeventd.c
+++ b/src/qmeventd/qmeventd.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -370,23 +371,41 @@ static void cleanup_qemu_client(struct Client *client) {
     unsigned short guest = client->qemu.guest;
     char vmid[sizeof(client->qemu.vmid)];
     strncpy(vmid, client->qemu.vmid, sizeof(vmid));
+
+    int pidfd = client->pidfd;
+    client->pidfd = -1;
+
+    if (pidfd <= 0) {
+        pidfd = pidfd_open(client->pid, 0);
+    }
+
     g_hash_table_remove(vm_clients, &vmid); // frees key, ignore errors
     VERBOSE_PRINT("%s: executing cleanup (graceful: %d, guest: %d)\n", vmid, graceful, guest);

     int pid = fork();
     if (pid < 0) {
         fprintf(stderr, "fork failed: %s\n", strerror(errno));
+        if (pidfd > 0) (void)close(pidfd);
         return;
     }
     if (pid == 0) {
-        char *script = "/usr/sbin/qm";
+        if (pidfd > 0) {
+            struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
+            if (poll(&pfd, 1, 10 * 1000) < 0) {
+                perror("poll on pidfd");
+            }
+            (void)close(pidfd);
+        }

+        char *script = "/usr/sbin/qm";
         char *args[] = {script, "cleanup", vmid, graceful ? "1" : "0", guest ? "1" : "0", NULL};
-
         execvp(script, args);
         perror("execvp");
         _exit(1);
     }
+
+    // parent
+    if (pidfd > 0) (void)close(pidfd);
 }

 void cleanup_client(struct Client *client) {
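One portability note: pidfd_open() has only had a glibc wrapper, declared in <sys/pidfd.h>, since glibc 2.36 (the syscall itself exists since Linux 5.3). On an older toolchain the raw syscall would be needed, along the lines of this sketch (my_pidfd_open is just an illustrative name, not anything in the tree):

#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* fallback for glibc < 2.36, which lacks the pidfd_open() wrapper */
static int my_pidfd_open(pid_t pid, unsigned int flags)
{
    return (int)syscall(SYS_pidfd_open, pid, flags);
}

Current Debian ships a new enough glibc, so calling pidfd_open() directly, as the patch does, should be fine there.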
> On Feb 20, 2026, at 9:51 AM, Dominik Csapak wrote:
>
> On 2/20/26 3:30 PM, Fiona Ebner wrote:
>> On 20.02.26 at 10:36 AM, Dominik Csapak wrote:
>>> On 2/19/26 2:27 PM, Fiona Ebner wrote:
>>>> On 19.02.26 at 11:15 AM, Dominik Csapak wrote:
>>>>> On 2/16/26 10:15 AM, Fiona Ebner wrote:
>>>>>> On 16.02.26 at 9:42 AM, Fabian Grünbichler wrote:
>>>>>>> On February 13, 2026 2:16 pm, Fiona Ebner wrote:
>>>>>>
>>>>>> I guess the actual need is to have more consistent behavior.
>>>>>>
>>>>>
>>>>> ok so i think we'd need to
>>>>> * create a cleanup flag for each vm when qmeventd detects a vm shutting down (in /var/run/qemu-server/VMID.cleanup, possibly with timestamp)
>>>>> * remove that cleanup flag after cleanup (obviously)
>>>>> * on start, check for that flag and block for some timeout before starting (e.g. check the timestamp in the flag; if it's older than some time, start regardless?)
>>>>
>>>> Sounds good to me.
>>>>

The qmeventd.c patch makes this unnecessary, as the process is already closed.

>>>> Unfortunately, something else: turns out that we kinda rely on qmeventd not doing the cleanup for the optimization with keeping the volumes active (i.e. $keepActive). And actually, the optimization applies randomly, depending on who wins the race.
>>>>
>>>> Output below with added log line
>>>> "doing cleanup for $vmid with keepActive=$keepActive"
>>>> in vm_stop_cleanup() to be able to see what happens.
>>>>
>>>> We try to use the optimization but qmeventd interferes:
>>>>
>>>>> Feb 19 14:09:43 pve9a1 vzdump[168878]: starting task UPID:pve9a1:000293AF:0017CFF8:69970B97:vzdump:102:root@pam:
>>>>> Feb 19 14:09:43 pve9a1 vzdump[168879]: INFO: starting new backup job: vzdump 102 --storage pbs --mode stop
>>>>> Feb 19 14:09:43 pve9a1 vzdump[168879]: INFO: Starting Backup of VM 102 (qemu)
>>>>> Feb 19 14:09:44 pve9a1 qm[168960]: shutdown VM 102: UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam:
>>>>> Feb 19 14:09:44 pve9a1 qm[168959]: starting task UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam:
>>>>> Feb 19 14:09:47 pve9a1 qm[168960]: VM 102 qga command failed - VM 102 qga command 'guest-ping' failed - got timeout
>>>>> Feb 19 14:09:50 pve9a1 qmeventd[166736]: read: Connection reset by peer
>>>>> Feb 19 14:09:50 pve9a1 pvedaemon[166884]: end task UPID:pve9a1:000290CD:0017B515:69970B52:vncproxy:102:root@pam: OK
>>>>> Feb 19 14:09:50 pve9a1 systemd[1]: 102.scope: Deactivated successfully.
>>>>> Feb 19 14:09:50 pve9a1 systemd[1]: 102.scope: Consumed 41.780s CPU time, 1.9G memory peak.
>>>>> Feb 19 14:09:51 pve9a1 qm[168960]: doing cleanup for 102 with keepActive=1
>>>>> Feb 19 14:09:51 pve9a1 qm[168959]: end task UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam: OK
>>>>> Feb 19 14:09:51 pve9a1 qmeventd[168986]: Starting cleanup for 102
>>>>> Feb 19 14:09:51 pve9a1 qm[168986]: doing cleanup for 102 with keepActive=0
>>>>> Feb 19 14:09:51 pve9a1 qmeventd[168986]: Finished cleanup for 102
>>>>> Feb 19 14:09:51 pve9a1 systemd[1]: Started 102.scope.
>>>>> Feb 19 14:09:51 pve9a1 vzdump[168879]: VM 102 started with PID 169021.
>>>>
>>>> We manage to get the optimization:
>>>>
>>>>> Feb 19 14:16:01 pve9a1 qm[174585]: shutdown VM 102: UPID:pve9a1:0002A9F9:0018636B:69970D11:qmshutdown:102:root@pam:
>>>>> Feb 19 14:16:04 pve9a1 qm[174585]: VM 102 qga command failed - VM 102 qga command 'guest-ping' failed - got timeout
>>>>> Feb 19 14:16:07 pve9a1 qmeventd[166736]: read: Connection reset by peer
>>>>> Feb 19 14:16:07 pve9a1 systemd[1]: 102.scope: Deactivated successfully.
>>>>> Feb 19 14:16:07 pve9a1 systemd[1]: 102.scope: Consumed 46.363s CPU time, 2G memory peak.
>>>>> Feb 19 14:16:08 pve9a1 qm[174585]: doing cleanup for 102 with keepActive=1
>>>>> Feb 19 14:16:08 pve9a1 qm[174582]: end task UPID:pve9a1:0002A9F9:0018636B:69970D11:qmshutdown:102:root@pam: OK
>>>>> Feb 19 14:16:08 pve9a1 systemd[1]: Started 102.scope.
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: Starting cleanup for 102
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: trying to acquire lock...
>>>>> Feb 19 14:16:08 pve9a1 vzdump[174326]: VM 102 started with PID 174718.
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: OK
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: vm still running
>>>>
>>>> For regular shutdown, we'll also do the cleanup twice.
>>>>
>>>> Maybe we also need a way to tell qmeventd that we already did the cleanup?
>>>
>>> ok well then i'd try to do something like this:
>>>
>>> in 'vm_stop' we'll create a cleanup flag with timestamp + state (e.g. 'queued')
>>>
>>> in vm_stop_cleanup we change/create the flag with 'started' and clear the flag after cleanup
>>
>> Why is the one in vm_stop needed? Is there any advantage over creating it directly in vm_stop_cleanup()?
>
> after a bit of experimenting and re-reading the code, i think I can simplify the logic
>
> at the beginning of vm_stop, we create the cleanup flag
> in 'qm cleanup', we only do the cleanup if the flag does not exist
> in 'vm_start' we clean the flag
>
> this should work because these parts are under a config lock anyway:
> * from vm_stop to vm_stop_cleanup
> * most of the qm cleanup code
> * vm_start
>
> so we only really have to mark that the cleanup was done from the vm_stop codepath
>
> (we have to create the flag at the beginning of vm_stop, because then there is no race between calling its cleanup and qmeventd picking up the vanishing process)
>
> does that make sense to you?

With qmeventd.c patched to wait in a child process for the QEMU process to close, by the time qm cleanup is run the process has already exited, and the shutdown proceeds normally, just as it does with the current code when no USB device is passed through.

>
>>> (if it's here already in 'started' state within a timelimit, ignore it)
>>>
>>> in vm_start we block until the cleanup flag is gone or until some timeout
>>>
>>> in 'qm cleanup' we only start it if the flag does not exist
>>
>> Hmm, it does also call vm_stop_cleanup(), so we could just re-use the check there for that part? I guess doing an early check doesn't hurt either, as long as we do call the post-stop hook.
>>
>>> I think this should make the behavior consistent?