Subject: Re: [PATCH qemu-server v2] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds
From: Benjamin McGuire <jaminmc@gmail.com>
Date: Fri, 20 Feb 2026 22:11:47 -0500
To: Dominik Csapak
CC: pve-devel@lists.proxmox.com
References: <20260210111612.2017883-1-d.csapak@proxmox.com> <7ee8d206-36fd-4ade-893b-c7c2222a8883@proxmox.com> <1770985110.nme4v4xomn.astroid@yuna.none> <9d501c98-a85c-44d4-af0e-0301b203d691@proxmox.com> <1771231158.rte62d97r5.astroid@yuna.none> <38236a30-a249-4ebe-bf89-788d67f36bd1@proxmox.com> <7bbce03b-d8d6-4459-876c-2a71257959a4@proxmox.com> <8099db49-d35a-4ab1-9e33-c82689aee016@proxmox.com>
List-Id: Proxmox VE development discussion
I've come up with a different patch where the wait happens in the child process that runs the qm cleanup (with help from Cursor).

Bug #7119: When qmeventd notices a VM's QMP socket closing, it immediately forks and runs qm cleanup. But the QEMU process might not have completely exited yet; USB passthrough, for instance, can take a few seconds to tear down after the socket closes. This makes qm cleanup think the process is still running and bail out, skipping the post-stop hookscript and possibly leaving the cleanup incomplete.

To fix this, we can wait in the child process for the QEMU process to actually exit before running qm cleanup. This works with pidfd + poll(), which:

* Returns right away if the process is already dead (which is usually the case)
* Waits for up to 10 seconds if the teardown is slow (like with USB passthrough)
* Runs in the child process, so qmeventd's event loop isn't blocked or slowed down

For guest-initiated shutdowns (where terminate_client() wasn't called and no pidfd is stored in the client struct), we open a new pidfd using pidfd_open(). If pidfd_open() fails (the process is already dead, or the kernel doesn't support pidfds), we skip the wait and go ahead, the same as before this patch.

We set the client's pidfd to -1 before removing it from the hash table, to stop cleanup_client() from closing the fd twice.

This way, we avoid any Perl-side polling or sleeping while holding the VM config lock, which is how the v2 patch did it.

In my test, with my USB receiver passed through, it only needed to wait 1-2 seconds, so I made the timeout 10 seconds instead of 30. We could even make it 5, but slower machines may need more time for the USB teardown.
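To make the mechanism concrete outside of the qmeventd code, here is a minimal standalone sketch of the pidfd + poll() pattern (just an illustration of the idea, not part of the patch; wait_for_exit is a made-up name):

#include <poll.h>
#include <stdio.h>
#include <sys/pidfd.h> /* pidfd_open(), glibc >= 2.36 */
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for pid to exit; return immediately if it is
 * already gone. */
static void wait_for_exit(pid_t pid, int timeout_ms)
{
    int pidfd = pidfd_open(pid, 0);
    if (pidfd < 0) {
        /* e.g. ESRCH: process already fully gone (or no pidfd
         * support), so there is nothing to wait for */
        perror("pidfd_open");
        return;
    }
    /* a pidfd polls as readable once the process has terminated */
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) < 0) {
        perror("poll on pidfd");
    }
    close(pidfd);
}

Note that poll() returning 0 (timeout) is treated the same as success: we proceed with the cleanup anyway, exactly like the patch below does after its 10-second window.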
Logs from my test with the USB receiver passed through:

Feb 20 21:30:53 Froto qmeventd[3129101]: read: Connection reset by peer

This is where the QMP socket was closed by the guest-initiated shutdown.

Feb 20 21:30:55 Froto qmeventd[3135521]: Starting cleanup for 100

2 seconds later, the cleanup was started.

Feb 20 21:30:56 Froto qmeventd[3135521]: [VM 100 GPU Recovery] HOOK CALLED – VM: 100 Phase: post-stop Time: 2026-02-20 21:30:56

The hookscript was started 1 second later.

Feb 20 21:30:56 Froto qmeventd[3135521]: [VM 100 GPU Recovery] Starting GPU recovery (post-stop phase)
Feb 20 21:30:56 Froto qmeventd[3135521]: [VM 100 GPU Recovery] Detected PCI devices for recovery: 0000:03:00.0 0000:03:00.1 0000:12:00.6
Feb 20 21:30:56 Froto qmeventd[3135521]: [VM 100 GPU Recovery] Detected Vega 10 series GPU (video): 0000:03:00.0 — using light recovery (no remove/rescan)
Feb 20 21:30:56 Froto qmeventd[3135521]: [VM 100 GPU Recovery] → Attempt 1/12 (Vega light mode)
Feb 20 21:30:57 Froto qmeventd[3135521]: [VM 100 GPU Recovery] SUCCESS: 0000:03:00.0 recovered on attempt 1!
Feb 20 21:30:57 Froto qmeventd[3135521]: [VM 100 GPU Recovery] GPU recovery complete

Hookscript complete.

Feb 20 21:30:57 Froto qmeventd[3135521]: Finished cleanup for 100

Cleanup complete.

Also tested with no passthrough, and it still works as it should:

Feb 20 22:00:00 Froto qmeventd[3129101]: read: Connection reset by peer
Feb 20 22:00:00 Froto qmeventd[3166503]: Starting cleanup for 300
Feb 20 22:00:00 Froto qmeventd[3166503]: [VM 300 GPU Recovery] HOOK CALLED – VM: 300 Phase: post-stop Time: 2026-02-20 22:00:00
Feb 20 22:00:00 Froto qmeventd[3166503]: [VM 300 GPU Recovery] Starting GPU recovery (post-stop phase)
Feb 20 22:00:01 Froto qmeventd[3166503]: [VM 300 GPU Recovery] No PCI passthrough devices found.
Feb 20 22:00:01 Froto qmeventd[3166503]: Finished cleanup for 300

The patch:

diff --git a/src/qmeventd/qmeventd.c b/src/qmeventd/qmeventd.c
index 1d9eb74a..d9d26586 100644
--- a/src/qmeventd/qmeventd.c
+++ b/src/qmeventd/qmeventd.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -370,23 +371,41 @@ static void cleanup_qemu_client(struct Client *client) {
     unsigned short guest = client->qemu.guest;
     char vmid[sizeof(client->qemu.vmid)];
     strncpy(vmid, client->qemu.vmid, sizeof(vmid));
+
+    int pidfd = client->pidfd;
+    client->pidfd = -1;
+
+    if (pidfd <= 0) {
+        pidfd = pidfd_open(client->pid, 0);
+    }
+
     g_hash_table_remove(vm_clients, &vmid); // frees key, ignore errors
     VERBOSE_PRINT("%s: executing cleanup (graceful: %d, guest: %d)\n", vmid, graceful, guest);

     int pid = fork();
     if (pid < 0) {
         fprintf(stderr, "fork failed: %s\n", strerror(errno));
+        if (pidfd > 0) (void)close(pidfd);
         return;
     }
     if (pid == 0) {
-        char *script = "/usr/sbin/qm";
+        if (pidfd > 0) {
+            struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
+            if (poll(&pfd, 1, 10 * 1000) < 0) {
+                perror("poll on pidfd");
+            }
+            (void)close(pidfd);
+        }

+        char *script = "/usr/sbin/qm";
         char *args[] = {script, "cleanup", vmid, graceful ? "1" : "0", guest ? "1" : "0", NULL};
-
         execvp(script, args);
         perror("execvp");
         _exit(1);
     }
+
+    // parent
+    if (pidfd > 0) (void)close(pidfd);
 }

 void cleanup_client(struct Client *client) {
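One portability note: pidfd_open() has only had a glibc wrapper, declared in <sys/pidfd.h>, since glibc 2.36 (the syscall itself exists since Linux 5.3). On an older toolchain the raw syscall would be needed, along the lines of this sketch (my_pidfd_open is just an illustrative name, not anything in the tree):

#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* fallback for glibc < 2.36, which lacks the pidfd_open() wrapper */
static int my_pidfd_open(pid_t pid, unsigned int flags)
{
    return (int)syscall(SYS_pidfd_open, pid, flags);
}

Current Debian ships a new enough glibc, so calling pidfd_open() directly, as the patch does, should be fine there.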
> On Feb 20, 2026, at 9:51 AM, Dominik Csapak wrote:
>
> On 2/20/26 3:30 PM, Fiona Ebner wrote:
>> On 20.02.26 at 10:36 AM, Dominik Csapak wrote:
>>> On 2/19/26 2:27 PM, Fiona Ebner wrote:
>>>> On 19.02.26 at 11:15 AM, Dominik Csapak wrote:
>>>>> On 2/16/26 10:15 AM, Fiona Ebner wrote:
>>>>>> On 16.02.26 at 9:42 AM, Fabian Grünbichler wrote:
>>>>>>> On February 13, 2026 2:16 pm, Fiona Ebner wrote:
>>>>>>
>>>>>> I guess the actual need is to have more consistent behavior.
>>>>>>
>>>>>
>>>>> ok so i think we'd need to
>>>>> * create a cleanup flag for each vm when qmeventd detects a vm shutting down (in /var/run/qemu-server/VMID.cleanup, possibly with timestamp)
>>>>> * remove that cleanup flag after cleanup (obviously)
>>>>> * on start, check for that flag and block for some timeout before starting (e.g. check the timestamp in the flag; if it's older than some time, start regardless?)
>>>>
>>>> Sounds good to me.
>>>>

The qmeventd.c patch makes this unnecessary, as the process is already closed.

>>>> Unfortunately, something else: turns out that we kinda rely on qmeventd not doing the cleanup for the optimization with keeping the volumes active (i.e. $keepActive). And actually, the optimization applies randomly, depending on who wins the race.
>>>>
>>>> Output below with added log line
>>>> "doing cleanup for $vmid with keepActive=$keepActive"
>>>> in vm_stop_cleanup() to be able to see what happens.
>>>>
>>>> We try to use the optimization but qmeventd interferes:
>>>>
>>>>> Feb 19 14:09:43 pve9a1 vzdump[168878]: starting task UPID:pve9a1:000293AF:0017CFF8:69970B97:vzdump:102:root@pam:
>>>>> Feb 19 14:09:43 pve9a1 vzdump[168879]: INFO: starting new backup job: vzdump 102 --storage pbs --mode stop
>>>>> Feb 19 14:09:43 pve9a1 vzdump[168879]: INFO: Starting Backup of VM 102 (qemu)
>>>>> Feb 19 14:09:44 pve9a1 qm[168960]: shutdown VM 102: UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam:
>>>>> Feb 19 14:09:44 pve9a1 qm[168959]: starting task UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam:
>>>>> Feb 19 14:09:47 pve9a1 qm[168960]: VM 102 qga command failed - VM 102 qga command 'guest-ping' failed - got timeout
>>>>> Feb 19 14:09:50 pve9a1 qmeventd[166736]: read: Connection reset by peer
>>>>> Feb 19 14:09:50 pve9a1 pvedaemon[166884]: end task UPID:pve9a1:000290CD:0017B515:69970B52:vncproxy:102:root@pam: OK
>>>>> Feb 19 14:09:50 pve9a1 systemd[1]: 102.scope: Deactivated successfully.
>>>>> Feb 19 14:09:50 pve9a1 systemd[1]: 102.scope: Consumed 41.780s CPU time, 1.9G memory peak.
>>>>> Feb 19 14:09:51 pve9a1 qm[168960]: doing cleanup for 102 with keepActive=1
>>>>> Feb 19 14:09:51 pve9a1 qm[168959]: end task UPID:pve9a1:00029400:0017D035:69970B98:qmshutdown:102:root@pam: OK
>>>>> Feb 19 14:09:51 pve9a1 qmeventd[168986]: Starting cleanup for 102
>>>>> Feb 19 14:09:51 pve9a1 qm[168986]: doing cleanup for 102 with keepActive=0
>>>>> Feb 19 14:09:51 pve9a1 qmeventd[168986]: Finished cleanup for 102
>>>>> Feb 19 14:09:51 pve9a1 systemd[1]: Started 102.scope.
>>>>> Feb 19 14:09:51 pve9a1 vzdump[168879]: VM 102 started with PID 169021.
>>>>
>>>> We manage to get the optimization:
>>>>
>>>>> Feb 19 14:16:01 pve9a1 qm[174585]: shutdown VM 102: UPID:pve9a1:0002A9F9:0018636B:69970D11:qmshutdown:102:root@pam:
>>>>> Feb 19 14:16:04 pve9a1 qm[174585]: VM 102 qga command failed - VM 102 qga command 'guest-ping' failed - got timeout
>>>>> Feb 19 14:16:07 pve9a1 qmeventd[166736]: read: Connection reset by peer
>>>>> Feb 19 14:16:07 pve9a1 systemd[1]: 102.scope: Deactivated successfully.
>>>>> Feb 19 14:16:07 pve9a1 systemd[1]: 102.scope: Consumed 46.363s CPU time, 2G memory peak.
>>>>> Feb 19 14:16:08 pve9a1 qm[174585]: doing cleanup for 102 with keepActive=1
>>>>> Feb 19 14:16:08 pve9a1 qm[174582]: end task UPID:pve9a1:0002A9F9:0018636B:69970D11:qmshutdown:102:root@pam: OK
>>>>> Feb 19 14:16:08 pve9a1 systemd[1]: Started 102.scope.
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: Starting cleanup for 102
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: trying to acquire lock...
>>>>> Feb 19 14:16:08 pve9a1 vzdump[174326]: VM 102 started with PID 174718.
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: OK
>>>>> Feb 19 14:16:08 pve9a1 qmeventd[174685]: vm still running
>>>>
>>>> For regular shutdown, we'll also do the cleanup twice.
>>>>
>>>> Maybe we also need a way to tell qmeventd that we already did the cleanup?
>>>
>>> ok well then i'd try to do something like this:
>>>
>>> in 'vm_stop' we'll create a cleanup flag with timestamp + state (e.g. 'queued')
>>>
>>> in vm_stop_cleanup we change/create the flag with 'started' and clear the flag after cleanup
>>
>> Why is the one in vm_stop needed? Is there any advantage over creating it directly in vm_stop_cleanup()?
>
> after a bit of experimenting and re-reading the code, i think I can simplify the logic
>
> at the beginning of vm_stop, we create the cleanup flag
> in 'qm cleanup', we only do the cleanup if the flag does not exist
> in 'vm_start' we clean the flag
>
> this should work because these parts are under a config lock anyway:
> * from vm_stop to vm_stop_cleanup
> * most of the qm cleanup code
> * vm_start
>
> so we only really have to mark that the cleanup was done from the vm_stop codepath
>
> (we have to create the flag at the beginning of vm_stop, because then there is no race between calling its cleanup and qmeventd picking up the vanishing process)
>
> does that make sense to you?

With qmeventd.c patched to wait in a child process for the QEMU process to close, by the time qm cleanup is run the process has already exited, and the shutdown proceeds normally, just as it does with the current code when no USB device is passed through.

>
>>> (if it's here already in 'started' state within a timelimit, ignore it)
>>>
>>> in vm_start we block until the cleanup flag is gone or until some timeout
>>>
>>> in 'qm cleanup' we only start it if the flag does not exist
>>
>> Hmm, it does also call vm_stop_cleanup(), so we could just re-use the check there for that part? I guess doing an early check doesn't hurt either, as long as we do call the post-stop hook.
>>
>>> I think this should make the behavior consistent?