From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id B4D661FF14C for ; Fri, 15 May 2026 12:13:35 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 17E8C17BED; Fri, 15 May 2026 12:13:33 +0200 (CEST) Message-ID: Date: Fri, 15 May 2026 12:12:53 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds To: Dominik Csapak , pve-devel@lists.proxmox.com References: <20260226140752.1792378-1-d.csapak@proxmox.com> <20260226140752.1792378-4-d.csapak@proxmox.com> <6e5aaa63-60ce-4d8c-806d-b13f68327e8e@proxmox.com> <8f13f65a-abc9-4e4c-9357-9d8050f9df79@proxmox.com> Content-Language: en-US From: Fiona Ebner In-Reply-To: <8f13f65a-abc9-4e4c-9357-9d8050f9df79@proxmox.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1778839971702 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.009 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: LDDH2MQ4VNICECYWEWAS4QKGSOSPCFH6 X-Message-ID-Hash: LDDH2MQ4VNICECYWEWAS4QKGSOSPCFH6 X-MailFrom: f.ebner@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Am 15.05.26 um 11:52 AM schrieb Dominik Csapak: > On 5/13/26 5:14 PM, Fiona Ebner wrote: >> Am 26.02.26 um 3:08 PM schrieb Dominik Csapak: >>> When qmeventd detects a vm exiting, it starts 'qm cleanup'. >>> >>> Since the vm process exits is sometimes not instant, wait up to 30 >>> seconds here to start the cleanup process instead of immediately >>> aborting if the pid still exits. This prevented executing the hookscript >>> on the 'post-stop' phase when either >>> * the cleanup mechanism is still the old one >>> * the guest was powered down from inside, not via the API >>> >>> This can be reproduced by e.g. passing through a usb device, which >>> delays the qemu process exit for a few seconds (for most devices). >>> >>> Signed-off-by: Dominik Csapak >>> --- >>>   src/PVE/CLI/qm.pm | 16 ++++++++++++++-- >>>   1 file changed, 14 insertions(+), 2 deletions(-) >>> >>> diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm >>> index 6aff5b7a..ee3ccedd 100755 >>> --- a/src/PVE/CLI/qm.pm >>> +++ b/src/PVE/CLI/qm.pm >>> @@ -1101,7 +1101,7 @@ __PACKAGE__->register_method({ >>>               60, >>>               sub { >>>                   my $conf = PVE::QemuConfig->load_config($vmid); >>> -                my $pid = PVE::QemuServer::check_running($vmid); >>> +                my $pid = >>> PVE::QemuServer::Helpers::vm_running_locally($vmid); >>>                     # With a stop mode backup, we might run here into >>> a running vm with a backup >>>                   # lock, but this already did the cleanup and is an >>> expected state, so abort >>> @@ -1109,7 +1109,19 @@ __PACKAGE__->register_method({ >>>                   die "skipping cleanup - 'backup' lock is present >>> and vm is running again\n" >>>                       if $pid && $clean && $conf->{lock} && $conf- >>> >{lock} eq 'backup'; >>>   -                die "vm still running\n" if $pid; >>> +                # wait for some time until the QEMU process exits >>> after the QMP >>> +                # 'SHUTDOWN' event, since this might not be instant >>> +                my $timeout = 30; >>> +                my $starttime = time(); >>> +                warn "QEMU process $pid for VM $vmid still running >>> (or newly started)\n" >>> +                    if $pid; >> >> Should we maybe warn once after 10? seconds rather than instantly? Not >> sure if we should warn at all if it's expected to take a while in some >> cases. If we time out, we still get the error below. >> > > ofc i can drop the warning, but IMO getting a warning that an operation > might take longer is also helpful? Not if you then get it every time in the cases where it is fully expected. That's just noise and not helpful information, making actual unusual situations less visible. > maybe a different wording like: > > 'QEMU process [..] (or newly started), waiting up to $timeout seconds\n' > > ? > > logging only after some feels wrong since the first 10 seconds i don't > know what's going on... Even without logging in the first bit of time, you do know that the cleanup handling is happening and taking some time, why would you need more? If terminating the QEMU instance takes longer than some time, that is an unusual situation, we might want to inform people about. I guess we should include "VM cleanup:" as a prefix to the message to make it clear which operation it is for.