* [PATCH qemu-server v3 0/3] improve guest cleanup handling
@ 2026-02-26 13:51 Dominik Csapak
2026-02-26 13:52 ` [PATCH qemu-server v3 1/3] cleanup: refactor to make cleanup flow consistent Dominik Csapak
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Dominik Csapak @ 2026-02-26 13:51 UTC (permalink / raw)
To: pve-devel
this is a combination of my previous patches[0][1] (v3 because [0] was
already v2)
First we make the cleanup handling more consistent (1/3)
then we check explicitely for the backup lock to improve the error
message for stop backup mode (2/3)
and then we fix #7119 by waiting up to 30s for a possibly still running
guest to stop (e.g. this can occur when using usb passthrouh) (3/3)
changes from v2/RFC:
* use 'vm_running_locally' for getting the pid
* improve error messages
* use a 'use_old_cleanup' flag that will be auto-removed by a reboot
to signal if we can use the new cleanup logic or the old
0: https://lore.proxmox.com/pve-devel/20260210111612.2017883-1-d.csapak@proxmox.com/
1: https://lore.proxmox.com/pve-devel/20260223105542.1525232-1-d.csapak@proxmox.com/
Dominik Csapak (3):
cleanup: refactor to make cleanup flow consistent
qm cleanup: die early when encountering a running stop mode backup
fix #7119: qm cleanup: wait for process exiting for up to 30 seconds
debian/preinst | 16 +++++++++++++++
src/PVE/CLI/qm.pm | 35 +++++++++++++++++++++++++++-----
src/PVE/QemuServer.pm | 14 +++++++++++++
src/PVE/QemuServer/RunState.pm | 28 +++++++++++++++++++++++++
src/test/MigrationTest/QmMock.pm | 16 +++++++++++++++
5 files changed, 104 insertions(+), 5 deletions(-)
create mode 100755 debian/preinst
--
2.47.3
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH qemu-server v3 1/3] cleanup: refactor to make cleanup flow consistent 2026-02-26 13:51 [PATCH qemu-server v3 0/3] improve guest cleanup handling Dominik Csapak @ 2026-02-26 13:52 ` Dominik Csapak 2026-02-27 11:44 ` Dominik Csapak 2026-05-13 15:02 ` Fiona Ebner 2026-02-26 13:52 ` [PATCH qemu-server v3 2/3] qm cleanup: die early when encountering a running stop mode backup Dominik Csapak ` (3 subsequent siblings) 4 siblings, 2 replies; 14+ messages in thread From: Dominik Csapak @ 2026-02-26 13:52 UTC (permalink / raw) To: pve-devel There are two ways a cleanup can be triggered: * When a guest is stopped/shutdown via the API, 'vm_stop' calls 'vm_stop_cleanup'. * When the guest process disconnects from qmeventd, 'qm cleanup' is called, which in turn also tries to call 'vm_stop_cleanup'. Both of these happen under a qemu config lock, so there is no direct race condition that it will be called out of order, but it could happen that the 'qm cleanup' call happened in addition so cleanup was called twice. Which could be a problem when the shutdown was called with 'keepActive' which 'qm cleanup' would simply know nothing of and ignore. Also the post-stop hook might not be triggered in case e.g. a stop-mode backup was done, since that was only happening via qm cleanup and this would sometimes detect the now again running guest and abort. To improve the situation we move the exec_hookscript call at the end of vm_stop_cleanup. At this point we know the vm is stopped and we still have the config lock. To prevent a double cleanup, create a new cleanup flag on vm startup and only cleanup when this is still there, then delete it at the end. This means we can now drop the check for clean/guest shutdown to do that. Since this can only work for guests started after this logic is introduced, create an additional flag that we must use the old logic which will be cleared on reboot. After that the new logic is used. That flag will be created with a 'preinst' script, so we create it when we come from an older version before the code is actually used. Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- debian/preinst | 16 ++++++++++++++++ src/PVE/CLI/qm.pm | 12 +++++++++--- src/PVE/QemuServer.pm | 14 ++++++++++++++ src/PVE/QemuServer/RunState.pm | 28 ++++++++++++++++++++++++++++ src/test/MigrationTest/QmMock.pm | 16 ++++++++++++++++ 5 files changed, 83 insertions(+), 3 deletions(-) create mode 100755 debian/preinst diff --git a/debian/preinst b/debian/preinst new file mode 100755 index 00000000..1697020f --- /dev/null +++ b/debian/preinst @@ -0,0 +1,16 @@ +#!/bin/sh + +set -e + +case "$1" in + upgrade) + if dpkg --compare-versions "$2" 'lt' '9.1.5'; then + # set use_old_cleanup flag file so the old cleanup code will be used until reboot + touch /run/qemu-server/use_old_cleanup + fi + ;; + install|abort-install) + ;; +esac + +exit 0 diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm index bdae9641..b7bc4d9f 100755 --- a/src/PVE/CLI/qm.pm +++ b/src/PVE/CLI/qm.pm @@ -1120,12 +1120,18 @@ __PACKAGE__->register_method({ } } - if (!$clean || $guest) { - # vm was shutdown from inside the guest or crashed, doing api cleanup + my $can_use_cleanup_flag = PVE::QemuServer::RunState::can_use_cleanup_flag(); + + if (!$clean || $guest || $can_use_cleanup_flag) { + # either we can use the new mechanism to check if cleanup is done, or + # vm was shutdown from inside the guest or crashed PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0, 1); } - PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop'); + if (!$can_use_cleanup_flag) { + # if the new cleanup mechanism is used, this will be called from 'vm_stop_cleanup' + PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop'); + } $restart = eval { PVE::QemuServer::clear_reboot_request($vmid) }; warn $@ if $@; diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm index 545758dc..2f1d5c11 100644 --- a/src/PVE/QemuServer.pm +++ b/src/PVE/QemuServer.pm @@ -5817,6 +5817,8 @@ sub vm_start_nolock { syslog("info", "VM $vmid started with PID $pid."); + PVE::QemuServer::RunState::create_cleanup_flag($vmid); + if (defined(my $migrate = $res->{migrate})) { if ($migrate->{proto} eq 'tcp') { my $nodename = nodename(); @@ -6124,6 +6126,11 @@ sub cleanup_pci_devices { sub vm_stop_cleanup { my ($storecfg, $vmid, $conf, $keepActive, $apply_pending_changes, $noerr) = @_; + my $can_use_cleanup_flag = PVE::QemuServer::RunState::can_use_cleanup_flag(); + if ($can_use_cleanup_flag) { + return if !PVE::QemuServer::RunState::cleanup_flag_exists($vmid); + } + eval { PVE::QemuServer::QSD::quit($vmid) if PVE::QemuServer::Helpers::qsd_running_locally($vmid); @@ -6158,6 +6165,13 @@ sub vm_stop_cleanup { die $err if !$noerr; warn $err; } + + if ($can_use_cleanup_flag) { + # if the old cleanup mechanism is in place, this will be called by 'qm cleanup' + PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop'); + } + + PVE::QemuServer::RunState::clear_cleanup_flag($vmid); } # call only in locked context diff --git a/src/PVE/QemuServer/RunState.pm b/src/PVE/QemuServer/RunState.pm index 6a5fdbd7..e2bd4d94 100644 --- a/src/PVE/QemuServer/RunState.pm +++ b/src/PVE/QemuServer/RunState.pm @@ -6,6 +6,7 @@ use warnings; use POSIX qw(strftime); use PVE::Cluster; +use PVE::File; use PVE::RPCEnvironment; use PVE::Storage; @@ -183,4 +184,31 @@ sub vm_resume { ); } +my sub get_cleanup_path { + my ($vmid) = @_; + return "/var/run/qemu-server/$vmid.cleanup"; +} + +sub create_cleanup_flag { + my ($vmid) = @_; + PVE::File::file_set_contents(get_cleanup_path($vmid), time()); +} + +sub clear_cleanup_flag { + my ($vmid) = @_; + my $path = get_cleanup_path($vmid); + unlink $path or $! == POSIX::ENOENT or die "removing cleanup flag for $vmid failed: $!\n"; +} + +sub cleanup_flag_exists { + my ($vmid) = @_; + return -f get_cleanup_path($vmid); +} + +# checks if /run/qemu-server/use_old_cleanup exists that will be created on +# package update and cleared on bootup so we can be sure the guests were +# started recently enough +sub can_use_cleanup_flag { + !-f "/run/qemu-server/use_old_cleanup"; +} 1; diff --git a/src/test/MigrationTest/QmMock.pm b/src/test/MigrationTest/QmMock.pm index 78be47d3..4d5fa3ac 100644 --- a/src/test/MigrationTest/QmMock.pm +++ b/src/test/MigrationTest/QmMock.pm @@ -77,6 +77,22 @@ $qemu_server_helpers_module->mock( }, ); +my $qemu_server_runstate_module = Test::MockModule->new("PVE::QemuServer::RunState"); +$qemu_server_runstate_module->mock( + create_cleanup_flag => sub { + return undef; + }, + clear_cleanup_flag => sub { + return undef; + }, + cleanup_flag_exists => sub { + return 1; + }, + can_use_cleanup_flag => sub { + return 1; + }, +); + our $qemu_server_machine_module = Test::MockModule->new("PVE::QemuServer::Machine"); $qemu_server_machine_module->mock( get_current_qemu_machine => sub { -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH qemu-server v3 1/3] cleanup: refactor to make cleanup flow consistent 2026-02-26 13:52 ` [PATCH qemu-server v3 1/3] cleanup: refactor to make cleanup flow consistent Dominik Csapak @ 2026-02-27 11:44 ` Dominik Csapak 2026-05-13 15:02 ` Fiona Ebner 1 sibling, 0 replies; 14+ messages in thread From: Dominik Csapak @ 2026-02-27 11:44 UTC (permalink / raw) To: pve-devel forgot to note in the commit notes: i used the current package version as check, if we bump before this is applied, the version has to be adapted of course ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH qemu-server v3 1/3] cleanup: refactor to make cleanup flow consistent 2026-02-26 13:52 ` [PATCH qemu-server v3 1/3] cleanup: refactor to make cleanup flow consistent Dominik Csapak 2026-02-27 11:44 ` Dominik Csapak @ 2026-05-13 15:02 ` Fiona Ebner 1 sibling, 0 replies; 14+ messages in thread From: Fiona Ebner @ 2026-05-13 15:02 UTC (permalink / raw) To: Dominik Csapak, pve-devel Am 26.02.26 um 3:07 PM schrieb Dominik Csapak: > diff --git a/debian/preinst b/debian/preinst > new file mode 100755 > index 00000000..1697020f > --- /dev/null > +++ b/debian/preinst > @@ -0,0 +1,16 @@ > +#!/bin/sh > + > +set -e > + Missing #DEBHELPER# > +case "$1" in > + upgrade) > + if dpkg --compare-versions "$2" 'lt' '9.1.5'; then As you already anticipated, a bump is needed here. > + # set use_old_cleanup flag file so the old cleanup code will be used until reboot > + touch /run/qemu-server/use_old_cleanup > + fi > + ;; > + install|abort-install) > + ;; > +esac > + > +exit 0 ---snip 8<--- > diff --git a/src/PVE/QemuServer/RunState.pm b/src/PVE/QemuServer/RunState.pm > index 6a5fdbd7..e2bd4d94 100644 > --- a/src/PVE/QemuServer/RunState.pm > +++ b/src/PVE/QemuServer/RunState.pm > @@ -6,6 +6,7 @@ use warnings; > use POSIX qw(strftime); > > use PVE::Cluster; > +use PVE::File; > use PVE::RPCEnvironment; > use PVE::Storage; > > @@ -183,4 +184,31 @@ sub vm_resume { > ); > } > > +my sub get_cleanup_path { > + my ($vmid) = @_; > + return "/var/run/qemu-server/$vmid.cleanup"; Nit: our code is already mixing /var/run/qemu-server and /run/qemu-server. I'd suggest using the latter for new usages (part of your patch already does). It's auto-created by systemd-tmpfiles now. > +} > + > +sub create_cleanup_flag { > + my ($vmid) = @_; > + PVE::File::file_set_contents(get_cleanup_path($vmid), time()); Is writing the time a precaution if we need future adaptations or is that actually used somewhere? Might be worth a comment what it's for. > +} > + > +sub clear_cleanup_flag { > + my ($vmid) = @_; > + my $path = get_cleanup_path($vmid); > + unlink $path or $! == POSIX::ENOENT or die "removing cleanup flag for $vmid failed: $!\n"; > +} > + > +sub cleanup_flag_exists { > + my ($vmid) = @_; > + return -f get_cleanup_path($vmid); > +} > + > +# checks if /run/qemu-server/use_old_cleanup exists that will be created on > +# package update and cleared on bootup so we can be sure the guests were > +# started recently enough > +sub can_use_cleanup_flag { > + !-f "/run/qemu-server/use_old_cleanup"; > +} Style nit: missing blank here > 1; > diff --git a/src/test/MigrationTest/QmMock.pm b/src/test/MigrationTest/QmMock.pm > index 78be47d3..4d5fa3ac 100644 > --- a/src/test/MigrationTest/QmMock.pm > +++ b/src/test/MigrationTest/QmMock.pm > @@ -77,6 +77,22 @@ $qemu_server_helpers_module->mock( > }, > ); > > +my $qemu_server_runstate_module = Test::MockModule->new("PVE::QemuServer::RunState"); > +$qemu_server_runstate_module->mock( > + create_cleanup_flag => sub { > + return undef; > + }, > + clear_cleanup_flag => sub { > + return undef; > + }, > + cleanup_flag_exists => sub { > + return 1; Wouldn't it be nicer to use an actual path the test may write to (i.e. the migration test run dir) and only mock get_cleanup_path() instead of the above three functions (if making get_cleanup_path() public for mocking, the name should be less generic and include 'flag'). > + }, > + can_use_cleanup_flag => sub { > + return 1; > + }, > +); > + > our $qemu_server_machine_module = Test::MockModule->new("PVE::QemuServer::Machine"); > $qemu_server_machine_module->mock( > get_current_qemu_machine => sub { ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH qemu-server v3 2/3] qm cleanup: die early when encountering a running stop mode backup 2026-02-26 13:51 [PATCH qemu-server v3 0/3] improve guest cleanup handling Dominik Csapak 2026-02-26 13:52 ` [PATCH qemu-server v3 1/3] cleanup: refactor to make cleanup flow consistent Dominik Csapak @ 2026-02-26 13:52 ` Dominik Csapak 2026-05-13 15:08 ` Fiona Ebner 2026-02-26 13:52 ` [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds Dominik Csapak ` (2 subsequent siblings) 4 siblings, 1 reply; 14+ messages in thread From: Dominik Csapak @ 2026-02-26 13:52 UTC (permalink / raw) To: pve-devel this is an expected situation, so abort here early with a better message than 'vm still running' Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- src/PVE/CLI/qm.pm | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm index b7bc4d9f..6aff5b7a 100755 --- a/src/PVE/CLI/qm.pm +++ b/src/PVE/CLI/qm.pm @@ -1102,6 +1102,13 @@ __PACKAGE__->register_method({ sub { my $conf = PVE::QemuConfig->load_config($vmid); my $pid = PVE::QemuServer::check_running($vmid); + + # With a stop mode backup, we might run here into a running vm with a backup + # lock, but this already did the cleanup and is an expected state, so abort + # here with a good message + die "skipping cleanup - 'backup' lock is present and vm is running again\n" + if $pid && $clean && $conf->{lock} && $conf->{lock} eq 'backup'; + die "vm still running\n" if $pid; # Rollback already does cleanup when preparing and afterwards temporarily drops the -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH qemu-server v3 2/3] qm cleanup: die early when encountering a running stop mode backup 2026-02-26 13:52 ` [PATCH qemu-server v3 2/3] qm cleanup: die early when encountering a running stop mode backup Dominik Csapak @ 2026-05-13 15:08 ` Fiona Ebner 0 siblings, 0 replies; 14+ messages in thread From: Fiona Ebner @ 2026-05-13 15:08 UTC (permalink / raw) To: Dominik Csapak, pve-devel Am 26.02.26 um 3:07 PM schrieb Dominik Csapak: > this is an expected situation, so abort here early with a better message > than 'vm still running' > > Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> with one nit below > --- > src/PVE/CLI/qm.pm | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm > index b7bc4d9f..6aff5b7a 100755 > --- a/src/PVE/CLI/qm.pm > +++ b/src/PVE/CLI/qm.pm > @@ -1102,6 +1102,13 @@ __PACKAGE__->register_method({ > sub { > my $conf = PVE::QemuConfig->load_config($vmid); > my $pid = PVE::QemuServer::check_running($vmid); > + > + # With a stop mode backup, we might run here into a running vm with a backup > + # lock, but this already did the cleanup and is an expected state, so abort > + # here with a good message > + die "skipping cleanup - 'backup' lock is present and vm is running again\n" > + if $pid && $clean && $conf->{lock} && $conf->{lock} eq 'backup'; > + > die "vm still running\n" if $pid; Nit: both could be grouped inside an if ($pid) branch for slightly improved clarity > > # Rollback already does cleanup when preparing and afterwards temporarily drops the ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds 2026-02-26 13:51 [PATCH qemu-server v3 0/3] improve guest cleanup handling Dominik Csapak 2026-02-26 13:52 ` [PATCH qemu-server v3 1/3] cleanup: refactor to make cleanup flow consistent Dominik Csapak 2026-02-26 13:52 ` [PATCH qemu-server v3 2/3] qm cleanup: die early when encountering a running stop mode backup Dominik Csapak @ 2026-02-26 13:52 ` Dominik Csapak 2026-04-15 7:14 ` Benjamin McGuire 2026-05-13 15:14 ` Fiona Ebner 2026-04-14 11:42 ` [PATCH qemu-server v3 0/3] improve guest cleanup handling Dominik Csapak 2026-05-15 10:09 ` superseded: " Dominik Csapak 4 siblings, 2 replies; 14+ messages in thread From: Dominik Csapak @ 2026-02-26 13:52 UTC (permalink / raw) To: pve-devel When qmeventd detects a vm exiting, it starts 'qm cleanup'. Since the vm process exits is sometimes not instant, wait up to 30 seconds here to start the cleanup process instead of immediately aborting if the pid still exits. This prevented executing the hookscript on the 'post-stop' phase when either * the cleanup mechanism is still the old one * the guest was powered down from inside, not via the API This can be reproduced by e.g. passing through a usb device, which delays the qemu process exit for a few seconds (for most devices). Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- src/PVE/CLI/qm.pm | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm index 6aff5b7a..ee3ccedd 100755 --- a/src/PVE/CLI/qm.pm +++ b/src/PVE/CLI/qm.pm @@ -1101,7 +1101,7 @@ __PACKAGE__->register_method({ 60, sub { my $conf = PVE::QemuConfig->load_config($vmid); - my $pid = PVE::QemuServer::check_running($vmid); + my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid); # With a stop mode backup, we might run here into a running vm with a backup # lock, but this already did the cleanup and is an expected state, so abort @@ -1109,7 +1109,19 @@ __PACKAGE__->register_method({ die "skipping cleanup - 'backup' lock is present and vm is running again\n" if $pid && $clean && $conf->{lock} && $conf->{lock} eq 'backup'; - die "vm still running\n" if $pid; + # wait for some time until the QEMU process exits after the QMP + # 'SHUTDOWN' event, since this might not be instant + my $timeout = 30; + my $starttime = time(); + warn "QEMU process $pid for VM $vmid still running (or newly started)\n" + if $pid; + + while ($pid && (time() - $starttime) < $timeout) { + sleep(1); + $pid = PVE::QemuServer::check_running($vmid); + } + + die "vm still running after timeout - aborting cleanup\n" if $pid; # Rollback already does cleanup when preparing and afterwards temporarily drops the # lock on the configuration file to rollback the volumes. Deactivating volumes here -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds 2026-02-26 13:52 ` [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds Dominik Csapak @ 2026-04-15 7:14 ` Benjamin McGuire 2026-05-13 15:14 ` Fiona Ebner 1 sibling, 0 replies; 14+ messages in thread From: Benjamin McGuire @ 2026-04-15 7:14 UTC (permalink / raw) To: Dominik Csapak; +Cc: pve-devel Tested-by: Benjamin McGuire <jaminmc@gmail.com> I’ve applied all three patches to the latest master. Here’s the commit: 473b67746639c4859f2a6838e4b8d05c7a9a3f5a. I’ve tested it with USB passthrough, shutting down the OS from within the VM, and the hook script works for all the VMs I have. I’ve tested it with Windows 11, Fedora, Ubuntu, macOS Mojave, and macOS Tahoe. My Vega 64, USB keyboard, and mouse were all passed through. The hook script now correctly runs in the cleanup stage, which fixes #7119. — Benjamin > On Feb 26, 2026, at 8:52 AM, Dominik Csapak <d.csapak@proxmox.com> wrote: > > When qmeventd detects a vm exiting, it starts 'qm cleanup'. > > Since the vm process exits is sometimes not instant, wait up to 30 > seconds here to start the cleanup process instead of immediately > aborting if the pid still exits. This prevented executing the hookscript > on the 'post-stop' phase when either > * the cleanup mechanism is still the old one > * the guest was powered down from inside, not via the API > > This can be reproduced by e.g. passing through a usb device, which > delays the qemu process exit for a few seconds (for most devices). > > Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> > --- > src/PVE/CLI/qm.pm | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm > index 6aff5b7a..ee3ccedd 100755 > --- a/src/PVE/CLI/qm.pm > +++ b/src/PVE/CLI/qm.pm > @@ -1101,7 +1101,7 @@ __PACKAGE__->register_method({ > 60, > sub { > my $conf = PVE::QemuConfig->load_config($vmid); > - my $pid = PVE::QemuServer::check_running($vmid); > + my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid); > > # With a stop mode backup, we might run here into a running vm with a backup > # lock, but this already did the cleanup and is an expected state, so abort > @@ -1109,7 +1109,19 @@ __PACKAGE__->register_method({ > die "skipping cleanup - 'backup' lock is present and vm is running again\n" > if $pid && $clean && $conf->{lock} && $conf->{lock} eq 'backup'; > > - die "vm still running\n" if $pid; > + # wait for some time until the QEMU process exits after the QMP > + # 'SHUTDOWN' event, since this might not be instant > + my $timeout = 30; > + my $starttime = time(); > + warn "QEMU process $pid for VM $vmid still running (or newly started)\n" > + if $pid; > + > + while ($pid && (time() - $starttime) < $timeout) { > + sleep(1); > + $pid = PVE::QemuServer::check_running($vmid); > + } > + > + die "vm still running after timeout - aborting cleanup\n" if $pid; > > # Rollback already does cleanup when preparing and afterwards temporarily drops the > # lock on the configuration file to rollback the volumes. Deactivating volumes here > -- > 2.47.3 > > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds 2026-02-26 13:52 ` [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds Dominik Csapak 2026-04-15 7:14 ` Benjamin McGuire @ 2026-05-13 15:14 ` Fiona Ebner 2026-05-13 15:19 ` Fiona Ebner 2026-05-15 9:52 ` Dominik Csapak 1 sibling, 2 replies; 14+ messages in thread From: Fiona Ebner @ 2026-05-13 15:14 UTC (permalink / raw) To: Dominik Csapak, pve-devel Am 26.02.26 um 3:08 PM schrieb Dominik Csapak: > When qmeventd detects a vm exiting, it starts 'qm cleanup'. > > Since the vm process exits is sometimes not instant, wait up to 30 > seconds here to start the cleanup process instead of immediately > aborting if the pid still exits. This prevented executing the hookscript > on the 'post-stop' phase when either > * the cleanup mechanism is still the old one > * the guest was powered down from inside, not via the API > > This can be reproduced by e.g. passing through a usb device, which > delays the qemu process exit for a few seconds (for most devices). > > Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> > --- > src/PVE/CLI/qm.pm | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm > index 6aff5b7a..ee3ccedd 100755 > --- a/src/PVE/CLI/qm.pm > +++ b/src/PVE/CLI/qm.pm > @@ -1101,7 +1101,7 @@ __PACKAGE__->register_method({ > 60, > sub { > my $conf = PVE::QemuConfig->load_config($vmid); > - my $pid = PVE::QemuServer::check_running($vmid); > + my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid); > > # With a stop mode backup, we might run here into a running vm with a backup > # lock, but this already did the cleanup and is an expected state, so abort > @@ -1109,7 +1109,19 @@ __PACKAGE__->register_method({ > die "skipping cleanup - 'backup' lock is present and vm is running again\n" > if $pid && $clean && $conf->{lock} && $conf->{lock} eq 'backup'; > > - die "vm still running\n" if $pid; > + # wait for some time until the QEMU process exits after the QMP > + # 'SHUTDOWN' event, since this might not be instant > + my $timeout = 30; > + my $starttime = time(); > + warn "QEMU process $pid for VM $vmid still running (or newly started)\n" > + if $pid; Should we maybe warn once after 10? seconds rather than instantly? Not sure if we should warn at all if it's expected to take a while in some cases. If we time out, we still get the error below. > + > + while ($pid && (time() - $starttime) < $timeout) { > + sleep(1); > + $pid = PVE::QemuServer::check_running($vmid); Should also use the non-deprecated helper. > + } > + > + die "vm still running after timeout - aborting cleanup\n" if $pid; > > # Rollback already does cleanup when preparing and afterwards temporarily drops the > # lock on the configuration file to rollback the volumes. Deactivating volumes here ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds 2026-05-13 15:14 ` Fiona Ebner @ 2026-05-13 15:19 ` Fiona Ebner 2026-05-15 9:52 ` Dominik Csapak 1 sibling, 0 replies; 14+ messages in thread From: Fiona Ebner @ 2026-05-13 15:19 UTC (permalink / raw) To: Dominik Csapak, pve-devel Am 13.05.26 um 5:14 PM schrieb Fiona Ebner: > Am 26.02.26 um 3:08 PM schrieb Dominik Csapak: >> + die "vm still running after timeout - aborting cleanup\n" if $pid; Oh, and I would reformulate this to something like "aborting cleanup, because VM is still running after $timeout seconds\n" to mention which operation failed up front. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds 2026-05-13 15:14 ` Fiona Ebner 2026-05-13 15:19 ` Fiona Ebner @ 2026-05-15 9:52 ` Dominik Csapak 2026-05-15 10:12 ` Fiona Ebner 1 sibling, 1 reply; 14+ messages in thread From: Dominik Csapak @ 2026-05-15 9:52 UTC (permalink / raw) To: Fiona Ebner, pve-devel On 5/13/26 5:14 PM, Fiona Ebner wrote: > Am 26.02.26 um 3:08 PM schrieb Dominik Csapak: >> When qmeventd detects a vm exiting, it starts 'qm cleanup'. >> >> Since the vm process exits is sometimes not instant, wait up to 30 >> seconds here to start the cleanup process instead of immediately >> aborting if the pid still exits. This prevented executing the hookscript >> on the 'post-stop' phase when either >> * the cleanup mechanism is still the old one >> * the guest was powered down from inside, not via the API >> >> This can be reproduced by e.g. passing through a usb device, which >> delays the qemu process exit for a few seconds (for most devices). >> >> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> >> --- >> src/PVE/CLI/qm.pm | 16 ++++++++++++++-- >> 1 file changed, 14 insertions(+), 2 deletions(-) >> >> diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm >> index 6aff5b7a..ee3ccedd 100755 >> --- a/src/PVE/CLI/qm.pm >> +++ b/src/PVE/CLI/qm.pm >> @@ -1101,7 +1101,7 @@ __PACKAGE__->register_method({ >> 60, >> sub { >> my $conf = PVE::QemuConfig->load_config($vmid); >> - my $pid = PVE::QemuServer::check_running($vmid); >> + my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid); >> >> # With a stop mode backup, we might run here into a running vm with a backup >> # lock, but this already did the cleanup and is an expected state, so abort >> @@ -1109,7 +1109,19 @@ __PACKAGE__->register_method({ >> die "skipping cleanup - 'backup' lock is present and vm is running again\n" >> if $pid && $clean && $conf->{lock} && $conf->{lock} eq 'backup'; >> >> - die "vm still running\n" if $pid; >> + # wait for some time until the QEMU process exits after the QMP >> + # 'SHUTDOWN' event, since this might not be instant >> + my $timeout = 30; >> + my $starttime = time(); >> + warn "QEMU process $pid for VM $vmid still running (or newly started)\n" >> + if $pid; > > Should we maybe warn once after 10? seconds rather than instantly? Not > sure if we should warn at all if it's expected to take a while in some > cases. If we time out, we still get the error below. > ofc i can drop the warning, but IMO getting a warning that an operation might take longer is also helpful? maybe a different wording like: 'QEMU process [..] (or newly started), waiting up to $timeout seconds\n' ? logging only after some feels wrong since the first 10 seconds i don't know what's going on... >> + >> + while ($pid && (time() - $starttime) < $timeout) { >> + sleep(1); >> + $pid = PVE::QemuServer::check_running($vmid); > > Should also use the non-deprecated helper. > >> + } >> + >> + die "vm still running after timeout - aborting cleanup\n" if $pid; >> >> # Rollback already does cleanup when preparing and afterwards temporarily drops the >> # lock on the configuration file to rollback the volumes. Deactivating volumes here > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds 2026-05-15 9:52 ` Dominik Csapak @ 2026-05-15 10:12 ` Fiona Ebner 0 siblings, 0 replies; 14+ messages in thread From: Fiona Ebner @ 2026-05-15 10:12 UTC (permalink / raw) To: Dominik Csapak, pve-devel Am 15.05.26 um 11:52 AM schrieb Dominik Csapak: > On 5/13/26 5:14 PM, Fiona Ebner wrote: >> Am 26.02.26 um 3:08 PM schrieb Dominik Csapak: >>> When qmeventd detects a vm exiting, it starts 'qm cleanup'. >>> >>> Since the vm process exits is sometimes not instant, wait up to 30 >>> seconds here to start the cleanup process instead of immediately >>> aborting if the pid still exits. This prevented executing the hookscript >>> on the 'post-stop' phase when either >>> * the cleanup mechanism is still the old one >>> * the guest was powered down from inside, not via the API >>> >>> This can be reproduced by e.g. passing through a usb device, which >>> delays the qemu process exit for a few seconds (for most devices). >>> >>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> >>> --- >>> src/PVE/CLI/qm.pm | 16 ++++++++++++++-- >>> 1 file changed, 14 insertions(+), 2 deletions(-) >>> >>> diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm >>> index 6aff5b7a..ee3ccedd 100755 >>> --- a/src/PVE/CLI/qm.pm >>> +++ b/src/PVE/CLI/qm.pm >>> @@ -1101,7 +1101,7 @@ __PACKAGE__->register_method({ >>> 60, >>> sub { >>> my $conf = PVE::QemuConfig->load_config($vmid); >>> - my $pid = PVE::QemuServer::check_running($vmid); >>> + my $pid = >>> PVE::QemuServer::Helpers::vm_running_locally($vmid); >>> # With a stop mode backup, we might run here into >>> a running vm with a backup >>> # lock, but this already did the cleanup and is an >>> expected state, so abort >>> @@ -1109,7 +1109,19 @@ __PACKAGE__->register_method({ >>> die "skipping cleanup - 'backup' lock is present >>> and vm is running again\n" >>> if $pid && $clean && $conf->{lock} && $conf- >>> >{lock} eq 'backup'; >>> - die "vm still running\n" if $pid; >>> + # wait for some time until the QEMU process exits >>> after the QMP >>> + # 'SHUTDOWN' event, since this might not be instant >>> + my $timeout = 30; >>> + my $starttime = time(); >>> + warn "QEMU process $pid for VM $vmid still running >>> (or newly started)\n" >>> + if $pid; >> >> Should we maybe warn once after 10? seconds rather than instantly? Not >> sure if we should warn at all if it's expected to take a while in some >> cases. If we time out, we still get the error below. >> > > ofc i can drop the warning, but IMO getting a warning that an operation > might take longer is also helpful? Not if you then get it every time in the cases where it is fully expected. That's just noise and not helpful information, making actual unusual situations less visible. > maybe a different wording like: > > 'QEMU process [..] (or newly started), waiting up to $timeout seconds\n' > > ? > > logging only after some feels wrong since the first 10 seconds i don't > know what's going on... Even without logging in the first bit of time, you do know that the cleanup handling is happening and taking some time, why would you need more? If terminating the QEMU instance takes longer than some time, that is an unusual situation, we might want to inform people about. I guess we should include "VM cleanup:" as a prefix to the message to make it clear which operation it is for. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH qemu-server v3 0/3] improve guest cleanup handling 2026-02-26 13:51 [PATCH qemu-server v3 0/3] improve guest cleanup handling Dominik Csapak ` (2 preceding siblings ...) 2026-02-26 13:52 ` [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds Dominik Csapak @ 2026-04-14 11:42 ` Dominik Csapak 2026-05-15 10:09 ` superseded: " Dominik Csapak 4 siblings, 0 replies; 14+ messages in thread From: Dominik Csapak @ 2026-04-14 11:42 UTC (permalink / raw) To: pve-devel ping, these still apply ^ permalink raw reply [flat|nested] 14+ messages in thread
* superseded: [PATCH qemu-server v3 0/3] improve guest cleanup handling 2026-02-26 13:51 [PATCH qemu-server v3 0/3] improve guest cleanup handling Dominik Csapak ` (3 preceding siblings ...) 2026-04-14 11:42 ` [PATCH qemu-server v3 0/3] improve guest cleanup handling Dominik Csapak @ 2026-05-15 10:09 ` Dominik Csapak 4 siblings, 0 replies; 14+ messages in thread From: Dominik Csapak @ 2026-05-15 10:09 UTC (permalink / raw) To: pve-devel superseded by v4: https://lore.proxmox.com/pve-devel/20260515100842.1980636-1-d.csapak@proxmox.com/T/#t On 2/26/26 3:07 PM, Dominik Csapak wrote: > this is a combination of my previous patches[0][1] (v3 because [0] was > already v2) > > First we make the cleanup handling more consistent (1/3) > then we check explicitely for the backup lock to improve the error > message for stop backup mode (2/3) > and then we fix #7119 by waiting up to 30s for a possibly still running > guest to stop (e.g. this can occur when using usb passthrouh) (3/3) > > changes from v2/RFC: > * use 'vm_running_locally' for getting the pid > * improve error messages > * use a 'use_old_cleanup' flag that will be auto-removed by a reboot > to signal if we can use the new cleanup logic or the old > > 0: https://lore.proxmox.com/pve-devel/20260210111612.2017883-1-d.csapak@proxmox.com/ > 1: https://lore.proxmox.com/pve-devel/20260223105542.1525232-1-d.csapak@proxmox.com/ > > Dominik Csapak (3): > cleanup: refactor to make cleanup flow consistent > qm cleanup: die early when encountering a running stop mode backup > fix #7119: qm cleanup: wait for process exiting for up to 30 seconds > > debian/preinst | 16 +++++++++++++++ > src/PVE/CLI/qm.pm | 35 +++++++++++++++++++++++++++----- > src/PVE/QemuServer.pm | 14 +++++++++++++ > src/PVE/QemuServer/RunState.pm | 28 +++++++++++++++++++++++++ > src/test/MigrationTest/QmMock.pm | 16 +++++++++++++++ > 5 files changed, 104 insertions(+), 5 deletions(-) > create mode 100755 debian/preinst > ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-05-15 10:13 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-26 13:51 [PATCH qemu-server v3 0/3] improve guest cleanup handling Dominik Csapak 2026-02-26 13:52 ` [PATCH qemu-server v3 1/3] cleanup: refactor to make cleanup flow consistent Dominik Csapak 2026-02-27 11:44 ` Dominik Csapak 2026-05-13 15:02 ` Fiona Ebner 2026-02-26 13:52 ` [PATCH qemu-server v3 2/3] qm cleanup: die early when encountering a running stop mode backup Dominik Csapak 2026-05-13 15:08 ` Fiona Ebner 2026-02-26 13:52 ` [PATCH qemu-server v3 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds Dominik Csapak 2026-04-15 7:14 ` Benjamin McGuire 2026-05-13 15:14 ` Fiona Ebner 2026-05-13 15:19 ` Fiona Ebner 2026-05-15 9:52 ` Dominik Csapak 2026-05-15 10:12 ` Fiona Ebner 2026-04-14 11:42 ` [PATCH qemu-server v3 0/3] improve guest cleanup handling Dominik Csapak 2026-05-15 10:09 ` superseded: " Dominik Csapak
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.