* [PATCH qemu-server v5 0/3] improve guest cleanup handling
@ 2026-05-15 12:23 Dominik Csapak
2026-05-15 12:23 ` [PATCH qemu-server v5 1/3] cleanup: refactor to make cleanup flow consistent Dominik Csapak
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Dominik Csapak @ 2026-05-15 12:23 UTC
To: pve-devel
First we make the cleanup handling more consistent (1/3),
then we check explicitly for the backup lock to improve the error
message for stop mode backups (2/3),
and finally we fix #7119 by waiting up to 30s for a possibly still running
guest to stop (this can occur e.g. when using USB passthrough) (3/3).
changes from v4:
* only warn after 10 seconds exactly once, to indicate this is not a
normal situation
* drop the 'waiting for xx seconds' from warning message
changes from v3:
* update version in preinst check
* add #DEBHELPER# to preinst
* consistently use /run/ instead of /var/run
* make get_cleanup_flag_path public and use that for mocking in tests
* style fixes
* use non-deprecated is_running helper
* improve warning message
changes from v2/RFC:
* use 'vm_running_locally' for getting the pid
* improve error messages
* use a 'use_old_cleanup' flag that will be auto-removed by a reboot
to signal if we can use the new cleanup logic or the old
Dominik Csapak (3):
cleanup: refactor to make cleanup flow consistent
qm cleanup: die early when encountering a running stop mode backup
fix #7119: qm cleanup: wait for process exiting for up to 30 seconds
debian/preinst | 18 +++++++++++++
src/PVE/CLI/qm.pm | 43 ++++++++++++++++++++++++++++----
src/PVE/QemuServer.pm | 14 +++++++++++
src/PVE/QemuServer/RunState.pm | 29 +++++++++++++++++++++
src/test/MigrationTest/QmMock.pm | 11 ++++++++
5 files changed, 110 insertions(+), 5 deletions(-)
create mode 100755 debian/preinst
--
2.47.3
* [PATCH qemu-server v5 1/3] cleanup: refactor to make cleanup flow consistent
2026-05-15 12:23 [PATCH qemu-server v5 0/3] improve guest cleanup handling Dominik Csapak
@ 2026-05-15 12:23 ` Dominik Csapak
2026-05-15 12:23 ` [PATCH qemu-server v5 2/3] qm cleanup: die early when encountering a running stop mode backup Dominik Csapak
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Dominik Csapak @ 2026-05-15 12:23 UTC
To: pve-devel
There are two ways a cleanup can be triggered:
* When a guest is stopped/shutdown via the API, 'vm_stop' calls 'vm_stop_cleanup'.
* When the guest process disconnects from qmeventd, 'qm cleanup' is
called, which in turn also tries to call 'vm_stop_cleanup'.
Both of these happen under the qemu config lock, so there is no direct
race where they are called out of order. However, 'qm cleanup' could run
in addition to the cleanup done by 'vm_stop', so cleanup was done twice.
This could be a problem when the shutdown was called with 'keepActive',
which 'qm cleanup' knows nothing of and would simply ignore.
Also, the post-stop hook might not be triggered, e.g. after a stop-mode
backup, since it was only called via 'qm cleanup', which would sometimes
detect the guest running again and abort.
To improve the situation, we move the exec_hookscript call to the end
of vm_stop_cleanup. At this point we know the vm is stopped and we still
have the config lock.
To prevent a double cleanup, create a new cleanup flag on vm startup,
only clean up while it is still there, and delete it at the end.
This means that with the new flag we no longer depend on the check for
a clean/guest shutdown to do that.
Since this can only work for guests started after this logic is
introduced, create an additional flag that signals that the old logic
must be used. It lives in /run, so it is cleared on reboot, after which
the new logic is used.
That flag is created by a 'preinst' script, so it exists when upgrading
from an older version, before the new code is actually used.
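
The flag handling described above can be sketched in shell. This is only
an illustration and not the Perl code from this patch: RUN_DIR is a
temporary directory standing in for /run/qemu-server, the function names
mirror the helpers added to RunState.pm, and the echo lines stand in for
the real cleanup work and the 'post-stop' hookscript.

```shell
#!/bin/sh
# Sketch only: RUN_DIR and the echo bodies are stand-ins, not qemu-server code.
RUN_DIR="${RUN_DIR:-$(mktemp -d)}"

# New logic is allowed unless the preinst script left the 'use_old_cleanup' marker.
can_use_cleanup_flag() { [ ! -f "$RUN_DIR/use_old_cleanup" ]; }

# Per-guest flag: written at start, removed at the end of the cleanup.
create_cleanup_flag() { date +%s > "$RUN_DIR/$1.cleanup"; }
cleanup_flag_exists() { [ -f "$RUN_DIR/$1.cleanup" ]; }
clear_cleanup_flag() { rm -f "$RUN_DIR/$1.cleanup"; }

vm_stop_cleanup() {
    vmid="$1"
    # With the new logic, a missing flag means the cleanup already happened.
    if can_use_cleanup_flag && ! cleanup_flag_exists "$vmid"; then
        echo "skip"
        return 0
    fi
    echo "cleanup"
    # With the old logic, the hook is left to the caller ('qm cleanup').
    if can_use_cleanup_flag; then
        echo "post-stop"
    fi
    clear_cleanup_flag "$vmid"
}

create_cleanup_flag 100
first=$(vm_stop_cleanup 100)    # does the cleanup and runs the hook
second=$(vm_stop_cleanup 100)   # flag is gone, so this is a no-op
echo "first call: $first"
echo "second call: $second"
```

The second call only prints "skip", which is the double-cleanup protection.
If the 'use_old_cleanup' marker exists, the flag check is bypassed and the
hook is not run from inside vm_stop_cleanup.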
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
debian/preinst | 18 ++++++++++++++++++
src/PVE/CLI/qm.pm | 12 +++++++++---
src/PVE/QemuServer.pm | 14 ++++++++++++++
src/PVE/QemuServer/RunState.pm | 29 +++++++++++++++++++++++++++++
src/test/MigrationTest/QmMock.pm | 11 +++++++++++
5 files changed, 81 insertions(+), 3 deletions(-)
create mode 100755 debian/preinst
diff --git a/debian/preinst b/debian/preinst
new file mode 100755
index 00000000..af9039c3
--- /dev/null
+++ b/debian/preinst
@@ -0,0 +1,18 @@
+#!/bin/sh
+
+set -e
+
+#DEBHELPER#
+
+case "$1" in
+ upgrade)
+ if dpkg --compare-versions "$2" 'lt' '9.1.12'; then
+ # set use_old_cleanup flag file so the old cleanup code will be used until reboot
+ touch /run/qemu-server/use_old_cleanup
+ fi
+ ;;
+ install|abort-install)
+ ;;
+esac
+
+exit 0
diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm
index bc8a086c..81ccc564 100755
--- a/src/PVE/CLI/qm.pm
+++ b/src/PVE/CLI/qm.pm
@@ -1120,12 +1120,18 @@ __PACKAGE__->register_method({
}
}
- if (!$clean || $guest) {
- # vm was shutdown from inside the guest or crashed, doing api cleanup
+ my $can_use_cleanup_flag = PVE::QemuServer::RunState::can_use_cleanup_flag();
+
+ if (!$clean || $guest || $can_use_cleanup_flag) {
+ # either we can use the new mechanism to check if cleanup is done, or
+ # vm was shutdown from inside the guest or crashed
PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0, 1);
}
- PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
+ if (!$can_use_cleanup_flag) {
+ # if the new cleanup mechanism is used, this will be called from 'vm_stop_cleanup'
+ PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
+ }
$restart = eval { PVE::QemuServer::clear_reboot_request($vmid) };
warn $@ if $@;
diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index a894684a..841a9026 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -5834,6 +5834,8 @@ sub vm_start_nolock {
syslog("info", "VM $vmid started with PID $pid.");
+ PVE::QemuServer::RunState::create_cleanup_flag($vmid);
+
if (defined(my $migrate = $res->{migrate})) {
if ($migrate->{proto} eq 'tcp') {
my $nodename = nodename();
@@ -6141,6 +6143,11 @@ sub cleanup_pci_devices {
sub vm_stop_cleanup {
my ($storecfg, $vmid, $conf, $keepActive, $apply_pending_changes, $noerr) = @_;
+ my $can_use_cleanup_flag = PVE::QemuServer::RunState::can_use_cleanup_flag();
+ if ($can_use_cleanup_flag) {
+ return if !PVE::QemuServer::RunState::cleanup_flag_exists($vmid);
+ }
+
eval {
PVE::QemuServer::QSD::quit($vmid)
if PVE::QemuServer::Helpers::qsd_running_locally($vmid);
@@ -6175,6 +6182,13 @@ sub vm_stop_cleanup {
die $err if !$noerr;
warn $err;
}
+
+ if ($can_use_cleanup_flag) {
+ # if the old cleanup mechanism is in place, this will be called by 'qm cleanup'
+ PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
+ }
+
+ PVE::QemuServer::RunState::clear_cleanup_flag($vmid);
}
# call only in locked context
diff --git a/src/PVE/QemuServer/RunState.pm b/src/PVE/QemuServer/RunState.pm
index 6a5fdbd7..c3cf41bf 100644
--- a/src/PVE/QemuServer/RunState.pm
+++ b/src/PVE/QemuServer/RunState.pm
@@ -6,6 +6,7 @@ use warnings;
use POSIX qw(strftime);
use PVE::Cluster;
+use PVE::File;
use PVE::RPCEnvironment;
use PVE::Storage;
@@ -183,4 +184,32 @@ sub vm_resume {
);
}
+sub get_cleanup_flag_path {
+ my ($vmid) = @_;
+ return "/run/qemu-server/$vmid.cleanup";
+}
+
+sub create_cleanup_flag {
+ my ($vmid) = @_;
+ # write time so we could check in a timeout if needed
+ PVE::File::file_set_contents(get_cleanup_flag_path($vmid), time());
+}
+
+sub clear_cleanup_flag {
+ my ($vmid) = @_;
+ my $path = get_cleanup_flag_path($vmid);
+ unlink $path or $! == POSIX::ENOENT or die "removing cleanup flag for $vmid failed: $!\n";
+}
+
+sub cleanup_flag_exists {
+ my ($vmid) = @_;
+ return -f get_cleanup_flag_path($vmid);
+}
+
+# checks if /run/qemu-server/use_old_cleanup exists that will be created on
+# package update and cleared on bootup so we can be sure the guests were
+# started recently enough
+sub can_use_cleanup_flag {
+ !-f "/run/qemu-server/use_old_cleanup";
+}
1;
diff --git a/src/test/MigrationTest/QmMock.pm b/src/test/MigrationTest/QmMock.pm
index 78be47d3..311dbb39 100644
--- a/src/test/MigrationTest/QmMock.pm
+++ b/src/test/MigrationTest/QmMock.pm
@@ -77,6 +77,17 @@ $qemu_server_helpers_module->mock(
},
);
+my $qemu_server_runstate_module = Test::MockModule->new("PVE::QemuServer::RunState");
+$qemu_server_runstate_module->mock(
+ get_cleanup_flag_path => sub {
+ my ($vmid) = @_;
+ return "${RUN_DIR_PATH}/${vmid}.cleanup";
+ },
+ can_use_cleanup_flag => sub {
+ return 1;
+ },
+);
+
our $qemu_server_machine_module = Test::MockModule->new("PVE::QemuServer::Machine");
$qemu_server_machine_module->mock(
get_current_qemu_machine => sub {
--
2.47.3
* [PATCH qemu-server v5 2/3] qm cleanup: die early when encountering a running stop mode backup
2026-05-15 12:23 [PATCH qemu-server v5 0/3] improve guest cleanup handling Dominik Csapak
2026-05-15 12:23 ` [PATCH qemu-server v5 1/3] cleanup: refactor to make cleanup flow consistent Dominik Csapak
@ 2026-05-15 12:23 ` Dominik Csapak
2026-05-15 12:23 ` [PATCH qemu-server v5 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds Dominik Csapak
2026-05-15 13:05 ` [PATCH qemu-server v5 0/3] improve guest cleanup handling Fiona Ebner
3 siblings, 0 replies; 5+ messages in thread
From: Dominik Csapak @ 2026-05-15 12:23 UTC
To: pve-devel
A running vm that holds a 'backup' lock is an expected situation during
a stop mode backup, so abort here early with a better message than
'vm still running'.
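
The decision made by the new check can be sketched in shell as follows.
This is an assumption-laden illustration, not the patched Perl code:
cleanup_precheck and its three positional arguments are hypothetical
names, only the two messages are taken from the diff below.

```shell
#!/bin/sh
# Sketch only: cleanup_precheck and its arguments are hypothetical.
# $1: pid of a running vm process (empty if none)
# $2: 1 if the shutdown was reported as clean
# $3: lock value from the guest config (may be empty)
cleanup_precheck() {
    pid="$1"; clean="$2"; lock="$3"
    # No running process: cleanup may proceed.
    [ -z "$pid" ] && return 0
    if [ "$clean" = "1" ] && [ "$lock" = "backup" ]; then
        # Expected state during a stop mode backup, cleanup was already done.
        echo "skipping cleanup - 'backup' lock is present and vm is running again"
    else
        echo "vm still running"
    fi
    return 1
}

cleanup_precheck "" 1 "" && echo "no process, continue with cleanup"
cleanup_precheck 1234 1 backup || echo "aborted (expected during stop mode backup)"
```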
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PVE/CLI/qm.pm | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm
index 81ccc564..3c9e8812 100755
--- a/src/PVE/CLI/qm.pm
+++ b/src/PVE/CLI/qm.pm
@@ -1102,7 +1102,16 @@ __PACKAGE__->register_method({
sub {
my $conf = PVE::QemuConfig->load_config($vmid);
my $pid = PVE::QemuServer::check_running($vmid);
- die "vm still running\n" if $pid;
+
+ if ($pid) {
+ # With a stop mode backup, we might run here into a running vm with a backup
+ # lock, but this already did the cleanup and is an expected state, so abort
+ # here with a good message
+ die "skipping cleanup - 'backup' lock is present and vm is running again\n"
+ if $clean && $conf->{lock} && $conf->{lock} eq 'backup';
+
+ die "vm still running\n";
+ }
# Rollback already does cleanup when preparing and afterwards temporarily drops the
# lock on the configuration file to rollback the volumes. Deactivating volumes here
--
2.47.3
* [PATCH qemu-server v5 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds
2026-05-15 12:23 [PATCH qemu-server v5 0/3] improve guest cleanup handling Dominik Csapak
2026-05-15 12:23 ` [PATCH qemu-server v5 1/3] cleanup: refactor to make cleanup flow consistent Dominik Csapak
2026-05-15 12:23 ` [PATCH qemu-server v5 2/3] qm cleanup: die early when encountering a running stop mode backup Dominik Csapak
@ 2026-05-15 12:23 ` Dominik Csapak
2026-05-15 13:05 ` [PATCH qemu-server v5 0/3] improve guest cleanup handling Fiona Ebner
3 siblings, 0 replies; 5+ messages in thread
From: Dominik Csapak @ 2026-05-15 12:23 UTC
To: pve-devel
When qmeventd detects a vm exiting, it starts 'qm cleanup'.
Since the vm process exit is sometimes not instant, wait up to 30
seconds here before starting the cleanup process, instead of immediately
aborting if the pid still exists. Aborting prevented the hookscript from
being executed in the 'post-stop' phase when either
* the cleanup mechanism is still the old one
* the guest was powered down from inside, not via the API
This can be reproduced by e.g. passing through a usb device, which
delays the qemu process exit for a few seconds (for most devices).
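
The waiting logic can be sketched in shell as a bounded polling loop with
a single warning. This is an illustration under assumptions, not the Perl
code from the diff below: wait_for_exit, POLL_INTERVAL and the marker file
used as a stand-in for the running QEMU process are all hypothetical.

```shell
#!/bin/sh
# Sketch only: wait_for_exit, POLL_INTERVAL and the marker file are hypothetical.
# Usage: wait_for_exit TIMEOUT WARN_AFTER PROBE...
# PROBE is any command that succeeds while the process is still running.
wait_for_exit() {
    timeout="$1"; warn_after="$2"
    shift 2
    start=$(date +%s)
    warned=0
    while "$@"; do
        elapsed=$(( $(date +%s) - start ))
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "aborting cleanup, still running after $timeout seconds" >&2
            return 1
        fi
        # Warn exactly once, to indicate this is not a normal situation.
        if [ "$warned" -eq 0 ] && [ "$elapsed" -gt "$warn_after" ]; then
            echo "process still running (or newly started)" >&2
            warned=1
        fi
        sleep "${POLL_INTERVAL:-1}"
    done
    return 0
}

# Demo: the marker file plays the role of a process that exits after ~1s.
marker=$(mktemp)
( sleep 1; rm -f "$marker" ) &
wait_for_exit 5 2 test -e "$marker" && echo "stopped, cleanup can proceed"
```

Passing the probe as a command keeps the loop independent of how the
running state is determined. The patch itself polls 'vm_running_locally'
once per second, warns after 10 seconds and gives up after 30.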
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PVE/CLI/qm.pm | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm
index 3c9e8812..6b796440 100755
--- a/src/PVE/CLI/qm.pm
+++ b/src/PVE/CLI/qm.pm
@@ -1101,7 +1101,7 @@ __PACKAGE__->register_method({
60,
sub {
my $conf = PVE::QemuConfig->load_config($vmid);
- my $pid = PVE::QemuServer::check_running($vmid);
+ my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid);
if ($pid) {
# With a stop mode backup, we might run here into a running vm with a backup
@@ -1110,7 +1110,25 @@ __PACKAGE__->register_method({
die "skipping cleanup - 'backup' lock is present and vm is running again\n"
if $clean && $conf->{lock} && $conf->{lock} eq 'backup';
- die "vm still running\n";
+ # wait for some time until the QEMU process exits after the QMP
+ # 'SHUTDOWN' event, since this might not be instant
+
+ my $timeout = 30;
+ my $warned = 0;
+ my $starttime = time();
+
+ while ($pid && (time() - $starttime) < $timeout) {
+ if (!$warned && (time() - $starttime) > 10) {
+ warn
+ "VM cleanup: QEMU process $pid for VM $vmid still running (or newly started)\n";
+ $warned = 1;
+ }
+ sleep(1);
+ $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid);
+ }
+
+ die "aborting cleanup, VM is still running after $timeout seconds\n"
+ if $pid;
}
# Rollback already does cleanup when preparing and afterwards temporarily drops the
--
2.47.3
* Re: [PATCH qemu-server v5 0/3] improve guest cleanup handling
2026-05-15 12:23 [PATCH qemu-server v5 0/3] improve guest cleanup handling Dominik Csapak
` (2 preceding siblings ...)
2026-05-15 12:23 ` [PATCH qemu-server v5 3/3] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds Dominik Csapak
@ 2026-05-15 13:05 ` Fiona Ebner
3 siblings, 0 replies; 5+ messages in thread
From: Fiona Ebner @ 2026-05-15 13:05 UTC
To: Dominik Csapak, pve-devel
On 15.05.26 at 2:25 PM, Dominik Csapak wrote:
> First we make the cleanup handling more consistent (1/3)
> then we check explicitly for the backup lock to improve the error
> message for stop backup mode (2/3)
> and then we fix #7119 by waiting up to 30s for a possibly still running
> guest to stop (e.g. this can occur when using usb passthrough) (3/3)
Thanks!
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>