From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominik Csapak <d.csapak@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH qemu-server v4 1/3] cleanup: refactor to make cleanup flow consistent
Date: Fri, 15 May 2026 12:04:52 +0200
Message-ID: <20260515100842.1980636-2-d.csapak@proxmox.com>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20260515100842.1980636-1-d.csapak@proxmox.com>
References: <20260515100842.1980636-1-d.csapak@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: Proxmox VE development discussion

There are two ways a cleanup can be triggered:

* When a guest is stopped/shutdown via the API, 'vm_stop' calls 'vm_stop_cleanup'.
* When the guest process disconnects from qmeventd, 'qm cleanup' is called, which in turn also tries to call 'vm_stop_cleanup'.

Both of these happen under a qemu config lock, so there is no direct race condition where they run out of order, but it can happen that the 'qm cleanup' call runs in addition, so cleanup is done twice. That is a problem when the shutdown was called with 'keepActive', which 'qm cleanup' knows nothing about and thus ignores.

Also, the post-stop hook might not be triggered when e.g. a stop-mode backup was done, since the hook only ran via 'qm cleanup', which would sometimes detect the (now again running) guest and abort.

To improve the situation, move the exec_hookscript call to the end of vm_stop_cleanup. At this point we know the VM is stopped and we still hold the config lock.

To prevent a double cleanup, create a new cleanup flag on VM startup and only clean up while this flag still exists, deleting it at the end. This means we can drop the check for clean/guest shutdown.

Since this can only work for guests started after this logic is introduced, create an additional flag indicating that the old logic must be used; it is cleared on reboot, after which the new logic applies. That flag is created by a 'preinst' script, so it is set when upgrading from an older version, i.e. before the new code is actually in use.
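For illustration only (not part of the patch), the flag handshake described above can be sketched in plain shell, using a temporary scratch directory in place of the real /run/qemu-server:

```shell
#!/bin/sh
set -e

# scratch directory standing in for /run/qemu-server
rundir=$(mktemp -d)
vmid=100
flag="$rundir/$vmid.cleanup"

# on VM start: create the flag, storing the start time
date +%s > "$flag"

# on stop: run cleanup only while the flag still exists, then remove it
if [ -f "$flag" ]; then
    echo "cleaning up VM $vmid"
    rm -f "$flag"
fi

# a second cleanup attempt (e.g. from 'qm cleanup') finds no flag and is a no-op
if [ ! -f "$flag" ]; then
    echo "flag already gone, skipping double cleanup"
fi

rm -rf "$rundir"
```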
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v3:
* update version in preinst check
* add #DEBHELPER# to preinst
* consistently use /run/ instead of /var/run
* make get_cleanup_flag_path public and use that for mocking in tests
* style fixes

 debian/preinst                   | 18 ++++++++++++++++++
 src/PVE/CLI/qm.pm                | 12 +++++++++---
 src/PVE/QemuServer.pm            | 14 ++++++++++++++
 src/PVE/QemuServer/RunState.pm   | 29 +++++++++++++++++++++++++++++
 src/test/MigrationTest/QmMock.pm | 11 +++++++++++
 5 files changed, 81 insertions(+), 3 deletions(-)
 create mode 100755 debian/preinst

diff --git a/debian/preinst b/debian/preinst
new file mode 100755
index 00000000..b58b9e54
--- /dev/null
+++ b/debian/preinst
@@ -0,0 +1,18 @@
+#!/bin/sh
+
+set -e
+
+#DEBHELPER#
+
+case "$1" in
+    upgrade)
+        if dpkg --compare-versions "$2" 'lt' '9.1.11'; then
+            # set use_old_cleanup flag file so the old cleanup code will be used until reboot
+            touch /run/qemu-server/use_old_cleanup
+        fi
+        ;;
+    install|abort-install)
+        ;;
+esac
+
+exit 0
diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm
index bc8a086c..81ccc564 100755
--- a/src/PVE/CLI/qm.pm
+++ b/src/PVE/CLI/qm.pm
@@ -1120,12 +1120,18 @@ __PACKAGE__->register_method({
         }
     }

-    if (!$clean || $guest) {
-        # vm was shutdown from inside the guest or crashed, doing api cleanup
+    my $can_use_cleanup_flag = PVE::QemuServer::RunState::can_use_cleanup_flag();
+
+    if (!$clean || $guest || $can_use_cleanup_flag) {
+        # either we can use the new mechanism to check if cleanup is done, or
+        # vm was shutdown from inside the guest or crashed
         PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0, 1);
     }

-    PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
+    if (!$can_use_cleanup_flag) {
+        # if the new cleanup mechanism is used, this will be called from 'vm_stop_cleanup'
+        PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
+    }

     $restart = eval { PVE::QemuServer::clear_reboot_request($vmid) };
     warn $@ if $@;
diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index a894684a..841a9026 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -5834,6 +5834,8 @@ sub vm_start_nolock {

     syslog("info", "VM $vmid started with PID $pid.");

+    PVE::QemuServer::RunState::create_cleanup_flag($vmid);
+
     if (defined(my $migrate = $res->{migrate})) {
         if ($migrate->{proto} eq 'tcp') {
             my $nodename = nodename();
@@ -6141,6 +6143,11 @@ sub cleanup_pci_devices {
 sub vm_stop_cleanup {
     my ($storecfg, $vmid, $conf, $keepActive, $apply_pending_changes, $noerr) = @_;

+    my $can_use_cleanup_flag = PVE::QemuServer::RunState::can_use_cleanup_flag();
+    if ($can_use_cleanup_flag) {
+        return if !PVE::QemuServer::RunState::cleanup_flag_exists($vmid);
+    }
+
     eval {
         PVE::QemuServer::QSD::quit($vmid)
             if PVE::QemuServer::Helpers::qsd_running_locally($vmid);
@@ -6175,6 +6182,13 @@ sub vm_stop_cleanup {
         die $err if !$noerr;
         warn $err;
     }
+
+    if ($can_use_cleanup_flag) {
+        # if the old cleanup mechanism is in place, this will be called by 'qm cleanup'
+        PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
+    }
+
+    PVE::QemuServer::RunState::clear_cleanup_flag($vmid);
 }

 # call only in locked context
diff --git a/src/PVE/QemuServer/RunState.pm b/src/PVE/QemuServer/RunState.pm
index 6a5fdbd7..c3cf41bf 100644
--- a/src/PVE/QemuServer/RunState.pm
+++ b/src/PVE/QemuServer/RunState.pm
@@ -6,6 +6,7 @@ use warnings;
 use POSIX qw(strftime);

 use PVE::Cluster;
+use PVE::File;
 use PVE::RPCEnvironment;
 use PVE::Storage;

@@ -183,4 +184,32 @@ sub vm_resume {
     );
 }

+sub get_cleanup_flag_path {
+    my ($vmid) = @_;
+    return "/run/qemu-server/$vmid.cleanup";
+}
+
+sub create_cleanup_flag {
+    my ($vmid) = @_;
+    # write time so we could check in a timeout if needed
+    PVE::File::file_set_contents(get_cleanup_flag_path($vmid), time());
+}
+
+sub clear_cleanup_flag {
+    my ($vmid) = @_;
+    my $path = get_cleanup_flag_path($vmid);
+    unlink $path or $! == POSIX::ENOENT or die "removing cleanup flag for $vmid failed: $!\n";
+}
+
+sub cleanup_flag_exists {
+    my ($vmid) = @_;
+    return -f get_cleanup_flag_path($vmid);
+}
+
+# checks if /run/qemu-server/use_old_cleanup exists that will be created on
+# package update and cleared on bootup so we can be sure the guests were
+# started recently enough
+sub can_use_cleanup_flag {
+    !-f "/run/qemu-server/use_old_cleanup";
+}
 1;
diff --git a/src/test/MigrationTest/QmMock.pm b/src/test/MigrationTest/QmMock.pm
index 78be47d3..311dbb39 100644
--- a/src/test/MigrationTest/QmMock.pm
+++ b/src/test/MigrationTest/QmMock.pm
@@ -77,6 +77,17 @@ $qemu_server_helpers_module->mock(
     },
 );

+my $qemu_server_runstate_module = Test::MockModule->new("PVE::QemuServer::RunState");
+$qemu_server_runstate_module->mock(
+    get_cleanup_flag_path => sub {
+        my ($vmid) = @_;
+        return "${RUN_DIR_PATH}/${vmid}.cleanup";
+    },
+    can_use_cleanup_flag => sub {
+        return 1;
+    },
+);
+
 our $qemu_server_machine_module = Test::MockModule->new("PVE::QemuServer::Machine");
 $qemu_server_machine_module->mock(
     get_current_qemu_machine => sub {
-- 
2.47.3