public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Fiona Ebner <f.ebner@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [RFC PATCH qemu-server] cleanup: refactor to make cleanup flow more consistent
Date: Mon, 23 Feb 2026 16:49:59 +0100	[thread overview]
Message-ID: <2c237643-99bd-4e9b-8838-efa9a97f9d75@proxmox.com> (raw)
In-Reply-To: <20260223105542.1525232-1-d.csapak@proxmox.com>

Am 23.02.26 um 11:56 AM schrieb Dominik Csapak:
> There are two ways a cleanup can be triggered:
> 
> * When a guest is stopped/shutdown via the API, 'vm_stop' calls 'vm_stop_cleanup'.
> * When the guest process disconnects from qmeventd, 'qm cleanup' is
>   called, which in turn also tries to call 'vm_stop_cleanup'.
> 
> Both of these happen under a qemu config lock, so there is no direct
> race condition that it will be called out of order, but it could happen
> that the 'qm cleanup' call happened in addition so cleanup was called
> twice. Which could be a problem when the shutdown was called with
> 'keepActive' which 'qm cleanup' would simply know nothing of and ignore.
> 
> Also the post-stop hook might not be triggered in case e.g. a stop-mode
> backup was done, since that was only happening via qm cleanup and this
> would detect the now again running guest and abort.
> 
> To improve the situation we move the exec_hookscript call at the end
> of vm_stop_cleanup. At this point we know the vm is stopped and we still
> have the config lock.
> 
> On _do_vm_stop (and in the one case for migration) a 'cleanup-flag' is
> created that marks the vm is cleaned up by the api, so 'qm cleanup'
> should not do it again.
> 
> On vm start, this flag is cleared.
> 
> There is still a tiny race possible:
> 
> a guest is stopped from within (or crashes) and the vm is started again
> via the api before 'qm cleanup' can run
> 
> This should be a very rare case though, and all operation via the API
> (reboot, shutdown+start, stop-mode backup, etc.) should work as intended.

How difficult is it to trigger the race with an HA-managed VM?

> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> I'm not sure how we could possibly eliminate the race i mentioned:
> * we can't block on start because i don't think we can sensibly decide between:
>   - vm was crashing/powered off
>   - vm was never started
> 
> We could maybe leave a 'started' flag somewhere too and clear the
> cleanup flag also in 'qm cleanup', then we would start the vm
> only when the cleanup flag is cleared
> (or have the cleanup flags have multiple states, like 'started',
> 'finished')
> 
> Though this has some regression potential i think...
> 
>  src/PVE/CLI/qm.pm                |  6 +++---
>  src/PVE/QemuServer.pm            |  6 ++++++
>  src/PVE/QemuServer/Helpers.pm    | 23 +++++++++++++++++++++++
>  src/test/MigrationTest/QmMock.pm |  9 +++++++++
>  4 files changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm
> index bdae9641..f31d2ee0 100755
> --- a/src/PVE/CLI/qm.pm
> +++ b/src/PVE/CLI/qm.pm
> @@ -1120,13 +1120,13 @@ __PACKAGE__->register_method({
>                      }
>                  }
>  
> -                if (!$clean || $guest) {
> +                if (PVE::QemuServer::Helpers::cleanup_flag_exists($vmid)) {
> +                    warn "Cleanup already done for $vmid, skipping...\n";
> +                } else {
>                      # vm was shutdown from inside the guest or crashed, doing api cleanup
>                      PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0, 1);
>                  }
>  
> -                PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
> -
>                  $restart = eval { PVE::QemuServer::clear_reboot_request($vmid) };
>                  warn $@ if $@;
>              },
> diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
> index 545758dc..1253b601 100644
> --- a/src/PVE/QemuServer.pm
> +++ b/src/PVE/QemuServer.pm
> @@ -5439,6 +5439,8 @@ sub vm_start {
>      return PVE::QemuConfig->lock_config(
>          $vmid,
>          sub {
> +            PVE::QemuServer::Helpers::clear_cleanup_flag($vmid);
> +
>              my $conf = PVE::QemuConfig->load_config($vmid, $migrate_opts->{migratedfrom});
>  
>              die "you can't start a vm if it's a template\n"
> @@ -6158,6 +6160,7 @@ sub vm_stop_cleanup {
>          die $err if !$noerr;
>          warn $err;
>      }
> +    PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
>  }
>  
>  # call only in locked context
> @@ -6167,6 +6170,8 @@ sub _do_vm_stop {
>      my $pid = check_running($vmid, $nocheck);
>      return if !$pid;
>  
> +    PVE::QemuServer::Helpers::create_cleanup_flag($vmid);

We don't do the cleanup if $nohcheck=1, so we may not create the flag then.

> +
>      my $conf;
>      if (!$nocheck) {
>          $conf = PVE::QemuConfig->load_config($vmid);
> @@ -6261,6 +6266,7 @@ sub vm_stop {
>      $force = 1 if !defined($force) && !$shutdown;
>  
>      if ($migratedfrom) {
> +        PVE::QemuServer::Helpers::create_cleanup_flag($vmid);
>          my $pid = check_running($vmid, $nocheck, $migratedfrom);
>          kill 15, $pid if $pid;
>          my $conf = PVE::QemuConfig->load_config($vmid, $migratedfrom);
> diff --git a/src/PVE/QemuServer/Helpers.pm b/src/PVE/QemuServer/Helpers.pm
> index 35c00754..79277dd6 100644
> --- a/src/PVE/QemuServer/Helpers.pm
> +++ b/src/PVE/QemuServer/Helpers.pm
> @@ -6,8 +6,10 @@ use warnings;
>  use File::stat;
>  use IO::File;
>  use JSON;
> +use POSIX qw();
>  
>  use PVE::Cluster;
> +use PVE::File;
>  use PVE::INotify;
>  use PVE::ProcFSTools;
>  use PVE::Tools qw(get_host_arch);
> @@ -363,4 +365,25 @@ sub get_iscsi_initiator_name {
>      return $initiator;
>  }
>  
> +my sub get_cleanup_path {
> +    my ($vmid) = @_;
> +    return "/var/run/qemu-server/$vmid.cleanup";
> +}
> +
> +sub create_cleanup_flag {
> +    my ($vmid) = @_;
> +    PVE::File::file_set_contents(get_cleanup_path($vmid), time());
> +}
> +
> +sub clear_cleanup_flag {
> +    my ($vmid) = @_;
> +    my $path = get_cleanup_path($vmid);
> +    unlink $path or $! == POSIX::ENOENT or die "removing cleanup flag for $vmid failed: $!\n";
> +}
> +
> +sub cleanup_flag_exists {
> +    my ($vmid) = @_;
> +    return -f get_cleanup_path($vmid);
> +}
> +

Maybe those helpers are better placed in the RunState module rather than
the generic Helper module?

>  1;
> diff --git a/src/test/MigrationTest/QmMock.pm b/src/test/MigrationTest/QmMock.pm
> index 78be47d3..7d514644 100644
> --- a/src/test/MigrationTest/QmMock.pm
> +++ b/src/test/MigrationTest/QmMock.pm
> @@ -75,6 +75,15 @@ $qemu_server_helpers_module->mock(
>      vm_running_locally => sub {
>          return $kvm_exectued;
>      },
> +    create_cleanup_flag => sub {
> +        return undef;
> +    },
> +    clear_cleanup_flag => sub {
> +        return undef;
> +    },
> +    cleanup_flag_exists => sub {
> +        return ''; # what -f returns if a file does not exist
> +    },
>  );
>  
>  our $qemu_server_machine_module = Test::MockModule->new("PVE::QemuServer::Machine");





  reply	other threads:[~2026-02-23 15:49 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-23 10:49 Dominik Csapak
2026-02-23 15:49 ` Fiona Ebner [this message]
2026-02-24  9:30   ` Fiona Ebner
2026-02-24  9:37     ` Dominik Csapak
2026-02-24  9:50       ` Fiona Ebner
2026-02-24 10:06         ` Dominik Csapak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2c237643-99bd-4e9b-8838-efa9a97f9d75@proxmox.com \
    --to=f.ebner@proxmox.com \
    --cc=d.csapak@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal