* [RFC PATCH qemu-server] cleanup: refactor to make cleanup flow more consistent
@ 2026-02-23 10:49 Dominik Csapak
2026-02-23 15:49 ` Fiona Ebner
0 siblings, 1 reply; 6+ messages in thread
From: Dominik Csapak @ 2026-02-23 10:49 UTC (permalink / raw)
To: pve-devel
There are two ways a cleanup can be triggered:
* When a guest is stopped/shutdown via the API, 'vm_stop' calls 'vm_stop_cleanup'.
* When the guest process disconnects from qmeventd, 'qm cleanup' is
called, which in turn also tries to call 'vm_stop_cleanup'.
Both of these happen under the qemu config lock, so there is no direct
race condition where they run out of order, but it could happen that
the 'qm cleanup' call ran in addition, so cleanup was executed twice.
This could be a problem when the shutdown was called with 'keepActive',
which 'qm cleanup' simply knows nothing about and would ignore.
Also, the post-stop hook might not be triggered when e.g. a stop-mode
backup was done, since the hook was only executed via 'qm cleanup', and
that would detect the guest running again and abort.
To improve the situation, we move the exec_hookscript call to the end
of vm_stop_cleanup. At this point we know the VM is stopped and we
still hold the config lock.
In _do_vm_stop (and in the one case for migration) a 'cleanup-flag' is
created that marks the VM as cleaned up by the API, so 'qm cleanup'
should not do it again.
On VM start, this flag is cleared.
There is still a tiny race possible:
a guest is stopped from within (or crashes) and the VM is started again
via the API before 'qm cleanup' can run.
This should be a very rare case though, and all operations via the API
(reboot, shutdown+start, stop-mode backup, etc.) should work as intended.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
I'm not sure how we could possibly eliminate the race I mentioned:
* we can't block on start, because I don't think we can sensibly decide between:
  - the VM crashed or was powered off from inside
  - the VM was never started
We could maybe leave a 'started' flag somewhere too and clear the
cleanup flag also in 'qm cleanup'; then we would start the VM
only once the cleanup flag is cleared
(or have the cleanup flag carry multiple states, like 'started',
'finished').
Though this has some regression potential, I think...
src/PVE/CLI/qm.pm | 6 +++---
src/PVE/QemuServer.pm | 6 ++++++
src/PVE/QemuServer/Helpers.pm | 23 +++++++++++++++++++++++
src/test/MigrationTest/QmMock.pm | 9 +++++++++
4 files changed, 41 insertions(+), 3 deletions(-)
diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm
index bdae9641..f31d2ee0 100755
--- a/src/PVE/CLI/qm.pm
+++ b/src/PVE/CLI/qm.pm
@@ -1120,13 +1120,13 @@ __PACKAGE__->register_method({
}
}
- if (!$clean || $guest) {
+ if (PVE::QemuServer::Helpers::cleanup_flag_exists($vmid)) {
+ warn "Cleanup already done for $vmid, skipping...\n";
+ } else {
# vm was shutdown from inside the guest or crashed, doing api cleanup
PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0, 1);
}
- PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
-
$restart = eval { PVE::QemuServer::clear_reboot_request($vmid) };
warn $@ if $@;
},
diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index 545758dc..1253b601 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -5439,6 +5439,8 @@ sub vm_start {
return PVE::QemuConfig->lock_config(
$vmid,
sub {
+ PVE::QemuServer::Helpers::clear_cleanup_flag($vmid);
+
my $conf = PVE::QemuConfig->load_config($vmid, $migrate_opts->{migratedfrom});
die "you can't start a vm if it's a template\n"
@@ -6158,6 +6160,7 @@ sub vm_stop_cleanup {
die $err if !$noerr;
warn $err;
}
+ PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
}
# call only in locked context
@@ -6167,6 +6170,8 @@ sub _do_vm_stop {
my $pid = check_running($vmid, $nocheck);
return if !$pid;
+ PVE::QemuServer::Helpers::create_cleanup_flag($vmid);
+
my $conf;
if (!$nocheck) {
$conf = PVE::QemuConfig->load_config($vmid);
@@ -6261,6 +6266,7 @@ sub vm_stop {
$force = 1 if !defined($force) && !$shutdown;
if ($migratedfrom) {
+ PVE::QemuServer::Helpers::create_cleanup_flag($vmid);
my $pid = check_running($vmid, $nocheck, $migratedfrom);
kill 15, $pid if $pid;
my $conf = PVE::QemuConfig->load_config($vmid, $migratedfrom);
diff --git a/src/PVE/QemuServer/Helpers.pm b/src/PVE/QemuServer/Helpers.pm
index 35c00754..79277dd6 100644
--- a/src/PVE/QemuServer/Helpers.pm
+++ b/src/PVE/QemuServer/Helpers.pm
@@ -6,8 +6,10 @@ use warnings;
use File::stat;
use IO::File;
use JSON;
+use POSIX qw();
use PVE::Cluster;
+use PVE::File;
use PVE::INotify;
use PVE::ProcFSTools;
use PVE::Tools qw(get_host_arch);
@@ -363,4 +365,25 @@ sub get_iscsi_initiator_name {
return $initiator;
}
+my sub get_cleanup_path {
+ my ($vmid) = @_;
+ return "/var/run/qemu-server/$vmid.cleanup";
+}
+
+sub create_cleanup_flag {
+ my ($vmid) = @_;
+ PVE::File::file_set_contents(get_cleanup_path($vmid), time());
+}
+
+sub clear_cleanup_flag {
+ my ($vmid) = @_;
+ my $path = get_cleanup_path($vmid);
+ unlink $path or $! == POSIX::ENOENT or die "removing cleanup flag for $vmid failed: $!\n";
+}
+
+sub cleanup_flag_exists {
+ my ($vmid) = @_;
+ return -f get_cleanup_path($vmid);
+}
+
1;
diff --git a/src/test/MigrationTest/QmMock.pm b/src/test/MigrationTest/QmMock.pm
index 78be47d3..7d514644 100644
--- a/src/test/MigrationTest/QmMock.pm
+++ b/src/test/MigrationTest/QmMock.pm
@@ -75,6 +75,15 @@ $qemu_server_helpers_module->mock(
vm_running_locally => sub {
return $kvm_exectued;
},
+ create_cleanup_flag => sub {
+ return undef;
+ },
+ clear_cleanup_flag => sub {
+ return undef;
+ },
+ cleanup_flag_exists => sub {
+ return ''; # what -f returns if a file does not exist
+ },
);
our $qemu_server_machine_module = Test::MockModule->new("PVE::QemuServer::Machine");
--
2.47.3
* Re: [RFC PATCH qemu-server] cleanup: refactor to make cleanup flow more consistent
2026-02-23 10:49 [RFC PATCH qemu-server] cleanup: refactor to make cleanup flow more consistent Dominik Csapak
@ 2026-02-23 15:49 ` Fiona Ebner
2026-02-24 9:30 ` Fiona Ebner
0 siblings, 1 reply; 6+ messages in thread
From: Fiona Ebner @ 2026-02-23 15:49 UTC (permalink / raw)
To: Dominik Csapak, pve-devel
On 23.02.26 at 11:56 AM, Dominik Csapak wrote:
> There are two ways a cleanup can be triggered:
>
> * When a guest is stopped/shutdown via the API, 'vm_stop' calls 'vm_stop_cleanup'.
> * When the guest process disconnects from qmeventd, 'qm cleanup' is
> called, which in turn also tries to call 'vm_stop_cleanup'.
>
> Both of these happen under a qemu config lock, so there is no direct
> race condition that it will be called out of order, but it could happen
> that the 'qm cleanup' call happened in addition so cleanup was called
> twice. Which could be a problem when the shutdown was called with
> 'keepActive' which 'qm cleanup' would simply know nothing of and ignore.
>
> Also the post-stop hook might not be triggered in case e.g. a stop-mode
> backup was done, since that was only happening via qm cleanup and this
> would detect the now again running guest and abort.
>
> To improve the situation we move the exec_hookscript call at the end
> of vm_stop_cleanup. At this point we know the vm is stopped and we still
> have the config lock.
>
> On _do_vm_stop (and in the one case for migration) a 'cleanup-flag' is
> created that marks the vm is cleaned up by the api, so 'qm cleanup'
> should not do it again.
>
> On vm start, this flag is cleared.
>
> There is still a tiny race possible:
>
> a guest is stopped from within (or crashes) and the vm is started again
> via the api before 'qm cleanup' can run
>
> This should be a very rare case though, and all operation via the API
> (reboot, shutdown+start, stop-mode backup, etc.) should work as intended.
How difficult is it to trigger the race with an HA-managed VM?
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> I'm not sure how we could possibly eliminate the race i mentioned:
> * we can't block on start because i don't think we can sensibly decide between:
> - vm was crashing/powered off
> - vm was never started
>
> We could maybe leave a 'started' flag somewhere too and clear the
> cleanup flag also in 'qm cleanup', then we would start the vm
> only when the cleanup flag is cleared
> (or have the cleanup flags have multiple states, like 'started',
> 'finished')
>
> Though this has some regression potential i think...
>
> src/PVE/CLI/qm.pm | 6 +++---
> src/PVE/QemuServer.pm | 6 ++++++
> src/PVE/QemuServer/Helpers.pm | 23 +++++++++++++++++++++++
> src/test/MigrationTest/QmMock.pm | 9 +++++++++
> 4 files changed, 41 insertions(+), 3 deletions(-)
>
> diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm
> index bdae9641..f31d2ee0 100755
> --- a/src/PVE/CLI/qm.pm
> +++ b/src/PVE/CLI/qm.pm
> @@ -1120,13 +1120,13 @@ __PACKAGE__->register_method({
> }
> }
>
> - if (!$clean || $guest) {
> + if (PVE::QemuServer::Helpers::cleanup_flag_exists($vmid)) {
> + warn "Cleanup already done for $vmid, skipping...\n";
> + } else {
> # vm was shutdown from inside the guest or crashed, doing api cleanup
> PVE::QemuServer::vm_stop_cleanup($storecfg, $vmid, $conf, 0, 0, 1);
> }
>
> - PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
> -
> $restart = eval { PVE::QemuServer::clear_reboot_request($vmid) };
> warn $@ if $@;
> },
> diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
> index 545758dc..1253b601 100644
> --- a/src/PVE/QemuServer.pm
> +++ b/src/PVE/QemuServer.pm
> @@ -5439,6 +5439,8 @@ sub vm_start {
> return PVE::QemuConfig->lock_config(
> $vmid,
> sub {
> + PVE::QemuServer::Helpers::clear_cleanup_flag($vmid);
> +
> my $conf = PVE::QemuConfig->load_config($vmid, $migrate_opts->{migratedfrom});
>
> die "you can't start a vm if it's a template\n"
> @@ -6158,6 +6160,7 @@ sub vm_stop_cleanup {
> die $err if !$noerr;
> warn $err;
> }
> + PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-stop');
> }
>
> # call only in locked context
> @@ -6167,6 +6170,8 @@ sub _do_vm_stop {
> my $pid = check_running($vmid, $nocheck);
> return if !$pid;
>
> + PVE::QemuServer::Helpers::create_cleanup_flag($vmid);
We don't do the cleanup if $nocheck=1, so we may not create the flag then.
> +
> my $conf;
> if (!$nocheck) {
> $conf = PVE::QemuConfig->load_config($vmid);
> @@ -6261,6 +6266,7 @@ sub vm_stop {
> $force = 1 if !defined($force) && !$shutdown;
>
> if ($migratedfrom) {
> + PVE::QemuServer::Helpers::create_cleanup_flag($vmid);
> my $pid = check_running($vmid, $nocheck, $migratedfrom);
> kill 15, $pid if $pid;
> my $conf = PVE::QemuConfig->load_config($vmid, $migratedfrom);
> diff --git a/src/PVE/QemuServer/Helpers.pm b/src/PVE/QemuServer/Helpers.pm
> index 35c00754..79277dd6 100644
> --- a/src/PVE/QemuServer/Helpers.pm
> +++ b/src/PVE/QemuServer/Helpers.pm
> @@ -6,8 +6,10 @@ use warnings;
> use File::stat;
> use IO::File;
> use JSON;
> +use POSIX qw();
>
> use PVE::Cluster;
> +use PVE::File;
> use PVE::INotify;
> use PVE::ProcFSTools;
> use PVE::Tools qw(get_host_arch);
> @@ -363,4 +365,25 @@ sub get_iscsi_initiator_name {
> return $initiator;
> }
>
> +my sub get_cleanup_path {
> + my ($vmid) = @_;
> + return "/var/run/qemu-server/$vmid.cleanup";
> +}
> +
> +sub create_cleanup_flag {
> + my ($vmid) = @_;
> + PVE::File::file_set_contents(get_cleanup_path($vmid), time());
> +}
> +
> +sub clear_cleanup_flag {
> + my ($vmid) = @_;
> + my $path = get_cleanup_path($vmid);
> + unlink $path or $! == POSIX::ENOENT or die "removing cleanup flag for $vmid failed: $!\n";
> +}
> +
> +sub cleanup_flag_exists {
> + my ($vmid) = @_;
> + return -f get_cleanup_path($vmid);
> +}
> +
Maybe those helpers are better placed in the RunState module rather than
the generic Helper module?
> 1;
> diff --git a/src/test/MigrationTest/QmMock.pm b/src/test/MigrationTest/QmMock.pm
> index 78be47d3..7d514644 100644
> --- a/src/test/MigrationTest/QmMock.pm
> +++ b/src/test/MigrationTest/QmMock.pm
> @@ -75,6 +75,15 @@ $qemu_server_helpers_module->mock(
> vm_running_locally => sub {
> return $kvm_exectued;
> },
> + create_cleanup_flag => sub {
> + return undef;
> + },
> + clear_cleanup_flag => sub {
> + return undef;
> + },
> + cleanup_flag_exists => sub {
> + return ''; # what -f returns if a file does not exist
> + },
> );
>
> our $qemu_server_machine_module = Test::MockModule->new("PVE::QemuServer::Machine");
* Re: [RFC PATCH qemu-server] cleanup: refactor to make cleanup flow more consistent
2026-02-23 15:49 ` Fiona Ebner
@ 2026-02-24 9:30 ` Fiona Ebner
2026-02-24 9:37 ` Dominik Csapak
0 siblings, 1 reply; 6+ messages in thread
From: Fiona Ebner @ 2026-02-24 9:30 UTC (permalink / raw)
To: Dominik Csapak, pve-devel
On 23.02.26 at 4:50 PM, Fiona Ebner wrote:
> On 23.02.26 at 11:56 AM, Dominik Csapak wrote:
>> [...]
>> On _do_vm_stop (and in the one case for migration) a 'cleanup-flag' is
>> created that marks the vm is cleaned up by the api, so 'qm cleanup'
>> should not do it again.
>>
>> On vm start, this flag is cleared.
It feels untidy to have something left after cleaning up, even if it's
just the file indicating that cleanup was done. Maybe we can switch it
around, see below:
>>
>> There is still a tiny race possible:
>>
>> a guest is stopped from within (or crashes) and the vm is started again
>> via the api before 'qm cleanup' can run
>>
>> This should be a very rare case though, and all operation via the API
>> (reboot, shutdown+start, stop-mode backup, etc.) should work as intended.
>
> How difficult is it to trigger the race with an HA-managed VM?
>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> I'm not sure how we could possibly eliminate the race i mentioned:
>> * we can't block on start because i don't think we can sensibly decide between:
>> - vm was crashing/powered off
>> - vm was never started
>>
>> We could maybe leave a 'started' flag somewhere too and clear the
>> cleanup flag also in 'qm cleanup', then we would start the vm
>> only when the cleanup flag is cleared
>> (or have the cleanup flags have multiple states, like 'started',
>> 'finished')
We already have something very similar, namely, the PID file. The issue
is that the PID file is removed automatically by QEMU upon clean
termination. For our use case we would need a second, more persistent
file. Then we could solve the issue of duplicate cleanup and the issue
of starting another instance before cleanup:
1. create a flag file at startup with an identifier for the
QEMU instance, a second manual PID file?
2. at cleanup, check the file:
a) if there is no such file, skip, somebody else already cleaned up
NOTE: we need to ensure that pre-existing instances are still
cleaned up. One possible way would be to create a flag file during
host startup and only use the new behavior when that is present.
b) if the file exists, check if the QEMU instance is still around. If
it is, wait for the instance to be gone until hitting some
timeout. Once it's gone, do cleanup.
3. make sure to run the post-stop hook whenever we remove the file
4. if the file still exists at startup, cleanup was not done yet, wait
until some timeout and when hitting the timeout, either proceed with
start anyway or suggest running cleanup manually. The latter would be
safer, but also worse from an UX standpoint, since cleanup is
root-only
What do you think?
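[Editor's note: the four steps above could be sketched roughly as follows. This is a hypothetical illustration in plain core Perl with made-up helper names and a temporary directory standing in for /var/run/qemu-server -- it is not the actual qemu-server implementation.]

```perl
#!/usr/bin/perl
# Hypothetical sketch of the proposed flag-file lifecycle; names and
# paths are illustrative only.
use strict;
use warnings;
use File::Temp qw(tempdir);

my $rundir = tempdir(CLEANUP => 1);    # stand-in for /var/run/qemu-server

sub flag_path { my ($vmid) = @_; return "$rundir/$vmid.run"; }

# 1. at VM start: record which QEMU instance is running (a second,
#    manually managed PID file that QEMU itself never removes)
sub create_run_flag {
    my ($vmid, $pid) = @_;
    open(my $fh, '>', flag_path($vmid)) or die "create flag: $!\n";
    print $fh "$pid\n";
    close($fh);
}

# helper: is the recorded instance still alive?
sub instance_running {
    my ($vmid) = @_;
    open(my $fh, '<', flag_path($vmid)) or return 0;
    chomp(my $pid = <$fh> // '');
    close($fh);
    return $pid && kill(0, $pid);      # signal 0 only checks existence
}

# 2./3. at cleanup: if no flag exists, somebody else already cleaned up;
# otherwise wait (with a timeout) for the instance to exit, do the
# cleanup, run the post-stop hook, and remove the flag last.
sub cleanup {
    my ($vmid, $do_cleanup, $post_stop, $timeout) = @_;
    if (!-e flag_path($vmid)) {
        warn "no run flag for $vmid, somebody else already cleaned up\n";
        return 0;
    }
    for (my $t = 0; instance_running($vmid); $t++) {
        die "timeout waiting for instance of $vmid to exit\n" if $t >= $timeout;
        sleep(1);
    }
    $do_cleanup->($vmid);
    $post_stop->($vmid);
    unlink(flag_path($vmid)) or warn "removing flag for $vmid: $!\n";
    return 1;
}
```

Step 4 would then sit on the start path: before starting, poll for the flag file to disappear, and on timeout either proceed anyway or ask for a manual cleanup.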
* Re: [RFC PATCH qemu-server] cleanup: refactor to make cleanup flow more consistent
2026-02-24 9:30 ` Fiona Ebner
@ 2026-02-24 9:37 ` Dominik Csapak
2026-02-24 9:50 ` Fiona Ebner
0 siblings, 1 reply; 6+ messages in thread
From: Dominik Csapak @ 2026-02-24 9:37 UTC (permalink / raw)
To: Fiona Ebner, pve-devel
On 2/24/26 10:30 AM, Fiona Ebner wrote:
> [...]
>
> We already have something very similar, namely, the PID file. The issue
> is that the PID file is removed automatically by QEMU upon clean
> termination. For our use case we would need a second, more persistent
> file. Then we could solve the issue of duplicate cleanup and the issue
> of starting another instance before cleanup:
>
> 1. create a flag file at startup with an identifier for the
> QEMU instance, a second manual PID file?
> 2. at cleanup, check the file:
> a) if there is no such file, skip, somebody else already cleaned up
> NOTE: we need to ensure that pre-existing instances are still
> cleaned up. One possible way would be to create a flag file during
> host startup and only use the new behavior when that is present.
> b) if the file exists, check if the QEMU instance is still around. If
> it is, wait for the instance to be gone until hitting some
> timeout. Once it's gone, do cleanup.
> 3. make sure to run the post-stop hook whenever we remove the file
> 4. if the file still exists at startup, cleanup was not done yet, wait
> until some timeout and when hitting the timeout, either proceed with
> start anyway or suggest running cleanup manually. The latter would be
> safer, but also worse from an UX standpoint, since cleanup is
> root-only
>
> What do you think?
Yes, that seems good to me. I'll play around with that and send a next
version.
* Re: [RFC PATCH qemu-server] cleanup: refactor to make cleanup flow more consistent
2026-02-24 9:37 ` Dominik Csapak
@ 2026-02-24 9:50 ` Fiona Ebner
2026-02-24 10:06 ` Dominik Csapak
0 siblings, 1 reply; 6+ messages in thread
From: Fiona Ebner @ 2026-02-24 9:50 UTC (permalink / raw)
To: Dominik Csapak, pve-devel
On 24.02.26 at 10:37 AM, Dominik Csapak wrote:
> On 2/24/26 10:30 AM, Fiona Ebner wrote:
>> [...]
>>
>> We already have something very similar, namely, the PID file. The issue
>> is that the PID file is removed automatically by QEMU upon clean
>> termination. For our use case we would need a second, more persistent
>> file. Then we could solve the issue of duplicate cleanup and the issue
>> of starting another instance before cleanup:
>>
>> 1. create a flag file at startup with an identifier for the
>> QEMU instance, a second manual PID file?
>> 2. at cleanup, check the file:
>> a) if there is no such file, skip, somebody else already cleaned up
>> NOTE: we need to ensure that pre-existing instances are still
>> cleaned up. One possible way would be to create a flag file during
>> host startup and only use the new behavior when that is present.
>> b) if the file exists, check if the QEMU instance is still around. If
>> it is, wait for the instance to be gone until hitting some
>> timeout. Once it's gone, do cleanup.
>> 3. make sure to run the post-stop hook whenever we remove the file
>> 4. if the file still exists at startup, cleanup was not done yet, wait
>> until some timeout and when hitting the timeout, either proceed with
>> start anyway or suggest running cleanup manually. The latter would be
>> safer, but also worse from an UX standpoint, since cleanup is
>> root-only
>>
>> What do you think?
>
> yes, that seems good to me, i'll play around with that and send a next
> version
Ah, one more thing. With stop mode backup, we'd still run into the issue
that the cleanup triggered by qmeventd might run into a newly started
instance and then wait around for nothing. We already skip cleanup if we
detect the 'rollback' lock since we know rollback does its own cleanup.
I think we can do the same for the backup lock (if shutdown was clean),
since we know stop mode backup does its own cleanup too. And it might be
better to do warn+return instead of die, since the situation is not
really unexpected (the one for rollback could be adapted too).
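[Editor's note: the suggested check could look roughly like this. The helper name, the lock table, and the warn+return behaviour are a hypothetical sketch of the suggestion above, not existing qemu-server code; only `$conf->{lock}` reflects how config locks are actually stored.]

```perl
#!/usr/bin/perl
# Hypothetical sketch: skip cleanup when the config carries a lock
# whose holder is known to do its own cleanup.
use strict;
use warnings;

# locks whose holder cleans up itself
my %does_own_cleanup = (
    rollback => 1,    # snapshot rollback always does its own cleanup
    backup => 1,      # stop-mode backup, but only after a clean shutdown
);

sub skip_cleanup_for_lock {
    my ($conf, $clean_shutdown) = @_;
    my $lock = $conf->{lock} or return 0;
    return 0 if !$does_own_cleanup{$lock};
    return 0 if $lock eq 'backup' && !$clean_shutdown;
    # warn + return instead of die: this situation is expected
    warn "lock '$lock' found, skipping cleanup - owner cleans up itself\n";
    return 1;
}
```

The warn+return keeps qmeventd-triggered cleanup from failing loudly in a situation that is perfectly normal, while an unknown lock (e.g. 'migrate') still falls through to the regular cleanup path.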
* Re: [RFC PATCH qemu-server] cleanup: refactor to make cleanup flow more consistent
2026-02-24 9:50 ` Fiona Ebner
@ 2026-02-24 10:06 ` Dominik Csapak
0 siblings, 0 replies; 6+ messages in thread
From: Dominik Csapak @ 2026-02-24 10:06 UTC (permalink / raw)
To: Fiona Ebner, pve-devel
On 2/24/26 10:49 AM, Fiona Ebner wrote:
> [...]
>
> Ah, one more thing. With stop mode backup, we'd still run into the issue
> that the cleanup triggered by qmeventd might run into a newly started
> instance and then wait around for nothing. We already skip cleanup if we
> detect the 'rollback' lock since we know rollback does its own cleanup.
> I think we can do the same for the backup lock (if shutdown was clean),
> since we know stop mode backup does its own cleanup too. And it might be
> better to do warn+return instead of die, since the situation is not
> really unexpected (the one for rollback could be adapted too).
Sure, makes sense, but I'd split that into a separate patch.