public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH-SERIES qemu 0/6] async snapshot improvements
@ 2025-03-31 14:55 Fiona Ebner
From: Fiona Ebner @ 2025-03-31 14:55 UTC (permalink / raw)
  To: pve-devel

Most importantly, start using a dedicated IO thread for the state
file when doing a live snapshot.

Having the state file be in the iohandler context means that a
blk_drain_all() call in the main thread or vCPU thread that happens
while the snapshot is running will result in a deadlock.

This change should also help in general, reducing load on the main
thread and avoiding it getting stuck on IO, i.e. the same benefits as
using a dedicated IO thread for regular drives. This is particularly
interesting when the VM state storage is a network storage like NFS.

With some luck, it could also help with bug #6262 [0]. The failure
there happens while issuing, or right after, the savevm-start QMP
command, so the coroutine most likely involved is process_savevm_co(),
which was previously scheduled to the iohandler context. Likely,
something polls the iohandler context and wants to enter the already
scheduled coroutine, leading to the abort():
> qemu_aio_coroutine_enter: Co-routine was already scheduled in 'aio_co_schedule'
With a dedicated iothread, there hopefully is no such race.
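
Condensed, the core of the change in patch 5/6 amounts to the following
(a rough sketch only, with error handling and cleanup omitted; the
wrapper function name is just for illustration and not part of the
actual patch):

    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "qemu/coroutine.h"
    #include "block/aio.h"
    #include "sysemu/iothread.h"
    #include "sysemu/block-backend.h"

    static IOThread *savevm_iothread;

    /* Illustrative wrapper; the real patch does this inline in qmp_savevm_start(). */
    static void savevm_setup_iothread(BlockBackend *state_file, Coroutine *co)
    {
        Error *local_err = NULL;

        /* Dedicated IOThread for the VM state file instead of the iohandler context. */
        savevm_iothread = iothread_create("__proxmox_savevm_async_iothread__", &local_err);

        /* Let the state-file BlockBackend have its home in the new AioContext... */
        blk_set_aio_context(state_file, savevm_iothread->ctx, &local_err);

        /* ...and schedule the snapshot coroutine there as well, off the main thread. */
        aio_co_schedule(savevm_iothread->ctx, co);
    }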


Additionally, fix up some edge cases in error handling and setting the
state of the snapshot operation.


[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=6262

Fiona Ebner (6):
  savevm-async: improve setting state of snapshot operation in
    savevm-end handler
  savevm-async: rename saved_vm_running to vm_needs_start
  savevm-async: improve runstate preservation
  savevm-async: cleanup error handling in savevm_start
  savevm-async: use dedicated iothread for state file
  savevm-async: treat failure to set iothread context as a hard failure

 migration/savevm-async.c | 119 +++++++++++++++++++++++----------------
 1 file changed, 69 insertions(+), 50 deletions(-)

-- 
2.39.5




* [pve-devel] [PATCH qemu 1/6] savevm-async: improve setting state of snapshot operation in savevm-end handler
From: Fiona Ebner @ 2025-03-31 14:55 UTC (permalink / raw)
  To: pve-devel

One of the callers of wait_for_close_co() already sets the state to
SAVE_STATE_DONE beforehand, but that is not fully correct, because at
that moment the operation is not actually done yet. In particular, if
closing the target later fails, the state would even be set to
SAVE_STATE_ERROR afterwards, and DONE -> ERROR is not a valid state
transition. It should not matter in practice though, as long as the
relevant QMP commands are issued sequentially.

The other caller does not set the state at all, so there seems to be a
race that could lead to the state not getting set. This is because,
before this commit, the wait_for_close_co() function could return
early when there is no target file anymore. That can only happen when
canceling, and only right around the time when the snapshot is already
finishing and closing the target.

Simply avoid the early return and always set the state within the
wait_for_close_co() function rather than at the call site.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 migration/savevm-async.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 1e79fce9ba..e63dc6d8a3 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -450,23 +450,22 @@ static void coroutine_fn wait_for_close_co(void *opaque)
 {
     int64_t timeout;
 
-    if (!snap_state.target) {
-        DPRINTF("savevm-end: no target file open\n");
-        return;
-    }
-
-    /* wait until cleanup is done before returning, this ensures that after this
-     * call exits the statefile will be closed and can be removed immediately */
-    DPRINTF("savevm-end: waiting for cleanup\n");
-    timeout = 30L * 1000 * 1000 * 1000;
-    qemu_co_sleep_ns_wakeable(&snap_state.target_close_wait,
-                              QEMU_CLOCK_REALTIME, timeout);
     if (snap_state.target) {
-        save_snapshot_error("timeout waiting for target file close in "
-                            "qmp_savevm_end");
-        /* we cannot assume the snapshot finished in this case, so leave the
-         * state alone - caller has to figure something out */
-        return;
+        /* wait until cleanup is done before returning, this ensures that after this
+         * call exits the statefile will be closed and can be removed immediately */
+        DPRINTF("savevm-end: waiting for cleanup\n");
+        timeout = 30L * 1000 * 1000 * 1000;
+        qemu_co_sleep_ns_wakeable(&snap_state.target_close_wait,
+                                  QEMU_CLOCK_REALTIME, timeout);
+        if (snap_state.target) {
+            save_snapshot_error("timeout waiting for target file close in "
+                                "qmp_savevm_end");
+            /* we cannot assume the snapshot finished in this case, so leave the
+             * state alone - caller has to figure something out */
+            return;
+        }
+    } else {
+        DPRINTF("savevm-end: no target file open\n");
     }
 
     // File closed and no other error, so ensure next snapshot can be started.
@@ -497,8 +496,6 @@ void qmp_savevm_end(Error **errp)
         snap_state.saved_vm_running = false;
     }
 
-    snap_state.state = SAVE_STATE_DONE;
-
     qemu_coroutine_enter(wait_for_close);
 }
 
-- 
2.39.5




* [pve-devel] [PATCH qemu 2/6] savevm-async: rename saved_vm_running to vm_needs_start
From: Fiona Ebner @ 2025-03-31 14:55 UTC (permalink / raw)
  To: pve-devel

This is what the variable actually expresses. Otherwise, setting it
to false after starting the VM doesn't make sense.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 migration/savevm-async.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index e63dc6d8a3..1e34c31e8b 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -51,7 +51,7 @@ static struct SnapshotState {
     int state;
     Error *error;
     Error *blocker;
-    int saved_vm_running;
+    int vm_needs_start;
     QEMUFile *file;
     int64_t total_time;
     QEMUBH *finalize_bh;
@@ -224,9 +224,9 @@ static void process_savevm_finalize(void *opaque)
         save_snapshot_error("process_savevm_cleanup: invalid state: %d",
                             snap_state.state);
     }
-    if (snap_state.saved_vm_running) {
+    if (snap_state.vm_needs_start) {
         vm_start();
-        snap_state.saved_vm_running = false;
+        snap_state.vm_needs_start = false;
     }
 
     DPRINTF("timing: process_savevm_finalize (full) took %ld ms\n",
@@ -352,7 +352,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
     }
 
     /* initialize snapshot info */
-    snap_state.saved_vm_running = runstate_is_running();
+    snap_state.vm_needs_start = runstate_is_running();
     snap_state.bs_pos = 0;
     snap_state.total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     snap_state.blocker = NULL;
@@ -440,9 +440,9 @@ restart:
 
     save_snapshot_error("setup failed");
 
-    if (snap_state.saved_vm_running) {
+    if (snap_state.vm_needs_start) {
         vm_start();
-        snap_state.saved_vm_running = false;
+        snap_state.vm_needs_start = false;
     }
 }
 
@@ -491,9 +491,9 @@ void qmp_savevm_end(Error **errp)
         return;
     }
 
-    if (snap_state.saved_vm_running) {
+    if (snap_state.vm_needs_start) {
         vm_start();
-        snap_state.saved_vm_running = false;
+        snap_state.vm_needs_start = false;
     }
 
     qemu_coroutine_enter(wait_for_close);
-- 
2.39.5




* [pve-devel] [PATCH qemu 3/6] savevm-async: improve runstate preservation
From: Fiona Ebner @ 2025-03-31 14:55 UTC (permalink / raw)
  To: pve-devel

Determine right before actually stopping the VM, rather than at the
beginning, whether the VM will need to be started again after
finishing.

In qmp_savevm_start(), the only path that stops the VM returns right
afterwards, so there is no need for the vm_start() handling after
errors. The next commit will inline the remaining error handling and
avoid the confusing naming of the 'restart' goto label.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 migration/savevm-async.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 1e34c31e8b..72d8ed2739 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -178,6 +178,7 @@ static void process_savevm_finalize(void *opaque)
      */
     blk_set_aio_context(snap_state.target, qemu_get_aio_context(), NULL);
 
+    snap_state.vm_needs_start = runstate_is_running();
     ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
     if (ret < 0) {
         save_snapshot_error("vm_stop_force_state error %d", ret);
@@ -352,7 +353,6 @@ void qmp_savevm_start(const char *statefile, Error **errp)
     }
 
     /* initialize snapshot info */
-    snap_state.vm_needs_start = runstate_is_running();
     snap_state.bs_pos = 0;
     snap_state.total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     snap_state.blocker = NULL;
@@ -364,6 +364,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
     }
 
     if (!statefile) {
+        snap_state.vm_needs_start = runstate_is_running();
         vm_stop(RUN_STATE_SAVE_VM);
         snap_state.state = SAVE_STATE_COMPLETED;
         return;
@@ -439,11 +440,6 @@ void qmp_savevm_start(const char *statefile, Error **errp)
 restart:
 
     save_snapshot_error("setup failed");
-
-    if (snap_state.vm_needs_start) {
-        vm_start();
-        snap_state.vm_needs_start = false;
-    }
 }
 
 static void coroutine_fn wait_for_close_co(void *opaque)
-- 
2.39.5




* [pve-devel] [PATCH qemu 4/6] savevm-async: cleanup error handling in savevm_start
From: Fiona Ebner @ 2025-03-31 14:55 UTC (permalink / raw)
  To: pve-devel

The 'restart' label has been a complete misnomer since the last commit,
and the single operation of setting the error in the snapshot state can
simply be inlined everywhere. Also add it in two branches where it was
missing.

Lastly, improve the code style for checking whether migrate_init()
failed by explicitly comparing against 0.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 migration/savevm-async.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 72d8ed2739..d46f4e0275 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -371,6 +371,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
     }
 
     if (qemu_savevm_state_blocked(errp)) {
+        save_snapshot_error("setup failed");
         return;
     }
 
@@ -381,12 +382,14 @@ void qmp_savevm_start(const char *statefile, Error **errp)
     snap_state.target = blk_new_open(statefile, NULL, options, bdrv_oflags, &local_err);
     if (!snap_state.target) {
         error_setg(errp, "failed to open '%s'", statefile);
-        goto restart;
+        save_snapshot_error("setup failed");
+        return;
     }
     target_bs = blk_bs(snap_state.target);
     if (!target_bs) {
         error_setg(errp, "failed to open '%s' - no block driver state", statefile);
-        goto restart;
+        save_snapshot_error("setup failed");
+        return;
     }
 
     QIOChannel *ioc = QIO_CHANNEL(qio_channel_savevm_async_new(snap_state.target,
@@ -395,7 +398,8 @@ void qmp_savevm_start(const char *statefile, Error **errp)
 
     if (!snap_state.file) {
         error_setg(errp, "failed to open '%s'", statefile);
-        goto restart;
+        save_snapshot_error("setup failed");
+        return;
     }
 
     /*
@@ -403,7 +407,8 @@ void qmp_savevm_start(const char *statefile, Error **errp)
      * State is cleared in process_savevm_co, but has to be initialized
      * here (blocking main thread, from QMP) to avoid race conditions.
      */
-    if (migrate_init(ms, errp)) {
+    if (migrate_init(ms, errp) != 0) {
+        save_snapshot_error("setup failed");
         return;
     }
     memset(&mig_stats, 0, sizeof(mig_stats));
@@ -419,6 +424,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
     if (ret != 0) {
         error_setg_errno(errp, -ret, "savevm state setup failed: %s",
                          local_err ? error_get_pretty(local_err) : "unknown error");
+        save_snapshot_error("setup failed");
         return;
     }
 
@@ -436,10 +442,6 @@ void qmp_savevm_start(const char *statefile, Error **errp)
     aio_co_schedule(iohandler_ctx, snap_state.co);
 
     return;
-
-restart:
-
-    save_snapshot_error("setup failed");
 }
 
 static void coroutine_fn wait_for_close_co(void *opaque)
-- 
2.39.5




* [pve-devel] [PATCH qemu 5/6] savevm-async: use dedicated iothread for state file
From: Fiona Ebner @ 2025-03-31 14:55 UTC (permalink / raw)
  To: pve-devel

Having the state file be in the iohandler context means that a
blk_drain_all() call in the main thread or vCPU thread that happens
while the snapshot is running will result in a deadlock.

For example, the main thread might be stuck in:

> 0  0x00007300ac9552d6 in __ppoll (fds=0x64bd5a411a50, nfds=2, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> 1  0x000064bd51af3cad in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:64
> 2  0x000064bd51ad8799 in fdmon_poll_wait (ctx=0x64bd58d968a0, ready_list=0x7ffcfcc15558, timeout=-1) at ../util/fdmon-poll.c:79
> 3  0x000064bd51ad7c3d in aio_poll (ctx=0x64bd58d968a0, blocking=blocking@entry=true) at ../util/aio-posix.c:671
> 4  0x000064bd519a0b5d in bdrv_drain_all_begin () at ../block/io.c:531
> 5  bdrv_drain_all_begin () at ../block/io.c:510
> 6  0x000064bd519943c4 in blk_drain_all () at ../block/block-backend.c:2085
> 7  0x000064bd5160fc5a in virtio_scsi_dataplane_stop (vdev=0x64bd5a215190) at ../hw/scsi/virtio-scsi-dataplane.c:213
> 8  0x000064bd51664e90 in virtio_bus_stop_ioeventfd (bus=0x64bd5a215110) at ../hw/virtio/virtio-bus.c:259
> 9  0x000064bd5166511b in virtio_bus_stop_ioeventfd (bus=<optimized out>) at ../hw/virtio/virtio-bus.c:251
> 10 virtio_bus_reset (bus=<optimized out>) at ../hw/virtio/virtio-bus.c:107
> 11 0x000064bd51667431 in virtio_pci_reset (qdev=<optimized out>) at ../hw/virtio/virtio-pci.c:2296
...
> 34 0x000064bd517aa951 in pc_machine_reset (machine=<optimized out>, type=<optimized out>) at ../hw/i386/pc.c:1722
> 35 0x000064bd516aa4c4 in qemu_system_reset (reason=reason@entry=SHUTDOWN_CAUSE_GUEST_RESET) at ../system/runstate.c:525
> 36 0x000064bd516aaeb9 in main_loop_should_exit (status=<synthetic pointer>) at ../system/runstate.c:801
> 37 qemu_main_loop () at ../system/runstate.c:834

which is in block/io.c:

> /* Now poll the in-flight requests */
> AIO_WAIT_WHILE_UNLOCKED(NULL, bdrv_drain_all_poll());

The working theory is: the deadlock happens because the IO is issued
from the process_savevm_co() coroutine, which does not get scheduled
again to complete the in-flight requests while the main thread is stuck
there polling. The main thread itself is the one that would need to
schedule it. In case a vCPU triggers the VirtIO SCSI dataplane stop,
which happens during (Linux) boot, the vCPU thread holds the big QEMU
lock (BQL), blocking the main thread from making progress on scheduling
the process_savevm_co() coroutine.

This change should also help in general, reducing load on the main
thread and avoiding it getting stuck on IO, i.e. the same benefits as
using a dedicated IO thread for regular drives. This is particularly
interesting when the VM state storage is a network storage like NFS.

With some luck, it could also help with bug #6262 [0]. The failure
there happens while issuing, or right after, the savevm-start QMP
command, so the coroutine most likely involved is process_savevm_co(),
which was previously scheduled to the iohandler context. Likely,
something polls the iohandler context and wants to enter the already
scheduled coroutine, leading to the abort():
> qemu_aio_coroutine_enter: Co-routine was already scheduled in 'aio_co_schedule'
With a dedicated iothread, there hopefully is no such race.

The comment above the code querying the pending bytes wrongly talked
about the "iothread lock", but it should have been the "iohandler
lock". The former was even renamed to BQL (big QEMU lock) a few
releases ago. Even if it was not a typo to begin with, there are no
AioContext locks anymore.

[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=6262

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 migration/savevm-async.c | 59 ++++++++++++++++++++++++++++------------
 1 file changed, 41 insertions(+), 18 deletions(-)

diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index d46f4e0275..7176946691 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -25,6 +25,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/rcu.h"
 #include "qemu/yank.h"
+#include "sysemu/iothread.h"
 
 /* #define DEBUG_SAVEVM_STATE */
 
@@ -57,6 +58,7 @@ static struct SnapshotState {
     QEMUBH *finalize_bh;
     Coroutine *co;
     QemuCoSleep target_close_wait;
+    IOThread *iothread;
 } snap_state;
 
 static bool savevm_aborted(void)
@@ -256,16 +258,13 @@ static void coroutine_fn process_savevm_co(void *opaque)
         uint64_t threshold = 400 * 1000;
 
         /*
-         * pending_{estimate,exact} are expected to be called without iothread
-         * lock. Similar to what is done in migration.c, call the exact variant
-         * only once pend_precopy in the estimate is below the threshold.
+         * Similar to what is done in migration.c, call the exact variant only
+         * once pend_precopy in the estimate is below the threshold.
          */
-        bql_unlock();
         qemu_savevm_state_pending_estimate(&pend_precopy, &pend_postcopy);
         if (pend_precopy <= threshold) {
             qemu_savevm_state_pending_exact(&pend_precopy, &pend_postcopy);
         }
-        bql_lock();
         pending_size = pend_precopy + pend_postcopy;
 
         /*
@@ -332,11 +331,22 @@ static void coroutine_fn process_savevm_co(void *opaque)
     qemu_bh_schedule(snap_state.finalize_bh);
 }
 
+static void savevm_cleanup_iothread(void) {
+    if (snap_state.iothread) {
+        iothread_destroy(snap_state.iothread);
+        snap_state.iothread = NULL;
+    }
+}
+
+static void savevm_start_failed(void) {
+    savevm_cleanup_iothread();
+    save_snapshot_error("setup failed");
+}
+
 void qmp_savevm_start(const char *statefile, Error **errp)
 {
     Error *local_err = NULL;
     MigrationState *ms = migrate_get_current();
-    AioContext *iohandler_ctx = iohandler_get_aio_context();
     BlockDriverState *target_bs = NULL;
     int ret = 0;
 
@@ -371,10 +381,24 @@ void qmp_savevm_start(const char *statefile, Error **errp)
     }
 
     if (qemu_savevm_state_blocked(errp)) {
-        save_snapshot_error("setup failed");
+        savevm_start_failed();
         return;
     }
 
+    if (snap_state.iothread) {
+        /* This is not expected, so warn about it, but no point in re-creating a new iothread. */
+        warn_report("iothread for snapshot already exists - re-using");
+    } else {
+        snap_state.iothread =
+            iothread_create("__proxmox_savevm_async_iothread__", &local_err);
+        if (!snap_state.iothread) {
+            error_setg(errp, "creating iothread failed: %s",
+                       local_err ? error_get_pretty(local_err) : "unknown error");
+            savevm_start_failed();
+            return;
+        }
+    }
+
     /* Open the image */
     QDict *options = NULL;
     options = qdict_new();
@@ -382,13 +406,13 @@ void qmp_savevm_start(const char *statefile, Error **errp)
     snap_state.target = blk_new_open(statefile, NULL, options, bdrv_oflags, &local_err);
     if (!snap_state.target) {
         error_setg(errp, "failed to open '%s'", statefile);
-        save_snapshot_error("setup failed");
+        savevm_start_failed();
         return;
     }
     target_bs = blk_bs(snap_state.target);
     if (!target_bs) {
         error_setg(errp, "failed to open '%s' - no block driver state", statefile);
-        save_snapshot_error("setup failed");
+        savevm_start_failed();
         return;
     }
 
@@ -398,7 +422,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
 
     if (!snap_state.file) {
         error_setg(errp, "failed to open '%s'", statefile);
-        save_snapshot_error("setup failed");
+        savevm_start_failed();
         return;
     }
 
@@ -408,7 +432,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
      * here (blocking main thread, from QMP) to avoid race conditions.
      */
     if (migrate_init(ms, errp) != 0) {
-        save_snapshot_error("setup failed");
+        savevm_start_failed();
         return;
     }
     memset(&mig_stats, 0, sizeof(mig_stats));
@@ -424,22 +448,19 @@ void qmp_savevm_start(const char *statefile, Error **errp)
     if (ret != 0) {
         error_setg_errno(errp, -ret, "savevm state setup failed: %s",
                          local_err ? error_get_pretty(local_err) : "unknown error");
-        save_snapshot_error("setup failed");
+        savevm_start_failed();
         return;
     }
 
-    /* Async processing from here on out happens in iohandler context, so let
-     * the target bdrv have its home there.
-     */
-    ret = blk_set_aio_context(snap_state.target, iohandler_ctx, &local_err);
+    ret = blk_set_aio_context(snap_state.target, snap_state.iothread->ctx, &local_err);
     if (ret != 0) {
-        warn_report("failed to set iohandler context for VM state target: %s %s",
+        warn_report("failed to set iothread context for VM state target: %s %s",
                     local_err ? error_get_pretty(local_err) : "unknown error",
                     strerror(-ret));
     }
 
     snap_state.co = qemu_coroutine_create(&process_savevm_co, NULL);
-    aio_co_schedule(iohandler_ctx, snap_state.co);
+    aio_co_schedule(snap_state.iothread->ctx, snap_state.co);
 
     return;
 }
@@ -466,6 +487,8 @@ static void coroutine_fn wait_for_close_co(void *opaque)
         DPRINTF("savevm-end: no target file open\n");
     }
 
+    savevm_cleanup_iothread();
+
     // File closed and no other error, so ensure next snapshot can be started.
     if (snap_state.state != SAVE_STATE_ERROR) {
         snap_state.state = SAVE_STATE_DONE;
-- 
2.39.5




* [pve-devel] [PATCH qemu 6/6] savevm-async: treat failure to set iothread context as a hard failure
From: Fiona Ebner @ 2025-03-31 14:55 UTC (permalink / raw)
  To: pve-devel

This is not expected to ever fail, and there might be assumptions down
the line about having the expected context.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 migration/savevm-async.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 7176946691..03823ecd17 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -454,9 +454,10 @@ void qmp_savevm_start(const char *statefile, Error **errp)
 
     ret = blk_set_aio_context(snap_state.target, snap_state.iothread->ctx, &local_err);
     if (ret != 0) {
-        warn_report("failed to set iothread context for VM state target: %s %s",
-                    local_err ? error_get_pretty(local_err) : "unknown error",
-                    strerror(-ret));
+        error_setg_errno(errp, -ret, "failed to set iothread context for VM state target: %s",
+                         local_err ? error_get_pretty(local_err) : "unknown error");
+        savevm_start_failed();
+        return;
     }
 
     snap_state.co = qemu_coroutine_create(&process_savevm_co, NULL);
-- 
2.39.5




* Re: [pve-devel] [PATCH-SERIES qemu 0/6] async snapshot improvements
From: Fiona Ebner @ 2025-03-31 15:06 UTC (permalink / raw)
  To: pve-devel

Am 31.03.25 um 16:55 schrieb Fiona Ebner:
> Most importantly, start using a dedicated IO thread for the state
> file when doing a live snapshot.
> 
> Having the state file be in the iohandler context means that a
> blk_drain_all() call in the main thread or vCPU thread that happens
> while the snapshot is running will result in a deadlock.

Forgot to mention, this is easily reproducible by shutting down or
rebooting a Linux VM with a SCSI disk (from within the guest) while a
snapshot is being taken. Patch 5/6 contains the technical details.

Note that shutting down the VM will still result in failure after these
patches, but it's much better, because it's not a deadlock anymore, and
I guess it's kinda philosophically okay, because there can be no VM
state of a shut-down VM. This can still be changed later if we want to
do something differently there.

Much more interesting is the reboot scenario of course (think of
time-based automated snapshots, where the guest might be doing whatever
at the time a snapshot is taken, in particular rebooting), which should
now work seamlessly and no longer result in a deadlock.



* [pve-devel] applied-series: [PATCH-SERIES qemu 0/6] async snapshot improvements
From: Wolfgang Bumiller @ 2025-04-02 15:35 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: pve-devel

Applied the series, but I undid the removal of the `restart:` goto-style
error handling and instead renamed the label to `fail`. Otherwise we'd
copy the exact same function call into multiple locations, which later
becomes its own extra helper function specifically for this one purpose;
that helper is now inlined into the `fail:` label instead.
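
For context, a minimal sketch of that pattern (the setup steps here are
hypothetical stand-ins, only the cleanup calls are the ones from patch
5/6; this is not the literal applied diff):

    static int open_state_file(void);      /* hypothetical stand-in */
    static int setup_migration(void);      /* hypothetical stand-in */
    static void savevm_cleanup_iothread(void);
    static void save_snapshot_error(const char *fmt, ...);

    static void savevm_start_sketch(void)
    {
        if (open_state_file() != 0) {
            goto fail;
        }
        if (setup_migration() != 0) {
            goto fail;
        }
        return;

    fail:
        /* What used to be the savevm_start_failed() helper, now inlined into the label. */
        savevm_cleanup_iothread();
        save_snapshot_error("setup failed");
    }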

