From: Fiona Ebner
To: pve-devel@lists.proxmox.com
Date: Fri, 28 Jun 2024 11:02:33 +0200
Message-Id: <20240628090233.48637-1-f.ebner@proxmox.com>
Subject: [pve-devel] [PATCH qemu] async snapshot: fix crash with VirtIO block with iothread when not saving VM state

As reported in the community forum [0], doing a snapshot without saving the VM state for a VM with a VirtIO block device with iothread would lead to an assertion failure [1] and thus crash. The issue is that vm_start() is called from the coroutine qmp_savevm_end() which violates assumptions about graph locking down the line.
Factor out the part of qmp_savevm_end() that actually needs to be a coroutine into a separate helper and turn qmp_savevm_end() into a non-coroutine function, so that it can call vm_start() safely. The issue is likely not new, but was exposed by the recent graph locking rework introducing stricter checks.

The issue does not occur when saving the VM state, because then the non-coroutine process_savevm_finalize() will already call vm_start() before qmp_savevm_end().

[0]: https://forum.proxmox.com/threads/149883/

[1]:
> #0  0x00007353e6096e2c __pthread_kill_implementation (libc.so.6 + 0x8ae2c)
> #1  0x00007353e6047fb2 __GI_raise (libc.so.6 + 0x3bfb2)
> #2  0x00007353e6032472 __GI_abort (libc.so.6 + 0x26472)
> #3  0x00007353e6032395 __assert_fail_base (libc.so.6 + 0x26395)
> #4  0x00007353e6040eb2 __GI___assert_fail (libc.so.6 + 0x34eb2)
> #5  0x0000592002307bb3 bdrv_graph_rdlock_main_loop (qemu-system-x86_64 + 0x83abb3)
> #6  0x00005920022da455 bdrv_change_aio_context (qemu-system-x86_64 + 0x80d455)
> #7  0x00005920022da6cb bdrv_try_change_aio_context (qemu-system-x86_64 + 0x80d6cb)
> #8  0x00005920022fe122 blk_set_aio_context (qemu-system-x86_64 + 0x831122)
> #9  0x00005920021b7b90 virtio_blk_start_ioeventfd (qemu-system-x86_64 + 0x6eab90)
> #10 0x0000592002022927 virtio_bus_start_ioeventfd (qemu-system-x86_64 + 0x555927)
> #11 0x0000592002066cc4 vm_state_notify (qemu-system-x86_64 + 0x599cc4)
> #12 0x000059200205d517 vm_prepare_start (qemu-system-x86_64 + 0x590517)
> #13 0x000059200205d56b vm_start (qemu-system-x86_64 + 0x59056b)
> #14 0x00005920020a43fd qmp_savevm_end (qemu-system-x86_64 + 0x5d73fd)
> #15 0x00005920023f3749 qmp_marshal_savevm_end (qemu-system-x86_64 + 0x926749)
> #16 0x000059200242f1d8 qmp_dispatch (qemu-system-x86_64 + 0x9621d8)
> #17 0x000059200238fa98 monitor_qmp_dispatch (qemu-system-x86_64 + 0x8c2a98)
> #18 0x000059200239044e monitor_qmp_dispatcher_co (qemu-system-x86_64 + 0x8c344e)
> #19 0x000059200245359b coroutine_trampoline (qemu-system-x86_64 + 0x98659b)
> 
#20 0x00007353e605d9c0 n/a (libc.so.6 + 0x519c0) Signed-off-by: Fiona Ebner --- ...async-for-background-state-snapshots.patch | 62 +++++++++++-------- ...add-optional-buffer-size-to-QEMUFile.patch | 4 +- 2 files changed, 37 insertions(+), 29 deletions(-) diff --git a/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch b/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch index f97226b..964caf3 100644 --- a/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch +++ b/debian/patches/pve/0017-PVE-add-savevm-async-for-background-state-snapshots.patch @@ -27,7 +27,8 @@ Signed-off-by: Stefan Reiter [FE: further improve aborting adapt to removal of QEMUFileOps improve condition for entering final stage - adapt to QAPI and other changes for 8.2] + adapt to QAPI and other changes for 8.2 + make sure to not call vm_start() from coroutine] Signed-off-by: Fiona Ebner --- hmp-commands-info.hx | 13 + @@ -35,13 +36,13 @@ Signed-off-by: Fiona Ebner include/migration/snapshot.h | 2 + include/monitor/hmp.h | 3 + migration/meson.build | 1 + - migration/savevm-async.c | 531 +++++++++++++++++++++++++++++++++++ + migration/savevm-async.c | 538 +++++++++++++++++++++++++++++++++++ monitor/hmp-cmds.c | 38 +++ qapi/migration.json | 34 +++ qapi/misc.json | 18 ++ qemu-options.hx | 12 + system/vl.c | 10 + - 11 files changed, 679 insertions(+) + 11 files changed, 686 insertions(+) create mode 100644 migration/savevm-async.c diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx @@ -139,10 +140,10 @@ index 95d1cf2250..800f12a60d 100644 'threadinfo.c', diff --git a/migration/savevm-async.c b/migration/savevm-async.c new file mode 100644 -index 0000000000..779e4e2a78 +index 0000000000..72cf6588c2 --- /dev/null +++ b/migration/savevm-async.c -@@ -0,0 +1,531 @@ +@@ -0,0 +1,538 @@ +#include "qemu/osdep.h" +#include "migration/channel-savevm-async.h" +#include "migration/migration.h" @@ -570,29 +571,10 @@ index 
0000000000..779e4e2a78 + } +} + -+void coroutine_fn qmp_savevm_end(Error **errp) ++static void coroutine_fn wait_for_close_co(void *opaque) +{ + int64_t timeout; + -+ if (snap_state.state == SAVE_STATE_DONE) { -+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, -+ "VM snapshot not started\n"); -+ return; -+ } -+ -+ if (snap_state.state == SAVE_STATE_ACTIVE) { -+ snap_state.state = SAVE_STATE_CANCELLED; -+ goto wait_for_close; -+ } -+ -+ if (snap_state.saved_vm_running) { -+ vm_start(); -+ snap_state.saved_vm_running = false; -+ } -+ -+ snap_state.state = SAVE_STATE_DONE; -+ -+wait_for_close: + if (!snap_state.target) { + DPRINTF("savevm-end: no target file open\n"); + return; @@ -620,6 +602,32 @@ index 0000000000..779e4e2a78 + DPRINTF("savevm-end: cleanup done\n"); +} + ++void qmp_savevm_end(Error **errp) ++{ ++ if (snap_state.state == SAVE_STATE_DONE) { ++ error_set(errp, ERROR_CLASS_GENERIC_ERROR, ++ "VM snapshot not started\n"); ++ return; ++ } ++ ++ Coroutine *wait_for_close = qemu_coroutine_create(wait_for_close_co, NULL); ++ ++ if (snap_state.state == SAVE_STATE_ACTIVE) { ++ snap_state.state = SAVE_STATE_CANCELLED; ++ qemu_coroutine_enter(wait_for_close); ++ return; ++ } ++ ++ if (snap_state.saved_vm_running) { ++ vm_start(); ++ snap_state.saved_vm_running = false; ++ } ++ ++ snap_state.state = SAVE_STATE_DONE; ++ ++ qemu_coroutine_enter(wait_for_close); ++} ++ +int load_snapshot_from_blockdev(const char *filename, Error **errp) +{ + BlockBackend *be; @@ -773,7 +781,7 @@ index 8c65b90328..ed20d066cd 100644 # @query-migrate: # diff --git a/qapi/misc.json b/qapi/misc.json -index ec30e5c570..7147199a12 100644 +index ec30e5c570..3c68633f68 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -454,6 +454,24 @@ @@ -796,7 +804,7 @@ index ec30e5c570..7147199a12 100644 +# Resume VM after a snapshot. 
+# +## -+{ 'command': 'savevm-end', 'coroutine': true } ++{ 'command': 'savevm-end' } + ## # @CommandLineParameterType: diff --git a/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch b/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch index 1c0e9ef..043b249 100644 --- a/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch +++ b/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch @@ -193,7 +193,7 @@ index 32fd4a34fd..36a0cd8cc8 100644 /* diff --git a/migration/savevm-async.c b/migration/savevm-async.c -index 779e4e2a78..bf36fc06d2 100644 +index 72cf6588c2..fb4e8ea689 100644 --- a/migration/savevm-async.c +++ b/migration/savevm-async.c @@ -379,7 +379,7 @@ void qmp_savevm_start(const char *statefile, Error **errp) @@ -205,7 +205,7 @@ index 779e4e2a78..bf36fc06d2 100644 if (!snap_state.file) { error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile); -@@ -496,7 +496,8 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp) +@@ -503,7 +503,8 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp) blk_op_block_all(be, blocker); /* restore the VM state */ -- 2.39.2