* [pve-devel] [PATCH qemu] avoid segfault when aborting snapshot
@ 2022-07-26 12:25 Fiona Ebner
2022-08-02 9:52 ` Mira Limbeck
0 siblings, 1 reply; 3+ messages in thread
From: Fiona Ebner @ 2022-07-26 12:25 UTC (permalink / raw)
To: pve-devel
Reported in the community forum[0].
For 6.1.0, there were a few changes to the coroutine-sleep API, but
the adaptations in f376b2b ("update and rebase to QEMU v6.1.0") made
a mistake.
Currently, target_close_wait is NULL when passed to
qemu_co_sleep_ns_wakeable(), which further passes it to
qemu_co_sleep(), but there, it is dereferenced when trying to access
the 'to_wake' member:
> Thread 1 "kvm" received signal SIGSEGV, Segmentation fault.
> qemu_co_sleep (w=0x0) at ../util/qemu-coroutine-sleep.c:57
To fix it, create a proper struct and pass its address instead. Also
call qemu_co_sleep_wake unconditionally, because the NULL check (for
the 'to_wake' member) is done inside the function itself.
This patch is based on what the QEMU commits introducing the changes
to the coroutine-sleep API did to the callers in QEMU:
eaee072085 ("coroutine-sleep: allow qemu_co_sleep_wake that wakes nothing")
29a6ea24eb ("coroutine-sleep: replace QemuCoSleepState pointer with struct in the API")
[0]: https://forum.proxmox.com/threads/112130/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
...async-for-background-state-snapshots.patch | 20 +++++++++----------
...add-optional-buffer-size-to-QEMUFile.patch | 6 +++---
...-register-yank-before-migration_inco.patch | 4 ++--
3 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/debian/patches/pve/0016-PVE-add-savevm-async-for-background-state-snapshots.patch b/debian/patches/pve/0016-PVE-add-savevm-async-for-background-state-snapshots.patch
index 0197289..62a8e98 100644
--- a/debian/patches/pve/0016-PVE-add-savevm-async-for-background-state-snapshots.patch
+++ b/debian/patches/pve/0016-PVE-add-savevm-async-for-background-state-snapshots.patch
@@ -23,19 +23,21 @@ Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[improve aborting]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+[FE: further improve aborting]
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hmp-commands-info.hx | 13 +
hmp-commands.hx | 33 ++
include/migration/snapshot.h | 2 +
include/monitor/hmp.h | 5 +
migration/meson.build | 1 +
- migration/savevm-async.c | 598 +++++++++++++++++++++++++++++++++++
+ migration/savevm-async.c | 596 +++++++++++++++++++++++++++++++++++
monitor/hmp-cmds.c | 57 ++++
qapi/migration.json | 34 ++
qapi/misc.json | 32 ++
qemu-options.hx | 12 +
softmmu/vl.c | 10 +
- 11 files changed, 797 insertions(+)
+ 11 files changed, 795 insertions(+)
create mode 100644 migration/savevm-async.c
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
@@ -151,10 +153,10 @@ index 8b5ca5c047..1e2aec8486 100644
), gnutls)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
new file mode 100644
-index 0000000000..79a0cda906
+index 0000000000..88215cdb70
--- /dev/null
+++ b/migration/savevm-async.c
-@@ -0,0 +1,598 @@
+@@ -0,0 +1,596 @@
+#include "qemu/osdep.h"
+#include "migration/migration.h"
+#include "migration/savevm.h"
@@ -210,7 +212,7 @@ index 0000000000..79a0cda906
+ int64_t total_time;
+ QEMUBH *finalize_bh;
+ Coroutine *co;
-+ QemuCoSleep *target_close_wait;
++ QemuCoSleep target_close_wait;
+} snap_state;
+
+static bool savevm_aborted(void)
@@ -285,9 +287,7 @@ index 0000000000..79a0cda906
+ blk_unref(snap_state.target);
+ snap_state.target = NULL;
+
-+ if (snap_state.target_close_wait) {
-+ qemu_co_sleep_wake(snap_state.target_close_wait);
-+ }
++ qemu_co_sleep_wake(&snap_state.target_close_wait);
+ }
+
+ return ret;
@@ -549,6 +549,7 @@ index 0000000000..79a0cda906
+ snap_state.bs_pos = 0;
+ snap_state.total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ snap_state.blocker = NULL;
++ snap_state.target_close_wait.to_wake = NULL;
+
+ if (snap_state.error) {
+ error_free(snap_state.error);
@@ -653,9 +654,8 @@ index 0000000000..79a0cda906
+ * call exits the statefile will be closed and can be removed immediately */
+ DPRINTF("savevm-end: waiting for cleanup\n");
+ timeout = 30L * 1000 * 1000 * 1000;
-+ qemu_co_sleep_ns_wakeable(snap_state.target_close_wait,
++ qemu_co_sleep_ns_wakeable(&snap_state.target_close_wait,
+ QEMU_CLOCK_REALTIME, timeout);
-+ snap_state.target_close_wait = NULL;
+ if (snap_state.target) {
+ save_snapshot_error("timeout waiting for target file close in "
+ "qmp_savevm_end");
diff --git a/debian/patches/pve/0017-PVE-add-optional-buffer-size-to-QEMUFile.patch b/debian/patches/pve/0017-PVE-add-optional-buffer-size-to-QEMUFile.patch
index e64ebbe..788312a 100644
--- a/debian/patches/pve/0017-PVE-add-optional-buffer-size-to-QEMUFile.patch
+++ b/debian/patches/pve/0017-PVE-add-optional-buffer-size-to-QEMUFile.patch
@@ -165,10 +165,10 @@ index 3f36d4dc8c..67501fd9cf 100644
int qemu_get_fd(QEMUFile *f);
int qemu_fclose(QEMUFile *f);
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
-index 79a0cda906..970ee3b3fc 100644
+index 88215cdb70..615a4484c8 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
-@@ -418,7 +418,7 @@ void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
+@@ -417,7 +417,7 @@ void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
goto restart;
}
@@ -177,7 +177,7 @@ index 79a0cda906..970ee3b3fc 100644
if (!snap_state.file) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
-@@ -567,7 +567,7 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
+@@ -565,7 +565,7 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
blk_op_block_all(be, blocker);
/* restore the VM state */
diff --git a/debian/patches/pve/0049-PVE-savevm-async-register-yank-before-migration_inco.patch b/debian/patches/pve/0049-PVE-savevm-async-register-yank-before-migration_inco.patch
index 1359424..de6e2c2 100644
--- a/debian/patches/pve/0049-PVE-savevm-async-register-yank-before-migration_inco.patch
+++ b/debian/patches/pve/0049-PVE-savevm-async-register-yank-before-migration_inco.patch
@@ -11,7 +11,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 5 insertions(+)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
-index 970ee3b3fc..b3ccc069f1 100644
+index 615a4484c8..161b4b9985 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -19,6 +19,7 @@
@@ -22,7 +22,7 @@ index 970ee3b3fc..b3ccc069f1 100644
/* #define DEBUG_SAVEVM_STATE */
-@@ -580,6 +581,10 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
+@@ -578,6 +579,10 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
dirty_bitmap_mig_before_vm_start();
qemu_fclose(f);
--
2.30.2
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [pve-devel] [PATCH qemu] avoid segfault when aborting snapshot
2022-07-26 12:25 [pve-devel] [PATCH qemu] avoid segfault when aborting snapshot Fiona Ebner
@ 2022-08-02 9:52 ` Mira Limbeck
2022-08-02 10:04 ` Fiona Ebner
0 siblings, 1 reply; 3+ messages in thread
From: Mira Limbeck @ 2022-08-02 9:52 UTC (permalink / raw)
To: pve-devel
On 7/26/22 14:25, Fiona Ebner wrote:
> Reported in the community forum[0].
>
> For 6.1.0, there were a few changes to the coroutine-sleep API, but
> the adaptations in f376b2b ("update and rebase to QEMU v6.1.0") made
> a mistake.
>
> Currently, target_close_wait is NULL when passed to
> qemu_co_sleep_ns_wakeable(), which further passes it to
> qemu_co_sleep(), but there, it is dereferenced when trying to access
> the 'to_wake' member:
>
>> Thread 1 "kvm" received signal SIGSEGV, Segmentation fault.
>> qemu_co_sleep (w=0x0) at ../util/qemu-coroutine-sleep.c:57
> To fix it, create a proper struct and pass its address instead. Also
> call qemu_co_sleep_wake unconditionally, because the NULL check (for
> the 'to_wake' member) is done inside the function itself.
>
> This patch is based on what the QEMU commits introducing the changes
> to the coroutine-sleep API did to the callers in QEMU:
> eaee072085 ("coroutine-sleep: allow qemu_co_sleep_wake that wakes nothing")
> 29a6ea24eb ("coroutine-sleep: replace QemuCoSleepState pointer with struct in the API")
>
> [0]: https://forum.proxmox.com/threads/112130/
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
Tested-by: Mira Limbeck <m.limbeck@proxmox.com>
Found a strange behavior when aborting the snapshot. It no longer
crashes, but trying to snapshot the VM again leads to instant failure.
After the failed snapshot, the next one works again. So some state
doesn't seem to be cleaned up the first time.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [pve-devel] [PATCH qemu] avoid segfault when aborting snapshot
2022-08-02 9:52 ` Mira Limbeck
@ 2022-08-02 10:04 ` Fiona Ebner
0 siblings, 0 replies; 3+ messages in thread
From: Fiona Ebner @ 2022-08-02 10:04 UTC (permalink / raw)
To: pve-devel, Mira Limbeck
Am 02.08.22 um 11:52 schrieb Mira Limbeck:
> On 7/26/22 14:25, Fiona Ebner wrote:
>> Reported in the community forum[0].
>>
>> For 6.1.0, there were a few changes to the coroutine-sleep API, but
>> the adaptations in f376b2b ("update and rebase to QEMU v6.1.0") made
>> a mistake.
>>
>> Currently, target_close_wait is NULL when passed to
>> qemu_co_sleep_ns_wakeable(), which further passes it to
>> qemu_co_sleep(), but there, it is dereferenced when trying to access
>> the 'to_wake' member:
>>
>>> Thread 1 "kvm" received signal SIGSEGV, Segmentation fault.
>>> qemu_co_sleep (w=0x0) at ../util/qemu-coroutine-sleep.c:57
>> To fix it, create a proper struct and pass its address instead. Also
>> call qemu_co_sleep_wake unconditionally, because the NULL check (for
>> the 'to_wake' member) is done inside the function itself.
>>
>> This patch is based on what the QEMU commits introducing the changes
>> to the coroutine-sleep API did to the callers in QEMU:
>> eaee072085 ("coroutine-sleep: allow qemu_co_sleep_wake that wakes
>> nothing")
>> 29a6ea24eb ("coroutine-sleep: replace QemuCoSleepState pointer with
>> struct in the API")
>>
>> [0]: https://forum.proxmox.com/threads/112130/
>>
>> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
>> ---
>
> Tested-by: Mira Limbeck <m.limbeck@proxmox.com>
>
>
> Found a strange behavior when aborting the snapshot. It no longer
> crashes, but trying to snapshot the VM again leads to instant failure.
>
> After the failed snapshot, the next one works again. So some state
> doesn't seem to be cleaned up the first time.
Thanks for testing! This behavior was present even before 6.1.0, so not
related to the patch, but I wasn't entirely sure if it'd be fine to just
set the status to SAVE_STATE_DONE in qmp_savevm_end() after abort.
I'll take another look and send a follow-up. It's probably fine in case
the state file has been closed successfully, but I'll need to check the
users in qemu-server and test it.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-08-02 10:05 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-26 12:25 [pve-devel] [PATCH qemu] avoid segfault when aborting snapshot Fiona Ebner
2022-08-02 9:52 ` Mira Limbeck
2022-08-02 10:04 ` Fiona Ebner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox