* [pve-devel] [RFC qemu] migration/block-dirty-bitmap: migrate other bitmaps even if one fails
@ 2020-11-03 13:57 Stefan Reiter
2020-11-04 17:42 ` [pve-devel] applied: " Thomas Lamprecht
0 siblings, 1 reply; 2+ messages in thread
From: Stefan Reiter @ 2020-11-03 13:57 UTC (permalink / raw)
To: pve-devel
If the checks in bdrv_dirty_bitmap_check fail, that only means that this
one specific bitmap cannot be migrated. That is not an error condition
for any other bitmaps on the same block device.
Fixes dirty-bitmap migration with sync=bitmap, as the bitmaps used for
that are obviously marked as "busy", which would cause none at all to be
transferred.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
NOTE: This is more or less a required workaround. The correct solution would
probably be to somehow exclude the bitmap used for sync=bitmap for migration
purposes, or handle it correctly otherwise.
ALSO: This does *not* fix the original bug, wherein QEMU segfaults on the source
side in sync=bitmap mode if any of the migration checks after the drive-mirror
fail. What this patch does is remove the error condition (which makes sense
either way), but if there is a different error, I'd still expect it to SEGV.
@Fabian: Potentially a bug in error handling from the bitmap sync mode patches
you carried? I'll keep looking too.
Crash is easily reproducible, just live-migrate a VM that has a replicated drive
on 5.1.0-4. 'add_bitmaps_to_list' will fail, and then QEMU segfaults with the
following stack trace (ioc=0 is the fault):
#0 0x0000555555de8b2f in qio_channel_detach_aio_context (ioc=0x0) at io/channel.c:452
#1 0x0000555555d91bd2 in nbd_client_detach_aio_context (bs=0x7fff879e91c0) at block/nbd.c:146
#2 0x0000555555d05050 in bdrv_detach_aio_context (bs=0x7fff879e91c0) at block.c:6267
#3 0x0000555555d05347 in bdrv_set_aio_context_ignore (bs=0x7fff879e91c0, new_context=0x7fffea332500, ignore=0x7fffffffcdc0) at block.c:6346
#4 0x0000555555d056f7 in bdrv_child_try_set_aio_context (bs=0x7fff879e91c0, ctx=0x7fffea332500, ignore_child=0x0, errp=0x0) at block.c:6450
#5 0x0000555555d0574e in bdrv_try_set_aio_context (bs=0x7fff879e91c0, ctx=0x7fffea332500, errp=0x0) at block.c:6459
#6 0x0000555555cfced7 in bdrv_replace_child (child=0x7fffea2cd580, new_bs=0x0) at block.c:2654
#7 0x0000555555cfd35e in bdrv_detach_child (child=0x7fffea2cd580) at block.c:2773
#8 0x0000555555cfd3a0 in bdrv_root_unref_child (child=0x7fffea2cd580) at block.c:2784
#9 0x0000555555d06f5b in block_job_remove_all_bdrv (job=0x7fffea896500) at blockjob.c:191
#10 0x0000555555d7c963 in mirror_exit_common (job=0x7fffea896500) at block/mirror.c:743
#11 0x0000555555d7cb02 in mirror_abort (job=0x7fffea896500) at block/mirror.c:783
#12 0x0000555555d0971a in job_abort (job=0x7fffea896500) at job.c:692
#13 0x0000555555d097be in job_finalize_single (job=0x7fffea896500) at job.c:713
#14 0x0000555555d09a47 in job_completed_txn_abort (job=0x7fffea896500) at job.c:791
#15 0x0000555555d09dc0 in job_completed (job=0x7fffea896500) at job.c:888
#16 0x0000555555d09e21 in job_exit (opaque=0x7fffea896500) at job.c:910
#17 0x0000555555e676fc in aio_bh_call (bh=0x7fffe8c141e0) at util/async.c:136
#18 0x0000555555e67806 in aio_bh_poll (ctx=0x7fffea332500) at util/async.c:164
#19 0x0000555555e51766 in aio_dispatch (ctx=0x7fffea332500) at util/aio-posix.c:380
#20 0x0000555555e67c39 in aio_ctx_dispatch (source=0x7fffea332500, callback=0x0, user_data=0x0) at util/async.c:306
#21 0x00007ffff7bf0f2e in g_main_context_dispatch + 0x2ae () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#22 0x0000555555e6ff77 in glib_pollfds_poll () at util/main-loop.c:217
#23 0x0000555555e6fff1 in os_host_main_loop_wait (timeout=0xd53f4) at util/main-loop.c:240
#24 0x0000555555e700f6 in main_loop_wait (nonblocking=0x0) at util/main-loop.c:516
migration/block-dirty-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 5bf0d9fbc6..1070c19181 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -323,7 +323,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, &local_err)) {
error_report_err(local_err);
- return -1;
+ continue;
}
bdrv_ref(bs);
--
2.20.1
^ permalink raw reply [flat|nested] 2+ messages in thread
* [pve-devel] applied: [RFC qemu] migration/block-dirty-bitmap: migrate other bitmaps even if one fails
2020-11-03 13:57 [pve-devel] [RFC qemu] migration/block-dirty-bitmap: migrate other bitmaps even if one fails Stefan Reiter
@ 2020-11-04 17:42 ` Thomas Lamprecht
0 siblings, 0 replies; 2+ messages in thread
From: Thomas Lamprecht @ 2020-11-04 17:42 UTC (permalink / raw)
To: Proxmox VE development discussion, Stefan Reiter
On 03.11.20 14:57, Stefan Reiter wrote:
> If the checks in bdrv_dirty_bitmap_check fail, that only means that this
> one specific bitmap cannot be migrated. That is not an error condition
> for any other bitmaps on the same block device.
>
> Fixes dirty-bitmap migration with sync=bitmap, as the bitmaps used for
> that are obviously marked as "busy", which would cause none at all to be
> transferred.
>
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
>
> NOTE: This is more or less a required workaround. The correct solution would
> probably be to somehow exclude the bitmap used for sync=bitmap for migration
> purposes, or handle it correctly otherwise.
>
> ALSO: This does *not* fix the original bug, wherein QEMU segfaults on the source
> side in sync=bitmap mode if any of the migration checks after the drive-mirror
> fail. What this patch does is remove the error condition (which makes sense
> either way), but if there is a different error, I'd still expect it to SEGV.
>
> @Fabian: Potentially a bug in error handling from the bitmap sync mode patches
> you carried? I'll keep looking too.
>
> Crash is easily reproducible, just live-migrate a VM that has a replicated drive
> on 5.1.0-4. 'add_bitmaps_to_list' will fail, and then QEMU segfaults with the
> following stack trace (ioc=0 is the fault):
>
> #0 0x0000555555de8b2f in qio_channel_detach_aio_context (ioc=0x0) at io/channel.c:452
> #1 0x0000555555d91bd2 in nbd_client_detach_aio_context (bs=0x7fff879e91c0) at block/nbd.c:146
> #2 0x0000555555d05050 in bdrv_detach_aio_context (bs=0x7fff879e91c0) at block.c:6267
> #3 0x0000555555d05347 in bdrv_set_aio_context_ignore (bs=0x7fff879e91c0, new_context=0x7fffea332500, ignore=0x7fffffffcdc0) at block.c:6346
> #4 0x0000555555d056f7 in bdrv_child_try_set_aio_context (bs=0x7fff879e91c0, ctx=0x7fffea332500, ignore_child=0x0, errp=0x0) at block.c:6450
> #5 0x0000555555d0574e in bdrv_try_set_aio_context (bs=0x7fff879e91c0, ctx=0x7fffea332500, errp=0x0) at block.c:6459
> #6 0x0000555555cfced7 in bdrv_replace_child (child=0x7fffea2cd580, new_bs=0x0) at block.c:2654
> #7 0x0000555555cfd35e in bdrv_detach_child (child=0x7fffea2cd580) at block.c:2773
> #8 0x0000555555cfd3a0 in bdrv_root_unref_child (child=0x7fffea2cd580) at block.c:2784
> #9 0x0000555555d06f5b in block_job_remove_all_bdrv (job=0x7fffea896500) at blockjob.c:191
> #10 0x0000555555d7c963 in mirror_exit_common (job=0x7fffea896500) at block/mirror.c:743
> #11 0x0000555555d7cb02 in mirror_abort (job=0x7fffea896500) at block/mirror.c:783
> #12 0x0000555555d0971a in job_abort (job=0x7fffea896500) at job.c:692
> #13 0x0000555555d097be in job_finalize_single (job=0x7fffea896500) at job.c:713
> #14 0x0000555555d09a47 in job_completed_txn_abort (job=0x7fffea896500) at job.c:791
> #15 0x0000555555d09dc0 in job_completed (job=0x7fffea896500) at job.c:888
> #16 0x0000555555d09e21 in job_exit (opaque=0x7fffea896500) at job.c:910
> #17 0x0000555555e676fc in aio_bh_call (bh=0x7fffe8c141e0) at util/async.c:136
> #18 0x0000555555e67806 in aio_bh_poll (ctx=0x7fffea332500) at util/async.c:164
> #19 0x0000555555e51766 in aio_dispatch (ctx=0x7fffea332500) at util/aio-posix.c:380
> #20 0x0000555555e67c39 in aio_ctx_dispatch (source=0x7fffea332500, callback=0x0, user_data=0x0) at util/async.c:306
> #21 0x00007ffff7bf0f2e in g_main_context_dispatch + 0x2ae () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
> #22 0x0000555555e6ff77 in glib_pollfds_poll () at util/main-loop.c:217
> #23 0x0000555555e6fff1 in os_host_main_loop_wait (timeout=0xd53f4) at util/main-loop.c:240
> #24 0x0000555555e700f6 in main_loop_wait (nonblocking=0x0) at util/main-loop.c:516
>
>
> migration/block-dirty-bitmap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>
applied, thanks!
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2020-11-04 17:42 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-03 13:57 [pve-devel] [RFC qemu] migration/block-dirty-bitmap: migrate other bitmaps even if one fails Stefan Reiter
2020-11-04 17:42 ` [pve-devel] applied: " Thomas Lamprecht
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal