public inbox for pve-devel@lists.proxmox.com
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Stefan Reiter <s.reiter@proxmox.com>
Subject: [pve-devel] applied: [RFC qemu] migration/block-dirty-bitmap: migrate other bitmaps even if one fails
Date: Wed, 4 Nov 2020 18:42:14 +0100
Message-ID: <1d5ca8b1-0761-0053-15a0-b5c19afdfaee@proxmox.com>
In-Reply-To: <20201103135732.3313-1-s.reiter@proxmox.com>

On 03.11.20 14:57, Stefan Reiter wrote:
> If the checks in bdrv_dirty_bitmap_check fail, that only means that this
> one specific bitmap cannot be migrated. That is not an error condition
> for any other bitmaps on the same block device.
> 
> Fixes dirty-bitmap migration with sync=bitmap: the bitmaps used for that
> mode are marked as "busy", which would otherwise cause no bitmaps at all
> to be transferred.
> 
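A minimal sketch of the change being described, assuming the upstream shape
of add_bitmaps_to_list() in migration/block-dirty-bitmap.c (the diff body is
not reproduced in this archived message, so the identifiers below are taken
from upstream QEMU and may differ slightly from the actual patch):

    FOR_EACH_DIRTY_BITMAP(bs, bitmap) {
        if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, &local_err)) {
            /* Only this one bitmap is unfit for migration (e.g. it is
             * "busy" because sync=bitmap mirroring holds it); skip it
             * instead of failing the whole device, so the remaining
             * bitmaps still get migrated. */
            error_report_err(local_err);
            local_err = NULL;   /* error_report_err() freed it */
            continue;           /* was: return -1; */
        }
        /* ... append this bitmap to the migration list ... */
    }
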
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> NOTE: This is more or less a required workaround. The correct solution
> would probably be to exclude the bitmap used for sync=bitmap from
> migration entirely, or to handle it correctly in some other way.
> 
> ALSO: This does *not* fix the original bug, wherein QEMU segfaults on the
> source side in sync=bitmap mode if any of the migration checks after the
> drive-mirror fail. This patch removes one error condition (which makes
> sense either way), but given a different error I'd still expect it to SEGV.
> 
> @Fabian: Potentially a bug in error handling from the bitmap sync mode patches
> you carried? I'll keep looking too.
> 
> The crash is easily reproducible: just live-migrate a VM that has a
> replicated drive on 5.1.0-4. 'add_bitmaps_to_list' will fail, and QEMU
> then segfaults with the following stack trace (ioc=0 being the faulting
> argument):
> 
> #0    0x0000555555de8b2f in qio_channel_detach_aio_context (ioc=0x0) at io/channel.c:452
> #1    0x0000555555d91bd2 in nbd_client_detach_aio_context (bs=0x7fff879e91c0) at block/nbd.c:146
> #2    0x0000555555d05050 in bdrv_detach_aio_context (bs=0x7fff879e91c0) at block.c:6267
> #3    0x0000555555d05347 in bdrv_set_aio_context_ignore (bs=0x7fff879e91c0, new_context=0x7fffea332500, ignore=0x7fffffffcdc0) at block.c:6346
> #4    0x0000555555d056f7 in bdrv_child_try_set_aio_context (bs=0x7fff879e91c0, ctx=0x7fffea332500, ignore_child=0x0, errp=0x0) at block.c:6450
> #5    0x0000555555d0574e in bdrv_try_set_aio_context (bs=0x7fff879e91c0, ctx=0x7fffea332500, errp=0x0) at block.c:6459
> #6    0x0000555555cfced7 in bdrv_replace_child (child=0x7fffea2cd580, new_bs=0x0) at block.c:2654
> #7    0x0000555555cfd35e in bdrv_detach_child (child=0x7fffea2cd580) at block.c:2773
> #8    0x0000555555cfd3a0 in bdrv_root_unref_child (child=0x7fffea2cd580) at block.c:2784
> #9    0x0000555555d06f5b in block_job_remove_all_bdrv (job=0x7fffea896500) at blockjob.c:191
> #10   0x0000555555d7c963 in mirror_exit_common (job=0x7fffea896500) at block/mirror.c:743
> #11   0x0000555555d7cb02 in mirror_abort (job=0x7fffea896500) at block/mirror.c:783
> #12   0x0000555555d0971a in job_abort (job=0x7fffea896500) at job.c:692
> #13   0x0000555555d097be in job_finalize_single (job=0x7fffea896500) at job.c:713
> #14   0x0000555555d09a47 in job_completed_txn_abort (job=0x7fffea896500) at job.c:791
> #15   0x0000555555d09dc0 in job_completed (job=0x7fffea896500) at job.c:888
> #16   0x0000555555d09e21 in job_exit (opaque=0x7fffea896500) at job.c:910
> #17   0x0000555555e676fc in aio_bh_call (bh=0x7fffe8c141e0) at util/async.c:136
> #18   0x0000555555e67806 in aio_bh_poll (ctx=0x7fffea332500) at util/async.c:164
> #19   0x0000555555e51766 in aio_dispatch (ctx=0x7fffea332500) at util/aio-posix.c:380
> #20   0x0000555555e67c39 in aio_ctx_dispatch (source=0x7fffea332500, callback=0x0, user_data=0x0) at util/async.c:306
> #21   0x00007ffff7bf0f2e in g_main_context_dispatch + 0x2ae () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
> #22   0x0000555555e6ff77 in glib_pollfds_poll () at util/main-loop.c:217
> #23   0x0000555555e6fff1 in os_host_main_loop_wait (timeout=0xd53f4) at util/main-loop.c:240
> #24   0x0000555555e700f6 in main_loop_wait (nonblocking=0x0) at util/main-loop.c:516
> 
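For illustration only, and not the fix applied here: frame #0 is a plain
NULL-pointer dereference, as qio_channel_detach_aio_context() dereferences
its ioc argument unconditionally, and the abort path above reaches
nbd_client_detach_aio_context() after the NBD client's channel has already
been torn down. A hypothetical defensive guard at the NBD layer could look
like this (sketched against QEMU 5.1's block/nbd.c):

    static void nbd_client_detach_aio_context(BlockDriverState *bs)
    {
        BDRVNBDState *s = bs->opaque;

        /* s->ioc can already be NULL when a failed mirror job aborts. */
        if (s->ioc) {
            qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
        }
    }
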
> 
>  migration/block-dirty-bitmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
>

applied, thanks!