From: Thomas Lamprecht
To: Proxmox VE development discussion, Stefan Reiter
Subject: [pve-devel] applied: [RFC qemu] migration/block-dirty-bitmap: migrate other bitmaps even if one fails
Date: Wed, 4 Nov 2020 18:42:14 +0100
Message-ID: <1d5ca8b1-0761-0053-15a0-b5c19afdfaee@proxmox.com>
In-Reply-To: <20201103135732.3313-1-s.reiter@proxmox.com>
References: <20201103135732.3313-1-s.reiter@proxmox.com>
List-Id: Proxmox VE development discussion

On 03.11.20 14:57, Stefan Reiter wrote:
> If the checks in bdrv_dirty_bitmap_check fail, that only means that this
> one specific bitmap cannot be migrated. That is not an error condition
> for any other bitmaps on the same block device.
>
> Fixes dirty-bitmap migration with sync=bitmap, as the bitmaps used for
> that are obviously marked as "busy", which would cause none at all to be
> transferred.
>
> Signed-off-by: Stefan Reiter
> ---
>
> NOTE: This is more or less a required workaround. The correct solution would
> probably be to somehow exclude the bitmap used for sync=bitmap for migration
> purposes, or handle it correctly otherwise.
>
> ALSO: This does *not* fix the original bug, wherein QEMU segfaults on the source
> side in sync=bitmap mode if any of the migration checks after the drive-mirror
> fail. What this patch does is remove the error condition (which makes sense
> either way), but if there is a different error, I'd still expect it to SEGV.
>
> @Fabian: Potentially a bug in error handling from the bitmap sync mode patches
> you carried? I'll keep looking too.
>
> Crash is easily reproducible, just live-migrate a VM that has a replicated drive
> on 5.1.0-4.
> 'add_bitmaps_to_list' will fail, and then QEMU segfaults with the
> following stack trace (ioc=0 is the fault):
>
> #0  0x0000555555de8b2f in qio_channel_detach_aio_context (ioc=0x0) at io/channel.c:452
> #1  0x0000555555d91bd2 in nbd_client_detach_aio_context (bs=0x7fff879e91c0) at block/nbd.c:146
> #2  0x0000555555d05050 in bdrv_detach_aio_context (bs=0x7fff879e91c0) at block.c:6267
> #3  0x0000555555d05347 in bdrv_set_aio_context_ignore (bs=0x7fff879e91c0, new_context=0x7fffea332500, ignore=0x7fffffffcdc0) at block.c:6346
> #4  0x0000555555d056f7 in bdrv_child_try_set_aio_context (bs=0x7fff879e91c0, ctx=0x7fffea332500, ignore_child=0x0, errp=0x0) at block.c:6450
> #5  0x0000555555d0574e in bdrv_try_set_aio_context (bs=0x7fff879e91c0, ctx=0x7fffea332500, errp=0x0) at block.c:6459
> #6  0x0000555555cfced7 in bdrv_replace_child (child=0x7fffea2cd580, new_bs=0x0) at block.c:2654
> #7  0x0000555555cfd35e in bdrv_detach_child (child=0x7fffea2cd580) at block.c:2773
> #8  0x0000555555cfd3a0 in bdrv_root_unref_child (child=0x7fffea2cd580) at block.c:2784
> #9  0x0000555555d06f5b in block_job_remove_all_bdrv (job=0x7fffea896500) at blockjob.c:191
> #10 0x0000555555d7c963 in mirror_exit_common (job=0x7fffea896500) at block/mirror.c:743
> #11 0x0000555555d7cb02 in mirror_abort (job=0x7fffea896500) at block/mirror.c:783
> #12 0x0000555555d0971a in job_abort (job=0x7fffea896500) at job.c:692
> #13 0x0000555555d097be in job_finalize_single (job=0x7fffea896500) at job.c:713
> #14 0x0000555555d09a47 in job_completed_txn_abort (job=0x7fffea896500) at job.c:791
> #15 0x0000555555d09dc0 in job_completed (job=0x7fffea896500) at job.c:888
> #16 0x0000555555d09e21 in job_exit (opaque=0x7fffea896500) at job.c:910
> #17 0x0000555555e676fc in aio_bh_call (bh=0x7fffe8c141e0) at util/async.c:136
> #18 0x0000555555e67806 in aio_bh_poll (ctx=0x7fffea332500) at util/async.c:164
> #19 0x0000555555e51766 in aio_dispatch (ctx=0x7fffea332500) at util/aio-posix.c:380
> #20 0x0000555555e67c39 in aio_ctx_dispatch (source=0x7fffea332500, callback=0x0, user_data=0x0) at util/async.c:306
> #21 0x00007ffff7bf0f2e in g_main_context_dispatch + 0x2ae () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
> #22 0x0000555555e6ff77 in glib_pollfds_poll () at util/main-loop.c:217
> #23 0x0000555555e6fff1 in os_host_main_loop_wait (timeout=0xd53f4) at util/main-loop.c:240
> #24 0x0000555555e700f6 in main_loop_wait (nonblocking=0x0) at util/main-loop.c:516
>
>
>  migration/block-dirty-bitmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
>

applied, thanks!
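
For readers following along, the behaviour described in the commit message (a failed per-bitmap check skips only that bitmap instead of failing the whole list) can be sketched roughly as below. This is an illustration only, not the applied patch and not QEMU code: `Bitmap`, `bitmap_check` and `collect_bitmaps` are hypothetical stand-ins, and the `busy` flag merely models a bitmap that is in use by a sync=bitmap job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a dirty bitmap; NOT QEMU's real type. */
typedef struct {
    const char *name;
    bool busy; /* models a bitmap currently used by a sync=bitmap job */
} Bitmap;

/* Hypothetical per-bitmap check: returns 0 if the bitmap may be migrated. */
int bitmap_check(const Bitmap *bm)
{
    if (bm->busy) {
        fprintf(stderr, "bitmap '%s' is busy, not migrating it\n", bm->name);
        return -1;
    }
    return 0;
}

/*
 * Collect pointers to all migratable bitmaps into out[] (which must have
 * room for n entries) and return how many were collected.
 */
size_t collect_bitmaps(const Bitmap *all, size_t n, const Bitmap **out)
{
    size_t count = 0;

    assert(all != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (bitmap_check(&all[i]) < 0) {
            /*
             * Skip only this bitmap. Returning an error here instead
             * would mean that none of the bitmaps get transferred.
             */
            continue;
        }
        out[count++] = &all[i];
    }
    return count;
}
```

With one busy and two idle bitmaps in the input, the sketch collects the two idle ones and reports the busy one on stderr.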