* [pve-devel] [PATCH pve-qemu 1/2] add upstream fixes for qmp_block_resize
@ 2021-03-30 15:59 Stefan Reiter
  2021-03-30 15:59 ` [pve-devel] [PATCH qemu-server 2/2] increase timeout for QMP block_resize Stefan Reiter
  2021-03-30 16:57 ` [pve-devel] applied: [PATCH pve-qemu 1/2] add upstream fixes for qmp_block_resize Thomas Lamprecht
  0 siblings, 2 replies; 5+ messages in thread
From: Stefan Reiter @ 2021-03-30 15:59 UTC (permalink / raw)
  To: pve-devel

Cherry-picked cleanly from the 6.0 development tree; fixes an issue with
resizing RBD drives (and reportedly also on krbd, potentially other storage
backends) when iothreads are in use.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Reported here and in the follow-up posts:
https://forum.proxmox.com/threads/all-vms-locking-up-after-latest-pve-update.85397/post-380409

Posters could also reproduce this on krbd; for me it only occurred on user-space
rbd. Nevertheless, this looks like a smoking gun, especially since both
configs from the forum show iothread=1.
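
For reference, a minimal sketch of the call that hangs without these fixes; the
VMID, device name and size are made up, and it assumes the mon_cmd helper
exported by PVE::QemuServer::Monitor:

    # Plain QMP block_resize as qemu-server issues it; without the two upstream
    # fixes this can deadlock when the target drive runs in an iothread.
    use PVE::QemuServer::Monitor qw(mon_cmd);

    my ($vmid, $deviceid, $size) = (100, 'drive-scsi0', 34 * 1024**3);
    mon_cmd($vmid, 'block_resize', device => $deviceid, size => int($size));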


 ...lock-Fix-locking-in-qmp_block_resize.patch |  42 +++++++
 ...x-deadlock-in-bdrv_co_yield_to_drain.patch | 118 ++++++++++++++++++
 debian/patches/series                         |   2 +
 3 files changed, 162 insertions(+)
 create mode 100644 debian/patches/extra/0011-block-Fix-locking-in-qmp_block_resize.patch
 create mode 100644 debian/patches/extra/0012-block-Fix-deadlock-in-bdrv_co_yield_to_drain.patch

diff --git a/debian/patches/extra/0011-block-Fix-locking-in-qmp_block_resize.patch b/debian/patches/extra/0011-block-Fix-locking-in-qmp_block_resize.patch
new file mode 100644
index 0000000..4260c75
--- /dev/null
+++ b/debian/patches/extra/0011-block-Fix-locking-in-qmp_block_resize.patch
@@ -0,0 +1,42 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Kevin Wolf <kwolf@redhat.com>
+Date: Thu, 3 Dec 2020 18:23:10 +0100
+Subject: [PATCH] block: Fix locking in qmp_block_resize()
+
+The drain functions assume that we hold the AioContext lock of the
+drained block node. Make sure to actually take the lock.
+
+Cc: qemu-stable@nongnu.org
+Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6
+Signed-off-by: Kevin Wolf <kwolf@redhat.com>
+Message-Id: <20201203172311.68232-3-kwolf@redhat.com>
+Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+Signed-off-by: Kevin Wolf <kwolf@redhat.com>
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+---
+ blockdev.c | 5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+diff --git a/blockdev.c b/blockdev.c
+index fe6fb5dc1d..9a86e9fb4b 100644
+--- a/blockdev.c
++++ b/blockdev.c
+@@ -2481,14 +2481,17 @@ void coroutine_fn qmp_block_resize(bool has_device, const char *device,
+         goto out;
+     }
+ 
++    bdrv_co_lock(bs);
+     bdrv_drained_begin(bs);
++    bdrv_co_unlock(bs);
++
+     old_ctx = bdrv_co_enter(bs);
+     blk_truncate(blk, size, false, PREALLOC_MODE_OFF, 0, errp);
+     bdrv_co_leave(bs, old_ctx);
+-    bdrv_drained_end(bs);
+ 
+ out:
+     bdrv_co_lock(bs);
++    bdrv_drained_end(bs);
+     blk_unref(blk);
+     bdrv_co_unlock(bs);
+ }
diff --git a/debian/patches/extra/0012-block-Fix-deadlock-in-bdrv_co_yield_to_drain.patch b/debian/patches/extra/0012-block-Fix-deadlock-in-bdrv_co_yield_to_drain.patch
new file mode 100644
index 0000000..0e35160
--- /dev/null
+++ b/debian/patches/extra/0012-block-Fix-deadlock-in-bdrv_co_yield_to_drain.patch
@@ -0,0 +1,118 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Kevin Wolf <kwolf@redhat.com>
+Date: Thu, 3 Dec 2020 18:23:11 +0100
+Subject: [PATCH] block: Fix deadlock in bdrv_co_yield_to_drain()
+
+If bdrv_co_yield_to_drain() is called for draining a block node that
+runs in a different AioContext, it keeps that AioContext locked while it
+yields and schedules a BH in the AioContext to do the actual drain.
+
+As long as executing the BH is the very next thing that the event loop
+of the node's AioContext does, this actually happens to work, but when
+it tries to execute something else that wants to take the AioContext
+lock, it will deadlock. (In the bug report, this other thing is a
+virtio-scsi device running virtio_scsi_data_plane_handle_cmd().)
+
+Instead, always drop the AioContext lock across the yield and reacquire
+it only when the coroutine is reentered. The BH needs to unconditionally
+take the lock for itself now.
+
+This fixes the 'block_resize' QMP command on a block node that runs in
+an iothread.
+
+Cc: qemu-stable@nongnu.org
+Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6
+Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511
+Signed-off-by: Kevin Wolf <kwolf@redhat.com>
+Message-Id: <20201203172311.68232-4-kwolf@redhat.com>
+Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+Signed-off-by: Kevin Wolf <kwolf@redhat.com>
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+---
+ block/io.c | 41 ++++++++++++++++++++++++-----------------
+ 1 file changed, 24 insertions(+), 17 deletions(-)
+
+diff --git a/block/io.c b/block/io.c
+index ec5e152bb7..a9f56a9ab1 100644
+--- a/block/io.c
++++ b/block/io.c
+@@ -306,17 +306,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
+ 
+     if (bs) {
+         AioContext *ctx = bdrv_get_aio_context(bs);
+-        AioContext *co_ctx = qemu_coroutine_get_aio_context(co);
+-
+-        /*
+-         * When the coroutine yielded, the lock for its home context was
+-         * released, so we need to re-acquire it here. If it explicitly
+-         * acquired a different context, the lock is still held and we don't
+-         * want to lock it a second time (or AIO_WAIT_WHILE() would hang).
+-         */
+-        if (ctx == co_ctx) {
+-            aio_context_acquire(ctx);
+-        }
++        aio_context_acquire(ctx);
+         bdrv_dec_in_flight(bs);
+         if (data->begin) {
+             assert(!data->drained_end_counter);
+@@ -328,9 +318,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
+                                 data->ignore_bds_parents,
+                                 data->drained_end_counter);
+         }
+-        if (ctx == co_ctx) {
+-            aio_context_release(ctx);
+-        }
++        aio_context_release(ctx);
+     } else {
+         assert(data->begin);
+         bdrv_drain_all_begin();
+@@ -348,13 +336,16 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
+                                                 int *drained_end_counter)
+ {
+     BdrvCoDrainData data;
++    Coroutine *self = qemu_coroutine_self();
++    AioContext *ctx = bdrv_get_aio_context(bs);
++    AioContext *co_ctx = qemu_coroutine_get_aio_context(self);
+ 
+     /* Calling bdrv_drain() from a BH ensures the current coroutine yields and
+      * other coroutines run if they were queued by aio_co_enter(). */
+ 
+     assert(qemu_in_coroutine());
+     data = (BdrvCoDrainData) {
+-        .co = qemu_coroutine_self(),
++        .co = self,
+         .bs = bs,
+         .done = false,
+         .begin = begin,
+@@ -368,13 +359,29 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
+     if (bs) {
+         bdrv_inc_in_flight(bs);
+     }
+-    replay_bh_schedule_oneshot_event(bdrv_get_aio_context(bs),
+-                                     bdrv_co_drain_bh_cb, &data);
++
++    /*
++     * Temporarily drop the lock across yield or we would get deadlocks.
++     * bdrv_co_drain_bh_cb() reaquires the lock as needed.
++     *
++     * When we yield below, the lock for the current context will be
++     * released, so if this is actually the lock that protects bs, don't drop
++     * it a second time.
++     */
++    if (ctx != co_ctx) {
++        aio_context_release(ctx);
++    }
++    replay_bh_schedule_oneshot_event(ctx, bdrv_co_drain_bh_cb, &data);
+ 
+     qemu_coroutine_yield();
+     /* If we are resumed from some other event (such as an aio completion or a
+      * timer callback), it is a bug in the caller that should be fixed. */
+     assert(data.done);
++
++    /* Reaquire the AioContext of bs if we dropped it */
++    if (ctx != co_ctx) {
++        aio_context_acquire(ctx);
++    }
+ }
+ 
+ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
diff --git a/debian/patches/series b/debian/patches/series
index b412693..8d36e53 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -8,6 +8,8 @@ extra/0007-virtiofsd-Add-_llseek-to-the-seccomp-whitelist.patch
 extra/0008-virtiofsd-Add-restart_syscall-to-the-seccomp-whiteli.patch
 extra/0009-i386-acpi-restore-device-paths-for-pre-5.1-vms.patch
 extra/0010-monitor-qmp-fix-race-on-CHR_EVENT_CLOSED-without-OOB.patch
+extra/0011-block-Fix-locking-in-qmp_block_resize.patch
+extra/0012-block-Fix-deadlock-in-bdrv_co_yield_to_drain.patch
 bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
 bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
 bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
-- 
2.20.1






* [pve-devel] [PATCH qemu-server 2/2] increase timeout for QMP block_resize
  2021-03-30 15:59 [pve-devel] [PATCH pve-qemu 1/2] add upstream fixes for qmp_block_resize Stefan Reiter
@ 2021-03-30 15:59 ` Stefan Reiter
  2021-03-30 16:57   ` Thomas Lamprecht
  2021-03-30 16:57   ` [pve-devel] applied: " Thomas Lamprecht
  2021-03-30 16:57 ` [pve-devel] applied: [PATCH pve-qemu 1/2] add upstream fixes for qmp_block_resize Thomas Lamprecht
  1 sibling, 2 replies; 5+ messages in thread
From: Stefan Reiter @ 2021-03-30 15:59 UTC (permalink / raw)
  To: pve-devel

In testing this usually completes almost immediately, but in theory this
is a storage/IO operation and as such can take a bit to finish. It's
certainly not unthinkable that it might take longer than the default *3
seconds* we've given it so far. Make it a minute.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Optional for the fix, but seems like a good idea.
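
For context, a rough sketch of how this path is reached in practice; the VMID,
volume and sizes are made up, and the helper's argument list is written from
memory, so treat it as an assumption:

    # Assumed usage: a disk resize from the CLI, e.g. `qm resize 100 scsi0 +4G`,
    # reaches this helper via the API; it now gives the QMP 'block_resize' a
    # full minute instead of the generic default.
    use PVE::QemuServer;
    use PVE::Storage;

    my $vmid = 100;                             # illustrative
    my $storecfg = PVE::Storage::config();
    my $volid = 'rbd-storage:vm-100-disk-0';    # illustrative volume ID
    my $newsize = 36 * 1024**3;                 # new size in bytes
    PVE::QemuServer::qemu_block_resize($vmid, "drive-scsi0", $storecfg, $volid, $newsize);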

 PVE/QemuServer.pm | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 1c0b5c2..f9379f6 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4291,8 +4291,13 @@ sub qemu_block_resize {
     my $padding = (1024 - $size % 1024) % 1024;
     $size = $size + $padding;
 
-    mon_cmd($vmid, "block_resize", device => $deviceid, size => int($size));
-
+    mon_cmd(
+	$vmid,
+	"block_resize",
+	device => $deviceid,
+	size => int($size),
+	timeout => 60,
+    );
 }
 
 sub qemu_volume_snapshot {
-- 
2.20.1






* [pve-devel] applied: [PATCH pve-qemu 1/2] add upstream fixes for qmp_block_resize
  2021-03-30 15:59 [pve-devel] [PATCH pve-qemu 1/2] add upstream fixes for qmp_block_resize Stefan Reiter
  2021-03-30 15:59 ` [pve-devel] [PATCH qemu-server 2/2] increase timeout for QMP block_resize Stefan Reiter
@ 2021-03-30 16:57 ` Thomas Lamprecht
  1 sibling, 0 replies; 5+ messages in thread
From: Thomas Lamprecht @ 2021-03-30 16:57 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter

On 30.03.21 17:59, Stefan Reiter wrote:
> Cherry-picked cleanly from the 6.0 development tree; fixes an issue with
> resizing RBD drives (and reportedly also on krbd, potentially other storage
> backends) when iothreads are in use.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> Reported here and in the follow-up posts:
> https://forum.proxmox.com/threads/all-vms-locking-up-after-latest-pve-update.85397/post-380409
> 
> Posters could also reproduce this on krbd; for me it only occurred on user-space
> rbd. Nevertheless, this looks like a smoking gun, especially since both
> configs from the forum show iothread=1.
> 
> 
>  ...lock-Fix-locking-in-qmp_block_resize.patch |  42 +++++++
>  ...x-deadlock-in-bdrv_co_yield_to_drain.patch | 118 ++++++++++++++++++
>  debian/patches/series                         |   2 +
>  3 files changed, 162 insertions(+)
>  create mode 100644 debian/patches/extra/0011-block-Fix-locking-in-qmp_block_resize.patch
>  create mode 100644 debian/patches/extra/0012-block-Fix-deadlock-in-bdrv_co_yield_to_drain.patch
> 
>

applied, thanks!





* Re: [pve-devel] [PATCH qemu-server 2/2] increase timeout for QMP block_resize
  2021-03-30 15:59 ` [pve-devel] [PATCH qemu-server 2/2] increase timeout for QMP block_resize Stefan Reiter
@ 2021-03-30 16:57   ` Thomas Lamprecht
  2021-03-30 16:57   ` [pve-devel] applied: " Thomas Lamprecht
  1 sibling, 0 replies; 5+ messages in thread
From: Thomas Lamprecht @ 2021-03-30 16:57 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter

On 30.03.21 17:59, Stefan Reiter wrote:
> In testing this usually completes almost immediately, but in theory this
> is a storage/IO operation and as such can take a bit to finish. It's
> certainly not unthinkable that it might take longer than the default *3
> seconds* we've given it so far. Make it a minute.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> Optional for the fix, but seems like a good idea.
> 
>  PVE/QemuServer.pm | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 1c0b5c2..f9379f6 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4291,8 +4291,13 @@ sub qemu_block_resize {
>      my $padding = (1024 - $size % 1024) % 1024;
>      $size = $size + $padding;
>  
> -    mon_cmd($vmid, "block_resize", device => $deviceid, size => int($size));
> -
> +    mon_cmd(
> +	$vmid,
> +	"block_resize",
> +	device => $deviceid,
> +	size => int($size),
> +	timeout => 60,
> +    );
>  }
>  
>  sub qemu_volume_snapshot {
> 






* [pve-devel] applied: [PATCH qemu-server 2/2] increase timeout for QMP block_resize
  2021-03-30 15:59 ` [pve-devel] [PATCH qemu-server 2/2] increase timeout for QMP block_resize Stefan Reiter
  2021-03-30 16:57   ` Thomas Lamprecht
@ 2021-03-30 16:57   ` Thomas Lamprecht
  1 sibling, 0 replies; 5+ messages in thread
From: Thomas Lamprecht @ 2021-03-30 16:57 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter

On 30.03.21 17:59, Stefan Reiter wrote:
> In testing this usually completes almost immediately, but in theory this
> is a storage/IO operation and as such can take a bit to finish. It's
> certainly not unthinkable that it might take longer than the default *3
> seconds* we've given it so far. Make it a minute.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> Optional for the fix, but seems like a good idea.
> 
>  PVE/QemuServer.pm | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
>

applied, thanks!





