From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH qemu] add patch to avoid potential deadlock with trim for IDE/SATA and draining
Date: Wed, 8 Mar 2023 12:51:05 +0100
Message-ID: <20230308115105.91921-1-f.ebner@proxmox.com>
In particular, the deadlock can occur, given unlucky timing between
the QEMU threads, when the guest issues trim requests while a backup
operation is starting.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
...ial-deadlock-when-draining-during-tr.patch | 78 +++++++++++++++++++
debian/patches/series | 1 +
2 files changed, 79 insertions(+)
create mode 100644 debian/patches/extra/0010-ide-avoid-potential-deadlock-when-draining-during-tr.patch
diff --git a/debian/patches/extra/0010-ide-avoid-potential-deadlock-when-draining-during-tr.patch b/debian/patches/extra/0010-ide-avoid-potential-deadlock-when-draining-during-tr.patch
new file mode 100644
index 0000000..77d7eee
--- /dev/null
+++ b/debian/patches/extra/0010-ide-avoid-potential-deadlock-when-draining-during-tr.patch
@@ -0,0 +1,78 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Fiona Ebner <f.ebner@proxmox.com>
+Date: Tue, 7 Mar 2023 15:03:02 +0100
+Subject: [PATCH] ide: avoid potential deadlock when draining during trim
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The deadlock can happen as follows:
+1. ide_issue_trim is called, and increments the in_flight counter.
+2. ide_issue_trim_cb calls blk_aio_pdiscard.
+3. Somebody else starts draining (e.g. backup to insert the cbw node).
+4. ide_issue_trim_cb is called as the completion callback for
+ blk_aio_pdiscard.
+5. ide_issue_trim_cb issues yet another blk_aio_pdiscard request.
+6. The request is added to the wait queue via blk_wait_while_drained,
+ because draining has been started.
+7. Nobody ever decrements the in_flight counter and draining can't
+ finish. This would be done by ide_trim_bh_cb, which is called after
+ ide_issue_trim_cb has issued its last request, but
+ ide_issue_trim_cb is not called anymore, because it's the
+ completion callback of blk_aio_pdiscard, which waits on draining.
+
+Quoting Hanna Czenczek:
+> The point of 7e5cdb345f was that we need any in-flight count to
+> accompany a set s->bus->dma->aiocb. While blk_aio_pdiscard() is
+> happening, we don’t necessarily need another count. But we do need
+> it while there is no blk_aio_pdiscard().
+> ide_issue_trim_cb() returns in two cases (and, recursively through
+> its callers, leaves s->bus->dma->aiocb set):
+> 1. After calling blk_aio_pdiscard(), which will keep an in-flight
+> count,
+> 2. After calling replay_bh_schedule_event() (i.e.
+> qemu_bh_schedule()), which does not keep an in-flight count.
+
+Thus, even after moving the blk_inc_in_flight to above the
+replay_bh_schedule_event call, the invariant "ide_issue_trim_cb
+returns with an accompanying in-flight count" is still satisfied.
+
+Fixes: 7e5cdb345f ("ide: Increment BB in-flight counter for TRIM BH")
+Suggested-by: Hanna Czenczek <hreitz@redhat.com>
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
+---
+ hw/ide/core.c | 7 +++----
+ 1 file changed, 3 insertions(+), 4 deletions(-)
+
+diff --git a/hw/ide/core.c b/hw/ide/core.c
+index 39afdc0006..6474522bc9 100644
+--- a/hw/ide/core.c
++++ b/hw/ide/core.c
+@@ -443,7 +443,7 @@ static void ide_trim_bh_cb(void *opaque)
+ iocb->bh = NULL;
+ qemu_aio_unref(iocb);
+
+- /* Paired with an increment in ide_issue_trim() */
++ /* Paired with an increment in ide_issue_trim_cb() */
+ blk_dec_in_flight(blk);
+ }
+
+@@ -503,6 +503,8 @@ static void ide_issue_trim_cb(void *opaque, int ret)
+ done:
+ iocb->aiocb = NULL;
+ if (iocb->bh) {
++ /* Paired with a decrement in ide_trim_bh_cb() */
++ blk_inc_in_flight(s->blk);
+ replay_bh_schedule_event(iocb->bh);
+ }
+ }
+@@ -514,9 +516,6 @@ BlockAIOCB *ide_issue_trim(
+ IDEState *s = opaque;
+ TrimAIOCB *iocb;
+
+- /* Paired with a decrement in ide_trim_bh_cb() */
+- blk_inc_in_flight(s->blk);
+-
+ iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque);
+ iocb->s = s;
+ iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
diff --git a/debian/patches/series b/debian/patches/series
index 7a85c63..e4c661b 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -7,6 +7,7 @@ extra/0006-virtio-rng-pci-fix-migration-compat-for-vectors.patch
extra/0007-block-fix-detect-zeroes-with-BDRV_REQ_REGISTERED_BUF.patch
extra/0008-memory-prevent-dma-reentracy-issues.patch
extra/0009-block-iscsi-fix-double-free-on-BUSY-or-similar-statu.patch
+extra/0010-ide-avoid-potential-deadlock-when-draining-during-tr.patch
bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
--
2.30.2
Thread overview: 2 messages
2023-03-08 11:51 Fiona Ebner [this message]
2023-03-08 14:13 ` [pve-devel] applied: " Thomas Lamprecht