From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 6C430922A for ; Wed, 8 Mar 2023 12:52:08 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 4F3D21FD8F for ; Wed, 8 Mar 2023 12:51:38 +0100 (CET) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Wed, 8 Mar 2023 12:51:37 +0100 (CET) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 2E6F94578B for ; Wed, 8 Mar 2023 12:51:37 +0100 (CET) From: Fiona Ebner To: pve-devel@lists.proxmox.com Date: Wed, 8 Mar 2023 12:51:05 +0100 Message-Id: <20230308115105.91921-1-f.ebner@proxmox.com> X-Mailer: git-send-email 2.30.2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 AWL -0.003 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pve-devel] [PATCH qemu] add patch to avoid potential deadlock with trim for IDE/SATA and draining X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Mar 2023 11:52:08 -0000 In particular, the deadlock can occur, together with unlucky timing between the QEMU threads, when the guest is issuing trim requests during the start of a backup operation. Signed-off-by: Fiona Ebner --- ...ial-deadlock-when-draining-during-tr.patch | 78 +++++++++++++++++++ debian/patches/series | 1 + 2 files changed, 79 insertions(+) create mode 100644 debian/patches/extra/0010-ide-avoid-potential-deadlock-when-draining-during-tr.patch diff --git a/debian/patches/extra/0010-ide-avoid-potential-deadlock-when-draining-during-tr.patch b/debian/patches/extra/0010-ide-avoid-potential-deadlock-when-draining-during-tr.patch new file mode 100644 index 0000000..77d7eee --- /dev/null +++ b/debian/patches/extra/0010-ide-avoid-potential-deadlock-when-draining-during-tr.patch @@ -0,0 +1,78 @@ +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 +From: Fiona Ebner +Date: Tue, 7 Mar 2023 15:03:02 +0100 +Subject: [PATCH] ide: avoid potential deadlock when draining during trim +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +The deadlock can happen as follows: +1. ide_issue_trim is called, and increments the in_flight counter. +2. ide_issue_trim_cb calls blk_aio_pdiscard. +3. Somebody else starts draining (e.g. backup to insert the cbw node). +4. ide_issue_trim_cb is called as the completion callback for + blk_aio_pdiscard. +5. ide_issue_trim_cb issues yet another blk_aio_pdiscard request. +6. The request is added to the wait queue via blk_wait_while_drained, + because draining has been started. +7. Nobody ever decrements the in_flight counter and draining can't + finish. This would be done by ide_trim_bh_cb, which is called after + ide_issue_trim_cb has issued its last request, but + ide_issue_trim_cb is not called anymore, because it's the + completion callback of blk_aio_pdiscard, which waits on draining. + +Quoting Hanna Czenczek: +> The point of 7e5cdb345f was that we need any in-flight count to +> accompany a set s->bus->dma->aiocb. While blk_aio_pdiscard() is +> happening, we don’t necessarily need another count. But we do need +> it while there is no blk_aio_pdiscard(). +> ide_issue_trim_cb() returns in two cases (and, recursively through +> its callers, leaves s->bus->dma->aiocb set): +> 1. After calling blk_aio_pdiscard(), which will keep an in-flight +> count, +> 2. After calling replay_bh_schedule_event() (i.e. +> qemu_bh_schedule()), which does not keep an in-flight count. + +Thus, even after moving the blk_inc_in_flight to above the +replay_bh_schedule_event call, the invariant "ide_issue_trim_cb +returns with an accompanying in-flight count" is still satisfied. + +Fixes: 7e5cdb345f ("ide: Increment BB in-flight counter for TRIM BH") +Suggested-by: Hanna Czenczek +Signed-off-by: Fiona Ebner +--- + hw/ide/core.c | 7 +++---- + 1 file changed, 3 insertions(+), 4 deletions(-) + +diff --git a/hw/ide/core.c b/hw/ide/core.c +index 39afdc0006..6474522bc9 100644 +--- a/hw/ide/core.c ++++ b/hw/ide/core.c +@@ -443,7 +443,7 @@ static void ide_trim_bh_cb(void *opaque) + iocb->bh = NULL; + qemu_aio_unref(iocb); + +- /* Paired with an increment in ide_issue_trim() */ ++ /* Paired with an increment in ide_issue_trim_cb() */ + blk_dec_in_flight(blk); + } + +@@ -503,6 +503,8 @@ static void ide_issue_trim_cb(void *opaque, int ret) + done: + iocb->aiocb = NULL; + if (iocb->bh) { ++ /* Paired with a decrement in ide_trim_bh_cb() */ ++ blk_inc_in_flight(s->blk); + replay_bh_schedule_event(iocb->bh); + } + } +@@ -514,9 +516,6 @@ BlockAIOCB *ide_issue_trim( + IDEState *s = opaque; + TrimAIOCB *iocb; + +- /* Paired with a decrement in ide_trim_bh_cb() */ +- blk_inc_in_flight(s->blk); +- + iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque); + iocb->s = s; + iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb); diff --git a/debian/patches/series b/debian/patches/series index 7a85c63..e4c661b 100644 --- a/debian/patches/series +++ b/debian/patches/series @@ -7,6 +7,7 @@ extra/0006-virtio-rng-pci-fix-migration-compat-for-vectors.patch extra/0007-block-fix-detect-zeroes-with-BDRV_REQ_REGISTERED_BUF.patch extra/0008-memory-prevent-dma-reentracy-issues.patch extra/0009-block-iscsi-fix-double-free-on-BUSY-or-similar-statu.patch +extra/0010-ide-avoid-potential-deadlock-when-draining-during-tr.patch bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch -- 2.30.2