From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Wed, 26 Nov 2025 12:03:24 +0100
Message-ID: <20251126110340.62312-1-f.ebner@proxmox.com>
Subject: [pve-devel] [PATCH qemu] add fix for io_uring issue to unbreak backup with TPM disk potentially getting stuck

As reported in the community forum [0], backups with a TPM disk on BTRFS
storage would likely get stuck since the switch to '-blockdev' in
qemu-server, i.e. commit 439f6e2a ("backup: use blockdev for TPM state
file"), which moved away from using a 'drive_add' HMP command to attach
the TPM state drive. Before that, the aio mode was QEMU's default
'threads'; since that commit, it is the PVE default 'io_uring'.

The issue is actually not BTRFS-specific, but a logic bug in QEMU's
current io_uring implementation; see the added patch for details. QEMU
10.2 includes a major rework of the io_uring feature, so the issue is
already fixed there with QEMU commit 047dabef97 ("block/io_uring: use
aio_add_sqe()"). While still on 10.1, a different fix is needed. The
patch has been submitted upstream [1].
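For illustration, the difference boils down to the 'aio' option of the
TPM state drive's file node. A minimal sketch of such a blockdev (with a
hypothetical node name and path; the exact arguments generated by
qemu-server differ):

    # attached via 'drive_add' HMP command (before 439f6e2a): QEMU default aio=threads
    # attached via '-blockdev' (since 439f6e2a): PVE default aio=io_uring
    qemu-system-x86_64 \
        -blockdev raw,node-name=drive-tpmstate0,file.driver=file,file.filename=/path/to/vm-tpmstate.raw,file.aio=io_uring \
        ...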
[0]: https://forum.proxmox.com/threads/170045/
[1]: https://lists.nongnu.org/archive/html/qemu-stable/2025-11/msg00321.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 ...void-potentially-getting-stuck-after.patch | 153 ++++++++++++++++++
 debian/patches/series                         |   1 +
 2 files changed, 154 insertions(+)
 create mode 100644 debian/patches/extra/0011-block-io_uring-avoid-potentially-getting-stuck-after.patch

diff --git a/debian/patches/extra/0011-block-io_uring-avoid-potentially-getting-stuck-after.patch b/debian/patches/extra/0011-block-io_uring-avoid-potentially-getting-stuck-after.patch
new file mode 100644
index 0000000..372ecad
--- /dev/null
+++ b/debian/patches/extra/0011-block-io_uring-avoid-potentially-getting-stuck-after.patch
@@ -0,0 +1,153 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Fiona Ebner <f.ebner@proxmox.com>
+Date: Mon, 24 Nov 2025 15:28:27 +0100
+Subject: [PATCH] block/io_uring: avoid potentially getting stuck after
+ resubmit at the end of ioq_submit()
+
+Note that this issue seems already fixed as a consequence of the large
+io_uring rework with 047dabef97 ("block/io_uring: use aio_add_sqe()")
+in current master, so this is purely for QEMU stable branches.
+
+At the end of ioq_submit(), there is an opportunistic call to
+luring_process_completions(). This is the single caller of
+luring_process_completions() that doesn't use the
+luring_process_completions_and_submit() wrapper.
+
+Other callers use the wrapper, because luring_process_completions()
+might require a subsequent call to ioq_submit() after resubmitting a
+request. As noted for luring_resubmit():
+
+> Resubmit a request by appending it to submit_queue. The caller must ensure
+> that ioq_submit() is called later so that submit_queue requests are started.
+
+So the caller at the end of ioq_submit() violates the contract and can
+in fact be problematic if no other requests come in later. In such a
+case, the request intended to be resubmitted will never actually be
+submitted via io_uring_submit().
+
+A reproducer exposing this issue is [0], which is based on user
+reports from [1]. Another reproducer is iotest 109 with '-i io_uring'.
+
+I had the most success triggering the issue with [0] when using a
+BTRFS RAID 1 storage. With tmpfs, it can take quite a few iterations,
+but it also triggers eventually on my machine. With iotest 109 and
+'-i io_uring', the issue triggers reliably on my ext4 file system.
+
+Have ioq_submit() submit any resubmitted requests after calling
+luring_process_completions(). The return value from io_uring_submit()
+is checked to be non-negative before the opportunistic processing of
+completions and going for the new resubmit logic, to ensure that a
+failure of io_uring_submit() is not missed. Also note that the return
+value already was not necessarily the total number of submissions,
+since the loop might've been iterated more than once even before the
+current change.
+
+Only trigger the resubmission logic if it is actually necessary, to
+avoid changing behavior more than necessary. For example, iotest 109
+would produce more 'mirror ready' events if always resubmitting after
+luring_process_completions() at the end of ioq_submit().
+
+Note that iotest 109 still does not pass as-is when run with
+'-i io_uring', because two offset values for BLOCK_JOB_COMPLETED events
+are zero instead of non-zero as in the expected output. Note that the
+two affected test cases are expected failures and still fail, so they
+just fail "faster".
+The test cases are actually not triggering the resubmit logic, so the
+reason seems to be a different ordering of requests and completions
+between the current aio=io_uring implementation and aio=threads.
+
+[0]:
+
+> #!/bin/bash -e
+> #file=/mnt/btrfs/disk.raw
+> file=/tmp/disk.raw
+> filesize=256
+> readsize=512
+> rm -f $file
+> truncate -s $filesize $file
+> ./qemu-system-x86_64 --trace '*uring*' --qmp stdio \
+>     --blockdev raw,node-name=node0,file.driver=file,file.cache.direct=off,file.filename=$file,file.aio=io_uring \
+>     <<EOF
+> {"execute": "qmp_capabilities"}
+> {"execute": "human-monitor-command", "arguments": { "command-line": "qemu-io node0 \"read 0 $readsize \"" }}
+> {"execute": "quit"}
+> EOF
+
+[1]: https://forum.proxmox.com/threads/170045/
+
+Cc: qemu-stable@nongnu.org
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
+---
+ block/io_uring.c | 16 +++++++++++++---
+ 1 file changed, 13 insertions(+), 3 deletions(-)
+
+diff --git a/block/io_uring.c b/block/io_uring.c
+index dd4f304910..5dbafc8f7b 100644
+--- a/block/io_uring.c
++++ b/block/io_uring.c
+@@ -120,11 +120,14 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
+  * event loop. When there are no events left to complete the BH is being
+  * canceled.
+  *
++ * Returns whether ioq_submit() must be called again afterwards since requests
++ * were resubmitted via luring_resubmit().
+  */
+-static void luring_process_completions(LuringState *s)
++static bool luring_process_completions(LuringState *s)
+ {
+     struct io_uring_cqe *cqes;
+     int total_bytes;
++    bool resubmit = false;
+ 
+     defer_call_begin();
+ 
+@@ -182,6 +185,7 @@ static void luring_process_completions(LuringState *s)
+              */
+             if (ret == -EINTR || ret == -EAGAIN) {
+                 luring_resubmit(s, luringcb);
++                resubmit = true;
+                 continue;
+             }
+         } else if (!luringcb->qiov) {
+@@ -194,6 +198,7 @@ static void luring_process_completions(LuringState *s)
+             if (luringcb->is_read) {
+                 if (ret > 0) {
+                     luring_resubmit_short_read(s, luringcb, ret);
++                    resubmit = true;
+                     continue;
+                 } else {
+                     /* Pad with zeroes */
+@@ -224,6 +229,8 @@ end:
+     qemu_bh_cancel(s->completion_bh);
+ 
+     defer_call_end();
++
++    return resubmit;
+ }
+ 
+ static int ioq_submit(LuringState *s)
+@@ -231,6 +238,7 @@ static int ioq_submit(LuringState *s)
+     int ret = 0;
+     LuringAIOCB *luringcb, *luringcb_next;
+ 
++resubmit:
+     while (s->io_q.in_queue > 0) {
+         /*
+          * Try to fetch sqes from the ring for requests waiting in
+@@ -260,12 +268,14 @@ static int ioq_submit(LuringState *s)
+     }
+     s->io_q.blocked = (s->io_q.in_queue > 0);
+ 
+-    if (s->io_q.in_flight) {
++    if (ret >= 0 && s->io_q.in_flight) {
+         /*
+          * We can try to complete something just right away if there are
+          * still requests in-flight.
+          */
+-        luring_process_completions(s);
++        if (luring_process_completions(s)) {
++            goto resubmit;
++        }
+     }
+     return ret;
+ }
diff --git a/debian/patches/series b/debian/patches/series
index b1afcd4..83e7f6d 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -8,6 +8,7 @@ extra/0007-vfio-only-check-region-info-cache-for-initial-region.patch
 extra/0008-ui-vdagent-fix-windows-agent-regression.patch
 extra/0009-file-posix-populate-pwrite_zeroes_alignment.patch
 extra/0010-block-use-pwrite_zeroes_alignment-when-writing-first.patch
+extra/0011-block-io_uring-avoid-potentially-getting-stuck-after.patch
 bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
 bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
 bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
-- 
2.47.3


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel