Date: Fri, 19 Dec 2025 09:19:02 +0100
From: Fabian Grünbichler
To: Proxmox VE development discussion
Cc: Thomas Lamprecht
Subject: Re: [pve-devel] [PATCH qemu] add fix for io_uring issue to unbreak backup with TPM disk potentially getting stuck
Message-Id: <1766132092.y87v8dibj9.astroid@yuna.none>
In-Reply-To: <20251126110340.62312-1-f.ebner@proxmox.com>

On November 26, 2025 12:03 pm, Fiona Ebner wrote:
> As reported in the community forum [0], backups with TPM disk on BTRFS
> storages would likely get stuck since the switch to '-blockdev' in
> qemu-server, i.e. commit 439f6e2a ("backup: use blockdev for TPM state
> file"), which moved away from using a 'drive_add' HMP command to
> attach the TPM state drive. Before that, the aio mode was QEMU's
> default 'threads'. Since that commit, the aio mode is the PVE default
> 'io_uring'.
>
> The issue is actually not BTRFS-specific, but a logic bug in QEMU's
> current io_uring implementation, see the added patch for details. QEMU
> 10.2 includes a major rework of the io_uring feature, so the issue is
> already fixed there with QEMU commit 047dabef97 ("block/io_uring: use
> aio_add_sqe()"). While still on 10.1, a different fix is needed.
>
> Upstream submission for the patch [1].
>
> [0]: https://forum.proxmox.com/threads/170045/
> [1]: https://lists.nongnu.org/archive/html/qemu-stable/2025-11/msg00321.html

This has been reviewed and applied for 10.1.3 upstream, so we should
either pull it in or update to 10.1.3 (I think 10.2 still had other
issues?)

> Signed-off-by: Fiona Ebner
> ---
>  ...void-potentially-getting-stuck-after.patch | 153 ++++++++++++++++++
>  debian/patches/series                         |   1 +
>  2 files changed, 154 insertions(+)
>  create mode 100644 debian/patches/extra/0011-block-io_uring-avoid-potentially-getting-stuck-after.patch
>
> diff --git a/debian/patches/extra/0011-block-io_uring-avoid-potentially-getting-stuck-after.patch b/debian/patches/extra/0011-block-io_uring-avoid-potentially-getting-stuck-after.patch
> new file mode 100644
> index 0000000..372ecad
> --- /dev/null
> +++ b/debian/patches/extra/0011-block-io_uring-avoid-potentially-getting-stuck-after.patch
> @@ -0,0 +1,153 @@
> +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
> +From: Fiona Ebner
> +Date: Mon, 24 Nov 2025 15:28:27 +0100
> +Subject: [PATCH] block/io_uring: avoid potentially getting stuck after
> + resubmit at the end of ioq_submit()
> +
> +Note that this issue seems already fixed as a consequence of the large
> +io_uring rework with 047dabef97 ("block/io_uring: use aio_add_sqe()")
> +in current master, so this is purely for QEMU stable branches.
> +
> +At the end of ioq_submit(), there is an opportunistic call to
> +luring_process_completions(). This is the single caller of
> +luring_process_completions() that doesn't use the
> +luring_process_completions_and_submit() wrapper.
> +
> +Other callers use the wrapper, because luring_process_completions()
> +might require a subsequent call to ioq_submit() after resubmitting a
> +request. As noted for luring_resubmit():
> +
> +> Resubmit a request by appending it to submit_queue. The caller must ensure
> +> that ioq_submit() is called later so that submit_queue requests are started.
> +
> +So the caller at the end of ioq_submit() violates the contract and can
> +in fact be problematic if no other requests come in later. In such a
> +case, the request intended to be resubmitted will never actually be
> +submitted via io_uring_submit().
> +
> +A reproducer exposing this issue is [0], which is based on user
> +reports from [1]. Another reproducer is iotest 109 with '-i io_uring'.
> +
> +I had the most success triggering the issue with [0] when using a
> +BTRFS RAID 1 storage. With tmpfs, it can take quite a few iterations,
> +but it also triggers eventually on my machine. With iotest 109 with '-i
> +io_uring' the issue triggers reliably on my ext4 file system.
> +
> +Have ioq_submit() submit any resubmitted requests after calling
> +luring_process_completions(). The return value from io_uring_submit()
> +is checked to be non-negative before the opportunistic processing of
> +completions and going for the new resubmit logic, to ensure that a
> +failure of io_uring_submit() is not missed. Also note that the return
> +value already was not necessarily the total number of submissions,
> +since the loop might've been iterated more than once even before the
> +current change.
> +
> +Only trigger the resubmission logic if it is actually necessary, to
> +avoid changing behavior more than necessary. For example, iotest 109
> +would produce more 'mirror ready' events if always resubmitting after
> +luring_process_completions() at the end of ioq_submit().
> +
> +Note iotest 109 still does not pass as is when run with '-i io_uring',
> +because of two offset values for BLOCK_JOB_COMPLETED events being zero
> +instead of non-zero as in the expected output. Note that the two
> +affected test cases are expected failures and still fail, so they just
> +fail "faster". The test cases are actually not triggering the resubmit
> +logic, so the reason seems to be a different ordering of requests and
> +completions in the current aio=io_uring implementation versus
> +aio=threads.
> +
> +[0]:
> +
> +> #!/bin/bash -e
> +> #file=/mnt/btrfs/disk.raw
> +> file=/tmp/disk.raw
> +> filesize=256
> +> readsize=512
> +> rm -f $file
> +> truncate -s $filesize $file
> +> ./qemu-system-x86_64 --trace '*uring*' --qmp stdio \
> +>   --blockdev raw,node-name=node0,file.driver=file,file.cache.direct=off,file.filename=$file,file.aio=io_uring \
> +>   <<EOF
> +> {"execute": "qmp_capabilities"}
> +> {"execute": "human-monitor-command", "arguments": { "command-line": "qemu-io node0 \"read 0 $readsize \"" }}
> +> {"execute": "quit"}
> +> EOF
> +
> +[1]: https://forum.proxmox.com/threads/170045/
> +
> +Cc: qemu-stable@nongnu.org
> +Signed-off-by: Fiona Ebner
> +---
> + block/io_uring.c | 16 +++++++++++++---
> + 1 file changed, 13 insertions(+), 3 deletions(-)
> +
> +diff --git a/block/io_uring.c b/block/io_uring.c
> +index dd4f304910..5dbafc8f7b 100644
> +--- a/block/io_uring.c
> ++++ b/block/io_uring.c
> +@@ -120,11 +120,14 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
> +  * event loop. When there are no events left to complete the BH is being
> +  * canceled.
> +  *
> ++ * Returns whether ioq_submit() must be called again afterwards since requests
> ++ * were resubmitted via luring_resubmit().
> +  */
> +-static void luring_process_completions(LuringState *s)
> ++static bool luring_process_completions(LuringState *s)
> + {
> +     struct io_uring_cqe *cqes;
> +     int total_bytes;
> ++    bool resubmit = false;
> +
> +     defer_call_begin();
> +
> +@@ -182,6 +185,7 @@ static void luring_process_completions(LuringState *s)
> +              */
> +             if (ret == -EINTR || ret == -EAGAIN) {
> +                 luring_resubmit(s, luringcb);
> ++                resubmit = true;
> +                 continue;
> +             }
> +         } else if (!luringcb->qiov) {
> +@@ -194,6 +198,7 @@ static void luring_process_completions(LuringState *s)
> +         if (luringcb->is_read) {
> +             if (ret > 0) {
> +                 luring_resubmit_short_read(s, luringcb, ret);
> ++                resubmit = true;
> +                 continue;
> +             } else {
> +                 /* Pad with zeroes */
> +@@ -224,6 +229,8 @@ end:
> +     qemu_bh_cancel(s->completion_bh);
> +
> +     defer_call_end();
> ++
> ++    return resubmit;
> + }
> +
> + static int ioq_submit(LuringState *s)
> +@@ -231,6 +238,7 @@ static int ioq_submit(LuringState *s)
> +     int ret = 0;
> +     LuringAIOCB *luringcb, *luringcb_next;
> +
> ++resubmit:
> +     while (s->io_q.in_queue > 0) {
> +         /*
> +          * Try to fetch sqes from the ring for requests waiting in
> +@@ -260,12 +268,14 @@ static int ioq_submit(LuringState *s)
> +     }
> +     s->io_q.blocked = (s->io_q.in_queue > 0);
> +
> +-    if (s->io_q.in_flight) {
> ++    if (ret >= 0 && s->io_q.in_flight) {
> +         /*
> +          * We can try to complete something just right away if there are
> +          * still requests in-flight.
> + */ > +- luring_process_completions(s); > ++ if (luring_process_completions(s)) { > ++ goto resubmit; > ++ } > + } > + return ret; > + } > diff --git a/debian/patches/series b/debian/patches/series > index b1afcd4..83e7f6d 100644 > --- a/debian/patches/series > +++ b/debian/patches/series > @@ -8,6 +8,7 @@ extra/0007-vfio-only-check-region-info-cache-for-initial-region.patch > extra/0008-ui-vdagent-fix-windows-agent-regression.patch > extra/0009-file-posix-populate-pwrite_zeroes_alignment.patch > extra/0010-block-use-pwrite_zeroes_alignment-when-writing-first.patch > +extra/0011-block-io_uring-avoid-potentially-getting-stuck-after.patch > bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch > bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch > bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch > -- > 2.47.3 > > > > _______________________________________________ > pve-devel mailing list > pve-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > > > _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel