From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <525c4dad-6d04-41f0-8a21-9302b0c6baa4@proxmox.com>
Date: Fri, 24 Apr 2026 11:13:16 +0200
Subject: Re: [PATCH qemu] fdmon-io_uring: avoid idle event loop being accounted as IO wait
From: Fiona Ebner <f.ebner@proxmox.com>
To: Thomas Lamprecht, pve-devel@lists.proxmox.com
References: <20260414211600.4023940-1-t.lamprecht@proxmox.com> <3689d164-e7d3-44c1-96b8-2b84b7342dd5@proxmox.com>
In-Reply-To: <3689d164-e7d3-44c1-96b8-2b84b7342dd5@proxmox.com>
List-Id: Proxmox VE development discussion

On 23.04.26 at 10:37 PM, Thomas Lamprecht wrote:
> On 23.04.26 at 17:34, Fiona Ebner wrote:
>> On 22.04.26 at 4:54 PM, Fiona Ebner wrote:
>>> On 14.04.26 at 11:15 PM, Thomas Lamprecht wrote:
>>>> Based on Jens Axboe's succinct reply [0] on the kernel side:
>>>>> It's not "IO pressure", it's the useless iowait metric...
>>>>> [...]
>>>>> If you won't want it, just turn it off with io_uring_set_iowait().
>>>>
>>>> [0]: https://lore.kernel.org/all/49a977f3-45da-41dd-9fd6-75fd6760a591@kernel.dk/
>>>>
>>>
>>> During testing with an RBD storage (no krbd), I found that, compared
>>> to pve-qemu-kvm=10.1.2-7, IO wait is still much higher even with your
>>> patch for VMs that are just idling, and "more idle" still correlates
>>> with "more IO pressure". I'm seeing 0.0% or 0.x% with
>>> pve-qemu-kvm=10.1.2-7 and 40-100% even with the patch.
>>>
>>> Maybe there is a separate issue specific to the RBD block driver,
>>> though I am not sure why there would be. At least, there is no IO
>>> pressure when setting the IORING_ENTER_NO_IOWAIT flag unconditionally
>>> every time. My suggestions below would only lead to setting the flag
>>> less often compared to the current patch, so they would not help here
>>> either. Will need to investigate more.
>>
>> While I haven't looked into what causes librbd to behave differently,
>> the difference is that there is an additional poll SQE that gets
>> re-armed at the end of process_cqe_aio_handler() via
>> add_poll_add_sqe(). Reproducer [0].
>>
>> So if we want to go with the current approach, we would also need to
>> detect whether that single poll SQE is the only one in the ring to be
>> submitted for the blocking wait. That seems pretty messy, as it
>> requires keeping track of the SQE between threads and adding detection
>> for all io_uring_submit() callers to see which one submitted it.
>>
>> An alternative approach is to keep track of the number of in-flight
>> requests for which accounting should be done. Proposal below [1].
>
> many thanks for the proposal!
>
>>> diff --git a/include/block/aio.h b/include/block/aio.h
>>> index 6049e6a0f4..0bdba5d17f 100644
>>> --- a/include/block/aio.h
>>> +++ b/include/block/aio.h
>>> @@ -77,6 +77,8 @@ struct CqeHandler {
>>>
>>>     /* This field is filled in before ->cb() is called */
>>>     struct io_uring_cqe cqe;
>>> +
>>> +   bool iowait_accounting;
>>> };
>>>
>>> typedef QSIMPLEQ_HEAD(, CqeHandler) CqeHandlerSimpleQ;
>>> @@ -317,6 +319,12 @@ struct AioContext {
>>>
>>>     /* Pending callback state for cqe handlers */
>>>     CqeHandlerSimpleQ cqe_handler_ready_list;
>>> +
>>> +   /*
>>> +    * Number of in-flight requests to be accounted for IO wait.
>>> +    * Must be accessed using atomics.
>
> why though? AioContext is strictly single-threaded nowadays, and
> enqueue and process_cqe (dequeue) both run on the owning thread. Or is
> this just defensive protection for potential future changes? As it's
> cheap, I'm fine with it either way, just wanted to know whether I'm
> overlooking something here.

No, looking over it again, I think you are right. I was under the
impression that vCPU threads might somehow also add SQEs, but no, that
already happens in the iothread.

Note, though, that my suggestion only applies to file/blockdev drivers,
since others (like librbd) do not actually call aio_add_sqe(); that only
happens via luring_co_submit(), which in turn is only called from
block/file-posix.c.