From: Fiona Ebner <f.ebner@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH qemu] fdmon-io_uring: avoid idle event loop being accounted as IO wait
Date: Fri, 24 Apr 2026 11:13:16 +0200 [thread overview]
Message-ID: <525c4dad-6d04-41f0-8a21-9302b0c6baa4@proxmox.com> (raw)
In-Reply-To: <3689d164-e7d3-44c1-96b8-2b84b7342dd5@proxmox.com>
On 23.04.26 at 10:37 PM, Thomas Lamprecht wrote:
> On 23.04.26 at 17:34, Fiona Ebner wrote:
>> On 22.04.26 at 4:54 PM, Fiona Ebner wrote:
>>> On 14.04.26 at 11:15 PM, Thomas Lamprecht wrote:
>>>> Based on Jens Axboe's succinct reply [0] on the kernel side:
>>>>> It's not "IO pressure", it's the useless iowait metric...
>>>>> [...]
>>>>> If you don't want it, just turn it off with io_uring_set_iowait().
>>>>
>>>> [0]: https://lore.kernel.org/all/49a977f3-45da-41dd-9fd6-75fd6760a591@kernel.dk/
>>>>
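For reference, on the liburing side this boils down to something like the
sketch below. io_uring_set_iowait() is the helper Jens mentions above; the
exact signature and return value are from my reading of liburing, so please
double-check against the version you build against:

#include <liburing.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static void disable_iowait_accounting(struct io_uring *ring)
{
    /*
     * Ask liburing to pass IORING_ENTER_NO_IOWAIT when waiting for
     * completions, so an idle event loop is no longer accounted as
     * iowait. Assumed to fail (negative errno) on kernels without
     * support for the flag.
     */
    int ret = io_uring_set_iowait(ring, false);
    if (ret < 0)
        fprintf(stderr, "io_uring_set_iowait: %s\n", strerror(-ret));
}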
>>>
>>> During testing with an RBD storage (no krbd), I found that, compared
>>> to pve-qemu-kvm=10.1.2-7, the IO wait is still much higher even with
>>> your patch, for VMs that are just idling, and "more idle" still
>>> correlates with "more IO pressure". I'm seeing 0.0% or 0.x% with
>>> pve-qemu-kvm=10.1.2-7 and 40-100% even with the patch.
>>>
>>> Maybe there is a separate issue specific to the RBD block driver, but
>>> I'm not sure why that would be. At least, there is no IO pressure when
>>> setting the IORING_ENTER_NO_IOWAIT flag unconditionally every time. My
>>> suggestions below would only lead to setting the flag less often
>>> compared to the current patch, so they would not help here either.
>>> Will need to investigate more.
>>
>> So while I haven't looked at what causes librbd to behave differently,
>> the difference is that there is an additional poll SQE that gets
>> re-armed at the end of process_cqe_aio_handler() via add_poll_add_sqe().
>> Reproducer [0].
>>
>> So if we want to go with the current approach, we would also need to
>> detect whether that single poll SQE is the only one in the ring to be
>> submitted for the blocking wait. That seems pretty messy, as it requires
>> keeping track of that across threads and adding detection to all
>> io_uring_submit() callers to see which one submitted it.
>>
>> Alternative approach is to keep track of the number of in-flight
>> requests for which accounting should be done. Proposal below [1].
>
> many thanks for the proposal!
>
>>> diff --git a/include/block/aio.h b/include/block/aio.h
>>> index 6049e6a0f4..0bdba5d17f 100644
>>> --- a/include/block/aio.h
>>> +++ b/include/block/aio.h
>>> @@ -77,6 +77,8 @@ struct CqeHandler {
>>>
>>> /* This field is filled in before ->cb() is called */
>>> struct io_uring_cqe cqe;
>>> +
>>> + bool iowait_accounting;
>>> };
>>>
>>> typedef QSIMPLEQ_HEAD(, CqeHandler) CqeHandlerSimpleQ;
>>> @@ -317,6 +319,12 @@ struct AioContext {
>>>
>>> /* Pending callback state for cqe handlers */
>>> CqeHandlerSimpleQ cqe_handler_ready_list;
>>> +
>>> + /*
>>> + * Number of in-flight requests to be accounted for IO wait.
>>> + * Must be accessed using atomics.
>
> why though? AioContext is strictly single-threaded nowadays and (enqueue)
> and process_cqe (dequeue) run on the owning thread. Or is this just
> defensive protection for potential future changes - as it's cheap I'm
> fine with it, just wanted to know if I'm overlooking something here.
No, looking over it again, I think you are right. I was under the
impression that somehow vCPU threads might also add SQEs, but no, that
already happens in the iothread.

But my suggestion only applies to file/blockdev drivers, since others
(like librbd) do not actually call aio_add_sqe(); that only happens via
luring_co_submit(), which is only used in block/file-posix.c.
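To make the counter variant a bit more concrete, roughly what I have in
mind is the following sketch. The struct, field and helper names here are
placeholders of mine, not what the final patch would use; the point is
only that plain (non-atomic) accesses suffice because everything runs on
the context's own thread:

#include <assert.h>
#include <stdbool.h>

/* Stand-in for AioContext; the real counter would live next to the
 * comment quoted in the hunk above. */
typedef struct AioContextSketch {
    unsigned in_flight_iowait; /* hypothetical name for the new counter */
} AioContextSketch;

/* Called when a request that should count towards IO wait is submitted,
 * e.g. from aio_add_sqe() for file-posix requests. */
static void iowait_req_begin(AioContextSketch *ctx)
{
    ctx->in_flight_iowait++;
}

/* Called once the corresponding CQE has been processed. */
static void iowait_req_end(AioContextSketch *ctx)
{
    assert(ctx->in_flight_iowait > 0);
    ctx->in_flight_iowait--;
}

/* Before the blocking wait: only let the kernel account iowait while at
 * least one such request is in flight, otherwise ask for the
 * IORING_ENTER_NO_IOWAIT behavior. */
static bool aio_should_account_iowait(const AioContextSketch *ctx)
{
    return ctx->in_flight_iowait > 0;
}

The iowait_accounting flag added to CqeHandler in the hunk above would
then presumably be what decides whether iowait_req_end() has to be called
when a given CQE is processed.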