From: Fiona Ebner <f.ebner@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH qemu] fdmon-io_uring: avoid idle event loop being accounted as IO wait
Date: Fri, 24 Apr 2026 11:13:16 +0200 [thread overview]
Message-ID: <525c4dad-6d04-41f0-8a21-9302b0c6baa4@proxmox.com> (raw)
In-Reply-To: <3689d164-e7d3-44c1-96b8-2b84b7342dd5@proxmox.com>
On 23.04.26 at 10:37 PM, Thomas Lamprecht wrote:
> On 23.04.26 at 17:34, Fiona Ebner wrote:
>> On 22.04.26 at 4:54 PM, Fiona Ebner wrote:
>>> On 14.04.26 at 11:15 PM, Thomas Lamprecht wrote:
>>>> Based on Jens Axboe's succinct reply [0] on the kernel side:
>>>>> It's not "IO pressure", it's the useless iowait metric...
>>>>> [...]
>>>>> If you don't want it, just turn it off with io_uring_set_iowait().
>>>>
>>>> [0]: https://lore.kernel.org/all/49a977f3-45da-41dd-9fd6-75fd6760a591@kernel.dk/
>>>>
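For reference, on the liburing side this boils down to something like the
sketch below. io_uring_set_iowait() is the helper Jens mentions above; the
exact signature and return value are from my reading of liburing, so please
double-check against the version you build against:

#include <liburing.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static void disable_iowait_accounting(struct io_uring *ring)
{
    /*
     * Ask liburing to pass IORING_ENTER_NO_IOWAIT when waiting for
     * completions, so an idle event loop is no longer accounted as
     * iowait. Assumed to fail (negative errno) on kernels without
     * support for the flag.
     */
    int ret = io_uring_set_iowait(ring, false);
    if (ret < 0)
        fprintf(stderr, "io_uring_set_iowait: %s\n", strerror(-ret));
}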
>>>
>>> During testing with an RBD storage (no krbd), I found that, compared
>>> to pve-qemu-kvm=10.1.2-7, the IO wait is still much higher even with
>>> your patch, for VMs that are just idling, and "more idle" still
>>> correlates with "more IO pressure". I'm seeing 0.0% or 0.x% with
>>> pve-qemu-kvm=10.1.2-7 and 40-100% even with the patch.
>>>
>>> Maybe there is a separate issue specific to the RBD block driver, but
>>> I'm not sure why that would be. At least, there is no IO pressure when
>>> setting the IORING_ENTER_NO_IOWAIT flag unconditionally every time. My
>>> suggestions below would only lead to setting the flag less often
>>> compared to the current patch, so they would not help here either.
>>> Will need to investigate more.
>>
>> So while I haven't looked at what causes librbd to behave differently,
>> the difference is that there is an additional poll SQE that gets
>> re-armed at the end of process_cqe_aio_handler() via add_poll_add_sqe().
>> Reproducer [0].
>>
>> So if we want to go with the current approach, we would also need to
>> detect whether that single poll SQE is the only one in the ring to be
>> submitted for the blocking wait. That seems pretty messy, as it requires
>> keeping track of that across threads and adding detection to all
>> io_uring_submit() callers to see which one submitted it.
>>
>> Alternative approach is to keep track of the number of in-flight
>> requests for which accounting should be done. Proposal below [1].
>
> many thanks for the proposal!
>
>>> diff --git a/include/block/aio.h b/include/block/aio.h
>>> index 6049e6a0f4..0bdba5d17f 100644
>>> --- a/include/block/aio.h
>>> +++ b/include/block/aio.h
>>> @@ -77,6 +77,8 @@ struct CqeHandler {
>>>
>>> /* This field is filled in before ->cb() is called */
>>> struct io_uring_cqe cqe;
>>> +
>>> + bool iowait_accounting;
>>> };
>>>
>>> typedef QSIMPLEQ_HEAD(, CqeHandler) CqeHandlerSimpleQ;
>>> @@ -317,6 +319,12 @@ struct AioContext {
>>>
>>> /* Pending callback state for cqe handlers */
>>> CqeHandlerSimpleQ cqe_handler_ready_list;
>>> +
>>> + /*
>>> + * Number of in-flight requests to be accounted for IO wait.
>>> + * Must be accessed using atomics.
>
> why though? AioContext is strictly single-threaded nowadays and (enqueue)
> and process_cqe (dequeue) run on the owning thread. Or is this just
> defensive protection for potential future changes - as it's cheap I'm
> fine with it, just wanted to know if I'm overlooking something here.
No, looking over it again, I think you are right. I was under the
impression that somehow vCPU threads might also add SQEs, but no, that
already happens in the iothread.

But my suggestion only applies to file/blockdev drivers, since others
(like librbd) do not actually call aio_add_sqe(); that only happens via
luring_co_submit(), which is only used in block/file-posix.c.
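To make the counter variant a bit more concrete, roughly what I have in
mind is the following sketch. The struct, field and helper names here are
placeholders of mine, not what the final patch would use; the point is
only that plain (non-atomic) accesses suffice because everything runs on
the context's own thread:

#include <assert.h>
#include <stdbool.h>

/* Stand-in for AioContext; the real counter would live next to the
 * comment quoted in the hunk above. */
typedef struct AioContextSketch {
    unsigned in_flight_iowait; /* hypothetical name for the new counter */
} AioContextSketch;

/* Called when a request that should count towards IO wait is submitted,
 * e.g. from aio_add_sqe() for file-posix requests. */
static void iowait_req_begin(AioContextSketch *ctx)
{
    ctx->in_flight_iowait++;
}

/* Called once the corresponding CQE has been processed. */
static void iowait_req_end(AioContextSketch *ctx)
{
    assert(ctx->in_flight_iowait > 0);
    ctx->in_flight_iowait--;
}

/* Before the blocking wait: only let the kernel account iowait while at
 * least one such request is in flight, otherwise ask for the
 * IORING_ENTER_NO_IOWAIT behavior. */
static bool aio_should_account_iowait(const AioContextSketch *ctx)
{
    return ctx->in_flight_iowait > 0;
}

The iowait_accounting flag added to CqeHandler in the hunk above would
then presumably be what decides whether iowait_req_end() has to be called
when a given CQE is processed.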