public inbox for pve-devel@lists.proxmox.com
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Fiona Ebner <f.ebner@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH qemu] fdmon-io_uring: avoid idle event loop being accounted as IO wait
Date: Thu, 23 Apr 2026 22:39:24 +0200
Message-ID: <3689d164-e7d3-44c1-96b8-2b84b7342dd5@proxmox.com>
In-Reply-To: <e2a83933-5bdc-44d0-83d5-106a8d1ff7ad@proxmox.com>

On 23.04.26 at 17:34, Fiona Ebner wrote:
> On 22.04.26 at 4:54 PM, Fiona Ebner wrote:
>> On 14.04.26 at 11:15 PM, Thomas Lamprecht wrote:
>>> Based on Jens Axboe's succinct reply [0] on the kernel side:
>>>> It's not "IO pressure", it's the useless iowait metric...
>>>> [...]
>>>> If you won't want it, just turn it off with io_uring_set_iowait().
>>>
>>> [0]: https://lore.kernel.org/all/49a977f3-45da-41dd-9fd6-75fd6760a591@kernel.dk/ 
>>>
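For reference, the knob Jens mentions is a per-ring toggle in liburing;
a minimal sketch of switching it off (ring_disable_iowait() is just a
made-up wrapper here, and I assumed the helper's (ring, bool) signature
without double-checking):

    #include <liburing.h>
    #include <stdbool.h>

    /*
     * Sketch only: io_uring_set_iowait() is the helper from Jens' reply,
     * signature assumed. Disabling it should make the kernel stop
     * charging time blocked in io_uring_enter() as iowait
     * (IORING_ENTER_NO_IOWAIT underneath); it may return an error on
     * kernels without support for that.
     */
    static int ring_disable_iowait(struct io_uring *ring)
    {
        return io_uring_set_iowait(ring, false);
    }
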
>>
>> During testing with an RBD storage (no krbd), I found that, compared
>> to pve-qemu-kvm=10.1.2-7, the IO wait is still dramatically higher
>> even with your patch for VMs that are just idling around, and "more
>> idle" still correlates with "more IO pressure". I'm seeing 0.0% or
>> 0.x% with pve-qemu-kvm=10.1.2-7 and 40-100% even with the patch.
>>
>> Maybe there is a separate issue specific to the RBD block driver, but
>> I'm not sure why there would be. At least, there is no IO pressure
>> when setting the IORING_ENTER_NO_IOWAIT flag unconditionally every
>> time. My suggestions below would only lead to setting the flag less
>> often compared to the current patch, so they would not help here
>> either. I will need to investigate more.
> 
> So while I haven't looked at what causes librbd to behave differently,
> the difference is that there is an additional poll SQE that gets
> re-armed at the end of process_cqe_aio_handler() via add_poll_add_sqe().
> Reproducer [0].
> 
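That re-arm means the SQ ring basically never looks empty before the
blocking wait. Roughly, as a simplified standalone liburing sketch (not
the actual fdmon-io_uring.c code):

    #include <liburing.h>
    #include <poll.h>

    /* Queue a new single-shot POLL_ADD for the monitored fd. */
    static void rearm_poll(struct io_uring *ring, int fd)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        if (sqe) {
            io_uring_prep_poll_add(sqe, fd, POLLIN);
        }
    }

    /*
     * On each completion for that fd, run the handlers and re-arm, so a
     * fresh SQE is already pending again when the event loop goes back
     * to its blocking wait - analogous to add_poll_add_sqe() at the end
     * of process_cqe_aio_handler().
     */
    static void handle_poll_cqe(struct io_uring *ring,
                                struct io_uring_cqe *cqe, int fd)
    {
        /* ... dispatch the fd's read/write callbacks here ... */
        io_uring_cqe_seen(ring, cqe);
        rearm_poll(ring, fd);
    }
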
> So if we want to go with the current approach, we would also need to
> detect whether that single poll SQE is the only one in the ring to be
> submitted for the blocking wait. That seems pretty messy, as it would
> require keeping track of that between threads and adding detection to
> all io_uring_submit() callers to see which one submitted it.
> 
> An alternative approach is to keep track of the number of in-flight
> requests for which accounting should be done. Proposal below [1].

Many thanks for the proposal!

>> diff --git a/include/block/aio.h b/include/block/aio.h
>> index 6049e6a0f4..0bdba5d17f 100644
>> --- a/include/block/aio.h
>> +++ b/include/block/aio.h
>> @@ -77,6 +77,8 @@ struct CqeHandler {
>>  
>>      /* This field is filled in before ->cb() is called */
>>      struct io_uring_cqe cqe;
>> +
>> +    bool iowait_accounting;
>>  };
>>  
>>  typedef QSIMPLEQ_HEAD(, CqeHandler) CqeHandlerSimpleQ;
>> @@ -317,6 +319,12 @@ struct AioContext {
>>  
>>      /* Pending callback state for cqe handlers */
>>      CqeHandlerSimpleQ cqe_handler_ready_list;
>> +
>> +    /*
>> +     * Number of in-flight requests to be accounted for IO wait.
>> +     * Must be accessed using atomics.

Why though? AioContext is strictly single-threaded nowadays, and both
the enqueue and the process_cqe (dequeue) side run on the owning
thread. Or is this just defensive protection for potential future
changes? As it's cheap I'm fine with it, I just wanted to know if I'm
overlooking something here.

The rest looks nice(r)! Will adapt to this.

>> +     */
>> +    uint64_t iowait_accounting_reqs;
>>  #endif /* CONFIG_LINUX_IO_URING */
>>  
>>      /* TimerLists for calling timers - one per clock type.  Has its own
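
FWIW, the fdmon-io_uring.c side of this would then roughly look like
the following, i.e. bump the counter on enqueue, drop it again in the
dequeue path, and only request iowait accounting for the blocking wait
while it is non-zero. Sketch only, the helper names (everything besides
the two new fields above) are made up by me:

    /* Enqueue path: mark a request that should count towards IO wait. */
    static void iowait_account_enqueue(AioContext *ctx,
                                       CqeHandler *cqe_handler)
    {
        cqe_handler->iowait_accounting = true;
        qatomic_inc(&ctx->iowait_accounting_reqs);
    }

    /* Dequeue path: drop the count once the CQE has been processed. */
    static void iowait_account_complete(AioContext *ctx,
                                        CqeHandler *cqe_handler)
    {
        if (cqe_handler->iowait_accounting) {
            qatomic_dec(&ctx->iowait_accounting_reqs);
        }
    }

    /*
     * Before the blocking io_uring_enter() wait: only account iowait
     * while such requests are actually in flight, so the always
     * re-armed poll SQE alone no longer shows up as IO wait.
     */
    static bool iowait_accounting_wanted(AioContext *ctx)
    {
        return qatomic_read(&ctx->iowait_accounting_reqs) > 0;
    }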




Thread overview: 5+ messages
2026-04-14 21:09 Thomas Lamprecht
2026-04-22 14:55 ` Fiona Ebner
2026-04-23 15:36   ` Fiona Ebner
2026-04-23 20:39     ` Thomas Lamprecht [this message]
2026-04-24  9:13       ` Fiona Ebner
