[pve-devel] [PATCH qemu] block/io_uring: resubmit when result is -EAGAIN

all lists on lists.proxmox.com
 help / color / mirror / Atom feed

* [pve-devel] [PATCH qemu] block/io_uring: resubmit when result is -EAGAIN
@ 2021-07-28 10:55 Fabian Ebner
  2021-07-29  8:37 ` Thomas Lamprecht
  0 siblings, 1 reply; 2+ messages in thread
From: Fabian Ebner @ 2021-07-28 10:55 UTC (permalink / raw)
  To: pve-devel

Quoting from [0]:

 Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
 completion path. Normally we expect this to happen inline as part
 of submission, but apparently SCSI has a weird corner case where it
 can happen as part of normal completions.

Host kernels without patch [0] can panic when this happens [1], and
resubmitting makes the panic more likely. On the other hand, for
kernels with patch [0], resubmitting ensures that a block job is not
aborted just because of such spurious errors. In particular, this
should fix the problem reported in [2] with PBS backups of LVM-backed
VMs.

[0]: https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u

[1]:
  #9 [ffffb732000c8b70] asm_exc_page_fault at ffffffffa4800ade
 #10 [ffffb732000c8bf8] io_prep_async_work at ffffffffa3d89c16
 #11 [ffffb732000c8c50] io_rw_reissue at ffffffffa3d8b2e1
 #12 [ffffb732000c8c78] io_complete_rw at ffffffffa3d8baa8
 #13 [ffffb732000c8c98] blkdev_bio_end_io at ffffffffa3d62a80
 #14 [ffffb732000c8cc8] bio_endio at ffffffffa3f4e800
 #15 [ffffb732000c8ce8] dec_pending at ffffffffa432f854
 #16 [ffffb732000c8d30] clone_endio at ffffffffa433170c
 #17 [ffffb732000c8d70] bio_endio at ffffffffa3f4e800
 #18 [ffffb732000c8d90] blk_update_request at ffffffffa3f53a37
 #19 [ffffb732000c8dd0] scsi_end_request at ffffffffa4233a5c
 #20 [ffffb732000c8e08] scsi_io_completion at ffffffffa423432c
 #21 [ffffb732000c8e58] scsi_finish_command at ffffffffa422c527
 #22 [ffffb732000c8e88] scsi_softirq_done at ffffffffa42341e4

[2]: https://forum.proxmox.com/threads/backup-job-failed-with-err-11-on-2-of-6-vms.92568/

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

Mail sent upstream:
https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg06765.html

 block/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index 00a3ee9fb8..77d162cb24 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -165,7 +165,7 @@ static void luring_process_completions(LuringState *s)
         total_bytes = ret + luringcb->total_read;
 
         if (ret < 0) {
-            if (ret == -EINTR) {
+            if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
                 continue;
             }
-- 
2.30.2





^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [pve-devel] [PATCH qemu] block/io_uring: resubmit when result is -EAGAIN
  2021-07-28 10:55 [pve-devel] [PATCH qemu] block/io_uring: resubmit when result is -EAGAIN Fabian Ebner
@ 2021-07-29  8:37 ` Thomas Lamprecht
  0 siblings, 0 replies; 2+ messages in thread
From: Thomas Lamprecht @ 2021-07-29  8:37 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Ebner

On 28/07/2021 12:55, Fabian Ebner wrote:
> Quoting from [0]:
> 
>  Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
>  completion path. Normally we expect this to happen inline as part
>  of submission, but apparently SCSI has a weird corner case where it
>  can happen as part of normal completions.
> 
> Host kernels without patch [0] can panic when this happens [1], and
> resubmitting makes the panic more likely. On the other hand, for
> kernels with patch [0], resubmitting ensures that a block job is not
> aborted just because of such spurious errors. In particular, this
> should fix the problem reported in [2] with PBS backups of LVM-backed
> VMs.
> 
> [0]: https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
> 
> [1]:
>   #9 [ffffb732000c8b70] asm_exc_page_fault at ffffffffa4800ade
>  #10 [ffffb732000c8bf8] io_prep_async_work at ffffffffa3d89c16
>  #11 [ffffb732000c8c50] io_rw_reissue at ffffffffa3d8b2e1
>  #12 [ffffb732000c8c78] io_complete_rw at ffffffffa3d8baa8
>  #13 [ffffb732000c8c98] blkdev_bio_end_io at ffffffffa3d62a80
>  #14 [ffffb732000c8cc8] bio_endio at ffffffffa3f4e800
>  #15 [ffffb732000c8ce8] dec_pending at ffffffffa432f854
>  #16 [ffffb732000c8d30] clone_endio at ffffffffa433170c
>  #17 [ffffb732000c8d70] bio_endio at ffffffffa3f4e800
>  #18 [ffffb732000c8d90] blk_update_request at ffffffffa3f53a37
>  #19 [ffffb732000c8dd0] scsi_end_request at ffffffffa4233a5c
>  #20 [ffffb732000c8e08] scsi_io_completion at ffffffffa423432c
>  #21 [ffffb732000c8e58] scsi_finish_command at ffffffffa422c527
>  #22 [ffffb732000c8e88] scsi_softirq_done at ffffffffa42341e4
> 
> [2]: https://forum.proxmox.com/threads/backup-job-failed-with-err-11-on-2-of-6-vms.92568/
> 
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
> 
> Mail sent upstream:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg06765.html

I see that it's already reviewed (at least after the maintainers got CC'd ;)) and OK
besides some missing comments, so great work!

Can you please send this here as "patch of a patch" to be applied on pve-qemu though?
Would make it easier and commit the actual author to that git history too.

I'd put it into the `debian/patches/extra` folder there, it'll be gone again when
updating the submodule to a future QEMU 6.1 anyway, and it's not PVE specific.




^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-07-29  8:37 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-28 10:55 [pve-devel] [PATCH qemu] block/io_uring: resubmit when result is -EAGAIN Fabian Ebner
2021-07-29  8:37 ` Thomas Lamprecht

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.

Service provided by Proxmox Server Solutions GmbH | Privacy | Legal