From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id B62148E97
 for <pve-devel@lists.proxmox.com>; Tue,  7 Mar 2023 15:20:21 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 7BB0DCD90
 for <pve-devel@lists.proxmox.com>; Tue,  7 Mar 2023 15:19:51 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-devel@lists.proxmox.com>; Tue,  7 Mar 2023 15:19:49 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id D545A45D17
 for <pve-devel@lists.proxmox.com>; Tue,  7 Mar 2023 15:19:48 +0100 (CET)
From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Tue,  7 Mar 2023 15:19:44 +0100
Message-Id: <20230307141944.2337485-1-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.30.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.003 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [proxmox.com]
Subject: [pve-devel] [PATCH kernel] add patch to fix issue with large IO
 requests
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 07 Mar 2023 14:20:21 -0000

Several people reported IO-related issues since kernel 6.1.6 [0].
Things got better with 6.1.10, but apparently the issues are not fully
resolved (e.g. [1]).

I ran into an issue with PBS backup of a VM with passed-through disks
(error with 6.1.6, hang with 6.1.10+) and found that the issue did not
occur anymore with v6.3-rc1. Bisecting what fixed the issue led to the
commit in this patch. The hope is that it fixes some other issues too.

The commit has a CC-stable tag for 5.15+, but telling from the absence
of user reports, it was much less likely to trigger before 6.1.x (it's
not clear what x is, because of the other issue in 6.1.6). The commit
says it depends on 613b14884b85 ("block: handle bio_split_to_limits()
NULL return") which is already present as a3f1c82e0413 ("block:
handle bio_split_to_limits() NULL return") in the Ubuntu tree.

[0]: https://forum.proxmox.com/threads/119483/post-530365
[1]: https://forum.proxmox.com/threads/119483/post-537991

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 ...w-multiple-bios-for-IOCB_NOWAIT-issu.patch | 69 +++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 patches/kernel/0018-block-don-t-allow-multiple-bios-for-IOCB_NOWAIT-issu.patch

diff --git a/patches/kernel/0018-block-don-t-allow-multiple-bios-for-IOCB_NOWAIT-issu.patch b/patches/kernel/0018-block-don-t-allow-multiple-bios-for-IOCB_NOWAIT-issu.patch
new file mode 100644
index 0000000..12e4453
--- /dev/null
+++ b/patches/kernel/0018-block-don-t-allow-multiple-bios-for-IOCB_NOWAIT-issu.patch
@@ -0,0 +1,69 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Jens Axboe <axboe@kernel.dk>
+Date: Mon, 16 Jan 2023 08:55:53 -0700
+Subject: [PATCH] block: don't allow multiple bios for IOCB_NOWAIT issue
+
+If we're doing a large IO request which needs to be split into multiple
+bios for issue, then we can run into the same situation as the below
+marked commit fixes - parts will complete just fine, one or more parts
+will fail to allocate a request. This will result in a partially
+completed read or write request, where the caller gets EAGAIN even though
+parts of the IO completed just fine.
+
+Do the same for large bios as we do for splits - fail a NOWAIT request
+with EAGAIN. This isn't technically fixing an issue in the below marked
+patch, but for stable purposes, we should have either none of them or
+both.
+
+This depends on: 613b14884b85 ("block: handle bio_split_to_limits() NULL return")
+
+Cc: stable@vger.kernel.org # 5.15+
+Fixes: 9cea62b2cbab ("block: don't allow splitting of a REQ_NOWAIT bio")
+Link: https://github.com/axboe/liburing/issues/766
+Reported-and-tested-by: Michael Kelley <mikelley@microsoft.com>
+Signed-off-by: Jens Axboe <axboe@kernel.dk>
+(commit 67d59247d4b52c917e373f05a807027756ab216f upstream)
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
+---
+ block/fops.c | 21 ++++++++++++++++++---
+ 1 file changed, 18 insertions(+), 3 deletions(-)
+
+diff --git a/block/fops.c b/block/fops.c
+index b90742595317..e406aa605327 100644
+--- a/block/fops.c
++++ b/block/fops.c
+@@ -221,6 +221,24 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
+ 			bio_endio(bio);
+ 			break;
+ 		}
++		if (iocb->ki_flags & IOCB_NOWAIT) {
++			/*
++			 * This is nonblocking IO, and we need to allocate
++			 * another bio if we have data left to map. As we
++			 * cannot guarantee that one of the sub bios will not
++			 * fail getting issued FOR NOWAIT and as error results
++			 * are coalesced across all of them, be safe and ask for
++			 * a retry of this from blocking context.
++			 */
++			if (unlikely(iov_iter_count(iter))) {
++				bio_release_pages(bio, false);
++				bio_clear_flag(bio, BIO_REFFED);
++				bio_put(bio);
++				blk_finish_plug(&plug);
++				return -EAGAIN;
++			}
++			bio->bi_opf |= REQ_NOWAIT;
++		}
+ 
+ 		if (is_read) {
+ 			if (dio->flags & DIO_SHOULD_DIRTY)
+@@ -228,9 +246,6 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
+ 		} else {
+ 			task_io_account_write(bio->bi_iter.bi_size);
+ 		}
+-		if (iocb->ki_flags & IOCB_NOWAIT)
+-			bio->bi_opf |= REQ_NOWAIT;
+-
+ 		dio->size += bio->bi_iter.bi_size;
+ 		pos += bio->bi_iter.bi_size;
+ 
-- 
2.30.2