From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <36db42d0-7ade-4203-a2d2-eb7df20eed72@proxmox.com>
Date: Tue, 22 Oct 2024 18:06:43 +0200
To: Fiona Ebner, Proxmox VE development discussion
References: <20241002143624.1260363-1-c.ebner@proxmox.com> <7d487881-6851-4c32-b2a2-dbb7ccdfe4e5@proxmox.com>
From: Christian Ebner
In-Reply-To: <7d487881-6851-4c32-b2a2-dbb7ccdfe4e5@proxmox.com>
Subject: Re: [pve-devel] [[PATCH kernel]] fix 5683: netfs: reset subreq iov iter before tail clean
List-Id: Proxmox VE development discussion

On 10/22/24 14:50, Fiona Ebner wrote:
> On 02.10.24 at 16:36, Christian Ebner wrote:
>> Fixes rare read corruption issues using the in-kernel ceph client.
>>
>> On incomplete read requests, the clear tail flag should make sure to
>> zero-fill the remaining bytes for the subrequest. If the iov iterator
>> is not at the correct position, however, this can zero-fill already
>> downloaded data, corrupting the read content.
>>
>> Link to issue:
>> https://bugzilla.proxmox.com/show_bug.cgi?id=5683
>>
>> Link to upstream issue:
>> https://bugzilla.kernel.org/show_bug.cgi?id=219237
>>
>> Signed-off-by: Christian Ebner
>> ---
>> This fixes the read corruption issue with my local reproducer.
>>
>> Providing a patched kernel to users affected by the issue for testing
>> would probably be the best way to verify the fix.
>>
>> Also, I reached out once again to the kernel developers asking whether
>> this fix is a valid approach, hoping it can be included in current
>> stable (as the patch also fixes the issue when applied on 6.11.1).
>>
>>  ...et-subreq-iov-iter-before-tail-clean.patch | 31 +++++++++++++++++++
>>  1 file changed, 31 insertions(+)
>>  create mode 100644 patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch
>>
>> diff --git a/patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch b/patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch
>> new file mode 100644
>> index 0000000..a87e722
>> --- /dev/null
>> +++ b/patches/kernel/0021-netfs-reset-subreq-iov-iter-before-tail-clean.patch
>> @@ -0,0 +1,31 @@
>> +From cd27abf0c555f39b12c05f9f6a8cb59ff25dfe45 Mon Sep 17 00:00:00 2001
>> +From: Christian Ebner
>> +Date: Wed, 2 Oct 2024 15:24:31 +0200
>> +Subject: [PATCH] netfs: reset subreq iov iter before tail clean
>> +
>> +Make sure the iter is at the correct location when cleaning up tail
>> +bytes for incomplete read subrequests.
>> +

> Disclaimer that I'm not familiar at all with the code.
>
> So AFAIU, after short IO, iov_iter_count() and subreq->len -
> subreq->transferred might disagree. That is why, before resubmission,
> netfs_reset_subreq_iter() is called. That function aligns the iterator
> position so it matches the information from 'subreq'.
>
> In your edge case there is no resubmission though, because the
> NETFS_SREQ_CLEAR_TAIL flag is set. But it was still short IO, so the
> mentioned mismatch happened.
>
> Now netfs_clear_unread() relies on the information from
> iov_iter_count(), which does not match the actual 'subreq'. To fix it,
> you call netfs_reset_subreq_iter() (as is done before resubmission) to
> align that information.
>
> Before commit 92b6cc5d1e7c ("netfs: Add iov_iters to (sub)requests to
> describe various buffers"), the information from the 'subreq' was used
> to set up the iterator:
>
>> diff --git a/fs/netfs/io.c b/fs/netfs/io.c
>> index 7f753380e047..e9d408e211b8 100644
>> --- a/fs/netfs/io.c
>> +++ b/fs/netfs/io.c
>> @@ -21,12 +21,7 @@
>>   */
>>  static void netfs_clear_unread(struct netfs_io_subrequest *subreq)
>>  {
>> -	struct iov_iter iter;
>> -
>> -	iov_iter_xarray(&iter, ITER_DEST, &subreq->rreq->mapping->i_pages,
>> -			subreq->start + subreq->transferred,
>> -			subreq->len - subreq->transferred);
>> -	iov_iter_zero(iov_iter_count(&iter), &iter);
>> +	iov_iter_zero(iov_iter_count(&subreq->io_iter), &subreq->io_iter);
>>  }
>
> so that sounds good :)
>
> So with and without your change, after the netfs_clear_unread() call,
> the iterator will be in the final position, i.e. iov_iter_count() == 0?
> Then the information in 'subreq' is updated manually in the same branch
> and it moves on to completion.

I don't recall the exact code paths off the top of my head, sorry. I
will have to look at it once again, but the essential point is that
iov_iter_zero() incorrectly clears out the data, which leads to the
read corruption, yes. As I too do not have an in-depth knowledge of
this code base, I was hoping for upstream to confirm the validity of
the patch.

> How far off from reality am I ;)? FWIW, the change looks okay to me, but
> again, I'm not familiar with the code and I haven't done any testing
> (and have no reproducer).
>
> Of course it would be much nicer to have some confirmation from upstream
> and/or users about this.

Agreed, unfortunately no feedback so far.
>> +Fixes: 92b6cc5d ("netfs: Add iov_iters to (sub)requests to describe various buffers")
>> +Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219237
>> +
>> +Signed-off-by: Christian Ebner
>> +---
>> + fs/netfs/io.c | 1 +
>> + 1 file changed, 1 insertion(+)
>> +
>> +diff --git a/fs/netfs/io.c b/fs/netfs/io.c
>> +index d6ada4eba744..500119285346 100644
>> +--- a/fs/netfs/io.c
>> ++++ b/fs/netfs/io.c
>> +@@ -528,6 +528,7 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
>> +
>> + incomplete:
>> +	if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) {
>> ++		netfs_reset_subreq_iter(rreq, subreq);
>> +		netfs_clear_unread(subreq);
>> +		subreq->transferred = subreq->len;
>> +		goto complete;
>> +--
>> +2.39.5
>> +

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel