all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH zfsonlinux 1/2] cherry-pick fix for data corruption
Date: Wed, 29 Nov 2023 10:27:26 +0100	[thread overview]
Message-ID: <20231129092727.862143-4-f.gruenbichler@proxmox.com> (raw)
In-Reply-To: <20231129092727.862143-1-f.gruenbichler@proxmox.com>

cherry-picked from 2.2.0-staging, fixing
https://github.com/openzfs/zfs/issues/15526

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 ...heck-dnode-and-its-data-for-dirtines.patch | 97 +++++++++++++++++++
 debian/patches/series                         |  1 +
 2 files changed, 98 insertions(+)
 create mode 100644 debian/patches/0018-dnode_is_dirty-check-dnode-and-its-data-for-dirtines.patch

diff --git a/debian/patches/0018-dnode_is_dirty-check-dnode-and-its-data-for-dirtines.patch b/debian/patches/0018-dnode_is_dirty-check-dnode-and-its-data-for-dirtines.patch
new file mode 100644
index 0000000..f79b09b
--- /dev/null
+++ b/debian/patches/0018-dnode_is_dirty-check-dnode-and-its-data-for-dirtines.patch
@@ -0,0 +1,97 @@
+From 9b9b09f452a469458451c221debfbab944e7f081 Mon Sep 17 00:00:00 2001
+From: Rob N <robn@despairlabs.com>
+Date: Wed, 29 Nov 2023 04:15:48 +1100
+Subject: [PATCH] dnode_is_dirty: check dnode and its data for dirtiness
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Over its history this the dirty dnode test has been changed between
+checking for a dnodes being on `os_dirty_dnodes` (`dn_dirty_link`) and
+`dn_dirty_record`.
+
+  de198f2d9 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
+  2531ce372 Revert "Report holes when there are only metadata changes"
+  ec4f9b8f3 Report holes when there are only metadata changes
+  454365bba Fix dirty check in dmu_offset_next()
+  66aca2473 SEEK_HOLE should not block on txg_wait_synced()
+
+Also illumos/illumos-gate@c543ec060d illumos/illumos-gate@2bcf0248e9
+
+It turns out both are actually required.
+
+In the case of appending data to a newly created file, the dnode proper
+is dirtied (at least to change the blocksize) and dirty records are
+added.  Thus, a single logical operation is represented by separate
+dirty indicators, and must not be separated.
+
+The incorrect dirty check becomes a problem when the first block of a
+file is being appended to while another process is calling lseek to skip
+holes. There is a small window where the dnode part is undirtied while
+there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
+would not know that the file is dirty, and would go to
+`dnode_next_offset()`. Since the object has no data blocks yet, it
+returns `ESRCH`, indicating no data found, which results in `ENXIO`
+being returned to `lseek()`'s caller.
+
+Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
+uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
+replicate the holes in the target. When it hits the bug, its initial
+search for data fails, and it goes on to call `fallocate()` to create a
+hole over the entire destination file.
+
+This has come up more recently as users upgrade their systems, getting
+OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
+reproduced against 2.1, as well as on FreeBSD 13 and 14.
+
+This change simply updates the dirty check to check both types of dirty.
+If there's anything dirty at all, we immediately go to the "wait for
+sync" stage, It doesn't really matter after that; both changes are on
+disk, so the dirty fields should be correct.
+
+Sponsored-by: Klara, Inc.
+Sponsored-by: Wasabi Technology, Inc.
+Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
+Reviewed-by: Alexander Motin <mav@FreeBSD.org>
+Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
+Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
+Closes #15571
+Closes #15526
+Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+---
+ module/zfs/dnode.c | 12 ++++++++++--
+ 1 file changed, 10 insertions(+), 2 deletions(-)
+
+diff --git a/module/zfs/dnode.c b/module/zfs/dnode.c
+index 7cf03264d..ad9988366 100644
+--- a/module/zfs/dnode.c
++++ b/module/zfs/dnode.c
+@@ -1764,7 +1764,14 @@ dnode_try_claim(objset_t *os, uint64_t object, int slots)
+ }
+ 
+ /*
+- * Checks if the dnode contains any uncommitted dirty records.
++ * Checks if the dnode itself is dirty, or is carrying any uncommitted records.
++ * It is important to check both conditions, as some operations (eg appending
++ * to a file) can dirty both as a single logical unit, but they are not synced
++ * out atomically, so checking one and not the other can result in an object
++ * appearing to be clean mid-way through a commit.
++ *
++ * Do not change this lightly! If you get it wrong, dmu_offset_next() can
++ * detect a hole where there is really data, leading to silent corruption.
+  */
+ boolean_t
+ dnode_is_dirty(dnode_t *dn)
+@@ -1772,7 +1779,8 @@ dnode_is_dirty(dnode_t *dn)
+ 	mutex_enter(&dn->dn_mtx);
+ 
+ 	for (int i = 0; i < TXG_SIZE; i++) {
+-		if (multilist_link_active(&dn->dn_dirty_link[i])) {
++		if (multilist_link_active(&dn->dn_dirty_link[i]) ||
++		    !list_is_empty(&dn->dn_dirty_records[i])) {
+ 			mutex_exit(&dn->dn_mtx);
+ 			return (B_TRUE);
+ 		}
+-- 
+2.39.2
+
diff --git a/debian/patches/series b/debian/patches/series
index d20a605..ac820c7 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -15,3 +15,4 @@
 0015-Fix-block-cloning-between-unencrypted-and-encrypted-.patch
 0016-Add-a-tunable-to-disable-BRT-support.patch
 0017-zfs-2.2.1-Disable-block-cloning-by-default.patch
+0018-dnode_is_dirty-check-dnode-and-its-data-for-dirtines.patch
-- 
2.39.2





  parent reply	other threads:[~2023-11-29  9:28 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-29  9:27 [pve-devel] (already applied) [PATCH zfsonlinux/kernel 0/4] Fix ZFS data corruption bug Fabian Grünbichler
2023-11-29  9:27 ` [pve-devel] [PATCH pve-kernel 1/2] update zfs to 2.2.0-pve4 Fabian Grünbichler
2023-11-29  9:27 ` [pve-devel] [PATCH pve-kernel 2/2] bump version to 6.5.11-6 Fabian Grünbichler
2023-11-29  9:27 ` Fabian Grünbichler [this message]
2023-11-29  9:27 ` [pve-devel] [PATCH zfsonlinux 2/2] bump version to 2.2.0-pve4 Fabian Grünbichler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231129092727.862143-4-f.gruenbichler@proxmox.com \
    --to=f.gruenbichler@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal