public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH zfsonlinux v2 0/2] Update to ZFS 2.2.4
@ 2024-05-07 15:02 Stoiko Ivanov
From: Stoiko Ivanov @ 2024-05-07 15:02 UTC
  To: pve-devel

v1->v2:
Patch 2/2 (adaptation of the arc_summary/arcstat patch) modified:
* right after sending the v1 I saw a report where pinning kernel 6.2 (thus
  ZFS 2.1) leads to a similar traceback - which I seem to have overlooked
  when packaging 2.2.0.
  I adapted the patch by booting a VM with kernel 6.2 and the current
  user-space, and running arc_summary and arcstat -a until no traceback
  was displayed with a single-disk pool.

original cover-letter for v1:
This patchset updates ZFS to the recently released 2.2.4 [0].

We already had about half of the patches in 2.2.3-2, due to the needed
support for kernel 6.8.

Compared to the last few 2.2 point releases, this one contains quite a few
potential performance improvements:
* for ZVOL workloads (relevant for qemu guests), multiple taskqs were
  introduced [1] - this change is active by default (the old behavior can
  be restored by explicitly setting `zvol_num_taskqs=1`)
* the interface ZFS uses to submit operations to the kernel's block layer
  was augmented to better deal with split pages [2] - which should also
  improve performance, and prevents unaligned writes that are rejected by
  e.g. the SCSI subsystem. The default remains the current (classic) code;
  setting `zfs_vdev_disk_classic=0` turns on the 'new' behavior (see the
  parameter sketch after this list)
* speculative prefetching was improved [3], which introduced new kstats
  that are reported by `arc_summary` and `arcstat`. As before with the
  MRU/MFU additions, there was no guard for running the new user-space
  with an old kernel, resulting in Python exceptions in both tools.
  I adapted the patch where Thomas fixed that back in the 2.1 release
  days (a sketch of the guard pattern follows below) - sending it as a
  separate patch for easier review - and I hope it's ok that I dropped
  the S-o-b tag (as the code changed) - glad to resend it, if this
  should be adapted.
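
A small Python helper - purely illustrative, only the parameter names are
taken from the changes above - can show which of the two tunables is
active on a running system:

    # Read the current values of the tunables mentioned above; a missing
    # file simply means the loaded zfs module does not know the parameter.
    from pathlib import Path

    def zfs_param(name, default="<not available>"):
        p = Path("/sys/module/zfs/parameters") / name
        return p.read_text().strip() if p.exists() else default

    print("zvol_num_taskqs:", zfs_param("zvol_num_taskqs"))
    print("zfs_vdev_disk_classic:", zfs_param("zfs_vdev_disk_classic"))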
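
The guard in patch 2/2 boils down to reading freshly introduced kstats
with a default instead of indexing directly - a minimal sketch (the kstat
names are from the patch, the function is heavily trimmed):

    # Kstats that only newer kernel modules export are read via dict.get()
    # with a default of 0 instead of d[...], so the new user-space tools no
    # longer raise a KeyError when running against an old kernel module.
    def calculate(d, sint):
        v = {}
        v["el2el"] = d["evict_l2_eligible"] // sint  # present on all kernels
        # only exported by newer modules - guard with a default:
        v["el2mfu"] = d.get("evict_l2_eligible_mfu", 0) // sint
        v["el2mru"] = d.get("evict_l2_eligible_mru", 0) // sint
        return v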

Minimally tested on 2 VMs (the arcstat/arc_summary changes specifically by
running an old kernel with the new user-space).


[0] https://github.com/openzfs/zfs/releases/tag/zfs-2.2.4
[1] https://github.com/openzfs/zfs/pull/15992
[2] https://github.com/openzfs/zfs/pull/15588
[3] https://github.com/openzfs/zfs/pull/16022

Stoiko Ivanov (2):
  update zfs submodule to 2.2.4 and refresh patches
  update arc_summary arcstat patch with new introduced values

 ...md-unit-for-importing-specific-pools.patch |   4 +-
 ...-move-manpage-arcstat-1-to-arcstat-8.patch |   2 +-
 ...-guard-access-to-freshly-introduced-.patch | 438 ++++++++++++
 ...-guard-access-to-l2arc-MFU-MRU-stats.patch | 113 ---
 ...hten-bounds-for-noalloc-stat-availab.patch |   4 +-
 ...rectly-handle-partition-16-and-later.patch |  52 --
 ...-use-splice_copy_file_range-for-fall.patch | 135 ----
 .../0014-linux-5.4-compat-page_size.patch     | 121 ----
 .../patches/0015-abd-add-page-iterator.patch  | 334 ---------
 ...-existing-functions-to-vdev_classic_.patch | 349 ---------
 ...v_disk-reorganise-vdev_disk_io_start.patch | 111 ---
 ...-read-write-IO-function-configurable.patch |  69 --
 ...e-BIO-filling-machinery-to-avoid-spl.patch | 671 ------------------
 ...dule-parameter-to-select-BIO-submiss.patch | 104 ---
 ...se-bio_chain-to-submit-multiple-BIOs.patch | 363 ----------
 ...on-t-use-compound-heads-on-Linux-4.5.patch |  96 ---
 ...ault-to-classic-submission-for-2.2.x.patch |  90 ---
 ...ion-caused-by-mmap-flushing-problems.patch | 104 ---
 ...touch-vbio-after-its-handed-off-to-t.patch |  57 --
 debian/patches/series                         |  16 +-
 upstream                                      |   2 +-
 21 files changed, 445 insertions(+), 2790 deletions(-)
 create mode 100644 debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
 delete mode 100644 debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
 delete mode 100644 debian/patches/0012-udev-correctly-handle-partition-16-and-later.patch
 delete mode 100644 debian/patches/0013-Linux-6.8-compat-use-splice_copy_file_range-for-fall.patch
 delete mode 100644 debian/patches/0014-linux-5.4-compat-page_size.patch
 delete mode 100644 debian/patches/0015-abd-add-page-iterator.patch
 delete mode 100644 debian/patches/0016-vdev_disk-rename-existing-functions-to-vdev_classic_.patch
 delete mode 100644 debian/patches/0017-vdev_disk-reorganise-vdev_disk_io_start.patch
 delete mode 100644 debian/patches/0018-vdev_disk-make-read-write-IO-function-configurable.patch
 delete mode 100644 debian/patches/0019-vdev_disk-rewrite-BIO-filling-machinery-to-avoid-spl.patch
 delete mode 100644 debian/patches/0020-vdev_disk-add-module-parameter-to-select-BIO-submiss.patch
 delete mode 100644 debian/patches/0021-vdev_disk-use-bio_chain-to-submit-multiple-BIOs.patch
 delete mode 100644 debian/patches/0022-abd_iter_page-don-t-use-compound-heads-on-Linux-4.5.patch
 delete mode 100644 debian/patches/0023-vdev_disk-default-to-classic-submission-for-2.2.x.patch
 delete mode 100644 debian/patches/0024-Fix-corruption-caused-by-mmap-flushing-problems.patch
 delete mode 100644 debian/patches/0025-vdev_disk-don-t-touch-vbio-after-its-handed-off-to-t.patch

-- 
2.39.2




* [pve-devel] [PATCH zfsonlinux v2 1/2] update zfs submodule to 2.2.4 and refresh patches
From: Stoiko Ivanov @ 2024-05-07 15:02 UTC
  To: pve-devel

Mostly: drop all patches we had queued up to get kernel 6.8 supported, as
they are included in the 2.2.4 release.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 ...md-unit-for-importing-specific-pools.patch |   4 +-
 ...-move-manpage-arcstat-1-to-arcstat-8.patch |   2 +-
 ...-guard-access-to-l2arc-MFU-MRU-stats.patch |  12 +-
 ...hten-bounds-for-noalloc-stat-availab.patch |   4 +-
 ...rectly-handle-partition-16-and-later.patch |  52 --
 ...-use-splice_copy_file_range-for-fall.patch | 135 ----
 .../0014-linux-5.4-compat-page_size.patch     | 121 ----
 .../patches/0015-abd-add-page-iterator.patch  | 334 ---------
 ...-existing-functions-to-vdev_classic_.patch | 349 ---------
 ...v_disk-reorganise-vdev_disk_io_start.patch | 111 ---
 ...-read-write-IO-function-configurable.patch |  69 --
 ...e-BIO-filling-machinery-to-avoid-spl.patch | 671 ------------------
 ...dule-parameter-to-select-BIO-submiss.patch | 104 ---
 ...se-bio_chain-to-submit-multiple-BIOs.patch | 363 ----------
 ...on-t-use-compound-heads-on-Linux-4.5.patch |  96 ---
 ...ault-to-classic-submission-for-2.2.x.patch |  90 ---
 ...ion-caused-by-mmap-flushing-problems.patch | 104 ---
 ...touch-vbio-after-its-handed-off-to-t.patch |  57 --
 debian/patches/series                         |  14 -
 upstream                                      |   2 +-
 20 files changed, 12 insertions(+), 2682 deletions(-)
 delete mode 100644 debian/patches/0012-udev-correctly-handle-partition-16-and-later.patch
 delete mode 100644 debian/patches/0013-Linux-6.8-compat-use-splice_copy_file_range-for-fall.patch
 delete mode 100644 debian/patches/0014-linux-5.4-compat-page_size.patch
 delete mode 100644 debian/patches/0015-abd-add-page-iterator.patch
 delete mode 100644 debian/patches/0016-vdev_disk-rename-existing-functions-to-vdev_classic_.patch
 delete mode 100644 debian/patches/0017-vdev_disk-reorganise-vdev_disk_io_start.patch
 delete mode 100644 debian/patches/0018-vdev_disk-make-read-write-IO-function-configurable.patch
 delete mode 100644 debian/patches/0019-vdev_disk-rewrite-BIO-filling-machinery-to-avoid-spl.patch
 delete mode 100644 debian/patches/0020-vdev_disk-add-module-parameter-to-select-BIO-submiss.patch
 delete mode 100644 debian/patches/0021-vdev_disk-use-bio_chain-to-submit-multiple-BIOs.patch
 delete mode 100644 debian/patches/0022-abd_iter_page-don-t-use-compound-heads-on-Linux-4.5.patch
 delete mode 100644 debian/patches/0023-vdev_disk-default-to-classic-submission-for-2.2.x.patch
 delete mode 100644 debian/patches/0024-Fix-corruption-caused-by-mmap-flushing-problems.patch
 delete mode 100644 debian/patches/0025-vdev_disk-don-t-touch-vbio-after-its-handed-off-to-t.patch

diff --git a/debian/patches/0007-Add-systemd-unit-for-importing-specific-pools.patch b/debian/patches/0007-Add-systemd-unit-for-importing-specific-pools.patch
index 8232978c..0600296f 100644
--- a/debian/patches/0007-Add-systemd-unit-for-importing-specific-pools.patch
+++ b/debian/patches/0007-Add-systemd-unit-for-importing-specific-pools.patch
@@ -18,7 +18,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
 ---
  etc/Makefile.am                           |  1 +
  etc/systemd/system/50-zfs.preset          |  1 +
- etc/systemd/system/zfs-import@.service.in | 18 ++++++++++++++++
+ etc/systemd/system/zfs-import@.service.in | 18 ++++++++++++++++++
  3 files changed, 20 insertions(+)
  create mode 100644 etc/systemd/system/zfs-import@.service.in
 
@@ -48,7 +48,7 @@ index e4056a92c..030611419 100644
  enable zfs-share.service
 diff --git a/etc/systemd/system/zfs-import@.service.in b/etc/systemd/system/zfs-import@.service.in
 new file mode 100644
-index 000000000..9b4ee9371
+index 000000000..5bd19fb79
 --- /dev/null
 +++ b/etc/systemd/system/zfs-import@.service.in
 @@ -0,0 +1,18 @@
diff --git a/debian/patches/0008-Patch-move-manpage-arcstat-1-to-arcstat-8.patch b/debian/patches/0008-Patch-move-manpage-arcstat-1-to-arcstat-8.patch
index c11c1ae8..9a4aea56 100644
--- a/debian/patches/0008-Patch-move-manpage-arcstat-1-to-arcstat-8.patch
+++ b/debian/patches/0008-Patch-move-manpage-arcstat-1-to-arcstat-8.patch
@@ -15,7 +15,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
  rename man/{man1/arcstat.1 => man8/arcstat.8} (99%)
 
 diff --git a/man/Makefile.am b/man/Makefile.am
-index 45156571e..3713e9371 100644
+index 43bb014dd..a9293468a 100644
 --- a/man/Makefile.am
 +++ b/man/Makefile.am
 @@ -2,7 +2,6 @@ dist_noinst_man_MANS = \
diff --git a/debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch b/debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
index f8cb3539..2e7c207d 100644
--- a/debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
+++ b/debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
@@ -27,7 +27,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
  2 files changed, 21 insertions(+), 21 deletions(-)
 
 diff --git a/cmd/arc_summary b/cmd/arc_summary
-index 9c69ec4f8..edf94ea2a 100755
+index 100fb1987..86b2260a1 100755
 --- a/cmd/arc_summary
 +++ b/cmd/arc_summary
 @@ -655,13 +655,13 @@ def section_arc(kstats_dict):
@@ -48,7 +48,7 @@ index 9c69ec4f8..edf94ea2a 100755
      prt_i1('L2 ineligible evictions:',
             f_bytes(arc_stats['evict_l2_ineligible']))
      print()
-@@ -851,20 +851,20 @@ def section_l2arc(kstats_dict):
+@@ -860,20 +860,20 @@ def section_l2arc(kstats_dict):
             f_perc(arc_stats['l2_hdr_size'], arc_stats['l2_size']),
             f_bytes(arc_stats['l2_hdr_size']))
      prt_i2('MFU allocated size:',
@@ -80,10 +80,10 @@ index 9c69ec4f8..edf94ea2a 100755
      print()
      prt_1('L2ARC breakdown:', f_hits(l2_access_total))
 diff --git a/cmd/arcstat.in b/cmd/arcstat.in
-index 8df1c62f7..833348d0e 100755
+index c4f10a1d6..c570dca88 100755
 --- a/cmd/arcstat.in
 +++ b/cmd/arcstat.in
-@@ -565,8 +565,8 @@ def calculate():
+@@ -597,8 +597,8 @@ def calculate():
      v["el2skip"] = d["evict_l2_skip"] // sint
      v["el2cach"] = d["evict_l2_cached"] // sint
      v["el2el"] = d["evict_l2_eligible"] // sint
@@ -93,8 +93,8 @@ index 8df1c62f7..833348d0e 100755
 +    v["el2mru"] = d.get("evict_l2_eligible_mru", 0) // sint
      v["el2inel"] = d["evict_l2_ineligible"] // sint
      v["mtxmis"] = d["mutex_miss"] // sint
- 
-@@ -581,11 +581,11 @@ def calculate():
+     v["ztotal"] = (d["zfetch_hits"] + d["zfetch_future"] + d["zfetch_stride"] +
+@@ -624,11 +624,11 @@ def calculate():
          v["l2size"] = cur["l2_size"]
          v["l2bytes"] = d["l2_read_bytes"] // sint
  
diff --git a/debian/patches/0011-zpool-status-tighten-bounds-for-noalloc-stat-availab.patch b/debian/patches/0011-zpool-status-tighten-bounds-for-noalloc-stat-availab.patch
index 3c87b0cb..29c7f9ab 100644
--- a/debian/patches/0011-zpool-status-tighten-bounds-for-noalloc-stat-availab.patch
+++ b/debian/patches/0011-zpool-status-tighten-bounds-for-noalloc-stat-availab.patch
@@ -51,10 +51,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/cmd/zpool/zpool_main.c b/cmd/zpool/zpool_main.c
-index 69bf9649a..fd42ce7c1 100644
+index ed0b8d7a1..f3acc49d0 100644
 --- a/cmd/zpool/zpool_main.c
 +++ b/cmd/zpool/zpool_main.c
-@@ -2616,7 +2616,8 @@ print_status_config(zpool_handle_t *zhp, status_cbdata_t *cb, const char *name,
+@@ -2663,7 +2663,8 @@ print_status_config(zpool_handle_t *zhp, status_cbdata_t *cb, const char *name,
  
  	if (vs->vs_scan_removing != 0) {
  		(void) printf(gettext("  (removing)"));
diff --git a/debian/patches/0012-udev-correctly-handle-partition-16-and-later.patch b/debian/patches/0012-udev-correctly-handle-partition-16-and-later.patch
deleted file mode 100644
index 578b74bd..00000000
--- a/debian/patches/0012-udev-correctly-handle-partition-16-and-later.patch
+++ /dev/null
@@ -1,52 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Fabian=20Gr=C3=BCnbichler?= <f.gruenbichler@proxmox.com>
-Date: Wed, 6 Mar 2024 10:39:06 +0100
-Subject: [PATCH] udev: correctly handle partition #16 and later
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-If a zvol has more than 15 partitions, the minor device number exhausts
-the slot count reserved for partitions next to the zvol itself. As a
-result, the minor number cannot be used to determine the partition
-number for the higher partition, and doing so results in wrong named
-symlinks being generated by udev.
-
-Since the partition number is encoded in the block device name anyway,
-let's just extract it from there instead.
-
-Fixes: #15904
-
-Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
-Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
----
- udev/zvol_id.c | 9 +++++----
- 1 file changed, 5 insertions(+), 4 deletions(-)
-
-diff --git a/udev/zvol_id.c b/udev/zvol_id.c
-index 5960b9787..609349594 100644
---- a/udev/zvol_id.c
-+++ b/udev/zvol_id.c
-@@ -51,7 +51,7 @@ const char *__asan_default_options(void) {
- int
- main(int argc, const char *const *argv)
- {
--	if (argc != 2) {
-+	if (argc != 2 || strncmp(argv[1], "/dev/zd", 7) != 0) {
- 		fprintf(stderr, "usage: %s /dev/zdX\n", argv[0]);
- 		return (1);
- 	}
-@@ -72,9 +72,10 @@ main(int argc, const char *const *argv)
- 		return (1);
- 	}
- 
--	unsigned int dev_part = minor(sb.st_rdev) % ZVOL_MINORS;
--	if (dev_part != 0)
--		sprintf(zvol_name + strlen(zvol_name), "-part%u", dev_part);
-+	const char *dev_part = strrchr(dev_name, 'p');
-+	if (dev_part != NULL) {
-+		sprintf(zvol_name + strlen(zvol_name), "-part%s", dev_part + 1);
-+	}
- 
- 	for (size_t i = 0; i < strlen(zvol_name); ++i)
- 		if (isblank(zvol_name[i]))
diff --git a/debian/patches/0013-Linux-6.8-compat-use-splice_copy_file_range-for-fall.patch b/debian/patches/0013-Linux-6.8-compat-use-splice_copy_file_range-for-fall.patch
deleted file mode 100644
index 380d77c9..00000000
--- a/debian/patches/0013-Linux-6.8-compat-use-splice_copy_file_range-for-fall.patch
+++ /dev/null
@@ -1,135 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob N <robn@despairlabs.com>
-Date: Thu, 21 Mar 2024 10:46:15 +1100
-Subject: [PATCH] Linux 6.8 compat: use splice_copy_file_range() for fallback
-
-Linux 6.8 removes generic_copy_file_range(), which had been reduced to a
-simple wrapper around splice_copy_file_range(). Detect that function
-directly and use it if generic_ is not available.
-
-Sponsored-by: https://despairlabs.com/sponsor/
-Reviewed-by: Tony Hutter <hutter2@llnl.gov>
-Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Rob Norris <robn@despairlabs.com>
-Closes #15930
-Closes #15931
-(cherry picked from commit ef08a4d4065d21414d7fedccac20da6bfda4dfd0)
----
- config/kernel-vfs-file_range.m4      | 27 +++++++++++++++++++++++++++
- config/kernel.m4                     |  2 ++
- module/os/linux/zfs/zpl_file_range.c | 16 ++++++++++++++--
- 3 files changed, 43 insertions(+), 2 deletions(-)
-
-diff --git a/config/kernel-vfs-file_range.m4 b/config/kernel-vfs-file_range.m4
-index cc96404d8..8a5cbe2ee 100644
---- a/config/kernel-vfs-file_range.m4
-+++ b/config/kernel-vfs-file_range.m4
-@@ -16,6 +16,9 @@ dnl #
- dnl # 5.3: VFS copy_file_range() expected to do its own fallback,
- dnl #      generic_copy_file_range() added to support it
- dnl #
-+dnl # 6.8: generic_copy_file_range() removed, replaced by
-+dnl #      splice_copy_file_range()
-+dnl #
- AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_COPY_FILE_RANGE], [
- 	ZFS_LINUX_TEST_SRC([vfs_copy_file_range], [
- 		#include <linux/fs.h>
-@@ -72,6 +75,30 @@ AC_DEFUN([ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE], [
- 	])
- ])
- 
-+AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_SPLICE_COPY_FILE_RANGE], [
-+	ZFS_LINUX_TEST_SRC([splice_copy_file_range], [
-+		#include <linux/splice.h>
-+	], [
-+		struct file *src_file __attribute__ ((unused)) = NULL;
-+		loff_t src_off __attribute__ ((unused)) = 0;
-+		struct file *dst_file __attribute__ ((unused)) = NULL;
-+		loff_t dst_off __attribute__ ((unused)) = 0;
-+		size_t len __attribute__ ((unused)) = 0;
-+		splice_copy_file_range(src_file, src_off, dst_file, dst_off,
-+		    len);
-+	])
-+])
-+AC_DEFUN([ZFS_AC_KERNEL_VFS_SPLICE_COPY_FILE_RANGE], [
-+	AC_MSG_CHECKING([whether splice_copy_file_range() is available])
-+	ZFS_LINUX_TEST_RESULT([splice_copy_file_range], [
-+		AC_MSG_RESULT(yes)
-+		AC_DEFINE(HAVE_VFS_SPLICE_COPY_FILE_RANGE, 1,
-+		    [splice_copy_file_range() is available])
-+	],[
-+		AC_MSG_RESULT(no)
-+	])
-+])
-+
- AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_CLONE_FILE_RANGE], [
- 	ZFS_LINUX_TEST_SRC([vfs_clone_file_range], [
- 		#include <linux/fs.h>
-diff --git a/config/kernel.m4 b/config/kernel.m4
-index e3f864577..1d0c5a27f 100644
---- a/config/kernel.m4
-+++ b/config/kernel.m4
-@@ -118,6 +118,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
- 	ZFS_AC_KERNEL_SRC_VFS_IOV_ITER
- 	ZFS_AC_KERNEL_SRC_VFS_COPY_FILE_RANGE
- 	ZFS_AC_KERNEL_SRC_VFS_GENERIC_COPY_FILE_RANGE
-+	ZFS_AC_KERNEL_SRC_VFS_SPLICE_COPY_FILE_RANGE
- 	ZFS_AC_KERNEL_SRC_VFS_REMAP_FILE_RANGE
- 	ZFS_AC_KERNEL_SRC_VFS_CLONE_FILE_RANGE
- 	ZFS_AC_KERNEL_SRC_VFS_DEDUPE_FILE_RANGE
-@@ -266,6 +267,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
- 	ZFS_AC_KERNEL_VFS_IOV_ITER
- 	ZFS_AC_KERNEL_VFS_COPY_FILE_RANGE
- 	ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE
-+	ZFS_AC_KERNEL_VFS_SPLICE_COPY_FILE_RANGE
- 	ZFS_AC_KERNEL_VFS_REMAP_FILE_RANGE
- 	ZFS_AC_KERNEL_VFS_CLONE_FILE_RANGE
- 	ZFS_AC_KERNEL_VFS_DEDUPE_FILE_RANGE
-diff --git a/module/os/linux/zfs/zpl_file_range.c b/module/os/linux/zfs/zpl_file_range.c
-index 3065d54fa..64728fdb1 100644
---- a/module/os/linux/zfs/zpl_file_range.c
-+++ b/module/os/linux/zfs/zpl_file_range.c
-@@ -26,6 +26,9 @@
- #include <linux/compat.h>
- #endif
- #include <linux/fs.h>
-+#ifdef HAVE_VFS_SPLICE_COPY_FILE_RANGE
-+#include <linux/splice.h>
-+#endif
- #include <sys/file.h>
- #include <sys/zfs_znode.h>
- #include <sys/zfs_vnops.h>
-@@ -102,7 +105,7 @@ zpl_copy_file_range(struct file *src_file, loff_t src_off,
- 	ret = zpl_clone_file_range_impl(src_file, src_off,
- 	    dst_file, dst_off, len);
- 
--#ifdef HAVE_VFS_GENERIC_COPY_FILE_RANGE
-+#if defined(HAVE_VFS_GENERIC_COPY_FILE_RANGE)
- 	/*
- 	 * Since Linux 5.3 the filesystem driver is responsible for executing
- 	 * an appropriate fallback, and a generic fallback function is provided.
-@@ -111,6 +114,15 @@ zpl_copy_file_range(struct file *src_file, loff_t src_off,
- 	    ret == -EAGAIN)
- 		ret = generic_copy_file_range(src_file, src_off, dst_file,
- 		    dst_off, len, flags);
-+#elif defined(HAVE_VFS_SPLICE_COPY_FILE_RANGE)
-+	/*
-+	 * Since 6.8 the fallback function is called splice_copy_file_range
-+	 * and has a slightly different signature.
-+	 */
-+	if (ret == -EOPNOTSUPP || ret == -EINVAL || ret == -EXDEV ||
-+	    ret == -EAGAIN)
-+		ret = splice_copy_file_range(src_file, src_off, dst_file,
-+		    dst_off, len);
- #else
- 	/*
- 	 * Before Linux 5.3 the filesystem has to return -EOPNOTSUPP to signal
-@@ -118,7 +130,7 @@ zpl_copy_file_range(struct file *src_file, loff_t src_off,
- 	 */
- 	if (ret == -EINVAL || ret == -EXDEV || ret == -EAGAIN)
- 		ret = -EOPNOTSUPP;
--#endif /* HAVE_VFS_GENERIC_COPY_FILE_RANGE */
-+#endif /* HAVE_VFS_GENERIC_COPY_FILE_RANGE || HAVE_VFS_SPLICE_COPY_FILE_RANGE */
- 
- 	return (ret);
- }
diff --git a/debian/patches/0014-linux-5.4-compat-page_size.patch b/debian/patches/0014-linux-5.4-compat-page_size.patch
deleted file mode 100644
index 258c025d..00000000
--- a/debian/patches/0014-linux-5.4-compat-page_size.patch
+++ /dev/null
@@ -1,121 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Mon, 13 Nov 2023 17:55:29 +1100
-Subject: [PATCH] linux 5.4 compat: page_size()
-
-Before 5.4 we have to do a little math.
-
-Reviewed-by: Alexander Motin <mav@FreeBSD.org>
-Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
-Closes #15533
-Closes #15588
-(cherry picked from commit df04efe321a49c650f1fbaa6fd701fa2928cbe21)
----
- config/kernel-mm-page-size.m4             | 17 +++++++++++
- config/kernel.m4                          |  2 ++
- include/os/linux/Makefile.am              |  1 +
- include/os/linux/kernel/linux/mm_compat.h | 36 +++++++++++++++++++++++
- 4 files changed, 56 insertions(+)
- create mode 100644 config/kernel-mm-page-size.m4
- create mode 100644 include/os/linux/kernel/linux/mm_compat.h
-
-diff --git a/config/kernel-mm-page-size.m4 b/config/kernel-mm-page-size.m4
-new file mode 100644
-index 000000000..d5ebd9269
---- /dev/null
-+++ b/config/kernel-mm-page-size.m4
-@@ -0,0 +1,17 @@
-+AC_DEFUN([ZFS_AC_KERNEL_SRC_MM_PAGE_SIZE], [
-+	ZFS_LINUX_TEST_SRC([page_size], [
-+		#include <linux/mm.h>
-+	],[
-+		unsigned long s;
-+		s = page_size(NULL);
-+	])
-+])
-+AC_DEFUN([ZFS_AC_KERNEL_MM_PAGE_SIZE], [
-+	AC_MSG_CHECKING([whether page_size() is available])
-+	ZFS_LINUX_TEST_RESULT([page_size], [
-+		AC_MSG_RESULT(yes)
-+		AC_DEFINE(HAVE_MM_PAGE_SIZE, 1, [page_size() is available])
-+	],[
-+		AC_MSG_RESULT(no)
-+	])
-+])
-diff --git a/config/kernel.m4 b/config/kernel.m4
-index 1d0c5a27f..548905ccd 100644
---- a/config/kernel.m4
-+++ b/config/kernel.m4
-@@ -167,6 +167,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
- 	ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_TABLE
- 	ZFS_AC_KERNEL_SRC_COPY_SPLICE_READ
- 	ZFS_AC_KERNEL_SRC_SYNC_BDEV
-+	ZFS_AC_KERNEL_SRC_MM_PAGE_SIZE
- 	case "$host_cpu" in
- 		powerpc*)
- 			ZFS_AC_KERNEL_SRC_CPU_HAS_FEATURE
-@@ -316,6 +317,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
- 	ZFS_AC_KERNEL_REGISTER_SYSCTL_TABLE
- 	ZFS_AC_KERNEL_COPY_SPLICE_READ
- 	ZFS_AC_KERNEL_SYNC_BDEV
-+	ZFS_AC_KERNEL_MM_PAGE_SIZE
- 	case "$host_cpu" in
- 		powerpc*)
- 			ZFS_AC_KERNEL_CPU_HAS_FEATURE
-diff --git a/include/os/linux/Makefile.am b/include/os/linux/Makefile.am
-index 3830d198d..51c27132b 100644
---- a/include/os/linux/Makefile.am
-+++ b/include/os/linux/Makefile.am
-@@ -5,6 +5,7 @@ kernel_linux_HEADERS = \
- 	%D%/kernel/linux/compiler_compat.h \
- 	%D%/kernel/linux/dcache_compat.h \
- 	%D%/kernel/linux/kmap_compat.h \
-+	%D%/kernel/linux/mm_compat.h \
- 	%D%/kernel/linux/mod_compat.h \
- 	%D%/kernel/linux/page_compat.h \
- 	%D%/kernel/linux/percpu_compat.h \
-diff --git a/include/os/linux/kernel/linux/mm_compat.h b/include/os/linux/kernel/linux/mm_compat.h
-new file mode 100644
-index 000000000..40056c68d
---- /dev/null
-+++ b/include/os/linux/kernel/linux/mm_compat.h
-@@ -0,0 +1,36 @@
-+/*
-+ * CDDL HEADER START
-+ *
-+ * The contents of this file are subject to the terms of the
-+ * Common Development and Distribution License (the "License").
-+ * You may not use this file except in compliance with the License.
-+ *
-+ * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
-+ * or https://opensource.org/licenses/CDDL-1.0.
-+ * See the License for the specific language governing permissions
-+ * and limitations under the License.
-+ *
-+ * When distributing Covered Code, include this CDDL HEADER in each
-+ * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
-+ * If applicable, add the following below this CDDL HEADER, with the
-+ * fields enclosed by brackets "[]" replaced with your own identifying
-+ * information: Portions Copyright [yyyy] [name of copyright owner]
-+ *
-+ * CDDL HEADER END
-+ */
-+
-+/*
-+ * Copyright (c) 2023, 2024, Klara Inc.
-+ */
-+
-+#ifndef _ZFS_MM_COMPAT_H
-+#define	_ZFS_MM_COMPAT_H
-+
-+#include <linux/mm.h>
-+
-+/* 5.4 introduced page_size(). Older kernels can use a trivial macro instead */
-+#ifndef HAVE_MM_PAGE_SIZE
-+#define	page_size(p) ((unsigned long)(PAGE_SIZE << compound_order(p)))
-+#endif
-+
-+#endif /* _ZFS_MM_COMPAT_H */
diff --git a/debian/patches/0015-abd-add-page-iterator.patch b/debian/patches/0015-abd-add-page-iterator.patch
deleted file mode 100644
index bb91ea32..00000000
--- a/debian/patches/0015-abd-add-page-iterator.patch
+++ /dev/null
@@ -1,334 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Mon, 11 Dec 2023 16:05:54 +1100
-Subject: [PATCH] abd: add page iterator
-
-The regular ABD iterators yield data buffers, so they have to map and
-unmap pages into kernel memory. If the caller only wants to count
-chunks, or can use page pointers directly, then the map/unmap is just
-unnecessary overhead.
-
-This adds adb_iterate_page_func, which yields unmapped struct page
-instead.
-
-Reviewed-by: Alexander Motin <mav@FreeBSD.org>
-Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
-Closes #15533
-Closes #15588
-(cherry picked from commit 390b448726c580999dd337be7a40b0e95cf1d50b)
----
- include/sys/abd.h              |   7 +++
- include/sys/abd_impl.h         |  26 ++++++++-
- module/os/freebsd/zfs/abd_os.c |   4 +-
- module/os/linux/zfs/abd_os.c   | 104 ++++++++++++++++++++++++++++++---
- module/zfs/abd.c               |  42 +++++++++++++
- 5 files changed, 169 insertions(+), 14 deletions(-)
-
-diff --git a/include/sys/abd.h b/include/sys/abd.h
-index 750f9986c..8a2df0bca 100644
---- a/include/sys/abd.h
-+++ b/include/sys/abd.h
-@@ -79,6 +79,9 @@ typedef struct abd {
- 
- typedef int abd_iter_func_t(void *buf, size_t len, void *priv);
- typedef int abd_iter_func2_t(void *bufa, void *bufb, size_t len, void *priv);
-+#if defined(__linux__) && defined(_KERNEL)
-+typedef int abd_iter_page_func_t(struct page *, size_t, size_t, void *);
-+#endif
- 
- extern int zfs_abd_scatter_enabled;
- 
-@@ -125,6 +128,10 @@ void abd_release_ownership_of_buf(abd_t *);
- int abd_iterate_func(abd_t *, size_t, size_t, abd_iter_func_t *, void *);
- int abd_iterate_func2(abd_t *, abd_t *, size_t, size_t, size_t,
-     abd_iter_func2_t *, void *);
-+#if defined(__linux__) && defined(_KERNEL)
-+int abd_iterate_page_func(abd_t *, size_t, size_t, abd_iter_page_func_t *,
-+    void *);
-+#endif
- void abd_copy_off(abd_t *, abd_t *, size_t, size_t, size_t);
- void abd_copy_from_buf_off(abd_t *, const void *, size_t, size_t);
- void abd_copy_to_buf_off(void *, abd_t *, size_t, size_t);
-diff --git a/include/sys/abd_impl.h b/include/sys/abd_impl.h
-index 40546d4af..f88ea25e2 100644
---- a/include/sys/abd_impl.h
-+++ b/include/sys/abd_impl.h
-@@ -21,6 +21,7 @@
- /*
-  * Copyright (c) 2014 by Chunwei Chen. All rights reserved.
-  * Copyright (c) 2016, 2019 by Delphix. All rights reserved.
-+ * Copyright (c) 2023, 2024, Klara Inc.
-  */
- 
- #ifndef _ABD_IMPL_H
-@@ -38,12 +39,30 @@ typedef enum abd_stats_op {
- 	ABDSTAT_DECR  /* Decrease abdstat values */
- } abd_stats_op_t;
- 
--struct scatterlist; /* forward declaration */
-+/* forward declarations */
-+struct scatterlist;
-+struct page;
- 
- struct abd_iter {
- 	/* public interface */
--	void		*iter_mapaddr;	/* addr corresponding to iter_pos */
--	size_t		iter_mapsize;	/* length of data valid at mapaddr */
-+	union {
-+		/* for abd_iter_map()/abd_iter_unmap() */
-+		struct {
-+			/* addr corresponding to iter_pos */
-+			void		*iter_mapaddr;
-+			/* length of data valid at mapaddr */
-+			size_t		iter_mapsize;
-+		};
-+		/* for abd_iter_page() */
-+		struct {
-+			/* current page */
-+			struct page	*iter_page;
-+			/* offset of data in page */
-+			size_t		iter_page_doff;
-+			/* size of data in page */
-+			size_t		iter_page_dsize;
-+		};
-+	};
- 
- 	/* private */
- 	abd_t		*iter_abd;	/* ABD being iterated through */
-@@ -78,6 +97,7 @@ boolean_t abd_iter_at_end(struct abd_iter *);
- void abd_iter_advance(struct abd_iter *, size_t);
- void abd_iter_map(struct abd_iter *);
- void abd_iter_unmap(struct abd_iter *);
-+void abd_iter_page(struct abd_iter *);
- 
- /*
-  * Helper macros
-diff --git a/module/os/freebsd/zfs/abd_os.c b/module/os/freebsd/zfs/abd_os.c
-index 58a37df62..3b812271f 100644
---- a/module/os/freebsd/zfs/abd_os.c
-+++ b/module/os/freebsd/zfs/abd_os.c
-@@ -417,10 +417,8 @@ abd_iter_init(struct abd_iter *aiter, abd_t *abd)
- {
- 	ASSERT(!abd_is_gang(abd));
- 	abd_verify(abd);
-+	memset(aiter, 0, sizeof (struct abd_iter));
- 	aiter->iter_abd = abd;
--	aiter->iter_pos = 0;
--	aiter->iter_mapaddr = NULL;
--	aiter->iter_mapsize = 0;
- }
- 
- /*
-diff --git a/module/os/linux/zfs/abd_os.c b/module/os/linux/zfs/abd_os.c
-index 24390fbbf..dae128012 100644
---- a/module/os/linux/zfs/abd_os.c
-+++ b/module/os/linux/zfs/abd_os.c
-@@ -21,6 +21,7 @@
- /*
-  * Copyright (c) 2014 by Chunwei Chen. All rights reserved.
-  * Copyright (c) 2019 by Delphix. All rights reserved.
-+ * Copyright (c) 2023, 2024, Klara Inc.
-  */
- 
- /*
-@@ -59,6 +60,7 @@
- #include <sys/zfs_znode.h>
- #ifdef _KERNEL
- #include <linux/kmap_compat.h>
-+#include <linux/mm_compat.h>
- #include <linux/scatterlist.h>
- #endif
- 
-@@ -895,14 +897,9 @@ abd_iter_init(struct abd_iter *aiter, abd_t *abd)
- {
- 	ASSERT(!abd_is_gang(abd));
- 	abd_verify(abd);
-+	memset(aiter, 0, sizeof (struct abd_iter));
- 	aiter->iter_abd = abd;
--	aiter->iter_mapaddr = NULL;
--	aiter->iter_mapsize = 0;
--	aiter->iter_pos = 0;
--	if (abd_is_linear(abd)) {
--		aiter->iter_offset = 0;
--		aiter->iter_sg = NULL;
--	} else {
-+	if (!abd_is_linear(abd)) {
- 		aiter->iter_offset = ABD_SCATTER(abd).abd_offset;
- 		aiter->iter_sg = ABD_SCATTER(abd).abd_sgl;
- 	}
-@@ -915,6 +912,7 @@ abd_iter_init(struct abd_iter *aiter, abd_t *abd)
- boolean_t
- abd_iter_at_end(struct abd_iter *aiter)
- {
-+	ASSERT3U(aiter->iter_pos, <=, aiter->iter_abd->abd_size);
- 	return (aiter->iter_pos == aiter->iter_abd->abd_size);
- }
- 
-@@ -926,8 +924,15 @@ abd_iter_at_end(struct abd_iter *aiter)
- void
- abd_iter_advance(struct abd_iter *aiter, size_t amount)
- {
-+	/*
-+	 * Ensure that last chunk is not in use. abd_iterate_*() must clear
-+	 * this state (directly or abd_iter_unmap()) before advancing.
-+	 */
- 	ASSERT3P(aiter->iter_mapaddr, ==, NULL);
- 	ASSERT0(aiter->iter_mapsize);
-+	ASSERT3P(aiter->iter_page, ==, NULL);
-+	ASSERT0(aiter->iter_page_doff);
-+	ASSERT0(aiter->iter_page_dsize);
- 
- 	/* There's nothing left to advance to, so do nothing */
- 	if (abd_iter_at_end(aiter))
-@@ -1009,6 +1014,88 @@ abd_cache_reap_now(void)
- }
- 
- #if defined(_KERNEL)
-+/*
-+ * Yield the next page struct and data offset and size within it, without
-+ * mapping it into the address space.
-+ */
-+void
-+abd_iter_page(struct abd_iter *aiter)
-+{
-+	if (abd_iter_at_end(aiter)) {
-+		aiter->iter_page = NULL;
-+		aiter->iter_page_doff = 0;
-+		aiter->iter_page_dsize = 0;
-+		return;
-+	}
-+
-+	struct page *page;
-+	size_t doff, dsize;
-+
-+	if (abd_is_linear(aiter->iter_abd)) {
-+		ASSERT3U(aiter->iter_pos, ==, aiter->iter_offset);
-+
-+		/* memory address at iter_pos */
-+		void *paddr = ABD_LINEAR_BUF(aiter->iter_abd) + aiter->iter_pos;
-+
-+		/* struct page for address */
-+		page = is_vmalloc_addr(paddr) ?
-+		    vmalloc_to_page(paddr) : virt_to_page(paddr);
-+
-+		/* offset of address within the page */
-+		doff = offset_in_page(paddr);
-+
-+		/* total data remaining in abd from this position */
-+		dsize = aiter->iter_abd->abd_size - aiter->iter_offset;
-+	} else {
-+		ASSERT(!abd_is_gang(aiter->iter_abd));
-+
-+		/* current scatter page */
-+		page = sg_page(aiter->iter_sg);
-+
-+		/* position within page */
-+		doff = aiter->iter_offset;
-+
-+		/* remaining data in scatterlist */
-+		dsize = MIN(aiter->iter_sg->length - aiter->iter_offset,
-+		    aiter->iter_abd->abd_size - aiter->iter_pos);
-+	}
-+	ASSERT(page);
-+
-+	if (PageTail(page)) {
-+		/*
-+		 * This page is part of a "compound page", which is a group of
-+		 * pages that can be referenced from a single struct page *.
-+		 * Its organised as a "head" page, followed by a series of
-+		 * "tail" pages.
-+		 *
-+		 * In OpenZFS, compound pages are allocated using the
-+		 * __GFP_COMP flag, which we get from scatter ABDs and SPL
-+		 * vmalloc slabs (ie >16K allocations). So a great many of the
-+		 * IO buffers we get are going to be of this type.
-+		 *
-+		 * The tail pages are just regular PAGE_SIZE pages, and can be
-+		 * safely used as-is. However, the head page has length
-+		 * covering itself and all the tail pages. If this ABD chunk
-+		 * spans multiple pages, then we can use the head page and a
-+		 * >PAGE_SIZE length, which is far more efficient.
-+		 *
-+		 * To do this, we need to adjust the offset to be counted from
-+		 * the head page. struct page for compound pages are stored
-+		 * contiguously, so we can just adjust by a simple offset.
-+		 */
-+		struct page *head = compound_head(page);
-+		doff += ((page - head) * PAGESIZE);
-+		page = head;
-+	}
-+
-+	/* final page and position within it */
-+	aiter->iter_page = page;
-+	aiter->iter_page_doff = doff;
-+
-+	/* amount of data in the chunk, up to the end of the page */
-+	aiter->iter_page_dsize = MIN(dsize, page_size(page) - doff);
-+}
-+
- /*
-  * bio_nr_pages for ABD.
-  * @off is the offset in @abd
-@@ -1163,4 +1250,5 @@ MODULE_PARM_DESC(zfs_abd_scatter_min_size,
- module_param(zfs_abd_scatter_max_order, uint, 0644);
- MODULE_PARM_DESC(zfs_abd_scatter_max_order,
- 	"Maximum order allocation used for a scatter ABD.");
--#endif
-+
-+#endif /* _KERNEL */
-diff --git a/module/zfs/abd.c b/module/zfs/abd.c
-index d982f201c..3388e2357 100644
---- a/module/zfs/abd.c
-+++ b/module/zfs/abd.c
-@@ -826,6 +826,48 @@ abd_iterate_func(abd_t *abd, size_t off, size_t size,
- 	return (ret);
- }
- 
-+#if defined(__linux__) && defined(_KERNEL)
-+int
-+abd_iterate_page_func(abd_t *abd, size_t off, size_t size,
-+    abd_iter_page_func_t *func, void *private)
-+{
-+	struct abd_iter aiter;
-+	int ret = 0;
-+
-+	if (size == 0)
-+		return (0);
-+
-+	abd_verify(abd);
-+	ASSERT3U(off + size, <=, abd->abd_size);
-+
-+	abd_t *c_abd = abd_init_abd_iter(abd, &aiter, off);
-+
-+	while (size > 0) {
-+		IMPLY(abd_is_gang(abd), c_abd != NULL);
-+
-+		abd_iter_page(&aiter);
-+
-+		size_t len = MIN(aiter.iter_page_dsize, size);
-+		ASSERT3U(len, >, 0);
-+
-+		ret = func(aiter.iter_page, aiter.iter_page_doff,
-+		    len, private);
-+
-+		aiter.iter_page = NULL;
-+		aiter.iter_page_doff = 0;
-+		aiter.iter_page_dsize = 0;
-+
-+		if (ret != 0)
-+			break;
-+
-+		size -= len;
-+		c_abd = abd_advance_abd_iter(abd, c_abd, &aiter, len);
-+	}
-+
-+	return (ret);
-+}
-+#endif
-+
- struct buf_arg {
- 	void *arg_buf;
- };
diff --git a/debian/patches/0016-vdev_disk-rename-existing-functions-to-vdev_classic_.patch b/debian/patches/0016-vdev_disk-rename-existing-functions-to-vdev_classic_.patch
deleted file mode 100644
index ebabb1c8..00000000
--- a/debian/patches/0016-vdev_disk-rename-existing-functions-to-vdev_classic_.patch
+++ /dev/null
@@ -1,349 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Tue, 9 Jan 2024 12:12:56 +1100
-Subject: [PATCH] vdev_disk: rename existing functions to vdev_classic_*
-
-This is just renaming the existing functions we're about to replace and
-grouping them together to make the next commits easier to follow.
-
-Reviewed-by: Alexander Motin <mav@FreeBSD.org>
-Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
-Closes #15533
-Closes #15588
-(cherry picked from commit f3b85d706bae82957d2e3e0ef1d53a1cfab60eb4)
----
- include/sys/abd.h               |   2 +
- module/os/linux/zfs/abd_os.c    |   5 +
- module/os/linux/zfs/vdev_disk.c | 215 +++++++++++++++++---------------
- 3 files changed, 120 insertions(+), 102 deletions(-)
-
-diff --git a/include/sys/abd.h b/include/sys/abd.h
-index 8a2df0bca..bee38b831 100644
---- a/include/sys/abd.h
-+++ b/include/sys/abd.h
-@@ -220,6 +220,8 @@ void abd_fini(void);
- 
- /*
-  * Linux ABD bio functions
-+ * Note: these are only needed to support vdev_classic. See comment in
-+ * vdev_disk.c.
-  */
- #if defined(__linux__) && defined(_KERNEL)
- unsigned int abd_bio_map_off(struct bio *, abd_t *, unsigned int, size_t);
-diff --git a/module/os/linux/zfs/abd_os.c b/module/os/linux/zfs/abd_os.c
-index dae128012..3fe01c0b7 100644
---- a/module/os/linux/zfs/abd_os.c
-+++ b/module/os/linux/zfs/abd_os.c
-@@ -1096,6 +1096,11 @@ abd_iter_page(struct abd_iter *aiter)
- 	aiter->iter_page_dsize = MIN(dsize, page_size(page) - doff);
- }
- 
-+/*
-+ * Note: ABD BIO functions only needed to support vdev_classic. See comments in
-+ * vdev_disk.c.
-+ */
-+
- /*
-  * bio_nr_pages for ABD.
-  * @off is the offset in @abd
-diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
-index b0bda5fa2..957619b87 100644
---- a/module/os/linux/zfs/vdev_disk.c
-+++ b/module/os/linux/zfs/vdev_disk.c
-@@ -83,17 +83,6 @@ static uint_t zfs_vdev_open_timeout_ms = 1000;
-  */
- #define	EFI_MIN_RESV_SIZE	(16 * 1024)
- 
--/*
-- * Virtual device vector for disks.
-- */
--typedef struct dio_request {
--	zio_t			*dr_zio;	/* Parent ZIO */
--	atomic_t		dr_ref;		/* References */
--	int			dr_error;	/* Bio error */
--	int			dr_bio_count;	/* Count of bio's */
--	struct bio		*dr_bio[];	/* Attached bio's */
--} dio_request_t;
--
- /*
-  * BIO request failfast mask.
-  */
-@@ -467,85 +456,6 @@ vdev_disk_close(vdev_t *v)
- 	v->vdev_tsd = NULL;
- }
- 
--static dio_request_t *
--vdev_disk_dio_alloc(int bio_count)
--{
--	dio_request_t *dr = kmem_zalloc(sizeof (dio_request_t) +
--	    sizeof (struct bio *) * bio_count, KM_SLEEP);
--	atomic_set(&dr->dr_ref, 0);
--	dr->dr_bio_count = bio_count;
--	dr->dr_error = 0;
--
--	for (int i = 0; i < dr->dr_bio_count; i++)
--		dr->dr_bio[i] = NULL;
--
--	return (dr);
--}
--
--static void
--vdev_disk_dio_free(dio_request_t *dr)
--{
--	int i;
--
--	for (i = 0; i < dr->dr_bio_count; i++)
--		if (dr->dr_bio[i])
--			bio_put(dr->dr_bio[i]);
--
--	kmem_free(dr, sizeof (dio_request_t) +
--	    sizeof (struct bio *) * dr->dr_bio_count);
--}
--
--static void
--vdev_disk_dio_get(dio_request_t *dr)
--{
--	atomic_inc(&dr->dr_ref);
--}
--
--static void
--vdev_disk_dio_put(dio_request_t *dr)
--{
--	int rc = atomic_dec_return(&dr->dr_ref);
--
--	/*
--	 * Free the dio_request when the last reference is dropped and
--	 * ensure zio_interpret is called only once with the correct zio
--	 */
--	if (rc == 0) {
--		zio_t *zio = dr->dr_zio;
--		int error = dr->dr_error;
--
--		vdev_disk_dio_free(dr);
--
--		if (zio) {
--			zio->io_error = error;
--			ASSERT3S(zio->io_error, >=, 0);
--			if (zio->io_error)
--				vdev_disk_error(zio);
--
--			zio_delay_interrupt(zio);
--		}
--	}
--}
--
--BIO_END_IO_PROTO(vdev_disk_physio_completion, bio, error)
--{
--	dio_request_t *dr = bio->bi_private;
--
--	if (dr->dr_error == 0) {
--#ifdef HAVE_1ARG_BIO_END_IO_T
--		dr->dr_error = BIO_END_IO_ERROR(bio);
--#else
--		if (error)
--			dr->dr_error = -(error);
--		else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
--			dr->dr_error = EIO;
--#endif
--	}
--
--	/* Drop reference acquired by __vdev_disk_physio */
--	vdev_disk_dio_put(dr);
--}
--
- static inline void
- vdev_submit_bio_impl(struct bio *bio)
- {
-@@ -697,8 +607,107 @@ vdev_bio_alloc(struct block_device *bdev, gfp_t gfp_mask,
- 	return (bio);
- }
- 
-+/* ========== */
-+
-+/*
-+ * This is the classic, battle-tested BIO submission code.
-+ *
-+ * These functions have been renamed to vdev_classic_* to make it clear what
-+ * they belong to, but their implementations are unchanged.
-+ */
-+
-+/*
-+ * Virtual device vector for disks.
-+ */
-+typedef struct dio_request {
-+	zio_t			*dr_zio;	/* Parent ZIO */
-+	atomic_t		dr_ref;		/* References */
-+	int			dr_error;	/* Bio error */
-+	int			dr_bio_count;	/* Count of bio's */
-+	struct bio		*dr_bio[];	/* Attached bio's */
-+} dio_request_t;
-+
-+static dio_request_t *
-+vdev_classic_dio_alloc(int bio_count)
-+{
-+	dio_request_t *dr = kmem_zalloc(sizeof (dio_request_t) +
-+	    sizeof (struct bio *) * bio_count, KM_SLEEP);
-+	atomic_set(&dr->dr_ref, 0);
-+	dr->dr_bio_count = bio_count;
-+	dr->dr_error = 0;
-+
-+	for (int i = 0; i < dr->dr_bio_count; i++)
-+		dr->dr_bio[i] = NULL;
-+
-+	return (dr);
-+}
-+
-+static void
-+vdev_classic_dio_free(dio_request_t *dr)
-+{
-+	int i;
-+
-+	for (i = 0; i < dr->dr_bio_count; i++)
-+		if (dr->dr_bio[i])
-+			bio_put(dr->dr_bio[i]);
-+
-+	kmem_free(dr, sizeof (dio_request_t) +
-+	    sizeof (struct bio *) * dr->dr_bio_count);
-+}
-+
-+static void
-+vdev_classic_dio_get(dio_request_t *dr)
-+{
-+	atomic_inc(&dr->dr_ref);
-+}
-+
-+static void
-+vdev_classic_dio_put(dio_request_t *dr)
-+{
-+	int rc = atomic_dec_return(&dr->dr_ref);
-+
-+	/*
-+	 * Free the dio_request when the last reference is dropped and
-+	 * ensure zio_interpret is called only once with the correct zio
-+	 */
-+	if (rc == 0) {
-+		zio_t *zio = dr->dr_zio;
-+		int error = dr->dr_error;
-+
-+		vdev_classic_dio_free(dr);
-+
-+		if (zio) {
-+			zio->io_error = error;
-+			ASSERT3S(zio->io_error, >=, 0);
-+			if (zio->io_error)
-+				vdev_disk_error(zio);
-+
-+			zio_delay_interrupt(zio);
-+		}
-+	}
-+}
-+
-+BIO_END_IO_PROTO(vdev_classic_physio_completion, bio, error)
-+{
-+	dio_request_t *dr = bio->bi_private;
-+
-+	if (dr->dr_error == 0) {
-+#ifdef HAVE_1ARG_BIO_END_IO_T
-+		dr->dr_error = BIO_END_IO_ERROR(bio);
-+#else
-+		if (error)
-+			dr->dr_error = -(error);
-+		else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
-+			dr->dr_error = EIO;
-+#endif
-+	}
-+
-+	/* Drop reference acquired by vdev_classic_physio */
-+	vdev_classic_dio_put(dr);
-+}
-+
- static inline unsigned int
--vdev_bio_max_segs(zio_t *zio, int bio_size, uint64_t abd_offset)
-+vdev_classic_bio_max_segs(zio_t *zio, int bio_size, uint64_t abd_offset)
- {
- 	unsigned long nr_segs = abd_nr_pages_off(zio->io_abd,
- 	    bio_size, abd_offset);
-@@ -711,7 +720,7 @@ vdev_bio_max_segs(zio_t *zio, int bio_size, uint64_t abd_offset)
- }
- 
- static int
--__vdev_disk_physio(struct block_device *bdev, zio_t *zio,
-+vdev_classic_physio(struct block_device *bdev, zio_t *zio,
-     size_t io_size, uint64_t io_offset, int rw, int flags)
- {
- 	dio_request_t *dr;
-@@ -736,7 +745,7 @@ __vdev_disk_physio(struct block_device *bdev, zio_t *zio,
- 	}
- 
- retry:
--	dr = vdev_disk_dio_alloc(bio_count);
-+	dr = vdev_classic_dio_alloc(bio_count);
- 
- 	if (!(zio->io_flags & (ZIO_FLAG_IO_RETRY | ZIO_FLAG_TRYHARD)) &&
- 	    zio->io_vd->vdev_failfast == B_TRUE) {
-@@ -771,23 +780,23 @@ retry:
- 		 * this should be rare - see the comment above.
- 		 */
- 		if (dr->dr_bio_count == i) {
--			vdev_disk_dio_free(dr);
-+			vdev_classic_dio_free(dr);
- 			bio_count *= 2;
- 			goto retry;
- 		}
- 
--		nr_vecs = vdev_bio_max_segs(zio, bio_size, abd_offset);
-+		nr_vecs = vdev_classic_bio_max_segs(zio, bio_size, abd_offset);
- 		dr->dr_bio[i] = vdev_bio_alloc(bdev, GFP_NOIO, nr_vecs);
- 		if (unlikely(dr->dr_bio[i] == NULL)) {
--			vdev_disk_dio_free(dr);
-+			vdev_classic_dio_free(dr);
- 			return (SET_ERROR(ENOMEM));
- 		}
- 
--		/* Matching put called by vdev_disk_physio_completion */
--		vdev_disk_dio_get(dr);
-+		/* Matching put called by vdev_classic_physio_completion */
-+		vdev_classic_dio_get(dr);
- 
- 		BIO_BI_SECTOR(dr->dr_bio[i]) = bio_offset >> 9;
--		dr->dr_bio[i]->bi_end_io = vdev_disk_physio_completion;
-+		dr->dr_bio[i]->bi_end_io = vdev_classic_physio_completion;
- 		dr->dr_bio[i]->bi_private = dr;
- 		bio_set_op_attrs(dr->dr_bio[i], rw, flags);
- 
-@@ -801,7 +810,7 @@ retry:
- 	}
- 
- 	/* Extra reference to protect dio_request during vdev_submit_bio */
--	vdev_disk_dio_get(dr);
-+	vdev_classic_dio_get(dr);
- 
- 	if (dr->dr_bio_count > 1)
- 		blk_start_plug(&plug);
-@@ -815,11 +824,13 @@ retry:
- 	if (dr->dr_bio_count > 1)
- 		blk_finish_plug(&plug);
- 
--	vdev_disk_dio_put(dr);
-+	vdev_classic_dio_put(dr);
- 
- 	return (error);
- }
- 
-+/* ========== */
-+
- BIO_END_IO_PROTO(vdev_disk_io_flush_completion, bio, error)
- {
- 	zio_t *zio = bio->bi_private;
-@@ -1023,7 +1034,7 @@ vdev_disk_io_start(zio_t *zio)
- 	}
- 
- 	zio->io_target_timestamp = zio_handle_io_delay(zio);
--	error = __vdev_disk_physio(BDH_BDEV(vd->vd_bdh), zio,
-+	error = vdev_classic_physio(BDH_BDEV(vd->vd_bdh), zio,
- 	    zio->io_size, zio->io_offset, rw, 0);
- 	rw_exit(&vd->vd_lock);
- 
diff --git a/debian/patches/0017-vdev_disk-reorganise-vdev_disk_io_start.patch b/debian/patches/0017-vdev_disk-reorganise-vdev_disk_io_start.patch
deleted file mode 100644
index 23a946fc..00000000
--- a/debian/patches/0017-vdev_disk-reorganise-vdev_disk_io_start.patch
+++ /dev/null
@@ -1,111 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Tue, 9 Jan 2024 12:23:30 +1100
-Subject: [PATCH] vdev_disk: reorganise vdev_disk_io_start
-
-Light reshuffle to make it a bit more linear to read and get rid of a
-bunch of args that aren't needed in all cases.
-
-Reviewed-by: Alexander Motin <mav@FreeBSD.org>
-Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
-Closes #15533
-Closes #15588
-(cherry picked from commit 867178ae1db28e73051c8a7ce662f2f2f81cd8e6)
----
- module/os/linux/zfs/vdev_disk.c | 51 ++++++++++++++++++++-------------
- 1 file changed, 31 insertions(+), 20 deletions(-)
-
-diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
-index 957619b87..51e7cef2f 100644
---- a/module/os/linux/zfs/vdev_disk.c
-+++ b/module/os/linux/zfs/vdev_disk.c
-@@ -720,9 +720,16 @@ vdev_classic_bio_max_segs(zio_t *zio, int bio_size, uint64_t abd_offset)
- }
- 
- static int
--vdev_classic_physio(struct block_device *bdev, zio_t *zio,
--    size_t io_size, uint64_t io_offset, int rw, int flags)
-+vdev_classic_physio(zio_t *zio)
- {
-+	vdev_t *v = zio->io_vd;
-+	vdev_disk_t *vd = v->vdev_tsd;
-+	struct block_device *bdev = BDH_BDEV(vd->vd_bdh);
-+	size_t io_size = zio->io_size;
-+	uint64_t io_offset = zio->io_offset;
-+	int rw = zio->io_type == ZIO_TYPE_READ ? READ : WRITE;
-+	int flags = 0;
-+
- 	dio_request_t *dr;
- 	uint64_t abd_offset;
- 	uint64_t bio_offset;
-@@ -944,7 +951,7 @@ vdev_disk_io_start(zio_t *zio)
- {
- 	vdev_t *v = zio->io_vd;
- 	vdev_disk_t *vd = v->vdev_tsd;
--	int rw, error;
-+	int error;
- 
- 	/*
- 	 * If the vdev is closed, it's likely in the REMOVED or FAULTED state.
-@@ -1007,13 +1014,6 @@ vdev_disk_io_start(zio_t *zio)
- 		rw_exit(&vd->vd_lock);
- 		zio_execute(zio);
- 		return;
--	case ZIO_TYPE_WRITE:
--		rw = WRITE;
--		break;
--
--	case ZIO_TYPE_READ:
--		rw = READ;
--		break;
- 
- 	case ZIO_TYPE_TRIM:
- 		zio->io_error = vdev_disk_io_trim(zio);
-@@ -1026,23 +1026,34 @@ vdev_disk_io_start(zio_t *zio)
- #endif
- 		return;
- 
--	default:
-+	case ZIO_TYPE_READ:
-+	case ZIO_TYPE_WRITE:
-+		zio->io_target_timestamp = zio_handle_io_delay(zio);
-+		error = vdev_classic_physio(zio);
- 		rw_exit(&vd->vd_lock);
--		zio->io_error = SET_ERROR(ENOTSUP);
--		zio_interrupt(zio);
-+		if (error) {
-+			zio->io_error = error;
-+			zio_interrupt(zio);
-+		}
- 		return;
--	}
- 
--	zio->io_target_timestamp = zio_handle_io_delay(zio);
--	error = vdev_classic_physio(BDH_BDEV(vd->vd_bdh), zio,
--	    zio->io_size, zio->io_offset, rw, 0);
--	rw_exit(&vd->vd_lock);
-+	default:
-+		/*
-+		 * Getting here means our parent vdev has made a very strange
-+		 * request of us, and shouldn't happen. Assert here to force a
-+		 * crash in dev builds, but in production return the IO
-+		 * unhandled. The pool will likely suspend anyway but that's
-+		 * nicer than crashing the kernel.
-+		 */
-+		ASSERT3S(zio->io_type, ==, -1);
- 
--	if (error) {
--		zio->io_error = error;
-+		rw_exit(&vd->vd_lock);
-+		zio->io_error = SET_ERROR(ENOTSUP);
- 		zio_interrupt(zio);
- 		return;
- 	}
-+
-+	__builtin_unreachable();
- }
- 
- static void
diff --git a/debian/patches/0018-vdev_disk-make-read-write-IO-function-configurable.patch b/debian/patches/0018-vdev_disk-make-read-write-IO-function-configurable.patch
deleted file mode 100644
index a169979c..00000000
--- a/debian/patches/0018-vdev_disk-make-read-write-IO-function-configurable.patch
+++ /dev/null
@@ -1,69 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Tue, 9 Jan 2024 12:29:19 +1100
-Subject: [PATCH] vdev_disk: make read/write IO function configurable
-
-This is just setting up for the next couple of commits, which will add a
-new IO function and a parameter to select it.
-
-Reviewed-by: Alexander Motin <mav@FreeBSD.org>
-Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
-Closes #15533
-Closes #15588
-(cherry picked from commit c4a13ba483f08a81aa47479d2f763a470d95b2b0)
----
- module/os/linux/zfs/vdev_disk.c | 23 +++++++++++++++++++++--
- 1 file changed, 21 insertions(+), 2 deletions(-)
-
-diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
-index 51e7cef2f..de4dba72f 100644
---- a/module/os/linux/zfs/vdev_disk.c
-+++ b/module/os/linux/zfs/vdev_disk.c
-@@ -946,6 +946,8 @@ vdev_disk_io_trim(zio_t *zio)
- #endif
- }
- 
-+int (*vdev_disk_io_rw_fn)(zio_t *zio) = NULL;
-+
- static void
- vdev_disk_io_start(zio_t *zio)
- {
-@@ -1029,7 +1031,7 @@ vdev_disk_io_start(zio_t *zio)
- 	case ZIO_TYPE_READ:
- 	case ZIO_TYPE_WRITE:
- 		zio->io_target_timestamp = zio_handle_io_delay(zio);
--		error = vdev_classic_physio(zio);
-+		error = vdev_disk_io_rw_fn(zio);
- 		rw_exit(&vd->vd_lock);
- 		if (error) {
- 			zio->io_error = error;
-@@ -1102,8 +1104,25 @@ vdev_disk_rele(vdev_t *vd)
- 	/* XXX: Implement me as a vnode rele for the device */
- }
- 
-+/*
-+ * At first use vdev use, set the submission function from the default value if
-+ * it hasn't been set already.
-+ */
-+static int
-+vdev_disk_init(spa_t *spa, nvlist_t *nv, void **tsd)
-+{
-+	(void) spa;
-+	(void) nv;
-+	(void) tsd;
-+
-+	if (vdev_disk_io_rw_fn == NULL)
-+		vdev_disk_io_rw_fn = vdev_classic_physio;
-+
-+	return (0);
-+}
-+
- vdev_ops_t vdev_disk_ops = {
--	.vdev_op_init = NULL,
-+	.vdev_op_init = vdev_disk_init,
- 	.vdev_op_fini = NULL,
- 	.vdev_op_open = vdev_disk_open,
- 	.vdev_op_close = vdev_disk_close,
diff --git a/debian/patches/0019-vdev_disk-rewrite-BIO-filling-machinery-to-avoid-spl.patch b/debian/patches/0019-vdev_disk-rewrite-BIO-filling-machinery-to-avoid-spl.patch
deleted file mode 100644
index 8ccbf655..00000000
--- a/debian/patches/0019-vdev_disk-rewrite-BIO-filling-machinery-to-avoid-spl.patch
+++ /dev/null
@@ -1,671 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Tue, 18 Jul 2023 11:11:29 +1000
-Subject: [PATCH] vdev_disk: rewrite BIO filling machinery to avoid split pages
-
-This commit tackles a number of issues in the way BIOs (`struct bio`)
-are constructed for submission to the Linux block layer.
-
-The kernel has a hard upper limit on the number of pages/segments that
-can be added to a BIO, as well as a separate limit for each device
-(related to its queue depth and other scheduling characteristics).
-
-ZFS counts the number of memory pages in the request ABD
-(`abd_nr_pages_off()`, and then uses that as the number of segments to
-put into the BIO, up to the hard upper limit. If it requires more than
-the limit, it will create multiple BIOs.
-
-Leaving aside the fact that page count method is wrong (see below), not
-limiting to the device segment max means that the device driver will
-need to split the BIO in half. This is alone is not necessarily a
-problem, but it interacts with another issue to cause a much larger
-problem.
-
-The kernel function to add a segment to a BIO (`bio_add_page()`) takes a
-`struct page` pointer, and offset+len within it. `struct page` can
-represent a run of contiguous memory pages (known as a "compound page").
-In can be of arbitrary length.
-
-The ZFS functions that count ABD pages and load them into the BIO
-(`abd_nr_pages_off()`, `bio_map()` and `abd_bio_map_off()`) will never
-consider a page to be more than `PAGE_SIZE` (4K), even if the `struct
-page` is for multiple pages. In this case, it will load the same `struct
-page` into the BIO multiple times, with the offset adjusted each time.
-
-With a sufficiently large ABD, this can easily lead to the BIO being
-entirely filled much earlier than it could have been. This is also
-further contributes to the problem caused by the incorrect segment limit
-calculation, as its much easier to go past the device limit, and so
-require a split.
-
-Again, this is not a problem on its own.
-
-The logic for "never submit more than `PAGE_SIZE`" is actually a little
-more subtle. It will actually never submit a buffer that crosses a 4K
-page boundary.
-
-In practice, this is fine, as most ABDs are scattered, that is a list of
-complete 4K pages, and so are loaded in as such.
-
-Linear ABDs are typically allocated from slabs, and for small sizes they
-are frequently not aligned to page boundaries. For example, a 12K
-allocation can span four pages, eg:
-
-     -- 4K -- -- 4K -- -- 4K -- -- 4K --
-    |        |        |        |        |
-          :## ######## ######## ######:    [1K, 4K, 4K, 3K]
-
-Such an allocation would be loaded into a BIO as you see:
-
-    [1K, 4K, 4K, 3K]
-
-This tends not to be a problem in practice, because even if the BIO were
-filled and needed to be split, each half would still have either a start
-or end aligned to the logical block size of the device (assuming 4K at
-least).
-
----
-
-In ideal circumstances, these shortcomings don't cause any particular
-problems. Its when they start to interact with other ZFS features that
-things get interesting.
-
-Aggregation will create a "gang" ABD, which is simply a list of other
-ABDs. Iterating over a gang ABD is just iterating over each ABD within
-it in turn.
-
-Because the segments are simply loaded in order, we can end up with
-uneven segments either side of the "gap" between the two ABDs. For
-example, two 12K ABDs might be aggregated and then loaded as:
-
-    [1K, 4K, 4K, 3K, 2K, 4K, 4K, 2K]
-
-Should a split occur, each individual BIO can end up either having an
-start or end offset that is not aligned to the logical block size, which
-some drivers (eg SCSI) will reject. However, this tends not to happen
-because the default aggregation limit usually keeps the BIO small enough
-to not require more than one split, and most pages are actually full 4K
-pages, so hitting an uneven gap is very rare anyway.
-
-If the pool is under particular memory pressure, then an IO can be
-broken down into a "gang block", a 512-byte block composed of a header
-and up to three block pointers. Each points to a fragment of the
-original write, or in turn, another gang block, breaking the original
-data up over and over until space can be found in the pool for each of
-them.
-
-Each gang header is a separate 512-byte memory allocation from a slab,
-that needs to be written down to disk. When the gang header is added to
-the BIO, it's a single 512-byte segment.
-
-Pulling all this together, consider a large aggregated write of gang
-blocks. This results in a BIO containing lots of 512-byte segments. Given
-our tendency to overfill the BIO, a split is likely, and most possible
-split points will yield a pair of BIOs that are misaligned. Drivers that
-care, like the SCSI driver, will reject them.
-
----
-
-This commit is a substantial refactor and rewrite of much of `vdev_disk`
-to sort all this out.
-
-`vdev_bio_max_segs()` now returns the ideal maximum number of segments
-for the device, if available. There's also a tuneable
-`zfs_vdev_disk_max_segs` to override this, to assist with testing.
-
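-For example (an illustrative value, not a recommendation), the tuneable
-can be set at module load time or, as it is registered ZMOD_RW, changed
-at runtime:
-
-    # /etc/modprobe.d/zfs.conf
-    options zfs zfs_vdev_disk_max_segs=16
-
-    # at runtime
-    echo 16 > /sys/module/zfs/parameters/zfs_vdev_disk_max_segs
-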
-We scan the ABD up front to count the number of pages within it, and to
-confirm that if we submitted all those pages to one or more BIOs, it
-could be split at any point without creating a misaligned BIO.  If the
-pages in the BIO are not usable (as in any of the above situations), the
-ABD is linearised, and then checked again. This is the same technique
-used in `vdev_geom` on FreeBSD, adjusted for Linux's variable page size
-and allocator quirks.
-
-`vbio_t` is a cleanup and enhancement of the old `dio_request_t`. The
-idea is simply that it can hold all the state needed to create, submit
-and return multiple BIOs, including all the refcounts, the ABD copy if
-it was needed, and so on. Apart from what I hope is a clearer interface,
-the major difference is that because we know how many BIOs we'll need up
-front, we don't need the old overflow logic that would grow the BIO
-array, throw away all the old work and restart. We can get it right from
-the start.
-
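-As a worked example of the refcounting described above, for a vbio that
-ends up carrying N BIOs:
-
-    vbio_submit(): ref = 1 (guard), then +1 per submitted BIO -> N + 1
-    each BIO completion calls vbio_put()                      -> N drops
-    vbio_submit() releases the guard with a final vbio_put()  -> 1 drop
-    ref reaches 0: the zio is returned and the vbio is freed
-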
-Reviewed-by: Alexander Motin <mav@FreeBSD.org>
-Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
-Closes #15533
-Closes #15588
-(cherry picked from commit 06a196020e6f70d2fedbd4d0d05bbe0c1ac6e4d8)
----
- include/os/linux/kernel/linux/mod_compat.h |   1 +
- man/man4/zfs.4                             |  10 +-
- module/os/linux/zfs/vdev_disk.c            | 439 ++++++++++++++++++++-
- 3 files changed, 447 insertions(+), 3 deletions(-)
-
-diff --git a/include/os/linux/kernel/linux/mod_compat.h b/include/os/linux/kernel/linux/mod_compat.h
-index 8e20a9613..039865b70 100644
---- a/include/os/linux/kernel/linux/mod_compat.h
-+++ b/include/os/linux/kernel/linux/mod_compat.h
-@@ -68,6 +68,7 @@ enum scope_prefix_types {
- 	zfs_trim,
- 	zfs_txg,
- 	zfs_vdev,
-+	zfs_vdev_disk,
- 	zfs_vdev_file,
- 	zfs_vdev_mirror,
- 	zfs_vnops,
-diff --git a/man/man4/zfs.4 b/man/man4/zfs.4
-index 352990e02..b5679f2f0 100644
---- a/man/man4/zfs.4
-+++ b/man/man4/zfs.4
-@@ -2,6 +2,7 @@
- .\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
- .\" Copyright (c) 2019, 2021 by Delphix. All rights reserved.
- .\" Copyright (c) 2019 Datto Inc.
-+.\" Copyright (c) 2023, 2024 Klara, Inc.
- .\" The contents of this file are subject to the terms of the Common Development
- .\" and Distribution License (the "License").  You may not use this file except
- .\" in compliance with the License. You can obtain a copy of the license at
-@@ -15,7 +16,7 @@
- .\" own identifying information:
- .\" Portions Copyright [yyyy] [name of copyright owner]
- .\"
--.Dd July 21, 2023
-+.Dd January 9, 2024
- .Dt ZFS 4
- .Os
- .
-@@ -1345,6 +1346,13 @@ _
- 	4	Driver	No driver retries on driver errors.
- .TE
- .
-+.It Sy zfs_vdev_disk_max_segs Ns = Ns Sy 0 Pq uint
-+Maximum number of segments to add to a BIO (min 4).
-+If this is higher than the maximum allowed by the device queue or the kernel
-+itself, it will be clamped.
-+Setting it to zero will cause the kernel's ideal size to be used.
-+This parameter only applies on Linux.
-+.
- .It Sy zfs_expire_snapshot Ns = Ns Sy 300 Ns s Pq int
- Time before expiring
- .Pa .zfs/snapshot .
-diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
-index de4dba72f..0ccb9ad96 100644
---- a/module/os/linux/zfs/vdev_disk.c
-+++ b/module/os/linux/zfs/vdev_disk.c
-@@ -24,6 +24,7 @@
-  * Rewritten for Linux by Brian Behlendorf <behlendorf1@llnl.gov>.
-  * LLNL-CODE-403049.
-  * Copyright (c) 2012, 2019 by Delphix. All rights reserved.
-+ * Copyright (c) 2023, 2024, Klara Inc.
-  */
- 
- #include <sys/zfs_context.h>
-@@ -66,6 +67,13 @@ typedef struct vdev_disk {
- 	krwlock_t			vd_lock;
- } vdev_disk_t;
- 
-+/*
-+ * Maximum number of segments to add to a bio (min 4). If this is higher than
-+ * the maximum allowed by the device queue or the kernel itself, it will be
-+ * clamped. Setting it to zero will cause the kernel's ideal size to be used.
-+ */
-+uint_t zfs_vdev_disk_max_segs = 0;
-+
- /*
-  * Unique identifier for the exclusive vdev holder.
-  */
-@@ -607,10 +615,433 @@ vdev_bio_alloc(struct block_device *bdev, gfp_t gfp_mask,
- 	return (bio);
- }
- 
-+static inline uint_t
-+vdev_bio_max_segs(struct block_device *bdev)
-+{
-+	/*
-+	 * Smallest of the device max segs and the tuneable max segs. Minimum
-+	 * 4, so there's room to finish split pages if they come up.
-+	 */
-+	const uint_t dev_max_segs = queue_max_segments(bdev_get_queue(bdev));
-+	const uint_t tune_max_segs = (zfs_vdev_disk_max_segs > 0) ?
-+	    MAX(4, zfs_vdev_disk_max_segs) : dev_max_segs;
-+	const uint_t max_segs = MIN(tune_max_segs, dev_max_segs);
-+
-+#ifdef HAVE_BIO_MAX_SEGS
-+	return (bio_max_segs(max_segs));
-+#else
-+	return (MIN(max_segs, BIO_MAX_PAGES));
-+#endif
-+}
-+
-+static inline uint_t
-+vdev_bio_max_bytes(struct block_device *bdev)
-+{
-+	return (queue_max_sectors(bdev_get_queue(bdev)) << 9);
-+}
-+
-+
-+/*
-+ * Virtual block IO object (VBIO)
-+ *
-+ * Linux block IO (BIO) objects have a limit on how many data segments (pages)
-+ * they can hold. Depending on how they're allocated and structured, a large
-+ * ZIO can require more than one BIO to be submitted to the kernel, which then
-+ * all have to complete before we can return the completed ZIO back to ZFS.
-+ *
-+ * A VBIO is a wrapper around multiple BIOs, carrying everything needed to
-+ * translate a ZIO down into the kernel block layer and back again.
-+ *
-+ * Note that these are only used for data ZIOs (read/write). Meta-operations
-+ * (flush/trim) don't need multiple BIOs and so can just make the call
-+ * directly.
-+ */
-+typedef struct {
-+	zio_t		*vbio_zio;	/* parent zio */
-+
-+	struct block_device *vbio_bdev;	/* blockdev to submit bios to */
-+
-+	abd_t		*vbio_abd;	/* abd carrying borrowed linear buf */
-+
-+	atomic_t	vbio_ref;	/* bio refcount */
-+	int		vbio_error;	/* error from failed bio */
-+
-+	uint_t		vbio_max_segs;	/* max segs per bio */
-+
-+	uint_t		vbio_max_bytes;	/* max bytes per bio */
-+	uint_t		vbio_lbs_mask;	/* logical block size mask */
-+
-+	uint64_t	vbio_offset;	/* start offset of next bio */
-+
-+	struct bio	*vbio_bio;	/* pointer to the current bio */
-+	struct bio	*vbio_bios;	/* list of all bios */
-+} vbio_t;
-+
-+static vbio_t *
-+vbio_alloc(zio_t *zio, struct block_device *bdev)
-+{
-+	vbio_t *vbio = kmem_zalloc(sizeof (vbio_t), KM_SLEEP);
-+
-+	vbio->vbio_zio = zio;
-+	vbio->vbio_bdev = bdev;
-+	atomic_set(&vbio->vbio_ref, 0);
-+	vbio->vbio_max_segs = vdev_bio_max_segs(bdev);
-+	vbio->vbio_max_bytes = vdev_bio_max_bytes(bdev);
-+	vbio->vbio_lbs_mask = ~(bdev_logical_block_size(bdev)-1);
-+	vbio->vbio_offset = zio->io_offset;
-+
-+	return (vbio);
-+}
-+
-+static int
-+vbio_add_page(vbio_t *vbio, struct page *page, uint_t size, uint_t offset)
-+{
-+	struct bio *bio;
-+	uint_t ssize;
-+
-+	while (size > 0) {
-+		bio = vbio->vbio_bio;
-+		if (bio == NULL) {
-+			/* New BIO, allocate and set up */
-+			bio = vdev_bio_alloc(vbio->vbio_bdev, GFP_NOIO,
-+			    vbio->vbio_max_segs);
-+			if (unlikely(bio == NULL))
-+				return (SET_ERROR(ENOMEM));
-+			BIO_BI_SECTOR(bio) = vbio->vbio_offset >> 9;
-+
-+			bio->bi_next = vbio->vbio_bios;
-+			vbio->vbio_bios = vbio->vbio_bio = bio;
-+		}
-+
-+		/*
-+		 * Only load as much of the current page data as will fit in
-+		 * the space left in the BIO, respecting lbs alignment. Older
-+		 * kernels will error if we try to overfill the BIO, while
-+		 * newer ones will accept it and split the BIO. This ensures
-+		 * everything works on older kernels, and avoids an additional
-+		 * overhead on the new.
-+		 */
-+		ssize = MIN(size, (vbio->vbio_max_bytes - BIO_BI_SIZE(bio)) &
-+		    vbio->vbio_lbs_mask);
-+		if (ssize > 0 &&
-+		    bio_add_page(bio, page, ssize, offset) == ssize) {
-+			/* Accepted, adjust and load any remaining. */
-+			size -= ssize;
-+			offset += ssize;
-+			continue;
-+		}
-+
-+		/* No room, set up for a new BIO and loop */
-+		vbio->vbio_offset += BIO_BI_SIZE(bio);
-+
-+		/* Signal new BIO allocation wanted */
-+		vbio->vbio_bio = NULL;
-+	}
-+
-+	return (0);
-+}
-+
-+BIO_END_IO_PROTO(vdev_disk_io_rw_completion, bio, error);
-+static void vbio_put(vbio_t *vbio);
-+
-+static void
-+vbio_submit(vbio_t *vbio, int flags)
-+{
-+	ASSERT(vbio->vbio_bios);
-+	struct bio *bio = vbio->vbio_bios;
-+	vbio->vbio_bio = vbio->vbio_bios = NULL;
-+
-+	/*
-+	 * We take a reference for each BIO as we submit it, plus one to
-+	 * protect us from BIOs completing before we're done submitting them
-+	 * all, causing vbio_put() to free vbio out from under us and/or the
-+	 * zio to be returned before all its IO has completed.
-+	 */
-+	atomic_set(&vbio->vbio_ref, 1);
-+
-+	/*
-+	 * If we're submitting more than one BIO, inform the block layer so
-+	 * it can batch them if it wants.
-+	 */
-+	struct blk_plug plug;
-+	boolean_t do_plug = (bio->bi_next != NULL);
-+	if (do_plug)
-+		blk_start_plug(&plug);
-+
-+	/* Submit all the BIOs */
-+	while (bio != NULL) {
-+		atomic_inc(&vbio->vbio_ref);
-+
-+		struct bio *next = bio->bi_next;
-+		bio->bi_next = NULL;
-+
-+		bio->bi_end_io = vdev_disk_io_rw_completion;
-+		bio->bi_private = vbio;
-+		bio_set_op_attrs(bio,
-+		    vbio->vbio_zio->io_type == ZIO_TYPE_WRITE ?
-+		    WRITE : READ, flags);
-+
-+		vdev_submit_bio(bio);
-+
-+		bio = next;
-+	}
-+
-+	/* Finish the batch */
-+	if (do_plug)
-+		blk_finish_plug(&plug);
-+
-+	/* Release the extra reference */
-+	vbio_put(vbio);
-+}
-+
-+static void
-+vbio_return_abd(vbio_t *vbio)
-+{
-+	zio_t *zio = vbio->vbio_zio;
-+	if (vbio->vbio_abd == NULL)
-+		return;
-+
-+	/*
-+	 * If we copied the ABD before issuing it, clean up and return the copy
-+	 * to the ABD, with changes if appropriate.
-+	 */
-+	void *buf = abd_to_buf(vbio->vbio_abd);
-+	abd_free(vbio->vbio_abd);
-+	vbio->vbio_abd = NULL;
-+
-+	if (zio->io_type == ZIO_TYPE_READ)
-+		abd_return_buf_copy(zio->io_abd, buf, zio->io_size);
-+	else
-+		abd_return_buf(zio->io_abd, buf, zio->io_size);
-+}
-+
-+static void
-+vbio_free(vbio_t *vbio)
-+{
-+	VERIFY0(atomic_read(&vbio->vbio_ref));
-+
-+	vbio_return_abd(vbio);
-+
-+	kmem_free(vbio, sizeof (vbio_t));
-+}
-+
-+static void
-+vbio_put(vbio_t *vbio)
-+{
-+	if (atomic_dec_return(&vbio->vbio_ref) > 0)
-+		return;
-+
-+	/*
-+	 * This was the last reference, so the entire IO is completed. Clean
-+	 * up and submit it for processing.
-+	 */
-+
-+	/*
-+	 * Get any data buf back to the original ABD, if necessary. We do this
-+	 * now so we can get the ZIO into the pipeline as quickly as possible,
-+	 * and then do the remaining cleanup after.
-+	 */
-+	vbio_return_abd(vbio);
-+
-+	zio_t *zio = vbio->vbio_zio;
-+
-+	/*
-+	 * Set the overall error. If multiple BIOs returned an error, only the
-+	 * first will be taken; the others are dropped (see
-+	 * vdev_disk_io_rw_completion()). It's pretty much impossible for
-+	 * multiple IOs to the same device to fail with different errors, so
-+	 * there's no real risk.
-+	 */
-+	zio->io_error = vbio->vbio_error;
-+	if (zio->io_error)
-+		vdev_disk_error(zio);
-+
-+	/* All done, submit for processing */
-+	zio_delay_interrupt(zio);
-+
-+	/* Finish cleanup */
-+	vbio_free(vbio);
-+}
-+
-+BIO_END_IO_PROTO(vdev_disk_io_rw_completion, bio, error)
-+{
-+	vbio_t *vbio = bio->bi_private;
-+
-+	if (vbio->vbio_error == 0) {
-+#ifdef HAVE_1ARG_BIO_END_IO_T
-+		vbio->vbio_error = BIO_END_IO_ERROR(bio);
-+#else
-+		if (error)
-+			vbio->vbio_error = -(error);
-+		else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
-+			vbio->vbio_error = EIO;
-+#endif
-+	}
-+
-+	/*
-+	 * Destroy the BIO. This is safe to do; the vbio owns its data and the
-+	 * kernel won't touch it again after the completion function runs.
-+	 */
-+	bio_put(bio);
-+
-+	/* Drop this BIO's reference acquired by vbio_submit() */
-+	vbio_put(vbio);
-+}
-+
-+/*
-+ * Iterator callback to count ABD pages and check their size & alignment.
-+ *
-+ * On Linux, each BIO segment can take a page pointer, and an offset+length of
-+ * the data within that page. A page can be arbitrarily large ("compound"
-+ * pages) but we still have to ensure the data portion is correctly sized and
-+ * aligned to the logical block size, to ensure that if the kernel wants to
-+ * split the BIO, the two halves will still be properly aligned.
-+ */
-+typedef struct {
-+	uint_t  bmask;
-+	uint_t  npages;
-+	uint_t  end;
-+} vdev_disk_check_pages_t;
-+
-+static int
-+vdev_disk_check_pages_cb(struct page *page, size_t off, size_t len, void *priv)
-+{
-+	vdev_disk_check_pages_t *s = priv;
-+
-+	/*
-+	 * If we didn't finish on a block size boundary last time, then there
-+	 * would be a gap if we tried to use this ABD as-is, so abort.
-+	 */
-+	if (s->end != 0)
-+		return (1);
-+
-+	/*
-+	 * Note if we're taking less than a full block, so we can check it
-+	 * above on the next call.
-+	 */
-+	s->end = len & s->bmask;
-+
-+	/* All blocks after the first must start on a block size boundary. */
-+	if (s->npages != 0 && (off & s->bmask) != 0)
-+		return (1);
-+
-+	s->npages++;
-+	return (0);
-+}
-+
-+/*
-+ * Check if we can submit the pages in this ABD to the kernel as-is. Returns
-+ * B_TRUE if so, or B_FALSE if it can't be submitted like this.
-+ */
-+static boolean_t
-+vdev_disk_check_pages(abd_t *abd, uint64_t size, struct block_device *bdev)
-+{
-+	vdev_disk_check_pages_t s = {
-+	    .bmask = bdev_logical_block_size(bdev)-1,
-+	    .npages = 0,
-+	    .end = 0,
-+	};
-+
-+	if (abd_iterate_page_func(abd, 0, size, vdev_disk_check_pages_cb, &s))
-+		return (B_FALSE);
-+
-+	return (B_TRUE);
-+}
-+
-+/* Iterator callback to submit ABD pages to the vbio. */
-+static int
-+vdev_disk_fill_vbio_cb(struct page *page, size_t off, size_t len, void *priv)
-+{
-+	vbio_t *vbio = priv;
-+	return (vbio_add_page(vbio, page, len, off));
-+}
-+
-+static int
-+vdev_disk_io_rw(zio_t *zio)
-+{
-+	vdev_t *v = zio->io_vd;
-+	vdev_disk_t *vd = v->vdev_tsd;
-+	struct block_device *bdev = BDH_BDEV(vd->vd_bdh);
-+	int flags = 0;
-+
-+	/*
-+	 * Accessing outside the block device is never allowed.
-+	 */
-+	if (zio->io_offset + zio->io_size > bdev->bd_inode->i_size) {
-+		vdev_dbgmsg(zio->io_vd,
-+		    "Illegal access %llu size %llu, device size %llu",
-+		    (u_longlong_t)zio->io_offset,
-+		    (u_longlong_t)zio->io_size,
-+		    (u_longlong_t)i_size_read(bdev->bd_inode));
-+		return (SET_ERROR(EIO));
-+	}
-+
-+	if (!(zio->io_flags & (ZIO_FLAG_IO_RETRY | ZIO_FLAG_TRYHARD)) &&
-+	    v->vdev_failfast == B_TRUE) {
-+		bio_set_flags_failfast(bdev, &flags, zfs_vdev_failfast_mask & 1,
-+		    zfs_vdev_failfast_mask & 2, zfs_vdev_failfast_mask & 4);
-+	}
-+
-+	/*
-+	 * Check alignment of the incoming ABD. If any part of it would require
-+	 * submitting a page that is not aligned to the logical block size,
-+	 * then we take a copy into a linear buffer and submit that instead.
-+	 * This should be impossible on a 512b LBS, and fairly rare on 4K,
-+	 * usually requiring abnormally-small data blocks (eg gang blocks)
-+	 * mixed into the same ABD as larger ones (eg aggregated).
-+	 */
-+	abd_t *abd = zio->io_abd;
-+	if (!vdev_disk_check_pages(abd, zio->io_size, bdev)) {
-+		void *buf;
-+		if (zio->io_type == ZIO_TYPE_READ)
-+			buf = abd_borrow_buf(zio->io_abd, zio->io_size);
-+		else
-+			buf = abd_borrow_buf_copy(zio->io_abd, zio->io_size);
-+
-+		/*
-+		 * Wrap the copy in an abd_t, so we can use the same iterators
-+		 * to count and fill the vbio later.
-+		 */
-+		abd = abd_get_from_buf(buf, zio->io_size);
-+
-+		/*
-+		 * False here would mean the borrowed copy has an invalid
-+		 * alignment too, which would mean we've somehow been passed a
-+		 * linear ABD with an interior page that has a non-zero offset
-+		 * or a size not a multiple of PAGE_SIZE. This is not possible.
-+		 * It would mean either zio_buf_alloc() or its underlying
-+		 * allocators have done something extremely strange, or our
-+		 * math in vdev_disk_check_pages() is wrong. In either case,
-+		 * something is seriously wrong and it's not safe to continue.
-+		 */
-+		VERIFY(vdev_disk_check_pages(abd, zio->io_size, bdev));
-+	}
-+
-+	/* Allocate vbio, with a pointer to the borrowed ABD if necessary */
-+	int error = 0;
-+	vbio_t *vbio = vbio_alloc(zio, bdev);
-+	if (abd != zio->io_abd)
-+		vbio->vbio_abd = abd;
-+
-+	/* Fill it with pages */
-+	error = abd_iterate_page_func(abd, 0, zio->io_size,
-+	    vdev_disk_fill_vbio_cb, vbio);
-+	if (error != 0) {
-+		vbio_free(vbio);
-+		return (error);
-+	}
-+
-+	vbio_submit(vbio, flags);
-+	return (0);
-+}
-+
- /* ========== */
- 
- /*
-- * This is the classic, battle-tested BIO submission code.
-+ * This is the classic, battle-tested BIO submission code. Until we're totally
-+ * sure that the new code is safe and correct in all cases, this will remain
-+ * available and can be enabled by setting zfs_vdev_disk_classic=1 at module
-+ * load time.
-  *
-  * These functions have been renamed to vdev_classic_* to make it clear what
-  * they belong to, but their implementations are unchanged.
-@@ -1116,7 +1547,8 @@ vdev_disk_init(spa_t *spa, nvlist_t *nv, void **tsd)
- 	(void) tsd;
- 
- 	if (vdev_disk_io_rw_fn == NULL)
--		vdev_disk_io_rw_fn = vdev_classic_physio;
-+		/* XXX make configurable */
-+		vdev_disk_io_rw_fn = 0 ? vdev_classic_physio : vdev_disk_io_rw;
- 
- 	return (0);
- }
-@@ -1215,3 +1647,6 @@ ZFS_MODULE_PARAM(zfs_vdev, zfs_vdev_, open_timeout_ms, UINT, ZMOD_RW,
- 
- ZFS_MODULE_PARAM(zfs_vdev, zfs_vdev_, failfast_mask, UINT, ZMOD_RW,
- 	"Defines failfast mask: 1 - device, 2 - transport, 4 - driver");
-+
-+ZFS_MODULE_PARAM(zfs_vdev_disk, zfs_vdev_disk_, max_segs, UINT, ZMOD_RW,
-+	"Maximum number of data segments to add to an IO request (min 4)");
diff --git a/debian/patches/0020-vdev_disk-add-module-parameter-to-select-BIO-submiss.patch b/debian/patches/0020-vdev_disk-add-module-parameter-to-select-BIO-submiss.patch
deleted file mode 100644
index b7aef38e..00000000
--- a/debian/patches/0020-vdev_disk-add-module-parameter-to-select-BIO-submiss.patch
+++ /dev/null
@@ -1,104 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Tue, 9 Jan 2024 13:28:57 +1100
-Subject: [PATCH] vdev_disk: add module parameter to select BIO submission
- method
-
-This makes the submission method selectable at module load time via the
-`zfs_vdev_disk_classic` parameter, allowing this change to be backported
-to 2.2 safely, and disabled in favour of the "classic" submission method
-if new problems come up.
-
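-For example, to force the classic method (load time only, since the
-parameter is registered ZMOD_RD):
-
-    # /etc/modprobe.d/zfs.conf
-    options zfs zfs_vdev_disk_classic=1
-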
-Reviewed-by: Alexander Motin <mav@FreeBSD.org>
-Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
-Closes #15533
-Closes #15588
-(cherry picked from commit df2169d141aadc0c2cc728c5c5261d6f5c2a27f7)
----
- man/man4/zfs.4                  | 16 ++++++++++++++++
- module/os/linux/zfs/vdev_disk.c | 31 +++++++++++++++++++++++++++++--
- 2 files changed, 45 insertions(+), 2 deletions(-)
-
-diff --git a/man/man4/zfs.4 b/man/man4/zfs.4
-index b5679f2f0..6a628e7f3 100644
---- a/man/man4/zfs.4
-+++ b/man/man4/zfs.4
-@@ -1352,6 +1352,22 @@ If this is higher than the maximum allowed by the device queue or the kernel
- itself, it will be clamped.
- Setting it to zero will cause the kernel's ideal size to be used.
- This parameter only applies on Linux.
-+This parameter is ignored if
-+.Sy zfs_vdev_disk_classic Ns = Ns Sy 1 .
-+.
-+.It Sy zfs_vdev_disk_classic Ns = Ns Sy 0 Ns | Ns 1 Pq uint
-+If set to 1, OpenZFS will submit IO to Linux using the method it used in 2.2
-+and earlier.
-+This "classic" method has known issues with highly fragmented IO requests and
-+is slower on many workloads, but it has been in use for many years and is known
-+to be very stable.
-+If you set this parameter, please also open a bug report describing why you did so,
-+including the workload involved and any error messages.
-+.Pp
-+This parameter and the classic submission method will be removed once we have
-+total confidence in the new method.
-+.Pp
-+This parameter only applies on Linux, and can only be set at module load time.
- .
- .It Sy zfs_expire_snapshot Ns = Ns Sy 300 Ns s Pq int
- Time before expiring
-diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
-index 0ccb9ad96..a9110623a 100644
---- a/module/os/linux/zfs/vdev_disk.c
-+++ b/module/os/linux/zfs/vdev_disk.c
-@@ -1535,6 +1535,29 @@ vdev_disk_rele(vdev_t *vd)
- 	/* XXX: Implement me as a vnode rele for the device */
- }
- 
-+/*
-+ * BIO submission method. See comment above about vdev_classic.
-+ * Set zfs_vdev_disk_classic=0 for new, =1 for classic
-+ */
-+static uint_t zfs_vdev_disk_classic = 0;	/* default new */
-+
-+/* Set submission function from module parameter */
-+static int
-+vdev_disk_param_set_classic(const char *buf, zfs_kernel_param_t *kp)
-+{
-+	int err = param_set_uint(buf, kp);
-+	if (err < 0)
-+		return (SET_ERROR(err));
-+
-+	vdev_disk_io_rw_fn =
-+	    zfs_vdev_disk_classic ? vdev_classic_physio : vdev_disk_io_rw;
-+
-+	printk(KERN_INFO "ZFS: forcing %s BIO submission\n",
-+	    zfs_vdev_disk_classic ? "classic" : "new");
-+
-+	return (0);
-+}
-+
- /*
-  * At first vdev use, set the submission function from the default value if
-  * it hasn't been set already.
-@@ -1547,8 +1570,8 @@ vdev_disk_init(spa_t *spa, nvlist_t *nv, void **tsd)
- 	(void) tsd;
- 
- 	if (vdev_disk_io_rw_fn == NULL)
--		/* XXX make configurable */
--		vdev_disk_io_rw_fn = 0 ? vdev_classic_physio : vdev_disk_io_rw;
-+		vdev_disk_io_rw_fn = zfs_vdev_disk_classic ?
-+		    vdev_classic_physio : vdev_disk_io_rw;
- 
- 	return (0);
- }
-@@ -1650,3 +1673,7 @@ ZFS_MODULE_PARAM(zfs_vdev, zfs_vdev_, failfast_mask, UINT, ZMOD_RW,
- 
- ZFS_MODULE_PARAM(zfs_vdev_disk, zfs_vdev_disk_, max_segs, UINT, ZMOD_RW,
- 	"Maximum number of data segments to add to an IO request (min 4)");
-+
-+ZFS_MODULE_PARAM_CALL(zfs_vdev_disk, zfs_vdev_disk_, classic,
-+    vdev_disk_param_set_classic, param_get_uint, ZMOD_RD,
-+	"Use classic BIO submission method");
diff --git a/debian/patches/0021-vdev_disk-use-bio_chain-to-submit-multiple-BIOs.patch b/debian/patches/0021-vdev_disk-use-bio_chain-to-submit-multiple-BIOs.patch
deleted file mode 100644
index 2dbf8916..00000000
--- a/debian/patches/0021-vdev_disk-use-bio_chain-to-submit-multiple-BIOs.patch
+++ /dev/null
@@ -1,363 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Wed, 21 Feb 2024 11:07:21 +1100
-Subject: [PATCH] vdev_disk: use bio_chain() to submit multiple BIOs
-
-Simplifies our code a lot, so we don't have to wait for each BIO and
-reassemble them ourselves.
-
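-As a sketch of the chaining pattern (helper names hypothetical, error
-handling omitted; see the hunks below for the real code):
-
-    struct bio *cur = NULL;
-    while (have_more_pages()) {          /* hypothetical loop condition */
-        struct bio *next = alloc_next_bio();  /* hypothetical allocator */
-        if (cur != NULL) {
-            bio_chain(cur, next);   /* fold cur's completion into next */
-            submit_bio(cur);        /* cur can be sent off immediately */
-        }
-        cur = next;
-        /* ... fill cur with pages ... */
-    }
-    cur->bi_end_io = vbio_completion;   /* only the last BIO calls back */
-    submit_bio(cur);
-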
-Reviewed-by: Alexander Motin <mav@FreeBSD.org>
-Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
-Closes #15533
-Closes #15588
-(cherry picked from commit 72fd834c47558cb10d847948d1a4615e894c77c3)
----
- module/os/linux/zfs/vdev_disk.c | 231 +++++++++++---------------------
- 1 file changed, 80 insertions(+), 151 deletions(-)
-
-diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
-index a9110623a..36468fc21 100644
---- a/module/os/linux/zfs/vdev_disk.c
-+++ b/module/os/linux/zfs/vdev_disk.c
-@@ -454,10 +454,9 @@ vdev_disk_close(vdev_t *v)
- 	if (v->vdev_reopening || vd == NULL)
- 		return;
- 
--	if (vd->vd_bdh != NULL) {
-+	if (vd->vd_bdh != NULL)
- 		vdev_blkdev_put(vd->vd_bdh, spa_mode(v->vdev_spa),
- 		    zfs_vdev_holder);
--	}
- 
- 	rw_destroy(&vd->vd_lock);
- 	kmem_free(vd, sizeof (vdev_disk_t));
-@@ -663,9 +662,6 @@ typedef struct {
- 
- 	abd_t		*vbio_abd;	/* abd carrying borrowed linear buf */
- 
--	atomic_t	vbio_ref;	/* bio refcount */
--	int		vbio_error;	/* error from failed bio */
--
- 	uint_t		vbio_max_segs;	/* max segs per bio */
- 
- 	uint_t		vbio_max_bytes;	/* max bytes per bio */
-@@ -674,43 +670,52 @@ typedef struct {
- 	uint64_t	vbio_offset;	/* start offset of next bio */
- 
- 	struct bio	*vbio_bio;	/* pointer to the current bio */
--	struct bio	*vbio_bios;	/* list of all bios */
-+	int		vbio_flags;	/* bio flags */
- } vbio_t;
- 
- static vbio_t *
--vbio_alloc(zio_t *zio, struct block_device *bdev)
-+vbio_alloc(zio_t *zio, struct block_device *bdev, int flags)
- {
- 	vbio_t *vbio = kmem_zalloc(sizeof (vbio_t), KM_SLEEP);
- 
- 	vbio->vbio_zio = zio;
- 	vbio->vbio_bdev = bdev;
--	atomic_set(&vbio->vbio_ref, 0);
-+	vbio->vbio_abd = NULL;
- 	vbio->vbio_max_segs = vdev_bio_max_segs(bdev);
- 	vbio->vbio_max_bytes = vdev_bio_max_bytes(bdev);
- 	vbio->vbio_lbs_mask = ~(bdev_logical_block_size(bdev)-1);
- 	vbio->vbio_offset = zio->io_offset;
-+	vbio->vbio_bio = NULL;
-+	vbio->vbio_flags = flags;
- 
- 	return (vbio);
- }
- 
-+BIO_END_IO_PROTO(vbio_completion, bio, error);
-+
- static int
- vbio_add_page(vbio_t *vbio, struct page *page, uint_t size, uint_t offset)
- {
--	struct bio *bio;
-+	struct bio *bio = vbio->vbio_bio;
- 	uint_t ssize;
- 
- 	while (size > 0) {
--		bio = vbio->vbio_bio;
- 		if (bio == NULL) {
- 			/* New BIO, allocate and set up */
- 			bio = vdev_bio_alloc(vbio->vbio_bdev, GFP_NOIO,
- 			    vbio->vbio_max_segs);
--			if (unlikely(bio == NULL))
--				return (SET_ERROR(ENOMEM));
-+			VERIFY(bio);
-+
- 			BIO_BI_SECTOR(bio) = vbio->vbio_offset >> 9;
-+			bio_set_op_attrs(bio,
-+			    vbio->vbio_zio->io_type == ZIO_TYPE_WRITE ?
-+			    WRITE : READ, vbio->vbio_flags);
- 
--			bio->bi_next = vbio->vbio_bios;
--			vbio->vbio_bios = vbio->vbio_bio = bio;
-+			if (vbio->vbio_bio) {
-+				bio_chain(vbio->vbio_bio, bio);
-+				vdev_submit_bio(vbio->vbio_bio);
-+			}
-+			vbio->vbio_bio = bio;
- 		}
- 
- 		/*
-@@ -735,157 +740,97 @@ vbio_add_page(vbio_t *vbio, struct page *page, uint_t size, uint_t offset)
- 		vbio->vbio_offset += BIO_BI_SIZE(bio);
- 
- 		/* Signal new BIO allocation wanted */
--		vbio->vbio_bio = NULL;
-+		bio = NULL;
- 	}
- 
- 	return (0);
- }
- 
--BIO_END_IO_PROTO(vdev_disk_io_rw_completion, bio, error);
--static void vbio_put(vbio_t *vbio);
-+/* Iterator callback to submit ABD pages to the vbio. */
-+static int
-+vbio_fill_cb(struct page *page, size_t off, size_t len, void *priv)
-+{
-+	vbio_t *vbio = priv;
-+	return (vbio_add_page(vbio, page, len, off));
-+}
- 
-+/* Create some BIOs, fill them with data and submit them */
- static void
--vbio_submit(vbio_t *vbio, int flags)
-+vbio_submit(vbio_t *vbio, abd_t *abd, uint64_t size)
- {
--	ASSERT(vbio->vbio_bios);
--	struct bio *bio = vbio->vbio_bios;
--	vbio->vbio_bio = vbio->vbio_bios = NULL;
--
--	/*
--	 * We take a reference for each BIO as we submit it, plus one to
--	 * protect us from BIOs completing before we're done submitting them
--	 * all, causing vbio_put() to free vbio out from under us and/or the
--	 * zio to be returned before all its IO has completed.
--	 */
--	atomic_set(&vbio->vbio_ref, 1);
-+	ASSERT(vbio->vbio_bdev);
- 
- 	/*
--	 * If we're submitting more than one BIO, inform the block layer so
--	 * it can batch them if it wants.
-+	 * We plug so we can submit the BIOs as we go and only unplug them when
-+	 * they are fully created and submitted. This is important; if we don't
-+	 * plug, then the kernel may start executing earlier BIOs while we're
-+	 * still creating and executing later ones, and if the device goes
-+	 * away while that's happening, older kernels can get confused and
-+	 * trample memory.
- 	 */
- 	struct blk_plug plug;
--	boolean_t do_plug = (bio->bi_next != NULL);
--	if (do_plug)
--		blk_start_plug(&plug);
-+	blk_start_plug(&plug);
- 
--	/* Submit all the BIOs */
--	while (bio != NULL) {
--		atomic_inc(&vbio->vbio_ref);
-+	(void) abd_iterate_page_func(abd, 0, size, vbio_fill_cb, vbio);
-+	ASSERT(vbio->vbio_bio);
- 
--		struct bio *next = bio->bi_next;
--		bio->bi_next = NULL;
-+	vbio->vbio_bio->bi_end_io = vbio_completion;
-+	vbio->vbio_bio->bi_private = vbio;
- 
--		bio->bi_end_io = vdev_disk_io_rw_completion;
--		bio->bi_private = vbio;
--		bio_set_op_attrs(bio,
--		    vbio->vbio_zio->io_type == ZIO_TYPE_WRITE ?
--		    WRITE : READ, flags);
-+	vdev_submit_bio(vbio->vbio_bio);
- 
--		vdev_submit_bio(bio);
--
--		bio = next;
--	}
--
--	/* Finish the batch */
--	if (do_plug)
--		blk_finish_plug(&plug);
-+	blk_finish_plug(&plug);
- 
--	/* Release the extra reference */
--	vbio_put(vbio);
-+	vbio->vbio_bio = NULL;
-+	vbio->vbio_bdev = NULL;
- }
- 
--static void
--vbio_return_abd(vbio_t *vbio)
-+/* IO completion callback */
-+BIO_END_IO_PROTO(vbio_completion, bio, error)
- {
-+	vbio_t *vbio = bio->bi_private;
- 	zio_t *zio = vbio->vbio_zio;
--	if (vbio->vbio_abd == NULL)
--		return;
--
--	/*
--	 * If we copied the ABD before issuing it, clean up and return the copy
--	 * to the ABD, with changes if appropriate.
--	 */
--	void *buf = abd_to_buf(vbio->vbio_abd);
--	abd_free(vbio->vbio_abd);
--	vbio->vbio_abd = NULL;
--
--	if (zio->io_type == ZIO_TYPE_READ)
--		abd_return_buf_copy(zio->io_abd, buf, zio->io_size);
--	else
--		abd_return_buf(zio->io_abd, buf, zio->io_size);
--}
- 
--static void
--vbio_free(vbio_t *vbio)
--{
--	VERIFY0(atomic_read(&vbio->vbio_ref));
--
--	vbio_return_abd(vbio);
-+	ASSERT(zio);
- 
--	kmem_free(vbio, sizeof (vbio_t));
--}
-+	/* Capture and log any errors */
-+#ifdef HAVE_1ARG_BIO_END_IO_T
-+	zio->io_error = BIO_END_IO_ERROR(bio);
-+#else
-+	zio->io_error = 0;
-+	if (error)
-+		zio->io_error = -(error);
-+	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
-+		zio->io_error = EIO;
-+#endif
-+	ASSERT3U(zio->io_error, >=, 0);
- 
--static void
--vbio_put(vbio_t *vbio)
--{
--	if (atomic_dec_return(&vbio->vbio_ref) > 0)
--		return;
-+	if (zio->io_error)
-+		vdev_disk_error(zio);
- 
--	/*
--	 * This was the last reference, so the entire IO is completed. Clean
--	 * up and submit it for processing.
--	 */
-+	/* Return the BIO to the kernel */
-+	bio_put(bio);
- 
- 	/*
--	 * Get any data buf back to the original ABD, if necessary. We do this
--	 * now so we can get the ZIO into the pipeline as quickly as possible,
--	 * and then do the remaining cleanup after.
-+	 * If we copied the ABD before issuing it, clean up and return the copy
-+	 * to the ABD, with changes if appropriate.
- 	 */
--	vbio_return_abd(vbio);
-+	if (vbio->vbio_abd != NULL) {
-+		void *buf = abd_to_buf(vbio->vbio_abd);
-+		abd_free(vbio->vbio_abd);
-+		vbio->vbio_abd = NULL;
- 
--	zio_t *zio = vbio->vbio_zio;
-+		if (zio->io_type == ZIO_TYPE_READ)
-+			abd_return_buf_copy(zio->io_abd, buf, zio->io_size);
-+		else
-+			abd_return_buf(zio->io_abd, buf, zio->io_size);
-+	}
- 
--	/*
--	 * Set the overall error. If multiple BIOs returned an error, only the
--	 * first will be taken; the others are dropped (see
--	 * vdev_disk_io_rw_completion()). It's pretty much impossible for
--	 * multiple IOs to the same device to fail with different errors, so
--	 * there's no real risk.
--	 */
--	zio->io_error = vbio->vbio_error;
--	if (zio->io_error)
--		vdev_disk_error(zio);
-+	/* Final cleanup */
-+	kmem_free(vbio, sizeof (vbio_t));
- 
- 	/* All done, submit for processing */
- 	zio_delay_interrupt(zio);
--
--	/* Finish cleanup */
--	vbio_free(vbio);
--}
--
--BIO_END_IO_PROTO(vdev_disk_io_rw_completion, bio, error)
--{
--	vbio_t *vbio = bio->bi_private;
--
--	if (vbio->vbio_error == 0) {
--#ifdef HAVE_1ARG_BIO_END_IO_T
--		vbio->vbio_error = BIO_END_IO_ERROR(bio);
--#else
--		if (error)
--			vbio->vbio_error = -(error);
--		else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
--			vbio->vbio_error = EIO;
--#endif
--	}
--
--	/*
--	 * Destroy the BIO. This is safe to do; the vbio owns its data and the
--	 * kernel won't touch it again after the completion function runs.
--	 */
--	bio_put(bio);
--
--	/* Drop this BIOs reference acquired by vbio_submit() */
--	/* Drop this BIO's reference acquired by vbio_submit() */
- }
- 
- /*
-@@ -948,14 +893,6 @@ vdev_disk_check_pages(abd_t *abd, uint64_t size, struct block_device *bdev)
- 	return (B_TRUE);
- }
- 
--/* Iterator callback to submit ABD pages to the vbio. */
--static int
--vdev_disk_fill_vbio_cb(struct page *page, size_t off, size_t len, void *priv)
--{
--	vbio_t *vbio = priv;
--	return (vbio_add_page(vbio, page, len, off));
--}
--
- static int
- vdev_disk_io_rw(zio_t *zio)
- {
-@@ -1018,20 +955,12 @@ vdev_disk_io_rw(zio_t *zio)
- 	}
- 
- 	/* Allocate vbio, with a pointer to the borrowed ABD if necessary */
--	int error = 0;
--	vbio_t *vbio = vbio_alloc(zio, bdev);
-+	vbio_t *vbio = vbio_alloc(zio, bdev, flags);
- 	if (abd != zio->io_abd)
- 		vbio->vbio_abd = abd;
- 
--	/* Fill it with pages */
--	error = abd_iterate_page_func(abd, 0, zio->io_size,
--	    vdev_disk_fill_vbio_cb, vbio);
--	if (error != 0) {
--		vbio_free(vbio);
--		return (error);
--	}
--
--	vbio_submit(vbio, flags);
-+	/* Fill it with data pages and submit it to the kernel */
-+	vbio_submit(vbio, abd, zio->io_size);
- 	return (0);
- }
- 
diff --git a/debian/patches/0022-abd_iter_page-don-t-use-compound-heads-on-Linux-4.5.patch b/debian/patches/0022-abd_iter_page-don-t-use-compound-heads-on-Linux-4.5.patch
deleted file mode 100644
index 28dbbf9d..00000000
--- a/debian/patches/0022-abd_iter_page-don-t-use-compound-heads-on-Linux-4.5.patch
+++ /dev/null
@@ -1,96 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Thu, 14 Mar 2024 10:57:30 +1100
-Subject: [PATCH] abd_iter_page: don't use compound heads on Linux <4.5
-
-Before 4.5 (specifically, torvalds/linux@ddc58f2), head and tail pages
-in a compound page were refcounted separately. This means that using the
-head page without taking a reference to it could see it cleaned up later
-before we're finished with it. Specifically, bio_add_page() would take a
-reference, and drop its reference after the bio completion callback
-returns.
-
-If the zio is executed immediately from the completion callback, this is
-usually ok, as any data is referenced through the tail page referenced
-by the ABD, and so becomes "live" that way. If there's a delay in zio
-execution (high load, error injection), then the head page can be freed,
-along with any dirty flags or other indicators that the underlying
-memory is used. Later, when the zio completes and that memory is
-accessed, it's either unmapped and an unhandled fault takes down the
-entire system, or it is mapped and we end up messing around in someone
-else's memory. Both of these are very bad.
-
-The solution on these older kernels is to take a reference to the head
-page when we use it, and release it when we're done. There's not really
-a sensible way under our current structure to do this; the "best" would
-be to keep a list of head page references in the ABD, and release them
-when the ABD is freed.
-
-Since this additional overhead is totally unnecessary on 4.5+, where
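-A minimal sketch of that rejected alternative (hypothetical, not ZFS
-code):
-
-    struct page *head = compound_head(page);
-    get_page(head);     /* pin the head page before using it */
-    /* ... use head in BIOs until the ABD is freed ... */
-    put_page(head);     /* release it once completely finished */
-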
-head and tail pages share refcounts, I've opted to simply not use the
-compound head in ABD page iteration there. This is theoretically less
-efficient (though cleaning up head page references would add overhead),
-but its safe, and we still get the other benefits of not mapping pages
-before adding them to a bio and not mis-splitting pages.
-
-There doesn't appear to be an obvious symbol name or config option we
-can match on to discover this behaviour in configure (and the mm/page
-APIs have changed a lot since then anyway), so I've gone with a simple
-version check.
-
-Reviewed-by: Alexander Motin <mav@FreeBSD.org>
-Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
-Closes #15533
-Closes #15588
-(cherry picked from commit c6be6ce1755a3d9a3cbe70256cd8958ef83d8542)
----
- module/os/linux/zfs/abd_os.c | 14 ++++++++++++++
- 1 file changed, 14 insertions(+)
-
-diff --git a/module/os/linux/zfs/abd_os.c b/module/os/linux/zfs/abd_os.c
-index 3fe01c0b7..d3255dcbc 100644
---- a/module/os/linux/zfs/abd_os.c
-+++ b/module/os/linux/zfs/abd_os.c
-@@ -62,6 +62,7 @@
- #include <linux/kmap_compat.h>
- #include <linux/mm_compat.h>
- #include <linux/scatterlist.h>
-+#include <linux/version.h>
- #endif
- 
- #ifdef _KERNEL
-@@ -1061,6 +1062,7 @@ abd_iter_page(struct abd_iter *aiter)
- 	}
- 	ASSERT(page);
- 
-+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 5, 0)
- 	if (PageTail(page)) {
- 		/*
- 		 * This page is part of a "compound page", which is a group of
-@@ -1082,11 +1084,23 @@ abd_iter_page(struct abd_iter *aiter)
- 		 * To do this, we need to adjust the offset to be counted from
- 		 * the head page. struct page for compound pages are stored
- 		 * contiguously, so we can just adjust by a simple offset.
-+		 *
-+		 * Before kernel 4.5, compound page heads were refcounted
-+		 * separately, such that moving back to the head page would
-+		 * require us to take a reference to it and releasing it once
-+		 * we're completely finished with it. In practice, that means
-+		 * when our caller is done with the ABD, which we have no
-+		 * insight into from here. Rather than contort this API to
-+		 * track head page references on such ancient kernels, we just
-+		 * compile this block out and use the tail pages directly. This
-+		 * is slightly less efficient, but makes everything far
-+		 * simpler.
- 		 */
- 		struct page *head = compound_head(page);
- 		doff += ((page - head) * PAGESIZE);
- 		page = head;
- 	}
-+#endif
- 
- 	/* final page and position within it */
- 	aiter->iter_page = page;
diff --git a/debian/patches/0023-vdev_disk-default-to-classic-submission-for-2.2.x.patch b/debian/patches/0023-vdev_disk-default-to-classic-submission-for-2.2.x.patch
deleted file mode 100644
index e2f1422f..00000000
--- a/debian/patches/0023-vdev_disk-default-to-classic-submission-for-2.2.x.patch
+++ /dev/null
@@ -1,90 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Wed, 27 Mar 2024 13:11:12 +1100
-Subject: [PATCH] vdev_disk: default to classic submission for 2.2.x
-
-We don't want to change to brand-new code in the middle of a stable
-series, but we want it available to test for people running into page
-splitting issues.
-
-This commit makes zfs_vdev_disk_classic=1 the default, and updates the
-documentation to better explain what's going on.
-
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
----
- man/man4/zfs.4                  | 31 ++++++++++++++++++++++---------
- module/os/linux/zfs/vdev_disk.c |  8 +++++---
- 2 files changed, 27 insertions(+), 12 deletions(-)
-
-diff --git a/man/man4/zfs.4 b/man/man4/zfs.4
-index 6a628e7f3..a98ec519a 100644
---- a/man/man4/zfs.4
-+++ b/man/man4/zfs.4
-@@ -1355,17 +1355,30 @@ This parameter only applies on Linux.
- This parameter is ignored if
- .Sy zfs_vdev_disk_classic Ns = Ns Sy 1 .
- .
--.It Sy zfs_vdev_disk_classic Ns = Ns Sy 0 Ns | Ns 1 Pq uint
--If set to 1, OpenZFS will submit IO to Linux using the method it used in 2.2
--and earlier.
--This "classic" method has known issues with highly fragmented IO requests and
--is slower on many workloads, but it has been in use for many years and is known
--to be very stable.
--If you set this parameter, please also open a bug report why you did so,
-+.It Sy zfs_vdev_disk_classic Ns = Ns 0 Ns | Ns Sy 1 Pq uint
-+Controls the method used to submit IO to the Linux block layer
-+(default
-+.Sy 1 "classic" Ns
-+)
-+.Pp
-+If set to 1, the "classic" method is used.
-+This is the method that has been in use since the earliest versions of
-+ZFS-on-Linux.
-+It has known issues with highly fragmented IO requests and is less efficient on
-+many workloads, but it is well known and well understood.
-+.Pp
-+If set to 0, the "new" method is used.
-+This method is available since 2.2.4 and should resolve all known issues and be
-+far more efficient, but has not had as much testing.
-+In the 2.2.x series, this parameter defaults to 1, to use the "classic" method.
-+.Pp
-+It is not recommended that you change it except on advice from the OpenZFS
-+developers.
-+If you do change it, please also open a bug report describing why you did so,
- including the workload involved and any error messages.
- .Pp
--This parameter and the classic submission method will be removed once we have
--total confidence in the new method.
-+This parameter and the "classic" submission method will be removed in a future
-+release of OpenZFS once we have total confidence in the new method.
- .Pp
- This parameter only applies on Linux, and can only be set at module load time.
- .
-diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
-index 36468fc21..e1c19a085 100644
---- a/module/os/linux/zfs/vdev_disk.c
-+++ b/module/os/linux/zfs/vdev_disk.c
-@@ -969,8 +969,10 @@ vdev_disk_io_rw(zio_t *zio)
- /*
-  * This is the classic, battle-tested BIO submission code. Until we're totally
-  * sure that the new code is safe and correct in all cases, this will remain
-- * available and can be enabled by setting zfs_vdev_disk_classic=1 at module
-- * load time.
-+ * available.
-+ *
-+ * It is enabled by setting zfs_vdev_disk_classic=1 at module load time. It is
-+ * enabled (=1) by default since 2.2.4, and disabled by default (=0) on master.
-  *
-  * These functions have been renamed to vdev_classic_* to make it clear what
-  * they belong to, but their implementations are unchanged.
-@@ -1468,7 +1470,7 @@ vdev_disk_rele(vdev_t *vd)
-  * BIO submission method. See comment above about vdev_classic.
-  * Set zfs_vdev_disk_classic=0 for new, =1 for classic
-  */
--static uint_t zfs_vdev_disk_classic = 0;	/* default new */
-+static uint_t zfs_vdev_disk_classic = 1;	/* default classic */
- 
- /* Set submission function from module parameter */
- static int
diff --git a/debian/patches/0024-Fix-corruption-caused-by-mmap-flushing-problems.patch b/debian/patches/0024-Fix-corruption-caused-by-mmap-flushing-problems.patch
deleted file mode 100644
index 027f299d..00000000
--- a/debian/patches/0024-Fix-corruption-caused-by-mmap-flushing-problems.patch
+++ /dev/null
@@ -1,104 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Robert Evans <rrevans@gmail.com>
-Date: Mon, 25 Mar 2024 17:56:49 -0400
-Subject: [PATCH] Fix corruption caused by mmap flushing problems
-
-1) Make mmap flushes synchronous. Linux may skip flushing dirty pages
-   already in writeback unless data-integrity sync is requested.
-
-2) Change zfs_putpage to use TXG_WAIT. Otherwise dirty pages may be
-   skipped due to DMU pushing back on TX assign.
-
-3) Add missing mmap flush when doing block cloning.
-
-4) While here, pass errors from putpage to writepage/writepages.
-
-This change fixes corruption edge cases, but unfortunately adds
-synchronous ZIL flushes for dirty mmap pages to llseek and bclone
-operations. It may be possible to avoid these sync writes later
-but would need more tricky refactoring of the writeback code.
-
-Reviewed-by: Alexander Motin <mav@FreeBSD.org>
-Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
-Signed-off-by: Robert Evans <evansr@google.com>
-Closes #15933
-Closes #16019
----
- module/os/linux/zfs/zfs_vnops_os.c | 5 +----
- module/os/linux/zfs/zpl_file.c     | 8 ++++----
- module/zfs/zfs_vnops.c             | 6 +++++-
- 3 files changed, 10 insertions(+), 9 deletions(-)
-
-diff --git a/module/os/linux/zfs/zfs_vnops_os.c b/module/os/linux/zfs/zfs_vnops_os.c
-index c06a75662..7c473bc7e 100644
---- a/module/os/linux/zfs/zfs_vnops_os.c
-+++ b/module/os/linux/zfs/zfs_vnops_os.c
-@@ -3792,11 +3792,8 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc,
- 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
- 	zfs_sa_upgrade_txholds(tx, zp);
- 
--	err = dmu_tx_assign(tx, TXG_NOWAIT);
-+	err = dmu_tx_assign(tx, TXG_WAIT);
- 	if (err != 0) {
--		if (err == ERESTART)
--			dmu_tx_wait(tx);
--
- 		dmu_tx_abort(tx);
- #ifdef HAVE_VFS_FILEMAP_DIRTY_FOLIO
- 		filemap_dirty_folio(page_mapping(pp), page_folio(pp));
-diff --git a/module/os/linux/zfs/zpl_file.c b/module/os/linux/zfs/zpl_file.c
-index 3caa0fc6c..9dec52215 100644
---- a/module/os/linux/zfs/zpl_file.c
-+++ b/module/os/linux/zfs/zpl_file.c
-@@ -720,23 +720,23 @@ zpl_putpage(struct page *pp, struct writeback_control *wbc, void *data)
- {
- 	boolean_t *for_sync = data;
- 	fstrans_cookie_t cookie;
-+	int ret;
- 
- 	ASSERT(PageLocked(pp));
- 	ASSERT(!PageWriteback(pp));
- 
- 	cookie = spl_fstrans_mark();
--	(void) zfs_putpage(pp->mapping->host, pp, wbc, *for_sync);
-+	ret = zfs_putpage(pp->mapping->host, pp, wbc, *for_sync);
- 	spl_fstrans_unmark(cookie);
- 
--	return (0);
-+	return (ret);
- }
- 
- #ifdef HAVE_WRITEPAGE_T_FOLIO
- static int
- zpl_putfolio(struct folio *pp, struct writeback_control *wbc, void *data)
- {
--	(void) zpl_putpage(&pp->page, wbc, data);
--	return (0);
-+	return (zpl_putpage(&pp->page, wbc, data));
- }
- #endif
- 
-diff --git a/module/zfs/zfs_vnops.c b/module/zfs/zfs_vnops.c
-index 2b37834d5..7020f88ec 100644
---- a/module/zfs/zfs_vnops.c
-+++ b/module/zfs/zfs_vnops.c
-@@ -130,7 +130,7 @@ zfs_holey_common(znode_t *zp, ulong_t cmd, loff_t *off)
- 
- 	/* Flush any mmap()'d data to disk */
- 	if (zn_has_cached_data(zp, 0, file_sz - 1))
--		zn_flush_cached_data(zp, B_FALSE);
-+		zn_flush_cached_data(zp, B_TRUE);
- 
- 	lr = zfs_rangelock_enter(&zp->z_rangelock, 0, UINT64_MAX, RL_READER);
- 	error = dmu_offset_next(ZTOZSB(zp)->z_os, zp->z_id, hole, &noff);
-@@ -1193,6 +1193,10 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
- 		}
- 	}
- 
-+	/* Flush any mmap()'d data to disk */
-+	if (zn_has_cached_data(inzp, inoff, inoff + len - 1))
-+		zn_flush_cached_data(inzp, B_TRUE);
-+
- 	/*
- 	 * Maintain predictable lock order.
- 	 */
diff --git a/debian/patches/0025-vdev_disk-don-t-touch-vbio-after-its-handed-off-to-t.patch b/debian/patches/0025-vdev_disk-don-t-touch-vbio-after-its-handed-off-to-t.patch
deleted file mode 100644
index 83eac378..00000000
--- a/debian/patches/0025-vdev_disk-don-t-touch-vbio-after-its-handed-off-to-t.patch
+++ /dev/null
@@ -1,57 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Rob Norris <rob.norris@klarasystems.com>
-Date: Tue, 2 Apr 2024 15:14:54 +1100
-Subject: [PATCH] vdev_disk: don't touch vbio after its handed off to the
- kernel
-
-After IO is unplugged, it may complete immediately and vbio_completion
-be called in interrupt context. That may interrupt or deschedule our
-task. If it's the last bio, the vbio will be freed. Then, we get
-rescheduled, and try to write to freed memory through vbio-> member
-accesses.
-
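-Roughly, the problematic interleaving looks like this (an illustrative
-timeline, not literal code):
-
-    submitting task                completion (interrupt context)
-    -------------------------      ------------------------------
-    vdev_submit_bio(last bio)
-    blk_finish_plug()         -->  vbio_completion()
-                                   kmem_free(vbio)
-    vbio->vbio_bio = NULL          /* writes into freed memory */
-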
-This patch just removes the cleanup, and the corresponding assert.
-These were leftovers from a previous iteration of vbio_submit() and were
-always "belt and suspenders" ops anyway, never strictly required.
-
-Reported-by: Rich Ercolani <rincebrain@gmail.com>
-Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
-Sponsored-by: Klara, Inc.
-Sponsored-by: Wasabi Technology, Inc.
-(cherry picked from commit 34f662ad22206af6852020fd923ceccd836a855f)
-Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
----
- module/os/linux/zfs/vdev_disk.c | 11 ++++++-----
- 1 file changed, 6 insertions(+), 5 deletions(-)
-
-diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
-index e1c19a085..62c7aa14f 100644
---- a/module/os/linux/zfs/vdev_disk.c
-+++ b/module/os/linux/zfs/vdev_disk.c
-@@ -758,8 +758,6 @@ vbio_fill_cb(struct page *page, size_t off, size_t len, void *priv)
- static void
- vbio_submit(vbio_t *vbio, abd_t *abd, uint64_t size)
- {
--	ASSERT(vbio->vbio_bdev);
--
- 	/*
- 	 * We plug so we can submit the BIOs as we go and only unplug them when
- 	 * they are fully created and submitted. This is important; if we don't
-@@ -777,12 +775,15 @@ vbio_submit(vbio_t *vbio, abd_t *abd, uint64_t size)
- 	vbio->vbio_bio->bi_end_io = vbio_completion;
- 	vbio->vbio_bio->bi_private = vbio;
- 
-+	/*
-+	 * Once submitted, vbio_bio now owns vbio (through bi_private) and we
-+	 * can't touch it again. The bio may complete and vbio_completion() be
-+	 * called and free the vbio before this task is run again, so we must
-+	 * consider it invalid from this point.
-+	 */
- 	vdev_submit_bio(vbio->vbio_bio);
- 
- 	blk_finish_plug(&plug);
--
--	vbio->vbio_bio = NULL;
--	vbio->vbio_bdev = NULL;
- }
- 
- /* IO completion callback */
diff --git a/debian/patches/series b/debian/patches/series
index 7c1a5c6c..35f81d13 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -9,17 +9,3 @@
 0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
 0010-Fix-nfs_truncate_shares-without-etc-exports.d.patch
 0011-zpool-status-tighten-bounds-for-noalloc-stat-availab.patch
-0012-udev-correctly-handle-partition-16-and-later.patch
-0013-Linux-6.8-compat-use-splice_copy_file_range-for-fall.patch
-0014-linux-5.4-compat-page_size.patch
-0015-abd-add-page-iterator.patch
-0016-vdev_disk-rename-existing-functions-to-vdev_classic_.patch
-0017-vdev_disk-reorganise-vdev_disk_io_start.patch
-0018-vdev_disk-make-read-write-IO-function-configurable.patch
-0019-vdev_disk-rewrite-BIO-filling-machinery-to-avoid-spl.patch
-0020-vdev_disk-add-module-parameter-to-select-BIO-submiss.patch
-0021-vdev_disk-use-bio_chain-to-submit-multiple-BIOs.patch
-0022-abd_iter_page-don-t-use-compound-heads-on-Linux-4.5.patch
-0023-vdev_disk-default-to-classic-submission-for-2.2.x.patch
-0024-Fix-corruption-caused-by-mmap-flushing-problems.patch
-0025-vdev_disk-don-t-touch-vbio-after-its-handed-off-to-t.patch
diff --git a/upstream b/upstream
index c883088d..25665920 160000
--- a/upstream
+++ b/upstream
@@ -1 +1 @@
-Subproject commit c883088df83ced3a2b8b38e6d89a5e63fb153ee4
+Subproject commit 2566592045780e7be7afc899c2496b1ae3af4f4d
-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [pve-devel] [PATCH zfsonlinux v2 2/2] update arc_summary arcstat patch with new introduced values
  2024-05-07 15:02 [pve-devel] [PATCH zfsonlinux v2 0/2] Update to ZFS 2.2.4 Stoiko Ivanov
  2024-05-07 15:02 ` [pve-devel] [PATCH zfsonlinux v2 1/2] update zfs submodule to 2.2.4 and refresh patches Stoiko Ivanov
@ 2024-05-07 15:02 ` Stoiko Ivanov
  2024-05-21 13:32   ` Max Carrara
  2024-05-21 13:31 ` [pve-devel] [PATCH zfsonlinux v2 0/2] Update to ZFS 2.2.4 Max Carrara
  2024-05-21 14:06 ` [pve-devel] applied-series: " Thomas Lamprecht
  3 siblings, 1 reply; 7+ messages in thread
From: Stoiko Ivanov @ 2024-05-07 15:02 UTC (permalink / raw)
  To: pve-devel

ZFS 2.2.4 added new kstats for speculative prefetch in:
026fe796465e3da7b27d06ef5338634ee6dd30d8

Adapt our patch introduced with ZFS 2.1 (for the then-added MFU/MRU
stats) to also deal with the newly introduced values not being present
(because an old kernel module does not offer them).

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 ...-guard-access-to-freshly-introduced-.patch | 438 ++++++++++++++++++
 ...-guard-access-to-l2arc-MFU-MRU-stats.patch | 113 -----
 debian/patches/series                         |   2 +-
 3 files changed, 439 insertions(+), 114 deletions(-)
 create mode 100644 debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
 delete mode 100644 debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch

diff --git a/debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch b/debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
new file mode 100644
index 00000000..bc7db2a9
--- /dev/null
+++ b/debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
@@ -0,0 +1,438 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Thomas Lamprecht <t.lamprecht@proxmox.com>
+Date: Wed, 10 Nov 2021 09:29:47 +0100
+Subject: [PATCH] arc stat/summary: guard access to freshly introduced stats
+
+l2arc MFU/MRU stats and zfetch past/future/stride stats were
+introduced in 2.1 and 2.2.4 respectively:
+
+commit 085321621e79a75bea41c2b6511da6ebfbf2ba0a added printing MFU
+and MRU stats for 2.1 user space tools, but those keys are not
+available in the 2.0 module. That means it may break the arcstat and
+arc_summary tools after upgrade to 2.1 (user space), before a reboot
+to the new 2.1 ZFS kernel-module happened, due to python raising a
+KeyError on the dict access then.
+
+Move those two keys to a .get accessor with `0` as fallback, as it
+should be better to show possibly wrong data for new stat keys than
+to throw an exception.
+
+also move l2_mfu_asize, l2_mru_asize, l2_prefetch_asize,
+l2_bufc_data_asize and l2_bufc_metadata_asize to the .get accessor
+(these are only present with a cache device in the pool)
+
+guard access to iohits and uncached state introduced in
+792a6ee462efc15a7614f27e13f0f8aaa9414a08
+
+guard access to zfetch past future stride stats introduced in
+026fe796465e3da7b27d06ef5338634ee6dd30d8
+
+These are present in the current kernel, but lead to an exception if
+running the new user-space with an old kernel module.
+
+Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
+---
+ cmd/arc_summary | 132 ++++++++++++++++++++++++------------------------
+ cmd/arcstat.in  |  48 +++++++++---------
+ 2 files changed, 90 insertions(+), 90 deletions(-)
+
+diff --git a/cmd/arc_summary b/cmd/arc_summary
+index 100fb1987..30f5d23e9 100755
+--- a/cmd/arc_summary
++++ b/cmd/arc_summary
+@@ -551,21 +551,21 @@ def section_arc(kstats_dict):
+     arc_target_size = arc_stats['c']
+     arc_max = arc_stats['c_max']
+     arc_min = arc_stats['c_min']
+-    meta = arc_stats['meta']
+-    pd = arc_stats['pd']
+-    pm = arc_stats['pm']
+-    anon_data = arc_stats['anon_data']
+-    anon_metadata = arc_stats['anon_metadata']
+-    mfu_data = arc_stats['mfu_data']
+-    mfu_metadata = arc_stats['mfu_metadata']
+-    mru_data = arc_stats['mru_data']
+-    mru_metadata = arc_stats['mru_metadata']
+-    mfug_data = arc_stats['mfu_ghost_data']
+-    mfug_metadata = arc_stats['mfu_ghost_metadata']
+-    mrug_data = arc_stats['mru_ghost_data']
+-    mrug_metadata = arc_stats['mru_ghost_metadata']
+-    unc_data = arc_stats['uncached_data']
+-    unc_metadata = arc_stats['uncached_metadata']
++    meta = arc_stats.get('meta', 0)
++    pd = arc_stats.get('pd', 0)
++    pm = arc_stats.get('pm', 0)
++    anon_data = arc_stats.get('anon_data', 0)
++    anon_metadata = arc_stats.get('anon_metadata', 0)
++    mfu_data = arc_stats.get('mfu_data', 0)
++    mfu_metadata = arc_stats.get('mfu_metadata', 0)
++    mru_data = arc_stats.get('mru_data', 0)
++    mru_metadata = arc_stats.get('mru_metadata', 0)
++    mfug_data = arc_stats.get('mfu_ghost_data', 0)
++    mfug_metadata = arc_stats.get('mfu_ghost_metadata', 0)
++    mrug_data = arc_stats.get('mru_ghost_data', 0)
++    mrug_metadata = arc_stats.get('mru_ghost_metadata', 0)
++    unc_data = arc_stats.get('uncached_data', 0)
++    unc_metadata = arc_stats.get('uncached_metadata', 0)
+     bonus_size = arc_stats['bonus_size']
+     dnode_limit = arc_stats['arc_dnode_limit']
+     dnode_size = arc_stats['dnode_size']
+@@ -655,13 +655,13 @@ def section_arc(kstats_dict):
+     prt_i1('L2 cached evictions:', f_bytes(arc_stats['evict_l2_cached']))
+     prt_i1('L2 eligible evictions:', f_bytes(arc_stats['evict_l2_eligible']))
+     prt_i2('L2 eligible MFU evictions:',
+-           f_perc(arc_stats['evict_l2_eligible_mfu'],
++           f_perc(arc_stats.get('evict_l2_eligible_mfu', 0), # 2.0 module compat
+            arc_stats['evict_l2_eligible']),
+-           f_bytes(arc_stats['evict_l2_eligible_mfu']))
++           f_bytes(arc_stats.get('evict_l2_eligible_mfu', 0)))
+     prt_i2('L2 eligible MRU evictions:',
+-           f_perc(arc_stats['evict_l2_eligible_mru'],
++           f_perc(arc_stats.get('evict_l2_eligible_mru', 0), # 2.0 module compat
+            arc_stats['evict_l2_eligible']),
+-           f_bytes(arc_stats['evict_l2_eligible_mru']))
++           f_bytes(arc_stats.get('evict_l2_eligible_mru', 0)))
+     prt_i1('L2 ineligible evictions:',
+            f_bytes(arc_stats['evict_l2_ineligible']))
+     print()
+@@ -672,106 +672,106 @@ def section_archits(kstats_dict):
+     """
+ 
+     arc_stats = isolate_section('arcstats', kstats_dict)
+-    all_accesses = int(arc_stats['hits'])+int(arc_stats['iohits'])+\
++    all_accesses = int(arc_stats['hits'])+int(arc_stats.get('iohits', 0))+\
+         int(arc_stats['misses'])
+ 
+     prt_1('ARC total accesses:', f_hits(all_accesses))
+     ta_todo = (('Total hits:', arc_stats['hits']),
+-               ('Total I/O hits:', arc_stats['iohits']),
++               ('Total I/O hits:', arc_stats.get('iohits', 0)),
+                ('Total misses:', arc_stats['misses']))
+     for title, value in ta_todo:
+         prt_i2(title, f_perc(value, all_accesses), f_hits(value))
+     print()
+ 
+     dd_total = int(arc_stats['demand_data_hits']) +\
+-        int(arc_stats['demand_data_iohits']) +\
++        int(arc_stats.get('demand_data_iohits', 0)) +\
+         int(arc_stats['demand_data_misses'])
+     prt_2('ARC demand data accesses:', f_perc(dd_total, all_accesses),
+          f_hits(dd_total))
+     dd_todo = (('Demand data hits:', arc_stats['demand_data_hits']),
+-               ('Demand data I/O hits:', arc_stats['demand_data_iohits']),
++               ('Demand data I/O hits:', arc_stats.get('demand_data_iohits', 0)),
+                ('Demand data misses:', arc_stats['demand_data_misses']))
+     for title, value in dd_todo:
+         prt_i2(title, f_perc(value, dd_total), f_hits(value))
+     print()
+ 
+     dm_total = int(arc_stats['demand_metadata_hits']) +\
+-        int(arc_stats['demand_metadata_iohits']) +\
++        int(arc_stats.get('demand_metadata_iohits', 0)) +\
+         int(arc_stats['demand_metadata_misses'])
+     prt_2('ARC demand metadata accesses:', f_perc(dm_total, all_accesses),
+           f_hits(dm_total))
+     dm_todo = (('Demand metadata hits:', arc_stats['demand_metadata_hits']),
+                ('Demand metadata I/O hits:',
+-                arc_stats['demand_metadata_iohits']),
++                arc_stats.get('demand_metadata_iohits', 0)),
+                ('Demand metadata misses:', arc_stats['demand_metadata_misses']))
+     for title, value in dm_todo:
+         prt_i2(title, f_perc(value, dm_total), f_hits(value))
+     print()
+ 
+     pd_total = int(arc_stats['prefetch_data_hits']) +\
+-        int(arc_stats['prefetch_data_iohits']) +\
++        int(arc_stats.get('prefetch_data_iohits', 0)) +\
+         int(arc_stats['prefetch_data_misses'])
+     prt_2('ARC prefetch data accesses:', f_perc(pd_total, all_accesses),
+           f_hits(pd_total))
+     pd_todo = (('Prefetch data hits:', arc_stats['prefetch_data_hits']),
+-               ('Prefetch data I/O hits:', arc_stats['prefetch_data_iohits']),
++               ('Prefetch data I/O hits:', arc_stats.get('prefetch_data_iohits', 0)),
+                ('Prefetch data misses:', arc_stats['prefetch_data_misses']))
+     for title, value in pd_todo:
+         prt_i2(title, f_perc(value, pd_total), f_hits(value))
+     print()
+ 
+     pm_total = int(arc_stats['prefetch_metadata_hits']) +\
+-        int(arc_stats['prefetch_metadata_iohits']) +\
++        int(arc_stats.get('prefetch_metadata_iohits', 0)) +\
+         int(arc_stats['prefetch_metadata_misses'])
+     prt_2('ARC prefetch metadata accesses:', f_perc(pm_total, all_accesses),
+           f_hits(pm_total))
+     pm_todo = (('Prefetch metadata hits:',
+                 arc_stats['prefetch_metadata_hits']),
+                ('Prefetch metadata I/O hits:',
+-                arc_stats['prefetch_metadata_iohits']),
++                arc_stats.get('prefetch_metadata_iohits', 0)),
+                ('Prefetch metadata misses:',
+                 arc_stats['prefetch_metadata_misses']))
+     for title, value in pm_todo:
+         prt_i2(title, f_perc(value, pm_total), f_hits(value))
+     print()
+ 
+-    all_prefetches = int(arc_stats['predictive_prefetch'])+\
+-        int(arc_stats['prescient_prefetch'])
++    all_prefetches = int(arc_stats.get('predictive_prefetch', 0))+\
++        int(arc_stats.get('prescient_prefetch', 0))
+     prt_2('ARC predictive prefetches:',
+-           f_perc(arc_stats['predictive_prefetch'], all_prefetches),
+-           f_hits(arc_stats['predictive_prefetch']))
++           f_perc(arc_stats.get('predictive_prefetch', 0), all_prefetches),
++           f_hits(arc_stats.get('predictive_prefetch', 0)))
+     prt_i2('Demand hits after predictive:',
+            f_perc(arc_stats['demand_hit_predictive_prefetch'],
+-                  arc_stats['predictive_prefetch']),
++                  arc_stats.get('predictive_prefetch', 0)),
+            f_hits(arc_stats['demand_hit_predictive_prefetch']))
+     prt_i2('Demand I/O hits after predictive:',
+-           f_perc(arc_stats['demand_iohit_predictive_prefetch'],
+-                  arc_stats['predictive_prefetch']),
+-           f_hits(arc_stats['demand_iohit_predictive_prefetch']))
+-    never = int(arc_stats['predictive_prefetch']) -\
++           f_perc(arc_stats.get('demand_iohit_predictive_prefetch', 0),
++                  arc_stats.get('predictive_prefetch', 0)),
++           f_hits(arc_stats.get('demand_iohit_predictive_prefetch', 0)))
++    never = int(arc_stats.get('predictive_prefetch', 0)) -\
+         int(arc_stats['demand_hit_predictive_prefetch']) -\
+-        int(arc_stats['demand_iohit_predictive_prefetch'])
++        int(arc_stats.get('demand_iohit_predictive_prefetch', 0))
+     prt_i2('Never demanded after predictive:',
+-           f_perc(never, arc_stats['predictive_prefetch']),
++           f_perc(never, arc_stats.get('predictive_prefetch', 0)),
+            f_hits(never))
+     print()
+ 
+     prt_2('ARC prescient prefetches:',
+-           f_perc(arc_stats['prescient_prefetch'], all_prefetches),
+-           f_hits(arc_stats['prescient_prefetch']))
++           f_perc(arc_stats.get('prescient_prefetch', 0), all_prefetches),
++           f_hits(arc_stats.get('prescient_prefetch', 0)))
+     prt_i2('Demand hits after prescient:',
+            f_perc(arc_stats['demand_hit_prescient_prefetch'],
+-                  arc_stats['prescient_prefetch']),
++                  arc_stats.get('prescient_prefetch', 0)),
+            f_hits(arc_stats['demand_hit_prescient_prefetch']))
+     prt_i2('Demand I/O hits after prescient:',
+-           f_perc(arc_stats['demand_iohit_prescient_prefetch'],
+-                  arc_stats['prescient_prefetch']),
+-           f_hits(arc_stats['demand_iohit_prescient_prefetch']))
+-    never = int(arc_stats['prescient_prefetch'])-\
++           f_perc(arc_stats.get('demand_iohit_prescient_prefetch', 0),
++                  arc_stats.get('prescient_prefetch', 0)),
++           f_hits(arc_stats.get('demand_iohit_prescient_prefetch', 0)))
++    never = int(arc_stats.get('prescient_prefetch', 0))-\
+         int(arc_stats['demand_hit_prescient_prefetch'])-\
+-        int(arc_stats['demand_iohit_prescient_prefetch'])
++        int(arc_stats.get('demand_iohit_prescient_prefetch', 0))
+     prt_i2('Never demanded after prescient:',
+-           f_perc(never, arc_stats['prescient_prefetch']),
++           f_perc(never, arc_stats.get('prescient_prefetch', 0)),
+            f_hits(never))
+     print()
+ 
+@@ -782,7 +782,7 @@ def section_archits(kstats_dict):
+                 arc_stats['mfu_ghost_hits']),
+                ('Most recently used (MRU) ghost:',
+                 arc_stats['mru_ghost_hits']),
+-               ('Uncached:', arc_stats['uncached_hits']))
++               ('Uncached:', arc_stats.get('uncached_hits', 0)))
+     for title, value in cl_todo:
+         prt_i2(title, f_perc(value, all_accesses), f_hits(value))
+     print()
+@@ -794,26 +794,26 @@ def section_dmu(kstats_dict):
+     zfetch_stats = isolate_section('zfetchstats', kstats_dict)
+ 
+     zfetch_access_total = int(zfetch_stats['hits']) +\
+-        int(zfetch_stats['future']) + int(zfetch_stats['stride']) +\
+-        int(zfetch_stats['past']) + int(zfetch_stats['misses'])
++        int(zfetch_stats.get('future', 0)) + int(zfetch_stats.get('stride', 0)) +\
++        int(zfetch_stats.get('past', 0)) + int(zfetch_stats['misses'])
+ 
+     prt_1('DMU predictive prefetcher calls:', f_hits(zfetch_access_total))
+     prt_i2('Stream hits:',
+            f_perc(zfetch_stats['hits'], zfetch_access_total),
+            f_hits(zfetch_stats['hits']))
+-    future = int(zfetch_stats['future']) + int(zfetch_stats['stride'])
++    future = int(zfetch_stats.get('future', 0)) + int(zfetch_stats.get('stride', 0))
+     prt_i2('Hits ahead of stream:', f_perc(future, zfetch_access_total),
+            f_hits(future))
+     prt_i2('Hits behind stream:',
+-           f_perc(zfetch_stats['past'], zfetch_access_total),
+-           f_hits(zfetch_stats['past']))
++           f_perc(zfetch_stats.get('past', 0), zfetch_access_total),
++           f_hits(zfetch_stats.get('past', 0)))
+     prt_i2('Stream misses:',
+            f_perc(zfetch_stats['misses'], zfetch_access_total),
+            f_hits(zfetch_stats['misses']))
+     prt_i2('Streams limit reached:',
+            f_perc(zfetch_stats['max_streams'], zfetch_stats['misses']),
+            f_hits(zfetch_stats['max_streams']))
+-    prt_i1('Stream strides:', f_hits(zfetch_stats['stride']))
++    prt_i1('Stream strides:', f_hits(zfetch_stats.get('stride', 0)))
+     prt_i1('Prefetches issued', f_hits(zfetch_stats['io_issued']))
+     print()
+ 
+@@ -860,20 +860,20 @@ def section_l2arc(kstats_dict):
+            f_perc(arc_stats['l2_hdr_size'], arc_stats['l2_size']),
+            f_bytes(arc_stats['l2_hdr_size']))
+     prt_i2('MFU allocated size:',
+-           f_perc(arc_stats['l2_mfu_asize'], arc_stats['l2_asize']),
+-           f_bytes(arc_stats['l2_mfu_asize']))
++           f_perc(arc_stats.get('l2_mfu_asize', 0), arc_stats['l2_asize']),
++           f_bytes(arc_stats.get('l2_mfu_asize', 0))) # 2.0 module compat
+     prt_i2('MRU allocated size:',
+-           f_perc(arc_stats['l2_mru_asize'], arc_stats['l2_asize']),
+-           f_bytes(arc_stats['l2_mru_asize']))
++           f_perc(arc_stats.get('l2_mru_asize', 0), arc_stats['l2_asize']),
++           f_bytes(arc_stats.get('l2_mru_asize', 0))) # 2.0 module compat
+     prt_i2('Prefetch allocated size:',
+-           f_perc(arc_stats['l2_prefetch_asize'], arc_stats['l2_asize']),
+-           f_bytes(arc_stats['l2_prefetch_asize']))
++           f_perc(arc_stats.get('l2_prefetch_asize', 0), arc_stats['l2_asize']),
++           f_bytes(arc_stats.get('l2_prefetch_asize',0))) # 2.0 module compat
+     prt_i2('Data (buffer content) allocated size:',
+-           f_perc(arc_stats['l2_bufc_data_asize'], arc_stats['l2_asize']),
+-           f_bytes(arc_stats['l2_bufc_data_asize']))
++           f_perc(arc_stats.get('l2_bufc_data_asize', 0), arc_stats['l2_asize']),
++           f_bytes(arc_stats.get('l2_bufc_data_asize', 0))) # 2.0 module compat
+     prt_i2('Metadata (buffer content) allocated size:',
+-           f_perc(arc_stats['l2_bufc_metadata_asize'], arc_stats['l2_asize']),
+-           f_bytes(arc_stats['l2_bufc_metadata_asize']))
++           f_perc(arc_stats.get('l2_bufc_metadata_asize', 0), arc_stats['l2_asize']),
++           f_bytes(arc_stats.get('l2_bufc_metadata_asize', 0))) # 2.0 module compat
+ 
+     print()
+     prt_1('L2ARC breakdown:', f_hits(l2_access_total))
+diff --git a/cmd/arcstat.in b/cmd/arcstat.in
+index c4f10a1d6..bf47ec90e 100755
+--- a/cmd/arcstat.in
++++ b/cmd/arcstat.in
+@@ -510,7 +510,7 @@ def calculate():
+     v = dict()
+     v["time"] = time.strftime("%H:%M:%S", time.localtime())
+     v["hits"] = d["hits"] // sint
+-    v["iohs"] = d["iohits"] // sint
++    v["iohs"] = d.get("iohits", 0) // sint
+     v["miss"] = d["misses"] // sint
+     v["read"] = v["hits"] + v["iohs"] + v["miss"]
+     v["hit%"] = 100 * v["hits"] // v["read"] if v["read"] > 0 else 0
+@@ -518,7 +518,7 @@ def calculate():
+     v["miss%"] = 100 - v["hit%"] - v["ioh%"] if v["read"] > 0 else 0
+ 
+     v["dhit"] = (d["demand_data_hits"] + d["demand_metadata_hits"]) // sint
+-    v["dioh"] = (d["demand_data_iohits"] + d["demand_metadata_iohits"]) // sint
++    v["dioh"] = (d.get("demand_data_iohits", 0) + d.get("demand_metadata_iohits", 0)) // sint
+     v["dmis"] = (d["demand_data_misses"] + d["demand_metadata_misses"]) // sint
+ 
+     v["dread"] = v["dhit"] + v["dioh"] + v["dmis"]
+@@ -527,7 +527,7 @@ def calculate():
+     v["dm%"] = 100 - v["dh%"] - v["di%"] if v["dread"] > 0 else 0
+ 
+     v["ddhit"] = d["demand_data_hits"] // sint
+-    v["ddioh"] = d["demand_data_iohits"] // sint
++    v["ddioh"] = d.get("demand_data_iohits", 0) // sint
+     v["ddmis"] = d["demand_data_misses"] // sint
+ 
+     v["ddread"] = v["ddhit"] + v["ddioh"] + v["ddmis"]
+@@ -536,7 +536,7 @@ def calculate():
+     v["ddm%"] = 100 - v["ddh%"] - v["ddi%"] if v["ddread"] > 0 else 0
+ 
+     v["dmhit"] = d["demand_metadata_hits"] // sint
+-    v["dmioh"] = d["demand_metadata_iohits"] // sint
++    v["dmioh"] = d.get("demand_metadata_iohits", 0) // sint
+     v["dmmis"] = d["demand_metadata_misses"] // sint
+ 
+     v["dmread"] = v["dmhit"] + v["dmioh"] + v["dmmis"]
+@@ -545,8 +545,8 @@ def calculate():
+     v["dmm%"] = 100 - v["dmh%"] - v["dmi%"] if v["dmread"] > 0 else 0
+ 
+     v["phit"] = (d["prefetch_data_hits"] + d["prefetch_metadata_hits"]) // sint
+-    v["pioh"] = (d["prefetch_data_iohits"] +
+-                 d["prefetch_metadata_iohits"]) // sint
++    v["pioh"] = (d.get("prefetch_data_iohits", 0) +
++                 d.get("prefetch_metadata_iohits", 0)) // sint
+     v["pmis"] = (d["prefetch_data_misses"] +
+                  d["prefetch_metadata_misses"]) // sint
+ 
+@@ -556,7 +556,7 @@ def calculate():
+     v["pm%"] = 100 - v["ph%"] - v["pi%"] if v["pread"] > 0 else 0
+ 
+     v["pdhit"] = d["prefetch_data_hits"] // sint
+-    v["pdioh"] = d["prefetch_data_iohits"] // sint
++    v["pdioh"] = d.get("prefetch_data_iohits", 0) // sint
+     v["pdmis"] = d["prefetch_data_misses"] // sint
+ 
+     v["pdread"] = v["pdhit"] + v["pdioh"] + v["pdmis"]
+@@ -565,7 +565,7 @@ def calculate():
+     v["pdm%"] = 100 - v["pdh%"] - v["pdi%"] if v["pdread"] > 0 else 0
+ 
+     v["pmhit"] = d["prefetch_metadata_hits"] // sint
+-    v["pmioh"] = d["prefetch_metadata_iohits"] // sint
++    v["pmioh"] = d.get("prefetch_metadata_iohits", 0) // sint
+     v["pmmis"] = d["prefetch_metadata_misses"] // sint
+ 
+     v["pmread"] = v["pmhit"] + v["pmioh"] + v["pmmis"]
+@@ -575,8 +575,8 @@ def calculate():
+ 
+     v["mhit"] = (d["prefetch_metadata_hits"] +
+                  d["demand_metadata_hits"]) // sint
+-    v["mioh"] = (d["prefetch_metadata_iohits"] +
+-                 d["demand_metadata_iohits"]) // sint
++    v["mioh"] = (d.get("prefetch_metadata_iohits", 0) +
++                 d.get("demand_metadata_iohits", 0)) // sint
+     v["mmis"] = (d["prefetch_metadata_misses"] +
+                  d["demand_metadata_misses"]) // sint
+ 
+@@ -592,24 +592,24 @@ def calculate():
+     v["mru"] = d["mru_hits"] // sint
+     v["mrug"] = d["mru_ghost_hits"] // sint
+     v["mfug"] = d["mfu_ghost_hits"] // sint
+-    v["unc"] = d["uncached_hits"] // sint
++    v["unc"] = d.get("uncached_hits", 0) // sint
+     v["eskip"] = d["evict_skip"] // sint
+     v["el2skip"] = d["evict_l2_skip"] // sint
+     v["el2cach"] = d["evict_l2_cached"] // sint
+     v["el2el"] = d["evict_l2_eligible"] // sint
+-    v["el2mfu"] = d["evict_l2_eligible_mfu"] // sint
+-    v["el2mru"] = d["evict_l2_eligible_mru"] // sint
++    v["el2mfu"] = d.get("evict_l2_eligible_mfu", 0) // sint
++    v["el2mru"] = d.get("evict_l2_eligible_mru", 0) // sint
+     v["el2inel"] = d["evict_l2_ineligible"] // sint
+     v["mtxmis"] = d["mutex_miss"] // sint
+-    v["ztotal"] = (d["zfetch_hits"] + d["zfetch_future"] + d["zfetch_stride"] +
+-                   d["zfetch_past"] + d["zfetch_misses"]) // sint
++    v["ztotal"] = (d["zfetch_hits"] + d.get("zfetch_future", 0) + d.get("zfetch_stride", 0) +
++                   d.get("zfetch_past", 0) + d["zfetch_misses"]) // sint
+     v["zhits"] = d["zfetch_hits"] // sint
+-    v["zahead"] = (d["zfetch_future"] + d["zfetch_stride"]) // sint
+-    v["zpast"] = d["zfetch_past"] // sint
++    v["zahead"] = (d.get("zfetch_future", 0) + d.get("zfetch_stride", 0)) // sint
++    v["zpast"] = d.get("zfetch_past", 0) // sint
+     v["zmisses"] = d["zfetch_misses"] // sint
+     v["zmax"] = d["zfetch_max_streams"] // sint
+-    v["zfuture"] = d["zfetch_future"] // sint
+-    v["zstride"] = d["zfetch_stride"] // sint
++    v["zfuture"] = d.get("zfetch_future", 0) // sint
++    v["zstride"] = d.get("zfetch_stride", 0) // sint
+     v["zissued"] = d["zfetch_io_issued"] // sint
+     v["zactive"] = d["zfetch_io_active"] // sint
+ 
+@@ -624,11 +624,11 @@ def calculate():
+         v["l2size"] = cur["l2_size"]
+         v["l2bytes"] = d["l2_read_bytes"] // sint
+ 
+-        v["l2pref"] = cur["l2_prefetch_asize"]
+-        v["l2mfu"] = cur["l2_mfu_asize"]
+-        v["l2mru"] = cur["l2_mru_asize"]
+-        v["l2data"] = cur["l2_bufc_data_asize"]
+-        v["l2meta"] = cur["l2_bufc_metadata_asize"]
++        v["l2pref"] = cur.get("l2_prefetch_asize", 0)
++        v["l2mfu"] = cur.get("l2_mfu_asize", 0)
++        v["l2mru"] = cur.get("l2_mru_asize", 0)
++        v["l2data"] = cur.get("l2_bufc_data_asize", 0)
++        v["l2meta"] = cur.get("l2_bufc_metadata_asize", 0)
+         v["l2pref%"] = 100 * v["l2pref"] // v["l2asize"]
+         v["l2mfu%"] = 100 * v["l2mfu"] // v["l2asize"]
+         v["l2mru%"] = 100 * v["l2mru"] // v["l2asize"]
diff --git a/debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch b/debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
deleted file mode 100644
index 2e7c207d..00000000
--- a/debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
+++ /dev/null
@@ -1,113 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Thomas Lamprecht <t.lamprecht@proxmox.com>
-Date: Wed, 10 Nov 2021 09:29:47 +0100
-Subject: [PATCH] arc stat/summary: guard access to l2arc MFU/MRU stats
-
-commit 085321621e79a75bea41c2b6511da6ebfbf2ba0a added printing MFU
-and MRU stats for 2.1 user space tools, but those keys are not
-available in the 2.0 module. That means it may break the arcstat and
-arc_summary tools after upgrade to 2.1 (user space), before a reboot
-to the new 2.1 ZFS kernel-module happened, due to python raising a
-KeyError on the dict access then.
-
-Move those two keys to a .get accessor with `0` as fallback, as it
-should be better to show some possible wrong data for new stat-keys
-than throwing an exception.
-
-Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
-
-also move l2_mfu_asize  l2_mru_asize l2_prefetch_asize
-l2_bufc_data_asize l2_bufc_metadata_asize to .get accessor
-(these are only present with a cache device in the pool)
-Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
-Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
----
- cmd/arc_summary | 28 ++++++++++++++--------------
- cmd/arcstat.in  | 14 +++++++-------
- 2 files changed, 21 insertions(+), 21 deletions(-)
-
-diff --git a/cmd/arc_summary b/cmd/arc_summary
-index 100fb1987..86b2260a1 100755
---- a/cmd/arc_summary
-+++ b/cmd/arc_summary
-@@ -655,13 +655,13 @@ def section_arc(kstats_dict):
-     prt_i1('L2 cached evictions:', f_bytes(arc_stats['evict_l2_cached']))
-     prt_i1('L2 eligible evictions:', f_bytes(arc_stats['evict_l2_eligible']))
-     prt_i2('L2 eligible MFU evictions:',
--           f_perc(arc_stats['evict_l2_eligible_mfu'],
-+           f_perc(arc_stats.get('evict_l2_eligible_mfu', 0), # 2.0 module compat
-            arc_stats['evict_l2_eligible']),
--           f_bytes(arc_stats['evict_l2_eligible_mfu']))
-+           f_bytes(arc_stats.get('evict_l2_eligible_mfu', 0)))
-     prt_i2('L2 eligible MRU evictions:',
--           f_perc(arc_stats['evict_l2_eligible_mru'],
-+           f_perc(arc_stats.get('evict_l2_eligible_mru', 0), # 2.0 module compat
-            arc_stats['evict_l2_eligible']),
--           f_bytes(arc_stats['evict_l2_eligible_mru']))
-+           f_bytes(arc_stats.get('evict_l2_eligible_mru', 0)))
-     prt_i1('L2 ineligible evictions:',
-            f_bytes(arc_stats['evict_l2_ineligible']))
-     print()
-@@ -860,20 +860,20 @@ def section_l2arc(kstats_dict):
-            f_perc(arc_stats['l2_hdr_size'], arc_stats['l2_size']),
-            f_bytes(arc_stats['l2_hdr_size']))
-     prt_i2('MFU allocated size:',
--           f_perc(arc_stats['l2_mfu_asize'], arc_stats['l2_asize']),
--           f_bytes(arc_stats['l2_mfu_asize']))
-+           f_perc(arc_stats.get('l2_mfu_asize', 0), arc_stats['l2_asize']),
-+           f_bytes(arc_stats.get('l2_mfu_asize', 0))) # 2.0 module compat
-     prt_i2('MRU allocated size:',
--           f_perc(arc_stats['l2_mru_asize'], arc_stats['l2_asize']),
--           f_bytes(arc_stats['l2_mru_asize']))
-+           f_perc(arc_stats.get('l2_mru_asize', 0), arc_stats['l2_asize']),
-+           f_bytes(arc_stats.get('l2_mru_asize', 0))) # 2.0 module compat
-     prt_i2('Prefetch allocated size:',
--           f_perc(arc_stats['l2_prefetch_asize'], arc_stats['l2_asize']),
--           f_bytes(arc_stats['l2_prefetch_asize']))
-+           f_perc(arc_stats.get('l2_prefetch_asize', 0), arc_stats['l2_asize']),
-+           f_bytes(arc_stats.get('l2_prefetch_asize',0))) # 2.0 module compat
-     prt_i2('Data (buffer content) allocated size:',
--           f_perc(arc_stats['l2_bufc_data_asize'], arc_stats['l2_asize']),
--           f_bytes(arc_stats['l2_bufc_data_asize']))
-+           f_perc(arc_stats.get('l2_bufc_data_asize', 0), arc_stats['l2_asize']),
-+           f_bytes(arc_stats.get('l2_bufc_data_asize', 0))) # 2.0 module compat
-     prt_i2('Metadata (buffer content) allocated size:',
--           f_perc(arc_stats['l2_bufc_metadata_asize'], arc_stats['l2_asize']),
--           f_bytes(arc_stats['l2_bufc_metadata_asize']))
-+           f_perc(arc_stats.get('l2_bufc_metadata_asize', 0), arc_stats['l2_asize']),
-+           f_bytes(arc_stats.get('l2_bufc_metadata_asize', 0))) # 2.0 module compat
- 
-     print()
-     prt_1('L2ARC breakdown:', f_hits(l2_access_total))
-diff --git a/cmd/arcstat.in b/cmd/arcstat.in
-index c4f10a1d6..c570dca88 100755
---- a/cmd/arcstat.in
-+++ b/cmd/arcstat.in
-@@ -597,8 +597,8 @@ def calculate():
-     v["el2skip"] = d["evict_l2_skip"] // sint
-     v["el2cach"] = d["evict_l2_cached"] // sint
-     v["el2el"] = d["evict_l2_eligible"] // sint
--    v["el2mfu"] = d["evict_l2_eligible_mfu"] // sint
--    v["el2mru"] = d["evict_l2_eligible_mru"] // sint
-+    v["el2mfu"] = d.get("evict_l2_eligible_mfu", 0) // sint
-+    v["el2mru"] = d.get("evict_l2_eligible_mru", 0) // sint
-     v["el2inel"] = d["evict_l2_ineligible"] // sint
-     v["mtxmis"] = d["mutex_miss"] // sint
-     v["ztotal"] = (d["zfetch_hits"] + d["zfetch_future"] + d["zfetch_stride"] +
-@@ -624,11 +624,11 @@ def calculate():
-         v["l2size"] = cur["l2_size"]
-         v["l2bytes"] = d["l2_read_bytes"] // sint
- 
--        v["l2pref"] = cur["l2_prefetch_asize"]
--        v["l2mfu"] = cur["l2_mfu_asize"]
--        v["l2mru"] = cur["l2_mru_asize"]
--        v["l2data"] = cur["l2_bufc_data_asize"]
--        v["l2meta"] = cur["l2_bufc_metadata_asize"]
-+        v["l2pref"] = cur.get("l2_prefetch_asize", 0)
-+        v["l2mfu"] = cur.get("l2_mfu_asize", 0)
-+        v["l2mru"] = cur.get("l2_mru_asize", 0)
-+        v["l2data"] = cur.get("l2_bufc_data_asize", 0)
-+        v["l2meta"] = cur.get("l2_bufc_metadata_asize", 0)
-         v["l2pref%"] = 100 * v["l2pref"] // v["l2asize"]
-         v["l2mfu%"] = 100 * v["l2mfu"] // v["l2asize"]
-         v["l2mru%"] = 100 * v["l2mru"] // v["l2asize"]
diff --git a/debian/patches/series b/debian/patches/series
index 35f81d13..229027ff 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -6,6 +6,6 @@
 0006-dont-symlink-zed-scripts.patch
 0007-Add-systemd-unit-for-importing-specific-pools.patch
 0008-Patch-move-manpage-arcstat-1-to-arcstat-8.patch
-0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
+0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
 0010-Fix-nfs_truncate_shares-without-etc-exports.d.patch
 0011-zpool-status-tighten-bounds-for-noalloc-stat-availab.patch
-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pve-devel] [PATCH zfsonlinux v2 0/2] Update to ZFS 2.2.4
  2024-05-07 15:02 [pve-devel] [PATCH zfsonlinux v2 0/2] Update to ZFS 2.2.4 Stoiko Ivanov
  2024-05-07 15:02 ` [pve-devel] [PATCH zfsonlinux v2 1/2] update zfs submodule to 2.2.4 and refresh patches Stoiko Ivanov
  2024-05-07 15:02 ` [pve-devel] [PATCH zfsonlinux v2 2/2] update arc_summary arcstat patch with new introduced values Stoiko Ivanov
@ 2024-05-21 13:31 ` Max Carrara
  2024-05-21 14:06 ` [pve-devel] applied-series: " Thomas Lamprecht
  3 siblings, 0 replies; 7+ messages in thread
From: Max Carrara @ 2024-05-21 13:31 UTC (permalink / raw)
  To: Proxmox VE development discussion

On Tue May 7, 2024 at 5:02 PM CEST, Stoiko Ivanov wrote:
> v1->v2:
> Patch 2/2 (adaptation of arc_summary/arcstat patch) modified:
> * right after sending the v1 I saw a report where pinning kernel 6.2 (thus
>   ZFS 2.1) leads to a similar traceback - which I seem to have overlooked
>   when packaging 2.2.0 ...
>   adapted the patch by booting a VM with kernel 6.2 and the current
>   userspace and running arc_summary /arcstat -a until no traceback was
>   displayed with a single-disk pool.

Testing
-------

* Built and installed ZFS with those two patches on my test VM
  - Note: Couldn't install zfs-initramfs and zfs-dracut due to some
    dependency issue
  - zfs-initramfs depends on initramfs-tools, but complained it wasn't
    available (even though the package is installed ...)
  - zfs-dracut did the same for dracut
  - initramfs-tools then conflicts with the virtual linux-initramfs-tool
  - Removing zfs-initramfs from the packages to be installed "fixed"
    this; all other packages then installed without any issue

* `arcstat -a` and `arc_summary` correctly displayed the new values
  while the old kernel was still running

* Didn't encounter any exceptions

* VM also survived a reboot - same results for new kernel

* Didn't notice anything off overall while the VM was running - will
  holler if I find anything

Review
------

Looked specifically at patch 02; applied and diffed it on the upstream
ZFS sources checked out at tag `zfs-2.2.4`. What can I say, it's just
replacing calls to `obj.__getitem__()` with `obj.get('foo', 0)` - so,
pretty straightforward. (The original code could use a brush-up, but
that's beside the point.)

Summary
-------

All in all, LGTM - haven't really looked at patch 01 in detail, so I'll
add my R-b tag only to patch 02. Good work!


>
> original cover-letter for v1:
> This patchset updates ZFS to the recently released 2.2.4
>
> We had about half of the patches already in 2.2.3-2, due to the needed
> support for kernel 6.8.
>
> Compared to the last 2.2 point releases this one compares quite a few
> potential performance improvments:
> * for ZVOL workloads (relevant for qemu guests) multiple taskq were
>   introduced [1] - this change is active by default (can be put back to
>   the old behavior with explicitly setting `zvol_num_taskqs=1`
> * the interface for ZFS submitting operations to the kernel's block layer
>   was augmented to better deal with split-pages [2] - which should also
>   improve performance, and prevent unaligned writes which are rejected by
>   e.g. the SCSI subsystem. - The default remains with the current code
>   (`zfs_vdev_disk_classic=0` turns on the 'new' behavior...)
> * Speculative prefetching was improved [3], which introduced not kstats,
>   which are reported by`arc_summary` and `arcstat`, as before with the
>   MRU/MFU additions there was not guard for running the new user-space
>   with an old kernel resulting in Python exceptions of both tools.
>   I adapted the patch where Thomas fixed that back in the 2.1 release
>   times. - sending as separate patch for easier review - and I hope it's
>   ok that I dropped the S-o-b tag (as it's changed code) - glad to resend
>   it, if this should be adapted.
>
> Minimally tested on 2 VMs (the arcstat/arc_summary changes by running with
> an old kernel and new user-space)
>
>
> [0] https://github.com/openzfs/zfs/releases/tag/zfs-2.2.4
> [1] https://github.com/openzfs/zfs/pull/15992
> [2] https://github.com/openzfs/zfs/pull/15588
> [3] https://github.com/openzfs/zfs/pull/16022
>
> Stoiko Ivanov (2):
>   update zfs submodule to 2.2.4 and refresh patches
>   update arc_summary arcstat patch with new introduced values
>
>  ...md-unit-for-importing-specific-pools.patch |   4 +-
>  ...-move-manpage-arcstat-1-to-arcstat-8.patch |   2 +-
>  ...-guard-access-to-freshly-introduced-.patch | 438 ++++++++++++
>  ...-guard-access-to-l2arc-MFU-MRU-stats.patch | 113 ---
>  ...hten-bounds-for-noalloc-stat-availab.patch |   4 +-
>  ...rectly-handle-partition-16-and-later.patch |  52 --
>  ...-use-splice_copy_file_range-for-fall.patch | 135 ----
>  .../0014-linux-5.4-compat-page_size.patch     | 121 ----
>  .../patches/0015-abd-add-page-iterator.patch  | 334 ---------
>  ...-existing-functions-to-vdev_classic_.patch | 349 ---------
>  ...v_disk-reorganise-vdev_disk_io_start.patch | 111 ---
>  ...-read-write-IO-function-configurable.patch |  69 --
>  ...e-BIO-filling-machinery-to-avoid-spl.patch | 671 ------------------
>  ...dule-parameter-to-select-BIO-submiss.patch | 104 ---
>  ...se-bio_chain-to-submit-multiple-BIOs.patch | 363 ----------
>  ...on-t-use-compound-heads-on-Linux-4.5.patch |  96 ---
>  ...ault-to-classic-submission-for-2.2.x.patch |  90 ---
>  ...ion-caused-by-mmap-flushing-problems.patch | 104 ---
>  ...touch-vbio-after-its-handed-off-to-t.patch |  57 --
>  debian/patches/series                         |  16 +-
>  upstream                                      |   2 +-
>  21 files changed, 445 insertions(+), 2790 deletions(-)
>  create mode 100644 debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
>  delete mode 100644 debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
>  delete mode 100644 debian/patches/0012-udev-correctly-handle-partition-16-and-later.patch
>  delete mode 100644 debian/patches/0013-Linux-6.8-compat-use-splice_copy_file_range-for-fall.patch
>  delete mode 100644 debian/patches/0014-linux-5.4-compat-page_size.patch
>  delete mode 100644 debian/patches/0015-abd-add-page-iterator.patch
>  delete mode 100644 debian/patches/0016-vdev_disk-rename-existing-functions-to-vdev_classic_.patch
>  delete mode 100644 debian/patches/0017-vdev_disk-reorganise-vdev_disk_io_start.patch
>  delete mode 100644 debian/patches/0018-vdev_disk-make-read-write-IO-function-configurable.patch
>  delete mode 100644 debian/patches/0019-vdev_disk-rewrite-BIO-filling-machinery-to-avoid-spl.patch
>  delete mode 100644 debian/patches/0020-vdev_disk-add-module-parameter-to-select-BIO-submiss.patch
>  delete mode 100644 debian/patches/0021-vdev_disk-use-bio_chain-to-submit-multiple-BIOs.patch
>  delete mode 100644 debian/patches/0022-abd_iter_page-don-t-use-compound-heads-on-Linux-4.5.patch
>  delete mode 100644 debian/patches/0023-vdev_disk-default-to-classic-submission-for-2.2.x.patch
>  delete mode 100644 debian/patches/0024-Fix-corruption-caused-by-mmap-flushing-problems.patch
>  delete mode 100644 debian/patches/0025-vdev_disk-don-t-touch-vbio-after-its-handed-off-to-t.patch



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pve-devel] [PATCH zfsonlinux v2 2/2] update arc_summary arcstat patch with new introduced values
  2024-05-07 15:02 ` [pve-devel] [PATCH zfsonlinux v2 2/2] update arc_summary arcstat patch with new introduced values Stoiko Ivanov
@ 2024-05-21 13:32   ` Max Carrara
  0 siblings, 0 replies; 7+ messages in thread
From: Max Carrara @ 2024-05-21 13:32 UTC (permalink / raw)
  To: Proxmox VE development discussion

On Tue May 7, 2024 at 5:02 PM CEST, Stoiko Ivanov wrote:
> ZFS 2.2.4 added new kstats for speculative prefetch in:
> 026fe796465e3da7b27d06ef5338634ee6dd30d8
>
> Adapt our patch introduced with ZFS 2.1 (for the then added MFU/MRU
> stats), to also deal with the now introduced values not being present
> (because an old kernel-module does not offer them).
>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---

See my reply to the cover letter.

Reviewed-by: Max Carrara <m.carrara@proxmox.com>
Tested-by: Max Carrara <m.carrara@proxmox.com>

>  ...-guard-access-to-freshly-introduced-.patch | 438 ++++++++++++++++++
>  ...-guard-access-to-l2arc-MFU-MRU-stats.patch | 113 -----
>  debian/patches/series                         |   2 +-
>  3 files changed, 439 insertions(+), 114 deletions(-)
>  create mode 100644 debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
>  delete mode 100644 debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
>
> diff --git a/debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch b/debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
> new file mode 100644
> index 00000000..bc7db2a9
> --- /dev/null
> +++ b/debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
> @@ -0,0 +1,438 @@
> +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
> +From: Thomas Lamprecht <t.lamprecht@proxmox.com>
> +Date: Wed, 10 Nov 2021 09:29:47 +0100
> +Subject: [PATCH] arc stat/summary: guard access to freshly introduced stats
> +
> +l2arc MFU/MRU and zfetch past future and stride stats were introduced
> +in 2.1 and 2.2.4 respectively:
> +
> +commit 085321621e79a75bea41c2b6511da6ebfbf2ba0a added printing MFU
> +and MRU stats for 2.1 user space tools, but those keys are not
> +available in the 2.0 module. That means it may break the arcstat and
> +arc_summary tools after upgrade to 2.1 (user space), before a reboot
> +to the new 2.1 ZFS kernel module happened, due to Python raising a
> +KeyError on the dict access.
> +
> +Move those two keys to a .get accessor with `0` as fallback, as it
> +is better to show possibly wrong data for new stat-keys than to
> +throw an exception.
> +
> +Also move l2_mfu_asize, l2_mru_asize, l2_prefetch_asize,
> +l2_bufc_data_asize and l2_bufc_metadata_asize to .get accessors
> +(these are only present with a cache device in the pool).
> +
> +Guard access to the iohits and uncached-state stats introduced in
> +792a6ee462efc15a7614f27e13f0f8aaa9414a08
> +
> +Guard access to the zfetch past/future/stride stats introduced in
> +026fe796465e3da7b27d06ef5338634ee6dd30d8
> +
> +These are present in the current kernel, but lead to an exception if
> +running the new user-space with an old kernel module.
> +
> +Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> +---
> + cmd/arc_summary | 132 ++++++++++++++++++++++++------------------------
> + cmd/arcstat.in  |  48 +++++++++---------
> + 2 files changed, 90 insertions(+), 90 deletions(-)
> +
> +diff --git a/cmd/arc_summary b/cmd/arc_summary
> +index 100fb1987..30f5d23e9 100755
> +--- a/cmd/arc_summary
> ++++ b/cmd/arc_summary
> +@@ -551,21 +551,21 @@ def section_arc(kstats_dict):
> +     arc_target_size = arc_stats['c']
> +     arc_max = arc_stats['c_max']
> +     arc_min = arc_stats['c_min']
> +-    meta = arc_stats['meta']
> +-    pd = arc_stats['pd']
> +-    pm = arc_stats['pm']
> +-    anon_data = arc_stats['anon_data']
> +-    anon_metadata = arc_stats['anon_metadata']
> +-    mfu_data = arc_stats['mfu_data']
> +-    mfu_metadata = arc_stats['mfu_metadata']
> +-    mru_data = arc_stats['mru_data']
> +-    mru_metadata = arc_stats['mru_metadata']
> +-    mfug_data = arc_stats['mfu_ghost_data']
> +-    mfug_metadata = arc_stats['mfu_ghost_metadata']
> +-    mrug_data = arc_stats['mru_ghost_data']
> +-    mrug_metadata = arc_stats['mru_ghost_metadata']
> +-    unc_data = arc_stats['uncached_data']
> +-    unc_metadata = arc_stats['uncached_metadata']
> ++    meta = arc_stats.get('meta', 0)
> ++    pd = arc_stats.get('pd', 0)
> ++    pm = arc_stats.get('pm', 0)
> ++    anon_data = arc_stats.get('anon_data', 0)
> ++    anon_metadata = arc_stats.get('anon_metadata', 0)
> ++    mfu_data = arc_stats.get('mfu_data', 0)
> ++    mfu_metadata = arc_stats.get('mfu_metadata', 0)
> ++    mru_data = arc_stats.get('mru_data', 0)
> ++    mru_metadata = arc_stats.get('mru_metadata', 0)
> ++    mfug_data = arc_stats.get('mfu_ghost_data', 0)
> ++    mfug_metadata = arc_stats.get('mfu_ghost_metadata', 0)
> ++    mrug_data = arc_stats.get('mru_ghost_data', 0)
> ++    mrug_metadata = arc_stats.get('mru_ghost_metadata', 0)
> ++    unc_data = arc_stats.get('uncached_data', 0)
> ++    unc_metadata = arc_stats.get('uncached_metadata', 0)
> +     bonus_size = arc_stats['bonus_size']
> +     dnode_limit = arc_stats['arc_dnode_limit']
> +     dnode_size = arc_stats['dnode_size']
> +@@ -655,13 +655,13 @@ def section_arc(kstats_dict):
> +     prt_i1('L2 cached evictions:', f_bytes(arc_stats['evict_l2_cached']))
> +     prt_i1('L2 eligible evictions:', f_bytes(arc_stats['evict_l2_eligible']))
> +     prt_i2('L2 eligible MFU evictions:',
> +-           f_perc(arc_stats['evict_l2_eligible_mfu'],
> ++           f_perc(arc_stats.get('evict_l2_eligible_mfu', 0), # 2.0 module compat
> +            arc_stats['evict_l2_eligible']),
> +-           f_bytes(arc_stats['evict_l2_eligible_mfu']))
> ++           f_bytes(arc_stats.get('evict_l2_eligible_mfu', 0)))
> +     prt_i2('L2 eligible MRU evictions:',
> +-           f_perc(arc_stats['evict_l2_eligible_mru'],
> ++           f_perc(arc_stats.get('evict_l2_eligible_mru', 0), # 2.0 module compat
> +            arc_stats['evict_l2_eligible']),
> +-           f_bytes(arc_stats['evict_l2_eligible_mru']))
> ++           f_bytes(arc_stats.get('evict_l2_eligible_mru', 0)))
> +     prt_i1('L2 ineligible evictions:',
> +            f_bytes(arc_stats['evict_l2_ineligible']))
> +     print()
> +@@ -672,106 +672,106 @@ def section_archits(kstats_dict):
> +     """
> + 
> +     arc_stats = isolate_section('arcstats', kstats_dict)
> +-    all_accesses = int(arc_stats['hits'])+int(arc_stats['iohits'])+\
> ++    all_accesses = int(arc_stats['hits'])+int(arc_stats.get('iohits', 0))+\
> +         int(arc_stats['misses'])
> + 
> +     prt_1('ARC total accesses:', f_hits(all_accesses))
> +     ta_todo = (('Total hits:', arc_stats['hits']),
> +-               ('Total I/O hits:', arc_stats['iohits']),
> ++               ('Total I/O hits:', arc_stats.get('iohits', 0)),
> +                ('Total misses:', arc_stats['misses']))
> +     for title, value in ta_todo:
> +         prt_i2(title, f_perc(value, all_accesses), f_hits(value))
> +     print()
> + 
> +     dd_total = int(arc_stats['demand_data_hits']) +\
> +-        int(arc_stats['demand_data_iohits']) +\
> ++        int(arc_stats.get('demand_data_iohits', 0)) +\
> +         int(arc_stats['demand_data_misses'])
> +     prt_2('ARC demand data accesses:', f_perc(dd_total, all_accesses),
> +          f_hits(dd_total))
> +     dd_todo = (('Demand data hits:', arc_stats['demand_data_hits']),
> +-               ('Demand data I/O hits:', arc_stats['demand_data_iohits']),
> ++               ('Demand data I/O hits:', arc_stats.get('demand_data_iohits', 0)),
> +                ('Demand data misses:', arc_stats['demand_data_misses']))
> +     for title, value in dd_todo:
> +         prt_i2(title, f_perc(value, dd_total), f_hits(value))
> +     print()
> + 
> +     dm_total = int(arc_stats['demand_metadata_hits']) +\
> +-        int(arc_stats['demand_metadata_iohits']) +\
> ++        int(arc_stats.get('demand_metadata_iohits', 0)) +\
> +         int(arc_stats['demand_metadata_misses'])
> +     prt_2('ARC demand metadata accesses:', f_perc(dm_total, all_accesses),
> +           f_hits(dm_total))
> +     dm_todo = (('Demand metadata hits:', arc_stats['demand_metadata_hits']),
> +                ('Demand metadata I/O hits:',
> +-                arc_stats['demand_metadata_iohits']),
> ++                arc_stats.get('demand_metadata_iohits', 0)),
> +                ('Demand metadata misses:', arc_stats['demand_metadata_misses']))
> +     for title, value in dm_todo:
> +         prt_i2(title, f_perc(value, dm_total), f_hits(value))
> +     print()
> + 
> +     pd_total = int(arc_stats['prefetch_data_hits']) +\
> +-        int(arc_stats['prefetch_data_iohits']) +\
> ++        int(arc_stats.get('prefetch_data_iohits', 0)) +\
> +         int(arc_stats['prefetch_data_misses'])
> +     prt_2('ARC prefetch data accesses:', f_perc(pd_total, all_accesses),
> +           f_hits(pd_total))
> +     pd_todo = (('Prefetch data hits:', arc_stats['prefetch_data_hits']),
> +-               ('Prefetch data I/O hits:', arc_stats['prefetch_data_iohits']),
> ++               ('Prefetch data I/O hits:', arc_stats.get('prefetch_data_iohits', 0)),
> +                ('Prefetch data misses:', arc_stats['prefetch_data_misses']))
> +     for title, value in pd_todo:
> +         prt_i2(title, f_perc(value, pd_total), f_hits(value))
> +     print()
> + 
> +     pm_total = int(arc_stats['prefetch_metadata_hits']) +\
> +-        int(arc_stats['prefetch_metadata_iohits']) +\
> ++        int(arc_stats.get('prefetch_metadata_iohits', 0)) +\
> +         int(arc_stats['prefetch_metadata_misses'])
> +     prt_2('ARC prefetch metadata accesses:', f_perc(pm_total, all_accesses),
> +           f_hits(pm_total))
> +     pm_todo = (('Prefetch metadata hits:',
> +                 arc_stats['prefetch_metadata_hits']),
> +                ('Prefetch metadata I/O hits:',
> +-                arc_stats['prefetch_metadata_iohits']),
> ++                arc_stats.get('prefetch_metadata_iohits', 0)),
> +                ('Prefetch metadata misses:',
> +                 arc_stats['prefetch_metadata_misses']))
> +     for title, value in pm_todo:
> +         prt_i2(title, f_perc(value, pm_total), f_hits(value))
> +     print()
> + 
> +-    all_prefetches = int(arc_stats['predictive_prefetch'])+\
> +-        int(arc_stats['prescient_prefetch'])
> ++    all_prefetches = int(arc_stats.get('predictive_prefetch', 0))+\
> ++        int(arc_stats.get('prescient_prefetch', 0))
> +     prt_2('ARC predictive prefetches:',
> +-           f_perc(arc_stats['predictive_prefetch'], all_prefetches),
> +-           f_hits(arc_stats['predictive_prefetch']))
> ++           f_perc(arc_stats.get('predictive_prefetch', 0), all_prefetches),
> ++           f_hits(arc_stats.get('predictive_prefetch', 0)))
> +     prt_i2('Demand hits after predictive:',
> +            f_perc(arc_stats['demand_hit_predictive_prefetch'],
> +-                  arc_stats['predictive_prefetch']),
> ++                  arc_stats.get('predictive_prefetch', 0)),
> +            f_hits(arc_stats['demand_hit_predictive_prefetch']))
> +     prt_i2('Demand I/O hits after predictive:',
> +-           f_perc(arc_stats['demand_iohit_predictive_prefetch'],
> +-                  arc_stats['predictive_prefetch']),
> +-           f_hits(arc_stats['demand_iohit_predictive_prefetch']))
> +-    never = int(arc_stats['predictive_prefetch']) -\
> ++           f_perc(arc_stats.get('demand_iohit_predictive_prefetch', 0),
> ++                  arc_stats.get('predictive_prefetch', 0)),
> ++           f_hits(arc_stats.get('demand_iohit_predictive_prefetch', 0)))
> ++    never = int(arc_stats.get('predictive_prefetch', 0)) -\
> +         int(arc_stats['demand_hit_predictive_prefetch']) -\
> +-        int(arc_stats['demand_iohit_predictive_prefetch'])
> ++        int(arc_stats.get('demand_iohit_predictive_prefetch', 0))
> +     prt_i2('Never demanded after predictive:',
> +-           f_perc(never, arc_stats['predictive_prefetch']),
> ++           f_perc(never, arc_stats.get('predictive_prefetch', 0)),
> +            f_hits(never))
> +     print()
> + 
> +     prt_2('ARC prescient prefetches:',
> +-           f_perc(arc_stats['prescient_prefetch'], all_prefetches),
> +-           f_hits(arc_stats['prescient_prefetch']))
> ++           f_perc(arc_stats.get('prescient_prefetch', 0), all_prefetches),
> ++           f_hits(arc_stats.get('prescient_prefetch', 0)))
> +     prt_i2('Demand hits after prescient:',
> +            f_perc(arc_stats['demand_hit_prescient_prefetch'],
> +-                  arc_stats['prescient_prefetch']),
> ++                  arc_stats.get('prescient_prefetch', 0)),
> +            f_hits(arc_stats['demand_hit_prescient_prefetch']))
> +     prt_i2('Demand I/O hits after prescient:',
> +-           f_perc(arc_stats['demand_iohit_prescient_prefetch'],
> +-                  arc_stats['prescient_prefetch']),
> +-           f_hits(arc_stats['demand_iohit_prescient_prefetch']))
> +-    never = int(arc_stats['prescient_prefetch'])-\
> ++           f_perc(arc_stats.get('demand_iohit_prescient_prefetch', 0),
> ++                  arc_stats.get('prescient_prefetch', 0)),
> ++           f_hits(arc_stats.get('demand_iohit_prescient_prefetch', 0)))
> ++    never = int(arc_stats.get('prescient_prefetch', 0))-\
> +         int(arc_stats['demand_hit_prescient_prefetch'])-\
> +-        int(arc_stats['demand_iohit_prescient_prefetch'])
> ++        int(arc_stats.get('demand_iohit_prescient_prefetch', 0))
> +     prt_i2('Never demanded after prescient:',
> +-           f_perc(never, arc_stats['prescient_prefetch']),
> ++           f_perc(never, arc_stats.get('prescient_prefetch', 0)),
> +            f_hits(never))
> +     print()
> + 
> +@@ -782,7 +782,7 @@ def section_archits(kstats_dict):
> +                 arc_stats['mfu_ghost_hits']),
> +                ('Most recently used (MRU) ghost:',
> +                 arc_stats['mru_ghost_hits']),
> +-               ('Uncached:', arc_stats['uncached_hits']))
> ++               ('Uncached:', arc_stats.get('uncached_hits', 0)))
> +     for title, value in cl_todo:
> +         prt_i2(title, f_perc(value, all_accesses), f_hits(value))
> +     print()
> +@@ -794,26 +794,26 @@ def section_dmu(kstats_dict):
> +     zfetch_stats = isolate_section('zfetchstats', kstats_dict)
> + 
> +     zfetch_access_total = int(zfetch_stats['hits']) +\
> +-        int(zfetch_stats['future']) + int(zfetch_stats['stride']) +\
> +-        int(zfetch_stats['past']) + int(zfetch_stats['misses'])
> ++        int(zfetch_stats.get('future', 0)) + int(zfetch_stats.get('stride', 0)) +\
> ++        int(zfetch_stats.get('past', 0)) + int(zfetch_stats['misses'])
> + 
> +     prt_1('DMU predictive prefetcher calls:', f_hits(zfetch_access_total))
> +     prt_i2('Stream hits:',
> +            f_perc(zfetch_stats['hits'], zfetch_access_total),
> +            f_hits(zfetch_stats['hits']))
> +-    future = int(zfetch_stats['future']) + int(zfetch_stats['stride'])
> ++    future = int(zfetch_stats.get('future', 0)) + int(zfetch_stats.get('stride', 0))
> +     prt_i2('Hits ahead of stream:', f_perc(future, zfetch_access_total),
> +            f_hits(future))
> +     prt_i2('Hits behind stream:',
> +-           f_perc(zfetch_stats['past'], zfetch_access_total),
> +-           f_hits(zfetch_stats['past']))
> ++           f_perc(zfetch_stats.get('past', 0), zfetch_access_total),
> ++           f_hits(zfetch_stats.get('past', 0)))
> +     prt_i2('Stream misses:',
> +            f_perc(zfetch_stats['misses'], zfetch_access_total),
> +            f_hits(zfetch_stats['misses']))
> +     prt_i2('Streams limit reached:',
> +            f_perc(zfetch_stats['max_streams'], zfetch_stats['misses']),
> +            f_hits(zfetch_stats['max_streams']))
> +-    prt_i1('Stream strides:', f_hits(zfetch_stats['stride']))
> ++    prt_i1('Stream strides:', f_hits(zfetch_stats.get('stride', 0)))
> +     prt_i1('Prefetches issued', f_hits(zfetch_stats['io_issued']))
> +     print()
> + 
> +@@ -860,20 +860,20 @@ def section_l2arc(kstats_dict):
> +            f_perc(arc_stats['l2_hdr_size'], arc_stats['l2_size']),
> +            f_bytes(arc_stats['l2_hdr_size']))
> +     prt_i2('MFU allocated size:',
> +-           f_perc(arc_stats['l2_mfu_asize'], arc_stats['l2_asize']),
> +-           f_bytes(arc_stats['l2_mfu_asize']))
> ++           f_perc(arc_stats.get('l2_mfu_asize', 0), arc_stats['l2_asize']),
> ++           f_bytes(arc_stats.get('l2_mfu_asize', 0))) # 2.0 module compat
> +     prt_i2('MRU allocated size:',
> +-           f_perc(arc_stats['l2_mru_asize'], arc_stats['l2_asize']),
> +-           f_bytes(arc_stats['l2_mru_asize']))
> ++           f_perc(arc_stats.get('l2_mru_asize', 0), arc_stats['l2_asize']),
> ++           f_bytes(arc_stats.get('l2_mru_asize', 0))) # 2.0 module compat
> +     prt_i2('Prefetch allocated size:',
> +-           f_perc(arc_stats['l2_prefetch_asize'], arc_stats['l2_asize']),
> +-           f_bytes(arc_stats['l2_prefetch_asize']))
> ++           f_perc(arc_stats.get('l2_prefetch_asize', 0), arc_stats['l2_asize']),
> ++           f_bytes(arc_stats.get('l2_prefetch_asize',0))) # 2.0 module compat
> +     prt_i2('Data (buffer content) allocated size:',
> +-           f_perc(arc_stats['l2_bufc_data_asize'], arc_stats['l2_asize']),
> +-           f_bytes(arc_stats['l2_bufc_data_asize']))
> ++           f_perc(arc_stats.get('l2_bufc_data_asize', 0), arc_stats['l2_asize']),
> ++           f_bytes(arc_stats.get('l2_bufc_data_asize', 0))) # 2.0 module compat
> +     prt_i2('Metadata (buffer content) allocated size:',
> +-           f_perc(arc_stats['l2_bufc_metadata_asize'], arc_stats['l2_asize']),
> +-           f_bytes(arc_stats['l2_bufc_metadata_asize']))
> ++           f_perc(arc_stats.get('l2_bufc_metadata_asize', 0), arc_stats['l2_asize']),
> ++           f_bytes(arc_stats.get('l2_bufc_metadata_asize', 0))) # 2.0 module compat
> + 
> +     print()
> +     prt_1('L2ARC breakdown:', f_hits(l2_access_total))
> +diff --git a/cmd/arcstat.in b/cmd/arcstat.in
> +index c4f10a1d6..bf47ec90e 100755
> +--- a/cmd/arcstat.in
> ++++ b/cmd/arcstat.in
> +@@ -510,7 +510,7 @@ def calculate():
> +     v = dict()
> +     v["time"] = time.strftime("%H:%M:%S", time.localtime())
> +     v["hits"] = d["hits"] // sint
> +-    v["iohs"] = d["iohits"] // sint
> ++    v["iohs"] = d.get("iohits", 0) // sint
> +     v["miss"] = d["misses"] // sint
> +     v["read"] = v["hits"] + v["iohs"] + v["miss"]
> +     v["hit%"] = 100 * v["hits"] // v["read"] if v["read"] > 0 else 0
> +@@ -518,7 +518,7 @@ def calculate():
> +     v["miss%"] = 100 - v["hit%"] - v["ioh%"] if v["read"] > 0 else 0
> + 
> +     v["dhit"] = (d["demand_data_hits"] + d["demand_metadata_hits"]) // sint
> +-    v["dioh"] = (d["demand_data_iohits"] + d["demand_metadata_iohits"]) // sint
> ++    v["dioh"] = (d.get("demand_data_iohits", 0) + d.get("demand_metadata_iohits", 0)) // sint
> +     v["dmis"] = (d["demand_data_misses"] + d["demand_metadata_misses"]) // sint
> + 
> +     v["dread"] = v["dhit"] + v["dioh"] + v["dmis"]
> +@@ -527,7 +527,7 @@ def calculate():
> +     v["dm%"] = 100 - v["dh%"] - v["di%"] if v["dread"] > 0 else 0
> + 
> +     v["ddhit"] = d["demand_data_hits"] // sint
> +-    v["ddioh"] = d["demand_data_iohits"] // sint
> ++    v["ddioh"] = d.get("demand_data_iohits", 0) // sint
> +     v["ddmis"] = d["demand_data_misses"] // sint
> + 
> +     v["ddread"] = v["ddhit"] + v["ddioh"] + v["ddmis"]
> +@@ -536,7 +536,7 @@ def calculate():
> +     v["ddm%"] = 100 - v["ddh%"] - v["ddi%"] if v["ddread"] > 0 else 0
> + 
> +     v["dmhit"] = d["demand_metadata_hits"] // sint
> +-    v["dmioh"] = d["demand_metadata_iohits"] // sint
> ++    v["dmioh"] = d.get("demand_metadata_iohits", 0) // sint
> +     v["dmmis"] = d["demand_metadata_misses"] // sint
> + 
> +     v["dmread"] = v["dmhit"] + v["dmioh"] + v["dmmis"]
> +@@ -545,8 +545,8 @@ def calculate():
> +     v["dmm%"] = 100 - v["dmh%"] - v["dmi%"] if v["dmread"] > 0 else 0
> + 
> +     v["phit"] = (d["prefetch_data_hits"] + d["prefetch_metadata_hits"]) // sint
> +-    v["pioh"] = (d["prefetch_data_iohits"] +
> +-                 d["prefetch_metadata_iohits"]) // sint
> ++    v["pioh"] = (d.get("prefetch_data_iohits", 0) +
> ++                 d.get("prefetch_metadata_iohits", 0)) // sint
> +     v["pmis"] = (d["prefetch_data_misses"] +
> +                  d["prefetch_metadata_misses"]) // sint
> + 
> +@@ -556,7 +556,7 @@ def calculate():
> +     v["pm%"] = 100 - v["ph%"] - v["pi%"] if v["pread"] > 0 else 0
> + 
> +     v["pdhit"] = d["prefetch_data_hits"] // sint
> +-    v["pdioh"] = d["prefetch_data_iohits"] // sint
> ++    v["pdioh"] = d.get("prefetch_data_iohits", 0) // sint
> +     v["pdmis"] = d["prefetch_data_misses"] // sint
> + 
> +     v["pdread"] = v["pdhit"] + v["pdioh"] + v["pdmis"]
> +@@ -565,7 +565,7 @@ def calculate():
> +     v["pdm%"] = 100 - v["pdh%"] - v["pdi%"] if v["pdread"] > 0 else 0
> + 
> +     v["pmhit"] = d["prefetch_metadata_hits"] // sint
> +-    v["pmioh"] = d["prefetch_metadata_iohits"] // sint
> ++    v["pmioh"] = d.get("prefetch_metadata_iohits", 0) // sint
> +     v["pmmis"] = d["prefetch_metadata_misses"] // sint
> + 
> +     v["pmread"] = v["pmhit"] + v["pmioh"] + v["pmmis"]
> +@@ -575,8 +575,8 @@ def calculate():
> + 
> +     v["mhit"] = (d["prefetch_metadata_hits"] +
> +                  d["demand_metadata_hits"]) // sint
> +-    v["mioh"] = (d["prefetch_metadata_iohits"] +
> +-                 d["demand_metadata_iohits"]) // sint
> ++    v["mioh"] = (d.get("prefetch_metadata_iohits", 0) +
> ++                 d.get("demand_metadata_iohits", 0)) // sint
> +     v["mmis"] = (d["prefetch_metadata_misses"] +
> +                  d["demand_metadata_misses"]) // sint
> + 
> +@@ -592,24 +592,24 @@ def calculate():
> +     v["mru"] = d["mru_hits"] // sint
> +     v["mrug"] = d["mru_ghost_hits"] // sint
> +     v["mfug"] = d["mfu_ghost_hits"] // sint
> +-    v["unc"] = d["uncached_hits"] // sint
> ++    v["unc"] = d.get("uncached_hits", 0) // sint
> +     v["eskip"] = d["evict_skip"] // sint
> +     v["el2skip"] = d["evict_l2_skip"] // sint
> +     v["el2cach"] = d["evict_l2_cached"] // sint
> +     v["el2el"] = d["evict_l2_eligible"] // sint
> +-    v["el2mfu"] = d["evict_l2_eligible_mfu"] // sint
> +-    v["el2mru"] = d["evict_l2_eligible_mru"] // sint
> ++    v["el2mfu"] = d.get("evict_l2_eligible_mfu", 0) // sint
> ++    v["el2mru"] = d.get("evict_l2_eligible_mru", 0) // sint
> +     v["el2inel"] = d["evict_l2_ineligible"] // sint
> +     v["mtxmis"] = d["mutex_miss"] // sint
> +-    v["ztotal"] = (d["zfetch_hits"] + d["zfetch_future"] + d["zfetch_stride"] +
> +-                   d["zfetch_past"] + d["zfetch_misses"]) // sint
> ++    v["ztotal"] = (d["zfetch_hits"] + d.get("zfetch_future", 0) + d.get("zfetch_stride", 0) +
> ++                   d.get("zfetch_past", 0) + d["zfetch_misses"]) // sint
> +     v["zhits"] = d["zfetch_hits"] // sint
> +-    v["zahead"] = (d["zfetch_future"] + d["zfetch_stride"]) // sint
> +-    v["zpast"] = d["zfetch_past"] // sint
> ++    v["zahead"] = (d.get("zfetch_future", 0) + d.get("zfetch_stride", 0)) // sint
> ++    v["zpast"] = d.get("zfetch_past", 0) // sint
> +     v["zmisses"] = d["zfetch_misses"] // sint
> +     v["zmax"] = d["zfetch_max_streams"] // sint
> +-    v["zfuture"] = d["zfetch_future"] // sint
> +-    v["zstride"] = d["zfetch_stride"] // sint
> ++    v["zfuture"] = d.get("zfetch_future", 0) // sint
> ++    v["zstride"] = d.get("zfetch_stride", 0) // sint
> +     v["zissued"] = d["zfetch_io_issued"] // sint
> +     v["zactive"] = d["zfetch_io_active"] // sint
> + 
> +@@ -624,11 +624,11 @@ def calculate():
> +         v["l2size"] = cur["l2_size"]
> +         v["l2bytes"] = d["l2_read_bytes"] // sint
> + 
> +-        v["l2pref"] = cur["l2_prefetch_asize"]
> +-        v["l2mfu"] = cur["l2_mfu_asize"]
> +-        v["l2mru"] = cur["l2_mru_asize"]
> +-        v["l2data"] = cur["l2_bufc_data_asize"]
> +-        v["l2meta"] = cur["l2_bufc_metadata_asize"]
> ++        v["l2pref"] = cur.get("l2_prefetch_asize", 0)
> ++        v["l2mfu"] = cur.get("l2_mfu_asize", 0)
> ++        v["l2mru"] = cur.get("l2_mru_asize", 0)
> ++        v["l2data"] = cur.get("l2_bufc_data_asize", 0)
> ++        v["l2meta"] = cur.get("l2_bufc_metadata_asize", 0)
> +         v["l2pref%"] = 100 * v["l2pref"] // v["l2asize"]
> +         v["l2mfu%"] = 100 * v["l2mfu"] // v["l2asize"]
> +         v["l2mru%"] = 100 * v["l2mru"] // v["l2asize"]
> diff --git a/debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch b/debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
> deleted file mode 100644
> index 2e7c207d..00000000
> --- a/debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
> +++ /dev/null
> @@ -1,113 +0,0 @@
> -From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
> -From: Thomas Lamprecht <t.lamprecht@proxmox.com>
> -Date: Wed, 10 Nov 2021 09:29:47 +0100
> -Subject: [PATCH] arc stat/summary: guard access to l2arc MFU/MRU stats
> -
> -commit 085321621e79a75bea41c2b6511da6ebfbf2ba0a added printing MFU
> -and MRU stats for 2.1 user space tools, but those keys are not
> -available in the 2.0 module. That means it may break the arcstat and
> -arc_summary tools after upgrade to 2.1 (user space), before a reboot
> -to the new 2.1 ZFS kernel-module happened, due to python raising a
> -KeyError on the dict access then.
> -
> -Move those two keys to a .get accessor with `0` as fallback, as it
> -should be better to show some possible wrong data for new stat-keys
> -than throwing an exception.
> -
> -Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> -
> -also move l2_mfu_asize  l2_mru_asize l2_prefetch_asize
> -l2_bufc_data_asize l2_bufc_metadata_asize to .get accessor
> -(these are only present with a cache device in the pool)
> -Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> -Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> ----
> - cmd/arc_summary | 28 ++++++++++++++--------------
> - cmd/arcstat.in  | 14 +++++++-------
> - 2 files changed, 21 insertions(+), 21 deletions(-)
> -
> -diff --git a/cmd/arc_summary b/cmd/arc_summary
> -index 100fb1987..86b2260a1 100755
> ---- a/cmd/arc_summary
> -+++ b/cmd/arc_summary
> -@@ -655,13 +655,13 @@ def section_arc(kstats_dict):
> -     prt_i1('L2 cached evictions:', f_bytes(arc_stats['evict_l2_cached']))
> -     prt_i1('L2 eligible evictions:', f_bytes(arc_stats['evict_l2_eligible']))
> -     prt_i2('L2 eligible MFU evictions:',
> --           f_perc(arc_stats['evict_l2_eligible_mfu'],
> -+           f_perc(arc_stats.get('evict_l2_eligible_mfu', 0), # 2.0 module compat
> -            arc_stats['evict_l2_eligible']),
> --           f_bytes(arc_stats['evict_l2_eligible_mfu']))
> -+           f_bytes(arc_stats.get('evict_l2_eligible_mfu', 0)))
> -     prt_i2('L2 eligible MRU evictions:',
> --           f_perc(arc_stats['evict_l2_eligible_mru'],
> -+           f_perc(arc_stats.get('evict_l2_eligible_mru', 0), # 2.0 module compat
> -            arc_stats['evict_l2_eligible']),
> --           f_bytes(arc_stats['evict_l2_eligible_mru']))
> -+           f_bytes(arc_stats.get('evict_l2_eligible_mru', 0)))
> -     prt_i1('L2 ineligible evictions:',
> -            f_bytes(arc_stats['evict_l2_ineligible']))
> -     print()
> -@@ -860,20 +860,20 @@ def section_l2arc(kstats_dict):
> -            f_perc(arc_stats['l2_hdr_size'], arc_stats['l2_size']),
> -            f_bytes(arc_stats['l2_hdr_size']))
> -     prt_i2('MFU allocated size:',
> --           f_perc(arc_stats['l2_mfu_asize'], arc_stats['l2_asize']),
> --           f_bytes(arc_stats['l2_mfu_asize']))
> -+           f_perc(arc_stats.get('l2_mfu_asize', 0), arc_stats['l2_asize']),
> -+           f_bytes(arc_stats.get('l2_mfu_asize', 0))) # 2.0 module compat
> -     prt_i2('MRU allocated size:',
> --           f_perc(arc_stats['l2_mru_asize'], arc_stats['l2_asize']),
> --           f_bytes(arc_stats['l2_mru_asize']))
> -+           f_perc(arc_stats.get('l2_mru_asize', 0), arc_stats['l2_asize']),
> -+           f_bytes(arc_stats.get('l2_mru_asize', 0))) # 2.0 module compat
> -     prt_i2('Prefetch allocated size:',
> --           f_perc(arc_stats['l2_prefetch_asize'], arc_stats['l2_asize']),
> --           f_bytes(arc_stats['l2_prefetch_asize']))
> -+           f_perc(arc_stats.get('l2_prefetch_asize', 0), arc_stats['l2_asize']),
> -+           f_bytes(arc_stats.get('l2_prefetch_asize',0))) # 2.0 module compat
> -     prt_i2('Data (buffer content) allocated size:',
> --           f_perc(arc_stats['l2_bufc_data_asize'], arc_stats['l2_asize']),
> --           f_bytes(arc_stats['l2_bufc_data_asize']))
> -+           f_perc(arc_stats.get('l2_bufc_data_asize', 0), arc_stats['l2_asize']),
> -+           f_bytes(arc_stats.get('l2_bufc_data_asize', 0))) # 2.0 module compat
> -     prt_i2('Metadata (buffer content) allocated size:',
> --           f_perc(arc_stats['l2_bufc_metadata_asize'], arc_stats['l2_asize']),
> --           f_bytes(arc_stats['l2_bufc_metadata_asize']))
> -+           f_perc(arc_stats.get('l2_bufc_metadata_asize', 0), arc_stats['l2_asize']),
> -+           f_bytes(arc_stats.get('l2_bufc_metadata_asize', 0))) # 2.0 module compat
> - 
> -     print()
> -     prt_1('L2ARC breakdown:', f_hits(l2_access_total))
> -diff --git a/cmd/arcstat.in b/cmd/arcstat.in
> -index c4f10a1d6..c570dca88 100755
> ---- a/cmd/arcstat.in
> -+++ b/cmd/arcstat.in
> -@@ -597,8 +597,8 @@ def calculate():
> -     v["el2skip"] = d["evict_l2_skip"] // sint
> -     v["el2cach"] = d["evict_l2_cached"] // sint
> -     v["el2el"] = d["evict_l2_eligible"] // sint
> --    v["el2mfu"] = d["evict_l2_eligible_mfu"] // sint
> --    v["el2mru"] = d["evict_l2_eligible_mru"] // sint
> -+    v["el2mfu"] = d.get("evict_l2_eligible_mfu", 0) // sint
> -+    v["el2mru"] = d.get("evict_l2_eligible_mru", 0) // sint
> -     v["el2inel"] = d["evict_l2_ineligible"] // sint
> -     v["mtxmis"] = d["mutex_miss"] // sint
> -     v["ztotal"] = (d["zfetch_hits"] + d["zfetch_future"] + d["zfetch_stride"] +
> -@@ -624,11 +624,11 @@ def calculate():
> -         v["l2size"] = cur["l2_size"]
> -         v["l2bytes"] = d["l2_read_bytes"] // sint
> - 
> --        v["l2pref"] = cur["l2_prefetch_asize"]
> --        v["l2mfu"] = cur["l2_mfu_asize"]
> --        v["l2mru"] = cur["l2_mru_asize"]
> --        v["l2data"] = cur["l2_bufc_data_asize"]
> --        v["l2meta"] = cur["l2_bufc_metadata_asize"]
> -+        v["l2pref"] = cur.get("l2_prefetch_asize", 0)
> -+        v["l2mfu"] = cur.get("l2_mfu_asize", 0)
> -+        v["l2mru"] = cur.get("l2_mru_asize", 0)
> -+        v["l2data"] = cur.get("l2_bufc_data_asize", 0)
> -+        v["l2meta"] = cur.get("l2_bufc_metadata_asize", 0)
> -         v["l2pref%"] = 100 * v["l2pref"] // v["l2asize"]
> -         v["l2mfu%"] = 100 * v["l2mfu"] // v["l2asize"]
> -         v["l2mru%"] = 100 * v["l2mru"] // v["l2asize"]
> diff --git a/debian/patches/series b/debian/patches/series
> index 35f81d13..229027ff 100644
> --- a/debian/patches/series
> +++ b/debian/patches/series
> @@ -6,6 +6,6 @@
>  0006-dont-symlink-zed-scripts.patch
>  0007-Add-systemd-unit-for-importing-specific-pools.patch
>  0008-Patch-move-manpage-arcstat-1-to-arcstat-8.patch
> -0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
> +0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
>  0010-Fix-nfs_truncate_shares-without-etc-exports.d.patch
>  0011-zpool-status-tighten-bounds-for-noalloc-stat-availab.patch
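
Taken together, the refreshed patch and the dropped 0009 patch implement one and the same defensive pattern: every kstat key that only newer kernel modules export is read via `dict.get()` with a fallback of 0, so that a newer user space running against an older module reports zeros instead of dying with a KeyError. A minimal sketch of the idea in isolation (parse_kstats() is a hypothetical, trimmed-down stand-in for the tools' real kstat parsing; the actual code reads /proc/spl/kstat/zfs/arcstats in essentially this way):

    # Hypothetical helper standing in for how arcstat/arc_summary load
    # the ARC kstats into a plain dict of integer counters.
    def parse_kstats():
        stats = {}
        with open('/proc/spl/kstat/zfs/arcstats') as f:
            for line in f.readlines()[2:]:  # skip the two header lines
                name, _type, value = line.split()
                stats[name] = int(value)
        return stats

    d = parse_kstats()
    # Unguarded access raises KeyError when the running (older) module
    # does not export the key yet:
    #     zfuture = d["zfetch_future"]
    # Guarded access merely reports 0 for stats the module cannot provide:
    zfuture = d.get("zfetch_future", 0)

As the commit message of the dropped 0009 patch already argued for the MFU/MRU stats, a possibly-wrong zero in a monitoring tool beats an unhandled exception.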



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



* Re: [pve-devel] [PATCH zfsonlinux v2 1/2] update zfs submodule to 2.2.4 and refresh patches
  2024-05-07 15:02 ` [pve-devel] [PATCH zfsonlinux v2 1/2] update zfs submodule to 2.2.4 and refresh patches Stoiko Ivanov
@ 2024-05-21 13:56   ` Max Carrara
  0 siblings, 0 replies; 7+ messages in thread
From: Max Carrara @ 2024-05-21 13:56 UTC (permalink / raw)
  To: Proxmox VE development discussion

On Tue May 7, 2024 at 5:02 PM CEST, Stoiko Ivanov wrote:
> mostly: drop all patches we had queued up to gain kernel 6.8
> support.
>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---

See my reply to the cover letter.

Tested-by: Max Carrara <m.carrara@proxmox.com>

>  ...md-unit-for-importing-specific-pools.patch |   4 +-
>  ...-move-manpage-arcstat-1-to-arcstat-8.patch |   2 +-
>  ...-guard-access-to-l2arc-MFU-MRU-stats.patch |  12 +-
>  ...hten-bounds-for-noalloc-stat-availab.patch |   4 +-
>  ...rectly-handle-partition-16-and-later.patch |  52 --
>  ...-use-splice_copy_file_range-for-fall.patch | 135 ----
>  .../0014-linux-5.4-compat-page_size.patch     | 121 ----
>  .../patches/0015-abd-add-page-iterator.patch  | 334 ---------
>  ...-existing-functions-to-vdev_classic_.patch | 349 ---------
>  ...v_disk-reorganise-vdev_disk_io_start.patch | 111 ---
>  ...-read-write-IO-function-configurable.patch |  69 --
>  ...e-BIO-filling-machinery-to-avoid-spl.patch | 671 ------------------
>  ...dule-parameter-to-select-BIO-submiss.patch | 104 ---
>  ...se-bio_chain-to-submit-multiple-BIOs.patch | 363 ----------
>  ...on-t-use-compound-heads-on-Linux-4.5.patch |  96 ---
>  ...ault-to-classic-submission-for-2.2.x.patch |  90 ---
>  ...ion-caused-by-mmap-flushing-problems.patch | 104 ---
>  ...touch-vbio-after-its-handed-off-to-t.patch |  57 --
>  debian/patches/series                         |  14 -
>  upstream                                      |   2 +-
>  20 files changed, 12 insertions(+), 2682 deletions(-)
>  delete mode 100644 debian/patches/0012-udev-correctly-handle-partition-16-and-later.patch
>  delete mode 100644 debian/patches/0013-Linux-6.8-compat-use-splice_copy_file_range-for-fall.patch
>  delete mode 100644 debian/patches/0014-linux-5.4-compat-page_size.patch
>  delete mode 100644 debian/patches/0015-abd-add-page-iterator.patch
>  delete mode 100644 debian/patches/0016-vdev_disk-rename-existing-functions-to-vdev_classic_.patch
>  delete mode 100644 debian/patches/0017-vdev_disk-reorganise-vdev_disk_io_start.patch
>  delete mode 100644 debian/patches/0018-vdev_disk-make-read-write-IO-function-configurable.patch
>  delete mode 100644 debian/patches/0019-vdev_disk-rewrite-BIO-filling-machinery-to-avoid-spl.patch
>  delete mode 100644 debian/patches/0020-vdev_disk-add-module-parameter-to-select-BIO-submiss.patch
>  delete mode 100644 debian/patches/0021-vdev_disk-use-bio_chain-to-submit-multiple-BIOs.patch
>  delete mode 100644 debian/patches/0022-abd_iter_page-don-t-use-compound-heads-on-Linux-4.5.patch
>  delete mode 100644 debian/patches/0023-vdev_disk-default-to-classic-submission-for-2.2.x.patch
>  delete mode 100644 debian/patches/0024-Fix-corruption-caused-by-mmap-flushing-problems.patch
>  delete mode 100644 debian/patches/0025-vdev_disk-don-t-touch-vbio-after-its-handed-off-to-t.patch
>
> diff --git a/debian/patches/0007-Add-systemd-unit-for-importing-specific-pools.patch b/debian/patches/0007-Add-systemd-unit-for-importing-specific-pools.patch
> index 8232978c..0600296f 100644
> --- a/debian/patches/0007-Add-systemd-unit-for-importing-specific-pools.patch
> +++ b/debian/patches/0007-Add-systemd-unit-for-importing-specific-pools.patch
> @@ -18,7 +18,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
>  ---
>   etc/Makefile.am                           |  1 +
>   etc/systemd/system/50-zfs.preset          |  1 +
> - etc/systemd/system/zfs-import@.service.in | 18 ++++++++++++++++
> + etc/systemd/system/zfs-import@.service.in | 18 ++++++++++++++++++
>   3 files changed, 20 insertions(+)
>   create mode 100644 etc/systemd/system/zfs-import@.service.in



* [pve-devel] applied-series: [PATCH zfsonlinux v2 0/2] Update to ZFS 2.2.4
  2024-05-07 15:02 [pve-devel] [PATCH zfsonlinux v2 0/2] Update to ZFS 2.2.4 Stoiko Ivanov
                   ` (2 preceding siblings ...)
  2024-05-21 13:31 ` [pve-devel] [PATCH zfsonlinux v2 0/2] Update to ZFS 2.2.4 Max Carrara
@ 2024-05-21 14:06 ` Thomas Lamprecht
  3 siblings, 0 replies; 7+ messages in thread
From: Thomas Lamprecht @ 2024-05-21 14:06 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stoiko Ivanov

On 07/05/2024 at 17:02, Stoiko Ivanov wrote:
> v1->v2:
> Patch 2/2 (adaptation of arc_summary/arcstat patch) modified:
> * right after sending the v1 I saw a report where pinning kernel 6.2 (thus
>   ZFS 2.1) leads to a similar traceback - which I seem to have overlooked
>   when packaging 2.2.0 ...
>   adapted the patch by booting a VM with kernel 6.2 and the current
>   userspace and running arc_summary /arcstat -a until no traceback was
>   displayed with a single-disk pool.
> 
> original cover-letter for v1:
> This patchset updates ZFS to the recently released 2.2.4
> 
> We had about half of the patches already in 2.2.3-2, due to the needed
> support for kernel 6.8.
> 
> Compared to the last 2.2 point releases this one compares quite a few
> potential performance improvments:
> * for ZVOL workloads (relevant for qemu guests) multiple taskq were
>   introduced [1] - this change is active by default (can be put back to
>   the old behavior with explicitly setting `zvol_num_taskqs=1`
> * the interface for ZFS submitting operations to the kernel's block layer
>   was augmented to better deal with split-pages [2] - which should also
>   improve performance, and prevent unaligned writes which are rejected by
>   e.g. the SCSI subsystem. - The default remains with the current code
>   (`zfs_vdev_disk_classic=0` turns on the 'new' behavior...)
> * Speculative prefetching was improved [3], which introduced not kstats,
>   which are reported by`arc_summary` and `arcstat`, as before with the
>   MRU/MFU additions there was not guard for running the new user-space
>   with an old kernel resulting in Python exceptions of both tools.
>   I adapted the patch where Thomas fixed that back in the 2.1 release
>   times. - sending as separate patch for easier review - and I hope it's
>   ok that I dropped the S-o-b tag (as it's changed code) - glad to resend
>   it, if this should be adapted.
> 
> Minimally tested on 2 VMs (the arcstat/arc_summary changes by running with
> an old kernel and new user-space)
> 
> 
> [0] https://github.com/openzfs/zfs/releases/tag/zfs-2.2.4
> [1] https://github.com/openzfs/zfs/pull/15992
> [2] https://github.com/openzfs/zfs/pull/15588
> [3] https://github.com/openzfs/zfs/pull/16022
> 
> Stoiko Ivanov (2):
>   update zfs submodule to 2.2.4 and refresh patches
>   update arc_summary arcstat patch with new introduced values
> 
>  ...md-unit-for-importing-specific-pools.patch |   4 +-
>  ...-move-manpage-arcstat-1-to-arcstat-8.patch |   2 +-
>  ...-guard-access-to-freshly-introduced-.patch | 438 ++++++++++++
>  ...-guard-access-to-l2arc-MFU-MRU-stats.patch | 113 ---
>  ...hten-bounds-for-noalloc-stat-availab.patch |   4 +-
>  ...rectly-handle-partition-16-and-later.patch |  52 --
>  ...-use-splice_copy_file_range-for-fall.patch | 135 ----
>  .../0014-linux-5.4-compat-page_size.patch     | 121 ----
>  .../patches/0015-abd-add-page-iterator.patch  | 334 ---------
>  ...-existing-functions-to-vdev_classic_.patch | 349 ---------
>  ...v_disk-reorganise-vdev_disk_io_start.patch | 111 ---
>  ...-read-write-IO-function-configurable.patch |  69 --
>  ...e-BIO-filling-machinery-to-avoid-spl.patch | 671 ------------------
>  ...dule-parameter-to-select-BIO-submiss.patch | 104 ---
>  ...se-bio_chain-to-submit-multiple-BIOs.patch | 363 ----------
>  ...on-t-use-compound-heads-on-Linux-4.5.patch |  96 ---
>  ...ault-to-classic-submission-for-2.2.x.patch |  90 ---
>  ...ion-caused-by-mmap-flushing-problems.patch | 104 ---
>  ...touch-vbio-after-its-handed-off-to-t.patch |  57 --
>  debian/patches/series                         |  16 +-
>  upstream                                      |   2 +-
>  21 files changed, 445 insertions(+), 2790 deletions(-)
>  create mode 100644 debian/patches/0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
>  delete mode 100644 debian/patches/0009-arc-stat-summary-guard-access-to-l2arc-MFU-MRU-stats.patch
>  delete mode 100644 debian/patches/0012-udev-correctly-handle-partition-16-and-later.patch
>  delete mode 100644 debian/patches/0013-Linux-6.8-compat-use-splice_copy_file_range-for-fall.patch
>  delete mode 100644 debian/patches/0014-linux-5.4-compat-page_size.patch
>  delete mode 100644 debian/patches/0015-abd-add-page-iterator.patch
>  delete mode 100644 debian/patches/0016-vdev_disk-rename-existing-functions-to-vdev_classic_.patch
>  delete mode 100644 debian/patches/0017-vdev_disk-reorganise-vdev_disk_io_start.patch
>  delete mode 100644 debian/patches/0018-vdev_disk-make-read-write-IO-function-configurable.patch
>  delete mode 100644 debian/patches/0019-vdev_disk-rewrite-BIO-filling-machinery-to-avoid-spl.patch
>  delete mode 100644 debian/patches/0020-vdev_disk-add-module-parameter-to-select-BIO-submiss.patch
>  delete mode 100644 debian/patches/0021-vdev_disk-use-bio_chain-to-submit-multiple-BIOs.patch
>  delete mode 100644 debian/patches/0022-abd_iter_page-don-t-use-compound-heads-on-Linux-4.5.patch
>  delete mode 100644 debian/patches/0023-vdev_disk-default-to-classic-submission-for-2.2.x.patch
>  delete mode 100644 debian/patches/0024-Fix-corruption-caused-by-mmap-flushing-problems.patch
>  delete mode 100644 debian/patches/0025-vdev_disk-don-t-touch-vbio-after-its-handed-off-to-t.patch
> 


Applied the series with Max's T-b and R-b, where applicable, thanks!

I did not yet bump the version or update the kernel repo though; I'd
wait for a new kernel build to do that.



end of thread

Thread overview: 7+ messages
2024-05-07 15:02 [pve-devel] [PATCH zfsonlinux v2 0/2] Update to ZFS 2.2.4 Stoiko Ivanov
2024-05-07 15:02 ` [pve-devel] [PATCH zfsonlinux v2 1/2] update zfs submodule to 2.2.4 and refresh patches Stoiko Ivanov
2024-05-21 13:56   ` Max Carrara
2024-05-07 15:02 ` [pve-devel] [PATCH zfsonlinux v2 2/2] update arc_summary arcstat patch with new introduced values Stoiko Ivanov
2024-05-21 13:32   ` Max Carrara
2024-05-21 13:31 ` [pve-devel] [PATCH zfsonlinux v2 0/2] Update to ZFS 2.2.4 Max Carrara
2024-05-21 14:06 ` [pve-devel] applied-series: " Thomas Lamprecht
