public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH zfsonlinux] patches: fix for zvol sync/flush regression
@ 2025-03-11  8:44 Fabian Grünbichler
  2025-03-11 13:44 ` Stoiko Ivanov
  2025-03-13 11:33 ` [pve-devel] superseded: " Fabian Grünbichler
  0 siblings, 2 replies; 3+ messages in thread
From: Fabian Grünbichler @ 2025-03-11  8:44 UTC (permalink / raw)
  To: pve-devel

this broke with ZFS 2.2.7, and can potentially cause data loss or
inconsistency. the patch basically reverts to the pre-2.2.7 behaviour,
verified via a fio benchmark.

reported on our forum: https://forum.proxmox.com/threads/163066

upstream PR: https://github.com/openzfs/zfs/pull/17131

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
module-only change, needs a kernel bump to be active

 ...vols-correctly-detect-flush-requests.patch | 58 +++++++++++++++++++
 debian/patches/series                         |  1 +
 2 files changed, 59 insertions(+)
 create mode 100644 debian/patches/0012-linux-zvols-correctly-detect-flush-requests.patch

diff --git a/debian/patches/0012-linux-zvols-correctly-detect-flush-requests.patch b/debian/patches/0012-linux-zvols-correctly-detect-flush-requests.patch
new file mode 100644
index 0000000..609934a
--- /dev/null
+++ b/debian/patches/0012-linux-zvols-correctly-detect-flush-requests.patch
@@ -0,0 +1,58 @@
+From c9ae14e064f961709557fdcbdb40e692a97225ec Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Fabian=20Gr=C3=BCnbichler?= <f.gruenbichler@proxmox.com>
+Date: Mon, 10 Mar 2025 15:51:09 +0100
+Subject: [PATCH] linux: zvols: correctly detect flush requests
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+since Linux 4.10, bio->bi_opf needs to be checked to detect all kinds
+of flush requests. this was the case prior to the commit referenced
+below, but the order of ifdefs was not the usual one (newest up top),
+which might have caused this to slip through.
+
+this fixes a regression when using zvols as QEMU block devices; the
+regression might have affected other use cases as well. the symptom is
+that all sync writes from within a VM configured to use such a virtual
+block device are ignored and treated as async writes by the host ZFS layer.
+
+this can be verified using fio in sync mode inside the VM, for example
+with
+
+ fio \
+ --filename=/dev/sda --ioengine=libaio --loops=1 --size=10G \
+ --time_based --runtime=60 --group_reporting --stonewall --name=cc1 \
+ --description="CC1" --rw=write --bs=4k --direct=1 --iodepth=1 \
+ --numjobs=1 --sync=1
+
+which shows an IOPS number far above what the physical device underneath
+supports, with "zpool iostat -r 1" on the hypervisor side showing no
+sync IO occurring during the benchmark.
+
+with the regression fixed, both fio inside the VM and the IO stats on
+the host show the expected numbers.
+
+Fixes: 846b5985192467acabf5484ae610b4b37b7f8162
+"config: remove HAVE_REQ_OP_* and HAVE_REQ_*"
+
+Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+---
+ include/os/linux/kernel/linux/blkdev_compat.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/include/os/linux/kernel/linux/blkdev_compat.h b/include/os/linux/kernel/linux/blkdev_compat.h
+index d96708c60..084869c34 100644
+--- a/include/os/linux/kernel/linux/blkdev_compat.h
++++ b/include/os/linux/kernel/linux/blkdev_compat.h
+@@ -383,7 +383,7 @@ bio_set_flush(struct bio *bio)
+ static inline boolean_t
+ bio_is_flush(struct bio *bio)
+ {
+-	return (bio_op(bio) == REQ_OP_FLUSH);
++	return (bio_op(bio) == REQ_OP_FLUSH) || (op_is_flush(bio->bi_opf));
+ }
+ 
+ /*
+-- 
+2.39.5
+
diff --git a/debian/patches/series b/debian/patches/series
index 229027f..938488c 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -9,3 +9,4 @@
 0009-arc-stat-summary-guard-access-to-freshly-introduced-.patch
 0010-Fix-nfs_truncate_shares-without-etc-exports.d.patch
 0011-zpool-status-tighten-bounds-for-noalloc-stat-availab.patch
+0012-linux-zvols-correctly-detect-flush-requests.patch
-- 
2.39.5
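
For context on the one-line change above: bio_op() only extracts the
request opcode, so the old check matched only bios carrying the
dedicated REQ_OP_FLUSH opcode. Flush semantics can also arrive as the
REQ_PREFLUSH and REQ_FUA flag bits on write bios in bio->bi_opf, which
is exactly what op_is_flush() tests. The following is an annotated
sketch of the fixed check; the comments are added here for illustration
and are not part of the patch:

    /* sketch of bio_is_flush() after the fix */
    static inline boolean_t
    bio_is_flush(struct bio *bio)
    {
        /* bio carrying the dedicated flush opcode */
        if (bio_op(bio) == REQ_OP_FLUSH)
            return (B_TRUE);
        /*
         * write carrying a preflush and/or FUA flag; op_is_flush()
         * checks bi_opf against (REQ_FUA | REQ_PREFLUSH)
         */
        return (op_is_flush(bio->bi_opf) ? B_TRUE : B_FALSE);
    }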



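The verification flow described in the commit message, spelled out end
to end (a sketch; the guest device path and job name are illustrative,
and the fio command mirrors the one quoted above):

    # inside the VM: synchronous, direct 4k writes at queue depth 1;
    # on an affected host this reports implausibly high IOPS
    fio --filename=/dev/sda --ioengine=libaio --size=10G \
        --time_based --runtime=60 --name=syncwrite --rw=write \
        --bs=4k --direct=1 --iodepth=1 --numjobs=1 --sync=1

    # on the host, in parallel: request-size histograms once per
    # second; with the fix applied the writes show up as sync IO
    zpool iostat -r 1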


* Re: [pve-devel] [PATCH zfsonlinux] patches: fix for zvol sync/flush regression
  2025-03-11  8:44 [pve-devel] [PATCH zfsonlinux] patches: fix for zvol sync/flush regression Fabian Grünbichler
@ 2025-03-11 13:44 ` Stoiko Ivanov
  2025-03-13 11:33 ` [pve-devel] superseded: " Fabian Grünbichler
  1 sibling, 0 replies; 3+ messages in thread
From: Stoiko Ivanov @ 2025-03-11 13:44 UTC (permalink / raw)
  To: Fabian Grünbichler; +Cc: Proxmox VE development discussion

Thanks for the analysis and for preparing the patch!

LGTM - gave it a spin on a VM (with a nested guest for testing) on
kernel 6.8 - works as expected.

Tested-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Reviewed-by: Stoiko Ivanov <s.ivanov@proxmox.com>

On Tue, 11 Mar 2025 09:44:03 +0100
Fabian Grünbichler <f.gruenbichler@proxmox.com> wrote:

> this broke with ZFS 2.2.7, and can potentially cause data loss or
> inconsistency. the patch basically reverts to the pre-2.2.7 behaviour,
> verified via a fio benchmark.
> 
> [...]





* [pve-devel] superseded: [PATCH zfsonlinux] patches: fix for zvol sync/flush regression
  2025-03-11  8:44 [pve-devel] [PATCH zfsonlinux] patches: fix for zvol sync/flush regression Fabian Grünbichler
  2025-03-11 13:44 ` Stoiko Ivanov
@ 2025-03-13 11:33 ` Fabian Grünbichler
  1 sibling, 0 replies; 3+ messages in thread
From: Fabian Grünbichler @ 2025-03-13 11:33 UTC (permalink / raw)
  To: pve-devel

v2 with a slightly updated patch from upstream master:

https://lore.proxmox.com/pve-devel/20250313113214.23456-1-f.gruenbichler@proxmox.com/T/#u

On March 11, 2025 9:44 am, Fabian Grünbichler wrote:
> this broke with ZFS 2.2.7, and can potentially cause data loss or
> inconsistency. the patch basically reverts to the pre-2.2.7 behaviour,
> verified via a fio benchmark.
> 
> [...]



