* [pve-devel] [PATCH v3 qemu 1/2] fix #5409: backup: fix copy-before-write timeout
@ 2024-04-29 15:20 Fiona Ebner
2024-04-29 15:20 ` [pve-devel] [PATCH v3 qemu 2/2] backup: improve error when copy-before-write fails for fleecing Fiona Ebner
2024-04-30 7:45 ` [pve-devel] applied-series: [PATCH v3 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Thomas Lamprecht
0 siblings, 2 replies; 3+ messages in thread
From: Fiona Ebner @ 2024-04-29 15:20 UTC (permalink / raw)
To: pve-devel
The type for the copy-before-write timeout in nanoseconds was wrong.
By being just uint32_t, a maximum of slightly over 4 seconds was
possible. Larger values would overflow and thus the 45 seconds set by
Proxmox's backup with fleecing, resulted in effectively 2 seconds
timeout for copy-before-write operations.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes in v3:
* rebase on master
Upstream submission of this patch:
https://lore.kernel.org/qemu-devel/20240429141934.442154-1-f.ebner@proxmox.com/T/#u
...e-write-use-uint64_t-for-timeout-in-.patch | 35 +++++++++++++++++++
...ock-copy-before-write-fix-permission.patch | 2 +-
...e-write-support-unligned-snapshot-di.patch | 2 +-
...e-write-create-block_copy-bitmap-in-.patch | 2 +-
...-backup-add-discard-source-parameter.patch | 4 +--
...e-allow-specifying-minimum-cluster-s.patch | 2 +-
...um-cluster-size-to-performance-optio.patch | 2 +-
debian/patches/series | 1 +
8 files changed, 43 insertions(+), 7 deletions(-)
create mode 100644 debian/patches/extra/0005-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
diff --git a/debian/patches/extra/0005-block-copy-before-write-use-uint64_t-for-timeout-in-.patch b/debian/patches/extra/0005-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
new file mode 100644
index 0000000..a8bdd85
--- /dev/null
+++ b/debian/patches/extra/0005-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
@@ -0,0 +1,35 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Fiona Ebner <f.ebner@proxmox.com>
+Date: Mon, 29 Apr 2024 15:41:11 +0200
+Subject: [PATCH] block/copy-before-write: use uint64_t for timeout in
+ nanoseconds
+
+rather than the uint32_t for which the maximum is slightly more than 4
+seconds and larger values would overflow. The QAPI interface allows
+specifying the number of seconds, so only values 0 to 4 are safe right
+now, other values lead to a much lower timeout than a user expects.
+
+The block_copy() call where this is used already takes a uint64_t for
+the timeout, so no change required there.
+
+Fixes: 6db7fd1ca9 ("block/copy-before-write: implement cbw-timeout option")
+Reported-by: Friedrich Weber <f.weber@proxmox.com>
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
+Tested-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ block/copy-before-write.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/block/copy-before-write.c b/block/copy-before-write.c
+index 8aba27a71d..026fa9840f 100644
+--- a/block/copy-before-write.c
++++ b/block/copy-before-write.c
+@@ -43,7 +43,7 @@ typedef struct BDRVCopyBeforeWriteState {
+ BlockCopyState *bcs;
+ BdrvChild *target;
+ OnCbwError on_cbw_error;
+- uint32_t cbw_timeout_ns;
++ uint64_t cbw_timeout_ns;
+
+ /*
+ * @lock: protects access to @access_bitmap, @done_bitmap and
diff --git a/debian/patches/pve/0044-block-copy-before-write-fix-permission.patch b/debian/patches/pve/0044-block-copy-before-write-fix-permission.patch
index 0d02af9..6a759a4 100644
--- a/debian/patches/pve/0044-block-copy-before-write-fix-permission.patch
+++ b/debian/patches/pve/0044-block-copy-before-write-fix-permission.patch
@@ -33,7 +33,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index 8aba27a71d..3e3af30c08 100644
+index 026fa9840f..5a9456d426 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -364,9 +364,13 @@ cbw_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
diff --git a/debian/patches/pve/0045-block-copy-before-write-support-unligned-snapshot-di.patch b/debian/patches/pve/0045-block-copy-before-write-support-unligned-snapshot-di.patch
index c689750..f651c58 100644
--- a/debian/patches/pve/0045-block-copy-before-write-support-unligned-snapshot-di.patch
+++ b/debian/patches/pve/0045-block-copy-before-write-support-unligned-snapshot-di.patch
@@ -15,7 +15,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index 3e3af30c08..6d89af0b29 100644
+index 5a9456d426..c0e70669a2 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -325,14 +325,24 @@ static int coroutine_fn GRAPH_RDLOCK
diff --git a/debian/patches/pve/0046-block-copy-before-write-create-block_copy-bitmap-in-.patch b/debian/patches/pve/0046-block-copy-before-write-create-block_copy-bitmap-in-.patch
index 0fcda2e..7cd24d0 100644
--- a/debian/patches/pve/0046-block-copy-before-write-create-block_copy-bitmap-in-.patch
+++ b/debian/patches/pve/0046-block-copy-before-write-create-block_copy-bitmap-in-.patch
@@ -49,7 +49,7 @@ index 9ee3dd7ef5..8fca2c3698 100644
if (!copy_bitmap) {
return NULL;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index 6d89af0b29..ed2c228da7 100644
+index c0e70669a2..94db31512d 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -468,7 +468,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
diff --git a/debian/patches/pve/0047-qapi-blockdev-backup-add-discard-source-parameter.patch b/debian/patches/pve/0047-qapi-blockdev-backup-add-discard-source-parameter.patch
index eb34ad2..ef44f42 100644
--- a/debian/patches/pve/0047-qapi-blockdev-backup-add-discard-source-parameter.patch
+++ b/debian/patches/pve/0047-qapi-blockdev-backup-add-discard-source-parameter.patch
@@ -109,13 +109,13 @@ index 8fca2c3698..7e3b378528 100644
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index ed2c228da7..cd65524e26 100644
+index 94db31512d..853e01a1eb 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -44,6 +44,7 @@ typedef struct BDRVCopyBeforeWriteState {
BdrvChild *target;
OnCbwError on_cbw_error;
- uint32_t cbw_timeout_ns;
+ uint64_t cbw_timeout_ns;
+ bool discard_source;
/*
diff --git a/debian/patches/pve/0048-copy-before-write-allow-specifying-minimum-cluster-s.patch b/debian/patches/pve/0048-copy-before-write-allow-specifying-minimum-cluster-s.patch
index 5b80440..50a8cd2 100644
--- a/debian/patches/pve/0048-copy-before-write-allow-specifying-minimum-cluster-s.patch
+++ b/debian/patches/pve/0048-copy-before-write-allow-specifying-minimum-cluster-s.patch
@@ -82,7 +82,7 @@ index 7e3b378528..adb1cbb440 100644
return NULL;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index cd65524e26..ac05a4993f 100644
+index 853e01a1eb..47b3cdd09f 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -477,7 +477,8 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
diff --git a/debian/patches/pve/0049-backup-add-minimum-cluster-size-to-performance-optio.patch b/debian/patches/pve/0049-backup-add-minimum-cluster-size-to-performance-optio.patch
index fe37687..fe3ff95 100644
--- a/debian/patches/pve/0049-backup-add-minimum-cluster-size-to-performance-optio.patch
+++ b/debian/patches/pve/0049-backup-add-minimum-cluster-size-to-performance-optio.patch
@@ -36,7 +36,7 @@ index 1963e47ab9..fe69723ada 100644
goto error;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index ac05a4993f..d1e87f8cf4 100644
+index 47b3cdd09f..bba58326d7 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -546,6 +546,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
diff --git a/debian/patches/series b/debian/patches/series
index e955579..47e8307 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -2,6 +2,7 @@ extra/0001-monitor-qmp-fix-race-with-clients-disconnecting-earl.patch
extra/0002-scsi-megasas-Internal-cdbs-have-16-byte-length.patch
extra/0003-ide-avoid-potential-deadlock-when-draining-during-tr.patch
extra/0004-Revert-x86-acpi-workaround-Windows-not-handling-name.patch
+extra/0005-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
--
2.39.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* [pve-devel] [PATCH v3 qemu 2/2] backup: improve error when copy-before-write fails for fleecing
2024-04-29 15:20 [pve-devel] [PATCH v3 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Fiona Ebner
@ 2024-04-29 15:20 ` Fiona Ebner
2024-04-30 7:45 ` [pve-devel] applied-series: [PATCH v3 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Thomas Lamprecht
1 sibling, 0 replies; 3+ messages in thread
From: Fiona Ebner @ 2024-04-29 15:20 UTC (permalink / raw)
To: pve-devel
With fleecing, failure for copy-before-write does not fail the guest
write, but only sets the snapshot error that is associated to the
copy-before-write filter, making further requests to the snapshot
access fail with EACCES, which then also fails the job. But that error
code is not the root cause of why the backup failed, so bubble up the
original snapshot error instead.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes in v3:
* rebase on master
...ve-error-when-copy-before-write-fail.patch | 117 ++++++++++++++++++
debian/patches/series | 1 +
2 files changed, 118 insertions(+)
create mode 100644 debian/patches/pve/0051-PVE-backup-improve-error-when-copy-before-write-fail.patch
diff --git a/debian/patches/pve/0051-PVE-backup-improve-error-when-copy-before-write-fail.patch b/debian/patches/pve/0051-PVE-backup-improve-error-when-copy-before-write-fail.patch
new file mode 100644
index 0000000..c7f2ccb
--- /dev/null
+++ b/debian/patches/pve/0051-PVE-backup-improve-error-when-copy-before-write-fail.patch
@@ -0,0 +1,117 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Fiona Ebner <f.ebner@proxmox.com>
+Date: Mon, 29 Apr 2024 14:43:58 +0200
+Subject: [PATCH] PVE backup: improve error when copy-before-write fails for
+ fleecing
+
+With fleecing, failure for copy-before-write does not fail the guest
+write, but only sets the snapshot error that is associated to the
+copy-before-write filter, making further requests to the snapshot
+access fail with EACCES, which then also fails the job. But that error
+code is not the root cause of why the backup failed, so bubble up the
+original snapshot error instead.
+
+Reported-by: Friedrich Weber <f.weber@proxmox.com>
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
+Tested-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ block/copy-before-write.c | 18 ++++++++++++------
+ block/copy-before-write.h | 1 +
+ pve-backup.c | 9 +++++++++
+ 3 files changed, 22 insertions(+), 6 deletions(-)
+
+diff --git a/block/copy-before-write.c b/block/copy-before-write.c
+index bba58326d7..50cc4c7aae 100644
+--- a/block/copy-before-write.c
++++ b/block/copy-before-write.c
+@@ -27,6 +27,7 @@
+ #include "qapi/qmp/qjson.h"
+
+ #include "sysemu/block-backend.h"
++#include "qemu/atomic.h"
+ #include "qemu/cutils.h"
+ #include "qapi/error.h"
+ #include "block/block_int.h"
+@@ -74,7 +75,8 @@ typedef struct BDRVCopyBeforeWriteState {
+ * @snapshot_error is normally zero. But on first copy-before-write failure
+ * when @on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT, @snapshot_error takes
+ * value of this error (<0). After that all in-flight and further
+- * snapshot-API requests will fail with that error.
++ * snapshot-API requests will fail with that error. To be accessed with
++ * atomics.
+ */
+ int snapshot_error;
+ } BDRVCopyBeforeWriteState;
+@@ -114,7 +116,7 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
+ return 0;
+ }
+
+- if (s->snapshot_error) {
++ if (qatomic_read(&s->snapshot_error)) {
+ return 0;
+ }
+
+@@ -138,9 +140,7 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
+ WITH_QEMU_LOCK_GUARD(&s->lock) {
+ if (ret < 0) {
+ assert(s->on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT);
+- if (!s->snapshot_error) {
+- s->snapshot_error = ret;
+- }
++ qatomic_cmpxchg(&s->snapshot_error, 0, ret);
+ } else {
+ bdrv_set_dirty_bitmap(s->done_bitmap, off, end - off);
+ }
+@@ -214,7 +214,7 @@ cbw_snapshot_read_lock(BlockDriverState *bs, int64_t offset, int64_t bytes,
+
+ QEMU_LOCK_GUARD(&s->lock);
+
+- if (s->snapshot_error) {
++ if (qatomic_read(&s->snapshot_error)) {
+ g_free(req);
+ return NULL;
+ }
+@@ -585,6 +585,12 @@ void bdrv_cbw_drop(BlockDriverState *bs)
+ bdrv_unref(bs);
+ }
+
++int bdrv_cbw_snapshot_error(BlockDriverState *bs)
++{
++ BDRVCopyBeforeWriteState *s = bs->opaque;
++ return qatomic_read(&s->snapshot_error);
++}
++
+ static void cbw_init(void)
+ {
+ bdrv_register(&bdrv_cbw_filter);
+diff --git a/block/copy-before-write.h b/block/copy-before-write.h
+index dc6cafe7fa..a27d2d7d9f 100644
+--- a/block/copy-before-write.h
++++ b/block/copy-before-write.h
+@@ -44,5 +44,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
+ BlockCopyState **bcs,
+ Error **errp);
+ void bdrv_cbw_drop(BlockDriverState *bs);
++int bdrv_cbw_snapshot_error(BlockDriverState *bs);
+
+ #endif /* COPY_BEFORE_WRITE_H */
+diff --git a/pve-backup.c b/pve-backup.c
+index 7cc1dd3724..07709aa350 100644
+--- a/pve-backup.c
++++ b/pve-backup.c
+@@ -379,6 +379,15 @@ static void pvebackup_complete_cb(void *opaque, int ret)
+ di->fleecing.snapshot_access = NULL;
+ }
+ if (di->fleecing.cbw) {
++ /*
++ * With fleecing, failure for cbw does not fail the guest write, but only sets the snapshot
++ * error, making further requests to the snapshot fail with EACCES, which then also fail the
++ * job. But that code is not the root cause and just confusing, so update it.
++ */
++ int snapshot_error = bdrv_cbw_snapshot_error(di->fleecing.cbw);
++ if (di->completed_ret == -EACCES && snapshot_error) {
++ di->completed_ret = snapshot_error;
++ }
+ bdrv_cbw_drop(di->fleecing.cbw);
+ di->fleecing.cbw = NULL;
+ }
diff --git a/debian/patches/series b/debian/patches/series
index 47e8307..b97881e 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -59,3 +59,4 @@ pve/0047-qapi-blockdev-backup-add-discard-source-parameter.patch
pve/0048-copy-before-write-allow-specifying-minimum-cluster-s.patch
pve/0049-backup-add-minimum-cluster-size-to-performance-optio.patch
pve/0050-PVE-backup-add-fleecing-option.patch
+pve/0051-PVE-backup-improve-error-when-copy-before-write-fail.patch
--
2.39.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* [pve-devel] applied-series: [PATCH v3 qemu 1/2] fix #5409: backup: fix copy-before-write timeout
2024-04-29 15:20 [pve-devel] [PATCH v3 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Fiona Ebner
2024-04-29 15:20 ` [pve-devel] [PATCH v3 qemu 2/2] backup: improve error when copy-before-write fails for fleecing Fiona Ebner
@ 2024-04-30 7:45 ` Thomas Lamprecht
1 sibling, 0 replies; 3+ messages in thread
From: Thomas Lamprecht @ 2024-04-30 7:45 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
On 29/04/2024 17:20, Fiona Ebner wrote:
> The type for the copy-before-write timeout in nanoseconds was wrong.
> By being just uint32_t, a maximum of slightly over 4 seconds was
> possible. Larger values would overflow and thus the 45 seconds set by
> Proxmox's backup with fleecing, resulted in effectively 2 seconds
> timeout for copy-before-write operations.
>
> Reported-by: Friedrich Weber <f.weber@proxmox.com>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>
> Changes in v3:
> * rebase on master
>
> Upstream submission of this patch:
> https://lore.kernel.org/qemu-devel/20240429141934.442154-1-f.ebner@proxmox.com/T/#u
>
> ...e-write-use-uint64_t-for-timeout-in-.patch | 35 +++++++++++++++++++
> ...ock-copy-before-write-fix-permission.patch | 2 +-
> ...e-write-support-unligned-snapshot-di.patch | 2 +-
> ...e-write-create-block_copy-bitmap-in-.patch | 2 +-
> ...-backup-add-discard-source-parameter.patch | 4 +--
> ...e-allow-specifying-minimum-cluster-s.patch | 2 +-
> ...um-cluster-size-to-performance-optio.patch | 2 +-
> debian/patches/series | 1 +
> 8 files changed, 43 insertions(+), 7 deletions(-)
> create mode 100644 debian/patches/extra/0005-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
>
>
applied series, thanks!
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-04-30 7:45 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-29 15:20 [pve-devel] [PATCH v3 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Fiona Ebner
2024-04-29 15:20 ` [pve-devel] [PATCH v3 qemu 2/2] backup: improve error when copy-before-write fails for fleecing Fiona Ebner
2024-04-30 7:45 ` [pve-devel] applied-series: [PATCH v3 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Thomas Lamprecht
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox