* [pve-devel] [PATCH v2 qemu 1/2] fix #5409: backup: fix copy-before-write timeout
@ 2024-04-29 14:27 Fiona Ebner
2024-04-29 14:27 ` [pve-devel] [PATCH v2 qemu 2/2] backup: improve error when copy-before-write fails for fleecing Fiona Ebner
2024-04-29 15:05 ` [pve-devel] applied: [PATCH v2 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Thomas Lamprecht
0 siblings, 2 replies; 3+ messages in thread
From: Fiona Ebner @ 2024-04-29 14:27 UTC (permalink / raw)
To: pve-devel
The type for the copy-before-write timeout in nanoseconds was wrong.
By being just uint32_t, a maximum of slightly over 4 seconds was
possible. Larger values would overflow and thus the 45 seconds set by
Proxmox's backup with fleecing, resulted in effectively 2 seconds
timeout for copy-before-write operations.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
New in v2.
Upstream submission of this patch:
https://lore.kernel.org/qemu-devel/20240429141934.442154-1-f.ebner@proxmox.com/T/#u
...e-write-use-uint64_t-for-timeout-in-.patch | 35 +++++++++++++++++++
...ock-copy-before-write-fix-permission.patch | 2 +-
...e-write-support-unligned-snapshot-di.patch | 2 +-
...e-write-create-block_copy-bitmap-in-.patch | 2 +-
...-backup-add-discard-source-parameter.patch | 4 +--
...e-allow-specifying-minimum-cluster-s.patch | 2 +-
...um-cluster-size-to-performance-optio.patch | 2 +-
debian/patches/series | 1 +
8 files changed, 43 insertions(+), 7 deletions(-)
create mode 100644 debian/patches/extra/0014-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
diff --git a/debian/patches/extra/0014-block-copy-before-write-use-uint64_t-for-timeout-in-.patch b/debian/patches/extra/0014-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
new file mode 100644
index 0000000..b350067
--- /dev/null
+++ b/debian/patches/extra/0014-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
@@ -0,0 +1,35 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Fiona Ebner <f.ebner@proxmox.com>
+Date: Mon, 29 Apr 2024 15:41:11 +0200
+Subject: [PATCH] block/copy-before-write: use uint64_t for timeout in
+ nanoseconds
+
+rather than the uint32_t for which the maximum is slightly more than 4
+seconds and larger values would overflow. The QAPI interface allows
+specifying the number of seconds, so only values 0 to 4 are safe right
+now, other values lead to a much lower timeout than a user expects.
+
+The block_copy() call where this is used already takes a uint64_t for
+the timeout, so no change required there.
+
+Fixes: 6db7fd1ca9 ("block/copy-before-write: implement cbw-timeout option")
+Reported-by: Friedrich Weber <f.weber@proxmox.com>
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
+Tested-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ block/copy-before-write.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/block/copy-before-write.c b/block/copy-before-write.c
+index b866e42271..3ee95c0e7a 100644
+--- a/block/copy-before-write.c
++++ b/block/copy-before-write.c
+@@ -43,7 +43,7 @@ typedef struct BDRVCopyBeforeWriteState {
+ BlockCopyState *bcs;
+ BdrvChild *target;
+ OnCbwError on_cbw_error;
+- uint32_t cbw_timeout_ns;
++ uint64_t cbw_timeout_ns;
+
+ /*
+ * @lock: protects access to @access_bitmap, @done_bitmap and
diff --git a/debian/patches/pve/0046-block-copy-before-write-fix-permission.patch b/debian/patches/pve/0046-block-copy-before-write-fix-permission.patch
index 1c2e5bd..5703311 100644
--- a/debian/patches/pve/0046-block-copy-before-write-fix-permission.patch
+++ b/debian/patches/pve/0046-block-copy-before-write-fix-permission.patch
@@ -33,7 +33,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index b866e42271..a2dddf6f57 100644
+index 3ee95c0e7a..641236dbf8 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -364,9 +364,13 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
diff --git a/debian/patches/pve/0047-block-copy-before-write-support-unligned-snapshot-di.patch b/debian/patches/pve/0047-block-copy-before-write-support-unligned-snapshot-di.patch
index 4656750..b02025a 100644
--- a/debian/patches/pve/0047-block-copy-before-write-support-unligned-snapshot-di.patch
+++ b/debian/patches/pve/0047-block-copy-before-write-support-unligned-snapshot-di.patch
@@ -15,7 +15,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index a2dddf6f57..0a219c2b75 100644
+index 641236dbf8..71cf50343f 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -325,14 +325,24 @@ static int coroutine_fn GRAPH_RDLOCK
diff --git a/debian/patches/pve/0048-block-copy-before-write-create-block_copy-bitmap-in-.patch b/debian/patches/pve/0048-block-copy-before-write-create-block_copy-bitmap-in-.patch
index ab7c0bf..5148257 100644
--- a/debian/patches/pve/0048-block-copy-before-write-create-block_copy-bitmap-in-.patch
+++ b/debian/patches/pve/0048-block-copy-before-write-create-block_copy-bitmap-in-.patch
@@ -49,7 +49,7 @@ index e13d7bc6b6..b61685f1a2 100644
if (!copy_bitmap) {
return NULL;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index 0a219c2b75..d3b95bd600 100644
+index 71cf50343f..ba827b0aba 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -470,7 +470,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
diff --git a/debian/patches/pve/0049-qapi-blockdev-backup-add-discard-source-parameter.patch b/debian/patches/pve/0049-qapi-blockdev-backup-add-discard-source-parameter.patch
index b6eee3e..2152c49 100644
--- a/debian/patches/pve/0049-qapi-blockdev-backup-add-discard-source-parameter.patch
+++ b/debian/patches/pve/0049-qapi-blockdev-backup-add-discard-source-parameter.patch
@@ -109,13 +109,13 @@ index b61685f1a2..3c61e52bae 100644
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index d3b95bd600..3503702d71 100644
+index ba827b0aba..2d92ae4043 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -44,6 +44,7 @@ typedef struct BDRVCopyBeforeWriteState {
BdrvChild *target;
OnCbwError on_cbw_error;
- uint32_t cbw_timeout_ns;
+ uint64_t cbw_timeout_ns;
+ bool discard_source;
/*
diff --git a/debian/patches/pve/0050-copy-before-write-allow-specifying-minimum-cluster-s.patch b/debian/patches/pve/0050-copy-before-write-allow-specifying-minimum-cluster-s.patch
index 17949f4..557cd3a 100644
--- a/debian/patches/pve/0050-copy-before-write-allow-specifying-minimum-cluster-s.patch
+++ b/debian/patches/pve/0050-copy-before-write-allow-specifying-minimum-cluster-s.patch
@@ -82,7 +82,7 @@ index 3c61e52bae..c9a722a5a6 100644
return NULL;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index 3503702d71..4a8c5bdb62 100644
+index 2d92ae4043..8a8f7685fa 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -479,7 +479,8 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
diff --git a/debian/patches/pve/0051-backup-add-minimum-cluster-size-to-performance-optio.patch b/debian/patches/pve/0051-backup-add-minimum-cluster-size-to-performance-optio.patch
index d760e45..3ac4b68 100644
--- a/debian/patches/pve/0051-backup-add-minimum-cluster-size-to-performance-optio.patch
+++ b/debian/patches/pve/0051-backup-add-minimum-cluster-size-to-performance-optio.patch
@@ -36,7 +36,7 @@ index 3dc955f625..ac5bd81338 100644
goto error;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
-index 4a8c5bdb62..9ca5ec5e5c 100644
+index 8a8f7685fa..7cc5e4f9b9 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -555,6 +555,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
diff --git a/debian/patches/series b/debian/patches/series
index 1680bca..44086f1 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -11,6 +11,7 @@ extra/0010-virtio-scsi-Attach-event-vq-notifier-with-no_poll.patch
extra/0011-virtio-Re-enable-notifications-after-drain.patch
extra/0012-qemu_init-increase-NOFILE-soft-limit-on-POSIX.patch
extra/0013-virtio-blk-avoid-using-ioeventfd-state-in-irqfd-cond.patch
+extra/0014-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
--
2.39.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* [pve-devel] [PATCH v2 qemu 2/2] backup: improve error when copy-before-write fails for fleecing
2024-04-29 14:27 [pve-devel] [PATCH v2 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Fiona Ebner
@ 2024-04-29 14:27 ` Fiona Ebner
2024-04-29 15:05 ` [pve-devel] applied: [PATCH v2 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Thomas Lamprecht
1 sibling, 0 replies; 3+ messages in thread
From: Fiona Ebner @ 2024-04-29 14:27 UTC (permalink / raw)
To: pve-devel
With fleecing, failure for copy-before-write does not fail the guest
write, but only sets the snapshot error that is associated to the
copy-before-write filter, making further requests to the snapshot
access fail with EACCES, which then also fails the job. But that error
code is not the root cause of why the backup failed, so bubble up the
original snapshot error instead.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes in v2, but sent as a patch for the parent git module.
...ve-error-when-copy-before-write-fail.patch | 117 ++++++++++++++++++
debian/patches/series | 1 +
2 files changed, 118 insertions(+)
create mode 100644 debian/patches/pve/0053-PVE-backup-improve-error-when-copy-before-write-fail.patch
diff --git a/debian/patches/pve/0053-PVE-backup-improve-error-when-copy-before-write-fail.patch b/debian/patches/pve/0053-PVE-backup-improve-error-when-copy-before-write-fail.patch
new file mode 100644
index 0000000..d422157
--- /dev/null
+++ b/debian/patches/pve/0053-PVE-backup-improve-error-when-copy-before-write-fail.patch
@@ -0,0 +1,117 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Fiona Ebner <f.ebner@proxmox.com>
+Date: Mon, 29 Apr 2024 14:43:58 +0200
+Subject: [PATCH] PVE backup: improve error when copy-before-write fails for
+ fleecing
+
+With fleecing, failure for copy-before-write does not fail the guest
+write, but only sets the snapshot error that is associated to the
+copy-before-write filter, making further requests to the snapshot
+access fail with EACCES, which then also fails the job. But that error
+code is not the root cause of why the backup failed, so bubble up the
+original snapshot error instead.
+
+Reported-by: Friedrich Weber <f.weber@proxmox.com>
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
+Tested-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ block/copy-before-write.c | 18 ++++++++++++------
+ block/copy-before-write.h | 1 +
+ pve-backup.c | 9 +++++++++
+ 3 files changed, 22 insertions(+), 6 deletions(-)
+
+diff --git a/block/copy-before-write.c b/block/copy-before-write.c
+index 7cc5e4f9b9..116b5ed5c9 100644
+--- a/block/copy-before-write.c
++++ b/block/copy-before-write.c
+@@ -27,6 +27,7 @@
+ #include "qapi/qmp/qjson.h"
+
+ #include "sysemu/block-backend.h"
++#include "qemu/atomic.h"
+ #include "qemu/cutils.h"
+ #include "qapi/error.h"
+ #include "block/block_int.h"
+@@ -74,7 +75,8 @@ typedef struct BDRVCopyBeforeWriteState {
+ * @snapshot_error is normally zero. But on first copy-before-write failure
+ * when @on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT, @snapshot_error takes
+ * value of this error (<0). After that all in-flight and further
+- * snapshot-API requests will fail with that error.
++ * snapshot-API requests will fail with that error. To be accessed with
++ * atomics.
+ */
+ int snapshot_error;
+ } BDRVCopyBeforeWriteState;
+@@ -114,7 +116,7 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
+ return 0;
+ }
+
+- if (s->snapshot_error) {
++ if (qatomic_read(&s->snapshot_error)) {
+ return 0;
+ }
+
+@@ -138,9 +140,7 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
+ WITH_QEMU_LOCK_GUARD(&s->lock) {
+ if (ret < 0) {
+ assert(s->on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT);
+- if (!s->snapshot_error) {
+- s->snapshot_error = ret;
+- }
++ qatomic_cmpxchg(&s->snapshot_error, 0, ret);
+ } else {
+ bdrv_set_dirty_bitmap(s->done_bitmap, off, end - off);
+ }
+@@ -214,7 +214,7 @@ cbw_snapshot_read_lock(BlockDriverState *bs, int64_t offset, int64_t bytes,
+
+ QEMU_LOCK_GUARD(&s->lock);
+
+- if (s->snapshot_error) {
++ if (qatomic_read(&s->snapshot_error)) {
+ g_free(req);
+ return NULL;
+ }
+@@ -594,6 +594,12 @@ void bdrv_cbw_drop(BlockDriverState *bs)
+ bdrv_unref(bs);
+ }
+
++int bdrv_cbw_snapshot_error(BlockDriverState *bs)
++{
++ BDRVCopyBeforeWriteState *s = bs->opaque;
++ return qatomic_read(&s->snapshot_error);
++}
++
+ static void cbw_init(void)
+ {
+ bdrv_register(&bdrv_cbw_filter);
+diff --git a/block/copy-before-write.h b/block/copy-before-write.h
+index dc6cafe7fa..a27d2d7d9f 100644
+--- a/block/copy-before-write.h
++++ b/block/copy-before-write.h
+@@ -44,5 +44,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
+ BlockCopyState **bcs,
+ Error **errp);
+ void bdrv_cbw_drop(BlockDriverState *bs);
++int bdrv_cbw_snapshot_error(BlockDriverState *bs);
+
+ #endif /* COPY_BEFORE_WRITE_H */
+diff --git a/pve-backup.c b/pve-backup.c
+index 00aaff6509..f8fa8e068b 100644
+--- a/pve-backup.c
++++ b/pve-backup.c
+@@ -385,6 +385,15 @@ static void pvebackup_complete_cb(void *opaque, int ret)
+ di->fleecing.snapshot_access = NULL;
+ }
+ if (di->fleecing.cbw) {
++ /*
++ * With fleecing, failure for cbw does not fail the guest write, but only sets the snapshot
++ * error, making further requests to the snapshot fail with EACCES, which then also fail the
++ * job. But that code is not the root cause and just confusing, so update it.
++ */
++ int snapshot_error = bdrv_cbw_snapshot_error(di->fleecing.cbw);
++ if (di->completed_ret == -EACCES && snapshot_error) {
++ di->completed_ret = snapshot_error;
++ }
+ bdrv_cbw_drop(di->fleecing.cbw);
+ di->fleecing.cbw = NULL;
+ }
diff --git a/debian/patches/series b/debian/patches/series
index 44086f1..f08feb2 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -70,3 +70,4 @@ pve/0049-qapi-blockdev-backup-add-discard-source-parameter.patch
pve/0050-copy-before-write-allow-specifying-minimum-cluster-s.patch
pve/0051-backup-add-minimum-cluster-size-to-performance-optio.patch
pve/0052-PVE-backup-add-fleecing-option.patch
+pve/0053-PVE-backup-improve-error-when-copy-before-write-fail.patch
--
2.39.2
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
* [pve-devel] applied: [PATCH v2 qemu 1/2] fix #5409: backup: fix copy-before-write timeout
2024-04-29 14:27 [pve-devel] [PATCH v2 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Fiona Ebner
2024-04-29 14:27 ` [pve-devel] [PATCH v2 qemu 2/2] backup: improve error when copy-before-write fails for fleecing Fiona Ebner
@ 2024-04-29 15:05 ` Thomas Lamprecht
1 sibling, 0 replies; 3+ messages in thread
From: Thomas Lamprecht @ 2024-04-29 15:05 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
Am 29/04/2024 um 16:27 schrieb Fiona Ebner:
> The type for the copy-before-write timeout in nanoseconds was wrong.
> By being just uint32_t, a maximum of slightly over 4 seconds was
> possible. Larger values would overflow and thus the 45 seconds set by
> Proxmox's backup with fleecing, resulted in effectively 2 seconds
> timeout for copy-before-write operations.
>
> Reported-by: Friedrich Weber <f.weber@proxmox.com>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>
> New in v2.
>
> Upstream submission of this patch:
> https://lore.kernel.org/qemu-devel/20240429141934.442154-1-f.ebner@proxmox.com/T/#u
>
> ...e-write-use-uint64_t-for-timeout-in-.patch | 35 +++++++++++++++++++
> ...ock-copy-before-write-fix-permission.patch | 2 +-
> ...e-write-support-unligned-snapshot-di.patch | 2 +-
> ...e-write-create-block_copy-bitmap-in-.patch | 2 +-
> ...-backup-add-discard-source-parameter.patch | 4 +--
> ...e-allow-specifying-minimum-cluster-s.patch | 2 +-
> ...um-cluster-size-to-performance-optio.patch | 2 +-
> debian/patches/series | 1 +
> 8 files changed, 43 insertions(+), 7 deletions(-)
> create mode 100644 debian/patches/extra/0014-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
>
>
applied this one to a new stable-8 branch that was split of the last 8.1.5-5
release (8 in the branch name is meaning the PVE major release here though,
can then be merged with master before that switches over to target PVE 9).
Does not apply anymore on master, which is QEMU 9.0 now, so please send a
rebase for that.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-04-29 15:05 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-29 14:27 [pve-devel] [PATCH v2 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Fiona Ebner
2024-04-29 14:27 ` [pve-devel] [PATCH v2 qemu 2/2] backup: improve error when copy-before-write fails for fleecing Fiona Ebner
2024-04-29 15:05 ` [pve-devel] applied: [PATCH v2 qemu 1/2] fix #5409: backup: fix copy-before-write timeout Thomas Lamprecht
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox