public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow
@ 2024-06-10 12:59 Fiona Ebner
  2024-06-10 12:59 ` [pve-devel] [PATCH qemu 1/7] PVE backup: fleecing: properly set minimum cluster size Fiona Ebner
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Fiona Ebner @ 2024-06-10 12:59 UTC (permalink / raw)
  To: pve-devel

A long-standing issue with VM backups in Proxmox VE is that a slow or
unreachable target would lead to a copy-before-write (cbw) operation
to break the guest write rather than abort the backup. This is
unexpected to users and the will end up without a successful backup
and without a working guest in such cases. This series prevents the
latter by changing the behavior to fail the backup instead of the
guest write.

This is done by re-using the already existing 'on-cbw-error' and
'cbw-timeout' options that are already used for fleecing and having
regular backup also check for the cbw's snapshot_error (unfortunately
this becomes a bit of a misnomer). If a given copy-before-write
operation cannot complete within 45 seconds, it's extremely likely
that aborting the backup is the better choice than keeping the guest
IO blocked.

Just checking for the error already makes it work (i.e. without the
last two patches), but backup can only check the error at the end. To
abort backup immediately, an error callback for the copy-before-write
node is introduced. A potential alternative would be give the
block-copy operation a pointer to the snapshot_error and have it check
it during its operation, but my initial attempt failed. Likely I
missed adapting certain logic that checks for whether the block-copy
operation failed and it's questionable if this approach would be
cleaner. An error callback is nice and explicit.

Note for testers: if e.g. the PBS is compeletly unreachable, the
backup job still will need to wait until the in-flight request is
aborted after 15 minutes. But the guest writes should be fast again.

Should it really be required to make the option more flexible, i.e.
allow users to specify a custom timeout or go back to the old behavior
then the 'backup' QMP call can be extended with those parameters.

Unfortunately, this is a non-trivial amount of code to make it work,
but there is quite a bit of boilerplate and some comments, so
hopefully the logic is straight-forward enough.


The first patch can be applied regardless of whether we want to go
with this or not.


Fiona Ebner (7):
  PVE backup: fleecing: properly set minimum cluster size
  block/copy-before-write: allow passing additional options for
    bdrv_cbw_append()
  block/backup: allow passing additional options for copy-before-write
    upon job creation
  block/backup: make cbw error also fail backup that does not use
    fleecing
  fix #3231+#3631: PVE backup: add timeout for copy-before-write
    operations and fail backup instead of guest writes
  block/copy-before-write: allow specifying error callback
  block/backup: set callback for cbw errors

 block/backup.c                         | 57 +++++++++++++++++++++++++-
 block/copy-before-write.c              | 41 +++++++++++++++---
 block/copy-before-write.h              |  9 +++-
 block/replication.c                    |  2 +-
 blockdev.c                             |  2 +-
 include/block/block_int-global-state.h |  2 +
 pve-backup.c                           | 13 +++++-
 7 files changed, 115 insertions(+), 11 deletions(-)

-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pve-devel] [PATCH qemu 1/7] PVE backup: fleecing: properly set minimum cluster size
  2024-06-10 12:59 [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fiona Ebner
@ 2024-06-10 12:59 ` Fiona Ebner
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 2/7] block/copy-before-write: allow passing additional options for bdrv_cbw_append() Fiona Ebner
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Fiona Ebner @ 2024-06-10 12:59 UTC (permalink / raw)
  To: pve-devel

While the current code doesn't care, prepare for future changes
relying on the flag being set as well.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

To be squashed into "PVE backup: add fleecing option"

 pve-backup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pve-backup.c b/pve-backup.c
index 07709aa350..4e80a9f283 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -630,6 +630,7 @@ static void create_backup_jobs_bh(void *opaque) {
             if (bdrv_get_info(di->fleecing.bs, &bdi) >= 0) {
                 perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
             }
+            perf.has_min_cluster_size = true;
         }
 
         BlockJob *job = backup_job_create(
-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pve-devel] [RFC qemu 2/7] block/copy-before-write: allow passing additional options for bdrv_cbw_append()
  2024-06-10 12:59 [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fiona Ebner
  2024-06-10 12:59 ` [pve-devel] [PATCH qemu 1/7] PVE backup: fleecing: properly set minimum cluster size Fiona Ebner
@ 2024-06-10 12:59 ` Fiona Ebner
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 3/7] block/backup: allow passing additional options for copy-before-write upon job creation Fiona Ebner
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Fiona Ebner @ 2024-06-10 12:59 UTC (permalink / raw)
  To: pve-devel

In preparation to allow configuring 'on-cbw-error' and 'cbw-timeout'
for a backup job. The previously already passed-in 'min-cluster-size'
option is also passed-in via the QDict now.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 block/backup.c            | 9 ++++++++-
 block/copy-before-write.c | 8 ++++----
 block/copy-before-write.h | 2 +-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index fe69723ada..e0acfe6758 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -21,6 +21,7 @@
 #include "block/block_backup.h"
 #include "block/block-copy.h"
 #include "block/dirty-bitmap.h"
+#include "block/qdict.h"
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "sysemu/block-backend.h"
@@ -346,6 +347,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     int64_t cluster_size;
     BlockDriverState *cbw = NULL;
     BlockCopyState *bcs = NULL;
+    QDict *cbw_opts = NULL;
 
     assert(bs);
     assert(target);
@@ -433,8 +435,13 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         goto error;
     }
 
+    if (perf->has_min_cluster_size) {
+        cbw_opts = qdict_new();
+        qdict_put_int(cbw_opts, "min-cluster-size", perf->min_cluster_size);
+    }
+
     cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
-                          perf->min_cluster_size, &bcs, errp);
+                          cbw_opts, &bcs, errp);
     if (!cbw) {
         goto error;
     }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 50cc4c7aae..5e0c1252f6 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -546,26 +546,26 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
                                   const char *filter_node_name,
                                   bool discard_source,
-                                  int64_t min_cluster_size,
+                                  QDict *opts,
                                   BlockCopyState **bcs,
                                   Error **errp)
 {
     BDRVCopyBeforeWriteState *state;
     BlockDriverState *top;
-    QDict *opts;
     int flags = BDRV_O_RDWR | (discard_source ? BDRV_O_CBW_DISCARD_SOURCE : 0);
 
     assert(source->total_sectors == target->total_sectors);
     GLOBAL_STATE_CODE();
 
-    opts = qdict_new();
+    if (!opts) {
+        opts = qdict_new();
+    }
     qdict_put_str(opts, "driver", "copy-before-write");
     if (filter_node_name) {
         qdict_put_str(opts, "node-name", filter_node_name);
     }
     qdict_put_str(opts, "file", bdrv_get_node_name(source));
     qdict_put_str(opts, "target", bdrv_get_node_name(target));
-    qdict_put_int(opts, "min-cluster-size", min_cluster_size);
 
     top = bdrv_insert_node(source, opts, flags, errp);
     if (!top) {
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index a27d2d7d9f..4c22bd282e 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -40,7 +40,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
                                   const char *filter_node_name,
                                   bool discard_source,
-                                  int64_t min_cluster_size,
+                                  QDict *opts,
                                   BlockCopyState **bcs,
                                   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pve-devel] [RFC qemu 3/7] block/backup: allow passing additional options for copy-before-write upon job creation
  2024-06-10 12:59 [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fiona Ebner
  2024-06-10 12:59 ` [pve-devel] [PATCH qemu 1/7] PVE backup: fleecing: properly set minimum cluster size Fiona Ebner
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 2/7] block/copy-before-write: allow passing additional options for bdrv_cbw_append() Fiona Ebner
@ 2024-06-10 12:59 ` Fiona Ebner
  2024-07-05  9:30   ` Fabian Grünbichler
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 4/7] block/backup: make cbw error also fail backup that does not use fleecing Fiona Ebner
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Fiona Ebner @ 2024-06-10 12:59 UTC (permalink / raw)
  To: pve-devel

In particular, useful for setting the 'on-cbw-error' and 'cbw-timeout'
options (see BlockdevOptionsCbw in QAPI).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 block/backup.c                         | 10 +++++++---
 block/replication.c                    |  2 +-
 blockdev.c                             |  2 +-
 include/block/block_int-global-state.h |  2 ++
 pve-backup.c                           |  2 +-
 5 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index e0acfe6758..82fedf1680 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -336,6 +336,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   bool compress, bool discard_source,
                   const char *filter_node_name,
                   BackupPerf *perf,
+                  QDict *cbw_opts,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   int creation_flags,
@@ -347,7 +348,6 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     int64_t cluster_size;
     BlockDriverState *cbw = NULL;
     BlockCopyState *bcs = NULL;
-    QDict *cbw_opts = NULL;
 
     assert(bs);
     assert(target);
@@ -436,8 +436,12 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     }
 
     if (perf->has_min_cluster_size) {
-        cbw_opts = qdict_new();
-        qdict_put_int(cbw_opts, "min-cluster-size", perf->min_cluster_size);
+        if (!cbw_opts) {
+            cbw_opts = qdict_new();
+        }
+        if (!qdict_haskey(cbw_opts, "min-cluster-size")) {
+            qdict_put_int(cbw_opts, "min-cluster-size", perf->min_cluster_size);
+        }
     }
 
     cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
diff --git a/block/replication.c b/block/replication.c
index 0415a5e8b7..c5a27f593e 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -583,7 +583,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         s->backup_job = backup_job_create(
                                 NULL, s->secondary_disk->bs, s->hidden_disk->bs,
                                 0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, false,
-                                NULL, &perf,
+                                NULL, &perf, NULL,
                                 BLOCKDEV_ON_ERROR_REPORT,
                                 BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
                                 backup_job_completed, bs, NULL, &local_err);
diff --git a/blockdev.c b/blockdev.c
index cbe224387b..55ca967430 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2732,7 +2732,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
                             backup->sync, bmap, backup->bitmap_mode,
                             backup->compress, backup->discard_source,
                             backup->filter_node_name,
-                            &perf,
+                            &perf, NULL,
                             backup->on_source_error,
                             backup->on_target_error,
                             job_flags, NULL, NULL, txn, errp);
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index f0c642b194..7332680087 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -179,6 +179,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
  * @bitmap_mode: The bitmap synchronization policy to use.
  * @perf: Performance options. All actual fields assumed to be present,
  *        all ".has_*" fields are ignored.
+ * @cbw_opts: Additional options to configure cbw filter with.
  * @on_source_error: The action to take upon error reading from the source.
  * @on_target_error: The action to take upon error writing to the target.
  * @creation_flags: Flags that control the behavior of the Job lifetime.
@@ -198,6 +199,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                             bool compress, bool discard_source,
                             const char *filter_node_name,
                             BackupPerf *perf,
+                            QDict *cbw_opts,
                             BlockdevOnError on_source_error,
                             BlockdevOnError on_target_error,
                             int creation_flags,
diff --git a/pve-backup.c b/pve-backup.c
index 4e80a9f283..108e185a20 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -635,7 +635,7 @@ static void create_backup_jobs_bh(void *opaque) {
 
         BlockJob *job = backup_job_create(
             job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
-            bitmap_mode, false, discard_source, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT,
+            bitmap_mode, false, discard_source, NULL, &perf, NULL, BLOCKDEV_ON_ERROR_REPORT,
             BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
             &local_err);
 
-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pve-devel] [RFC qemu 4/7] block/backup: make cbw error also fail backup that does not use fleecing
  2024-06-10 12:59 [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fiona Ebner
                   ` (2 preceding siblings ...)
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 3/7] block/backup: allow passing additional options for copy-before-write upon job creation Fiona Ebner
@ 2024-06-10 12:59 ` Fiona Ebner
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 5/7] fix #3231+#3631: PVE backup: add timeout for copy-before-write operations and fail backup instead of guest writes Fiona Ebner
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Fiona Ebner @ 2024-06-10 12:59 UTC (permalink / raw)
  To: pve-devel

If the backup target can't be reached or is very slow, then the
default behavior for QEMU backup is to break the guest write. While an
option to change this was introduced with the 'on-cbw-error' option
for the copy-before-write filter, it only took an effect in
combination with a snapshot-access node like is used for backup
fleecing.

In preparation to set the 'on-cbw-error' option for PVE backups that
do not use fleecing.

Unfortunately, the name for the option is "break-snapshot", but in the
non-fleecing case, there is no snapshot-access node involved, so it's
a bit of a misnomer.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 block/backup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index 82fedf1680..ba153110d3 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -202,6 +202,9 @@ static int coroutine_fn backup_loop(BackupBlockJob *job)
 out:
     block_copy_call_free(s);
     job->bg_bcs_call = NULL;
+    if (!ret) {
+        ret = bdrv_cbw_snapshot_error(job->cbw);
+    }
     return ret;
 }
 
-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pve-devel] [RFC qemu 5/7] fix #3231+#3631: PVE backup: add timeout for copy-before-write operations and fail backup instead of guest writes
  2024-06-10 12:59 [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fiona Ebner
                   ` (3 preceding siblings ...)
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 4/7] block/backup: make cbw error also fail backup that does not use fleecing Fiona Ebner
@ 2024-06-10 12:59 ` Fiona Ebner
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 6/7] block/copy-before-write: allow specifying error callback Fiona Ebner
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Fiona Ebner @ 2024-06-10 12:59 UTC (permalink / raw)
  To: pve-devel

If the backup target can't be reached or is very slow, then the
default behavior for QEMU backup is to break the guest write. This is
undesirable and it is more expected and less intrusive to make the
backup error out instead.

A timeout of 45 seconds for copy-before-write operations is set, like
for fleecing. Guest drivers like virtio-win have issues when a write
takes more than 60 seconds and still completes afterwards, so a value
below that was chosen.

Unfortunately, with this alone, the backup would still try to run to
completion and fail only at the very end. This can be improved by
adding a callback function that will abort the backup once a
copy-before-write operation fails.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 pve-backup.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 108e185a20..9843d8d122 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -560,6 +560,7 @@ static void create_backup_jobs_bh(void *opaque) {
         bdrv_drained_begin(di->bs);
 
         BackupPerf perf = (BackupPerf){ .max_workers = backup_state.perf.max_workers };
+        QDict *backup_cbw_opts = qdict_new();
 
         BlockDriverState *source_bs = di->bs;
         bool discard_source = false;
@@ -631,11 +632,18 @@ static void create_backup_jobs_bh(void *opaque) {
                 perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
             }
             perf.has_min_cluster_size = true;
+        } else {
+            /*
+             * When fleecing is not used, need to set the options on the copy-before-write node
+             * installed by the backup job itself.
+             */
+            qdict_put_str(backup_cbw_opts, "on-cbw-error", "break-snapshot");
+            qdict_put_int(backup_cbw_opts, "cbw-timeout", 45);
         }
 
         BlockJob *job = backup_job_create(
-            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
-            bitmap_mode, false, discard_source, NULL, &perf, NULL, BLOCKDEV_ON_ERROR_REPORT,
+            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap, bitmap_mode,
+            false, discard_source, NULL, &perf, backup_cbw_opts, BLOCKDEV_ON_ERROR_REPORT,
             BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
             &local_err);
 
-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pve-devel] [RFC qemu 6/7] block/copy-before-write: allow specifying error callback
  2024-06-10 12:59 [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fiona Ebner
                   ` (4 preceding siblings ...)
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 5/7] fix #3231+#3631: PVE backup: add timeout for copy-before-write operations and fail backup instead of guest writes Fiona Ebner
@ 2024-06-10 12:59 ` Fiona Ebner
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 7/7] block/backup: set callback for cbw errors Fiona Ebner
  2024-07-05  9:41 ` [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fabian Grünbichler
  7 siblings, 0 replies; 11+ messages in thread
From: Fiona Ebner @ 2024-06-10 12:59 UTC (permalink / raw)
  To: pve-devel

The callback is invoked when cbw is configured to not break the guest
write. Will be useful to be able to abort a backup job immediately.
Currently, it has to wait for the rest of the block copy operation to
finish before checking the cbw error state. Since the job will be
required for the callback and the cbw node is created before the
backup job, the error callback can only be set after the cbw node is
created. That is why a dedicated function is used for that.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 block/copy-before-write.c | 33 ++++++++++++++++++++++++++++++++-
 block/copy-before-write.h |  7 +++++++
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 5e0c1252f6..00ce1e957e 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -79,6 +79,9 @@ typedef struct BDRVCopyBeforeWriteState {
      * atomics.
      */
     int snapshot_error;
+
+    CbwErrorCallbackFunc error_cb;
+    void *error_cb_opaque;
 } BDRVCopyBeforeWriteState;
 
 static int coroutine_fn GRAPH_RDLOCK
@@ -140,7 +143,15 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
     WITH_QEMU_LOCK_GUARD(&s->lock) {
         if (ret < 0) {
             assert(s->on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT);
-            qatomic_cmpxchg(&s->snapshot_error, 0, ret);
+            int old = qatomic_cmpxchg(&s->snapshot_error, 0, ret);
+            CbwErrorCallbackFunc error_cb = qatomic_load_acquire(&s->error_cb);
+            /*
+             * Only invoke error callback the first time. There can be massive amounts of errros
+             * when the backup target cannot be reached.
+             */
+            if (!old && error_cb) {
+                error_cb(s->error_cb_opaque, ret);
+            }
         } else {
             bdrv_set_dirty_bitmap(s->done_bitmap, off, end - off);
         }
@@ -591,6 +602,26 @@ int bdrv_cbw_snapshot_error(BlockDriverState *bs)
     return qatomic_read(&s->snapshot_error);
 }
 
+void bdrv_cbw_set_error_cb(BlockDriverState *bs, CbwErrorCallbackFunc error_cb,
+                           void *error_cb_opaque)
+{
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+
+    /*
+     * Supporting multiple calls requires using locking or indirection of callback+arguments into
+     * another struct whose pointer can then be set with atomics.
+     */
+    assert(!s->error_cb);
+
+    s->error_cb_opaque = error_cb_opaque;
+    /*
+     * The only user of this function is during initalization of the backup job while drained. And
+     * while there should be enough stuff happening in between to ensure the callback and argument
+     * is seen by cbw_do_copy_before_write(), play it safe.
+     */
+    qatomic_store_release(&s->error_cb, error_cb);
+}
+
 static void cbw_init(void)
 {
     bdrv_register(&bdrv_cbw_filter);
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 4c22bd282e..cddbd5d9a2 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -29,6 +29,8 @@
 #include "block/block_int.h"
 #include "block/block-copy.h"
 
+typedef void (*CbwErrorCallbackFunc)(void *opaque, int error);
+
 /*
  * Global state (GS) API. These functions run under the BQL.
  *
@@ -45,5 +47,10 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
 int bdrv_cbw_snapshot_error(BlockDriverState *bs);
+/*
+ * Can only be called once for a given cbw node.
+ */
+void bdrv_cbw_set_error_cb(BlockDriverState *bs, CbwErrorCallbackFunc error_cb,
+                           void *error_cb_opaque);
 
 #endif /* COPY_BEFORE_WRITE_H */
-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pve-devel] [RFC qemu 7/7] block/backup: set callback for cbw errors
  2024-06-10 12:59 [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fiona Ebner
                   ` (5 preceding siblings ...)
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 6/7] block/copy-before-write: allow specifying error callback Fiona Ebner
@ 2024-06-10 12:59 ` Fiona Ebner
  2024-07-05  9:37   ` Fabian Grünbichler
  2024-07-05  9:41 ` [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fabian Grünbichler
  7 siblings, 1 reply; 11+ messages in thread
From: Fiona Ebner @ 2024-06-10 12:59 UTC (permalink / raw)
  To: pve-devel

The callback is invoked when cbw is configured to not break the guest
write and will abort a backup job immediately. Currently the backup
has to wait for the rest of the block copy operation to finish before
checking the cbw error state.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Note for testers: if e.g. the PBS is compeletly unreachable, the
backup job still will need to wait until the in-flight request is
aborted after 15 minutes. But the guest writes should be fast again.

 block/backup.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index ba153110d3..43d34ce4c2 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -32,6 +32,45 @@
 
 static const BlockJobDriver backup_job_driver;
 
+typedef struct {
+    Job *job;
+    int ret;
+} BackupOnCbwError;
+
+static void backup_on_cbw_error_cb_bh(void *opaque)
+{
+    BackupOnCbwError *data = opaque;
+    if (data->job) {
+        WITH_JOB_LOCK_GUARD() {
+            if (!job_is_completed_locked(data->job)) {
+                error_report("backup was cancelled because of copy-before-write error: %s",
+                             strerror(-data->ret));
+                job_cancel_locked(data->job, true);
+            }
+        }
+    } else {
+        error_report("backup_on_cbw_error_cb_bh: no job! Error: %s", strerror(-data->ret));
+    }
+
+    g_free(data);
+}
+
+static void backup_on_cbw_error_cb(void *opaque, int ret)
+{
+    BackupOnCbwError *data = g_new0(BackupOnCbwError, 1);
+    data->job = opaque;
+    data->ret = ret;
+
+    /*
+     * backup_cancel() cannot run in coroutine context.
+     */
+    if (qemu_in_coroutine()) {
+        aio_bh_schedule_oneshot(qemu_get_aio_context(), backup_on_cbw_error_cb_bh, data);
+    } else {
+        backup_on_cbw_error_cb_bh(data);
+    }
+}
+
 static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
 {
     BdrvDirtyBitmap *bm;
@@ -477,6 +516,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         goto error;
     }
 
+    bdrv_cbw_set_error_cb(cbw, backup_on_cbw_error_cb, job);
+
     job->cbw = cbw;
     job->source_bs = bs;
     job->target_bs = target;
-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [RFC qemu 3/7] block/backup: allow passing additional options for copy-before-write upon job creation
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 3/7] block/backup: allow passing additional options for copy-before-write upon job creation Fiona Ebner
@ 2024-07-05  9:30   ` Fabian Grünbichler
  0 siblings, 0 replies; 11+ messages in thread
From: Fabian Grünbichler @ 2024-07-05  9:30 UTC (permalink / raw)
  To: Proxmox VE development discussion

Quoting Fiona Ebner (2024-06-10 14:59:38)
> In particular, useful for setting the 'on-cbw-error' and 'cbw-timeout'
> options (see BlockdevOptionsCbw in QAPI).
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>  block/backup.c                         | 10 +++++++---
>  block/replication.c                    |  2 +-
>  blockdev.c                             |  2 +-
>  include/block/block_int-global-state.h |  2 ++
>  pve-backup.c                           |  2 +-
>  5 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index e0acfe6758..82fedf1680 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -336,6 +336,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>                    bool compress, bool discard_source,
>                    const char *filter_node_name,
>                    BackupPerf *perf,
> +                  QDict *cbw_opts,
>                    BlockdevOnError on_source_error,
>                    BlockdevOnError on_target_error,
>                    int creation_flags,
> @@ -347,7 +348,6 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>      int64_t cluster_size;
>      BlockDriverState *cbw = NULL;
>      BlockCopyState *bcs = NULL;
> -    QDict *cbw_opts = NULL;
>  
>      assert(bs);
>      assert(target);
> @@ -436,8 +436,12 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>      }
>  
>      if (perf->has_min_cluster_size) {
> -        cbw_opts = qdict_new();
> -        qdict_put_int(cbw_opts, "min-cluster-size", perf->min_cluster_size);
> +        if (!cbw_opts) {
> +            cbw_opts = qdict_new();
> +        }
> +        if (!qdict_haskey(cbw_opts, "min-cluster-size")) {
> +            qdict_put_int(cbw_opts, "min-cluster-size", perf->min_cluster_size);

so here cbw_opts "wins" without any indication that this happens

> +        }
>      }
>  
>      cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
> diff --git a/block/replication.c b/block/replication.c
> index 0415a5e8b7..c5a27f593e 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -583,7 +583,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
>          s->backup_job = backup_job_create(
>                                  NULL, s->secondary_disk->bs, s->hidden_disk->bs,
>                                  0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, false,
> -                                NULL, &perf,
> +                                NULL, &perf, NULL,
>                                  BLOCKDEV_ON_ERROR_REPORT,
>                                  BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
>                                  backup_job_completed, bs, NULL, &local_err);
> diff --git a/blockdev.c b/blockdev.c
> index cbe224387b..55ca967430 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2732,7 +2732,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
>                              backup->sync, bmap, backup->bitmap_mode,
>                              backup->compress, backup->discard_source,
>                              backup->filter_node_name,
> -                            &perf,
> +                            &perf, NULL,
>                              backup->on_source_error,
>                              backup->on_target_error,
>                              job_flags, NULL, NULL, txn, errp);
> diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
> index f0c642b194..7332680087 100644
> --- a/include/block/block_int-global-state.h
> +++ b/include/block/block_int-global-state.h
> @@ -179,6 +179,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>   * @bitmap_mode: The bitmap synchronization policy to use.
>   * @perf: Performance options. All actual fields assumed to be present,
>   *        all ".has_*" fields are ignored.
> + * @cbw_opts: Additional options to configure cbw filter with.

from this, I'd have guessed the other way round ;) should the precedence be
made explicit somewhere? or, for this particular option, should the higher
value win? but such logic might quickly get complicated once more parameters
need to be handled..

>   * @on_source_error: The action to take upon error reading from the source.
>   * @on_target_error: The action to take upon error writing to the target.
>   * @creation_flags: Flags that control the behavior of the Job lifetime.
> @@ -198,6 +199,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>                              bool compress, bool discard_source,
>                              const char *filter_node_name,
>                              BackupPerf *perf,
> +                            QDict *cbw_opts,
>                              BlockdevOnError on_source_error,
>                              BlockdevOnError on_target_error,
>                              int creation_flags,
> diff --git a/pve-backup.c b/pve-backup.c
> index 4e80a9f283..108e185a20 100644
> --- a/pve-backup.c
> +++ b/pve-backup.c
> @@ -635,7 +635,7 @@ static void create_backup_jobs_bh(void *opaque) {
>  
>          BlockJob *job = backup_job_create(
>              job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
> -            bitmap_mode, false, discard_source, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT,
> +            bitmap_mode, false, discard_source, NULL, &perf, NULL, BLOCKDEV_ON_ERROR_REPORT,
>              BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
>              &local_err);
>  
> -- 
> 2.39.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
>


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [RFC qemu 7/7] block/backup: set callback for cbw errors
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 7/7] block/backup: set callback for cbw errors Fiona Ebner
@ 2024-07-05  9:37   ` Fabian Grünbichler
  0 siblings, 0 replies; 11+ messages in thread
From: Fabian Grünbichler @ 2024-07-05  9:37 UTC (permalink / raw)
  To: Proxmox VE development discussion

Quoting Fiona Ebner (2024-06-10 14:59:42)
> The callback is invoked when cbw is configured to not break the guest
> write and will abort a backup job immediately. Currently the backup
> has to wait for the rest of the block copy operation to finish before
> checking the cbw error state.
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> Note for testers: if e.g. the PBS is compeletly unreachable, the
> backup job still will need to wait until the in-flight request is
> aborted after 15 minutes. But the guest writes should be fast again.

could we improve that by checking the status in the pbs-qemu lib periodically,
and aborting there as well?

how is the bitmap handled in case of a cbw-timeout/error?

> 
>  block/backup.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/block/backup.c b/block/backup.c
> index ba153110d3..43d34ce4c2 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -32,6 +32,45 @@
>  
>  static const BlockJobDriver backup_job_driver;
>  
> +typedef struct {
> +    Job *job;
> +    int ret;
> +} BackupOnCbwError;
> +
> +static void backup_on_cbw_error_cb_bh(void *opaque)
> +{
> +    BackupOnCbwError *data = opaque;
> +    if (data->job) {
> +        WITH_JOB_LOCK_GUARD() {
> +            if (!job_is_completed_locked(data->job)) {
> +                error_report("backup was cancelled because of copy-before-write error: %s",
> +                             strerror(-data->ret));
> +                job_cancel_locked(data->job, true);
> +            }
> +        }
> +    } else {
> +        error_report("backup_on_cbw_error_cb_bh: no job! Error: %s", strerror(-data->ret));
> +    }
> +
> +    g_free(data);
> +}
> +
> +static void backup_on_cbw_error_cb(void *opaque, int ret)
> +{
> +    BackupOnCbwError *data = g_new0(BackupOnCbwError, 1);
> +    data->job = opaque;
> +    data->ret = ret;
> +
> +    /*
> +     * backup_cancel() cannot run in coroutine context.
> +     */
> +    if (qemu_in_coroutine()) {
> +        aio_bh_schedule_oneshot(qemu_get_aio_context(), backup_on_cbw_error_cb_bh, data);
> +    } else {
> +        backup_on_cbw_error_cb_bh(data);
> +    }
> +}
> +
>  static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>  {
>      BdrvDirtyBitmap *bm;
> @@ -477,6 +516,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>          goto error;
>      }
>  
> +    bdrv_cbw_set_error_cb(cbw, backup_on_cbw_error_cb, job);
> +
>      job->cbw = cbw;
>      job->source_bs = bs;
>      job->target_bs = target;
> -- 
> 2.39.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
>


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow
  2024-06-10 12:59 [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fiona Ebner
                   ` (6 preceding siblings ...)
  2024-06-10 12:59 ` [pve-devel] [RFC qemu 7/7] block/backup: set callback for cbw errors Fiona Ebner
@ 2024-07-05  9:41 ` Fabian Grünbichler
  7 siblings, 0 replies; 11+ messages in thread
From: Fabian Grünbichler @ 2024-07-05  9:41 UTC (permalink / raw)
  To: Proxmox VE development discussion

Quoting Fiona Ebner (2024-06-10 14:59:35)
> A long-standing issue with VM backups in Proxmox VE is that a slow or
> unreachable target would lead to a copy-before-write (cbw) operation
> to break the guest write rather than abort the backup. This is
> unexpected to users and the will end up without a successful backup
> and without a working guest in such cases. This series prevents the
> latter by changing the behavior to fail the backup instead of the
> guest write.
> 
> This is done by re-using the already existing 'on-cbw-error' and
> 'cbw-timeout' options that are already used for fleecing and having
> regular backup also check for the cbw's snapshot_error (unfortunately
> this becomes a bit of a misnomer). If a given copy-before-write
> operation cannot complete within 45 seconds, it's extremely likely
> that aborting the backup is the better choice than keeping the guest
> IO blocked.
> 
> Just checking for the error already makes it work (i.e. without the
> last two patches), but backup can only check the error at the end. To
> abort backup immediately, an error callback for the copy-before-write
> node is introduced. A potential alternative would be give the
> block-copy operation a pointer to the snapshot_error and have it check
> it during its operation, but my initial attempt failed. Likely I
> missed adapting certain logic that checks for whether the block-copy
> operation failed and it's questionable if this approach would be
> cleaner. An error callback is nice and explicit.
> 
> Note for testers: if e.g. the PBS is compeletly unreachable, the
> backup job still will need to wait until the in-flight request is
> aborted after 15 minutes. But the guest writes should be fast again.
> 
> Should it really be required to make the option more flexible, i.e.
> allow users to specify a custom timeout or go back to the old behavior
> then the 'backup' QMP call can be extended with those parameters.
> 
> Unfortunately, this is a non-trivial amount of code to make it work,
> but there is quite a bit of boilerplate and some comments, so
> hopefully the logic is straight-forward enough.

this sentence here made me except a lot worse ;) code seems very
straight-forward and clean, two small comments inline. not sure whether we want
to entangle this with 9.0, but I think this could be applied soonish after some
more in-depth testing, since it should solve a pretty big pain point user
consistenly run into..

I am sure we will have users clamoring for a configurable timeout soon after
though ;)

> 
> The first patch can be applied regardless of whether we want to go
> with this or not.
> 
> 
> Fiona Ebner (7):
>   PVE backup: fleecing: properly set minimum cluster size
>   block/copy-before-write: allow passing additional options for
>     bdrv_cbw_append()
>   block/backup: allow passing additional options for copy-before-write
>     upon job creation
>   block/backup: make cbw error also fail backup that does not use
>     fleecing
>   fix #3231+#3631: PVE backup: add timeout for copy-before-write
>     operations and fail backup instead of guest writes
>   block/copy-before-write: allow specifying error callback
>   block/backup: set callback for cbw errors
> 
>  block/backup.c                         | 57 +++++++++++++++++++++++++-
>  block/copy-before-write.c              | 41 +++++++++++++++---
>  block/copy-before-write.h              |  9 +++-
>  block/replication.c                    |  2 +-
>  blockdev.c                             |  2 +-
>  include/block/block_int-global-state.h |  2 +
>  pve-backup.c                           | 13 +++++-
>  7 files changed, 115 insertions(+), 11 deletions(-)
> 
> -- 
> 2.39.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
>


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-07-05  9:41 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-10 12:59 [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [PATCH qemu 1/7] PVE backup: fleecing: properly set minimum cluster size Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 2/7] block/copy-before-write: allow passing additional options for bdrv_cbw_append() Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 3/7] block/backup: allow passing additional options for copy-before-write upon job creation Fiona Ebner
2024-07-05  9:30   ` Fabian Grünbichler
2024-06-10 12:59 ` [pve-devel] [RFC qemu 4/7] block/backup: make cbw error also fail backup that does not use fleecing Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 5/7] fix #3231+#3631: PVE backup: add timeout for copy-before-write operations and fail backup instead of guest writes Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 6/7] block/copy-before-write: allow specifying error callback Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 7/7] block/backup: set callback for cbw errors Fiona Ebner
2024-07-05  9:37   ` Fabian Grünbichler
2024-07-05  9:41 ` [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fabian Grünbichler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal