public inbox for pve-devel@lists.proxmox.com
From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [RFC qemu 5/7] fix #3231+#3631: PVE backup: add timeout for copy-before-write operations and fail backup instead of guest writes
Date: Mon, 10 Jun 2024 14:59:40 +0200
Message-ID: <20240610125942.116985-6-f.ebner@proxmox.com>
In-Reply-To: <20240610125942.116985-1-f.ebner@proxmox.com>

If the backup target can't be reached or is very slow, the default
behavior of QEMU backup is to break, i.e. fail, the guest write. This
is undesirable: making the backup error out instead is less surprising
and less intrusive for the guest.

A timeout of 45 seconds is set for copy-before-write operations, as is
already done for fleecing. Guest drivers like virtio-win have issues
when a write takes more than 60 seconds and still completes
afterwards, so a value below that limit was chosen.
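The reasoning behind the value can be captured as a compile-time check. The following is a hedged sketch, not QEMU or PVE code: the macro and function names are hypothetical, and the "0 means no limit" rule is an assumption modeled on QEMU's documented "cbw-timeout" semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the description above; the names are hypothetical. */
#define CBW_TIMEOUT_SECONDS 45u
#define GUEST_DRIVER_WRITE_LIMIT_SECONDS 60u /* e.g. virtio-win */

/* The copy-before-write operation must give up before the guest driver does. */
_Static_assert(CBW_TIMEOUT_SECONDS < GUEST_DRIVER_WRITE_LIMIT_SECONDS,
               "cbw timeout must expire before the guest write limit");

/*
 * Hypothetical helper: decide whether a copy-before-write operation that has
 * been running for elapsed_s seconds must be treated as failed. A timeout of
 * 0 is assumed to mean "no limit".
 */
static bool cbw_timed_out(uint32_t timeout_s, uint32_t elapsed_s)
{
    return timeout_s != 0 && elapsed_s >= timeout_s;
}
```

A timed-out operation is then handled like any other copy-before-write error, which with "break-snapshot" means the backup fails rather than the guest write.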

Unfortunately, with this alone, the backup would still try to run to
completion and fail only at the very end. This can be improved by
adding a callback function that aborts the backup as soon as a
copy-before-write operation fails.
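The effect of such a callback can be illustrated with a small C model. This is a sketch under stated assumptions, not the callback API added later in this series: CbwErrorCb, CbwFilter, BackupJob, backup_abort_cb(), cbw_report_error() and backup_run() are all hypothetical names, and the job is reduced to copying a fixed number of clusters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once when a copy-before-write fails. */
typedef void (*CbwErrorCb)(void *opaque);

typedef struct {
    CbwErrorCb on_error; /* may be NULL: then the backup only fails at the end */
    void *opaque;
    bool snapshot_broken;
} CbwFilter;

typedef struct {
    bool aborted;
    size_t clusters_done;
    size_t clusters_total;
} BackupJob;

/* Hypothetical callback that aborts the backup job passed as opaque. */
static void backup_abort_cb(void *opaque)
{
    BackupJob *job = opaque;
    job->aborted = true;
}

/* Record a failed copy-before-write; notify the callback the first time only. */
static void cbw_report_error(CbwFilter *filter)
{
    if (filter->snapshot_broken) {
        return;
    }
    filter->snapshot_broken = true;
    if (filter->on_error != NULL) {
        filter->on_error(filter->opaque);
    }
}

/*
 * Copy clusters until done or aborted; a guest write whose copy-before-write
 * fails is simulated when clusters_done reaches fail_at. Returns true only if
 * the backup completed with a valid snapshot.
 */
static bool backup_run(BackupJob *job, CbwFilter *filter, size_t fail_at)
{
    assert(job != NULL && filter != NULL);
    while (job->clusters_done < job->clusters_total) {
        if (job->clusters_done == fail_at) {
            cbw_report_error(filter);
        }
        if (job->aborted) {
            break;
        }
        job->clusters_done++;
    }
    return !job->aborted && !filter->snapshot_broken;
}
```

Without a callback, a failure at cluster 10 of 100 still lets the job copy all 100 clusters before reporting failure; with the callback installed, the job stops right after the error.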

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 pve-backup.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 108e185a20..9843d8d122 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -560,6 +560,7 @@ static void create_backup_jobs_bh(void *opaque) {
         bdrv_drained_begin(di->bs);
 
         BackupPerf perf = (BackupPerf){ .max_workers = backup_state.perf.max_workers };
+        QDict *backup_cbw_opts = qdict_new();
 
         BlockDriverState *source_bs = di->bs;
         bool discard_source = false;
@@ -631,11 +632,18 @@ static void create_backup_jobs_bh(void *opaque) {
                 perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
             }
             perf.has_min_cluster_size = true;
+        } else {
+            /*
+             * When fleecing is not used, need to set the options on the copy-before-write node
+             * installed by the backup job itself.
+             */
+            qdict_put_str(backup_cbw_opts, "on-cbw-error", "break-snapshot");
+            qdict_put_int(backup_cbw_opts, "cbw-timeout", 45);
         }
 
         BlockJob *job = backup_job_create(
-            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
-            bitmap_mode, false, discard_source, NULL, &perf, NULL, BLOCKDEV_ON_ERROR_REPORT,
+            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap, bitmap_mode,
+            false, discard_source, NULL, &perf, backup_cbw_opts, BLOCKDEV_ON_ERROR_REPORT,
             BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
             &local_err);
 
-- 
2.39.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Thread overview: 11+ messages
2024-06-10 12:59 [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [PATCH qemu 1/7] PVE backup: fleecing: properly set minimum cluster size Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 2/7] block/copy-before-write: allow passing additional options for bdrv_cbw_append() Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 3/7] block/backup: allow passing additional options for copy-before-write upon job creation Fiona Ebner
2024-07-05  9:30   ` Fabian Grünbichler
2024-06-10 12:59 ` [pve-devel] [RFC qemu 4/7] block/backup: make cbw error also fail backup that does not use fleecing Fiona Ebner
2024-06-10 12:59 ` Fiona Ebner [this message]
2024-06-10 12:59 ` [pve-devel] [RFC qemu 6/7] block/copy-before-write: allow specifying error callback Fiona Ebner
2024-06-10 12:59 ` [pve-devel] [RFC qemu 7/7] block/backup: set callback for cbw errors Fiona Ebner
2024-07-05  9:37   ` Fabian Grünbichler
2024-07-05  9:41 ` [pve-devel] [RFC qemu] fix #3231+#3631: PVE backup: fail backup rather than guest write when backup target cannot be reached or is too slow Fabian Grünbichler
