From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 22FB71FF39E
	for <inbox@lore.proxmox.com>; Mon, 10 Jun 2024 14:59:56 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id C21D8175BB;
	Mon, 10 Jun 2024 15:00:20 +0200 (CEST)
From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Mon, 10 Jun 2024 14:59:40 +0200
Message-Id: <20240610125942.116985-6-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240610125942.116985-1-f.ebner@proxmox.com>
References: <20240610125942.116985-1-f.ebner@proxmox.com>
MIME-Version: 1.0
Subject: [pve-devel] [RFC qemu 5/7] fix #3231+#3631: PVE backup: add timeout
 for copy-before-write operations and fail backup instead of guest writes
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

If the backup target cannot be reached or is very slow, the default
behavior of QEMU backup is to break (i.e. fail) the guest write. This is
undesirable: it is less intrusive, and more in line with what users
expect, to make the backup error out instead.
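
For illustration, the difference between the two on-cbw-error policies
can be modelled in a few lines of C. The enum, struct and function
names are hypothetical; they only mirror the semantics of QEMU's
copy-before-write values "break-guest-write" (the default) and
"break-snapshot" (set by this patch). This is a sketch, not QEMU code.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the copy-before-write "on-cbw-error" policies.
 * Not QEMU code; all names are made up for illustration.
 */
typedef enum {
    MODEL_BREAK_GUEST_WRITE, /* QEMU default: the guest write fails */
    MODEL_BREAK_SNAPSHOT,    /* this patch: the backup fails instead */
} ModelOnCbwError;

typedef struct {
    ModelOnCbwError policy;
    bool snapshot_broken; /* once set, the backup can no longer succeed */
} ModelCbw;

/*
 * Handle one guest write to a cluster that still needs to be copied.
 * cbw_ok tells whether copying the old data to the backup target worked
 * (false e.g. if the target is unreachable or the operation timed out).
 * Returns 0 if the guest write may proceed, -1 if it must be failed.
 */
static int model_guest_write(ModelCbw *m, bool cbw_ok)
{
    if (m->snapshot_broken || cbw_ok) {
        /* Nothing (more) to protect, or the copy succeeded. */
        return 0;
    }
    if (m->policy == MODEL_BREAK_SNAPSHOT) {
        /* Give up on the backup, but let the guest write through. */
        m->snapshot_broken = true;
        return 0;
    }
    /* MODEL_BREAK_GUEST_WRITE: the guest sees an I/O error. */
    return -1;
}
```

With MODEL_BREAK_GUEST_WRITE a failed copy surfaces as a guest I/O
error, while with MODEL_BREAK_SNAPSHOT the guest write succeeds and
only the backup is marked as broken.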

A timeout of 45 seconds is set for copy-before-write operations, as is
already done for fleecing. Guest drivers such as virtio-win have issues
when a write takes more than 60 seconds but still completes afterwards,
so a value below that limit was chosen.
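
The reasoning behind the chosen value can be captured as a compile-time
check, together with a sketch of how such a timeout might be evaluated.
The macro and function names are hypothetical. The convention that a
timeout of 0 means "no limit" is assumed to match QEMU's cbw-timeout
option, where a timed-out operation counts as failed and is then
handled according to on-cbw-error.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value chosen in this patch, in seconds. */
#define MODEL_CBW_TIMEOUT_S 45u
/* Write duration above which drivers like virtio-win run into issues. */
#define MODEL_GUEST_WRITE_LIMIT_S 60u

/* Fail the build if the timeout ever reaches the guest-side limit. */
_Static_assert(MODEL_CBW_TIMEOUT_S < MODEL_GUEST_WRITE_LIMIT_S,
               "cbw-timeout must stay below the guest write limit");

/*
 * Hypothetical check whether a pending copy-before-write operation has
 * exceeded its timeout. A timeout of 0 means no limit (assumed to match
 * QEMU's cbw-timeout). Assumes now_ns >= start_ns (monotonic clock).
 */
static bool model_cbw_timed_out(uint32_t timeout_s, uint64_t start_ns, uint64_t now_ns)
{
    if (timeout_s == 0) {
        return false;
    }
    return now_ns - start_ns >= (uint64_t)timeout_s * 1000000000u;
}
```

This leaves 15 seconds of headroom between the point where the copy is
given up and the point where guest drivers start to misbehave.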

Unfortunately, with this change alone, the backup would still try to
run to completion and only fail at the very end. This can be improved
by adding a callback function that aborts the backup as soon as a
copy-before-write operation fails.
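
A minimal sketch of such a callback, assuming that the
copy-before-write filter invokes a hook registered by the backup code
on each failed operation. All names here are hypothetical and do not
reflect the actual implementation, which is not part of this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type invoked when a copy-before-write operation fails. */
typedef void ModelCbwErrorCb(void *opaque, int error);

/* Hypothetical per-backup state. */
typedef struct {
    bool aborted;
    int first_error; /* negative errno of the first failure */
    int abort_calls; /* how often the abort was actually triggered */
} ModelBackup;

/* Hypothetical stand-in for the copy-before-write filter. */
typedef struct {
    ModelCbwErrorCb *on_error;
    void *opaque;
} ModelCbwFilter;

/*
 * Registered by the backup code: abort the whole backup on the first
 * copy-before-write failure instead of waiting for the job to finish.
 */
static void model_backup_cbw_error_cb(void *opaque, int error)
{
    ModelBackup *b = opaque;

    if (b->aborted) {
        return; /* already aborting, ignore further failures */
    }
    b->aborted = true;
    b->first_error = error;
    b->abort_calls++; /* real code would cancel the backup jobs here */
}

/* Called by the filter when a copy-before-write operation fails or times out. */
static void model_cbw_report_error(ModelCbwFilter *f, int error)
{
    if (f->on_error != NULL) {
        f->on_error(f->opaque, error);
    }
}
```

Guarding on the aborted flag keeps the abort idempotent, since several
in-flight copy-before-write operations may fail at roughly the same
time once the target becomes unreachable.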

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 pve-backup.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 108e185a20..9843d8d122 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -560,6 +560,7 @@ static void create_backup_jobs_bh(void *opaque) {
         bdrv_drained_begin(di->bs);
 
         BackupPerf perf = (BackupPerf){ .max_workers = backup_state.perf.max_workers };
+        QDict *backup_cbw_opts = qdict_new();
 
         BlockDriverState *source_bs = di->bs;
         bool discard_source = false;
@@ -631,11 +632,18 @@ static void create_backup_jobs_bh(void *opaque) {
                 perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
             }
             perf.has_min_cluster_size = true;
+        } else {
+            /*
+             * When fleecing is not used, the options need to be set on the copy-before-write node
+             * installed by the backup job itself.
+             */
+            qdict_put_str(backup_cbw_opts, "on-cbw-error", "break-snapshot");
+            qdict_put_int(backup_cbw_opts, "cbw-timeout", 45);
         }
 
         BlockJob *job = backup_job_create(
-            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
-            bitmap_mode, false, discard_source, NULL, &perf, NULL, BLOCKDEV_ON_ERROR_REPORT,
+            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap, bitmap_mode,
+            false, discard_source, NULL, &perf, backup_cbw_opts, BLOCKDEV_ON_ERROR_REPORT,
             BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
             &local_err);
 
-- 
2.39.2
