From: Fiona Ebner
To: pve-devel@lists.proxmox.com
Date: Mon, 10 Jun 2024 14:59:40 +0200
Message-Id: <20240610125942.116985-6-f.ebner@proxmox.com>
In-Reply-To: <20240610125942.116985-1-f.ebner@proxmox.com>
Subject: [pve-devel] [RFC qemu 5/7] fix #3231+#3631: PVE backup: add timeout
 for copy-before-write operations and fail backup instead of guest writes

If the backup target can't be reached or is very slow, then the default
behavior for QEMU backup is to break the guest write. This is
undesirable and it is more expected and less intrusive to make the
backup error out instead.

A timeout of 45 seconds for copy-before-write operations is set, like
for fleecing. Guest drivers like virtio-win have issues when a write
takes more than 60 seconds and still completes afterwards, so a value
below that was chosen.

Unfortunately, with this alone, the backup would still try to run to
completion and fail only at the very end. This can be improved by
adding a callback function that will abort the backup once a
copy-before-write operation fails.

Signed-off-by: Fiona Ebner
---
 pve-backup.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 108e185a20..9843d8d122 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -560,6 +560,7 @@ static void create_backup_jobs_bh(void *opaque) {
         bdrv_drained_begin(di->bs);
 
         BackupPerf perf = (BackupPerf){ .max_workers = backup_state.perf.max_workers };
+        QDict *backup_cbw_opts = qdict_new();
 
         BlockDriverState *source_bs = di->bs;
         bool discard_source = false;
@@ -631,11 +632,18 @@ static void create_backup_jobs_bh(void *opaque) {
                 perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
             }
             perf.has_min_cluster_size = true;
+        } else {
+            /*
+             * When fleecing is not used, need to set the options on the copy-before-write node
+             * installed by the backup job itself.
+             */
+            qdict_put_str(backup_cbw_opts, "on-cbw-error", "break-snapshot");
+            qdict_put_int(backup_cbw_opts, "cbw-timeout", 45);
         }
 
         BlockJob *job = backup_job_create(
-            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
-            bitmap_mode, false, discard_source, NULL, &perf, NULL, BLOCKDEV_ON_ERROR_REPORT,
+            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap, bitmap_mode,
+            false, discard_source, NULL, &perf, backup_cbw_opts, BLOCKDEV_ON_ERROR_REPORT,
             BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
             &local_err);
 
-- 
2.39.2
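
For readers unfamiliar with the two options set in the patch, the following
is a rough standalone model of the behavior the commit message describes. It
is a sketch under stated assumptions, not QEMU code: the struct, enum and
function names are hypothetical, and only the option names "on-cbw-error" and
"cbw-timeout", the value "break-snapshot" and the 45 second timeout come from
the patch. It also assumes that a timeout of zero means "no limit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two "on-cbw-error" policies. */
enum cbw_error_policy {
    CBW_BREAK_GUEST_WRITE, /* default described in the commit message */
    CBW_BREAK_SNAPSHOT     /* policy selected by the patch */
};

/* Hypothetical model of one copy-before-write node. */
struct cbw_state {
    enum cbw_error_policy on_cbw_error;
    uint32_t cbw_timeout_s; /* assumption: 0 means no limit */
    bool snapshot_broken;   /* once set, the backup can only fail */
};

/*
 * Model a guest write that first has to copy the old data to the backup
 * target. Returns 0 if the guest write succeeds and -1 if it fails.
 */
int guest_write(struct cbw_state *s, uint32_t copy_duration_s, bool copy_failed)
{
    assert(s);

    bool timed_out = s->cbw_timeout_s != 0 &&
                     copy_duration_s > s->cbw_timeout_s;

    if (!copy_failed && !timed_out) {
        return 0; /* old data saved in time, guest write proceeds */
    }

    if (s->on_cbw_error == CBW_BREAK_SNAPSHOT) {
        /* Sacrifice the backup, keep the guest running. */
        s->snapshot_broken = true;
        return 0;
    }

    /* Default policy: the guest sees the error. */
    return -1;
}
```

With the policy and timeout from the patch, a copy that takes longer than 45
seconds marks the snapshot as broken while the guest write still succeeds.
With the default policy and no timeout, a slow copy simply stalls the guest
and a failed copy is reported to the guest as a write error.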