From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 6C3141FF163
	for <inbox@lore.proxmox.com>; Thu,  7 Nov 2024 17:52:02 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 5ADC5349C4;
	Thu,  7 Nov 2024 17:51:54 +0100 (CET)
From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Thu,  7 Nov 2024 17:51:14 +0100
Message-Id: <20241107165146.125935-3-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20241107165146.125935-1-f.ebner@proxmox.com>
References: <20241107165146.125935-1-f.ebner@proxmox.com>
MIME-Version: 1.0
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.058 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: [pve-devel] [PATCH qemu v3 02/34] PVE backup: fixup error handling
 for fleecing
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

The drained section needs to be terminated before breaking out of the
loop in the error scenarios. Otherwise, guest IO on the drive would
become stuck.

If the job is created successfully, then the job completion callback
will clean up the snapshot access block nodes. In case failure
happened before the job is created, there was no cleanup for the
snapshot access block nodes yet. Add it.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes in v3.

 pve-backup.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 4e730aa3da..c4178758b3 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -357,22 +357,23 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
     qemu_co_mutex_unlock(&backup_state.backup_mutex);
 }
 
+static void cleanup_snapshot_access(PVEBackupDevInfo *di)
+{
+    if (di->fleecing.snapshot_access) {
+        bdrv_unref(di->fleecing.snapshot_access);
+        di->fleecing.snapshot_access = NULL;
+    }
+    if (di->fleecing.cbw) {
+        bdrv_cbw_drop(di->fleecing.cbw);
+        di->fleecing.cbw = NULL;
+    }
+}
+
 static void pvebackup_complete_cb(void *opaque, int ret)
 {
     PVEBackupDevInfo *di = opaque;
     di->completed_ret = ret;
 
-    /*
-     * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work
-     * won't be done as a coroutine anyways:
-     * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would
-     *   just spawn a BH calling bdrv_unref().
-     * - For cbw, draining would need to spawn a BH.
-     */
-    if (di->fleecing.snapshot_access) {
-        bdrv_unref(di->fleecing.snapshot_access);
-        di->fleecing.snapshot_access = NULL;
-    }
     if (di->fleecing.cbw) {
         /*
          * With fleecing, failure for cbw does not fail the guest write, but only sets the snapshot
@@ -383,10 +384,17 @@ static void pvebackup_complete_cb(void *opaque, int ret)
         if (di->completed_ret == -EACCES && snapshot_error) {
             di->completed_ret = snapshot_error;
         }
-        bdrv_cbw_drop(di->fleecing.cbw);
-        di->fleecing.cbw = NULL;
     }
 
+    /*
+     * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work
+     * won't be done as a coroutine anyways:
+     * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would
+     *   just spawn a BH calling bdrv_unref().
+     * - For cbw, draining would need to spawn a BH.
+     */
+    cleanup_snapshot_access(di);
+
     /*
      * Needs to happen outside of coroutine, because it takes the graph write lock.
      */
@@ -587,6 +595,7 @@ static void create_backup_jobs_bh(void *opaque) {
             if (!di->fleecing.cbw) {
                 error_setg(errp, "appending cbw node for fleecing failed: %s",
                            local_err ? error_get_pretty(local_err) : "unknown error");
+                bdrv_drained_end(di->bs);
                 break;
             }
 
@@ -599,6 +608,8 @@ static void create_backup_jobs_bh(void *opaque) {
             if (!di->fleecing.snapshot_access) {
                 error_setg(errp, "setting up snapshot access for fleecing failed: %s",
                            local_err ? error_get_pretty(local_err) : "unknown error");
+                cleanup_snapshot_access(di);
+                bdrv_drained_end(di->bs);
                 break;
             }
             source_bs = di->fleecing.snapshot_access;
@@ -637,6 +648,7 @@ static void create_backup_jobs_bh(void *opaque) {
         }
 
         if (!job || local_err) {
+            cleanup_snapshot_access(di);
             error_setg(errp, "backup_job_create failed: %s",
                        local_err ? error_get_pretty(local_err) : "null");
             break;
-- 
2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel