From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <s.reiter@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id B3A1D63F8D
 for <pve-devel@lists.proxmox.com>; Thu, 29 Oct 2020 14:11:01 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 79BAB958A
 for <pve-devel@lists.proxmox.com>; Thu, 29 Oct 2020 14:11:01 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 5C5119551
 for <pve-devel@lists.proxmox.com>; Thu, 29 Oct 2020 14:10:44 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 1FF9A45F91
 for <pve-devel@lists.proxmox.com>; Thu, 29 Oct 2020 14:10:44 +0100 (CET)
From: Stefan Reiter <s.reiter@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Thu, 29 Oct 2020 14:10:34 +0100
Message-Id: <20201029131036.11786-5-s.reiter@proxmox.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20201029131036.11786-1-s.reiter@proxmox.com>
References: <20201029131036.11786-1-s.reiter@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.038 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: [pve-devel] [PATCH v2 qemu 4/6] PVE: Don't call job_cancel in
 coroutines
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Thu, 29 Oct 2020 13:11:01 -0000

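...because doing so hangs while cancelling the other jobs in the
transaction. Defer the cancel to a bottom half instead and re-enter the
coroutine once it has finished.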

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* use new CoCtxData
* use aio_co_enter instead of aio_co_schedule for the BH return
* cache job_ctx, since job_cancel_sync might switch the job to a different
  context (when iothreads are in use); reading job->aio_context again
  afterwards would make us release the wrong AioContext. Incidentally, this is
  the same bug I once fixed upstream, and it almost made it in again...
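
To illustrate the job_ctx caching noted above, here is a minimal standalone
sketch. It does not use the QEMU API: ToyContext, ToyJob and all toy_*
functions are hypothetical stand-ins, and toy_cancel_sync simply assumes that
the blocking call moves the job to another context as a side effect.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an AioContext; NOT the QEMU type. */
typedef struct ToyContext {
    int lock_depth; /* acquire count minus release count */
} ToyContext;

/* Toy stand-in for a Job; NOT the QEMU type. */
typedef struct ToyJob {
    ToyContext *ctx; /* context the job currently belongs to; may change */
} ToyJob;

static void toy_acquire(ToyContext *ctx) { ctx->lock_depth++; }
static void toy_release(ToyContext *ctx) { ctx->lock_depth--; }

/* Assumption: the blocking cancel moves the job to new_ctx as a side effect. */
static void toy_cancel_sync(ToyJob *job, ToyContext *new_ctx)
{
    job->ctx = new_ctx;
}

/* Buggy variant: re-reads job->ctx after the call, releasing the wrong one. */
void cancel_rereading(ToyJob *job, ToyContext *new_ctx)
{
    toy_acquire(job->ctx);
    toy_cancel_sync(job, new_ctx);
    toy_release(job->ctx); /* job->ctx is now new_ctx */
}

/* Fixed variant: caches the pointer so acquire and release are paired. */
void cancel_cached(ToyJob *job, ToyContext *new_ctx)
{
    ToyContext *ctx = job->ctx;
    toy_acquire(ctx);
    toy_cancel_sync(job, new_ctx);
    toy_release(ctx);
}
```

With the re-reading variant the original context stays acquired and the new
one is released without ever having been acquired; the cached variant leaves
both balanced.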

 pve-backup.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/pve-backup.c b/pve-backup.c
index 92eaada0bc..0466145bec 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -332,6 +332,20 @@ static void pvebackup_complete_cb(void *opaque, int ret)
     aio_co_enter(qemu_get_aio_context(), co);
 }
 
+/*
+ * job_cancel(_sync) does not like to be called from coroutines, so defer to
+ * main loop processing via a bottom half.
+ */
+static void job_cancel_bh(void *opaque) {
+    CoCtxData *data = (CoCtxData*)opaque;
+    Job *job = (Job*)data->data;
+    AioContext *job_ctx = job->aio_context;
+    aio_context_acquire(job_ctx);
+    job_cancel_sync(job);
+    aio_context_release(job_ctx);
+    aio_co_enter(data->ctx, data->co);
+}
+
 static void coroutine_fn pvebackup_co_cancel(void *opaque)
 {
     Error *cancel_err = NULL;
@@ -357,7 +371,13 @@ static void coroutine_fn pvebackup_co_cancel(void *opaque)
         NULL;
 
     if (cancel_job) {
-        job_cancel(&cancel_job->job, false);
+        CoCtxData data = {
+            .ctx = qemu_get_current_aio_context(),
+            .co = qemu_coroutine_self(),
+            .data = &cancel_job->job,
+        };
+        aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
+        qemu_coroutine_yield();
     }
 
     qemu_co_mutex_unlock(&backup_state.backup_mutex);
-- 
2.20.1