public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH v2 0/6] QEMU backup cancellation fixes
@ 2020-10-29 13:10 Stefan Reiter
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 1/6] PVE: fixup: drop CoMutex on error return Stefan Reiter
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Stefan Reiter @ 2020-10-29 13:10 UTC (permalink / raw)
  To: pve-devel

Some bugfixes for qmp_backup_cancel and create_backup_jobs; the bugs fixed here
could lead to VM hangs, wrongly aborted backups, or missing/bad error messages.

Now tested with all permutations of
* one vs two disks
* iothread enabled/disabled per disk
and running
    until sudo timeout -s SIGTERM 5s vzdump 100 --storage pbs; do; done
during lunch :)

Sent as separate patches for review; the last patch is for pve-qemu and contains
the changes squashed in, with only one new file.

v2:
* fix some more issues (patches 1 and 5)
* fix releasing wrong AioContext after job_cancel_sync
* add patch-of-patches for easier apply


qemu: Stefan Reiter (5):
  PVE: fixup: drop CoMutex on error return
  PVE: Introduce generic CoCtxData struct
  PVE: Don't expect complete_cb to be called outside coroutine
  PVE: Don't call job_cancel in coroutines
  PVE: fix and clean up error handling for create_backup_jobs

 proxmox-backup-client.c |  20 +++-----
 proxmox-backup-client.h |   6 +++
 pve-backup.c            | 111 +++++++++++++++++++++++++++++-----------
 3 files changed, 93 insertions(+), 44 deletions(-)

qemu: Stefan Reiter (1):
  Several fixes for backup abort and error reporting

 ...ckup-proxmox-backup-patches-for-qemu.patch |  41 ++--
 ...irty-bitmap-tracking-for-incremental.patch |  10 +-
 ...ct-stderr-to-journal-when-daemonized.patch |   2 +
 ...d-sequential-job-transaction-support.patch |   1 +
 ...-transaction-to-synchronize-job-stat.patch |   2 +
 ...ore-coroutines-and-don-t-block-on-fi.patch |  94 +++++----
 ...n-up-error-handling-for-create_backu.patch | 187 ++++++++++++++++++
 debian/patches/series                         |   1 +
 8 files changed, 275 insertions(+), 63 deletions(-)
 create mode 100644 debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch

-- 
2.20.1





* [pve-devel] [PATCH v2 qemu 1/6] PVE: fixup: drop CoMutex on error return
  2020-10-29 13:10 [pve-devel] [PATCH v2 0/6] QEMU backup cancellation fixes Stefan Reiter
@ 2020-10-29 13:10 ` Stefan Reiter
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 2/6] PVE: Introduce generic CoCtxData struct Stefan Reiter
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Stefan Reiter @ 2020-10-29 13:10 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2: new
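
For context (not part of the patch): pvebackup_co_prepare() runs this check
with backup_state.backup_mutex already held, so every return path has to drop
the CoMutex again, otherwise a later backup or cancel call can hang on it.
Simplified, the fixed error path looks like:

    /* backup_mutex is already held at this point */
    if (backup_state.di_list) {
        error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
                  "previous backup not finished");
        qemu_co_mutex_unlock(&backup_state.backup_mutex); /* this was missing */
        return;
    }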

 pve-backup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/pve-backup.c b/pve-backup.c
index 53cf23ed5a..13a69f9c5a 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -561,8 +561,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
     const char *firewall_name = "qemu-server.fw";
 
     if (backup_state.di_list) {
-         error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+        error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
                   "previous backup not finished");
+        qemu_co_mutex_unlock(&backup_state.backup_mutex);
         return;
     }
 
-- 
2.20.1






* [pve-devel] [PATCH v2 qemu 2/6] PVE: Introduce generic CoCtxData struct
  2020-10-29 13:10 [pve-devel] [PATCH v2 0/6] QEMU backup cancellation fixes Stefan Reiter
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 1/6] PVE: fixup: drop CoMutex on error return Stefan Reiter
@ 2020-10-29 13:10 ` Stefan Reiter
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 3/6] PVE: Don't expect complete_cb to be called outside coroutine Stefan Reiter
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Stefan Reiter @ 2020-10-29 13:10 UTC (permalink / raw)
  To: pve-devel

Adapted from ProxmoxBackupWaker to also cover more general use cases.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2: new, makes patch 4 and 5 cleaner
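
For reference, the waker pattern this struct enables looks roughly like this
(sketch only, not taken verbatim from the code; start_async_operation() stands
in for whatever call hands the waker off to another thread):

    // Completion callback, invoked from another thread, so it wakes the
    // coroutine via aio_co_schedule().
    static void some_async_done(void *opaque)
    {
        CoCtxData *waker = opaque;
        aio_co_schedule(waker->ctx, waker->co);
    }

    static void coroutine_fn some_co_operation(void)
    {
        CoCtxData waker = {
            .co = qemu_coroutine_self(),
            .ctx = qemu_get_current_aio_context(),
        };
        start_async_operation(some_async_done, &waker);
        qemu_coroutine_yield(); // resumed by some_async_done()
    }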

 proxmox-backup-client.c | 20 +++++++-------------
 proxmox-backup-client.h |  6 ++++++
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/proxmox-backup-client.c b/proxmox-backup-client.c
index 0e9c584701..4ce7bc0b5e 100644
--- a/proxmox-backup-client.c
+++ b/proxmox-backup-client.c
@@ -12,12 +12,6 @@ typedef struct BlockOnCoroutineWrapper {
     bool finished;
 } BlockOnCoroutineWrapper;
 
-// Waker implementaion to syncronice with proxmox backup rust code
-typedef struct ProxmoxBackupWaker {
-    Coroutine *co;
-    AioContext *ctx;
-} ProxmoxBackupWaker;
-
 static void coroutine_fn block_on_coroutine_wrapper(void *opaque)
 {
     BlockOnCoroutineWrapper *wrapper = opaque;
@@ -44,7 +38,7 @@ void block_on_coroutine_fn(CoroutineEntry *entry, void *entry_arg)
 
 // This is called from another thread, so we use aio_co_schedule()
 static void proxmox_backup_schedule_wake(void *data) {
-    ProxmoxBackupWaker *waker = (ProxmoxBackupWaker *)data;
+    CoCtxData *waker = (CoCtxData *)data;
     aio_co_schedule(waker->ctx, waker->co);
 }
 
@@ -53,7 +47,7 @@ proxmox_backup_co_connect(ProxmoxBackupHandle *pbs, Error **errp)
 {
     Coroutine *co = qemu_coroutine_self();
     AioContext *ctx = qemu_get_current_aio_context();
-    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
+    CoCtxData waker = { .co = co, .ctx = ctx };
     char *pbs_err = NULL;
     int pbs_res = -1;
 
@@ -76,7 +70,7 @@ proxmox_backup_co_add_config(
 {
     Coroutine *co = qemu_coroutine_self();
     AioContext *ctx = qemu_get_current_aio_context();
-    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
+    CoCtxData waker = { .co = co, .ctx = ctx };
     char *pbs_err = NULL;
     int pbs_res = -1;
 
@@ -100,7 +94,7 @@ proxmox_backup_co_register_image(
 {
     Coroutine *co = qemu_coroutine_self();
     AioContext *ctx = qemu_get_current_aio_context();
-    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
+    CoCtxData waker = { .co = co, .ctx = ctx };
     char *pbs_err = NULL;
     int pbs_res = -1;
 
@@ -121,7 +115,7 @@ proxmox_backup_co_finish(
 {
     Coroutine *co = qemu_coroutine_self();
     AioContext *ctx = qemu_get_current_aio_context();
-    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
+    CoCtxData waker = { .co = co, .ctx = ctx };
     char *pbs_err = NULL;
     int pbs_res = -1;
 
@@ -143,7 +137,7 @@ proxmox_backup_co_close_image(
 {
     Coroutine *co = qemu_coroutine_self();
     AioContext *ctx = qemu_get_current_aio_context();
-    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
+    CoCtxData waker = { .co = co, .ctx = ctx };
     char *pbs_err = NULL;
     int pbs_res = -1;
 
@@ -168,7 +162,7 @@ proxmox_backup_co_write_data(
 {
     Coroutine *co = qemu_coroutine_self();
     AioContext *ctx = qemu_get_current_aio_context();
-    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
+    CoCtxData waker = { .co = co, .ctx = ctx };
     char *pbs_err = NULL;
     int pbs_res = -1;
 
diff --git a/proxmox-backup-client.h b/proxmox-backup-client.h
index a4781c5851..8cbf645b2c 100644
--- a/proxmox-backup-client.h
+++ b/proxmox-backup-client.h
@@ -5,6 +5,12 @@
 #include "qemu/coroutine.h"
 #include "proxmox-backup-qemu.h"
 
+typedef struct CoCtxData {
+    Coroutine *co;
+    AioContext *ctx;
+    void *data;
+} CoCtxData;
+
 // FIXME: Remove once coroutines are supported for QMP
 void block_on_coroutine_fn(CoroutineEntry *entry, void *entry_arg);
 
-- 
2.20.1






* [pve-devel] [PATCH v2 qemu 3/6] PVE: Don't expect complete_cb to be called outside coroutine
  2020-10-29 13:10 [pve-devel] [PATCH v2 0/6] QEMU backup cancellation fixes Stefan Reiter
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 1/6] PVE: fixup: drop CoMutex on error return Stefan Reiter
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 2/6] PVE: Introduce generic CoCtxData struct Stefan Reiter
@ 2020-10-29 13:10 ` Stefan Reiter
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 4/6] PVE: Don't call job_cancel in coroutines Stefan Reiter
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Stefan Reiter @ 2020-10-29 13:10 UTC (permalink / raw)
  To: pve-devel

We're at the mercy of the rest of QEMU here, and it sometimes decides to
call pvebackup_complete_cb from a coroutine. This really doesn't matter
to us, so don't assert and crash on it.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2: unchanged

 pve-backup.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 13a69f9c5a..92eaada0bc 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -318,19 +318,18 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
 
 static void pvebackup_complete_cb(void *opaque, int ret)
 {
-    assert(!qemu_in_coroutine());
-
     PVEBackupDevInfo *di = opaque;
     di->completed_ret = ret;
 
     /*
      * Schedule stream cleanup in async coroutine. close_image and finish might
-     * take a while, so we can't block on them here.
+     * take a while, so we can't block on them here. This way it also doesn't
+     * matter if we're already running in a coroutine or not.
      * Note: di is a pointer to an entry in the global backup_state struct, so
      * it stays valid.
      */
     Coroutine *co = qemu_coroutine_create(pvebackup_co_complete_stream, di);
-    aio_co_schedule(qemu_get_aio_context(), co);
+    aio_co_enter(qemu_get_aio_context(), co);
 }
 
 static void coroutine_fn pvebackup_co_cancel(void *opaque)
-- 
2.20.1






* [pve-devel] [PATCH v2 qemu 4/6] PVE: Don't call job_cancel in coroutines
  2020-10-29 13:10 [pve-devel] [PATCH v2 0/6] QEMU backup cancellation fixes Stefan Reiter
                   ` (2 preceding siblings ...)
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 3/6] PVE: Don't expect complete_cb to be called outside coroutine Stefan Reiter
@ 2020-10-29 13:10 ` Stefan Reiter
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 5/6] PVE: fix and clean up error handling for create_backup_jobs Stefan Reiter
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 6/6] Several fixes for backup abort and error reporting Stefan Reiter
  5 siblings, 0 replies; 8+ messages in thread
From: Stefan Reiter @ 2020-10-29 13:10 UTC (permalink / raw)
  To: pve-devel

...because it hangs on cancelling other jobs in the txn if you do.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* use new CoCtxData
* use aio_co_enter vs aio_co_schedule for BH return
* cache job_ctx, since job_cancel_sync might switch the job to a different
  context (when iothreads are in use), thus making us release the wrong
  AioContext if we access job->aio_context again (see the sketch below). This
  is incidentally the same bug I once fixed upstream; it almost made it in
  again...
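
The sketch referenced above, i.e. why the context has to be cached before the
call (this mirrors what the patch does):

    AioContext *job_ctx = job->aio_context; // cache before job_cancel_sync()
    aio_context_acquire(job_ctx);
    job_cancel_sync(job);                   // may move the job to another context
    aio_context_release(job_ctx);           // release what was actually acquired,
                                            // not whatever job->aio_context is now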

 pve-backup.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/pve-backup.c b/pve-backup.c
index 92eaada0bc..0466145bec 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -332,6 +332,20 @@ static void pvebackup_complete_cb(void *opaque, int ret)
     aio_co_enter(qemu_get_aio_context(), co);
 }
 
+/*
+ * job_cancel(_sync) does not like to be called from coroutines, so defer to
+ * main loop processing via a bottom half.
+ */
+static void job_cancel_bh(void *opaque) {
+    CoCtxData *data = (CoCtxData*)opaque;
+    Job *job = (Job*)data->data;
+    AioContext *job_ctx = job->aio_context;
+    aio_context_acquire(job_ctx);
+    job_cancel_sync(job);
+    aio_context_release(job_ctx);
+    aio_co_enter(data->ctx, data->co);
+}
+
 static void coroutine_fn pvebackup_co_cancel(void *opaque)
 {
     Error *cancel_err = NULL;
@@ -357,7 +371,13 @@ static void coroutine_fn pvebackup_co_cancel(void *opaque)
         NULL;
 
     if (cancel_job) {
-        job_cancel(&cancel_job->job, false);
+        CoCtxData data = {
+            .ctx = qemu_get_current_aio_context(),
+            .co = qemu_coroutine_self(),
+            .data = &cancel_job->job,
+        };
+        aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
+        qemu_coroutine_yield();
     }
 
     qemu_co_mutex_unlock(&backup_state.backup_mutex);
-- 
2.20.1






* [pve-devel] [PATCH v2 qemu 5/6] PVE: fix and clean up error handling for create_backup_jobs
  2020-10-29 13:10 [pve-devel] [PATCH v2 0/6] QEMU backup cancellation fixes Stefan Reiter
                   ` (3 preceding siblings ...)
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 4/6] PVE: Don't call job_cancel in coroutines Stefan Reiter
@ 2020-10-29 13:10 ` Stefan Reiter
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 6/6] Several fixes for backup abort and error reporting Stefan Reiter
  5 siblings, 0 replies; 8+ messages in thread
From: Stefan Reiter @ 2020-10-29 13:10 UTC (permalink / raw)
  To: pve-devel

No more weird bool returns, just the standard "errp" format used
everywhere else too. With this, if backup_job_create fails, the error
message is actually returned over QMP and can be shown to the user.

To facilitate correct cleanup on such an error, we call
create_backup_jobs as a bottom half directly from pvebackup_co_prepare.
This additionally allows us to actually hold the backup_mutex during
operation.

Also add a job_cancel_sync before job_unref, since a job must be in
STATUS_NULL to be deleted by unref, which could trigger an assert
before.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2: new

Error handling in create_backup_jobs was pretty much nonexistent; lots of weird
things would happen. Clean it up and fix those issues at once.
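
Stripped down to its essentials, the coroutine <-> bottom half round trip used
here looks like this (a sketch of the mechanism, not the full implementation):

    static void create_backup_jobs_bh(void *opaque)
    {
        CoCtxData *data = opaque;
        Error **errp = (Error **)data->data;
        // ... create the backup jobs, set *errp on failure ...
        aio_co_enter(data->ctx, data->co); // wake the waiting coroutine
    }

    // in pvebackup_co_prepare (coroutine context, backup_mutex still held):
    Error *local_err = NULL;
    CoCtxData waker = {
        .co = qemu_coroutine_self(),
        .ctx = qemu_get_current_aio_context(),
        .data = &local_err,
    };
    aio_bh_schedule_oneshot(waker.ctx, create_backup_jobs_bh, &waker);
    qemu_coroutine_yield(); // resumed by the BH once job creation is done
    if (local_err) {
        error_propagate(task->errp, local_err);
    }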

 pve-backup.c | 79 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 54 insertions(+), 25 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 0466145bec..1a2647e7a5 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -50,6 +50,7 @@ static struct PVEBackupState {
         size_t zero_bytes;
         GList *bitmap_list;
         bool finishing;
+        bool starting;
     } stat;
     int64_t speed;
     VmaWriter *vmaw;
@@ -277,6 +278,16 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
     PVEBackupDevInfo *di = opaque;
     int ret = di->completed_ret;
 
+    qemu_mutex_lock(&backup_state.stat.lock);
+    bool starting = backup_state.stat.starting;
+    qemu_mutex_unlock(&backup_state.stat.lock);
+    if (starting) {
+        /* in 'starting' state, no tasks have been run yet, meaning we can (and
+         * must) skip all cleanup, as we don't know what has and hasn't been
+         * initialized yet. */
+        return;
+    }
+
     qemu_co_mutex_lock(&backup_state.backup_mutex);
 
     if (ret < 0) {
@@ -441,15 +452,15 @@ static int coroutine_fn pvebackup_co_add_config(
 /*
  * backup_job_create can *not* be run from a coroutine (and requires an
  * acquired AioContext), so this can't either.
- * This does imply that this function cannot run with backup_mutex acquired.
- * That is ok because it is only ever called between setting up the backup_state
- * struct and starting the jobs, and from within a QMP call. This means that no
- * other QMP call can interrupt, and no background job is running yet.
+ * The caller is responsible that backup_mutex is held nonetheless.
  */
-static bool create_backup_jobs(void) {
+static void create_backup_jobs_bh(void *opaque) {
 
     assert(!qemu_in_coroutine());
 
+    CoCtxData *data = (CoCtxData*)opaque;
+    Error **errp = (Error**)data->data;
+
     Error *local_err = NULL;
 
     /* create job transaction to synchronize bitmap commit and cancel all
@@ -483,24 +494,19 @@ static bool create_backup_jobs(void) {
 
         aio_context_release(aio_context);
 
-        if (!job || local_err != NULL) {
-            Error *create_job_err = NULL;
-            error_setg(&create_job_err, "backup_job_create failed: %s",
-                       local_err ? error_get_pretty(local_err) : "null");
+        di->job = job;
 
-            pvebackup_propagate_error(create_job_err);
+        if (!job || local_err) {
+            error_setg(errp, "backup_job_create failed: %s",
+                       local_err ? error_get_pretty(local_err) : "null");
             break;
         }
 
-        di->job = job;
-
         bdrv_unref(di->target);
         di->target = NULL;
     }
 
-    bool errors = pvebackup_error_or_canceled();
-
-    if (errors) {
+    if (*errp) {
         l = backup_state.di_list;
         while (l) {
             PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
@@ -512,12 +518,17 @@ static bool create_backup_jobs(void) {
             }
 
             if (di->job) {
+                AioContext *ctx = di->job->job.aio_context;
+                aio_context_acquire(ctx);
+                job_cancel_sync(&di->job->job);
                 job_unref(&di->job->job);
+                aio_context_release(ctx);
             }
         }
     }
 
-    return errors;
+    /* return */
+    aio_co_enter(data->ctx, data->co);
 }
 
 typedef struct QmpBackupTask {
@@ -883,6 +894,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
     backup_state.stat.transferred = 0;
     backup_state.stat.zero_bytes = 0;
     backup_state.stat.finishing = false;
+    backup_state.stat.starting = true;
 
     qemu_mutex_unlock(&backup_state.stat.lock);
 
@@ -898,7 +910,32 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
 
     task->result = uuid_info;
 
+    /* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
+    * backup_mutex locked. This is fine, a CoMutex can be held across yield
+    * points, and we'll release it as soon as the BH reschedules us.
+    */
+    CoCtxData waker = {
+        .co = qemu_coroutine_self(),
+        .ctx = qemu_get_current_aio_context(),
+        .data = &local_err,
+    };
+    aio_bh_schedule_oneshot(waker.ctx, create_backup_jobs_bh, &waker);
+    qemu_coroutine_yield();
+
+    if (local_err) {
+        error_propagate(task->errp, local_err);
+        goto err;
+    }
+
     qemu_co_mutex_unlock(&backup_state.backup_mutex);
+
+    qemu_mutex_lock(&backup_state.stat.lock);
+    backup_state.stat.starting = false;
+    qemu_mutex_unlock(&backup_state.stat.lock);
+
+    /* start the first job in the transaction */
+    job_txn_start_seq(backup_state.txn);
+
     return;
 
 err_mutex:
@@ -921,6 +958,7 @@ err:
         g_free(di);
     }
     g_list_free(di_list);
+    backup_state.di_list = NULL;
 
     if (devs) {
         g_strfreev(devs);
@@ -998,15 +1036,6 @@ UuidInfo *qmp_backup(
 
     block_on_coroutine_fn(pvebackup_co_prepare, &task);
 
-    if (*errp == NULL) {
-        bool errors = create_backup_jobs();
-
-        if (!errors) {
-            // start the first job in the transaction
-            job_txn_start_seq(backup_state.txn);
-        }
-    }
-
     return task.result;
 }
 
-- 
2.20.1






* [pve-devel] [PATCH v2 qemu 6/6] Several fixes for backup abort and error reporting
  2020-10-29 13:10 [pve-devel] [PATCH v2 0/6] QEMU backup cancellation fixes Stefan Reiter
                   ` (4 preceding siblings ...)
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 5/6] PVE: fix and clean up error handling for create_backup_jobs Stefan Reiter
@ 2020-10-29 13:10 ` Stefan Reiter
  2020-10-29 17:27   ` [pve-devel] applied: " Thomas Lamprecht
  5 siblings, 1 reply; 8+ messages in thread
From: Stefan Reiter @ 2020-10-29 13:10 UTC (permalink / raw)
  To: pve-devel

Also add my Signed-off-by to some patches where it was missing.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2: new

Patches 1-5 squashed in or added to the pve-qemu repository.

Confirmed with "objdump" that the binary is the same with squashed changes as
with standalone patches.

 ...ckup-proxmox-backup-patches-for-qemu.patch |  41 ++--
 ...irty-bitmap-tracking-for-incremental.patch |  10 +-
 ...ct-stderr-to-journal-when-daemonized.patch |   2 +
 ...d-sequential-job-transaction-support.patch |   1 +
 ...-transaction-to-synchronize-job-stat.patch |   2 +
 ...ore-coroutines-and-don-t-block-on-fi.patch |  94 +++++----
 ...n-up-error-handling-for-create_backu.patch | 187 ++++++++++++++++++
 debian/patches/series                         |   1 +
 8 files changed, 275 insertions(+), 63 deletions(-)
 create mode 100644 debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch

diff --git a/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch b/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
index d41b675..7896692 100644
--- a/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
+++ b/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
@@ -14,13 +14,13 @@ Subject: [PATCH] PVE-Backup: proxmox backup patches for qemu
  include/block/block_int.h      |   2 +-
  include/monitor/hmp.h          |   3 +
  monitor/hmp-cmds.c             |  44 ++
- proxmox-backup-client.c        | 182 +++++++
- proxmox-backup-client.h        |  52 ++
+ proxmox-backup-client.c        | 176 ++++++
+ proxmox-backup-client.h        |  59 ++
  pve-backup.c                   | 955 +++++++++++++++++++++++++++++++++
  qapi/block-core.json           | 109 ++++
  qapi/common.json               |  13 +
  qapi/misc.json                 |  13 -
- 16 files changed, 1439 insertions(+), 15 deletions(-)
+ 16 files changed, 1440 insertions(+), 15 deletions(-)
  create mode 100644 proxmox-backup-client.c
  create mode 100644 proxmox-backup-client.h
  create mode 100644 pve-backup.c
@@ -278,10 +278,10 @@ index 280bb447a6..0e2d166552 100644
      switch (addr->type) {
 diff --git a/proxmox-backup-client.c b/proxmox-backup-client.c
 new file mode 100644
-index 0000000000..b7bc7f2574
+index 0000000000..a8f6653a81
 --- /dev/null
 +++ b/proxmox-backup-client.c
-@@ -0,0 +1,182 @@
+@@ -0,0 +1,176 @@
 +#include "proxmox-backup-client.h"
 +#include "qemu/main-loop.h"
 +#include "block/aio-wait.h"
@@ -296,12 +296,6 @@ index 0000000000..b7bc7f2574
 +    bool finished;
 +} BlockOnCoroutineWrapper;
 +
-+// Waker implementaion to syncronice with proxmox backup rust code
-+typedef struct ProxmoxBackupWaker {
-+    Coroutine *co;
-+    AioContext *ctx;
-+} ProxmoxBackupWaker;
-+
 +static void coroutine_fn block_on_coroutine_wrapper(void *opaque)
 +{
 +    BlockOnCoroutineWrapper *wrapper = opaque;
@@ -328,7 +322,7 @@ index 0000000000..b7bc7f2574
 +
 +// This is called from another thread, so we use aio_co_schedule()
 +static void proxmox_backup_schedule_wake(void *data) {
-+    ProxmoxBackupWaker *waker = (ProxmoxBackupWaker *)data;
++    CoCtxData *waker = (CoCtxData *)data;
 +    aio_co_schedule(waker->ctx, waker->co);
 +}
 +
@@ -337,7 +331,7 @@ index 0000000000..b7bc7f2574
 +{
 +    Coroutine *co = qemu_coroutine_self();
 +    AioContext *ctx = qemu_get_current_aio_context();
-+    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
++    CoCtxData waker = { .co = co, .ctx = ctx };
 +    char *pbs_err = NULL;
 +    int pbs_res = -1;
 +
@@ -360,7 +354,7 @@ index 0000000000..b7bc7f2574
 +{
 +    Coroutine *co = qemu_coroutine_self();
 +    AioContext *ctx = qemu_get_current_aio_context();
-+    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
++    CoCtxData waker = { .co = co, .ctx = ctx };
 +    char *pbs_err = NULL;
 +    int pbs_res = -1;
 +
@@ -383,7 +377,7 @@ index 0000000000..b7bc7f2574
 +{
 +    Coroutine *co = qemu_coroutine_self();
 +    AioContext *ctx = qemu_get_current_aio_context();
-+    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
++    CoCtxData waker = { .co = co, .ctx = ctx };
 +    char *pbs_err = NULL;
 +    int pbs_res = -1;
 +
@@ -404,7 +398,7 @@ index 0000000000..b7bc7f2574
 +{
 +    Coroutine *co = qemu_coroutine_self();
 +    AioContext *ctx = qemu_get_current_aio_context();
-+    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
++    CoCtxData waker = { .co = co, .ctx = ctx };
 +    char *pbs_err = NULL;
 +    int pbs_res = -1;
 +
@@ -426,7 +420,7 @@ index 0000000000..b7bc7f2574
 +{
 +    Coroutine *co = qemu_coroutine_self();
 +    AioContext *ctx = qemu_get_current_aio_context();
-+    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
++    CoCtxData waker = { .co = co, .ctx = ctx };
 +    char *pbs_err = NULL;
 +    int pbs_res = -1;
 +
@@ -451,7 +445,7 @@ index 0000000000..b7bc7f2574
 +{
 +    Coroutine *co = qemu_coroutine_self();
 +    AioContext *ctx = qemu_get_current_aio_context();
-+    ProxmoxBackupWaker waker = { .co = co, .ctx = ctx };
++    CoCtxData waker = { .co = co, .ctx = ctx };
 +    char *pbs_err = NULL;
 +    int pbs_res = -1;
 +
@@ -466,10 +460,10 @@ index 0000000000..b7bc7f2574
 +}
 diff --git a/proxmox-backup-client.h b/proxmox-backup-client.h
 new file mode 100644
-index 0000000000..b311bf8de8
+index 0000000000..1dda8b7d8f
 --- /dev/null
 +++ b/proxmox-backup-client.h
-@@ -0,0 +1,52 @@
+@@ -0,0 +1,59 @@
 +#ifndef PROXMOX_BACKUP_CLIENT_H
 +#define PROXMOX_BACKUP_CLIENT_H
 +
@@ -477,6 +471,13 @@ index 0000000000..b311bf8de8
 +#include "qemu/coroutine.h"
 +#include "proxmox-backup-qemu.h"
 +
++typedef struct CoCtxData {
++    Coroutine *co;
++    AioContext *ctx;
++    void *data;
++} CoCtxData;
++
++// FIXME: Remove once coroutines are supported for QMP
 +void block_on_coroutine_fn(CoroutineEntry *entry, void *entry_arg);
 +
 +int coroutine_fn
diff --git a/debian/patches/pve/0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch b/debian/patches/pve/0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
index 1e550f4..c974060 100644
--- a/debian/patches/pve/0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
+++ b/debian/patches/pve/0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
@@ -99,10 +99,10 @@ index 0e2d166552..3ff014d32a 100644
  
      qapi_free_BackupStatus(info);
 diff --git a/proxmox-backup-client.c b/proxmox-backup-client.c
-index b7bc7f2574..0e9c584701 100644
+index a8f6653a81..4ce7bc0b5e 100644
 --- a/proxmox-backup-client.c
 +++ b/proxmox-backup-client.c
-@@ -95,6 +95,7 @@ proxmox_backup_co_register_image(
+@@ -89,6 +89,7 @@ proxmox_backup_co_register_image(
      ProxmoxBackupHandle *pbs,
      const char *device_name,
      uint64_t size,
@@ -110,7 +110,7 @@ index b7bc7f2574..0e9c584701 100644
      Error **errp)
  {
      Coroutine *co = qemu_coroutine_self();
-@@ -104,7 +105,7 @@ proxmox_backup_co_register_image(
+@@ -98,7 +99,7 @@ proxmox_backup_co_register_image(
      int pbs_res = -1;
  
      proxmox_backup_register_image_async(
@@ -120,10 +120,10 @@ index b7bc7f2574..0e9c584701 100644
      if (pbs_res < 0) {
          if (errp) error_setg(errp, "backup register image failed: %s", pbs_err ? pbs_err : "unknown error");
 diff --git a/proxmox-backup-client.h b/proxmox-backup-client.h
-index b311bf8de8..20fd6b1719 100644
+index 1dda8b7d8f..8cbf645b2c 100644
 --- a/proxmox-backup-client.h
 +++ b/proxmox-backup-client.h
-@@ -25,6 +25,7 @@ proxmox_backup_co_register_image(
+@@ -32,6 +32,7 @@ proxmox_backup_co_register_image(
      ProxmoxBackupHandle *pbs,
      const char *device_name,
      uint64_t size,
diff --git a/debian/patches/pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch b/debian/patches/pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch
index a4ca3c4..1d54282 100644
--- a/debian/patches/pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch
+++ b/debian/patches/pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch
@@ -5,6 +5,8 @@ Subject: [PATCH] PVE: redirect stderr to journal when daemonized
 
 QEMU uses the logging for error messages usually, so LOG_ERR is most
 fitting.
+
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
 ---
  Makefile.objs | 1 +
  os-posix.c    | 7 +++++--
diff --git a/debian/patches/pve/0050-PVE-Add-sequential-job-transaction-support.patch b/debian/patches/pve/0050-PVE-Add-sequential-job-transaction-support.patch
index 67b053d..20531d3 100644
--- a/debian/patches/pve/0050-PVE-Add-sequential-job-transaction-support.patch
+++ b/debian/patches/pve/0050-PVE-Add-sequential-job-transaction-support.patch
@@ -3,6 +3,7 @@ From: Stefan Reiter <s.reiter@proxmox.com>
 Date: Thu, 20 Aug 2020 14:31:59 +0200
 Subject: [PATCH] PVE: Add sequential job transaction support
 
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
 ---
  include/qemu/job.h | 12 ++++++++++++
  job.c              | 24 ++++++++++++++++++++++++
diff --git a/debian/patches/pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch b/debian/patches/pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
index 0e33331..a9489a8 100644
--- a/debian/patches/pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
+++ b/debian/patches/pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
@@ -9,6 +9,8 @@ since QEMU's BITMAP_SYNC_MODE_ON_SUCCESS will now handle that for us.
 
 To keep the rate-limiting and IO impact from before, we use a sequential
 transaction, so drives will still be backed up one after the other.
+
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
 ---
  pve-backup.c | 167 +++++++++++++++------------------------------------
  1 file changed, 49 insertions(+), 118 deletions(-)
diff --git a/debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch b/debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch
index 8ae58fe..695c0fd 100644
--- a/debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch
+++ b/debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch
@@ -20,38 +20,25 @@ Cases of aio_context_acquire/release from within what is now a coroutine
 are changed to aio_co_reschedule_self, which works since a running
 coroutine always holds the aio lock for the context it is running in.
 
-job_cancel_sync is changed to regular job_cancel, since job_cancel_sync
-uses AIO_WAIT_WHILE internally, which is forbidden from Coroutines.
+job_cancel_sync is called from a BH since it can't be run from a
+coroutine (uses AIO_WAIT_WHILE internally).
 
-create_backup_jobs cannot be run from a coroutine, so it is run
-directly. This does however mean that it runs unlocked, as it cannot
-acquire a CoMutex (see it's comment for rational on why that's ok).
+Same thing for create_backup_jobs, which is converted to a BH too.
 
 To communicate the finishing state, a new property is introduced to
 query-backup: 'finishing'. A new state is explicitly not used, since
 that would break compatibility with older qemu-server versions.
 
 [0] https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg03515.html
----
- proxmox-backup-client.h |   1 +
- pve-backup.c            | 124 ++++++++++++++++++++++------------------
- qapi/block-core.json    |   5 +-
- 3 files changed, 74 insertions(+), 56 deletions(-)
 
-diff --git a/proxmox-backup-client.h b/proxmox-backup-client.h
-index 20fd6b1719..a4781c5851 100644
---- a/proxmox-backup-client.h
-+++ b/proxmox-backup-client.h
-@@ -5,6 +5,7 @@
- #include "qemu/coroutine.h"
- #include "proxmox-backup-qemu.h"
- 
-+// FIXME: Remove once coroutines are supported for QMP
- void block_on_coroutine_fn(CoroutineEntry *entry, void *entry_arg);
- 
- int coroutine_fn
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+---
+ pve-backup.c         | 148 ++++++++++++++++++++++++++-----------------
+ qapi/block-core.json |   5 +-
+ 2 files changed, 95 insertions(+), 58 deletions(-)
+
 diff --git a/pve-backup.c b/pve-backup.c
-index 9562e9c98d..53cf23ed5a 100644
+index 9562e9c98d..0466145bec 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -33,7 +33,9 @@ const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
@@ -172,7 +159,7 @@ index 9562e9c98d..53cf23ed5a 100644
  
      // remove self from job list
      backup_state.di_list = g_list_remove(backup_state.di_list, di);
-@@ -310,21 +310,36 @@ static void pvebackup_complete_cb(void *opaque, int ret)
+@@ -310,21 +310,49 @@ static void pvebackup_complete_cb(void *opaque, int ret)
  
      /* call cleanup if we're the last job */
      if (!g_list_first(backup_state.di_list)) {
@@ -187,19 +174,33 @@ index 9562e9c98d..53cf23ed5a 100644
 -static void pvebackup_cancel(void)
 +static void pvebackup_complete_cb(void *opaque, int ret)
  {
-     assert(!qemu_in_coroutine());
- 
+-    assert(!qemu_in_coroutine());
 +    PVEBackupDevInfo *di = opaque;
 +    di->completed_ret = ret;
 +
 +    /*
 +     * Schedule stream cleanup in async coroutine. close_image and finish might
-+     * take a while, so we can't block on them here.
++     * take a while, so we can't block on them here. This way it also doesn't
++     * matter if we're already running in a coroutine or not.
 +     * Note: di is a pointer to an entry in the global backup_state struct, so
 +     * it stays valid.
 +     */
 +    Coroutine *co = qemu_coroutine_create(pvebackup_co_complete_stream, di);
-+    aio_co_schedule(qemu_get_aio_context(), co);
++    aio_co_enter(qemu_get_aio_context(), co);
++}
+ 
++/*
++ * job_cancel(_sync) does not like to be called from coroutines, so defer to
++ * main loop processing via a bottom half.
++ */
++static void job_cancel_bh(void *opaque) {
++    CoCtxData *data = (CoCtxData*)opaque;
++    Job *job = (Job*)data->data;
++    AioContext *job_ctx = job->aio_context;
++    aio_context_acquire(job_ctx);
++    job_cancel_sync(job);
++    aio_context_release(job_ctx);
++    aio_co_enter(data->ctx, data->co);
 +}
 +
 +static void coroutine_fn pvebackup_co_cancel(void *opaque)
@@ -213,14 +214,20 @@ index 9562e9c98d..53cf23ed5a 100644
  
      if (backup_state.vmaw) {
          /* make sure vma writer does not block anymore */
-@@ -342,27 +357,16 @@ static void pvebackup_cancel(void)
+@@ -342,27 +370,22 @@ static void pvebackup_cancel(void)
          ((PVEBackupDevInfo *)bdi->data)->job :
          NULL;
  
 -    /* ref the job before releasing the mutex, just to be safe */
      if (cancel_job) {
 -        job_ref(&cancel_job->job);
-+        job_cancel(&cancel_job->job, false);
++        CoCtxData data = {
++            .ctx = qemu_get_current_aio_context(),
++            .co = qemu_coroutine_self(),
++            .data = &cancel_job->job,
++        };
++        aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
++        qemu_coroutine_yield();
      }
  
 -    /* job_cancel_sync may enter the job, so we need to release the
@@ -244,7 +251,7 @@ index 9562e9c98d..53cf23ed5a 100644
  }
  
  // assumes the caller holds backup_mutex
-@@ -415,6 +419,14 @@ static int coroutine_fn pvebackup_co_add_config(
+@@ -415,6 +438,14 @@ static int coroutine_fn pvebackup_co_add_config(
      goto out;
  }
  
@@ -259,7 +266,7 @@ index 9562e9c98d..53cf23ed5a 100644
  static bool create_backup_jobs(void) {
  
      assert(!qemu_in_coroutine());
-@@ -523,11 +535,12 @@ typedef struct QmpBackupTask {
+@@ -523,11 +554,12 @@ typedef struct QmpBackupTask {
      UuidInfo *result;
  } QmpBackupTask;
  
@@ -273,7 +280,18 @@ index 9562e9c98d..53cf23ed5a 100644
      QmpBackupTask *task = opaque;
  
      task->result = NULL; // just to be sure
-@@ -616,6 +629,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -548,8 +580,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+     const char *firewall_name = "qemu-server.fw";
+ 
+     if (backup_state.di_list) {
+-         error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
++        error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+                   "previous backup not finished");
++        qemu_co_mutex_unlock(&backup_state.backup_mutex);
+         return;
+     }
+ 
+@@ -616,6 +649,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
          }
          di->size = size;
          total += size;
@@ -282,7 +300,7 @@ index 9562e9c98d..53cf23ed5a 100644
      }
  
      uuid_generate(uuid);
-@@ -847,6 +862,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -847,6 +882,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
      backup_state.stat.dirty = total - backup_state.stat.reused;
      backup_state.stat.transferred = 0;
      backup_state.stat.zero_bytes = 0;
@@ -290,7 +308,7 @@ index 9562e9c98d..53cf23ed5a 100644
  
      qemu_mutex_unlock(&backup_state.stat.lock);
  
-@@ -861,6 +877,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -861,6 +897,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
      uuid_info->UUID = uuid_str;
  
      task->result = uuid_info;
@@ -299,7 +317,7 @@ index 9562e9c98d..53cf23ed5a 100644
      return;
  
  err_mutex:
-@@ -903,6 +921,8 @@ err:
+@@ -903,6 +941,8 @@ err:
      }
  
      task->result = NULL;
@@ -308,7 +326,7 @@ index 9562e9c98d..53cf23ed5a 100644
      return;
  }
  
-@@ -956,22 +976,15 @@ UuidInfo *qmp_backup(
+@@ -956,22 +996,15 @@ UuidInfo *qmp_backup(
          .errp = errp,
      };
  
@@ -332,7 +350,7 @@ index 9562e9c98d..53cf23ed5a 100644
      }
  
      return task.result;
-@@ -1025,6 +1038,7 @@ BackupStatus *qmp_query_backup(Error **errp)
+@@ -1025,6 +1058,7 @@ BackupStatus *qmp_query_backup(Error **errp)
      info->transferred = backup_state.stat.transferred;
      info->has_reused = true;
      info->reused = backup_state.stat.reused;
diff --git a/debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch b/debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
new file mode 100644
index 0000000..45dabc4
--- /dev/null
+++ b/debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
@@ -0,0 +1,187 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Stefan Reiter <s.reiter@proxmox.com>
+Date: Thu, 22 Oct 2020 17:01:07 +0200
+Subject: [PATCH] PVE: fix and clean up error handling for create_backup_jobs
+
+No more weird bool returns, just the standard "errp" format used
+everywhere else too. With this, if backup_job_create fails, the error
+message is actually returned over QMP and can be shown to the user.
+
+To facilitate correct cleanup on such an error, we call
+create_backup_jobs as a bottom half directly from pvebackup_co_prepare.
+This additionally allows us to actually hold the backup_mutex during
+operation.
+
+Also add a job_cancel_sync before job_unref, since a job must be in
+STATUS_NULL to be deleted by unref, which could trigger an assert
+before.
+
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+---
+ pve-backup.c | 79 +++++++++++++++++++++++++++++++++++-----------------
+ 1 file changed, 54 insertions(+), 25 deletions(-)
+
+diff --git a/pve-backup.c b/pve-backup.c
+index 0466145bec..1a2647e7a5 100644
+--- a/pve-backup.c
++++ b/pve-backup.c
+@@ -50,6 +50,7 @@ static struct PVEBackupState {
+         size_t zero_bytes;
+         GList *bitmap_list;
+         bool finishing;
++        bool starting;
+     } stat;
+     int64_t speed;
+     VmaWriter *vmaw;
+@@ -277,6 +278,16 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
+     PVEBackupDevInfo *di = opaque;
+     int ret = di->completed_ret;
+ 
++    qemu_mutex_lock(&backup_state.stat.lock);
++    bool starting = backup_state.stat.starting;
++    qemu_mutex_unlock(&backup_state.stat.lock);
++    if (starting) {
++        /* in 'starting' state, no tasks have been run yet, meaning we can (and
++         * must) skip all cleanup, as we don't know what has and hasn't been
++         * initialized yet. */
++        return;
++    }
++
+     qemu_co_mutex_lock(&backup_state.backup_mutex);
+ 
+     if (ret < 0) {
+@@ -441,15 +452,15 @@ static int coroutine_fn pvebackup_co_add_config(
+ /*
+  * backup_job_create can *not* be run from a coroutine (and requires an
+  * acquired AioContext), so this can't either.
+- * This does imply that this function cannot run with backup_mutex acquired.
+- * That is ok because it is only ever called between setting up the backup_state
+- * struct and starting the jobs, and from within a QMP call. This means that no
+- * other QMP call can interrupt, and no background job is running yet.
++ * The caller is responsible that backup_mutex is held nonetheless.
+  */
+-static bool create_backup_jobs(void) {
++static void create_backup_jobs_bh(void *opaque) {
+ 
+     assert(!qemu_in_coroutine());
+ 
++    CoCtxData *data = (CoCtxData*)opaque;
++    Error **errp = (Error**)data->data;
++
+     Error *local_err = NULL;
+ 
+     /* create job transaction to synchronize bitmap commit and cancel all
+@@ -483,24 +494,19 @@ static bool create_backup_jobs(void) {
+ 
+         aio_context_release(aio_context);
+ 
+-        if (!job || local_err != NULL) {
+-            Error *create_job_err = NULL;
+-            error_setg(&create_job_err, "backup_job_create failed: %s",
+-                       local_err ? error_get_pretty(local_err) : "null");
++        di->job = job;
+ 
+-            pvebackup_propagate_error(create_job_err);
++        if (!job || local_err) {
++            error_setg(errp, "backup_job_create failed: %s",
++                       local_err ? error_get_pretty(local_err) : "null");
+             break;
+         }
+ 
+-        di->job = job;
+-
+         bdrv_unref(di->target);
+         di->target = NULL;
+     }
+ 
+-    bool errors = pvebackup_error_or_canceled();
+-
+-    if (errors) {
++    if (*errp) {
+         l = backup_state.di_list;
+         while (l) {
+             PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+@@ -512,12 +518,17 @@ static bool create_backup_jobs(void) {
+             }
+ 
+             if (di->job) {
++                AioContext *ctx = di->job->job.aio_context;
++                aio_context_acquire(ctx);
++                job_cancel_sync(&di->job->job);
+                 job_unref(&di->job->job);
++                aio_context_release(ctx);
+             }
+         }
+     }
+ 
+-    return errors;
++    /* return */
++    aio_co_enter(data->ctx, data->co);
+ }
+ 
+ typedef struct QmpBackupTask {
+@@ -883,6 +894,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+     backup_state.stat.transferred = 0;
+     backup_state.stat.zero_bytes = 0;
+     backup_state.stat.finishing = false;
++    backup_state.stat.starting = true;
+ 
+     qemu_mutex_unlock(&backup_state.stat.lock);
+ 
+@@ -898,7 +910,32 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+ 
+     task->result = uuid_info;
+ 
++    /* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
++    * backup_mutex locked. This is fine, a CoMutex can be held across yield
++    * points, and we'll release it as soon as the BH reschedules us.
++    */
++    CoCtxData waker = {
++        .co = qemu_coroutine_self(),
++        .ctx = qemu_get_current_aio_context(),
++        .data = &local_err,
++    };
++    aio_bh_schedule_oneshot(waker.ctx, create_backup_jobs_bh, &waker);
++    qemu_coroutine_yield();
++
++    if (local_err) {
++        error_propagate(task->errp, local_err);
++        goto err;
++    }
++
+     qemu_co_mutex_unlock(&backup_state.backup_mutex);
++
++    qemu_mutex_lock(&backup_state.stat.lock);
++    backup_state.stat.starting = false;
++    qemu_mutex_unlock(&backup_state.stat.lock);
++
++    /* start the first job in the transaction */
++    job_txn_start_seq(backup_state.txn);
++
+     return;
+ 
+ err_mutex:
+@@ -921,6 +958,7 @@ err:
+         g_free(di);
+     }
+     g_list_free(di_list);
++    backup_state.di_list = NULL;
+ 
+     if (devs) {
+         g_strfreev(devs);
+@@ -998,15 +1036,6 @@ UuidInfo *qmp_backup(
+ 
+     block_on_coroutine_fn(pvebackup_co_prepare, &task);
+ 
+-    if (*errp == NULL) {
+-        bool errors = create_backup_jobs();
+-
+-        if (!errors) {
+-            // start the first job in the transaction
+-            job_txn_start_seq(backup_state.txn);
+-        }
+-    }
+-
+     return task.result;
+ }
+ 
diff --git a/debian/patches/series b/debian/patches/series
index 021c0d7..9ebe181 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -53,3 +53,4 @@ pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch
 pve/0050-PVE-Add-sequential-job-transaction-support.patch
 pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
 pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch
+pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
-- 
2.20.1






* [pve-devel] applied: [PATCH v2 qemu 6/6] Several fixes for backup abort and error reporting
  2020-10-29 13:10 ` [pve-devel] [PATCH v2 qemu 6/6] Several fixes for backup abort and error reporting Stefan Reiter
@ 2020-10-29 17:27   ` Thomas Lamprecht
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Lamprecht @ 2020-10-29 17:27 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter

On 29.10.20 14:10, Stefan Reiter wrote:
> Also add my Signed-off-by to some patches where it was missing.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> v2: new
> 
> Patches 1-5 squashed in or added to the pve-qemu repository.
> 
> Confirmed with "objdump" that the binary is the same with squashed changes as
> with standalone patches.
> 
>  ...ckup-proxmox-backup-patches-for-qemu.patch |  41 ++--
>  ...irty-bitmap-tracking-for-incremental.patch |  10 +-
>  ...ct-stderr-to-journal-when-daemonized.patch |   2 +
>  ...d-sequential-job-transaction-support.patch |   1 +
>  ...-transaction-to-synchronize-job-stat.patch |   2 +
>  ...ore-coroutines-and-don-t-block-on-fi.patch |  94 +++++----
>  ...n-up-error-handling-for-create_backu.patch | 187 ++++++++++++++++++
>  debian/patches/series                         |   1 +
>  8 files changed, 275 insertions(+), 63 deletions(-)
>  create mode 100644 debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
> 
>

applied, thanks!





