public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [RFC v2 0/3] Add job transaction support for backup
@ 2020-08-25 13:15 Stefan Reiter
  2020-08-25 13:15 ` [pve-devel] [RFC v2 qemu 1/3] Revert "PVE-Backup: modify job api" Stefan Reiter
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Stefan Reiter @ 2020-08-25 13:15 UTC
  To: pve-devel

Still RFC and experimental, though with more (successful) testing since v1.

Our backup starts one QEMU block job per drive that is included in the final
archive. Currently, we start them all in 'paused' state and manually call
job_start on the next one whenever the previous one invokes its completion
callback.

By using a transaction, we can automate that part, meaning we need a lot less
code in pve-backup.c. Additionally, we gain the benefit of having transactional
backups, which is important for dirty-bitmaps, since it allows us to keep all
bitmaps even when a backup fails or is aborted. QEMU does the heavy lifting
here, since it already supports transactions for backup jobs and handles the
bitmaps accordingly.

A small change to QEMU's job transaction API is necessary (patch 2) to allow
the jobs to run in sequence instead of all at once. This property is desirable
for several reasons: it avoids overloading the server, keeps VMA files written
in sequential order, and makes bandwidth limiting usable (QEMU only supports
it per job).
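
To illustrate, a rough sketch of how the pieces fit together after this series
(call signatures taken from patches 2 and 3; sync_mode, bitmap_mode and
local_err stand in for the surrounding setup, error handling omitted):

    JobTxn *txn = job_txn_new_seq();

    /* one backup job per drive; all join the same transaction and stay
     * in JOB_STATUS_CREATED until explicitly started */
    for (GList *l = backup_state.di_list; l; l = g_list_next(l)) {
        PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
        di->job = backup_job_create(
            NULL, di->bs, di->target, backup_state.speed, sync_mode,
            di->bitmap, bitmap_mode, false, NULL, BLOCKDEV_ON_ERROR_REPORT,
            BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb,
            di, txn, &local_err);
    }

    /* start only the first job; each successful completion kicks off the
     * next one, so drives are backed up strictly one after the other */
    job_txn_start_seq(txn);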

v2:
* revert/remove existing 'PVE-Backup: modify job api'


qemu: Stefan Reiter (3):
  Revert "PVE-Backup: modify job api"
  PVE: Add sequential job transaction support
  PVE-Backup: Use a transaction to synchronize job states

 block/backup.c            |   3 -
 block/replication.c       |   2 +-
 blockdev.c                |   3 +-
 include/block/block_int.h |   1 -
 include/qemu/job.h        |  12 +++
 job.c                     |  26 +++++-
 pve-backup.c              | 169 +++++++++++---------------------------
 7 files changed, 89 insertions(+), 127 deletions(-)

-- 
2.20.1





* [pve-devel] [RFC v2 qemu 1/3] Revert "PVE-Backup: modify job api"
  2020-08-25 13:15 [pve-devel] [RFC v2 0/3] Add job transaction support for backup Stefan Reiter
@ 2020-08-25 13:15 ` Stefan Reiter
  2020-08-25 13:15 ` [pve-devel] [RFC v2 qemu 2/3] PVE: Add sequential job transaction support Stefan Reiter
  2020-08-25 13:15 ` [pve-devel] [RFC v2 qemu 3/3] PVE-Backup: Use a transaction to synchronize job states Stefan Reiter
  2 siblings, 0 replies; 4+ messages in thread
From: Stefan Reiter @ 2020-08-25 13:15 UTC
  To: pve-devel

This reverts commit 640d758d467962e7e67b4aea867037f1304424af.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Added in v2. This is not necessary anymore with the transaction patches: jobs
are created in "JOB_STATUS_CREATED" and thus not running, so the sequential
transactions work without it. Removes even more custom patch code :)

When applying to pve-qemu you can of course just drop the original patch
instead of applying the revert.

 block/backup.c            | 3 ---
 block/replication.c       | 2 +-
 blockdev.c                | 3 +--
 include/block/block_int.h | 1 -
 job.c                     | 2 +-
 5 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 1bcc7faa32..cd42236b79 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -320,7 +320,6 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   BlockdevOnError on_target_error,
                   int creation_flags,
                   BlockCompletionFunc *cb, void *opaque,
-                  int pause_count,
                   JobTxn *txn, Error **errp)
 {
     int64_t len, target_len;
@@ -459,8 +458,6 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
                        &error_abort);
 
-    job->common.job.pause_count += pause_count;
-
     return &job->common;
 
  error:
diff --git a/block/replication.c b/block/replication.c
index 59270a0468..0c70215784 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -560,7 +560,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
                                 0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, NULL,
                                 BLOCKDEV_ON_ERROR_REPORT,
                                 BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
-                                backup_job_completed, bs, 0, NULL, &local_err);
+                                backup_job_completed, bs, NULL, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             backup_job_cleanup(bs);
diff --git a/blockdev.c b/blockdev.c
index 1db0cbcad5..be87d65c02 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2833,8 +2833,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
                             backup->filter_node_name,
                             backup->on_source_error,
                             backup->on_target_error,
-                            job_flags, NULL, NULL, 0, txn, errp);
-
+                            job_flags, NULL, NULL, txn, errp);
     return job;
 }
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index befdae125b..279bd4ab61 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1286,7 +1286,6 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                             BlockdevOnError on_target_error,
                             int creation_flags,
                             BlockCompletionFunc *cb, void *opaque,
-                            int pause_count,
                             JobTxn *txn, Error **errp);
 
 BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
diff --git a/job.c b/job.c
index bcbbb0be02..b8139c80a4 100644
--- a/job.c
+++ b/job.c
@@ -919,7 +919,7 @@ void job_start(Job *job)
     job->co = qemu_coroutine_create(job_co_entry, job);
     job->pause_count--;
     job->busy = true;
-    job->paused = job->pause_count > 0;
+    job->paused = false;
     job_state_transition(job, JOB_STATUS_RUNNING);
     aio_co_enter(job->aio_context, job->co);
 }
-- 
2.20.1






* [pve-devel] [RFC v2 qemu 2/3] PVE: Add sequential job transaction support
  2020-08-25 13:15 [pve-devel] [RFC v2 0/3] Add job transaction support for backup Stefan Reiter
  2020-08-25 13:15 ` [pve-devel] [RFC v2 qemu 1/3] Revert "PVE-Backup: modify job api" Stefan Reiter
@ 2020-08-25 13:15 ` Stefan Reiter
  2020-08-25 13:15 ` [pve-devel] [RFC v2 qemu 3/3] PVE-Backup: Use a transaction to synchronize job states Stefan Reiter
  2 siblings, 0 replies; 4+ messages in thread
From: Stefan Reiter @ 2020-08-25 13:15 UTC
  To: pve-devel

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
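
A minimal usage sketch of the two new helpers (illustration only, not part of
the patch itself):

    JobTxn *txn = job_txn_new_seq();  /* like job_txn_new(), but sequential */

    /* ... create jobs with this txn; they stay in JOB_STATUS_CREATED ... */

    /* start the first job; after each successful completion,
     * job_completed_txn_success() starts the next job in the list */
    job_txn_start_seq(txn);
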
 include/qemu/job.h | 12 ++++++++++++
 job.c              | 24 ++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 32aabb1c60..f7a6a0926a 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -280,6 +280,18 @@ typedef enum JobCreateFlags {
  */
 JobTxn *job_txn_new(void);
 
+/**
+ * Create a new transaction and set it to sequential mode, i.e. run all jobs
+ * one after the other instead of at the same time.
+ */
+JobTxn *job_txn_new_seq(void);
+
+/**
+ * Helper method to start the first job in a sequential transaction to kick it
+ * off. Other jobs will be run after this one completes.
+ */
+void job_txn_start_seq(JobTxn *txn);
+
 /**
  * Release a reference that was previously acquired with job_txn_add_job or
  * job_txn_new. If it's the last reference to the object, it will be freed.
diff --git a/job.c b/job.c
index b8139c80a4..97ee97a192 100644
--- a/job.c
+++ b/job.c
@@ -72,6 +72,8 @@ struct JobTxn {
 
     /* Reference count */
     int refcnt;
+
+    bool sequential;
 };
 
 /* Right now, this mutex is only needed to synchronize accesses to job->busy
@@ -102,6 +104,25 @@ JobTxn *job_txn_new(void)
     return txn;
 }
 
+JobTxn *job_txn_new_seq(void)
+{
+    JobTxn *txn = job_txn_new();
+    txn->sequential = true;
+    return txn;
+}
+
+void job_txn_start_seq(JobTxn *txn)
+{
+    assert(txn->sequential);
+    assert(!txn->aborting);
+
+    Job *first = QLIST_FIRST(&txn->jobs);
+    assert(first);
+    assert(first->status == JOB_STATUS_CREATED);
+
+    job_start(first);
+}
+
 static void job_txn_ref(JobTxn *txn)
 {
     txn->refcnt++;
@@ -841,6 +862,9 @@ static void job_completed_txn_success(Job *job)
      */
     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
         if (!job_is_completed(other_job)) {
+            if (txn->sequential) {
+                job_start(other_job);
+            }
             return;
         }
         assert(other_job->ret == 0);
-- 
2.20.1






* [pve-devel] [RFC v2 qemu 3/3] PVE-Backup: Use a transaction to synchronize job states
  2020-08-25 13:15 [pve-devel] [RFC v2 0/3] Add job transaction support for backup Stefan Reiter
  2020-08-25 13:15 ` [pve-devel] [RFC v2 qemu 1/3] Revert "PVE-Backup: modify job api" Stefan Reiter
  2020-08-25 13:15 ` [pve-devel] [RFC v2 qemu 2/3] PVE: Add sequential job transaction support Stefan Reiter
@ 2020-08-25 13:15 ` Stefan Reiter
  2 siblings, 0 replies; 4+ messages in thread
From: Stefan Reiter @ 2020-08-25 13:15 UTC
  To: pve-devel

By using a JobTxn, we can sync dirty bitmaps only when *all* jobs were
successful - meaning we don't need to remove them when the backup fails,
since QEMU's BITMAP_SYNC_MODE_ON_SUCCESS will now handle that for us.

To keep the rate-limiting and IO impact from before, we use a sequential
transaction, so drives will still be backed up one after the other.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* updated to no longer pass "pause_count" since that patch is now gone
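
For the cancel path this means a single job_cancel_sync() now suffices; a
simplified sketch of the pvebackup_cancel() hunk below:

    /* cancelling one job aborts the transaction, which cancels the rest */
    job_ref(&cancel_job->job);        /* keep it alive across the unlock */
    qemu_mutex_unlock(&backup_state.backup_mutex);

    AioContext *aio_context = cancel_job->job.aio_context;
    aio_context_acquire(aio_context);
    job_cancel_sync(&cancel_job->job);
    job_unref(&cancel_job->job);
    aio_context_release(aio_context);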

 pve-backup.c | 169 +++++++++++++++------------------------------------
 1 file changed, 50 insertions(+), 119 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 04c21c80aa..9562e9c98d 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -52,6 +52,7 @@ static struct PVEBackupState {
     VmaWriter *vmaw;
     ProxmoxBackupHandle *pbs;
     GList *di_list;
+    JobTxn *txn;
     QemuMutex backup_mutex;
     CoMutex dump_callback_mutex;
 } backup_state;
@@ -71,32 +72,12 @@ typedef struct PVEBackupDevInfo {
     size_t size;
     uint64_t block_size;
     uint8_t dev_id;
-    bool completed;
     char targetfile[PATH_MAX];
     BdrvDirtyBitmap *bitmap;
     BlockDriverState *target;
+    BlockJob *job;
 } PVEBackupDevInfo;
 
-static void pvebackup_run_next_job(void);
-
-static BlockJob *
-lookup_active_block_job(PVEBackupDevInfo *di)
-{
-    if (!di->completed && di->bs) {
-        for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
-            if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
-                continue;
-            }
-
-            BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
-            if (bjob && bjob->source_bs == di->bs) {
-                return job;
-            }
-        }
-    }
-    return NULL;
-}
-
 static void pvebackup_propagate_error(Error *err)
 {
     qemu_mutex_lock(&backup_state.stat.lock);
@@ -272,18 +253,6 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
             if (local_err != NULL) {
                 pvebackup_propagate_error(local_err);
             }
-        } else {
-            // on error or cancel we cannot ensure synchronization of dirty
-            // bitmaps with backup server, so remove all and do full backup next
-            GList *l = backup_state.di_list;
-            while (l) {
-                PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-                l = g_list_next(l);
-
-                if (di->bitmap) {
-                    bdrv_release_dirty_bitmap(di->bitmap);
-                }
-            }
         }
 
         proxmox_backup_disconnect(backup_state.pbs);
@@ -322,8 +291,6 @@ static void pvebackup_complete_cb(void *opaque, int ret)
 
     qemu_mutex_lock(&backup_state.backup_mutex);
 
-    di->completed = true;
-
     if (ret < 0) {
         Error *local_err = NULL;
         error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
@@ -336,20 +303,17 @@ static void pvebackup_complete_cb(void *opaque, int ret)
 
     block_on_coroutine_fn(pvebackup_complete_stream, di);
 
-    // remove self from job queue
+    // remove self from job list
     backup_state.di_list = g_list_remove(backup_state.di_list, di);
 
-    if (di->bitmap && ret < 0) {
-        // on error or cancel we cannot ensure synchronization of dirty
-        // bitmaps with backup server, so remove all and do full backup next
-        bdrv_release_dirty_bitmap(di->bitmap);
-    }
-
     g_free(di);
 
-    qemu_mutex_unlock(&backup_state.backup_mutex);
+    /* call cleanup if we're the last job */
+    if (!g_list_first(backup_state.di_list)) {
+        block_on_coroutine_fn(pvebackup_co_cleanup, NULL);
+    }
 
-    pvebackup_run_next_job();
+    qemu_mutex_unlock(&backup_state.backup_mutex);
 }
 
 static void pvebackup_cancel(void)
@@ -371,36 +335,28 @@ static void pvebackup_cancel(void)
         proxmox_backup_abort(backup_state.pbs, "backup canceled");
     }
 
+    /* it's enough to cancel one job in the transaction, the rest will follow
+     * automatically */
+    GList *bdi = g_list_first(backup_state.di_list);
+    BlockJob *cancel_job = bdi && bdi->data ?
+        ((PVEBackupDevInfo *)bdi->data)->job :
+        NULL;
+
+    /* ref the job before releasing the mutex, just to be safe */
+    if (cancel_job) {
+        job_ref(&cancel_job->job);
+    }
+
+    /* job_cancel_sync may enter the job, so we need to release the
+     * backup_mutex to avoid deadlock */
     qemu_mutex_unlock(&backup_state.backup_mutex);
 
-    for(;;) {
-
-        BlockJob *next_job = NULL;
-
-        qemu_mutex_lock(&backup_state.backup_mutex);
-
-        GList *l = backup_state.di_list;
-        while (l) {
-            PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-            l = g_list_next(l);
-
-            BlockJob *job = lookup_active_block_job(di);
-            if (job != NULL) {
-                next_job = job;
-                break;
-            }
-        }
-
-        qemu_mutex_unlock(&backup_state.backup_mutex);
-
-        if (next_job) {
-            AioContext *aio_context = next_job->job.aio_context;
-            aio_context_acquire(aio_context);
-            job_cancel_sync(&next_job->job);
-            aio_context_release(aio_context);
-        } else {
-            break;
-        }
+    if (cancel_job) {
+        AioContext *aio_context = cancel_job->job.aio_context;
+        aio_context_acquire(aio_context);
+        job_cancel_sync(&cancel_job->job);
+        job_unref(&cancel_job->job);
+        aio_context_release(aio_context);
     }
 }
 
@@ -459,51 +415,19 @@ static int coroutine_fn pvebackup_co_add_config(
     goto out;
 }
 
-bool job_should_pause(Job *job);
-
-static void pvebackup_run_next_job(void)
-{
-    assert(!qemu_in_coroutine());
-
-    qemu_mutex_lock(&backup_state.backup_mutex);
-
-    GList *l = backup_state.di_list;
-    while (l) {
-        PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-        l = g_list_next(l);
-
-        BlockJob *job = lookup_active_block_job(di);
-
-        if (job) {
-            qemu_mutex_unlock(&backup_state.backup_mutex);
-
-            AioContext *aio_context = job->job.aio_context;
-            aio_context_acquire(aio_context);
-
-            if (job_should_pause(&job->job)) {
-                bool error_or_canceled = pvebackup_error_or_canceled();
-                if (error_or_canceled) {
-                    job_cancel_sync(&job->job);
-                } else {
-                    job_resume(&job->job);
-                }
-            }
-            aio_context_release(aio_context);
-            return;
-        }
-    }
-
-    block_on_coroutine_fn(pvebackup_co_cleanup, NULL); // no more jobs, run cleanup
-
-    qemu_mutex_unlock(&backup_state.backup_mutex);
-}
-
 static bool create_backup_jobs(void) {
 
     assert(!qemu_in_coroutine());
 
     Error *local_err = NULL;
 
+    /* create job transaction to synchronize bitmap commit and cancel all
+     * jobs in case one errors */
+    if (backup_state.txn) {
+        job_txn_unref(backup_state.txn);
+    }
+    backup_state.txn = job_txn_new_seq();
+
     /* create and start all jobs (paused state) */
     GList *l =  backup_state.di_list;
     while (l) {
@@ -524,7 +448,7 @@ static bool create_backup_jobs(void) {
         BlockJob *job = backup_job_create(
             NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
             bitmap_mode, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
-            JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
+            JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn, &local_err);
 
         aio_context_release(aio_context);
 
@@ -536,7 +460,8 @@ static bool create_backup_jobs(void) {
             pvebackup_propagate_error(create_job_err);
             break;
         }
-        job_start(&job->job);
+
+        di->job = job;
 
         bdrv_unref(di->target);
         di->target = NULL;
@@ -554,6 +479,10 @@ static bool create_backup_jobs(void) {
                 bdrv_unref(di->target);
                 di->target = NULL;
             }
+
+            if (di->job) {
+                job_unref(&di->job->job);
+            }
         }
     }
 
@@ -944,10 +873,6 @@ err:
         PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
         l = g_list_next(l);
 
-        if (di->bitmap) {
-            bdrv_release_dirty_bitmap(di->bitmap);
-        }
-
         if (di->target) {
             bdrv_unref(di->target);
         }
@@ -1036,9 +961,15 @@ UuidInfo *qmp_backup(
     block_on_coroutine_fn(pvebackup_co_prepare, &task);
 
     if (*errp == NULL) {
-        create_backup_jobs();
+        bool errors = create_backup_jobs();
         qemu_mutex_unlock(&backup_state.backup_mutex);
-        pvebackup_run_next_job();
+
+        if (!errors) {
+            /* start the first job in the transaction
+             * note: this might directly enter the job, so we need to do this
+             * after unlocking the backup_mutex */
+            job_txn_start_seq(backup_state.txn);
+        }
     } else {
         qemu_mutex_unlock(&backup_state.backup_mutex);
     }
-- 
2.20.1





