public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [RFC 0/2] Add job transaction support for backup
@ 2020-08-20 13:48 Stefan Reiter
  2020-08-20 13:48 ` [pve-devel] [RFC qemu 1/2] PVE: Add sequential job transaction support Stefan Reiter
  2020-08-20 13:48 ` [pve-devel] [RFC qemu 2/2] PVE-Backup: Use a transaction to synchronize job states Stefan Reiter
  0 siblings, 2 replies; 3+ messages in thread
From: Stefan Reiter @ 2020-08-20 13:48 UTC (permalink / raw)
  To: pve-devel

RFC, since this is very experimental and only lightly tested.

Our backup starts one QEMU block job for each drive included in the final
archive. Currently, we start them all in 'paused' state, manually calling
job_start for the next one whenever the previous one calls its completion
callback.
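
For reference, the hand-rolled scheduling this replaces boils down to the
following (heavily condensed from the code patch 2 removes; locking and error
handling omitted):

/* simplified sketch of the current flow, see the removals in patch 2 */
static void pvebackup_complete_cb(void *opaque, int ret)
{
    /* ...mark this drive as done, drop it from backup_state.di_list... */
    pvebackup_run_next_job(); /* scan di_list for the next paused block job
                               * and job_resume() it, cancel everything on
                               * error, or run cleanup when no job is left */
}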

By using a transaction, we can automate that part, meaning we need a lot less
code in pve-backup.c. Additionally, we gain the benefit of having transactional
backups, which is important for dirty bitmaps, since it allows us to keep all
bitmaps even when a backup fails or is aborted. QEMU does the heavy lifting
here, since it already supports transactions in backup jobs and handles the
bitmaps accordingly.

A small change to QEMU's job transaction API is necessary (patch 1) to allow
the jobs to run in sequence instead of all at once. This property is good to
have for several reasons: it avoids overloading the server, keeps VMA files
written in sequential order, and makes bandwidth limiting work as expected
(QEMU only applies limits per job).
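
Condensed, the new flow in pve-backup.c then looks like this (all calls are
shown in full in patch 2; the two helpers come from patch 1):

backup_state.txn = job_txn_new_seq();

/* per drive: create the job as part of the transaction instead of
 * starting it by hand */
BlockJob *job = backup_job_create(
    NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
    bitmap_mode, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
    JOB_DEFAULT, pvebackup_complete_cb, di, 0, backup_state.txn, &local_err);

/* once all jobs exist: start the first one; every completed job starts
 * the next (patch 1), a failed one makes QEMU cancel the rest */
job_txn_start_seq(backup_state.txn);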


qemu: Stefan Reiter (2):
  PVE: Add sequential job transaction support
  PVE-Backup: Use a transaction to synchronize job states

 include/qemu/job.h |  12 ++++
 job.c              |  24 +++++++
 pve-backup.c       | 169 ++++++++++++++-------------------------------
 3 files changed, 86 insertions(+), 119 deletions(-)

-- 
2.20.1





* [pve-devel] [RFC qemu 1/2] PVE: Add sequential job transaction support
  2020-08-20 13:48 [pve-devel] [RFC 0/2] Add job transaction support for backup Stefan Reiter
@ 2020-08-20 13:48 ` Stefan Reiter
  2020-08-20 13:48 ` [pve-devel] [RFC qemu 2/2] PVE-Backup: Use a transaction to synchronize job states Stefan Reiter
  1 sibling, 0 replies; 3+ messages in thread
From: Stefan Reiter @ 2020-08-20 13:48 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 include/qemu/job.h | 12 ++++++++++++
 job.c              | 24 ++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 32aabb1c60..f7a6a0926a 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -280,6 +280,18 @@ typedef enum JobCreateFlags {
  */
 JobTxn *job_txn_new(void);
 
+/**
+ * Create a new transaction and set it to sequential mode, i.e. run all jobs
+ * one after the other instead of at the same time.
+ */
+JobTxn *job_txn_new_seq(void);
+
+/**
+ * Helper method to start the first job in a sequential transaction to kick it
+ * off. Other jobs will be run after this one completes.
+ */
+void job_txn_start_seq(JobTxn *txn);
+
 /**
  * Release a reference that was previously acquired with job_txn_add_job or
  * job_txn_new. If it's the last reference to the object, it will be freed.
diff --git a/job.c b/job.c
index bcbbb0be02..901f84dceb 100644
--- a/job.c
+++ b/job.c
@@ -72,6 +72,8 @@ struct JobTxn {
 
     /* Reference count */
     int refcnt;
+
+    bool sequential;
 };
 
 /* Right now, this mutex is only needed to synchronize accesses to job->busy
@@ -102,6 +104,25 @@ JobTxn *job_txn_new(void)
     return txn;
 }
 
+JobTxn *job_txn_new_seq(void)
+{
+    JobTxn *txn = job_txn_new();
+    txn->sequential = true;
+    return txn;
+}
+
+void job_txn_start_seq(JobTxn *txn)
+{
+    assert(txn->sequential);
+    assert(!txn->aborting);
+
+    Job *first = QLIST_FIRST(&txn->jobs);
+    assert(first);
+    assert(first->status == JOB_STATUS_CREATED);
+
+    job_start(first);
+}
+
 static void job_txn_ref(JobTxn *txn)
 {
     txn->refcnt++;
@@ -841,6 +862,9 @@ static void job_completed_txn_success(Job *job)
      */
     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
         if (!job_is_completed(other_job)) {
+            if (txn->sequential) {
+                job_start(other_job);
+            }
             return;
         }
         assert(other_job->ret == 0);
-- 
2.20.1
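
To illustrate the intended semantics, here is a standalone toy model (plain C,
not QEMU code): completing one job starts the next one that was created, just
as the job_completed_txn_success() hunk above does for a sequential
transaction.

/* toy model of a sequential transaction; compiles with any C compiler */
#include <stdbool.h>
#include <stdio.h>

#define N_JOBS 3

typedef struct { const char *name; bool completed; } ToyJob;

static ToyJob jobs[N_JOBS] = {
    { "drive-scsi0" }, { "drive-scsi1" }, { "drive-scsi2" },
};

/* mirrors job_completed_txn_success(): instead of waiting for all jobs,
 * a sequential transaction starts the next uncompleted one */
static void toy_job_completed(ToyJob *job)
{
    job->completed = true;
    printf("done  %s\n", job->name);
    for (int i = 0; i < N_JOBS; i++) {
        if (!jobs[i].completed) {
            printf("start %s\n", jobs[i].name); /* job_start() */
            return;
        }
    }
    printf("transaction complete\n"); /* commit point: bitmaps synced */
}

int main(void)
{
    /* job_txn_start_seq(): kick off the first job only */
    printf("start %s\n", jobs[0].name);
    /* in QEMU the jobs complete asynchronously; the toy drives them here */
    for (int i = 0; i < N_JOBS; i++) {
        toy_job_completed(&jobs[i]);
    }
    return 0;
}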






* [pve-devel] [RFC qemu 2/2] PVE-Backup: Use a transaction to synchronize job states
  2020-08-20 13:48 [pve-devel] [RFC 0/2] Add job transaction support for backup Stefan Reiter
  2020-08-20 13:48 ` [pve-devel] [RFC qemu 1/2] PVE: Add sequential job transaction support Stefan Reiter
@ 2020-08-20 13:48 ` Stefan Reiter
  1 sibling, 0 replies; 3+ messages in thread
From: Stefan Reiter @ 2020-08-20 13:48 UTC (permalink / raw)
  To: pve-devel

By using a JobTxn, we can sync dirty bitmaps only when *all* jobs were
successful - meaning we don't need to remove them when the backup fails,
since QEMU's BITMAP_SYNC_MODE_ON_SUCCESS will now handle that for us.

To keep the rate-limiting and IO impact from before, we use a sequential
transaction, so drives will still be backed up one after the other.
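
Cancellation also becomes much simpler: since all jobs share one transaction,
cancelling any of them makes QEMU abort the others. Condensed from the new
pvebackup_cancel() below (AioContext acquire/release omitted):

GList *bdi = g_list_first(backup_state.di_list);
BlockJob *cancel_job = bdi && bdi->data ?
    ((PVEBackupDevInfo *)bdi->data)->job : NULL;
if (cancel_job) {
    job_ref(&cancel_job->job);
    /* job_cancel_sync may enter the job, release backup_mutex first */
    qemu_mutex_unlock(&backup_state.backup_mutex);
    job_cancel_sync(&cancel_job->job); /* the transaction cancels the rest */
    job_unref(&cancel_job->job);
}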

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 pve-backup.c | 169 +++++++++++++++------------------------------------
 1 file changed, 50 insertions(+), 119 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 04c21c80aa..ffe3cceef9 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -52,6 +52,7 @@ static struct PVEBackupState {
     VmaWriter *vmaw;
     ProxmoxBackupHandle *pbs;
     GList *di_list;
+    JobTxn *txn;
     QemuMutex backup_mutex;
     CoMutex dump_callback_mutex;
 } backup_state;
@@ -71,32 +72,12 @@ typedef struct PVEBackupDevInfo {
     size_t size;
     uint64_t block_size;
     uint8_t dev_id;
-    bool completed;
     char targetfile[PATH_MAX];
     BdrvDirtyBitmap *bitmap;
     BlockDriverState *target;
+    BlockJob *job;
 } PVEBackupDevInfo;
 
-static void pvebackup_run_next_job(void);
-
-static BlockJob *
-lookup_active_block_job(PVEBackupDevInfo *di)
-{
-    if (!di->completed && di->bs) {
-        for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
-            if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
-                continue;
-            }
-
-            BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
-            if (bjob && bjob->source_bs == di->bs) {
-                return job;
-            }
-        }
-    }
-    return NULL;
-}
-
 static void pvebackup_propagate_error(Error *err)
 {
     qemu_mutex_lock(&backup_state.stat.lock);
@@ -272,18 +253,6 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
             if (local_err != NULL) {
                 pvebackup_propagate_error(local_err);
             }
-        } else {
-            // on error or cancel we cannot ensure synchronization of dirty
-            // bitmaps with backup server, so remove all and do full backup next
-            GList *l = backup_state.di_list;
-            while (l) {
-                PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-                l = g_list_next(l);
-
-                if (di->bitmap) {
-                    bdrv_release_dirty_bitmap(di->bitmap);
-                }
-            }
         }
 
         proxmox_backup_disconnect(backup_state.pbs);
@@ -322,8 +291,6 @@ static void pvebackup_complete_cb(void *opaque, int ret)
 
     qemu_mutex_lock(&backup_state.backup_mutex);
 
-    di->completed = true;
-
     if (ret < 0) {
         Error *local_err = NULL;
         error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
@@ -336,20 +303,17 @@ static void pvebackup_complete_cb(void *opaque, int ret)
 
     block_on_coroutine_fn(pvebackup_complete_stream, di);
 
-    // remove self from job queue
+    // remove self from job list
     backup_state.di_list = g_list_remove(backup_state.di_list, di);
 
-    if (di->bitmap && ret < 0) {
-        // on error or cancel we cannot ensure synchronization of dirty
-        // bitmaps with backup server, so remove all and do full backup next
-        bdrv_release_dirty_bitmap(di->bitmap);
-    }
-
     g_free(di);
 
-    qemu_mutex_unlock(&backup_state.backup_mutex);
+    /* call cleanup if we're the last job */
+    if (!g_list_first(backup_state.di_list)) {
+        block_on_coroutine_fn(pvebackup_co_cleanup, NULL);
+    }
 
-    pvebackup_run_next_job();
+    qemu_mutex_unlock(&backup_state.backup_mutex);
 }
 
 static void pvebackup_cancel(void)
@@ -371,36 +335,28 @@ static void pvebackup_cancel(void)
         proxmox_backup_abort(backup_state.pbs, "backup canceled");
     }
 
+    /* it's enough to cancel one job in the transaction, the rest will follow
+     * automatically */
+    GList *bdi = g_list_first(backup_state.di_list);
+    BlockJob *cancel_job = bdi && bdi->data ?
+        ((PVEBackupDevInfo *)bdi->data)->job :
+        NULL;
+
+    /* ref the job before releasing the mutex, just to be safe */
+    if (cancel_job) {
+        job_ref(&cancel_job->job);
+    }
+
+    /* job_cancel_sync may enter the job, so we need to release the
+     * backup_mutex to avoid deadlock */
     qemu_mutex_unlock(&backup_state.backup_mutex);
 
-    for(;;) {
-
-        BlockJob *next_job = NULL;
-
-        qemu_mutex_lock(&backup_state.backup_mutex);
-
-        GList *l = backup_state.di_list;
-        while (l) {
-            PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-            l = g_list_next(l);
-
-            BlockJob *job = lookup_active_block_job(di);
-            if (job != NULL) {
-                next_job = job;
-                break;
-            }
-        }
-
-        qemu_mutex_unlock(&backup_state.backup_mutex);
-
-        if (next_job) {
-            AioContext *aio_context = next_job->job.aio_context;
-            aio_context_acquire(aio_context);
-            job_cancel_sync(&next_job->job);
-            aio_context_release(aio_context);
-        } else {
-            break;
-        }
+    if (cancel_job) {
+        AioContext *aio_context = cancel_job->job.aio_context;
+        aio_context_acquire(aio_context);
+        job_cancel_sync(&cancel_job->job);
+        job_unref(&cancel_job->job);
+        aio_context_release(aio_context);
     }
 }
 
@@ -459,51 +415,19 @@ static int coroutine_fn pvebackup_co_add_config(
     goto out;
 }
 
-bool job_should_pause(Job *job);
-
-static void pvebackup_run_next_job(void)
-{
-    assert(!qemu_in_coroutine());
-
-    qemu_mutex_lock(&backup_state.backup_mutex);
-
-    GList *l = backup_state.di_list;
-    while (l) {
-        PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-        l = g_list_next(l);
-
-        BlockJob *job = lookup_active_block_job(di);
-
-        if (job) {
-            qemu_mutex_unlock(&backup_state.backup_mutex);
-
-            AioContext *aio_context = job->job.aio_context;
-            aio_context_acquire(aio_context);
-
-            if (job_should_pause(&job->job)) {
-                bool error_or_canceled = pvebackup_error_or_canceled();
-                if (error_or_canceled) {
-                    job_cancel_sync(&job->job);
-                } else {
-                    job_resume(&job->job);
-                }
-            }
-            aio_context_release(aio_context);
-            return;
-        }
-    }
-
-    block_on_coroutine_fn(pvebackup_co_cleanup, NULL); // no more jobs, run cleanup
-
-    qemu_mutex_unlock(&backup_state.backup_mutex);
-}
-
 static bool create_backup_jobs(void) {
 
     assert(!qemu_in_coroutine());
 
     Error *local_err = NULL;
 
+    /* create job transaction to synchronize bitmap commit and cancel all
+     * jobs in case one errors */
+    if (backup_state.txn) {
+        job_txn_unref(backup_state.txn);
+    }
+    backup_state.txn = job_txn_new_seq();
+
     /* create and start all jobs (paused state) */
     GList *l =  backup_state.di_list;
     while (l) {
@@ -524,7 +448,7 @@ static bool create_backup_jobs(void) {
         BlockJob *job = backup_job_create(
             NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
             bitmap_mode, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
-            JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
+            JOB_DEFAULT, pvebackup_complete_cb, di, 0, backup_state.txn, &local_err);
 
         aio_context_release(aio_context);
 
@@ -536,7 +460,8 @@ static bool create_backup_jobs(void) {
             pvebackup_propagate_error(create_job_err);
             break;
         }
-        job_start(&job->job);
+
+        di->job = job;
 
         bdrv_unref(di->target);
         di->target = NULL;
@@ -554,6 +479,10 @@ static bool create_backup_jobs(void) {
                 bdrv_unref(di->target);
                 di->target = NULL;
             }
+
+            if (di->job) {
+                job_unref(&di->job->job);
+            }
         }
     }
 
@@ -944,10 +873,6 @@ err:
         PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
         l = g_list_next(l);
 
-        if (di->bitmap) {
-            bdrv_release_dirty_bitmap(di->bitmap);
-        }
-
         if (di->target) {
             bdrv_unref(di->target);
         }
@@ -1036,9 +961,15 @@ UuidInfo *qmp_backup(
     block_on_coroutine_fn(pvebackup_co_prepare, &task);
 
     if (*errp == NULL) {
-        create_backup_jobs();
+        bool errors = create_backup_jobs();
         qemu_mutex_unlock(&backup_state.backup_mutex);
-        pvebackup_run_next_job();
+
+        if (!errors) {
+            /* start the first job in the transaction
+             * note: this might directly enter the job, so we need to do this
+             * after unlocking the backup_mutex */
+            job_txn_start_seq(backup_state.txn);
+        }
     } else {
         qemu_mutex_unlock(&backup_state.backup_mutex);
     }
-- 
2.20.1





