public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

The first two patches are unrelated cleanups of the QEMU patches-in-patches;
patch 3 depends on them, though.

v2:
* incorporated review feedback on qemu-server patches
* updated and rebased QEMU patches for 5.2
* fixed live-restore to Ceph RBD - note that restoring to user-space Ceph is
  still rather slow; I suspect it's the same bug that affects snapshot writing
  as well, and bigger buffers only make it slightly better

Original cover letter:

"live-restore" allows starting a VM immediately from a backup snapshot, no
waiting for a long restore process. This is made possible with QEMU backing
images, i.e. data is read from the backup which is attached to the VM as a
drive, but new data is written to the destination, while a background process
('block-stream') copies over data in a linear fashion as well.

QEMU backing images are normally only supported for qcow2 images, but since the
destination always starts out empty, we can use a dirty bitmap to achieve the
same effect - this is implemented as the 'alloc-track' driver in the 'qemu' part
of the series.
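
To make the mechanism concrete, here is a minimal, self-contained sketch of the
idea - purely illustrative C, not the actual alloc-track driver; the cluster
size, struct layout and function names are assumptions made up for the example:

    /*
     * Sketch of the alloc-track idea: the destination starts out empty, a
     * bitmap records which clusters the guest has already written, reads fall
     * through to the backup for unwritten clusters, and a background "stream"
     * pass copies the rest.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define CLUSTERS 8
    #define CLUSTER_SIZE 4

    typedef struct {
        uint8_t backup[CLUSTERS][CLUSTER_SIZE];  /* read-only backup snapshot */
        uint8_t dest[CLUSTERS][CLUSTER_SIZE];    /* restore target, starts empty */
        bool allocated[CLUSTERS];                /* "dirty bitmap": written to dest? */
    } TrackedDisk;

    static void track_write(TrackedDisk *d, unsigned cluster, const uint8_t *buf)
    {
        memcpy(d->dest[cluster], buf, CLUSTER_SIZE);
        d->allocated[cluster] = true;            /* guest data wins from now on */
    }

    static void track_read(TrackedDisk *d, unsigned cluster, uint8_t *buf)
    {
        const uint8_t *src = d->allocated[cluster] ? d->dest[cluster]
                                                   : d->backup[cluster];
        memcpy(buf, src, CLUSTER_SIZE);
    }

    /* background job: linearly copy not-yet-allocated clusters, like block-stream */
    static void track_stream(TrackedDisk *d)
    {
        for (unsigned c = 0; c < CLUSTERS; c++) {
            if (!d->allocated[c]) {
                track_write(d, c, d->backup[c]);
            }
        }
    }

    int main(void)
    {
        TrackedDisk d = {0};
        memset(d.backup, 0xAB, sizeof(d.backup));      /* pretend backup content */

        uint8_t new_data[CLUSTER_SIZE] = {1, 2, 3, 4};
        track_write(&d, 2, new_data);                  /* guest writes during restore */

        uint8_t out[CLUSTER_SIZE];
        track_read(&d, 0, out);                        /* served from the backup */
        track_read(&d, 2, out);                        /* served from the destination */

        track_stream(&d);                              /* restore finishes in background */
        printf("all clusters allocated: %d\n", d.allocated[0] && d.allocated[7]);
        return 0;
    }

The actual driver of course operates on BlockDriverState nodes and QEMU's dirty
bitmap infrastructure rather than on in-memory buffers.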

The Rust part of the equation is adjusted to provide (quite a lot) more caching,
as mixing random reads/writes from the guest with the linear reads from the
background process (both of which may use read sizes smaller or bigger than a
single chunk) would thrash performance without large buffers.
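
As a rough illustration of why the cache matters, here is a tiny,
self-contained LRU sketch in C - the real implementation is the Rust
RemoteChunkReader; the slot count, chunk size and all names here are invented
for the example:

    /*
     * Conceptual sketch only: a tiny fixed-capacity, least-recently-used chunk
     * cache, showing how repeated guest reads and the linear stream can be
     * served from memory instead of re-fetching chunks over the network.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define CACHE_SLOTS 4
    #define CHUNK_SIZE  8

    typedef struct {
        uint64_t chunk_idx;              /* which chunk this slot holds */
        uint64_t last_used;              /* logical clock for LRU eviction */
        int      valid;
        uint8_t  data[CHUNK_SIZE];
    } CacheSlot;

    static CacheSlot cache[CACHE_SLOTS];
    static uint64_t clock_tick;

    /* stand-in for the (slow) network fetch of one chunk from the backup server */
    static void fetch_chunk(uint64_t chunk_idx, uint8_t *buf)
    {
        memset(buf, (int)(chunk_idx & 0xff), CHUNK_SIZE);
    }

    static const uint8_t *read_chunk_cached(uint64_t chunk_idx)
    {
        CacheSlot *victim = &cache[0];
        for (int i = 0; i < CACHE_SLOTS; i++) {
            if (cache[i].valid && cache[i].chunk_idx == chunk_idx) {
                cache[i].last_used = ++clock_tick;      /* cache hit */
                return cache[i].data;
            }
            if (!cache[i].valid || cache[i].last_used < victim->last_used) {
                victim = &cache[i];                     /* least recently used */
            }
        }
        fetch_chunk(chunk_idx, victim->data);           /* cache miss: fetch + evict */
        victim->chunk_idx = chunk_idx;
        victim->valid = 1;
        victim->last_used = ++clock_tick;
        return victim->data;
    }

    int main(void)
    {
        read_chunk_cached(7);       /* miss */
        read_chunk_cached(7);       /* hit: guest re-reads the same chunk */
        read_chunk_cached(0);       /* miss: linear stream starts at chunk 0 */
        printf("chunk 7 first byte: %u\n", read_chunk_cached(7)[0]);
        return 0;
    }

In the series itself the cache is simply made large enough that the guest's
random access pattern and the linear stream don't constantly evict each other's
chunks.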

I've marked the feature as 'experimental' in the GUI for now, as I'm sure there
are a lot of edge cases I've missed in testing, and there's also the possibility
of data loss, since anything the VM writes during the restore is discarded if
the restore fails.


pve-qemu: Stefan Reiter (3):
  clean up pve/ patches by merging
  move bitmap-mirror patches to seperate folder
  add alloc-track block driver patch

 ...-support-for-sync-bitmap-mode-never.patch} |  30 +-
 ...support-for-conditional-and-always-.patch} |   0
 ...heck-for-bitmap-mode-without-bitmap.patch} |   4 +-
 ...to-bdrv_dirty_bitmap_merge_internal.patch} |   0
 ...-iotests-add-test-for-bitmap-mirror.patch} |   0
 ...0006-mirror-move-some-checks-to-qmp.patch} |   4 +-
 ...le-posix-make-locking-optiono-on-cre.patch |   4 +-
 ...-Backup-add-backup-dump-block-driver.patch |   2 +-
 ...ckup-proxmox-backup-patches-for-qemu.patch | 671 ++++++-------
 ...estore-new-command-to-restore-from-p.patch |  18 +-
 ...rty-bitmap-tracking-for-incremental.patch} |  80 +-
 ...-coroutines-to-fix-AIO-freeze-cleanu.patch | 914 ------------------
 .../pve/0031-PVE-various-PBS-fixes.patch      | 218 +++++
 ...-driver-to-map-backup-archives-into.patch} |   0
 ...d-query_proxmox_support-QMP-command.patch} |   4 +-
 ...-add-query-pbs-bitmap-info-QMP-call.patch} |   0
 ...t-stderr-to-journal-when-daemonized.patch} |   0
 ...-sequential-job-transaction-support.patch} |  20 +-
 ...transaction-to-synchronize-job-stat.patch} |   0
 ...block-on-finishing-and-cleanup-crea.patch} | 245 +++--
 ...name-incremental-to-use-dirty-bitmap.patch | 126 ---
 ...grate-dirty-bitmap-state-via-savevm.patch} |   0
 .../pve/0039-PVE-fixup-pbs-restore-API.patch  |  44 -
 ...irty-counter-for-non-incremental-bac.patch |  30 -
 ...irty-bitmap-migrate-other-bitmaps-e.patch} |   0
 ...ll-back-to-open-iscsi-initiatorname.patch} |   0
 ...use-proxmox_backup_check_incremental.patch |  36 -
 ...outine-QMP-for-backup-cancel_backup.patch} |   0
 ...ckup-add-compress-and-encrypt-option.patch | 103 --
 ... => 0043-PBS-add-master-key-support.patch} |   0
 ...st-path-reads-without-allocation-if-.patch |  52 +
 ...PVE-block-stream-increase-chunk-size.patch |  23 +
 ...issing-crypt-and-compress-parameters.patch |  43 -
 ...rite-callback-with-big-blocks-correc.patch |  76 --
 ...accept-NULL-qiov-in-bdrv_pad_request.patch |  42 +
 ...-block-handling-to-PBS-dump-callback.patch |  85 --
 .../0047-block-add-alloc-track-driver.patch   | 380 ++++++++
 ...n-up-error-handling-for-create_backu.patch | 187 ----
 ...-multiple-CREATED-jobs-in-sequential.patch |  39 -
 debian/patches/series                         |  54 +-
 40 files changed, 1356 insertions(+), 2178 deletions(-)
 rename debian/patches/{pve/0031-drive-mirror-add-support-for-sync-bitmap-mode-never.patch => bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch} (96%)
 rename debian/patches/{pve/0032-drive-mirror-add-support-for-conditional-and-always-.patch => bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch} (100%)
 rename debian/patches/{pve/0033-mirror-add-check-for-bitmap-mode-without-bitmap.patch => bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch} (90%)
 rename debian/patches/{pve/0034-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch => bitmap-mirror/0004-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch} (100%)
 rename debian/patches/{pve/0035-iotests-add-test-for-bitmap-mirror.patch => bitmap-mirror/0005-iotests-add-test-for-bitmap-mirror.patch} (100%)
 rename debian/patches/{pve/0036-mirror-move-some-checks-to-qmp.patch => bitmap-mirror/0006-mirror-move-some-checks-to-qmp.patch} (99%)
 rename debian/patches/pve/{0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch => 0030-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch} (88%)
 delete mode 100644 debian/patches/pve/0030-PVE-Backup-avoid-coroutines-to-fix-AIO-freeze-cleanu.patch
 create mode 100644 debian/patches/pve/0031-PVE-various-PBS-fixes.patch
 rename debian/patches/pve/{0043-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch => 0032-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch} (100%)
 rename debian/patches/pve/{0044-PVE-add-query_proxmox_support-QMP-command.patch => 0033-PVE-add-query_proxmox_support-QMP-command.patch} (94%)
 rename debian/patches/pve/{0048-PVE-add-query-pbs-bitmap-info-QMP-call.patch => 0034-PVE-add-query-pbs-bitmap-info-QMP-call.patch} (100%)
 rename debian/patches/pve/{0049-PVE-redirect-stderr-to-journal-when-daemonized.patch => 0035-PVE-redirect-stderr-to-journal-when-daemonized.patch} (100%)
 rename debian/patches/pve/{0050-PVE-Add-sequential-job-transaction-support.patch => 0036-PVE-Add-sequential-job-transaction-support.patch} (75%)
 rename debian/patches/pve/{0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch => 0037-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch} (100%)
 rename debian/patches/pve/{0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch => 0038-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch} (63%)
 delete mode 100644 debian/patches/pve/0038-PVE-backup-rename-incremental-to-use-dirty-bitmap.patch
 rename debian/patches/pve/{0054-PVE-Migrate-dirty-bitmap-state-via-savevm.patch => 0039-PVE-Migrate-dirty-bitmap-state-via-savevm.patch} (100%)
 delete mode 100644 debian/patches/pve/0039-PVE-fixup-pbs-restore-API.patch
 delete mode 100644 debian/patches/pve/0040-PVE-always-set-dirty-counter-for-non-incremental-bac.patch
 rename debian/patches/pve/{0055-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch => 0040-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch} (100%)
 rename debian/patches/pve/{0057-PVE-fall-back-to-open-iscsi-initiatorname.patch => 0041-PVE-fall-back-to-open-iscsi-initiatorname.patch} (100%)
 delete mode 100644 debian/patches/pve/0041-PVE-use-proxmox_backup_check_incremental.patch
 rename debian/patches/pve/{0058-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch => 0042-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch} (100%)
 delete mode 100644 debian/patches/pve/0042-PVE-fixup-pbs-backup-add-compress-and-encrypt-option.patch
 rename debian/patches/pve/{0059-PBS-add-master-key-support.patch => 0043-PBS-add-master-key-support.patch} (100%)
 create mode 100644 debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
 create mode 100644 debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
 delete mode 100644 debian/patches/pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch
 delete mode 100644 debian/patches/pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch
 create mode 100644 debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
 delete mode 100644 debian/patches/pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch
 create mode 100644 debian/patches/pve/0047-block-add-alloc-track-driver.patch
 delete mode 100644 debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
 delete mode 100644 debian/patches/pve/0056-PVE-fix-aborting-multiple-CREATED-jobs-in-sequential.patch

proxmox-backup: Stefan Reiter (1):
  RemoteChunkReader: add LRU cached variant

 src/bin/proxmox_backup_client/mount.rs |  4 +-
 src/client/remote_chunk_reader.rs      | 77 ++++++++++++++++++++------
 2 files changed, 62 insertions(+), 19 deletions(-)

proxmox-backup-qemu: Stefan Reiter (1):
  access: use bigger cache and LRU chunk reader

 src/restore.rs | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

qemu-server: Stefan Reiter (5):
  make qemu_drive_mirror_monitor more generic
  cfg2cmd: allow PBS snapshots as backing files for drives
  enable live-restore for PBS
  extract register_qmeventd_handle to QemuServer.pm
  live-restore: register qmeventd handle

 PVE/API2/Qemu.pm         |  14 +-
 PVE/QemuServer.pm        | 297 ++++++++++++++++++++++++++++++++-------
 PVE/VZDump/QemuServer.pm |  32 +----
 3 files changed, 259 insertions(+), 84 deletions(-)

manager: Stefan Reiter (1):
  ui: restore: add live-restore checkbox

 www/manager6/grid/BackupView.js    |  6 ++++-
 www/manager6/storage/BackupView.js |  5 +++-
 www/manager6/window/Restore.js     | 38 +++++++++++++++++++++++++++++-
 3 files changed, 46 insertions(+), 3 deletions(-)

-- 
2.20.1

* [pve-devel] [PATCH v2 pve-qemu 01/11] clean up pve/ patches by merging
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

No functional change intended.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Unrelated to the rest of the series.

 ...ckup-proxmox-backup-patches-for-qemu.patch | 665 ++++++-------
 ...estore-new-command-to-restore-from-p.patch |  18 +-
 ...-coroutines-to-fix-AIO-freeze-cleanu.patch | 914 ------------------
 ...-support-for-sync-bitmap-mode-never.patch} |   0
 ...support-for-conditional-and-always-.patch} |   0
 ...heck-for-bitmap-mode-without-bitmap.patch} |   0
 ...to-bdrv_dirty_bitmap_merge_internal.patch} |   0
 ...-iotests-add-test-for-bitmap-mirror.patch} |   0
 ...0035-mirror-move-some-checks-to-qmp.patch} |   0
 ...rty-bitmap-tracking-for-incremental.patch} |  80 +-
 .../pve/0037-PVE-various-PBS-fixes.patch      | 218 +++++
 ...-driver-to-map-backup-archives-into.patch} |   0
 ...name-incremental-to-use-dirty-bitmap.patch | 126 ---
 ...d-query_proxmox_support-QMP-command.patch} |   4 +-
 .../pve/0039-PVE-fixup-pbs-restore-API.patch  |  44 -
 ...-add-query-pbs-bitmap-info-QMP-call.patch} |   0
 ...irty-counter-for-non-incremental-bac.patch |  30 -
 ...t-stderr-to-journal-when-daemonized.patch} |   0
 ...use-proxmox_backup_check_incremental.patch |  36 -
 ...-sequential-job-transaction-support.patch} |  20 +-
 ...ckup-add-compress-and-encrypt-option.patch | 103 --
 ...transaction-to-synchronize-job-stat.patch} |   0
 ...block-on-finishing-and-cleanup-crea.patch} | 245 +++--
 ...grate-dirty-bitmap-state-via-savevm.patch} |   0
 ...issing-crypt-and-compress-parameters.patch |  43 -
 ...rite-callback-with-big-blocks-correc.patch |  76 --
 ...irty-bitmap-migrate-other-bitmaps-e.patch} |   0
 ...-block-handling-to-PBS-dump-callback.patch |  85 --
 ...ll-back-to-open-iscsi-initiatorname.patch} |   0
 ...outine-QMP-for-backup-cancel_backup.patch} |   0
 ... => 0049-PBS-add-master-key-support.patch} |   0
 ...n-up-error-handling-for-create_backu.patch | 187 ----
 ...-multiple-CREATED-jobs-in-sequential.patch |  39 -
 debian/patches/series                         |  50 +-
 34 files changed, 830 insertions(+), 2153 deletions(-)
 delete mode 100644 debian/patches/pve/0030-PVE-Backup-avoid-coroutines-to-fix-AIO-freeze-cleanu.patch
 rename debian/patches/pve/{0031-drive-mirror-add-support-for-sync-bitmap-mode-never.patch => 0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch} (100%)
 rename debian/patches/pve/{0032-drive-mirror-add-support-for-conditional-and-always-.patch => 0031-drive-mirror-add-support-for-conditional-and-always-.patch} (100%)
 rename debian/patches/pve/{0033-mirror-add-check-for-bitmap-mode-without-bitmap.patch => 0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch} (100%)
 rename debian/patches/pve/{0034-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch => 0033-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch} (100%)
 rename debian/patches/pve/{0035-iotests-add-test-for-bitmap-mirror.patch => 0034-iotests-add-test-for-bitmap-mirror.patch} (100%)
 rename debian/patches/pve/{0036-mirror-move-some-checks-to-qmp.patch => 0035-mirror-move-some-checks-to-qmp.patch} (100%)
 rename debian/patches/pve/{0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch => 0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch} (88%)
 create mode 100644 debian/patches/pve/0037-PVE-various-PBS-fixes.patch
 rename debian/patches/pve/{0043-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch => 0038-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch} (100%)
 delete mode 100644 debian/patches/pve/0038-PVE-backup-rename-incremental-to-use-dirty-bitmap.patch
 rename debian/patches/pve/{0044-PVE-add-query_proxmox_support-QMP-command.patch => 0039-PVE-add-query_proxmox_support-QMP-command.patch} (94%)
 delete mode 100644 debian/patches/pve/0039-PVE-fixup-pbs-restore-API.patch
 rename debian/patches/pve/{0048-PVE-add-query-pbs-bitmap-info-QMP-call.patch => 0040-PVE-add-query-pbs-bitmap-info-QMP-call.patch} (100%)
 delete mode 100644 debian/patches/pve/0040-PVE-always-set-dirty-counter-for-non-incremental-bac.patch
 rename debian/patches/pve/{0049-PVE-redirect-stderr-to-journal-when-daemonized.patch => 0041-PVE-redirect-stderr-to-journal-when-daemonized.patch} (100%)
 delete mode 100644 debian/patches/pve/0041-PVE-use-proxmox_backup_check_incremental.patch
 rename debian/patches/pve/{0050-PVE-Add-sequential-job-transaction-support.patch => 0042-PVE-Add-sequential-job-transaction-support.patch} (75%)
 delete mode 100644 debian/patches/pve/0042-PVE-fixup-pbs-backup-add-compress-and-encrypt-option.patch
 rename debian/patches/pve/{0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch => 0043-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch} (100%)
 rename debian/patches/pve/{0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch => 0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch} (63%)
 rename debian/patches/pve/{0054-PVE-Migrate-dirty-bitmap-state-via-savevm.patch => 0045-PVE-Migrate-dirty-bitmap-state-via-savevm.patch} (100%)
 delete mode 100644 debian/patches/pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch
 delete mode 100644 debian/patches/pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch
 rename debian/patches/pve/{0055-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch => 0046-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch} (100%)
 delete mode 100644 debian/patches/pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch
 rename debian/patches/pve/{0057-PVE-fall-back-to-open-iscsi-initiatorname.patch => 0047-PVE-fall-back-to-open-iscsi-initiatorname.patch} (100%)
 rename debian/patches/pve/{0058-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch => 0048-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch} (100%)
 rename debian/patches/pve/{0059-PBS-add-master-key-support.patch => 0049-PBS-add-master-key-support.patch} (100%)
 delete mode 100644 debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
 delete mode 100644 debian/patches/pve/0056-PVE-fix-aborting-multiple-CREATED-jobs-in-sequential.patch

diff --git a/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch b/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
index cb8334e..37bb98a 100644
--- a/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
+++ b/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
@@ -3,6 +3,9 @@ From: Dietmar Maurer <dietmar@proxmox.com>
 Date: Mon, 6 Apr 2020 12:16:59 +0200
 Subject: [PATCH] PVE-Backup: proxmox backup patches for qemu
 
+Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
+[PVE-Backup: avoid coroutines to fix AIO freeze, cleanups]
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
 ---
  block/meson.build              |   5 +
  block/monitor/block-hmp-cmds.c |  33 ++
@@ -15,11 +18,11 @@ Subject: [PATCH] PVE-Backup: proxmox backup patches for qemu
  monitor/hmp-cmds.c             |  44 ++
  proxmox-backup-client.c        | 176 ++++++
  proxmox-backup-client.h        |  59 ++
- pve-backup.c                   | 955 +++++++++++++++++++++++++++++++++
+ pve-backup.c                   | 957 +++++++++++++++++++++++++++++++++
  qapi/block-core.json           | 109 ++++
  qapi/common.json               |  13 +
  qapi/machine.json              |  15 +-
- 15 files changed, 1444 insertions(+), 14 deletions(-)
+ 15 files changed, 1446 insertions(+), 14 deletions(-)
  create mode 100644 proxmox-backup-client.c
  create mode 100644 proxmox-backup-client.h
  create mode 100644 pve-backup.c
@@ -507,10 +510,10 @@ index 0000000000..1dda8b7d8f
 +#endif /* PROXMOX_BACKUP_CLIENT_H */
 diff --git a/pve-backup.c b/pve-backup.c
 new file mode 100644
-index 0000000000..55441eb9d1
+index 0000000000..d40f3f2fd6
 --- /dev/null
 +++ b/pve-backup.c
-@@ -0,0 +1,955 @@
+@@ -0,0 +1,957 @@
 +#include "proxmox-backup-client.h"
 +#include "vma.h"
 +
@@ -524,11 +527,27 @@ index 0000000000..55441eb9d1
 +
 +/* PVE backup state and related function */
 +
++/*
++ * Note: A resume from a qemu_coroutine_yield can happen in a different thread,
++ * so you may not use normal mutexes within coroutines:
++ *
++ * ---bad-example---
++ * qemu_rec_mutex_lock(lock)
++ * ...
++ * qemu_coroutine_yield() // wait for something
++ * // we are now inside a different thread
++ * qemu_rec_mutex_unlock(lock) // Crash - wrong thread!!
++ * ---end-bad-example--
++ *
++ * ==> Always use CoMutext inside coroutines.
++ * ==> Never acquire/release AioContext withing coroutines (because that use QemuRecMutex)
++ *
++ */
 +
 +static struct PVEBackupState {
 +    struct {
-+        // Everithing accessed from qmp command, protected using rwlock
-+        CoRwlock rwlock;
++        // Everithing accessed from qmp_backup_query command is protected using lock
++        QemuMutex lock;
 +        Error *error;
 +        time_t start_time;
 +        time_t end_time;
@@ -538,19 +557,20 @@ index 0000000000..55441eb9d1
 +        size_t total;
 +        size_t transferred;
 +        size_t zero_bytes;
-+        bool cancel;
 +    } stat;
 +    int64_t speed;
 +    VmaWriter *vmaw;
 +    ProxmoxBackupHandle *pbs;
 +    GList *di_list;
-+    CoMutex backup_mutex;
++    QemuMutex backup_mutex;
++    CoMutex dump_callback_mutex;
 +} backup_state;
 +
 +static void pvebackup_init(void)
 +{
-+    qemu_co_rwlock_init(&backup_state.stat.rwlock);
-+    qemu_co_mutex_init(&backup_state.backup_mutex);
++    qemu_mutex_init(&backup_state.stat.lock);
++    qemu_mutex_init(&backup_state.backup_mutex);
++    qemu_co_mutex_init(&backup_state.dump_callback_mutex);
 +}
 +
 +// initialize PVEBackupState at startup
@@ -565,10 +585,54 @@ index 0000000000..55441eb9d1
 +    BlockDriverState *target;
 +} PVEBackupDevInfo;
 +
-+static void pvebackup_co_run_next_job(void);
++static void pvebackup_run_next_job(void);
 +
++static BlockJob *
++lookup_active_block_job(PVEBackupDevInfo *di)
++{
++    if (!di->completed && di->bs) {
++        for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
++            if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
++                continue;
++            }
++
++            BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
++            if (bjob && bjob->source_bs == di->bs) {
++                return job;
++            }
++        }
++    }
++    return NULL;
++}
++
++static void pvebackup_propagate_error(Error *err)
++{
++    qemu_mutex_lock(&backup_state.stat.lock);
++    error_propagate(&backup_state.stat.error, err);
++    qemu_mutex_unlock(&backup_state.stat.lock);
++}
++
++static bool pvebackup_error_or_canceled(void)
++{
++    qemu_mutex_lock(&backup_state.stat.lock);
++    bool error_or_canceled = !!backup_state.stat.error;
++    qemu_mutex_unlock(&backup_state.stat.lock);
++
++    return error_or_canceled;
++}
++
++static void pvebackup_add_transfered_bytes(size_t transferred, size_t zero_bytes)
++{
++    qemu_mutex_lock(&backup_state.stat.lock);
++    backup_state.stat.zero_bytes += zero_bytes;
++    backup_state.stat.transferred += transferred;
++    qemu_mutex_unlock(&backup_state.stat.lock);
++}
++
++// This may get called from multiple coroutines in multiple io-threads
++// Note1: this may get called after job_cancel()
 +static int coroutine_fn
-+pvebackup_co_dump_cb(
++pvebackup_co_dump_pbs_cb(
 +    void *opaque,
 +    uint64_t start,
 +    uint64_t bytes,
@@ -580,137 +644,127 @@ index 0000000000..55441eb9d1
 +    const unsigned char *buf = pbuf;
 +    PVEBackupDevInfo *di = opaque;
 +
-+    qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
-+    bool cancel = backup_state.stat.cancel;
-+    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
++    assert(backup_state.pbs);
 +
-+    if (cancel) {
-+        return size; // return success
++    Error *local_err = NULL;
++    int pbs_res = -1;
++
++    qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
++
++    // avoid deadlock if job is cancelled
++    if (pvebackup_error_or_canceled()) {
++        qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
++        return -1;
 +    }
 +
-+    qemu_co_mutex_lock(&backup_state.backup_mutex);
-+
-+    int ret = -1;
-+
-+    if (backup_state.vmaw) {
-+        size_t zero_bytes = 0;
-+        uint64_t remaining = size;
-+
-+        uint64_t cluster_num = start / VMA_CLUSTER_SIZE;
-+        if ((cluster_num * VMA_CLUSTER_SIZE) != start) {
-+            qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
-+            if (!backup_state.stat.error) {
-+                qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
-+                error_setg(&backup_state.stat.error,
-+                           "got unaligned write inside backup dump "
-+                           "callback (sector %ld)", start);
-+            }
-+            qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+            qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+            return -1; // not aligned to cluster size
-+        }
-+
-+        while (remaining > 0) {
-+            ret = vma_writer_write(backup_state.vmaw, di->dev_id, cluster_num,
-+                                   buf, &zero_bytes);
-+            ++cluster_num;
-+            if (buf) {
-+                buf += VMA_CLUSTER_SIZE;
-+            }
-+            if (ret < 0) {
-+                qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
-+                if (!backup_state.stat.error) {
-+                    qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
-+                    vma_writer_error_propagate(backup_state.vmaw, &backup_state.stat.error);
-+                }
-+                qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+
-+                qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+                return ret;
-+            } else {
-+                qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-+                backup_state.stat.zero_bytes += zero_bytes;
-+                if (remaining >= VMA_CLUSTER_SIZE) {
-+                    backup_state.stat.transferred += VMA_CLUSTER_SIZE;
-+                    remaining -= VMA_CLUSTER_SIZE;
-+                } else {
-+                    backup_state.stat.transferred += remaining;
-+                    remaining = 0;
-+                }
-+                qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+            }
-+        }
-+    } else if (backup_state.pbs) {
-+        Error *local_err = NULL;
-+        int pbs_res = -1;
-+
-+        pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
-+
-+        qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-+
-+        if (pbs_res < 0) {
-+            error_propagate(&backup_state.stat.error, local_err);
-+            qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+            qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+            return pbs_res;
-+        } else {
-+            if (!buf) {
-+                backup_state.stat.zero_bytes += size;
-+            }
-+            backup_state.stat.transferred += size;
-+        }
-+
-+        qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
++    pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
++    qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
 +
++    if (pbs_res < 0) {
++        pvebackup_propagate_error(local_err);
++        return pbs_res;
 +    } else {
-+        qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-+        if (!buf) {
-+            backup_state.stat.zero_bytes += size;
-+        }
-+        backup_state.stat.transferred += size;
-+        qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
++        pvebackup_add_transfered_bytes(size, !buf ? size : 0);
 +    }
 +
-+    qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+
 +    return size;
 +}
 +
-+static void coroutine_fn pvebackup_co_cleanup(void)
++// This may get called from multiple coroutines in multiple io-threads
++static int coroutine_fn
++pvebackup_co_dump_vma_cb(
++    void *opaque,
++    uint64_t start,
++    uint64_t bytes,
++    const void *pbuf)
 +{
 +    assert(qemu_in_coroutine());
 +
-+    qemu_co_mutex_lock(&backup_state.backup_mutex);
++    const uint64_t size = bytes;
++    const unsigned char *buf = pbuf;
++    PVEBackupDevInfo *di = opaque;
 +
-+    qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
++    int ret = -1;
++
++    assert(backup_state.vmaw);
++
++    uint64_t remaining = size;
++
++    uint64_t cluster_num = start / VMA_CLUSTER_SIZE;
++    if ((cluster_num * VMA_CLUSTER_SIZE) != start) {
++        Error *local_err = NULL;
++        error_setg(&local_err,
++                   "got unaligned write inside backup dump "
++                   "callback (sector %ld)", start);
++        pvebackup_propagate_error(local_err);
++        return -1; // not aligned to cluster size
++    }
++
++    while (remaining > 0) {
++        qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
++        // avoid deadlock if job is cancelled
++        if (pvebackup_error_or_canceled()) {
++            qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
++            return -1;
++        }
++
++        size_t zero_bytes = 0;
++        ret = vma_writer_write(backup_state.vmaw, di->dev_id, cluster_num, buf, &zero_bytes);
++        qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
++
++        ++cluster_num;
++        if (buf) {
++            buf += VMA_CLUSTER_SIZE;
++        }
++        if (ret < 0) {
++            Error *local_err = NULL;
++            vma_writer_error_propagate(backup_state.vmaw, &local_err);
++            pvebackup_propagate_error(local_err);
++            return ret;
++        } else {
++            if (remaining >= VMA_CLUSTER_SIZE) {
++                assert(ret == VMA_CLUSTER_SIZE);
++                pvebackup_add_transfered_bytes(VMA_CLUSTER_SIZE, zero_bytes);
++                remaining -= VMA_CLUSTER_SIZE;
++            } else {
++                assert(ret == remaining);
++                pvebackup_add_transfered_bytes(remaining, zero_bytes);
++                remaining = 0;
++            }
++        }
++    }
++
++    return size;
++}
++
++// assumes the caller holds backup_mutex
++static void coroutine_fn pvebackup_co_cleanup(void *unused)
++{
++    assert(qemu_in_coroutine());
++
++    qemu_mutex_lock(&backup_state.stat.lock);
 +    backup_state.stat.end_time = time(NULL);
-+    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
++    qemu_mutex_unlock(&backup_state.stat.lock);
 +
 +    if (backup_state.vmaw) {
 +        Error *local_err = NULL;
 +        vma_writer_close(backup_state.vmaw, &local_err);
 +
 +        if (local_err != NULL) {
-+            qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-+            error_propagate(&backup_state.stat.error, local_err);
-+            qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+        }
++            pvebackup_propagate_error(local_err);
++         }
 +
 +        backup_state.vmaw = NULL;
 +    }
 +
 +    if (backup_state.pbs) {
-+        qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
-+        bool error_or_canceled = backup_state.stat.error || backup_state.stat.cancel;
-+        if (!error_or_canceled) {
++        if (!pvebackup_error_or_canceled()) {
 +            Error *local_err = NULL;
 +            proxmox_backup_co_finish(backup_state.pbs, &local_err);
 +            if (local_err != NULL) {
-+                qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
-+                error_propagate(&backup_state.stat.error, local_err);
-+             }
++                pvebackup_propagate_error(local_err);
++            }
 +        }
-+        qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
 +
 +        proxmox_backup_disconnect(backup_state.pbs);
 +        backup_state.pbs = NULL;
@@ -718,43 +772,14 @@ index 0000000000..55441eb9d1
 +
 +    g_list_free(backup_state.di_list);
 +    backup_state.di_list = NULL;
-+    qemu_co_mutex_unlock(&backup_state.backup_mutex);
 +}
 +
-+typedef struct PVEBackupCompeteCallbackData {
-+    PVEBackupDevInfo *di;
-+    int result;
-+} PVEBackupCompeteCallbackData;
-+
-+static void coroutine_fn pvebackup_co_complete_cb(void *opaque)
++// assumes the caller holds backup_mutex
++static void coroutine_fn pvebackup_complete_stream(void *opaque)
 +{
-+    assert(qemu_in_coroutine());
++    PVEBackupDevInfo *di = opaque;
 +
-+    PVEBackupCompeteCallbackData *cb_data = opaque;
-+
-+    qemu_co_mutex_lock(&backup_state.backup_mutex);
-+
-+    PVEBackupDevInfo *di = cb_data->di;
-+    int ret = cb_data->result;
-+
-+    di->completed = true;
-+
-+    qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
-+    bool error_or_canceled = backup_state.stat.error || backup_state.stat.cancel;
-+
-+    if (ret < 0 && !backup_state.stat.error) {
-+        qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
-+        error_setg(&backup_state.stat.error, "job failed with err %d - %s",
-+                   ret, strerror(-ret));
-+    }
-+    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+
-+    di->bs = NULL;
-+
-+    if (di->target) {
-+        bdrv_unref(di->target);
-+        di->target = NULL;
-+    }
++    bool error_or_canceled = pvebackup_error_or_canceled();
 +
 +    if (backup_state.vmaw) {
 +        vma_writer_close_stream(backup_state.vmaw, di->dev_id);
@@ -764,110 +789,101 @@ index 0000000000..55441eb9d1
 +        Error *local_err = NULL;
 +        proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
 +        if (local_err != NULL) {
-+            qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-+            error_propagate(&backup_state.stat.error, local_err);
-+            qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
++            pvebackup_propagate_error(local_err);
 +        }
 +    }
-+
-+    // remove self from job queue
-+    backup_state.di_list = g_list_remove(backup_state.di_list, di);
-+    g_free(di);
-+
-+    int pending_jobs = g_list_length(backup_state.di_list);
-+
-+    qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+
-+    if (pending_jobs > 0) {
-+        pvebackup_co_run_next_job();
-+    } else {
-+        pvebackup_co_cleanup();
-+    }
 +}
 +
 +static void pvebackup_complete_cb(void *opaque, int ret)
 +{
-+    // This can be called from the main loop, or from a coroutine
-+    PVEBackupCompeteCallbackData cb_data = {
-+        .di = opaque,
-+        .result = ret,
-+    };
++    assert(!qemu_in_coroutine());
 +
-+    if (qemu_in_coroutine()) {
-+        pvebackup_co_complete_cb(&cb_data);
-+    } else {
-+        block_on_coroutine_fn(pvebackup_co_complete_cb, &cb_data);
++    PVEBackupDevInfo *di = opaque;
++
++    qemu_mutex_lock(&backup_state.backup_mutex);
++
++    di->completed = true;
++
++    if (ret < 0) {
++        Error *local_err = NULL;
++        error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
++        pvebackup_propagate_error(local_err);
 +    }
++
++    di->bs = NULL;
++
++    assert(di->target == NULL);
++
++    block_on_coroutine_fn(pvebackup_complete_stream, di);
++
++    // remove self from job queue
++    backup_state.di_list = g_list_remove(backup_state.di_list, di);
++
++    g_free(di);
++
++    qemu_mutex_unlock(&backup_state.backup_mutex);
++
++    pvebackup_run_next_job();
 +}
 +
-+static void coroutine_fn pvebackup_co_cancel(void *opaque)
++static void pvebackup_cancel(void)
 +{
-+    assert(qemu_in_coroutine());
++    assert(!qemu_in_coroutine());
 +
-+    qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-+    backup_state.stat.cancel = true;
-+    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
++    Error *cancel_err = NULL;
++    error_setg(&cancel_err, "backup canceled");
++    pvebackup_propagate_error(cancel_err);
 +
-+    qemu_co_mutex_lock(&backup_state.backup_mutex);
-+
-+    // Avoid race between block jobs and backup-cancel command:
-+    if (!(backup_state.vmaw || backup_state.pbs)) {
-+        qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+        return;
-+    }
-+
-+    qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
-+    if (!backup_state.stat.error) {
-+        qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
-+        error_setg(&backup_state.stat.error, "backup cancelled");
-+    }
-+    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
++    qemu_mutex_lock(&backup_state.backup_mutex);
 +
 +    if (backup_state.vmaw) {
 +        /* make sure vma writer does not block anymore */
-+        vma_writer_set_error(backup_state.vmaw, "backup cancelled");
++        vma_writer_set_error(backup_state.vmaw, "backup canceled");
 +    }
 +
 +    if (backup_state.pbs) {
-+        proxmox_backup_abort(backup_state.pbs, "backup cancelled");
++        proxmox_backup_abort(backup_state.pbs, "backup canceled");
 +    }
 +
-+    bool running_jobs = 0;
-+    GList *l = backup_state.di_list;
-+    while (l) {
-+        PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-+        l = g_list_next(l);
-+        if (!di->completed && di->bs) {
-+            for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
-+                if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
-+                    continue;
-+                }
++    qemu_mutex_unlock(&backup_state.backup_mutex);
 +
-+                BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
-+                if (bjob && bjob->source_bs == di->bs) {
-+                    AioContext *aio_context = job->job.aio_context;
-+                    aio_context_acquire(aio_context);
++    for(;;) {
 +
-+                    if (!di->completed) {
-+                        running_jobs += 1;
-+                        job_cancel(&job->job, false);
-+                    }
-+                    aio_context_release(aio_context);
-+                }
++        BlockJob *next_job = NULL;
++
++        qemu_mutex_lock(&backup_state.backup_mutex);
++
++        GList *l = backup_state.di_list;
++        while (l) {
++            PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
++            l = g_list_next(l);
++
++            BlockJob *job = lookup_active_block_job(di);
++            if (job != NULL) {
++                next_job = job;
++                break;
 +            }
 +        }
++
++        qemu_mutex_unlock(&backup_state.backup_mutex);
++
++        if (next_job) {
++            AioContext *aio_context = next_job->job.aio_context;
++            aio_context_acquire(aio_context);
++            job_cancel_sync(&next_job->job);
++            aio_context_release(aio_context);
++        } else {
++            break;
++        }
 +    }
-+
-+    qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+
-+    if (running_jobs == 0) pvebackup_co_cleanup(); // else job will call completion handler
 +}
 +
 +void qmp_backup_cancel(Error **errp)
 +{
-+    block_on_coroutine_fn(pvebackup_co_cancel, NULL);
++    pvebackup_cancel();
 +}
 +
++// assumes the caller holds backup_mutex
 +static int coroutine_fn pvebackup_co_add_config(
 +    const char *file,
 +    const char *name,
@@ -919,46 +935,97 @@ index 0000000000..55441eb9d1
 +
 +bool job_should_pause(Job *job);
 +
-+static void coroutine_fn pvebackup_co_run_next_job(void)
++static void pvebackup_run_next_job(void)
 +{
-+    assert(qemu_in_coroutine());
++    assert(!qemu_in_coroutine());
 +
-+    qemu_co_mutex_lock(&backup_state.backup_mutex);
++    qemu_mutex_lock(&backup_state.backup_mutex);
 +
 +    GList *l = backup_state.di_list;
 +    while (l) {
 +        PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
 +        l = g_list_next(l);
-+        if (!di->completed && di->bs) {
-+            for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
-+                if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
-+                    continue;
++
++        BlockJob *job = lookup_active_block_job(di);
++
++        if (job) {
++            qemu_mutex_unlock(&backup_state.backup_mutex);
++
++            AioContext *aio_context = job->job.aio_context;
++            aio_context_acquire(aio_context);
++
++            if (job_should_pause(&job->job)) {
++                bool error_or_canceled = pvebackup_error_or_canceled();
++                if (error_or_canceled) {
++                    job_cancel_sync(&job->job);
++                } else {
++                    job_resume(&job->job);
 +                }
++            }
++            aio_context_release(aio_context);
++            return;
++        }
++    }
 +
-+                BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
-+                if (bjob && bjob->source_bs == di->bs) {
-+                    AioContext *aio_context = job->job.aio_context;
-+                    qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+                    aio_context_acquire(aio_context);
++    block_on_coroutine_fn(pvebackup_co_cleanup, NULL); // no more jobs, run cleanup
 +
-+                    if (job_should_pause(&job->job)) {
-+                        qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
-+                        bool error_or_canceled = backup_state.stat.error || backup_state.stat.cancel;
-+                        qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
++    qemu_mutex_unlock(&backup_state.backup_mutex);
++}
 +
-+                        if (error_or_canceled) {
-+                            job_cancel(&job->job, false);
-+                        } else {
-+                            job_resume(&job->job);
-+                        }
-+                    }
-+                    aio_context_release(aio_context);
-+                    return;
-+                }
++static bool create_backup_jobs(void) {
++
++    assert(!qemu_in_coroutine());
++
++    Error *local_err = NULL;
++
++    /* create and start all jobs (paused state) */
++    GList *l =  backup_state.di_list;
++    while (l) {
++        PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
++        l = g_list_next(l);
++
++        assert(di->target != NULL);
++
++        AioContext *aio_context = bdrv_get_aio_context(di->bs);
++        aio_context_acquire(aio_context);
++
++        BlockJob *job = backup_job_create(
++            NULL, di->bs, di->target, backup_state.speed, MIRROR_SYNC_MODE_FULL, NULL,
++            BITMAP_SYNC_MODE_NEVER, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
++            JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
++
++        aio_context_release(aio_context);
++
++        if (!job || local_err != NULL) {
++            Error *create_job_err = NULL;
++            error_setg(&create_job_err, "backup_job_create failed: %s",
++                       local_err ? error_get_pretty(local_err) : "null");
++
++            pvebackup_propagate_error(create_job_err);
++            break;
++        }
++        job_start(&job->job);
++
++        bdrv_unref(di->target);
++        di->target = NULL;
++    }
++
++    bool errors = pvebackup_error_or_canceled();
++
++    if (errors) {
++        l = backup_state.di_list;
++        while (l) {
++            PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
++            l = g_list_next(l);
++
++            if (di->target) {
++                bdrv_unref(di->target);
++                di->target = NULL;
 +            }
 +        }
 +    }
-+    qemu_co_mutex_unlock(&backup_state.backup_mutex);
++
++    return errors;
 +}
 +
 +typedef struct QmpBackupTask {
@@ -989,7 +1056,8 @@ index 0000000000..55441eb9d1
 +    UuidInfo *result;
 +} QmpBackupTask;
 +
-+static void coroutine_fn pvebackup_co_start(void *opaque)
++// assumes the caller holds backup_mutex
++static void coroutine_fn pvebackup_co_prepare(void *opaque)
 +{
 +    assert(qemu_in_coroutine());
 +
@@ -1008,16 +1076,12 @@ index 0000000000..55441eb9d1
 +    GList *di_list = NULL;
 +    GList *l;
 +    UuidInfo *uuid_info;
-+    BlockJob *job;
 +
 +    const char *config_name = "qemu-server.conf";
 +    const char *firewall_name = "qemu-server.fw";
 +
-+    qemu_co_mutex_lock(&backup_state.backup_mutex);
-+
 +    if (backup_state.di_list) {
-+        qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+        error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
++         error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
 +                  "previous backup not finished");
 +        return;
 +    }
@@ -1140,7 +1204,7 @@ index 0000000000..55441eb9d1
 +            if (dev_id < 0)
 +                goto err;
 +
-+            if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_cb, di, task->errp))) {
++            if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
 +                goto err;
 +            }
 +
@@ -1161,7 +1225,7 @@ index 0000000000..55441eb9d1
 +            PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
 +            l = g_list_next(l);
 +
-+            if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_cb, di, task->errp))) {
++            if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, task->errp))) {
 +                goto err;
 +            }
 +
@@ -1226,9 +1290,7 @@ index 0000000000..55441eb9d1
 +    }
 +    /* initialize global backup_state now */
 +
-+    qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-+
-+    backup_state.stat.cancel = false;
++    qemu_mutex_lock(&backup_state.stat.lock);
 +
 +    if (backup_state.stat.error) {
 +        error_free(backup_state.stat.error);
@@ -1251,7 +1313,7 @@ index 0000000000..55441eb9d1
 +    backup_state.stat.transferred = 0;
 +    backup_state.stat.zero_bytes = 0;
 +
-+    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
++    qemu_mutex_unlock(&backup_state.stat.lock);
 +
 +    backup_state.speed = (task->has_speed && task->speed > 0) ? task->speed : 0;
 +
@@ -1260,48 +1322,6 @@ index 0000000000..55441eb9d1
 +
 +    backup_state.di_list = di_list;
 +
-+    /* start all jobs (paused state) */
-+    l = di_list;
-+    while (l) {
-+        PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-+        l = g_list_next(l);
-+
-+        // make sure target runs in same aoi_context as source
-+        AioContext *aio_context = bdrv_get_aio_context(di->bs);
-+        aio_context_acquire(aio_context);
-+        GSList *ignore = NULL;
-+        bdrv_set_aio_context_ignore(di->target, aio_context, &ignore);
-+        g_slist_free(ignore);
-+        aio_context_release(aio_context);
-+
-+        job = backup_job_create(NULL, di->bs, di->target, backup_state.speed, MIRROR_SYNC_MODE_FULL, NULL,
-+                                BITMAP_SYNC_MODE_NEVER, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
-+                                JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
-+        if (!job || local_err != NULL) {
-+            qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-+            error_setg(&backup_state.stat.error, "backup_job_create failed");
-+            qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+            break;
-+        }
-+        job_start(&job->job);
-+        if (di->target) {
-+            bdrv_unref(di->target);
-+            di->target = NULL;
-+        }
-+    }
-+
-+    qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+
-+    qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
-+    bool no_errors = !backup_state.stat.error;
-+    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+
-+    if (no_errors) {
-+        pvebackup_co_run_next_job(); // run one job
-+    } else {
-+        pvebackup_co_cancel(NULL);
-+    }
-+
 +    uuid_info = g_malloc0(sizeof(*uuid_info));
 +    uuid_info->UUID = uuid_str;
 +
@@ -1344,8 +1364,6 @@ index 0000000000..55441eb9d1
 +        rmdir(backup_dir);
 +    }
 +
-+    qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+
 +    task->result = NULL;
 +    return;
 +}
@@ -1389,32 +1407,31 @@ index 0000000000..55441eb9d1
 +        .errp = errp,
 +    };
 +
-+    block_on_coroutine_fn(pvebackup_co_start, &task);
++    qemu_mutex_lock(&backup_state.backup_mutex);
++
++    block_on_coroutine_fn(pvebackup_co_prepare, &task);
++
++    if (*errp == NULL) {
++        create_backup_jobs();
++        qemu_mutex_unlock(&backup_state.backup_mutex);
++        pvebackup_run_next_job();
++    } else {
++        qemu_mutex_unlock(&backup_state.backup_mutex);
++    }
 +
 +    return task.result;
 +}
 +
-+
-+typedef struct QmpQueryBackupTask {
-+    Error **errp;
-+    BackupStatus *result;
-+} QmpQueryBackupTask;
-+
-+static void coroutine_fn pvebackup_co_query(void *opaque)
++BackupStatus *qmp_query_backup(Error **errp)
 +{
-+    assert(qemu_in_coroutine());
-+
-+    QmpQueryBackupTask *task = opaque;
-+
 +    BackupStatus *info = g_malloc0(sizeof(*info));
 +
-+    qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
++    qemu_mutex_lock(&backup_state.stat.lock);
 +
 +    if (!backup_state.stat.start_time) {
 +        /* not started, return {} */
-+        task->result = info;
-+        qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+        return;
++        qemu_mutex_unlock(&backup_state.stat.lock);
++        return info;
 +    }
 +
 +    info->has_status = true;
@@ -1450,21 +1467,9 @@ index 0000000000..55441eb9d1
 +    info->has_transferred = true;
 +    info->transferred = backup_state.stat.transferred;
 +
-+    task->result = info;
++    qemu_mutex_unlock(&backup_state.stat.lock);
 +
-+    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+}
-+
-+BackupStatus *qmp_query_backup(Error **errp)
-+{
-+    QmpQueryBackupTask task = {
-+        .errp = errp,
-+        .result = NULL,
-+    };
-+
-+    block_on_coroutine_fn(pvebackup_co_query, &task);
-+
-+    return task.result;
++    return info;
 +}
 diff --git a/qapi/block-core.json b/qapi/block-core.json
 index 7957b9867d..be67dc3376 100644
diff --git a/debian/patches/pve/0029-PVE-Backup-pbs-restore-new-command-to-restore-from-p.patch b/debian/patches/pve/0029-PVE-Backup-pbs-restore-new-command-to-restore-from-p.patch
index 2b04e1f..fcdbcea 100644
--- a/debian/patches/pve/0029-PVE-Backup-pbs-restore-new-command-to-restore-from-p.patch
+++ b/debian/patches/pve/0029-PVE-Backup-pbs-restore-new-command-to-restore-from-p.patch
@@ -7,8 +7,8 @@ Subject: [PATCH] PVE-Backup: pbs-restore - new command to restore from proxmox
 Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
 ---
  meson.build   |   4 +
- pbs-restore.c | 218 ++++++++++++++++++++++++++++++++++++++++++++++++++
- 2 files changed, 222 insertions(+)
+ pbs-restore.c | 224 ++++++++++++++++++++++++++++++++++++++++++++++++++
+ 2 files changed, 228 insertions(+)
  create mode 100644 pbs-restore.c
 
 diff --git a/meson.build b/meson.build
@@ -28,10 +28,10 @@ index 3094f98c47..6f1fafee14 100644
    subdir('contrib/elf2dmp')
 diff --git a/pbs-restore.c b/pbs-restore.c
 new file mode 100644
-index 0000000000..d4daee7e91
+index 0000000000..4d3f925a1b
 --- /dev/null
 +++ b/pbs-restore.c
-@@ -0,0 +1,218 @@
+@@ -0,0 +1,224 @@
 +/*
 + * Qemu image restore helper for Proxmox Backup
 + *
@@ -195,13 +195,19 @@ index 0000000000..d4daee7e91
 +        fprintf(stderr, "connecting to repository '%s'\n", repository);
 +    }
 +    char *pbs_error = NULL;
-+    ProxmoxRestoreHandle *conn = proxmox_restore_connect(
++    ProxmoxRestoreHandle *conn = proxmox_restore_new(
 +        repository, snapshot, password, keyfile, key_password, fingerprint, &pbs_error);
 +    if (conn == NULL) {
 +        fprintf(stderr, "restore failed: %s\n", pbs_error);
 +        return -1;
 +    }
 +
++    int res = proxmox_restore_connect(conn, &pbs_error);
++    if (res < 0 || pbs_error) {
++        fprintf(stderr, "restore failed (connection error): %s\n", pbs_error);
++        return -1;
++    }
++
 +    QDict *options = qdict_new();
 +
 +    if (format) {
@@ -232,7 +238,7 @@ index 0000000000..d4daee7e91
 +        fprintf(stderr, "starting to restore snapshot '%s'\n", snapshot);
 +        fflush(stderr); // ensure we do not get printed after the progress log
 +    }
-+    int res = proxmox_restore_image(
++    res = proxmox_restore_image(
 +        conn,
 +        archive_name,
 +        write_callback,
diff --git a/debian/patches/pve/0030-PVE-Backup-avoid-coroutines-to-fix-AIO-freeze-cleanu.patch b/debian/patches/pve/0030-PVE-Backup-avoid-coroutines-to-fix-AIO-freeze-cleanu.patch
deleted file mode 100644
index 0d874ce..0000000
--- a/debian/patches/pve/0030-PVE-Backup-avoid-coroutines-to-fix-AIO-freeze-cleanu.patch
+++ /dev/null
@@ -1,914 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Dietmar Maurer <dietmar@proxmox.com>
-Date: Mon, 6 Apr 2020 12:17:00 +0200
-Subject: [PATCH] PVE-Backup: avoid coroutines to fix AIO freeze, cleanups
-
-We observed various AIO pool loop freezes, so we decided to avoid
-coroutines and restrict ourselfes using similar code as upstream
-(see blockdev.c: do_backup_common).
-
-* avoid coroutine for job related code (causes hangs with iothreads)
-    - We then acquire/release all mutexes outside coroutines now, so we can now
-      correctly use a normal mutex.
-
-* split pvebackup_co_dump_cb into:
-    - pvebackup_co_dump_pbs_cb and
-    - pvebackup_co_dump_pbs_cb
-
-* new helper functions
-    - pvebackup_propagate_error
-    - pvebackup_error_or_canceled
-    - pvebackup_add_transfered_bytes
-
-* avoid cancel flag (not needed)
-
-* simplify backup_cancel logic
-
-There is progress on upstream to support running qmp commands inside
-coroutines, see:
-https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg04852.html
-
-We should consider using that when it is available in upstream qemu.
-
-Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
----
- pve-backup.c | 638 ++++++++++++++++++++++++++-------------------------
- 1 file changed, 320 insertions(+), 318 deletions(-)
-
-diff --git a/pve-backup.c b/pve-backup.c
-index 55441eb9d1..d40f3f2fd6 100644
---- a/pve-backup.c
-+++ b/pve-backup.c
-@@ -11,11 +11,27 @@
- 
- /* PVE backup state and related function */
- 
-+/*
-+ * Note: A resume from a qemu_coroutine_yield can happen in a different thread,
-+ * so you may not use normal mutexes within coroutines:
-+ *
-+ * ---bad-example---
-+ * qemu_rec_mutex_lock(lock)
-+ * ...
-+ * qemu_coroutine_yield() // wait for something
-+ * // we are now inside a different thread
-+ * qemu_rec_mutex_unlock(lock) // Crash - wrong thread!!
-+ * ---end-bad-example--
-+ *
-+ * ==> Always use CoMutext inside coroutines.
-+ * ==> Never acquire/release AioContext withing coroutines (because that use QemuRecMutex)
-+ *
-+ */
- 
- static struct PVEBackupState {
-     struct {
--        // Everithing accessed from qmp command, protected using rwlock
--        CoRwlock rwlock;
-+        // Everithing accessed from qmp_backup_query command is protected using lock
-+        QemuMutex lock;
-         Error *error;
-         time_t start_time;
-         time_t end_time;
-@@ -25,19 +41,20 @@ static struct PVEBackupState {
-         size_t total;
-         size_t transferred;
-         size_t zero_bytes;
--        bool cancel;
-     } stat;
-     int64_t speed;
-     VmaWriter *vmaw;
-     ProxmoxBackupHandle *pbs;
-     GList *di_list;
--    CoMutex backup_mutex;
-+    QemuMutex backup_mutex;
-+    CoMutex dump_callback_mutex;
- } backup_state;
- 
- static void pvebackup_init(void)
- {
--    qemu_co_rwlock_init(&backup_state.stat.rwlock);
--    qemu_co_mutex_init(&backup_state.backup_mutex);
-+    qemu_mutex_init(&backup_state.stat.lock);
-+    qemu_mutex_init(&backup_state.backup_mutex);
-+    qemu_co_mutex_init(&backup_state.dump_callback_mutex);
- }
- 
- // initialize PVEBackupState at startup
-@@ -52,10 +69,54 @@ typedef struct PVEBackupDevInfo {
-     BlockDriverState *target;
- } PVEBackupDevInfo;
- 
--static void pvebackup_co_run_next_job(void);
-+static void pvebackup_run_next_job(void);
- 
-+static BlockJob *
-+lookup_active_block_job(PVEBackupDevInfo *di)
-+{
-+    if (!di->completed && di->bs) {
-+        for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
-+            if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
-+                continue;
-+            }
-+
-+            BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
-+            if (bjob && bjob->source_bs == di->bs) {
-+                return job;
-+            }
-+        }
-+    }
-+    return NULL;
-+}
-+
-+static void pvebackup_propagate_error(Error *err)
-+{
-+    qemu_mutex_lock(&backup_state.stat.lock);
-+    error_propagate(&backup_state.stat.error, err);
-+    qemu_mutex_unlock(&backup_state.stat.lock);
-+}
-+
-+static bool pvebackup_error_or_canceled(void)
-+{
-+    qemu_mutex_lock(&backup_state.stat.lock);
-+    bool error_or_canceled = !!backup_state.stat.error;
-+    qemu_mutex_unlock(&backup_state.stat.lock);
-+
-+    return error_or_canceled;
-+}
-+
-+static void pvebackup_add_transfered_bytes(size_t transferred, size_t zero_bytes)
-+{
-+    qemu_mutex_lock(&backup_state.stat.lock);
-+    backup_state.stat.zero_bytes += zero_bytes;
-+    backup_state.stat.transferred += transferred;
-+    qemu_mutex_unlock(&backup_state.stat.lock);
-+}
-+
-+// This may get called from multiple coroutines in multiple io-threads
-+// Note1: this may get called after job_cancel()
- static int coroutine_fn
--pvebackup_co_dump_cb(
-+pvebackup_co_dump_pbs_cb(
-     void *opaque,
-     uint64_t start,
-     uint64_t bytes,
-@@ -67,137 +128,127 @@ pvebackup_co_dump_cb(
-     const unsigned char *buf = pbuf;
-     PVEBackupDevInfo *di = opaque;
- 
--    qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
--    bool cancel = backup_state.stat.cancel;
--    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+    assert(backup_state.pbs);
-+
-+    Error *local_err = NULL;
-+    int pbs_res = -1;
-+
-+    qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
- 
--    if (cancel) {
--        return size; // return success
-+    // avoid deadlock if job is cancelled
-+    if (pvebackup_error_or_canceled()) {
-+        qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
-+        return -1;
-     }
- 
--    qemu_co_mutex_lock(&backup_state.backup_mutex);
-+    pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
-+    qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
- 
--    int ret = -1;
-+    if (pbs_res < 0) {
-+        pvebackup_propagate_error(local_err);
-+        return pbs_res;
-+    } else {
-+        pvebackup_add_transfered_bytes(size, !buf ? size : 0);
-+    }
- 
--    if (backup_state.vmaw) {
--        size_t zero_bytes = 0;
--        uint64_t remaining = size;
--
--        uint64_t cluster_num = start / VMA_CLUSTER_SIZE;
--        if ((cluster_num * VMA_CLUSTER_SIZE) != start) {
--            qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
--            if (!backup_state.stat.error) {
--                qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
--                error_setg(&backup_state.stat.error,
--                           "got unaligned write inside backup dump "
--                           "callback (sector %ld)", start);
--            }
--            qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
--            qemu_co_mutex_unlock(&backup_state.backup_mutex);
--            return -1; // not aligned to cluster size
--        }
-+    return size;
-+}
- 
--        while (remaining > 0) {
--            ret = vma_writer_write(backup_state.vmaw, di->dev_id, cluster_num,
--                                   buf, &zero_bytes);
--            ++cluster_num;
--            if (buf) {
--                buf += VMA_CLUSTER_SIZE;
--            }
--            if (ret < 0) {
--                qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
--                if (!backup_state.stat.error) {
--                    qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
--                    vma_writer_error_propagate(backup_state.vmaw, &backup_state.stat.error);
--                }
--                qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+// This may get called from multiple coroutines in multiple io-threads
-+static int coroutine_fn
-+pvebackup_co_dump_vma_cb(
-+    void *opaque,
-+    uint64_t start,
-+    uint64_t bytes,
-+    const void *pbuf)
-+{
-+    assert(qemu_in_coroutine());
- 
--                qemu_co_mutex_unlock(&backup_state.backup_mutex);
--                return ret;
--            } else {
--                qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
--                backup_state.stat.zero_bytes += zero_bytes;
--                if (remaining >= VMA_CLUSTER_SIZE) {
--                    backup_state.stat.transferred += VMA_CLUSTER_SIZE;
--                    remaining -= VMA_CLUSTER_SIZE;
--                } else {
--                    backup_state.stat.transferred += remaining;
--                    remaining = 0;
--                }
--                qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
--            }
--        }
--    } else if (backup_state.pbs) {
--        Error *local_err = NULL;
--        int pbs_res = -1;
-+    const uint64_t size = bytes;
-+    const unsigned char *buf = pbuf;
-+    PVEBackupDevInfo *di = opaque;
- 
--        pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
-+    int ret = -1;
- 
--        qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-+    assert(backup_state.vmaw);
- 
--        if (pbs_res < 0) {
--            error_propagate(&backup_state.stat.error, local_err);
--            qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
--            qemu_co_mutex_unlock(&backup_state.backup_mutex);
--            return pbs_res;
--        } else {
--            if (!buf) {
--                backup_state.stat.zero_bytes += size;
--            }
--            backup_state.stat.transferred += size;
-+    uint64_t remaining = size;
-+
-+    uint64_t cluster_num = start / VMA_CLUSTER_SIZE;
-+    if ((cluster_num * VMA_CLUSTER_SIZE) != start) {
-+        Error *local_err = NULL;
-+        error_setg(&local_err,
-+                   "got unaligned write inside backup dump "
-+                   "callback (sector %ld)", start);
-+        pvebackup_propagate_error(local_err);
-+        return -1; // not aligned to cluster size
-+    }
-+
-+    while (remaining > 0) {
-+        qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
-+        // avoid deadlock if job is cancelled
-+        if (pvebackup_error_or_canceled()) {
-+            qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
-+            return -1;
-         }
- 
--        qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+        size_t zero_bytes = 0;
-+        ret = vma_writer_write(backup_state.vmaw, di->dev_id, cluster_num, buf, &zero_bytes);
-+        qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
- 
--    } else {
--        qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
--        if (!buf) {
--            backup_state.stat.zero_bytes += size;
-+        ++cluster_num;
-+        if (buf) {
-+            buf += VMA_CLUSTER_SIZE;
-+        }
-+        if (ret < 0) {
-+            Error *local_err = NULL;
-+            vma_writer_error_propagate(backup_state.vmaw, &local_err);
-+            pvebackup_propagate_error(local_err);
-+            return ret;
-+        } else {
-+            if (remaining >= VMA_CLUSTER_SIZE) {
-+                assert(ret == VMA_CLUSTER_SIZE);
-+                pvebackup_add_transfered_bytes(VMA_CLUSTER_SIZE, zero_bytes);
-+                remaining -= VMA_CLUSTER_SIZE;
-+            } else {
-+                assert(ret == remaining);
-+                pvebackup_add_transfered_bytes(remaining, zero_bytes);
-+                remaining = 0;
-+            }
-         }
--        backup_state.stat.transferred += size;
--        qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-     }
- 
--    qemu_co_mutex_unlock(&backup_state.backup_mutex);
--
-     return size;
- }
- 
--static void coroutine_fn pvebackup_co_cleanup(void)
-+// assumes the caller holds backup_mutex
-+static void coroutine_fn pvebackup_co_cleanup(void *unused)
- {
-     assert(qemu_in_coroutine());
- 
--    qemu_co_mutex_lock(&backup_state.backup_mutex);
--
--    qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-+    qemu_mutex_lock(&backup_state.stat.lock);
-     backup_state.stat.end_time = time(NULL);
--    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+    qemu_mutex_unlock(&backup_state.stat.lock);
- 
-     if (backup_state.vmaw) {
-         Error *local_err = NULL;
-         vma_writer_close(backup_state.vmaw, &local_err);
- 
-         if (local_err != NULL) {
--            qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
--            error_propagate(&backup_state.stat.error, local_err);
--            qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
--        }
-+            pvebackup_propagate_error(local_err);
-+         }
- 
-         backup_state.vmaw = NULL;
-     }
- 
-     if (backup_state.pbs) {
--        qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
--        bool error_or_canceled = backup_state.stat.error || backup_state.stat.cancel;
--        if (!error_or_canceled) {
-+        if (!pvebackup_error_or_canceled()) {
-             Error *local_err = NULL;
-             proxmox_backup_co_finish(backup_state.pbs, &local_err);
-             if (local_err != NULL) {
--                qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
--                error_propagate(&backup_state.stat.error, local_err);
--             }
-+                pvebackup_propagate_error(local_err);
-+            }
-         }
--        qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
- 
-         proxmox_backup_disconnect(backup_state.pbs);
-         backup_state.pbs = NULL;
-@@ -205,43 +256,14 @@ static void coroutine_fn pvebackup_co_cleanup(void)
- 
-     g_list_free(backup_state.di_list);
-     backup_state.di_list = NULL;
--    qemu_co_mutex_unlock(&backup_state.backup_mutex);
- }
- 
--typedef struct PVEBackupCompeteCallbackData {
--    PVEBackupDevInfo *di;
--    int result;
--} PVEBackupCompeteCallbackData;
--
--static void coroutine_fn pvebackup_co_complete_cb(void *opaque)
-+// assumes the caller holds backup_mutex
-+static void coroutine_fn pvebackup_complete_stream(void *opaque)
- {
--    assert(qemu_in_coroutine());
--
--    PVEBackupCompeteCallbackData *cb_data = opaque;
--
--    qemu_co_mutex_lock(&backup_state.backup_mutex);
--
--    PVEBackupDevInfo *di = cb_data->di;
--    int ret = cb_data->result;
--
--    di->completed = true;
--
--    qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
--    bool error_or_canceled = backup_state.stat.error || backup_state.stat.cancel;
--
--    if (ret < 0 && !backup_state.stat.error) {
--        qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
--        error_setg(&backup_state.stat.error, "job failed with err %d - %s",
--                   ret, strerror(-ret));
--    }
--    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
--
--    di->bs = NULL;
-+    PVEBackupDevInfo *di = opaque;
- 
--    if (di->target) {
--        bdrv_unref(di->target);
--        di->target = NULL;
--    }
-+    bool error_or_canceled = pvebackup_error_or_canceled();
- 
-     if (backup_state.vmaw) {
-         vma_writer_close_stream(backup_state.vmaw, di->dev_id);
-@@ -251,110 +273,101 @@ static void coroutine_fn pvebackup_co_complete_cb(void *opaque)
-         Error *local_err = NULL;
-         proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
-         if (local_err != NULL) {
--            qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
--            error_propagate(&backup_state.stat.error, local_err);
--            qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+            pvebackup_propagate_error(local_err);
-         }
-     }
-+}
- 
--    // remove self from job queue
--    backup_state.di_list = g_list_remove(backup_state.di_list, di);
--    g_free(di);
-+static void pvebackup_complete_cb(void *opaque, int ret)
-+{
-+    assert(!qemu_in_coroutine());
- 
--    int pending_jobs = g_list_length(backup_state.di_list);
-+    PVEBackupDevInfo *di = opaque;
- 
--    qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+    qemu_mutex_lock(&backup_state.backup_mutex);
- 
--    if (pending_jobs > 0) {
--        pvebackup_co_run_next_job();
--    } else {
--        pvebackup_co_cleanup();
-+    di->completed = true;
-+
-+    if (ret < 0) {
-+        Error *local_err = NULL;
-+        error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
-+        pvebackup_propagate_error(local_err);
-     }
--}
- 
--static void pvebackup_complete_cb(void *opaque, int ret)
--{
--    // This can be called from the main loop, or from a coroutine
--    PVEBackupCompeteCallbackData cb_data = {
--        .di = opaque,
--        .result = ret,
--    };
-+    di->bs = NULL;
- 
--    if (qemu_in_coroutine()) {
--        pvebackup_co_complete_cb(&cb_data);
--    } else {
--        block_on_coroutine_fn(pvebackup_co_complete_cb, &cb_data);
--    }
--}
-+    assert(di->target == NULL);
- 
--static void coroutine_fn pvebackup_co_cancel(void *opaque)
--{
--    assert(qemu_in_coroutine());
-+    block_on_coroutine_fn(pvebackup_complete_stream, di);
- 
--    qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
--    backup_state.stat.cancel = true;
--    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+    // remove self from job queue
-+    backup_state.di_list = g_list_remove(backup_state.di_list, di);
- 
--    qemu_co_mutex_lock(&backup_state.backup_mutex);
-+    g_free(di);
- 
--    // Avoid race between block jobs and backup-cancel command:
--    if (!(backup_state.vmaw || backup_state.pbs)) {
--        qemu_co_mutex_unlock(&backup_state.backup_mutex);
--        return;
--    }
-+    qemu_mutex_unlock(&backup_state.backup_mutex);
- 
--    qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
--    if (!backup_state.stat.error) {
--        qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
--        error_setg(&backup_state.stat.error, "backup cancelled");
--    }
--    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+    pvebackup_run_next_job();
-+}
-+
-+static void pvebackup_cancel(void)
-+{
-+    assert(!qemu_in_coroutine());
-+
-+    Error *cancel_err = NULL;
-+    error_setg(&cancel_err, "backup canceled");
-+    pvebackup_propagate_error(cancel_err);
-+
-+    qemu_mutex_lock(&backup_state.backup_mutex);
- 
-     if (backup_state.vmaw) {
-         /* make sure vma writer does not block anymore */
--        vma_writer_set_error(backup_state.vmaw, "backup cancelled");
-+        vma_writer_set_error(backup_state.vmaw, "backup canceled");
-     }
- 
-     if (backup_state.pbs) {
--        proxmox_backup_abort(backup_state.pbs, "backup cancelled");
-+        proxmox_backup_abort(backup_state.pbs, "backup canceled");
-     }
- 
--    bool running_jobs = 0;
--    GList *l = backup_state.di_list;
--    while (l) {
--        PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
--        l = g_list_next(l);
--        if (!di->completed && di->bs) {
--            for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
--                if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
--                    continue;
--                }
-+    qemu_mutex_unlock(&backup_state.backup_mutex);
- 
--                BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
--                if (bjob && bjob->source_bs == di->bs) {
--                    AioContext *aio_context = job->job.aio_context;
--                    aio_context_acquire(aio_context);
-+    for(;;) {
- 
--                    if (!di->completed) {
--                        running_jobs += 1;
--                        job_cancel(&job->job, false);
--                    }
--                    aio_context_release(aio_context);
--                }
-+        BlockJob *next_job = NULL;
-+
-+        qemu_mutex_lock(&backup_state.backup_mutex);
-+
-+        GList *l = backup_state.di_list;
-+        while (l) {
-+            PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-+            l = g_list_next(l);
-+
-+            BlockJob *job = lookup_active_block_job(di);
-+            if (job != NULL) {
-+                next_job = job;
-+                break;
-             }
-         }
--    }
- 
--    qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+        qemu_mutex_unlock(&backup_state.backup_mutex);
- 
--    if (running_jobs == 0) pvebackup_co_cleanup(); // else job will call completion handler
-+        if (next_job) {
-+            AioContext *aio_context = next_job->job.aio_context;
-+            aio_context_acquire(aio_context);
-+            job_cancel_sync(&next_job->job);
-+            aio_context_release(aio_context);
-+        } else {
-+            break;
-+        }
-+    }
- }
- 
- void qmp_backup_cancel(Error **errp)
- {
--    block_on_coroutine_fn(pvebackup_co_cancel, NULL);
-+    pvebackup_cancel();
- }
- 
-+// assumes the caller holds backup_mutex
- static int coroutine_fn pvebackup_co_add_config(
-     const char *file,
-     const char *name,
-@@ -406,46 +419,97 @@ static int coroutine_fn pvebackup_co_add_config(
- 
- bool job_should_pause(Job *job);
- 
--static void coroutine_fn pvebackup_co_run_next_job(void)
-+static void pvebackup_run_next_job(void)
- {
--    assert(qemu_in_coroutine());
-+    assert(!qemu_in_coroutine());
- 
--    qemu_co_mutex_lock(&backup_state.backup_mutex);
-+    qemu_mutex_lock(&backup_state.backup_mutex);
- 
-     GList *l = backup_state.di_list;
-     while (l) {
-         PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-         l = g_list_next(l);
--        if (!di->completed && di->bs) {
--            for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
--                if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
--                    continue;
--                }
- 
--                BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
--                if (bjob && bjob->source_bs == di->bs) {
--                    AioContext *aio_context = job->job.aio_context;
--                    qemu_co_mutex_unlock(&backup_state.backup_mutex);
--                    aio_context_acquire(aio_context);
--
--                    if (job_should_pause(&job->job)) {
--                        qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
--                        bool error_or_canceled = backup_state.stat.error || backup_state.stat.cancel;
--                        qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
--
--                        if (error_or_canceled) {
--                            job_cancel(&job->job, false);
--                        } else {
--                            job_resume(&job->job);
--                        }
--                    }
--                    aio_context_release(aio_context);
--                    return;
-+        BlockJob *job = lookup_active_block_job(di);
-+
-+        if (job) {
-+            qemu_mutex_unlock(&backup_state.backup_mutex);
-+
-+            AioContext *aio_context = job->job.aio_context;
-+            aio_context_acquire(aio_context);
-+
-+            if (job_should_pause(&job->job)) {
-+                bool error_or_canceled = pvebackup_error_or_canceled();
-+                if (error_or_canceled) {
-+                    job_cancel_sync(&job->job);
-+                } else {
-+                    job_resume(&job->job);
-                 }
-             }
-+            aio_context_release(aio_context);
-+            return;
-+        }
-+    }
-+
-+    block_on_coroutine_fn(pvebackup_co_cleanup, NULL); // no more jobs, run cleanup
-+
-+    qemu_mutex_unlock(&backup_state.backup_mutex);
-+}
-+
-+static bool create_backup_jobs(void) {
-+
-+    assert(!qemu_in_coroutine());
-+
-+    Error *local_err = NULL;
-+
-+    /* create and start all jobs (paused state) */
-+    GList *l =  backup_state.di_list;
-+    while (l) {
-+        PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-+        l = g_list_next(l);
-+
-+        assert(di->target != NULL);
-+
-+        AioContext *aio_context = bdrv_get_aio_context(di->bs);
-+        aio_context_acquire(aio_context);
-+
-+        BlockJob *job = backup_job_create(
-+            NULL, di->bs, di->target, backup_state.speed, MIRROR_SYNC_MODE_FULL, NULL,
-+            BITMAP_SYNC_MODE_NEVER, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
-+            JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
-+
-+        aio_context_release(aio_context);
-+
-+        if (!job || local_err != NULL) {
-+            Error *create_job_err = NULL;
-+            error_setg(&create_job_err, "backup_job_create failed: %s",
-+                       local_err ? error_get_pretty(local_err) : "null");
-+
-+            pvebackup_propagate_error(create_job_err);
-+            break;
-+        }
-+        job_start(&job->job);
-+
-+        bdrv_unref(di->target);
-+        di->target = NULL;
-+    }
-+
-+    bool errors = pvebackup_error_or_canceled();
-+
-+    if (errors) {
-+        l = backup_state.di_list;
-+        while (l) {
-+            PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-+            l = g_list_next(l);
-+
-+            if (di->target) {
-+                bdrv_unref(di->target);
-+                di->target = NULL;
-+            }
-         }
-     }
--    qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+
-+    return errors;
- }
- 
- typedef struct QmpBackupTask {
-@@ -476,7 +540,8 @@ typedef struct QmpBackupTask {
-     UuidInfo *result;
- } QmpBackupTask;
- 
--static void coroutine_fn pvebackup_co_start(void *opaque)
-+// assumes the caller holds backup_mutex
-+static void coroutine_fn pvebackup_co_prepare(void *opaque)
- {
-     assert(qemu_in_coroutine());
- 
-@@ -495,16 +560,12 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
-     GList *di_list = NULL;
-     GList *l;
-     UuidInfo *uuid_info;
--    BlockJob *job;
- 
-     const char *config_name = "qemu-server.conf";
-     const char *firewall_name = "qemu-server.fw";
- 
--    qemu_co_mutex_lock(&backup_state.backup_mutex);
--
-     if (backup_state.di_list) {
--        qemu_co_mutex_unlock(&backup_state.backup_mutex);
--        error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
-+         error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
-                   "previous backup not finished");
-         return;
-     }
-@@ -627,7 +688,7 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
-             if (dev_id < 0)
-                 goto err;
- 
--            if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_cb, di, task->errp))) {
-+            if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
-                 goto err;
-             }
- 
-@@ -648,7 +709,7 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
-             PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-             l = g_list_next(l);
- 
--            if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_cb, di, task->errp))) {
-+            if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, task->errp))) {
-                 goto err;
-             }
- 
-@@ -713,9 +774,7 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
-     }
-     /* initialize global backup_state now */
- 
--    qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
--
--    backup_state.stat.cancel = false;
-+    qemu_mutex_lock(&backup_state.stat.lock);
- 
-     if (backup_state.stat.error) {
-         error_free(backup_state.stat.error);
-@@ -738,7 +797,7 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
-     backup_state.stat.transferred = 0;
-     backup_state.stat.zero_bytes = 0;
- 
--    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-+    qemu_mutex_unlock(&backup_state.stat.lock);
- 
-     backup_state.speed = (task->has_speed && task->speed > 0) ? task->speed : 0;
- 
-@@ -747,48 +806,6 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
- 
-     backup_state.di_list = di_list;
- 
--    /* start all jobs (paused state) */
--    l = di_list;
--    while (l) {
--        PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
--        l = g_list_next(l);
--
--        // make sure target runs in same aoi_context as source
--        AioContext *aio_context = bdrv_get_aio_context(di->bs);
--        aio_context_acquire(aio_context);
--        GSList *ignore = NULL;
--        bdrv_set_aio_context_ignore(di->target, aio_context, &ignore);
--        g_slist_free(ignore);
--        aio_context_release(aio_context);
--
--        job = backup_job_create(NULL, di->bs, di->target, backup_state.speed, MIRROR_SYNC_MODE_FULL, NULL,
--                                BITMAP_SYNC_MODE_NEVER, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
--                                JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
--        if (!job || local_err != NULL) {
--            qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
--            error_setg(&backup_state.stat.error, "backup_job_create failed");
--            qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
--            break;
--        }
--        job_start(&job->job);
--        if (di->target) {
--            bdrv_unref(di->target);
--            di->target = NULL;
--        }
--    }
--
--    qemu_co_mutex_unlock(&backup_state.backup_mutex);
--
--    qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
--    bool no_errors = !backup_state.stat.error;
--    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
--
--    if (no_errors) {
--        pvebackup_co_run_next_job(); // run one job
--    } else {
--        pvebackup_co_cancel(NULL);
--    }
--
-     uuid_info = g_malloc0(sizeof(*uuid_info));
-     uuid_info->UUID = uuid_str;
- 
-@@ -831,8 +848,6 @@ err:
-         rmdir(backup_dir);
-     }
- 
--    qemu_co_mutex_unlock(&backup_state.backup_mutex);
--
-     task->result = NULL;
-     return;
- }
-@@ -876,32 +891,31 @@ UuidInfo *qmp_backup(
-         .errp = errp,
-     };
- 
--    block_on_coroutine_fn(pvebackup_co_start, &task);
-+    qemu_mutex_lock(&backup_state.backup_mutex);
- 
--    return task.result;
--}
-+    block_on_coroutine_fn(pvebackup_co_prepare, &task);
- 
-+    if (*errp == NULL) {
-+        create_backup_jobs();
-+        qemu_mutex_unlock(&backup_state.backup_mutex);
-+        pvebackup_run_next_job();
-+    } else {
-+        qemu_mutex_unlock(&backup_state.backup_mutex);
-+    }
- 
--typedef struct QmpQueryBackupTask {
--    Error **errp;
--    BackupStatus *result;
--} QmpQueryBackupTask;
-+    return task.result;
-+}
- 
--static void coroutine_fn pvebackup_co_query(void *opaque)
-+BackupStatus *qmp_query_backup(Error **errp)
- {
--    assert(qemu_in_coroutine());
--
--    QmpQueryBackupTask *task = opaque;
--
-     BackupStatus *info = g_malloc0(sizeof(*info));
- 
--    qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
-+    qemu_mutex_lock(&backup_state.stat.lock);
- 
-     if (!backup_state.stat.start_time) {
-         /* not started, return {} */
--        task->result = info;
--        qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
--        return;
-+        qemu_mutex_unlock(&backup_state.stat.lock);
-+        return info;
-     }
- 
-     info->has_status = true;
-@@ -937,19 +951,7 @@ static void coroutine_fn pvebackup_co_query(void *opaque)
-     info->has_transferred = true;
-     info->transferred = backup_state.stat.transferred;
- 
--    task->result = info;
-+    qemu_mutex_unlock(&backup_state.stat.lock);
- 
--    qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
--}
--
--BackupStatus *qmp_query_backup(Error **errp)
--{
--    QmpQueryBackupTask task = {
--        .errp = errp,
--        .result = NULL,
--    };
--
--    block_on_coroutine_fn(pvebackup_co_query, &task);
--
--    return task.result;
-+    return info;
- }
diff --git a/debian/patches/pve/0031-drive-mirror-add-support-for-sync-bitmap-mode-never.patch b/debian/patches/pve/0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
similarity index 100%
rename from debian/patches/pve/0031-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
rename to debian/patches/pve/0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
diff --git a/debian/patches/pve/0032-drive-mirror-add-support-for-conditional-and-always-.patch b/debian/patches/pve/0031-drive-mirror-add-support-for-conditional-and-always-.patch
similarity index 100%
rename from debian/patches/pve/0032-drive-mirror-add-support-for-conditional-and-always-.patch
rename to debian/patches/pve/0031-drive-mirror-add-support-for-conditional-and-always-.patch
diff --git a/debian/patches/pve/0033-mirror-add-check-for-bitmap-mode-without-bitmap.patch b/debian/patches/pve/0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch
similarity index 100%
rename from debian/patches/pve/0033-mirror-add-check-for-bitmap-mode-without-bitmap.patch
rename to debian/patches/pve/0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch
diff --git a/debian/patches/pve/0034-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch b/debian/patches/pve/0033-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch
similarity index 100%
rename from debian/patches/pve/0034-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch
rename to debian/patches/pve/0033-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch
diff --git a/debian/patches/pve/0035-iotests-add-test-for-bitmap-mirror.patch b/debian/patches/pve/0034-iotests-add-test-for-bitmap-mirror.patch
similarity index 100%
rename from debian/patches/pve/0035-iotests-add-test-for-bitmap-mirror.patch
rename to debian/patches/pve/0034-iotests-add-test-for-bitmap-mirror.patch
diff --git a/debian/patches/pve/0036-mirror-move-some-checks-to-qmp.patch b/debian/patches/pve/0035-mirror-move-some-checks-to-qmp.patch
similarity index 100%
rename from debian/patches/pve/0036-mirror-move-some-checks-to-qmp.patch
rename to debian/patches/pve/0035-mirror-move-some-checks-to-qmp.patch
diff --git a/debian/patches/pve/0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch b/debian/patches/pve/0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
similarity index 88%
rename from debian/patches/pve/0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
rename to debian/patches/pve/0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
index b5c2dab..99322ce 100644
--- a/debian/patches/pve/0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
+++ b/debian/patches/pve/0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
@@ -20,13 +20,13 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
 Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
 Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
 ---
- block/monitor/block-hmp-cmds.c |  1 +
- monitor/hmp-cmds.c             | 45 ++++++++++++----
- proxmox-backup-client.c        |  3 +-
- proxmox-backup-client.h        |  1 +
- pve-backup.c                   | 95 ++++++++++++++++++++++++++++++----
- qapi/block-core.json           | 12 ++++-
- 6 files changed, 134 insertions(+), 23 deletions(-)
+ block/monitor/block-hmp-cmds.c |   1 +
+ monitor/hmp-cmds.c             |  45 ++++++++++----
+ proxmox-backup-client.c        |   3 +-
+ proxmox-backup-client.h        |   1 +
+ pve-backup.c                   | 103 ++++++++++++++++++++++++++++++---
+ qapi/block-core.json           |  12 +++-
+ 6 files changed, 142 insertions(+), 23 deletions(-)
 
 diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
 index 9ba7c774a2..056d14deee 100644
@@ -132,7 +132,7 @@ index 1dda8b7d8f..8cbf645b2c 100644
  
  
 diff --git a/pve-backup.c b/pve-backup.c
-index d40f3f2fd6..d50f03a050 100644
+index d40f3f2fd6..1cd9d31d7c 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -28,6 +28,8 @@
@@ -257,8 +257,8 @@ index d40f3f2fd6..d50f03a050 100644
      const char *fingerprint;
      bool has_fingerprint;
      int64_t backup_time;
-+    bool has_incremental;
-+    bool incremental;
++    bool has_use_dirty_bitmap;
++    bool use_dirty_bitmap;
      bool has_format;
      BackupFormat format;
      bool has_config_file;
@@ -274,7 +274,7 @@ index d40f3f2fd6..d50f03a050 100644
          int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
          firewall_name = "fw.conf";
  
-+        bool incremental = task->has_incremental && task->incremental;
++        bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
 +
          char *pbs_err = NULL;
          pbs = proxmox_backup_new(
@@ -289,42 +289,50 @@ index d40f3f2fd6..d50f03a050 100644
              goto err;
  
          /* register all devices */
-@@ -684,9 +721,32 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -684,9 +721,40 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
  
              const char *devname = bdrv_get_device_name(di->bs);
  
 -            int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, task->errp);
 -            if (dev_id < 0)
 +            BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
++            bool expect_only_dirty = false;
 +
-+            bool use_incremental = false;
-+            if (incremental) {
++            if (use_dirty_bitmap) {
 +                if (bitmap == NULL) {
 +                    bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
 +                    if (!bitmap) {
 +                        goto err;
 +                    }
-+                    /* mark entire bitmap as dirty to make full backup first */
++                } else {
++                    expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0;
++                }
++
++                if (expect_only_dirty) {
++                    dirty += bdrv_get_dirty_count(bitmap);
++                } else {
++                    /* mark entire bitmap as dirty to make full backup */
 +                    bdrv_set_dirty_bitmap(bitmap, 0, di->size);
 +                    dirty += di->size;
-+                } else {
-+                    use_incremental = true;
-+                    dirty += bdrv_get_dirty_count(bitmap);
 +                }
 +                di->bitmap = bitmap;
-+            } else if (bitmap != NULL) {
++            } else {
 +                dirty += di->size;
-+                bdrv_release_dirty_bitmap(bitmap);
++
++                /* after a full backup the old dirty bitmap is invalid anyway */
++                if (bitmap != NULL) {
++                    bdrv_release_dirty_bitmap(bitmap);
++                }
 +            }
 +
-+            int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, use_incremental, task->errp);
++            int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
 +            if (dev_id < 0) {
                  goto err;
 +            }
  
              if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
                  goto err;
-@@ -695,6 +755,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -695,6 +763,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
              di->dev_id = dev_id;
          }
      } else if (format == BACKUP_FORMAT_VMA) {
@@ -333,7 +341,7 @@ index d40f3f2fd6..d50f03a050 100644
          vmaw = vma_writer_create(task->backup_file, uuid, &local_err);
          if (!vmaw) {
              if (local_err) {
-@@ -722,6 +784,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -722,6 +792,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
              }
          }
      } else if (format == BACKUP_FORMAT_DIR) {
@@ -342,18 +350,18 @@ index d40f3f2fd6..d50f03a050 100644
          if (mkdir(task->backup_file, 0640) != 0) {
              error_setg_errno(task->errp, errno, "can't create directory '%s'\n",
                               task->backup_file);
-@@ -794,8 +858,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -794,8 +866,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
      char *uuid_str = g_strdup(backup_state.stat.uuid_str);
  
      backup_state.stat.total = total;
 +    backup_state.stat.dirty = dirty;
      backup_state.stat.transferred = 0;
      backup_state.stat.zero_bytes = 0;
-+    backup_state.stat.reused = dirty >= total ? 0 : total - dirty;
++    backup_state.stat.reused = format == BACKUP_FORMAT_PBS && dirty >= total ? 0 : total - dirty;
  
      qemu_mutex_unlock(&backup_state.stat.lock);
  
-@@ -819,6 +885,10 @@ err:
+@@ -819,6 +893,10 @@ err:
          PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
          l = g_list_next(l);
  
@@ -364,24 +372,24 @@ index d40f3f2fd6..d50f03a050 100644
          if (di->target) {
              bdrv_unref(di->target);
          }
-@@ -860,6 +930,7 @@ UuidInfo *qmp_backup(
+@@ -860,6 +938,7 @@ UuidInfo *qmp_backup(
      bool has_fingerprint, const char *fingerprint,
      bool has_backup_id, const char *backup_id,
      bool has_backup_time, int64_t backup_time,
-+    bool has_incremental, bool incremental,
++    bool has_use_dirty_bitmap, bool use_dirty_bitmap,
      bool has_format, BackupFormat format,
      bool has_config_file, const char *config_file,
      bool has_firewall_file, const char *firewall_file,
-@@ -878,6 +949,8 @@ UuidInfo *qmp_backup(
+@@ -878,6 +957,8 @@ UuidInfo *qmp_backup(
          .backup_id = backup_id,
          .has_backup_time = has_backup_time,
          .backup_time = backup_time,
-+        .has_incremental = has_incremental,
-+        .incremental = incremental,
++        .has_use_dirty_bitmap = has_use_dirty_bitmap,
++        .use_dirty_bitmap = use_dirty_bitmap,
          .has_format = has_format,
          .format = format,
          .has_config_file = has_config_file,
-@@ -946,10 +1019,14 @@ BackupStatus *qmp_query_backup(Error **errp)
+@@ -946,10 +1027,14 @@ BackupStatus *qmp_query_backup(Error **errp)
  
      info->has_total = true;
      info->total = backup_state.stat.total;
@@ -397,14 +405,14 @@ index d40f3f2fd6..d50f03a050 100644
      qemu_mutex_unlock(&backup_state.stat.lock);
  
 diff --git a/qapi/block-core.json b/qapi/block-core.json
-index 9054db608c..aadd5329b3 100644
+index 9054db608c..d4e1c98c50 100644
 --- a/qapi/block-core.json
 +++ b/qapi/block-core.json
 @@ -758,8 +758,13 @@
  #
  # @total: total amount of bytes involved in the backup process
  #
-+# @dirty: with incremental mode, this is the amount of bytes involved
++# @dirty: with incremental mode (PBS) this is the amount of bytes involved
 +#         in the backup process which are marked dirty.
 +#
  # @transferred: amount of bytes already backed up.
@@ -429,7 +437,7 @@ index 9054db608c..aadd5329b3 100644
  #
  # @backup-time: backup timestamp (Unix epoch, required for format 'pbs')
  #
-+# @incremental: sync incremental changes since last job (optional for format 'pbs')
++# @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
 +#
  # Returns: the uuid of the backup job
  #
@@ -438,7 +446,7 @@ index 9054db608c..aadd5329b3 100644
                                      '*fingerprint': 'str',
                                      '*backup-id': 'str',
                                      '*backup-time': 'int',
-+                                    '*incremental': 'bool',
++                                    '*use-dirty-bitmap': 'bool',
                                      '*format': 'BackupFormat',
                                      '*config-file': 'str',
                                      '*firewall-file': 'str',
diff --git a/debian/patches/pve/0037-PVE-various-PBS-fixes.patch b/debian/patches/pve/0037-PVE-various-PBS-fixes.patch
new file mode 100644
index 0000000..7ab4bfe
--- /dev/null
+++ b/debian/patches/pve/0037-PVE-various-PBS-fixes.patch
@@ -0,0 +1,218 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Dietmar Maurer <dietmar@proxmox.com>
+Date: Thu, 9 Jul 2020 12:53:08 +0200
+Subject: [PATCH] PVE: various PBS fixes
+
+pbs: fix crypt and compress parameters
+Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
+
+PVE: handle PBS write callback with big blocks correctly
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+
+PVE: add zero block handling to PBS dump callback
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+---
+ block/monitor/block-hmp-cmds.c |  4 ++-
+ pve-backup.c                   | 57 +++++++++++++++++++++++++++-------
+ qapi/block-core.json           |  6 ++++
+ 3 files changed, 54 insertions(+), 13 deletions(-)
+
+diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
+index 056d14deee..46c63b1cf9 100644
+--- a/block/monitor/block-hmp-cmds.c
++++ b/block/monitor/block-hmp-cmds.c
+@@ -1039,7 +1039,9 @@ void hmp_backup(Monitor *mon, const QDict *qdict)
+         false, NULL, // PBS fingerprint
+         false, NULL, // PBS backup-id
+         false, 0, // PBS backup-time
+-        false, false, // PBS incremental
++        false, false, // PBS use-dirty-bitmap
++        false, false, // PBS compress
++        false, false, // PBS encrypt
+         true, dir ? BACKUP_FORMAT_DIR : BACKUP_FORMAT_VMA,
+         false, NULL, false, NULL, !!devlist,
+         devlist, qdict_haskey(qdict, "speed"), speed, &error);
+diff --git a/pve-backup.c b/pve-backup.c
+index 1cd9d31d7c..b8182aaf89 100644
+--- a/pve-backup.c
++++ b/pve-backup.c
+@@ -8,6 +8,7 @@
+ #include "block/blockjob.h"
+ #include "qapi/qapi-commands-block.h"
+ #include "qapi/qmp/qerror.h"
++#include "qemu/cutils.h"
+ 
+ /* PVE backup state and related function */
+ 
+@@ -67,6 +68,7 @@ opts_init(pvebackup_init);
+ typedef struct PVEBackupDevInfo {
+     BlockDriverState *bs;
+     size_t size;
++    uint64_t block_size;
+     uint8_t dev_id;
+     bool completed;
+     char targetfile[PATH_MAX];
+@@ -135,10 +137,13 @@ pvebackup_co_dump_pbs_cb(
+     PVEBackupDevInfo *di = opaque;
+ 
+     assert(backup_state.pbs);
++    assert(buf);
+ 
+     Error *local_err = NULL;
+     int pbs_res = -1;
+ 
++    bool is_zero_block = size == di->block_size && buffer_is_zero(buf, size);
++
+     qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
+ 
+     // avoid deadlock if job is cancelled
+@@ -147,17 +152,29 @@ pvebackup_co_dump_pbs_cb(
+         return -1;
+     }
+ 
+-    pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
+-    qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
++    uint64_t transferred = 0;
++    uint64_t reused = 0;
++    while (transferred < size) {
++        uint64_t left = size - transferred;
++        uint64_t to_transfer = left < di->block_size ? left : di->block_size;
+ 
+-    if (pbs_res < 0) {
+-        pvebackup_propagate_error(local_err);
+-        return pbs_res;
+-    } else {
+-        size_t reused = (pbs_res == 0) ? size : 0;
+-        pvebackup_add_transfered_bytes(size, !buf ? size : 0, reused);
++        pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id,
++            is_zero_block ? NULL : buf + transferred, start + transferred,
++            to_transfer, &local_err);
++        transferred += to_transfer;
++
++        if (pbs_res < 0) {
++            pvebackup_propagate_error(local_err);
++            qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
++            return pbs_res;
++        }
++
++        reused += pbs_res == 0 ? to_transfer : 0;
+     }
+ 
++    qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
++    pvebackup_add_transfered_bytes(size, is_zero_block ? size : 0, reused);
++
+     return size;
+ }
+ 
+@@ -178,6 +195,7 @@ pvebackup_co_dump_vma_cb(
+     int ret = -1;
+ 
+     assert(backup_state.vmaw);
++    assert(buf);
+ 
+     uint64_t remaining = size;
+ 
+@@ -204,9 +222,7 @@ pvebackup_co_dump_vma_cb(
+         qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ 
+         ++cluster_num;
+-        if (buf) {
+-            buf += VMA_CLUSTER_SIZE;
+-        }
++        buf += VMA_CLUSTER_SIZE;
+         if (ret < 0) {
+             Error *local_err = NULL;
+             vma_writer_error_propagate(backup_state.vmaw, &local_err);
+@@ -567,6 +583,10 @@ typedef struct QmpBackupTask {
+     const char *firewall_file;
+     bool has_devlist;
+     const char *devlist;
++    bool has_compress;
++    bool compress;
++    bool has_encrypt;
++    bool encrypt;
+     bool has_speed;
+     int64_t speed;
+     Error **errp;
+@@ -690,6 +710,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+ 
+         bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
+ 
++
+         char *pbs_err = NULL;
+         pbs = proxmox_backup_new(
+             task->backup_file,
+@@ -699,8 +720,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+             task->has_password ? task->password : NULL,
+             task->has_keyfile ? task->keyfile : NULL,
+             task->has_key_password ? task->key_password : NULL,
++            task->has_compress ? task->compress : true,
++            task->has_encrypt ? task->encrypt : task->has_keyfile,
+             task->has_fingerprint ? task->fingerprint : NULL,
+-            &pbs_err);
++             &pbs_err);
+ 
+         if (!pbs) {
+             error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+@@ -719,6 +742,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+             PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+             l = g_list_next(l);
+ 
++            di->block_size = dump_cb_block_size;
++
+             const char *devname = bdrv_get_device_name(di->bs);
+ 
+             BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
+@@ -939,6 +964,8 @@ UuidInfo *qmp_backup(
+     bool has_backup_id, const char *backup_id,
+     bool has_backup_time, int64_t backup_time,
+     bool has_use_dirty_bitmap, bool use_dirty_bitmap,
++    bool has_compress, bool compress,
++    bool has_encrypt, bool encrypt,
+     bool has_format, BackupFormat format,
+     bool has_config_file, const char *config_file,
+     bool has_firewall_file, const char *firewall_file,
+@@ -949,6 +976,8 @@ UuidInfo *qmp_backup(
+         .backup_file = backup_file,
+         .has_password = has_password,
+         .password = password,
++        .has_keyfile = has_keyfile,
++        .keyfile = keyfile,
+         .has_key_password = has_key_password,
+         .key_password = key_password,
+         .has_fingerprint = has_fingerprint,
+@@ -959,6 +988,10 @@ UuidInfo *qmp_backup(
+         .backup_time = backup_time,
+         .has_use_dirty_bitmap = has_use_dirty_bitmap,
+         .use_dirty_bitmap = use_dirty_bitmap,
++        .has_compress = has_compress,
++        .compress = compress,
++        .has_encrypt = has_encrypt,
++        .encrypt = encrypt,
+         .has_format = has_format,
+         .format = format,
+         .has_config_file = has_config_file,
+diff --git a/qapi/block-core.json b/qapi/block-core.json
+index d4e1c98c50..0fda1e3fd3 100644
+--- a/qapi/block-core.json
++++ b/qapi/block-core.json
+@@ -823,6 +823,10 @@
+ #
+ # @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
+ #
++# @compress: use compression (optional for format 'pbs', defaults to true)
++#
++# @encrypt: use encryption (optional for format 'pbs', defaults to true if there is a keyfile)
++#
+ # Returns: the uuid of the backup job
+ #
+ ##
+@@ -834,6 +838,8 @@
+                                     '*backup-id': 'str',
+                                     '*backup-time': 'int',
+                                     '*use-dirty-bitmap': 'bool',
++                                    '*compress': 'bool',
++                                    '*encrypt': 'bool',
+                                     '*format': 'BackupFormat',
+                                     '*config-file': 'str',
+                                     '*firewall-file': 'str',
diff --git a/debian/patches/pve/0043-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch b/debian/patches/pve/0038-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch
similarity index 100%
rename from debian/patches/pve/0043-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch
rename to debian/patches/pve/0038-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch
diff --git a/debian/patches/pve/0038-PVE-backup-rename-incremental-to-use-dirty-bitmap.patch b/debian/patches/pve/0038-PVE-backup-rename-incremental-to-use-dirty-bitmap.patch
deleted file mode 100644
index 56b9c32..0000000
--- a/debian/patches/pve/0038-PVE-backup-rename-incremental-to-use-dirty-bitmap.patch
+++ /dev/null
@@ -1,126 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Thomas Lamprecht <t.lamprecht@proxmox.com>
-Date: Mon, 6 Jul 2020 20:05:16 +0200
-Subject: [PATCH] PVE backup: rename incremental to use-dirty-bitmap
-
-Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
-Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
----
- pve-backup.c         | 22 +++++++++++-----------
- qapi/block-core.json |  6 +++---
- 2 files changed, 14 insertions(+), 14 deletions(-)
-
-diff --git a/pve-backup.c b/pve-backup.c
-index d50f03a050..7bf54b4c5d 100644
---- a/pve-backup.c
-+++ b/pve-backup.c
-@@ -557,8 +557,8 @@ typedef struct QmpBackupTask {
-     const char *fingerprint;
-     bool has_fingerprint;
-     int64_t backup_time;
--    bool has_incremental;
--    bool incremental;
-+    bool has_use_dirty_bitmap;
-+    bool use_dirty_bitmap;
-     bool has_format;
-     BackupFormat format;
-     bool has_config_file;
-@@ -688,7 +688,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
-         int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
-         firewall_name = "fw.conf";
- 
--        bool incremental = task->has_incremental && task->incremental;
-+        bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
- 
-         char *pbs_err = NULL;
-         pbs = proxmox_backup_new(
-@@ -722,9 +722,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
-             const char *devname = bdrv_get_device_name(di->bs);
- 
-             BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
-+            bool expect_only_dirty = false;
- 
--            bool use_incremental = false;
--            if (incremental) {
-+            if (use_dirty_bitmap) {
-                 if (bitmap == NULL) {
-                     bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
-                     if (!bitmap) {
-@@ -734,7 +734,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
-                     bdrv_set_dirty_bitmap(bitmap, 0, di->size);
-                     dirty += di->size;
-                 } else {
--                    use_incremental = true;
-+                    expect_only_dirty = true;
-                     dirty += bdrv_get_dirty_count(bitmap);
-                 }
-                 di->bitmap = bitmap;
-@@ -743,7 +743,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
-                 bdrv_release_dirty_bitmap(bitmap);
-             }
- 
--            int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, use_incremental, task->errp);
-+            int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
-             if (dev_id < 0) {
-                 goto err;
-             }
-@@ -861,7 +861,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
-     backup_state.stat.dirty = dirty;
-     backup_state.stat.transferred = 0;
-     backup_state.stat.zero_bytes = 0;
--    backup_state.stat.reused = dirty >= total ? 0 : total - dirty;
-+    backup_state.stat.reused = format == BACKUP_FORMAT_PBS && dirty >= total ? 0 : total - dirty;
- 
-     qemu_mutex_unlock(&backup_state.stat.lock);
- 
-@@ -930,7 +930,7 @@ UuidInfo *qmp_backup(
-     bool has_fingerprint, const char *fingerprint,
-     bool has_backup_id, const char *backup_id,
-     bool has_backup_time, int64_t backup_time,
--    bool has_incremental, bool incremental,
-+    bool has_use_dirty_bitmap, bool use_dirty_bitmap,
-     bool has_format, BackupFormat format,
-     bool has_config_file, const char *config_file,
-     bool has_firewall_file, const char *firewall_file,
-@@ -949,8 +949,8 @@ UuidInfo *qmp_backup(
-         .backup_id = backup_id,
-         .has_backup_time = has_backup_time,
-         .backup_time = backup_time,
--        .has_incremental = has_incremental,
--        .incremental = incremental,
-+        .has_use_dirty_bitmap = has_use_dirty_bitmap,
-+        .use_dirty_bitmap = use_dirty_bitmap,
-         .has_format = has_format,
-         .format = format,
-         .has_config_file = has_config_file,
-diff --git a/qapi/block-core.json b/qapi/block-core.json
-index aadd5329b3..d4e1c98c50 100644
---- a/qapi/block-core.json
-+++ b/qapi/block-core.json
-@@ -758,7 +758,7 @@
- #
- # @total: total amount of bytes involved in the backup process
- #
--# @dirty: with incremental mode, this is the amount of bytes involved
-+# @dirty: with incremental mode (PBS) this is the amount of bytes involved
- #         in the backup process which are marked dirty.
- #
- # @transferred: amount of bytes already backed up.
-@@ -821,7 +821,7 @@
- #
- # @backup-time: backup timestamp (Unix epoch, required for format 'pbs')
- #
--# @incremental: sync incremental changes since last job (optional for format 'pbs')
-+# @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
- #
- # Returns: the uuid of the backup job
- #
-@@ -833,7 +833,7 @@
-                                     '*fingerprint': 'str',
-                                     '*backup-id': 'str',
-                                     '*backup-time': 'int',
--                                    '*incremental': 'bool',
-+                                    '*use-dirty-bitmap': 'bool',
-                                     '*format': 'BackupFormat',
-                                     '*config-file': 'str',
-                                     '*firewall-file': 'str',
diff --git a/debian/patches/pve/0044-PVE-add-query_proxmox_support-QMP-command.patch b/debian/patches/pve/0039-PVE-add-query_proxmox_support-QMP-command.patch
similarity index 94%
rename from debian/patches/pve/0044-PVE-add-query_proxmox_support-QMP-command.patch
rename to debian/patches/pve/0039-PVE-add-query_proxmox_support-QMP-command.patch
index a9cbe84..5697adf 100644
--- a/debian/patches/pve/0044-PVE-add-query_proxmox_support-QMP-command.patch
+++ b/debian/patches/pve/0039-PVE-add-query_proxmox_support-QMP-command.patch
@@ -16,10 +16,10 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
  2 files changed, 33 insertions(+)
 
 diff --git a/pve-backup.c b/pve-backup.c
-index bfb648d6b5..ba9d0d8a86 100644
+index b8182aaf89..40c2697b37 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
-@@ -1051,3 +1051,11 @@ BackupStatus *qmp_query_backup(Error **errp)
+@@ -1073,3 +1073,11 @@ BackupStatus *qmp_query_backup(Error **errp)
  
      return info;
  }
diff --git a/debian/patches/pve/0039-PVE-fixup-pbs-restore-API.patch b/debian/patches/pve/0039-PVE-fixup-pbs-restore-API.patch
deleted file mode 100644
index cc8f30a..0000000
--- a/debian/patches/pve/0039-PVE-fixup-pbs-restore-API.patch
+++ /dev/null
@@ -1,44 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Stefan Reiter <s.reiter@proxmox.com>
-Date: Mon, 6 Jul 2020 14:40:12 +0200
-Subject: [PATCH] PVE: fixup pbs-restore API
-
-Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
----
- pbs-restore.c | 10 ++++++++--
- 1 file changed, 8 insertions(+), 2 deletions(-)
-
-diff --git a/pbs-restore.c b/pbs-restore.c
-index d4daee7e91..4d3f925a1b 100644
---- a/pbs-restore.c
-+++ b/pbs-restore.c
-@@ -161,13 +161,19 @@ int main(int argc, char **argv)
-         fprintf(stderr, "connecting to repository '%s'\n", repository);
-     }
-     char *pbs_error = NULL;
--    ProxmoxRestoreHandle *conn = proxmox_restore_connect(
-+    ProxmoxRestoreHandle *conn = proxmox_restore_new(
-         repository, snapshot, password, keyfile, key_password, fingerprint, &pbs_error);
-     if (conn == NULL) {
-         fprintf(stderr, "restore failed: %s\n", pbs_error);
-         return -1;
-     }
- 
-+    int res = proxmox_restore_connect(conn, &pbs_error);
-+    if (res < 0 || pbs_error) {
-+        fprintf(stderr, "restore failed (connection error): %s\n", pbs_error);
-+        return -1;
-+    }
-+
-     QDict *options = qdict_new();
- 
-     if (format) {
-@@ -198,7 +204,7 @@ int main(int argc, char **argv)
-         fprintf(stderr, "starting to restore snapshot '%s'\n", snapshot);
-         fflush(stderr); // ensure we do not get printed after the progress log
-     }
--    int res = proxmox_restore_image(
-+    res = proxmox_restore_image(
-         conn,
-         archive_name,
-         write_callback,
diff --git a/debian/patches/pve/0048-PVE-add-query-pbs-bitmap-info-QMP-call.patch b/debian/patches/pve/0040-PVE-add-query-pbs-bitmap-info-QMP-call.patch
similarity index 100%
rename from debian/patches/pve/0048-PVE-add-query-pbs-bitmap-info-QMP-call.patch
rename to debian/patches/pve/0040-PVE-add-query-pbs-bitmap-info-QMP-call.patch
diff --git a/debian/patches/pve/0040-PVE-always-set-dirty-counter-for-non-incremental-bac.patch b/debian/patches/pve/0040-PVE-always-set-dirty-counter-for-non-incremental-bac.patch
deleted file mode 100644
index c7b267e..0000000
--- a/debian/patches/pve/0040-PVE-always-set-dirty-counter-for-non-incremental-bac.patch
+++ /dev/null
@@ -1,30 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Stefan Reiter <s.reiter@proxmox.com>
-Date: Mon, 6 Jul 2020 14:40:13 +0200
-Subject: [PATCH] PVE: always set dirty counter for non-incremental backups
-
-Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
----
- pve-backup.c | 8 ++++++--
- 1 file changed, 6 insertions(+), 2 deletions(-)
-
-diff --git a/pve-backup.c b/pve-backup.c
-index 7bf54b4c5d..1f2a0bbe8c 100644
---- a/pve-backup.c
-+++ b/pve-backup.c
-@@ -738,9 +738,13 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
-                     dirty += bdrv_get_dirty_count(bitmap);
-                 }
-                 di->bitmap = bitmap;
--            } else if (bitmap != NULL) {
-+            } else {
-                 dirty += di->size;
--                bdrv_release_dirty_bitmap(bitmap);
-+
-+                /* after a full backup the old dirty bitmap is invalid anyway */
-+                if (bitmap != NULL) {
-+                    bdrv_release_dirty_bitmap(bitmap);
-+                }
-             }
- 
-             int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
diff --git a/debian/patches/pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch b/debian/patches/pve/0041-PVE-redirect-stderr-to-journal-when-daemonized.patch
similarity index 100%
rename from debian/patches/pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch
rename to debian/patches/pve/0041-PVE-redirect-stderr-to-journal-when-daemonized.patch
diff --git a/debian/patches/pve/0041-PVE-use-proxmox_backup_check_incremental.patch b/debian/patches/pve/0041-PVE-use-proxmox_backup_check_incremental.patch
deleted file mode 100644
index c55357f..0000000
--- a/debian/patches/pve/0041-PVE-use-proxmox_backup_check_incremental.patch
+++ /dev/null
@@ -1,36 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Stefan Reiter <s.reiter@proxmox.com>
-Date: Mon, 6 Jul 2020 14:40:14 +0200
-Subject: [PATCH] PVE: use proxmox_backup_check_incremental
-
-Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
-Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
----
- pve-backup.c | 12 ++++++++----
- 1 file changed, 8 insertions(+), 4 deletions(-)
-
-diff --git a/pve-backup.c b/pve-backup.c
-index 1f2a0bbe8c..1cd9d31d7c 100644
---- a/pve-backup.c
-+++ b/pve-backup.c
-@@ -730,12 +730,16 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
-                     if (!bitmap) {
-                         goto err;
-                     }
--                    /* mark entire bitmap as dirty to make full backup first */
--                    bdrv_set_dirty_bitmap(bitmap, 0, di->size);
--                    dirty += di->size;
-                 } else {
--                    expect_only_dirty = true;
-+                    expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0;
-+                }
-+
-+                if (expect_only_dirty) {
-                     dirty += bdrv_get_dirty_count(bitmap);
-+                } else {
-+                    /* mark entire bitmap as dirty to make full backup */
-+                    bdrv_set_dirty_bitmap(bitmap, 0, di->size);
-+                    dirty += di->size;
-                 }
-                 di->bitmap = bitmap;
-             } else {
diff --git a/debian/patches/pve/0050-PVE-Add-sequential-job-transaction-support.patch b/debian/patches/pve/0042-PVE-Add-sequential-job-transaction-support.patch
similarity index 75%
rename from debian/patches/pve/0050-PVE-Add-sequential-job-transaction-support.patch
rename to debian/patches/pve/0042-PVE-Add-sequential-job-transaction-support.patch
index 24f8bfb..6be76d8 100644
--- a/debian/patches/pve/0050-PVE-Add-sequential-job-transaction-support.patch
+++ b/debian/patches/pve/0042-PVE-Add-sequential-job-transaction-support.patch
@@ -6,8 +6,8 @@ Subject: [PATCH] PVE: Add sequential job transaction support
 Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
 ---
  include/qemu/job.h | 12 ++++++++++++
- job.c              | 24 ++++++++++++++++++++++++
- 2 files changed, 36 insertions(+)
+ job.c              | 31 +++++++++++++++++++++++++++++++
+ 2 files changed, 43 insertions(+)
 
 diff --git a/include/qemu/job.h b/include/qemu/job.h
 index 32aabb1c60..f7a6a0926a 100644
@@ -33,7 +33,7 @@ index 32aabb1c60..f7a6a0926a 100644
   * Release a reference that was previously acquired with job_txn_add_job or
   * job_txn_new. If it's the last reference to the object, it will be freed.
 diff --git a/job.c b/job.c
-index f9884e7d9d..8f06e05fbf 100644
+index f9884e7d9d..05b7797e82 100644
 --- a/job.c
 +++ b/job.c
 @@ -72,6 +72,8 @@ struct JobTxn {
@@ -81,3 +81,17 @@ index f9884e7d9d..8f06e05fbf 100644
              return;
          }
          assert(other_job->ret == 0);
+@@ -1011,6 +1035,13 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
+         return -EBUSY;
+     }
+ 
++    /* in a sequential transaction jobs with status CREATED can appear at time
++     * of cancelling, these have not begun work so job_enter won't do anything,
++     * let's ensure they are marked as ABORTING if required */
++    if (job->status == JOB_STATUS_CREATED && job->txn->sequential) {
++        job_update_rc(job);
++    }
++
+     AIO_WAIT_WHILE(job->aio_context,
+                    (job_enter(job), !job_is_completed(job)));
+ 
diff --git a/debian/patches/pve/0042-PVE-fixup-pbs-backup-add-compress-and-encrypt-option.patch b/debian/patches/pve/0042-PVE-fixup-pbs-backup-add-compress-and-encrypt-option.patch
deleted file mode 100644
index 601df5c..0000000
--- a/debian/patches/pve/0042-PVE-fixup-pbs-backup-add-compress-and-encrypt-option.patch
+++ /dev/null
@@ -1,103 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Dietmar Maurer <dietmar@proxmox.com>
-Date: Thu, 9 Jul 2020 12:53:08 +0200
-Subject: [PATCH] PVE: fixup pbs backup, add compress and encrypt options
-
----
- block/monitor/block-hmp-cmds.c |  4 +++-
- pve-backup.c                   | 13 ++++++++++++-
- qapi/block-core.json           |  6 ++++++
- 3 files changed, 21 insertions(+), 2 deletions(-)
-
-diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
-index 056d14deee..46c63b1cf9 100644
---- a/block/monitor/block-hmp-cmds.c
-+++ b/block/monitor/block-hmp-cmds.c
-@@ -1039,7 +1039,9 @@ void hmp_backup(Monitor *mon, const QDict *qdict)
-         false, NULL, // PBS fingerprint
-         false, NULL, // PBS backup-id
-         false, 0, // PBS backup-time
--        false, false, // PBS incremental
-+        false, false, // PBS use-dirty-bitmap
-+        false, false, // PBS compress
-+        false, false, // PBS encrypt
-         true, dir ? BACKUP_FORMAT_DIR : BACKUP_FORMAT_VMA,
-         false, NULL, false, NULL, !!devlist,
-         devlist, qdict_haskey(qdict, "speed"), speed, &error);
-diff --git a/pve-backup.c b/pve-backup.c
-index 1cd9d31d7c..bfb648d6b5 100644
---- a/pve-backup.c
-+++ b/pve-backup.c
-@@ -567,6 +567,10 @@ typedef struct QmpBackupTask {
-     const char *firewall_file;
-     bool has_devlist;
-     const char *devlist;
-+    bool has_compress;
-+    bool compress;
-+    bool has_encrypt;
-+    bool encrypt;
-     bool has_speed;
-     int64_t speed;
-     Error **errp;
-@@ -690,6 +694,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
- 
-         bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
- 
-+
-         char *pbs_err = NULL;
-         pbs = proxmox_backup_new(
-             task->backup_file,
-@@ -699,8 +704,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
-             task->has_password ? task->password : NULL,
-             task->has_keyfile ? task->keyfile : NULL,
-             task->has_key_password ? task->key_password : NULL,
-+            task->has_compress ? task->compress : true,
-+            task->has_encrypt ? task->encrypt : task->has_keyfile,
-             task->has_fingerprint ? task->fingerprint : NULL,
--            &pbs_err);
-+             &pbs_err);
- 
-         if (!pbs) {
-             error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
-@@ -939,6 +946,8 @@ UuidInfo *qmp_backup(
-     bool has_backup_id, const char *backup_id,
-     bool has_backup_time, int64_t backup_time,
-     bool has_use_dirty_bitmap, bool use_dirty_bitmap,
-+    bool has_compress, bool compress,
-+    bool has_encrypt, bool encrypt,
-     bool has_format, BackupFormat format,
-     bool has_config_file, const char *config_file,
-     bool has_firewall_file, const char *firewall_file,
-@@ -967,6 +976,8 @@ UuidInfo *qmp_backup(
-         .firewall_file = firewall_file,
-         .has_devlist = has_devlist,
-         .devlist = devlist,
-+        .has_compress = has_compress,
-+        .has_encrypt = has_encrypt,
-         .has_speed = has_speed,
-         .speed = speed,
-         .errp = errp,
-diff --git a/qapi/block-core.json b/qapi/block-core.json
-index d4e1c98c50..0fda1e3fd3 100644
---- a/qapi/block-core.json
-+++ b/qapi/block-core.json
-@@ -823,6 +823,10 @@
- #
- # @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
- #
-+# @compress: use compression (optional for format 'pbs', defaults to true)
-+#
-+# @encrypt: use encryption ((optional for format 'pbs', defaults to true if there is a keyfile)
-+#
- # Returns: the uuid of the backup job
- #
- ##
-@@ -834,6 +838,8 @@
-                                     '*backup-id': 'str',
-                                     '*backup-time': 'int',
-                                     '*use-dirty-bitmap': 'bool',
-+                                    '*compress': 'bool',
-+                                    '*encrypt': 'bool',
-                                     '*format': 'BackupFormat',
-                                     '*config-file': 'str',
-                                     '*firewall-file': 'str',
diff --git a/debian/patches/pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch b/debian/patches/pve/0043-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
similarity index 100%
rename from debian/patches/pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
rename to debian/patches/pve/0043-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
diff --git a/debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch b/debian/patches/pve/0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch
similarity index 63%
rename from debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch
rename to debian/patches/pve/0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch
index b215d3d..d15dda1 100644
--- a/debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch
+++ b/debian/patches/pve/0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch
@@ -1,7 +1,8 @@
 From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
 From: Stefan Reiter <s.reiter@proxmox.com>
 Date: Mon, 28 Sep 2020 13:40:51 +0200
-Subject: [PATCH] PVE-Backup: Use more coroutines and don't block on finishing
+Subject: [PATCH] PVE-Backup: Don't block on finishing and cleanup
+ create_backup_jobs
 
 proxmox_backup_co_finish is already async, but previously we would wait
 for the coroutine using block_on_coroutine_fn(). Avoid this by
@@ -29,16 +30,31 @@ To communicate the finishing state, a new property is introduced to
 query-backup: 'finishing'. A new state is explicitly not used, since
 that would break compatibility with older qemu-server versions.
 
+Also fix create_backup_jobs:
+
+No more weird bool returns, just the standard "errp" format used
+everywhere else too. With this, if backup_job_create fails, the error
+message is actually returned over QMP and can be shown to the user.
+
+To facilitate correct cleanup on such an error, we call
+create_backup_jobs as a bottom half directly from pvebackup_co_prepare.
+This additionally allows us to actually hold the backup_mutex during
+operation.
+
+Also add a job_cancel_sync before job_unref, since a job must be in
+STATUS_NULL to be deleted by unref, which could trigger an assert
+before.
+
 [0] https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg03515.html
 
 Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
 ---
- pve-backup.c         | 148 ++++++++++++++++++++++++++-----------------
+ pve-backup.c         | 217 ++++++++++++++++++++++++++++---------------
  qapi/block-core.json |   5 +-
- 2 files changed, 95 insertions(+), 58 deletions(-)
+ 2 files changed, 144 insertions(+), 78 deletions(-)
 
 diff --git a/pve-backup.c b/pve-backup.c
-index f3df43eb8c..f3b8ce1f3a 100644
+index f3df43eb8c..ded90cb822 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -33,7 +33,9 @@ const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
@@ -52,11 +68,12 @@ index f3df43eb8c..f3b8ce1f3a 100644
          QemuMutex lock;
          Error *error;
          time_t start_time;
-@@ -47,20 +49,21 @@ static struct PVEBackupState {
+@@ -47,20 +49,22 @@ static struct PVEBackupState {
          size_t reused;
          size_t zero_bytes;
          GList *bitmap_list;
 +        bool finishing;
++        bool starting;
      } stat;
      int64_t speed;
      VmaWriter *vmaw;
@@ -76,7 +93,7 @@ index f3df43eb8c..f3b8ce1f3a 100644
      qemu_co_mutex_init(&backup_state.dump_callback_mutex);
  }
  
-@@ -72,6 +75,7 @@ typedef struct PVEBackupDevInfo {
+@@ -72,6 +76,7 @@ typedef struct PVEBackupDevInfo {
      size_t size;
      uint64_t block_size;
      uint8_t dev_id;
@@ -84,7 +101,7 @@ index f3df43eb8c..f3b8ce1f3a 100644
      char targetfile[PATH_MAX];
      BdrvDirtyBitmap *bitmap;
      BlockDriverState *target;
-@@ -227,12 +231,12 @@ pvebackup_co_dump_vma_cb(
+@@ -227,12 +232,12 @@ pvebackup_co_dump_vma_cb(
  }
  
  // assumes the caller holds backup_mutex
@@ -99,7 +116,7 @@ index f3df43eb8c..f3b8ce1f3a 100644
      qemu_mutex_unlock(&backup_state.stat.lock);
  
      if (backup_state.vmaw) {
-@@ -261,12 +265,29 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
+@@ -261,35 +266,29 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
  
      g_list_free(backup_state.di_list);
      backup_state.di_list = NULL;
@@ -116,25 +133,28 @@ index f3df43eb8c..f3b8ce1f3a 100644
  {
      PVEBackupDevInfo *di = opaque;
 +    int ret = di->completed_ret;
-+
-+    qemu_co_mutex_lock(&backup_state.backup_mutex);
-+
-+    if (ret < 0) {
-+        Error *local_err = NULL;
-+        error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
-+        pvebackup_propagate_error(local_err);
-+    }
-+
-+    di->bs = NULL;
-+
-+    assert(di->target == NULL);
  
-     bool error_or_canceled = pvebackup_error_or_canceled();
- 
-@@ -281,27 +302,6 @@ static void coroutine_fn pvebackup_complete_stream(void *opaque)
-             pvebackup_propagate_error(local_err);
-         }
+-    bool error_or_canceled = pvebackup_error_or_canceled();
+-
+-    if (backup_state.vmaw) {
+-        vma_writer_close_stream(backup_state.vmaw, di->dev_id);
++    qemu_mutex_lock(&backup_state.stat.lock);
++    bool starting = backup_state.stat.starting;
++    qemu_mutex_unlock(&backup_state.stat.lock);
++    if (starting) {
++        /* in 'starting' state, no tasks have been run yet, meaning we can (and
++         * must) skip all cleanup, as we don't know what has and hasn't been
++         * initialized yet. */
++        return;
      }
+ 
+-    if (backup_state.pbs && !error_or_canceled) {
+-        Error *local_err = NULL;
+-        proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
+-        if (local_err != NULL) {
+-            pvebackup_propagate_error(local_err);
+-        }
+-    }
 -}
 -
 -static void pvebackup_complete_cb(void *opaque, int ret)
@@ -144,22 +164,32 @@ index f3df43eb8c..f3b8ce1f3a 100644
 -    PVEBackupDevInfo *di = opaque;
 -
 -    qemu_mutex_lock(&backup_state.backup_mutex);
--
--    if (ret < 0) {
--        Error *local_err = NULL;
--        error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
--        pvebackup_propagate_error(local_err);
--    }
--
--    di->bs = NULL;
--
--    assert(di->target == NULL);
--
++    qemu_co_mutex_lock(&backup_state.backup_mutex);
+ 
+     if (ret < 0) {
+         Error *local_err = NULL;
+@@ -301,7 +300,19 @@ static void pvebackup_complete_cb(void *opaque, int ret)
+ 
+     assert(di->target == NULL);
+ 
 -    block_on_coroutine_fn(pvebackup_complete_stream, di);
++    bool error_or_canceled = pvebackup_error_or_canceled();
++
++    if (backup_state.vmaw) {
++        vma_writer_close_stream(backup_state.vmaw, di->dev_id);
++    }
++
++    if (backup_state.pbs && !error_or_canceled) {
++        Error *local_err = NULL;
++        proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
++        if (local_err != NULL) {
++            pvebackup_propagate_error(local_err);
++        }
++    }
  
      // remove self from job list
      backup_state.di_list = g_list_remove(backup_state.di_list, di);
-@@ -310,21 +310,49 @@ static void pvebackup_complete_cb(void *opaque, int ret)
+@@ -310,21 +321,49 @@ static void pvebackup_complete_cb(void *opaque, int ret)
  
      /* call cleanup if we're the last job */
      if (!g_list_first(backup_state.di_list)) {
@@ -188,7 +218,7 @@ index f3df43eb8c..f3b8ce1f3a 100644
 +    Coroutine *co = qemu_coroutine_create(pvebackup_co_complete_stream, di);
 +    aio_co_enter(qemu_get_aio_context(), co);
 +}
- 
++
 +/*
 + * job_cancel(_sync) does not like to be called from coroutines, so defer to
 + * main loop processing via a bottom half.
@@ -202,7 +232,7 @@ index f3df43eb8c..f3b8ce1f3a 100644
 +    aio_context_release(job_ctx);
 +    aio_co_enter(data->ctx, data->co);
 +}
-+
+ 
 +static void coroutine_fn pvebackup_co_cancel(void *opaque)
 +{
      Error *cancel_err = NULL;
@@ -214,7 +244,7 @@ index f3df43eb8c..f3b8ce1f3a 100644
  
      if (backup_state.vmaw) {
          /* make sure vma writer does not block anymore */
-@@ -342,27 +370,22 @@ static void pvebackup_cancel(void)
+@@ -342,27 +381,22 @@ static void pvebackup_cancel(void)
          ((PVEBackupDevInfo *)bdi->data)->job :
          NULL;
  
@@ -251,22 +281,76 @@ index f3df43eb8c..f3b8ce1f3a 100644
  }
  
  // assumes the caller holds backup_mutex
-@@ -415,6 +438,14 @@ static int coroutine_fn pvebackup_co_add_config(
+@@ -415,10 +449,18 @@ static int coroutine_fn pvebackup_co_add_config(
      goto out;
  }
  
+-static bool create_backup_jobs(void) {
 +/*
 + * backup_job_create can *not* be run from a coroutine (and requires an
 + * acquired AioContext), so this can't either.
-+ * This does imply that this function cannot run with backup_mutex acquired.
-+ * That is ok because it is only ever called between setting up the backup_state
-+ * struct and starting the jobs, and from within a QMP call. This means that no
-+ * other QMP call can interrupt, and no background job is running yet.
++ * The caller is responsible that backup_mutex is held nonetheless.
 + */
- static bool create_backup_jobs(void) {
++static void create_backup_jobs_bh(void *opaque) {
  
      assert(!qemu_in_coroutine());
-@@ -523,11 +554,12 @@ typedef struct QmpBackupTask {
+ 
++    CoCtxData *data = (CoCtxData*)opaque;
++    Error **errp = (Error**)data->data;
++
+     Error *local_err = NULL;
+ 
+     /* create job transaction to synchronize bitmap commit and cancel all
+@@ -452,24 +494,19 @@ static bool create_backup_jobs(void) {
+ 
+         aio_context_release(aio_context);
+ 
+-        if (!job || local_err != NULL) {
+-            Error *create_job_err = NULL;
+-            error_setg(&create_job_err, "backup_job_create failed: %s",
+-                       local_err ? error_get_pretty(local_err) : "null");
++        di->job = job;
+ 
+-            pvebackup_propagate_error(create_job_err);
++        if (!job || local_err) {
++            error_setg(errp, "backup_job_create failed: %s",
++                       local_err ? error_get_pretty(local_err) : "null");
+             break;
+         }
+ 
+-        di->job = job;
+-
+         bdrv_unref(di->target);
+         di->target = NULL;
+     }
+ 
+-    bool errors = pvebackup_error_or_canceled();
+-
+-    if (errors) {
++    if (*errp) {
+         l = backup_state.di_list;
+         while (l) {
+             PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+@@ -481,12 +518,17 @@ static bool create_backup_jobs(void) {
+             }
+ 
+             if (di->job) {
++                AioContext *ctx = di->job->job.aio_context;
++                aio_context_acquire(ctx);
++                job_cancel_sync(&di->job->job);
+                 job_unref(&di->job->job);
++                aio_context_release(ctx);
+             }
+         }
+     }
+ 
+-    return errors;
++    /* return */
++    aio_co_enter(data->ctx, data->co);
+ }
+ 
+ typedef struct QmpBackupTask {
+@@ -523,11 +565,12 @@ typedef struct QmpBackupTask {
      UuidInfo *result;
  } QmpBackupTask;
  
@@ -280,7 +364,7 @@ index f3df43eb8c..f3b8ce1f3a 100644
      QmpBackupTask *task = opaque;
  
      task->result = NULL; // just to be sure
-@@ -548,8 +580,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -548,8 +591,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
      const char *firewall_name = "qemu-server.fw";
  
      if (backup_state.di_list) {
@@ -291,7 +375,7 @@ index f3df43eb8c..f3b8ce1f3a 100644
          return;
      }
  
-@@ -616,6 +649,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -616,6 +660,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
          }
          di->size = size;
          total += size;
@@ -300,24 +384,58 @@ index f3df43eb8c..f3b8ce1f3a 100644
      }
  
      uuid_generate(uuid);
-@@ -847,6 +882,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -847,6 +893,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
      backup_state.stat.dirty = total - backup_state.stat.reused;
      backup_state.stat.transferred = 0;
      backup_state.stat.zero_bytes = 0;
 +    backup_state.stat.finishing = false;
++    backup_state.stat.starting = true;
  
      qemu_mutex_unlock(&backup_state.stat.lock);
  
-@@ -861,6 +897,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
+@@ -861,6 +909,33 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
      uuid_info->UUID = uuid_str;
  
      task->result = uuid_info;
 +
++    /* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
++    * backup_mutex locked. This is fine, a CoMutex can be held across yield
++    * points, and we'll release it as soon as the BH reschedules us.
++    */
++    CoCtxData waker = {
++        .co = qemu_coroutine_self(),
++        .ctx = qemu_get_current_aio_context(),
++        .data = &local_err,
++    };
++    aio_bh_schedule_oneshot(waker.ctx, create_backup_jobs_bh, &waker);
++    qemu_coroutine_yield();
++
++    if (local_err) {
++        error_propagate(task->errp, local_err);
++        goto err;
++    }
++
 +    qemu_co_mutex_unlock(&backup_state.backup_mutex);
++
++    qemu_mutex_lock(&backup_state.stat.lock);
++    backup_state.stat.starting = false;
++    qemu_mutex_unlock(&backup_state.stat.lock);
++
++    /* start the first job in the transaction */
++    job_txn_start_seq(backup_state.txn);
++
      return;
  
  err_mutex:
-@@ -903,6 +941,8 @@ err:
+@@ -883,6 +958,7 @@ err:
+         g_free(di);
+     }
+     g_list_free(di_list);
++    backup_state.di_list = NULL;
+ 
+     if (devs) {
+         g_strfreev(devs);
+@@ -903,6 +979,8 @@ err:
      }
  
      task->result = NULL;
@@ -326,7 +444,7 @@ index f3df43eb8c..f3b8ce1f3a 100644
      return;
  }
  
-@@ -956,22 +996,15 @@ UuidInfo *qmp_backup(
+@@ -956,24 +1034,8 @@ UuidInfo *qmp_backup(
          .errp = errp,
      };
  
@@ -334,23 +452,24 @@ index f3df43eb8c..f3b8ce1f3a 100644
 -
      block_on_coroutine_fn(pvebackup_co_prepare, &task);
  
-     if (*errp == NULL) {
-         bool errors = create_backup_jobs();
+-    if (*errp == NULL) {
+-        bool errors = create_backup_jobs();
 -        qemu_mutex_unlock(&backup_state.backup_mutex);
- 
-         if (!errors) {
+-
+-        if (!errors) {
 -            /* start the first job in the transaction
 -             * note: this might directly enter the job, so we need to do this
 -             * after unlocking the backup_mutex */
-+            // start the first job in the transaction
-             job_txn_start_seq(backup_state.txn);
-         }
+-            job_txn_start_seq(backup_state.txn);
+-        }
 -    } else {
 -        qemu_mutex_unlock(&backup_state.backup_mutex);
-     }
- 
+-    }
+-
      return task.result;
-@@ -1025,6 +1058,7 @@ BackupStatus *qmp_query_backup(Error **errp)
+ }
+ 
+@@ -1025,6 +1087,7 @@ BackupStatus *qmp_query_backup(Error **errp)
      info->transferred = backup_state.stat.transferred;
      info->has_reused = true;
      info->reused = backup_state.stat.reused;
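
For reference, the coroutine/BH handshake this patch relies on (schedule a bottom
half from a coroutine, yield, and let the BH wake the coroutine again) boils down
to the minimal sketch below. Only the QEMU primitives (aio_bh_schedule_oneshot,
qemu_coroutine_yield, aio_co_enter) are real API; SketchCtx and the sketch_* names
are illustrative and not taken from pve-backup.c:

    /*
     * Sketch of the BH/coroutine handshake, assuming QEMU's AIO and coroutine
     * API. SketchCtx and the sketch_* names are illustrative only.
     */
    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "block/aio.h"

    typedef struct SketchCtx {
        Coroutine *co;    /* coroutine to wake once the BH has run */
        AioContext *ctx;  /* AioContext the coroutine belongs to */
        void *data;       /* in/out data, e.g. an Error pointer */
    } SketchCtx;

    static void sketch_bh(void *opaque)
    {
        SketchCtx *waker = opaque;
        /* work that must not run in coroutine context goes here */
        aio_co_enter(waker->ctx, waker->co);  /* reschedule the waiting coroutine */
    }

    static void coroutine_fn sketch_prepare(void)
    {
        SketchCtx waker = {
            .co = qemu_coroutine_self(),
            .ctx = qemu_get_current_aio_context(),
        };
        aio_bh_schedule_oneshot(waker.ctx, sketch_bh, &waker);
        qemu_coroutine_yield();  /* resumed by sketch_bh via aio_co_enter */
    }

A CoMutex held by the yielding coroutine stays locked across the yield, which is
why the patch can keep backup_mutex held while create_backup_jobs_bh runs.
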
diff --git a/debian/patches/pve/0054-PVE-Migrate-dirty-bitmap-state-via-savevm.patch b/debian/patches/pve/0045-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
similarity index 100%
rename from debian/patches/pve/0054-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
rename to debian/patches/pve/0045-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
diff --git a/debian/patches/pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch b/debian/patches/pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch
deleted file mode 100644
index d4a03be..0000000
--- a/debian/patches/pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch
+++ /dev/null
@@ -1,43 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Wolfgang Bumiller <w.bumiller@proxmox.com>
-Date: Fri, 10 Jul 2020 13:22:35 +0200
-Subject: [PATCH] pbs: fix missing crypt and compress parameters
-
-Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
----
- pve-backup.c | 8 ++++++--
- 1 file changed, 6 insertions(+), 2 deletions(-)
-
-diff --git a/pve-backup.c b/pve-backup.c
-index ba9d0d8a86..e1dcf10a45 100644
---- a/pve-backup.c
-+++ b/pve-backup.c
-@@ -958,6 +958,8 @@ UuidInfo *qmp_backup(
-         .backup_file = backup_file,
-         .has_password = has_password,
-         .password = password,
-+        .has_keyfile = has_keyfile,
-+        .keyfile = keyfile,
-         .has_key_password = has_key_password,
-         .key_password = key_password,
-         .has_fingerprint = has_fingerprint,
-@@ -968,6 +970,10 @@ UuidInfo *qmp_backup(
-         .backup_time = backup_time,
-         .has_use_dirty_bitmap = has_use_dirty_bitmap,
-         .use_dirty_bitmap = use_dirty_bitmap,
-+        .has_compress = has_compress,
-+        .compress = compress,
-+        .has_encrypt = has_encrypt,
-+        .encrypt = encrypt,
-         .has_format = has_format,
-         .format = format,
-         .has_config_file = has_config_file,
-@@ -976,8 +982,6 @@ UuidInfo *qmp_backup(
-         .firewall_file = firewall_file,
-         .has_devlist = has_devlist,
-         .devlist = devlist,
--        .has_compress = has_compress,
--        .has_encrypt = has_encrypt,
-         .has_speed = has_speed,
-         .speed = speed,
-         .errp = errp,
diff --git a/debian/patches/pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch b/debian/patches/pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch
deleted file mode 100644
index 7457eb6..0000000
--- a/debian/patches/pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch
+++ /dev/null
@@ -1,76 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Stefan Reiter <s.reiter@proxmox.com>
-Date: Wed, 19 Aug 2020 12:33:17 +0200
-Subject: [PATCH] PVE: handle PBS write callback with big blocks correctly
-
-Under certain conditions QEMU will push more than the given blocksize
-into the callback at once. Handle it like VMA does, by iterating the
-data until all is written.
-
-The block size is stored per backup device to be used in the callback.
-This avoids relying on PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE, in case it is
-made configurable in the future.
-
-Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
----
- pve-backup.c | 30 ++++++++++++++++++++++--------
- 1 file changed, 22 insertions(+), 8 deletions(-)
-
-diff --git a/pve-backup.c b/pve-backup.c
-index e1dcf10a45..3eba85506a 100644
---- a/pve-backup.c
-+++ b/pve-backup.c
-@@ -67,6 +67,7 @@ opts_init(pvebackup_init);
- typedef struct PVEBackupDevInfo {
-     BlockDriverState *bs;
-     size_t size;
-+    uint64_t block_size;
-     uint8_t dev_id;
-     bool completed;
-     char targetfile[PATH_MAX];
-@@ -147,17 +148,28 @@ pvebackup_co_dump_pbs_cb(
-         return -1;
-     }
- 
--    pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
--    qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
-+    uint64_t transferred = 0;
-+    uint64_t reused = 0;
-+    while (transferred < size) {
-+        uint64_t left = size - transferred;
-+        uint64_t to_transfer = left < di->block_size ? left : di->block_size;
- 
--    if (pbs_res < 0) {
--        pvebackup_propagate_error(local_err);
--        return pbs_res;
--    } else {
--        size_t reused = (pbs_res == 0) ? size : 0;
--        pvebackup_add_transfered_bytes(size, !buf ? size : 0, reused);
-+        pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id,
-+            buf ? buf + transferred : NULL, start + transferred, to_transfer, &local_err);
-+        transferred += to_transfer;
-+
-+        if (pbs_res < 0) {
-+            pvebackup_propagate_error(local_err);
-+            qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
-+            return pbs_res;
-+        }
-+
-+        reused += pbs_res == 0 ? to_transfer : 0;
-     }
- 
-+    qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
-+    pvebackup_add_transfered_bytes(size, !buf ? size : 0, reused);
-+
-     return size;
- }
- 
-@@ -726,6 +738,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
-             PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-             l = g_list_next(l);
- 
-+            di->block_size = dump_cb_block_size;
-+
-             const char *devname = bdrv_get_device_name(di->bs);
- 
-             BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
diff --git a/debian/patches/pve/0055-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch b/debian/patches/pve/0046-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
similarity index 100%
rename from debian/patches/pve/0055-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
rename to debian/patches/pve/0046-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
diff --git a/debian/patches/pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch b/debian/patches/pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch
deleted file mode 100644
index 3bb6b35..0000000
--- a/debian/patches/pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch
+++ /dev/null
@@ -1,85 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Stefan Reiter <s.reiter@proxmox.com>
-Date: Thu, 13 Aug 2020 13:50:27 +0200
-Subject: [PATCH] PVE: add zero block handling to PBS dump callback
-
-Both the PBS and VMA dump callbacks assume that a NULL pointer can be
-passed as *pbuf, but that never happens, as backup-dump.c calls this
-function with contents of an iovec.
-
-So first, remove that assumption and add an 'assert' to verify.
-
-Secondly, while the vma-writer already does the buffer_is_zero check
-internally, for PBS we relied on that non-existant behaviour for zero
-chunks, so do the buffer_is_zero check manually and pass NULL to the
-rust lib in case it is true.
-
-Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
----
- pve-backup.c | 14 +++++++++-----
- 1 file changed, 9 insertions(+), 5 deletions(-)
-
-diff --git a/pve-backup.c b/pve-backup.c
-index 3eba85506a..40c2697b37 100644
---- a/pve-backup.c
-+++ b/pve-backup.c
-@@ -8,6 +8,7 @@
- #include "block/blockjob.h"
- #include "qapi/qapi-commands-block.h"
- #include "qapi/qmp/qerror.h"
-+#include "qemu/cutils.h"
- 
- /* PVE backup state and related function */
- 
-@@ -136,10 +137,13 @@ pvebackup_co_dump_pbs_cb(
-     PVEBackupDevInfo *di = opaque;
- 
-     assert(backup_state.pbs);
-+    assert(buf);
- 
-     Error *local_err = NULL;
-     int pbs_res = -1;
- 
-+    bool is_zero_block = size == di->block_size && buffer_is_zero(buf, size);
-+
-     qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
- 
-     // avoid deadlock if job is cancelled
-@@ -155,7 +159,8 @@ pvebackup_co_dump_pbs_cb(
-         uint64_t to_transfer = left < di->block_size ? left : di->block_size;
- 
-         pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id,
--            buf ? buf + transferred : NULL, start + transferred, to_transfer, &local_err);
-+            is_zero_block ? NULL : buf + transferred, start + transferred,
-+            to_transfer, &local_err);
-         transferred += to_transfer;
- 
-         if (pbs_res < 0) {
-@@ -168,7 +173,7 @@ pvebackup_co_dump_pbs_cb(
-     }
- 
-     qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
--    pvebackup_add_transfered_bytes(size, !buf ? size : 0, reused);
-+    pvebackup_add_transfered_bytes(size, is_zero_block ? size : 0, reused);
- 
-     return size;
- }
-@@ -190,6 +195,7 @@ pvebackup_co_dump_vma_cb(
-     int ret = -1;
- 
-     assert(backup_state.vmaw);
-+    assert(buf);
- 
-     uint64_t remaining = size;
- 
-@@ -216,9 +222,7 @@ pvebackup_co_dump_vma_cb(
-         qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
- 
-         ++cluster_num;
--        if (buf) {
--            buf += VMA_CLUSTER_SIZE;
--        }
-+        buf += VMA_CLUSTER_SIZE;
-         if (ret < 0) {
-             Error *local_err = NULL;
-             vma_writer_error_propagate(backup_state.vmaw, &local_err);
diff --git a/debian/patches/pve/0057-PVE-fall-back-to-open-iscsi-initiatorname.patch b/debian/patches/pve/0047-PVE-fall-back-to-open-iscsi-initiatorname.patch
similarity index 100%
rename from debian/patches/pve/0057-PVE-fall-back-to-open-iscsi-initiatorname.patch
rename to debian/patches/pve/0047-PVE-fall-back-to-open-iscsi-initiatorname.patch
diff --git a/debian/patches/pve/0058-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch b/debian/patches/pve/0048-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
similarity index 100%
rename from debian/patches/pve/0058-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
rename to debian/patches/pve/0048-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
diff --git a/debian/patches/pve/0059-PBS-add-master-key-support.patch b/debian/patches/pve/0049-PBS-add-master-key-support.patch
similarity index 100%
rename from debian/patches/pve/0059-PBS-add-master-key-support.patch
rename to debian/patches/pve/0049-PBS-add-master-key-support.patch
diff --git a/debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch b/debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
deleted file mode 100644
index 92dc3e0..0000000
--- a/debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
+++ /dev/null
@@ -1,187 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Stefan Reiter <s.reiter@proxmox.com>
-Date: Thu, 22 Oct 2020 17:01:07 +0200
-Subject: [PATCH] PVE: fix and clean up error handling for create_backup_jobs
-
-No more weird bool returns, just the standard "errp" format used
-everywhere else too. With this, if backup_job_create fails, the error
-message is actually returned over QMP and can be shown to the user.
-
-To facilitate correct cleanup on such an error, we call
-create_backup_jobs as a bottom half directly from pvebackup_co_prepare.
-This additionally allows us to actually hold the backup_mutex during
-operation.
-
-Also add a job_cancel_sync before job_unref, since a job must be in
-STATUS_NULL to be deleted by unref, which could trigger an assert
-before.
-
-Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
----
- pve-backup.c | 79 +++++++++++++++++++++++++++++++++++-----------------
- 1 file changed, 54 insertions(+), 25 deletions(-)
-
-diff --git a/pve-backup.c b/pve-backup.c
-index f3b8ce1f3a..ded90cb822 100644
---- a/pve-backup.c
-+++ b/pve-backup.c
-@@ -50,6 +50,7 @@ static struct PVEBackupState {
-         size_t zero_bytes;
-         GList *bitmap_list;
-         bool finishing;
-+        bool starting;
-     } stat;
-     int64_t speed;
-     VmaWriter *vmaw;
-@@ -277,6 +278,16 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
-     PVEBackupDevInfo *di = opaque;
-     int ret = di->completed_ret;
- 
-+    qemu_mutex_lock(&backup_state.stat.lock);
-+    bool starting = backup_state.stat.starting;
-+    qemu_mutex_unlock(&backup_state.stat.lock);
-+    if (starting) {
-+        /* in 'starting' state, no tasks have been run yet, meaning we can (and
-+         * must) skip all cleanup, as we don't know what has and hasn't been
-+         * initialized yet. */
-+        return;
-+    }
-+
-     qemu_co_mutex_lock(&backup_state.backup_mutex);
- 
-     if (ret < 0) {
-@@ -441,15 +452,15 @@ static int coroutine_fn pvebackup_co_add_config(
- /*
-  * backup_job_create can *not* be run from a coroutine (and requires an
-  * acquired AioContext), so this can't either.
-- * This does imply that this function cannot run with backup_mutex acquired.
-- * That is ok because it is only ever called between setting up the backup_state
-- * struct and starting the jobs, and from within a QMP call. This means that no
-- * other QMP call can interrupt, and no background job is running yet.
-+ * The caller is responsible that backup_mutex is held nonetheless.
-  */
--static bool create_backup_jobs(void) {
-+static void create_backup_jobs_bh(void *opaque) {
- 
-     assert(!qemu_in_coroutine());
- 
-+    CoCtxData *data = (CoCtxData*)opaque;
-+    Error **errp = (Error**)data->data;
-+
-     Error *local_err = NULL;
- 
-     /* create job transaction to synchronize bitmap commit and cancel all
-@@ -483,24 +494,19 @@ static bool create_backup_jobs(void) {
- 
-         aio_context_release(aio_context);
- 
--        if (!job || local_err != NULL) {
--            Error *create_job_err = NULL;
--            error_setg(&create_job_err, "backup_job_create failed: %s",
--                       local_err ? error_get_pretty(local_err) : "null");
-+        di->job = job;
- 
--            pvebackup_propagate_error(create_job_err);
-+        if (!job || local_err) {
-+            error_setg(errp, "backup_job_create failed: %s",
-+                       local_err ? error_get_pretty(local_err) : "null");
-             break;
-         }
- 
--        di->job = job;
--
-         bdrv_unref(di->target);
-         di->target = NULL;
-     }
- 
--    bool errors = pvebackup_error_or_canceled();
--
--    if (errors) {
-+    if (*errp) {
-         l = backup_state.di_list;
-         while (l) {
-             PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
-@@ -512,12 +518,17 @@ static bool create_backup_jobs(void) {
-             }
- 
-             if (di->job) {
-+                AioContext *ctx = di->job->job.aio_context;
-+                aio_context_acquire(ctx);
-+                job_cancel_sync(&di->job->job);
-                 job_unref(&di->job->job);
-+                aio_context_release(ctx);
-             }
-         }
-     }
- 
--    return errors;
-+    /* return */
-+    aio_co_enter(data->ctx, data->co);
- }
- 
- typedef struct QmpBackupTask {
-@@ -883,6 +894,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
-     backup_state.stat.transferred = 0;
-     backup_state.stat.zero_bytes = 0;
-     backup_state.stat.finishing = false;
-+    backup_state.stat.starting = true;
- 
-     qemu_mutex_unlock(&backup_state.stat.lock);
- 
-@@ -898,7 +910,32 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
- 
-     task->result = uuid_info;
- 
-+    /* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
-+    * backup_mutex locked. This is fine, a CoMutex can be held across yield
-+    * points, and we'll release it as soon as the BH reschedules us.
-+    */
-+    CoCtxData waker = {
-+        .co = qemu_coroutine_self(),
-+        .ctx = qemu_get_current_aio_context(),
-+        .data = &local_err,
-+    };
-+    aio_bh_schedule_oneshot(waker.ctx, create_backup_jobs_bh, &waker);
-+    qemu_coroutine_yield();
-+
-+    if (local_err) {
-+        error_propagate(task->errp, local_err);
-+        goto err;
-+    }
-+
-     qemu_co_mutex_unlock(&backup_state.backup_mutex);
-+
-+    qemu_mutex_lock(&backup_state.stat.lock);
-+    backup_state.stat.starting = false;
-+    qemu_mutex_unlock(&backup_state.stat.lock);
-+
-+    /* start the first job in the transaction */
-+    job_txn_start_seq(backup_state.txn);
-+
-     return;
- 
- err_mutex:
-@@ -921,6 +958,7 @@ err:
-         g_free(di);
-     }
-     g_list_free(di_list);
-+    backup_state.di_list = NULL;
- 
-     if (devs) {
-         g_strfreev(devs);
-@@ -998,15 +1036,6 @@ UuidInfo *qmp_backup(
- 
-     block_on_coroutine_fn(pvebackup_co_prepare, &task);
- 
--    if (*errp == NULL) {
--        bool errors = create_backup_jobs();
--
--        if (!errors) {
--            // start the first job in the transaction
--            job_txn_start_seq(backup_state.txn);
--        }
--    }
--
-     return task.result;
- }
- 
diff --git a/debian/patches/pve/0056-PVE-fix-aborting-multiple-CREATED-jobs-in-sequential.patch b/debian/patches/pve/0056-PVE-fix-aborting-multiple-CREATED-jobs-in-sequential.patch
deleted file mode 100644
index 0e30326..0000000
--- a/debian/patches/pve/0056-PVE-fix-aborting-multiple-CREATED-jobs-in-sequential.patch
+++ /dev/null
@@ -1,39 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Stefan Reiter <s.reiter@proxmox.com>
-Date: Mon, 4 Jan 2021 14:49:14 +0100
-Subject: [PATCH] PVE: fix aborting multiple 'CREATED' jobs in sequential
- transaction
-
-Deadlocks could occur in the AIO_WAIT_WHILE loop in job_finish_sync,
-which would wait for CREATED but not running jobs to complete, even
-though job_enter is a no-op in that scenario. Mark offending jobs as
-ABORTING immediately via job_update_rc if required.
-
-Manifested itself in cancelling or failing backups with more than 2
-drives.
-
-Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
-Tested-by: Mira Limbeck <m.limbeck@proxmox.com>
-Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
----
- job.c | 7 +++++++
- 1 file changed, 7 insertions(+)
-
-diff --git a/job.c b/job.c
-index 8f06e05fbf..05b7797e82 100644
---- a/job.c
-+++ b/job.c
-@@ -1035,6 +1035,13 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
-         return -EBUSY;
-     }
- 
-+    /* in a sequential transaction jobs with status CREATED can appear at time
-+     * of cancelling, these have not begun work so job_enter won't do anything,
-+     * let's ensure they are marked as ABORTING if required */
-+    if (job->status == JOB_STATUS_CREATED && job->txn->sequential) {
-+        job_update_rc(job);
-+    }
-+
-     AIO_WAIT_WHILE(job->aio_context,
-                    (job_enter(job), !job_is_completed(job)));
- 
diff --git a/debian/patches/series b/debian/patches/series
index 6539603..61ecf5d 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -35,33 +35,23 @@ pve/0026-PVE-Backup-add-vma-backup-format-code.patch
 pve/0027-PVE-Backup-add-backup-dump-block-driver.patch
 pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
 pve/0029-PVE-Backup-pbs-restore-new-command-to-restore-from-p.patch
-pve/0030-PVE-Backup-avoid-coroutines-to-fix-AIO-freeze-cleanu.patch
-pve/0031-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
-pve/0032-drive-mirror-add-support-for-conditional-and-always-.patch
-pve/0033-mirror-add-check-for-bitmap-mode-without-bitmap.patch
-pve/0034-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch
-pve/0035-iotests-add-test-for-bitmap-mirror.patch
-pve/0036-mirror-move-some-checks-to-qmp.patch
-pve/0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
-pve/0038-PVE-backup-rename-incremental-to-use-dirty-bitmap.patch
-pve/0039-PVE-fixup-pbs-restore-API.patch
-pve/0040-PVE-always-set-dirty-counter-for-non-incremental-bac.patch
-pve/0041-PVE-use-proxmox_backup_check_incremental.patch
-pve/0042-PVE-fixup-pbs-backup-add-compress-and-encrypt-option.patch
-pve/0043-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch
-pve/0044-PVE-add-query_proxmox_support-QMP-command.patch
-pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch
-pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch
-pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch
-pve/0048-PVE-add-query-pbs-bitmap-info-QMP-call.patch
-pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch
-pve/0050-PVE-Add-sequential-job-transaction-support.patch
-pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
-pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch
-pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
-pve/0054-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
-pve/0055-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
-pve/0056-PVE-fix-aborting-multiple-CREATED-jobs-in-sequential.patch
-pve/0057-PVE-fall-back-to-open-iscsi-initiatorname.patch
-pve/0058-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
-pve/0059-PBS-add-master-key-support.patch
+pve/0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
+pve/0031-drive-mirror-add-support-for-conditional-and-always-.patch
+pve/0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch
+pve/0033-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch
+pve/0034-iotests-add-test-for-bitmap-mirror.patch
+pve/0035-mirror-move-some-checks-to-qmp.patch
+pve/0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
+pve/0037-PVE-various-PBS-fixes.patch
+pve/0038-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch
+pve/0039-PVE-add-query_proxmox_support-QMP-command.patch
+pve/0040-PVE-add-query-pbs-bitmap-info-QMP-call.patch
+pve/0041-PVE-redirect-stderr-to-journal-when-daemonized.patch
+pve/0042-PVE-Add-sequential-job-transaction-support.patch
+pve/0043-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
+pve/0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch
+pve/0045-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
+pve/0046-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
+pve/0047-PVE-fall-back-to-open-iscsi-initiatorname.patch
+pve/0048-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
+pve/0049-PBS-add-master-key-support.patch
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH v2 pve-qemu 02/11] move bitmap-mirror patches to seperate folder
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 01/11] clean up pve/ patches by merging Stefan Reiter
@ 2021-03-03  9:56 ` Stefan Reiter
  2021-03-03 16:32   ` [pve-devel] applied: " Thomas Lamprecht
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 03/11] add alloc-track block driver patch Stefan Reiter
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

...instead of having them in the middle of the backup-related patches.
These might (hopefully) be accepted upstream at some point as well.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Unrelated to the rest of the series.

 ...-support-for-sync-bitmap-mode-never.patch} | 30 +++++++-------
 ...support-for-conditional-and-always-.patch} |  0
 ...heck-for-bitmap-mode-without-bitmap.patch} |  4 +-
 ...to-bdrv_dirty_bitmap_merge_internal.patch} |  0
 ...-iotests-add-test-for-bitmap-mirror.patch} |  0
 ...0006-mirror-move-some-checks-to-qmp.patch} |  4 +-
 ...le-posix-make-locking-optiono-on-cre.patch |  4 +-
 ...-Backup-add-backup-dump-block-driver.patch |  2 +-
 ...ckup-proxmox-backup-patches-for-qemu.patch |  6 +--
 ...rty-bitmap-tracking-for-incremental.patch} |  0
 ...patch => 0031-PVE-various-PBS-fixes.patch} |  0
 ...-driver-to-map-backup-archives-into.patch} |  0
 ...d-query_proxmox_support-QMP-command.patch} |  0
 ...-add-query-pbs-bitmap-info-QMP-call.patch} |  0
 ...t-stderr-to-journal-when-daemonized.patch} |  0
 ...-sequential-job-transaction-support.patch} |  0
 ...transaction-to-synchronize-job-stat.patch} |  0
 ...block-on-finishing-and-cleanup-crea.patch} |  0
 ...grate-dirty-bitmap-state-via-savevm.patch} |  0
 ...irty-bitmap-migrate-other-bitmaps-e.patch} |  0
 ...ll-back-to-open-iscsi-initiatorname.patch} |  0
 ...outine-QMP-for-backup-cancel_backup.patch} |  0
 ... => 0043-PBS-add-master-key-support.patch} |  0
 debian/patches/series                         | 40 +++++++++----------
 24 files changed, 45 insertions(+), 45 deletions(-)
 rename debian/patches/{pve/0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch => bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch} (96%)
 rename debian/patches/{pve/0031-drive-mirror-add-support-for-conditional-and-always-.patch => bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch} (100%)
 rename debian/patches/{pve/0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch => bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch} (90%)
 rename debian/patches/{pve/0033-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch => bitmap-mirror/0004-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch} (100%)
 rename debian/patches/{pve/0034-iotests-add-test-for-bitmap-mirror.patch => bitmap-mirror/0005-iotests-add-test-for-bitmap-mirror.patch} (100%)
 rename debian/patches/{pve/0035-mirror-move-some-checks-to-qmp.patch => bitmap-mirror/0006-mirror-move-some-checks-to-qmp.patch} (99%)
 rename debian/patches/pve/{0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch => 0030-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch} (100%)
 rename debian/patches/pve/{0037-PVE-various-PBS-fixes.patch => 0031-PVE-various-PBS-fixes.patch} (100%)
 rename debian/patches/pve/{0038-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch => 0032-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch} (100%)
 rename debian/patches/pve/{0039-PVE-add-query_proxmox_support-QMP-command.patch => 0033-PVE-add-query_proxmox_support-QMP-command.patch} (100%)
 rename debian/patches/pve/{0040-PVE-add-query-pbs-bitmap-info-QMP-call.patch => 0034-PVE-add-query-pbs-bitmap-info-QMP-call.patch} (100%)
 rename debian/patches/pve/{0041-PVE-redirect-stderr-to-journal-when-daemonized.patch => 0035-PVE-redirect-stderr-to-journal-when-daemonized.patch} (100%)
 rename debian/patches/pve/{0042-PVE-Add-sequential-job-transaction-support.patch => 0036-PVE-Add-sequential-job-transaction-support.patch} (100%)
 rename debian/patches/pve/{0043-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch => 0037-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch} (100%)
 rename debian/patches/pve/{0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch => 0038-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch} (100%)
 rename debian/patches/pve/{0045-PVE-Migrate-dirty-bitmap-state-via-savevm.patch => 0039-PVE-Migrate-dirty-bitmap-state-via-savevm.patch} (100%)
 rename debian/patches/pve/{0046-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch => 0040-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch} (100%)
 rename debian/patches/pve/{0047-PVE-fall-back-to-open-iscsi-initiatorname.patch => 0041-PVE-fall-back-to-open-iscsi-initiatorname.patch} (100%)
 rename debian/patches/pve/{0048-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch => 0042-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch} (100%)
 rename debian/patches/pve/{0049-PBS-add-master-key-support.patch => 0043-PBS-add-master-key-support.patch} (100%)

diff --git a/debian/patches/pve/0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch b/debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
similarity index 96%
rename from debian/patches/pve/0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
rename to debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
index 3e324a3..60d0105 100644
--- a/debian/patches/pve/0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
+++ b/debian/patches/bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
@@ -249,10 +249,10 @@ index 8e1ad6eceb..97843992c2 100644
                       &local_err);
      if (local_err) {
 diff --git a/blockdev.c b/blockdev.c
-index bae80b9177..c79e081f57 100644
+index fe6fb5dc1d..394920613d 100644
 --- a/blockdev.c
 +++ b/blockdev.c
-@@ -2931,6 +2931,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
+@@ -2930,6 +2930,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
                                     BlockDriverState *target,
                                     bool has_replaces, const char *replaces,
                                     enum MirrorSyncMode sync,
@@ -263,7 +263,7 @@ index bae80b9177..c79e081f57 100644
                                     BlockMirrorBackingMode backing_mode,
                                     bool zero_target,
                                     bool has_speed, int64_t speed,
-@@ -2950,6 +2954,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
+@@ -2949,6 +2953,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
  {
      BlockDriverState *unfiltered_bs;
      int job_flags = JOB_DEFAULT;
@@ -271,7 +271,7 @@ index bae80b9177..c79e081f57 100644
  
      if (!has_speed) {
          speed = 0;
-@@ -3004,6 +3009,29 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
+@@ -3003,6 +3008,29 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
          sync = MIRROR_SYNC_MODE_FULL;
      }
  
@@ -301,7 +301,7 @@ index bae80b9177..c79e081f57 100644
      if (!has_replaces) {
          /* We want to mirror from @bs, but keep implicit filters on top */
          unfiltered_bs = bdrv_skip_implicit_filters(bs);
-@@ -3050,8 +3078,8 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
+@@ -3049,8 +3077,8 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
       * and will allow to check whether the node still exist at mirror completion
       */
      mirror_start(job_id, bs, target,
@@ -312,7 +312,7 @@ index bae80b9177..c79e081f57 100644
                   on_source_error, on_target_error, unmap, filter_node_name,
                   copy_mode, errp);
  }
-@@ -3196,6 +3224,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
+@@ -3195,6 +3223,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
  
      blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, target_bs,
                             arg->has_replaces, arg->replaces, arg->sync,
@@ -321,7 +321,7 @@ index bae80b9177..c79e081f57 100644
                             backing_mode, zero_target,
                             arg->has_speed, arg->speed,
                             arg->has_granularity, arg->granularity,
-@@ -3217,6 +3247,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
+@@ -3216,6 +3246,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
                           const char *device, const char *target,
                           bool has_replaces, const char *replaces,
                           MirrorSyncMode sync,
@@ -330,7 +330,7 @@ index bae80b9177..c79e081f57 100644
                           bool has_speed, int64_t speed,
                           bool has_granularity, uint32_t granularity,
                           bool has_buf_size, int64_t buf_size,
-@@ -3266,7 +3298,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
+@@ -3265,7 +3297,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
      }
  
      blockdev_mirror_common(has_job_id ? job_id : NULL, bs, target_bs,
@@ -341,10 +341,10 @@ index bae80b9177..c79e081f57 100644
                             has_granularity, granularity,
                             has_buf_size, buf_size,
 diff --git a/include/block/block_int.h b/include/block/block_int.h
-index 9fa282ff54..1bd4b64522 100644
+index 95d9333be1..6f8eda629a 100644
 --- a/include/block/block_int.h
 +++ b/include/block/block_int.h
-@@ -1260,7 +1260,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
+@@ -1230,7 +1230,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                    BlockDriverState *target, const char *replaces,
                    int creation_flags, int64_t speed,
                    uint32_t granularity, int64_t buf_size,
@@ -356,10 +356,10 @@ index 9fa282ff54..1bd4b64522 100644
                    BlockdevOnError on_source_error,
                    BlockdevOnError on_target_error,
 diff --git a/qapi/block-core.json b/qapi/block-core.json
-index be67dc3376..9054db608c 100644
+index 04ad80bc1e..9db3120716 100644
 --- a/qapi/block-core.json
 +++ b/qapi/block-core.json
-@@ -2080,10 +2080,19 @@
+@@ -1971,10 +1971,19 @@
  #        (all the disk, only the sectors allocated in the topmost image, or
  #        only new I/O).
  #
@@ -380,7 +380,7 @@ index be67dc3376..9054db608c 100644
  #
  # @buf-size: maximum amount of data in flight from source to
  #            target (since 1.4).
-@@ -2121,7 +2130,9 @@
+@@ -2012,7 +2021,9 @@
  { 'struct': 'DriveMirror',
    'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
              '*format': 'str', '*node-name': 'str', '*replaces': 'str',
@@ -391,7 +391,7 @@ index be67dc3376..9054db608c 100644
              '*speed': 'int', '*granularity': 'uint32',
              '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
              '*on-target-error': 'BlockdevOnError',
-@@ -2389,10 +2400,19 @@
+@@ -2280,10 +2291,19 @@
  #        (all the disk, only the sectors allocated in the topmost image, or
  #        only new I/O).
  #
@@ -412,7 +412,7 @@ index be67dc3376..9054db608c 100644
  #
  # @buf-size: maximum amount of data in flight from source to
  #            target
-@@ -2441,7 +2461,8 @@
+@@ -2332,7 +2352,8 @@
  { 'command': 'blockdev-mirror',
    'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
              '*replaces': 'str',
diff --git a/debian/patches/pve/0031-drive-mirror-add-support-for-conditional-and-always-.patch b/debian/patches/bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
similarity index 100%
rename from debian/patches/pve/0031-drive-mirror-add-support-for-conditional-and-always-.patch
rename to debian/patches/bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
diff --git a/debian/patches/pve/0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch b/debian/patches/bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
similarity index 90%
rename from debian/patches/pve/0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch
rename to debian/patches/bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
index 97077f3..84fd5f9 100644
--- a/debian/patches/pve/0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch
+++ b/debian/patches/bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
@@ -15,10 +15,10 @@ Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
  1 file changed, 3 insertions(+)
 
 diff --git a/blockdev.c b/blockdev.c
-index c79e081f57..827f004069 100644
+index 394920613d..4f8bd38b58 100644
 --- a/blockdev.c
 +++ b/blockdev.c
-@@ -3030,6 +3030,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
+@@ -3029,6 +3029,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
          if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_ALLOW_RO, errp)) {
              return;
          }
diff --git a/debian/patches/pve/0033-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch b/debian/patches/bitmap-mirror/0004-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch
similarity index 100%
rename from debian/patches/pve/0033-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch
rename to debian/patches/bitmap-mirror/0004-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch
diff --git a/debian/patches/pve/0034-iotests-add-test-for-bitmap-mirror.patch b/debian/patches/bitmap-mirror/0005-iotests-add-test-for-bitmap-mirror.patch
similarity index 100%
rename from debian/patches/pve/0034-iotests-add-test-for-bitmap-mirror.patch
rename to debian/patches/bitmap-mirror/0005-iotests-add-test-for-bitmap-mirror.patch
diff --git a/debian/patches/pve/0035-mirror-move-some-checks-to-qmp.patch b/debian/patches/bitmap-mirror/0006-mirror-move-some-checks-to-qmp.patch
similarity index 99%
rename from debian/patches/pve/0035-mirror-move-some-checks-to-qmp.patch
rename to debian/patches/bitmap-mirror/0006-mirror-move-some-checks-to-qmp.patch
index bfbb49f..16551ef 100644
--- a/debian/patches/pve/0035-mirror-move-some-checks-to-qmp.patch
+++ b/debian/patches/bitmap-mirror/0006-mirror-move-some-checks-to-qmp.patch
@@ -59,10 +59,10 @@ index e6140cf018..3a08239a78 100644
  
          if (bitmap_mode != BITMAP_SYNC_MODE_NEVER) {
 diff --git a/blockdev.c b/blockdev.c
-index 827f004069..e2f826ca62 100644
+index 4f8bd38b58..a40c6fd0f6 100644
 --- a/blockdev.c
 +++ b/blockdev.c
-@@ -3009,7 +3009,36 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
+@@ -3008,7 +3008,36 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
          sync = MIRROR_SYNC_MODE_FULL;
      }
  
diff --git a/debian/patches/pve/0022-PVE-Up-Config-file-posix-make-locking-optiono-on-cre.patch b/debian/patches/pve/0022-PVE-Up-Config-file-posix-make-locking-optiono-on-cre.patch
index 52c0046..8b3df50 100644
--- a/debian/patches/pve/0022-PVE-Up-Config-file-posix-make-locking-optiono-on-cre.patch
+++ b/debian/patches/pve/0022-PVE-Up-Config-file-posix-make-locking-optiono-on-cre.patch
@@ -128,10 +128,10 @@ index bda3e606dc..037839622e 100644
                                     false, errp);
          if (ret < 0) {
 diff --git a/qapi/block-core.json b/qapi/block-core.json
-index 04ad80bc1e..7957b9867d 100644
+index 9db3120716..d285622589 100644
 --- a/qapi/block-core.json
 +++ b/qapi/block-core.json
-@@ -4203,7 +4203,8 @@
+@@ -4224,7 +4224,8 @@
              'size':                 'size',
              '*preallocation':       'PreallocMode',
              '*nocow':               'bool',
diff --git a/debian/patches/pve/0027-PVE-Backup-add-backup-dump-block-driver.patch b/debian/patches/pve/0027-PVE-Backup-add-backup-dump-block-driver.patch
index 8692f8e..a81341f 100644
--- a/debian/patches/pve/0027-PVE-Backup-add-backup-dump-block-driver.patch
+++ b/debian/patches/pve/0027-PVE-Backup-add-backup-dump-block-driver.patch
@@ -244,7 +244,7 @@ index feffbc8623..2507af1168 100644
    'blkdebug.c',
    'blklogwrites.c',
 diff --git a/include/block/block_int.h b/include/block/block_int.h
-index 95d9333be1..2645e53282 100644
+index 6f8eda629a..5455102da8 100644
 --- a/include/block/block_int.h
 +++ b/include/block/block_int.h
 @@ -63,6 +63,36 @@
diff --git a/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch b/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
index 37bb98a..e9ffa84 100644
--- a/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
+++ b/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
@@ -85,7 +85,7 @@ index d15a2be827..9ba7c774a2 100644
 +    hmp_handle_error(mon, error);
 +}
 diff --git a/blockdev.c b/blockdev.c
-index fe6fb5dc1d..bae80b9177 100644
+index a40c6fd0f6..e2f826ca62 100644
 --- a/blockdev.c
 +++ b/blockdev.c
 @@ -36,6 +36,7 @@
@@ -161,7 +161,7 @@ index d294c234a5..0c6b944850 100644
  
      {
 diff --git a/include/block/block_int.h b/include/block/block_int.h
-index 2645e53282..9fa282ff54 100644
+index 5455102da8..1bd4b64522 100644
 --- a/include/block/block_int.h
 +++ b/include/block/block_int.h
 @@ -65,7 +65,7 @@
@@ -1472,7 +1472,7 @@ index 0000000000..d40f3f2fd6
 +    return info;
 +}
 diff --git a/qapi/block-core.json b/qapi/block-core.json
-index 7957b9867d..be67dc3376 100644
+index d285622589..9054db608c 100644
 --- a/qapi/block-core.json
 +++ b/qapi/block-core.json
 @@ -745,6 +745,115 @@
diff --git a/debian/patches/pve/0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch b/debian/patches/pve/0030-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
similarity index 100%
rename from debian/patches/pve/0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
rename to debian/patches/pve/0030-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
diff --git a/debian/patches/pve/0037-PVE-various-PBS-fixes.patch b/debian/patches/pve/0031-PVE-various-PBS-fixes.patch
similarity index 100%
rename from debian/patches/pve/0037-PVE-various-PBS-fixes.patch
rename to debian/patches/pve/0031-PVE-various-PBS-fixes.patch
diff --git a/debian/patches/pve/0038-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch b/debian/patches/pve/0032-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch
similarity index 100%
rename from debian/patches/pve/0038-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch
rename to debian/patches/pve/0032-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch
diff --git a/debian/patches/pve/0039-PVE-add-query_proxmox_support-QMP-command.patch b/debian/patches/pve/0033-PVE-add-query_proxmox_support-QMP-command.patch
similarity index 100%
rename from debian/patches/pve/0039-PVE-add-query_proxmox_support-QMP-command.patch
rename to debian/patches/pve/0033-PVE-add-query_proxmox_support-QMP-command.patch
diff --git a/debian/patches/pve/0040-PVE-add-query-pbs-bitmap-info-QMP-call.patch b/debian/patches/pve/0034-PVE-add-query-pbs-bitmap-info-QMP-call.patch
similarity index 100%
rename from debian/patches/pve/0040-PVE-add-query-pbs-bitmap-info-QMP-call.patch
rename to debian/patches/pve/0034-PVE-add-query-pbs-bitmap-info-QMP-call.patch
diff --git a/debian/patches/pve/0041-PVE-redirect-stderr-to-journal-when-daemonized.patch b/debian/patches/pve/0035-PVE-redirect-stderr-to-journal-when-daemonized.patch
similarity index 100%
rename from debian/patches/pve/0041-PVE-redirect-stderr-to-journal-when-daemonized.patch
rename to debian/patches/pve/0035-PVE-redirect-stderr-to-journal-when-daemonized.patch
diff --git a/debian/patches/pve/0042-PVE-Add-sequential-job-transaction-support.patch b/debian/patches/pve/0036-PVE-Add-sequential-job-transaction-support.patch
similarity index 100%
rename from debian/patches/pve/0042-PVE-Add-sequential-job-transaction-support.patch
rename to debian/patches/pve/0036-PVE-Add-sequential-job-transaction-support.patch
diff --git a/debian/patches/pve/0043-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch b/debian/patches/pve/0037-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
similarity index 100%
rename from debian/patches/pve/0043-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
rename to debian/patches/pve/0037-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
diff --git a/debian/patches/pve/0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch b/debian/patches/pve/0038-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch
similarity index 100%
rename from debian/patches/pve/0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch
rename to debian/patches/pve/0038-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch
diff --git a/debian/patches/pve/0045-PVE-Migrate-dirty-bitmap-state-via-savevm.patch b/debian/patches/pve/0039-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
similarity index 100%
rename from debian/patches/pve/0045-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
rename to debian/patches/pve/0039-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
diff --git a/debian/patches/pve/0046-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch b/debian/patches/pve/0040-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
similarity index 100%
rename from debian/patches/pve/0046-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
rename to debian/patches/pve/0040-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
diff --git a/debian/patches/pve/0047-PVE-fall-back-to-open-iscsi-initiatorname.patch b/debian/patches/pve/0041-PVE-fall-back-to-open-iscsi-initiatorname.patch
similarity index 100%
rename from debian/patches/pve/0047-PVE-fall-back-to-open-iscsi-initiatorname.patch
rename to debian/patches/pve/0041-PVE-fall-back-to-open-iscsi-initiatorname.patch
diff --git a/debian/patches/pve/0048-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch b/debian/patches/pve/0042-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
similarity index 100%
rename from debian/patches/pve/0048-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
rename to debian/patches/pve/0042-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
diff --git a/debian/patches/pve/0049-PBS-add-master-key-support.patch b/debian/patches/pve/0043-PBS-add-master-key-support.patch
similarity index 100%
rename from debian/patches/pve/0049-PBS-add-master-key-support.patch
rename to debian/patches/pve/0043-PBS-add-master-key-support.patch
diff --git a/debian/patches/series b/debian/patches/series
index 61ecf5d..1b30d97 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -6,6 +6,12 @@ extra/0005-virtiofsd-optionally-return-inode-pointer-from-lo_do.patch
 extra/0006-virtiofsd-prevent-opening-of-special-files-CVE-2020-.patch
 extra/0007-virtiofsd-Add-_llseek-to-the-seccomp-whitelist.patch
 extra/0008-virtiofsd-Add-restart_syscall-to-the-seccomp-whiteli.patch
+bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
+bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
+bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
+bitmap-mirror/0004-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch
+bitmap-mirror/0005-iotests-add-test-for-bitmap-mirror.patch
+bitmap-mirror/0006-mirror-move-some-checks-to-qmp.patch
 pve/0001-PVE-Config-block-file-change-locking-default-to-off.patch
 pve/0002-PVE-Config-Adjust-network-script-path-to-etc-kvm.patch
 pve/0003-PVE-Config-set-the-CPU-model-to-kvm64-32-instead-of-.patch
@@ -35,23 +41,17 @@ pve/0026-PVE-Backup-add-vma-backup-format-code.patch
 pve/0027-PVE-Backup-add-backup-dump-block-driver.patch
 pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
 pve/0029-PVE-Backup-pbs-restore-new-command-to-restore-from-p.patch
-pve/0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
-pve/0031-drive-mirror-add-support-for-conditional-and-always-.patch
-pve/0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch
-pve/0033-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch
-pve/0034-iotests-add-test-for-bitmap-mirror.patch
-pve/0035-mirror-move-some-checks-to-qmp.patch
-pve/0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
-pve/0037-PVE-various-PBS-fixes.patch
-pve/0038-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch
-pve/0039-PVE-add-query_proxmox_support-QMP-command.patch
-pve/0040-PVE-add-query-pbs-bitmap-info-QMP-call.patch
-pve/0041-PVE-redirect-stderr-to-journal-when-daemonized.patch
-pve/0042-PVE-Add-sequential-job-transaction-support.patch
-pve/0043-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
-pve/0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch
-pve/0045-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
-pve/0046-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
-pve/0047-PVE-fall-back-to-open-iscsi-initiatorname.patch
-pve/0048-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
-pve/0049-PBS-add-master-key-support.patch
+pve/0030-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch
+pve/0031-PVE-various-PBS-fixes.patch
+pve/0032-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch
+pve/0033-PVE-add-query_proxmox_support-QMP-command.patch
+pve/0034-PVE-add-query-pbs-bitmap-info-QMP-call.patch
+pve/0035-PVE-redirect-stderr-to-journal-when-daemonized.patch
+pve/0036-PVE-Add-sequential-job-transaction-support.patch
+pve/0037-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
+pve/0038-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch
+pve/0039-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
+pve/0040-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
+pve/0041-PVE-fall-back-to-open-iscsi-initiatorname.patch
+pve/0042-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
+pve/0043-PBS-add-master-key-support.patch
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH v2 pve-qemu 03/11] add alloc-track block driver patch
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 01/11] clean up pve/ patches by merging Stefan Reiter
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 02/11] move bitmap-mirror patches to seperate folder Stefan Reiter
@ 2021-03-03  9:56 ` Stefan Reiter
  2021-03-15 14:14   ` Wolfgang Bumiller
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 proxmox-backup 04/11] RemoteChunkReader: add LRU cached variant Stefan Reiter
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

See added patches for more info, overview:
0044: slightly increase PBS performance by reducing allocations
0045: slightly increase block-stream performance for Ceph
0046: don't crash with block-stream on RBD
0047: add alloc-track driver for live restore

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

works best with updated proxmox-backup-qemu, but no hard dependency

v2:
* now sent as pve-qemu combined patch
* 0060 is unchanged
* 0061/0062 are new; they fix restore via user-space Ceph
* 0063 (alloc-track) is updated with Wolfgang's feedback (track_co_block_status)
  and updated for 5.2 compatibility (mainly track_drop)

 ...st-path-reads-without-allocation-if-.patch |  52 +++
 ...PVE-block-stream-increase-chunk-size.patch |  23 ++
 ...accept-NULL-qiov-in-bdrv_pad_request.patch |  42 ++
 .../0047-block-add-alloc-track-driver.patch   | 380 ++++++++++++++++++
 debian/patches/series                         |   4 +
 5 files changed, 501 insertions(+)
 create mode 100644 debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
 create mode 100644 debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
 create mode 100644 debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
 create mode 100644 debian/patches/pve/0047-block-add-alloc-track-driver.patch

diff --git a/debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch b/debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
new file mode 100644
index 0000000..a85ebc2
--- /dev/null
+++ b/debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
@@ -0,0 +1,52 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Stefan Reiter <s.reiter@proxmox.com>
+Date: Wed, 9 Dec 2020 11:46:57 +0100
+Subject: [PATCH] PVE: block/pbs: fast-path reads without allocation if
+ possible
+
+...and switch over to g_malloc/g_free while at it to align with other
+QEMU code.
+
+Tracing shows the fast-path is taken almost all the time, though not
+100%, so the slow one is still necessary.
+
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+---
+ block/pbs.c | 17 ++++++++++++++---
+ 1 file changed, 14 insertions(+), 3 deletions(-)
+
+diff --git a/block/pbs.c b/block/pbs.c
+index 1481a2bfd1..fbf0d8d845 100644
+--- a/block/pbs.c
++++ b/block/pbs.c
+@@ -200,7 +200,16 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
+     BDRVPBSState *s = bs->opaque;
+     int ret;
+     char *pbs_error = NULL;
+-    uint8_t *buf = malloc(bytes);
++    uint8_t *buf;
++    bool inline_buf = true;
++
++    /* for single-buffer IO vectors we can fast-path the write directly to it */
++    if (qiov->niov == 1 && qiov->iov->iov_len >= bytes) {
++        buf = qiov->iov->iov_base;
++    } else {
++        inline_buf = false;
++        buf = g_malloc(bytes);
++    }
+ 
+     ReadCallbackData rcb = {
+         .co = qemu_coroutine_self(),
+@@ -218,8 +227,10 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
+         return -EIO;
+     }
+ 
+-    qemu_iovec_from_buf(qiov, 0, buf, bytes);
+-    free(buf);
++    if (!inline_buf) {
++        qemu_iovec_from_buf(qiov, 0, buf, bytes);
++        g_free(buf);
++    }
+ 
+     return ret;
+ }
diff --git a/debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch b/debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
new file mode 100644
index 0000000..601f8c7
--- /dev/null
+++ b/debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
@@ -0,0 +1,23 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Stefan Reiter <s.reiter@proxmox.com>
+Date: Tue, 2 Mar 2021 16:34:28 +0100
+Subject: [PATCH] PVE: block/stream: increase chunk size
+
+Ceph favors bigger chunks, so increase to 4M.
+---
+ block/stream.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/block/stream.c b/block/stream.c
+index 236384f2f7..a5371420e3 100644
+--- a/block/stream.c
++++ b/block/stream.c
+@@ -26,7 +26,7 @@ enum {
+      * large enough to process multiple clusters in a single call, so
+      * that populating contiguous regions of the image is efficient.
+      */
+-    STREAM_CHUNK = 512 * 1024, /* in bytes */
++    STREAM_CHUNK = 4 * 1024 * 1024, /* in bytes */
+ };
+ 
+ typedef struct StreamBlockJob {
diff --git a/debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch b/debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
new file mode 100644
index 0000000..e40fa2e
--- /dev/null
+++ b/debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
@@ -0,0 +1,42 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Stefan Reiter <s.reiter@proxmox.com>
+Date: Tue, 2 Mar 2021 16:11:54 +0100
+Subject: [PATCH] block/io: accept NULL qiov in bdrv_pad_request
+
+Some operations, e.g. block-stream, perform reads while discarding the
+results (only copy-on-read matters). In this case they will pass NULL as
+the target QEMUIOVector, which will however trip bdrv_pad_request, since
+it wants to extend its passed vector.
+
+Simply check for NULL and do nothing; there's no reason to pad the
+target if it will be discarded anyway.
+---
+ block/io.c | 13 ++++++++-----
+ 1 file changed, 8 insertions(+), 5 deletions(-)
+
+diff --git a/block/io.c b/block/io.c
+index ec5e152bb7..08dee005ec 100644
+--- a/block/io.c
++++ b/block/io.c
+@@ -1613,13 +1613,16 @@ static bool bdrv_pad_request(BlockDriverState *bs,
+         return false;
+     }
+ 
+-    qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
+-                             *qiov, *qiov_offset, *bytes,
+-                             pad->buf + pad->buf_len - pad->tail, pad->tail);
++    if (*qiov) {
++        qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
++                                *qiov, *qiov_offset, *bytes,
++                                pad->buf + pad->buf_len - pad->tail, pad->tail);
++        *qiov = &pad->local_qiov;
++        *qiov_offset = 0;
++    }
++
+     *bytes += pad->head + pad->tail;
+     *offset -= pad->head;
+-    *qiov = &pad->local_qiov;
+-    *qiov_offset = 0;
+ 
+     return true;
+ }
diff --git a/debian/patches/pve/0047-block-add-alloc-track-driver.patch b/debian/patches/pve/0047-block-add-alloc-track-driver.patch
new file mode 100644
index 0000000..6aaa186
--- /dev/null
+++ b/debian/patches/pve/0047-block-add-alloc-track-driver.patch
@@ -0,0 +1,380 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Stefan Reiter <s.reiter@proxmox.com>
+Date: Mon, 7 Dec 2020 15:21:03 +0100
+Subject: [PATCH] block: add alloc-track driver
+
+Add a new filter node 'alloc-track', which separates reads and writes to
+different children, making it possible to put a backing image behind any
+blockdev (regardless of driver support). Since we can't detect any
+pre-allocated blocks, we can only track new writes, hence the write
+target ('file') for this node must always be empty.
+
+Intended use case is for live restoring, i.e. add a backup image as a
+block device into a VM, then put an alloc-track on the restore target
+and set the backup as backing. With this, one can use a regular
+'block-stream' to restore the image, while the VM can already run in the
+background. Copy-on-read will help make progress as the VM reads as
+well.
+
+Until now this only worked if the target supported backing images, i.e.
+only for qcow2; with alloc-track, any driver can be used for the target.
+
+If 'auto-remove' is set, alloc-track will automatically detach itself
+once the backing image is removed. It will be replaced by 'file'.
+
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+---
+ block/alloc-track.c | 331 ++++++++++++++++++++++++++++++++++++++++++++
+ block/meson.build   |   1 +
+ 2 files changed, 332 insertions(+)
+ create mode 100644 block/alloc-track.c
+
+diff --git a/block/alloc-track.c b/block/alloc-track.c
+new file mode 100644
+index 0000000000..cc06cfca13
+--- /dev/null
++++ b/block/alloc-track.c
+@@ -0,0 +1,331 @@
++/*
++ * Node to allow backing images to be applied to any node. Assumes a blank
++ * image to begin with, only new writes are tracked as allocated, thus this
++ * must never be put on a node that already contains data.
++ *
++ * Copyright (c) 2020 Proxmox Server Solutions GmbH
++ * Copyright (c) 2020 Stefan Reiter <s.reiter@proxmox.com>
++ *
++ * This work is licensed under the terms of the GNU GPL, version 2 or later.
++ * See the COPYING file in the top-level directory.
++ */
++
++#include "qemu/osdep.h"
++#include "qapi/error.h"
++#include "block/block_int.h"
++#include "qapi/qmp/qdict.h"
++#include "qapi/qmp/qstring.h"
++#include "qemu/cutils.h"
++#include "qemu/option.h"
++#include "qemu/module.h"
++#include "sysemu/block-backend.h"
++
++#define TRACK_OPT_AUTO_REMOVE "auto-remove"
++
++typedef struct {
++    BdrvDirtyBitmap *bitmap;
++    bool dropping;
++    bool auto_remove;
++} BDRVAllocTrackState;
++
++static QemuOptsList runtime_opts = {
++    .name = "alloc-track",
++    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
++    .desc = {
++        {
++            .name = TRACK_OPT_AUTO_REMOVE,
++            .type = QEMU_OPT_BOOL,
++            .help = "automatically replace this node with 'file' when 'backing'"
++                    " is detached",
++        },
++        { /* end of list */ }
++    },
++};
++
++static void track_refresh_limits(BlockDriverState *bs, Error **errp)
++{
++    BlockDriverInfo bdi;
++
++    if (!bs->file) {
++        return;
++    }
++
++    /* always use alignment from underlying write device so RMW cycle for
++     * bdrv_pwritev reads data from our backing via track_co_preadv (no partial
++     * cluster allocation in 'file') */
++    bdrv_get_info(bs->file->bs, &bdi);
++    bs->bl.request_alignment = MAX(bs->file->bs->bl.request_alignment,
++                                   MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
++}
++
++static int track_open(BlockDriverState *bs, QDict *options, int flags,
++                      Error **errp)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++    QemuOpts *opts;
++    Error *local_err = NULL;
++    int ret = 0;
++
++    opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
++    qemu_opts_absorb_qdict(opts, options, &local_err);
++    if (local_err) {
++        error_propagate(errp, local_err);
++        ret = -EINVAL;
++        goto fail;
++    }
++
++    s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
++
++    /* open the target (write) node, backing will be attached by block layer */
++    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
++                               BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
++                               &local_err);
++    if (local_err) {
++        ret = -EINVAL;
++        error_propagate(errp, local_err);
++        goto fail;
++    }
++
++    track_refresh_limits(bs, errp);
++    uint64_t gran = bs->bl.request_alignment;
++    s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, gran, NULL, &local_err);
++    if (local_err) {
++        ret = -EIO;
++        error_propagate(errp, local_err);
++        goto fail;
++    }
++
++    s->dropping = false;
++
++fail:
++    if (ret < 0) {
++        bdrv_unref_child(bs, bs->file);
++        if (s->bitmap) {
++            bdrv_release_dirty_bitmap(s->bitmap);
++        }
++    }
++    qemu_opts_del(opts);
++    return ret;
++}
++
++static void track_close(BlockDriverState *bs)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++    if (s->bitmap) {
++        bdrv_release_dirty_bitmap(s->bitmap);
++    }
++}
++
++static int64_t track_getlength(BlockDriverState *bs)
++{
++    return bdrv_getlength(bs->file->bs);
++}
++
++static int coroutine_fn track_co_preadv(BlockDriverState *bs,
++    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++    QEMUIOVector local_qiov;
++    int ret;
++
++    /* 'cur_offset' is relative to 'offset', 'local_offset' to image start */
++    uint64_t cur_offset, local_offset;
++    int64_t local_bytes;
++    bool alloc;
++
++    /* a read request can span multiple granularity-sized chunks, and can thus
++     * contain blocks with different allocation status - we could just iterate
++     * granularity-wise, but for better performance use bdrv_dirty_bitmap_next_X
++     * to find the next flip and consider everything up to that in one go */
++    for (cur_offset = 0; cur_offset < bytes; cur_offset += local_bytes) {
++        local_offset = offset + cur_offset;
++        alloc = bdrv_dirty_bitmap_get(s->bitmap, local_offset);
++        if (alloc) {
++            local_bytes = bdrv_dirty_bitmap_next_zero(s->bitmap, local_offset,
++                                                      bytes - cur_offset);
++        } else {
++            local_bytes = bdrv_dirty_bitmap_next_dirty(s->bitmap, local_offset,
++                                                       bytes - cur_offset);
++        }
++
++        /* _bitmap_next_X return is -1 if no end found within limit, otherwise
++         * offset of next flip (to start of image) */
++        local_bytes = local_bytes < 0 ?
++            bytes - cur_offset :
++            local_bytes - local_offset;
++
++        qemu_iovec_init_slice(&local_qiov, qiov, cur_offset, local_bytes);
++
++        if (alloc) {
++            ret = bdrv_co_preadv(bs->file, local_offset, local_bytes,
++                                 &local_qiov, flags);
++        } else if (bs->backing) {
++            ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
++                                 &local_qiov, flags);
++        } else {
++            ret = qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
++        }
++
++        if (ret != 0) {
++            break;
++        }
++    }
++
++    return ret;
++}
++
++static int coroutine_fn track_co_pwritev(BlockDriverState *bs,
++    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
++{
++    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
++}
++
++static int coroutine_fn track_co_pwrite_zeroes(BlockDriverState *bs,
++    int64_t offset, int count, BdrvRequestFlags flags)
++{
++    return bdrv_pwrite_zeroes(bs->file, offset, count, flags);
++}
++
++static int coroutine_fn track_co_pdiscard(BlockDriverState *bs,
++    int64_t offset, int count)
++{
++    return bdrv_co_pdiscard(bs->file, offset, count);
++}
++
++static coroutine_fn int track_co_flush(BlockDriverState *bs)
++{
++    return bdrv_co_flush(bs->file->bs);
++}
++
++static int coroutine_fn track_co_block_status(BlockDriverState *bs,
++                                              bool want_zero,
++                                              int64_t offset,
++                                              int64_t bytes,
++                                              int64_t *pnum,
++                                              int64_t *map,
++                                              BlockDriverState **file)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++
++    bool alloc = bdrv_dirty_bitmap_get(s->bitmap, offset);
++    int64_t next_flipped;
++    if (alloc) {
++        next_flipped = bdrv_dirty_bitmap_next_zero(s->bitmap, offset, bytes);
++    } else {
++        next_flipped = bdrv_dirty_bitmap_next_dirty(s->bitmap, offset, bytes);
++    }
++
++    /* in case not the entire region has the same state, we need to set pnum to
++     * indicate for how many bytes our result is valid */
++    *pnum = next_flipped == -1 ? bytes : next_flipped - offset;
++    *map = offset;
++
++    if (alloc) {
++        *file = bs->file->bs;
++        return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
++    } else if (bs->backing) {
++        *file = bs->backing->bs;
++    }
++    return 0;
++}
++
++static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
++                             BdrvChildRole role, BlockReopenQueue *reopen_queue,
++                             uint64_t perm, uint64_t shared,
++                             uint64_t *nperm, uint64_t *nshared)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++
++    *nshared = BLK_PERM_ALL;
++
++    /* in case we're currently dropping ourselves, claim to not use any
++     * permissions at all - which is fine, since from this point on we will
++     * never issue a read or write anymore */
++    if (s->dropping) {
++        *nperm = 0;
++        return;
++    }
++
++    if (role & BDRV_CHILD_DATA) {
++        *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
++    } else {
++        /* 'backing' is also a child of our BDS, but we don't expect it to be
++         * writeable, so we only forward 'consistent read' */
++        *nperm = perm & BLK_PERM_CONSISTENT_READ;
++    }
++}
++
++static void track_drop(void *opaque)
++{
++    BlockDriverState *bs = (BlockDriverState*)opaque;
++    BlockDriverState *file = bs->file->bs;
++    BDRVAllocTrackState *s = bs->opaque;
++
++    assert(file);
++
++    /* we rely on the fact that we're not used anywhere else, so let's wait
++     * until we have the only reference to ourselves */
++    if (bs->refcnt > 1) {
++        aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, opaque);
++        return;
++    }
++
++    /* we do not need a bdrv_drained_end, since this is applied only to the node
++     * which gets removed by bdrv_replace_node */
++    bdrv_drained_begin(bs);
++
++    /* now that we're drained, we can safely set 'dropping' */
++    s->dropping = true;
++    bdrv_child_refresh_perms(bs, bs->file, &error_abort);
++
++    /* this will bdrv_unref() and thus drop us */
++    bdrv_replace_node(bs, file, &error_abort);
++}
++
++static int track_change_backing_file(BlockDriverState *bs,
++                                     const char *backing_file,
++                                     const char *backing_fmt)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++    if (s->auto_remove && backing_file == NULL && backing_fmt == NULL) {
++        /* backing file has been disconnected, there's no longer any use for
++         * this node, so let's remove ourselves from the block graph - we need
++         * to schedule this for later however, since when this function is
++         * called, the blockjob modifying us is probably not done yet and has a
++         * blocker on 'bs' */
++        aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, (void*)bs);
++    }
++
++    return 0;
++}
++
++static BlockDriver bdrv_alloc_track = {
++    .format_name                      = "alloc-track",
++    .instance_size                    = sizeof(BDRVAllocTrackState),
++
++    .bdrv_file_open                   = track_open,
++    .bdrv_close                       = track_close,
++    .bdrv_getlength                   = track_getlength,
++    .bdrv_child_perm                  = track_child_perm,
++    .bdrv_refresh_limits              = track_refresh_limits,
++
++    .bdrv_co_pwrite_zeroes            = track_co_pwrite_zeroes,
++    .bdrv_co_pwritev                  = track_co_pwritev,
++    .bdrv_co_preadv                   = track_co_preadv,
++    .bdrv_co_pdiscard                 = track_co_pdiscard,
++
++    .bdrv_co_flush                    = track_co_flush,
++    .bdrv_co_flush_to_disk            = track_co_flush,
++
++    .supports_backing                 = true,
++
++    .bdrv_co_block_status             = track_co_block_status,
++    .bdrv_change_backing_file         = track_change_backing_file,
++};
++
++static void bdrv_alloc_track_init(void)
++{
++    bdrv_register(&bdrv_alloc_track);
++}
++
++block_init(bdrv_alloc_track_init);
+diff --git a/block/meson.build b/block/meson.build
+index a070060e53..e387990764 100644
+--- a/block/meson.build
++++ b/block/meson.build
+@@ -2,6 +2,7 @@ block_ss.add(genh)
+ block_ss.add(files(
+   'accounting.c',
+   'aio_task.c',
++  'alloc-track.c',
+   'amend.c',
+   'backup.c',
+   'backup-dump.c',
diff --git a/debian/patches/series b/debian/patches/series
index 1b30d97..f6de587 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -55,3 +55,7 @@ pve/0040-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
 pve/0041-PVE-fall-back-to-open-iscsi-initiatorname.patch
 pve/0042-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
 pve/0043-PBS-add-master-key-support.patch
+pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
+pve/0045-PVE-block-stream-increase-chunk-size.patch
+pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
+pve/0047-block-add-alloc-track-driver.patch
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH v2 proxmox-backup 04/11] RemoteChunkReader: add LRU cached variant
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (2 preceding siblings ...)
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 03/11] add alloc-track block driver patch Stefan Reiter
@ 2021-03-03  9:56 ` Stefan Reiter
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader Stefan Reiter
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Retain the old constructor for compatibility; most use cases don't need
an LRU cache anyway.

For now convert the 'mount' API to use the new variant, as the same set
of chunks might be accessed multiple times in a random pattern there.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* unchanged

 src/bin/proxmox_backup_client/mount.rs |  4 +-
 src/client/remote_chunk_reader.rs      | 77 ++++++++++++++++++++------
 2 files changed, 62 insertions(+), 19 deletions(-)

diff --git a/src/bin/proxmox_backup_client/mount.rs b/src/bin/proxmox_backup_client/mount.rs
index 24100752..f80869b1 100644
--- a/src/bin/proxmox_backup_client/mount.rs
+++ b/src/bin/proxmox_backup_client/mount.rs
@@ -252,7 +252,7 @@ async fn mount_do(param: Value, pipe: Option<Fd>) -> Result<Value, Error> {
     if server_archive_name.ends_with(".didx") {
         let index = client.download_dynamic_index(&manifest, &server_archive_name).await?;
         let most_used = index.find_most_used_chunks(8);
-        let chunk_reader = RemoteChunkReader::new(client.clone(), crypt_config, file_info.chunk_crypt_mode(), most_used);
+        let chunk_reader = RemoteChunkReader::new_lru_cached(client.clone(), crypt_config, file_info.chunk_crypt_mode(), most_used, 16);
         let reader = BufferedDynamicReader::new(index, chunk_reader);
         let archive_size = reader.archive_size();
         let reader: proxmox_backup::pxar::fuse::Reader =
@@ -278,7 +278,7 @@ async fn mount_do(param: Value, pipe: Option<Fd>) -> Result<Value, Error> {
     } else if server_archive_name.ends_with(".fidx") {
         let index = client.download_fixed_index(&manifest, &server_archive_name).await?;
         let size = index.index_bytes();
-        let chunk_reader = RemoteChunkReader::new(client.clone(), crypt_config, file_info.chunk_crypt_mode(), HashMap::new());
+        let chunk_reader = RemoteChunkReader::new_lru_cached(client.clone(), crypt_config, file_info.chunk_crypt_mode(), HashMap::new(), 16);
         let reader = AsyncIndexReader::new(index, chunk_reader);
 
         let name = &format!("{}:{}/{}", repo.to_string(), path, archive_name);
diff --git a/src/client/remote_chunk_reader.rs b/src/client/remote_chunk_reader.rs
index 06f693a2..1314bcdc 100644
--- a/src/client/remote_chunk_reader.rs
+++ b/src/client/remote_chunk_reader.rs
@@ -8,6 +8,13 @@ use anyhow::{bail, Error};
 use super::BackupReader;
 use crate::backup::{AsyncReadChunk, CryptConfig, CryptMode, DataBlob, ReadChunk};
 use crate::tools::runtime::block_on;
+use crate::tools::lru_cache::LruCache;
+
+struct Cache {
+    cache_hint: HashMap<[u8; 32], usize>,
+    hinted: HashMap<[u8; 32], Vec<u8>>,
+    lru: Option<LruCache<[u8; 32], Vec<u8>>>,
+}
 
 /// Read chunks from remote host using ``BackupReader``
 #[derive(Clone)]
@@ -15,8 +22,7 @@ pub struct RemoteChunkReader {
     client: Arc<BackupReader>,
     crypt_config: Option<Arc<CryptConfig>>,
     crypt_mode: CryptMode,
-    cache_hint: Arc<HashMap<[u8; 32], usize>>,
-    cache: Arc<Mutex<HashMap<[u8; 32], Vec<u8>>>>,
+    cache: Arc<Mutex<Cache>>,
 }
 
 impl RemoteChunkReader {
@@ -28,13 +34,30 @@ impl RemoteChunkReader {
         crypt_config: Option<Arc<CryptConfig>>,
         crypt_mode: CryptMode,
         cache_hint: HashMap<[u8; 32], usize>,
+    ) -> Self {
+        Self::new_lru_cached(client, crypt_config, crypt_mode, cache_hint, 0)
+    }
+
+    /// Create a new instance.
+    ///
+    /// Chunks listed in ``cache_hint`` are cached and kept in RAM, as well as the last
+    /// 'cache_last' accessed chunks.
+    pub fn new_lru_cached(
+        client: Arc<BackupReader>,
+        crypt_config: Option<Arc<CryptConfig>>,
+        crypt_mode: CryptMode,
+        cache_hint: HashMap<[u8; 32], usize>,
+        cache_last: usize,
     ) -> Self {
         Self {
             client,
             crypt_config,
             crypt_mode,
-            cache_hint: Arc::new(cache_hint),
-            cache: Arc::new(Mutex::new(HashMap::new())),
+            cache: Arc::new(Mutex::new(Cache {
+                hinted: HashMap::with_capacity(cache_hint.len()),
+                lru: if cache_last == 0 { None } else { Some(LruCache::new(cache_last)) },
+                cache_hint,
+            })),
         }
     }
 
@@ -64,6 +87,34 @@ impl RemoteChunkReader {
             },
         }
     }
+
+    fn cache_get(&self, digest: &[u8; 32]) -> Option<Vec<u8>> {
+        let cache = &mut *self.cache.lock().unwrap();
+        if let Some(data) = cache.hinted.get(digest) {
+            return Some(data.to_vec());
+        }
+
+        cache
+            .lru
+            .as_mut()
+            .map(|lru| lru.get_mut(*digest).map(|x| x.to_vec()))
+            .flatten()
+    }
+
+    fn cache_insert(&self, digest: &[u8; 32], raw_data: &Vec<u8>) {
+        let cache = &mut *self.cache.lock().unwrap();
+
+        // if hinted, always cache given digest
+        if cache.cache_hint.contains_key(digest) {
+            cache.hinted.insert(*digest, raw_data.to_vec());
+            return;
+        }
+
+        // otherwise put in LRU
+        if let Some(ref mut lru) = cache.lru {
+            lru.insert(*digest, raw_data.to_vec());
+        }
+    }
 }
 
 impl ReadChunk for RemoteChunkReader {
@@ -72,18 +123,14 @@ impl ReadChunk for RemoteChunkReader {
     }
 
     fn read_chunk(&self, digest: &[u8; 32]) -> Result<Vec<u8>, Error> {
-        if let Some(raw_data) = (*self.cache.lock().unwrap()).get(digest) {
-            return Ok(raw_data.to_vec());
+        if let Some(raw_data) = self.cache_get(digest) {
+            return Ok(raw_data);
         }
 
         let chunk = ReadChunk::read_raw_chunk(self, digest)?;
 
         let raw_data = chunk.decode(self.crypt_config.as_ref().map(Arc::as_ref), Some(digest))?;
-
-        let use_cache = self.cache_hint.contains_key(digest);
-        if use_cache {
-            (*self.cache.lock().unwrap()).insert(*digest, raw_data.to_vec());
-        }
+        self.cache_insert(digest, &raw_data);
 
         Ok(raw_data)
     }
@@ -102,18 +149,14 @@ impl AsyncReadChunk for RemoteChunkReader {
         digest: &'a [u8; 32],
     ) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, Error>> + Send + 'a>> {
         Box::pin(async move {
-            if let Some(raw_data) = (*self.cache.lock().unwrap()).get(digest) {
+            if let Some(raw_data) = self.cache_get(digest) {
                 return Ok(raw_data.to_vec());
             }
 
             let chunk = Self::read_raw_chunk(self, digest).await?;
 
             let raw_data = chunk.decode(self.crypt_config.as_ref().map(Arc::as_ref), Some(digest))?;
-
-            let use_cache = self.cache_hint.contains_key(digest);
-            if use_cache {
-                (*self.cache.lock().unwrap()).insert(*digest, raw_data.to_vec());
-            }
+            self.cache_insert(digest, &raw_data);
 
             Ok(raw_data)
         })
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH v2 proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (3 preceding siblings ...)
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 proxmox-backup 04/11] RemoteChunkReader: add LRU cached variant Stefan Reiter
@ 2021-03-03  9:56 ` Stefan Reiter
  2021-03-16 20:17   ` Thomas Lamprecht
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 06/11] make qemu_drive_mirror_monitor more generic Stefan Reiter
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Values chosen by fair dice roll; they seem to be a good sweet spot on my
machine, where any less causes performance degradation but any more
doesn't really make it go any faster.

Keep in mind that those values are per drive in an actual restore.
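
A quick back-of-the-envelope, just spelling out the comments in the hunk below
(PBS fixed-index archives use 4 MiB chunks):

    find_most_used_chunks(16)  ->  16 * 4 MiB =  64 MiB "most used" cache
    new_lru_cached(..., 64)    ->  64 * 4 MiB = 256 MiB LRU cache

so a single restored drive can pin up to ~320 MiB of chunk data in RAM.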

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Depends on new proxmox-backup.

v2:
* unchanged

 src/restore.rs | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/restore.rs b/src/restore.rs
index 0790d7f..a1acce4 100644
--- a/src/restore.rs
+++ b/src/restore.rs
@@ -218,15 +218,16 @@ impl RestoreTask {
 
         let index = client.download_fixed_index(&manifest, &archive_name).await?;
         let archive_size = index.index_bytes();
-        let most_used = index.find_most_used_chunks(8);
+        let most_used = index.find_most_used_chunks(16); // 64 MB most used cache
 
         let file_info = manifest.lookup_file_info(&archive_name)?;
 
-        let chunk_reader = RemoteChunkReader::new(
+        let chunk_reader = RemoteChunkReader::new_lru_cached(
             Arc::clone(&client),
             self.crypt_config.clone(),
             file_info.chunk_crypt_mode(),
             most_used,
+            64, // 256 MB LRU cache
         );
 
         let reader = AsyncIndexReader::new(index, chunk_reader);
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH v2 qemu-server 06/11] make qemu_drive_mirror_monitor more generic
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (4 preceding siblings ...)
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader Stefan Reiter
@ 2021-03-03  9:56 ` Stefan Reiter
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 07/11] cfg2cmd: allow PBS snapshots as backing files for drives Stefan Reiter
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

...so it works with other block jobs as well. Intended use case is
block-stream, which also requires a new "auto" (wait only) completion
mode, since it finishes automatically anyway.
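
A rough sketch of the resulting call shapes ($jobs is the usual hash keyed by
job/device name; the values here are illustrative, not verbatim from this
series):

    # existing callers are unchanged, $op defaults to 'mirror':
    qemu_drive_mirror_monitor($vmid, $vmiddst, $jobs, $completion, $qga);

    # for block-stream jobs, which complete on their own:
    qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'stream');

'stream' matches the job type reported by query-block-jobs, and 'auto' simply
waits until all jobs have vanished.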

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* don't rename function, only add $op to the end and default to "mirror"

 PVE/QemuServer.pm | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index a498444..0c39a6b 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6792,55 +6792,61 @@ sub qemu_drive_mirror {
 # 'complete': wait until all jobs are ready, block-job-complete them (default)
 # 'cancel': wait until all jobs are ready, block-job-cancel them
 # 'skip': wait until all jobs are ready, return with block jobs in ready state
+# 'auto': wait until all jobs disappear, only use for jobs which complete automatically
 sub qemu_drive_mirror_monitor {
-    my ($vmid, $vmiddst, $jobs, $completion, $qga) = @_;
+    my ($vmid, $vmiddst, $jobs, $completion, $qga, $op) = @_;
 
     $completion //= 'complete';
+    $op //= "mirror";
 
     eval {
 	my $err_complete = 0;
 
 	while (1) {
-	    die "storage migration timed out\n" if $err_complete > 300;
+	    die "block job ('$op') timed out\n" if $err_complete > 300;
 
 	    my $stats = mon_cmd($vmid, "query-block-jobs");
 
-	    my $running_mirror_jobs = {};
+	    my $running_jobs = {};
 	    foreach my $stat (@$stats) {
-		next if $stat->{type} ne 'mirror';
-		$running_mirror_jobs->{$stat->{device}} = $stat;
+		next if $stat->{type} ne $op;
+		$running_jobs->{$stat->{device}} = $stat;
 	    }
 
 	    my $readycounter = 0;
 
 	    foreach my $job (keys %$jobs) {
 
-	        if(defined($jobs->{$job}->{complete}) && !defined($running_mirror_jobs->{$job})) {
-		    print "$job : finished\n";
+		my $vanished = !defined($running_jobs->{$job});
+		my $complete = defined($jobs->{$job}->{complete}) && $vanished;
+	        if($complete || ($vanished && $completion eq 'auto')) {
+		    print "$job: finished\n";
 		    delete $jobs->{$job};
 		    next;
 		}
 
-		die "$job: mirroring has been cancelled\n" if !defined($running_mirror_jobs->{$job});
+		die "$job: '$op' has been cancelled\n" if !defined($running_jobs->{$job});
 
-		my $busy = $running_mirror_jobs->{$job}->{busy};
-		my $ready = $running_mirror_jobs->{$job}->{ready};
-		if (my $total = $running_mirror_jobs->{$job}->{len}) {
-		    my $transferred = $running_mirror_jobs->{$job}->{offset} || 0;
+		my $busy = $running_jobs->{$job}->{busy};
+		my $ready = $running_jobs->{$job}->{ready};
+		if (my $total = $running_jobs->{$job}->{len}) {
+		    my $transferred = $running_jobs->{$job}->{offset} || 0;
 		    my $remaining = $total - $transferred;
 		    my $percent = sprintf "%.2f", ($transferred * 100 / $total);
 
 		    print "$job: transferred: $transferred bytes remaining: $remaining bytes total: $total bytes progression: $percent % busy: $busy ready: $ready \n";
 		}
 
-		$readycounter++ if $running_mirror_jobs->{$job}->{ready};
+		$readycounter++ if $running_jobs->{$job}->{ready};
 	    }
 
 	    last if scalar(keys %$jobs) == 0;
 
 	    if ($readycounter == scalar(keys %$jobs)) {
-		print "all mirroring jobs are ready \n";
-		last if $completion eq 'skip'; #do the complete later
+		print "all '$op' jobs are ready\n";
+
+		# do the complete later (or has already been done)
+		last if $completion eq 'skip' || $completion eq 'auto';
 
 		if ($vmiddst && $vmiddst != $vmid) {
 		    my $agent_running = $qga && qga_check_running($vmid);
@@ -6896,7 +6902,7 @@ sub qemu_drive_mirror_monitor {
 
     if ($err) {
 	eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
-	die "mirroring error: $err";
+	die "block job ('$op') error: $err";
     }
 
 }
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH v2 qemu-server 07/11] cfg2cmd: allow PBS snapshots as backing files for drives
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (5 preceding siblings ...)
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 06/11] make qemu_drive_mirror_monitor more generic Stefan Reiter
@ 2021-03-03  9:56 ` Stefan Reiter
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 08/11] enable live-restore for PBS Stefan Reiter
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Uses the custom 'alloc-track' filter node to redirect writes to the
original drive's target, while unwritten blocks are read from the
specified PBS snapshot.
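
For a non-RBD scsi0 drive this ends up generating arguments roughly like the
following (repository, snapshot and path are made-up examples, and most
unrelated drive options are omitted):

    -blockdev driver=pbs,node-name=drive-scsi0-pbs,read-only=on,repository=backup@pbs@127.0.0.1:store,snapshot=vm/100/2021-03-02T12:00:00Z,archive=drive-scsi0.img.fidx
    -drive file.file.filename=/dev/zvol/rpool/vm-100-disk-0,if=none,id=drive-scsi0,format=alloc-track,file.driver=raw,backing=drive-scsi0-pbs,auto-remove=on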

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* build logic directly into print_drive_commandline_full, instead of string
  replacements in config_to_command
* support "rbd:" storage paths as target

 PVE/QemuServer.pm | 77 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 12 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 0c39a6b..22edc2a 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -1514,28 +1514,31 @@ sub get_initiator_name {
 }
 
 sub print_drive_commandline_full {
-    my ($storecfg, $vmid, $drive) = @_;
+    my ($storecfg, $vmid, $drive, $pbs_name) = @_;
 
     my $path;
     my $volid = $drive->{file};
-    my $format;
+    my $format = $drive->{format};
 
     if (drive_is_cdrom($drive)) {
 	$path = get_iso_path($storecfg, $vmid, $volid);
+        die "cannot back cdrom drive with PBS snapshot\n" if $pbs_name;
     } else {
 	my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
 	if ($storeid) {
 	    $path = PVE::Storage::path($storecfg, $volid);
 	    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
-	    $format = qemu_img_format($scfg, $volname);
+	    $format //= qemu_img_format($scfg, $volname);
 	} else {
 	    $path = $volid;
-	    $format = "raw";
+	    $format //= "raw";
 	}
    }
 
+   my $is_rbd = $path =~ m/^rbd:/;
+
     my $opts = '';
-    my @qemu_drive_options = qw(heads secs cyls trans media format cache rerror werror aio discard);
+    my @qemu_drive_options = qw(heads secs cyls trans media cache rerror werror aio discard);
     foreach my $o (@qemu_drive_options) {
 	$opts .= ",$o=$drive->{$o}" if defined($drive->{$o});
     }
@@ -1568,7 +1571,13 @@ sub print_drive_commandline_full {
 	}
     }
 
-    $opts .= ",format=$format" if $format && !$drive->{format};
+    if ($pbs_name) {
+	$format = "rbd" if $is_rbd;
+	die "PBS backing requires a drive with known format\n" if !$format;
+	$opts .= ",format=alloc-track,file.driver=$format";
+    } elsif ($format) {
+	$opts .= ",format=$format";
+    }
 
     my $cache_direct = 0;
 
@@ -1598,14 +1607,41 @@ sub print_drive_commandline_full {
 	    # This used to be our default with discard not being specified:
 	    $detectzeroes = 'on';
 	}
-	$opts .= ",detect-zeroes=$detectzeroes" if $detectzeroes;
+
+	# note: 'detect-zeroes' works per blockdev and we want it to persist
+	# after the alloc-track is removed, so put it on 'file' directly
+	my $dz_param = $pbs_name ? "file.detect-zeroes" : "detect-zeroes";
+	$opts .= ",$dz_param=$detectzeroes" if $detectzeroes;
     }
 
-    my $pathinfo = $path ? "file=$path," : '';
+    if ($pbs_name) {
+	$opts .= ",backing=$pbs_name";
+	$opts .= ",auto-remove=on";
+    }
+
+    # my $file_param = $pbs_name ? "file.file.filename" : "file";
+    my $file_param = "file";
+    if ($pbs_name) {
+	# non-rbd drivers require the underlying file to be a separate block
+	# node, so add a second .file indirection
+	$file_param .= ".file" if !$is_rbd;
+	$file_param .= ".filename";
+    }
+    my $pathinfo = $path ? "$file_param=$path," : '';
 
     return "${pathinfo}if=none,id=drive-$drive->{interface}$drive->{index}$opts";
 }
 
+sub print_pbs_blockdev {
+    my ($pbs_conf, $pbs_name) = @_;
+    my $blockdev = "driver=pbs,node-name=$pbs_name,read-only=on";
+    $blockdev .= ",repository=$pbs_conf->{repository}";
+    $blockdev .= ",snapshot=$pbs_conf->{snapshot}";
+    $blockdev .= ",archive=$pbs_conf->{archive}";
+    $blockdev .= ",keyfile=$pbs_conf->{keyfile}" if $pbs_conf->{keyfile};
+    return $blockdev;
+}
+
 sub print_netdevice_full {
     my ($vmid, $conf, $net, $netid, $bridges, $use_old_bios_files, $arch, $machine_type) = @_;
 
@@ -3023,7 +3059,8 @@ sub query_understood_cpu_flags {
 }
 
 sub config_to_command {
-    my ($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu) = @_;
+    my ($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu,
+        $pbs_backing) = @_;
 
     my $cmd = [];
     my $globalFlags = [];
@@ -3519,7 +3556,14 @@ sub config_to_command {
 	    $ahcicontroller->{$controller}=1;
         }
 
-	my $drive_cmd = print_drive_commandline_full($storecfg, $vmid, $drive);
+	my $pbs_conf = $pbs_backing->{$ds};
+	my $pbs_name = undef;
+	if ($pbs_conf) {
+	    $pbs_name = "drive-$ds-pbs";
+	    push @$devices, '-blockdev', print_pbs_blockdev($pbs_conf, $pbs_name);
+	}
+
+	my $drive_cmd = print_drive_commandline_full($storecfg, $vmid, $drive, $pbs_name);
 	$drive_cmd .= ',readonly' if PVE::QemuConfig->is_template($conf);
 
 	push @$devices, '-drive',$drive_cmd;
@@ -4923,6 +4967,15 @@ sub vm_start {
 #   timeout => in seconds
 #   paused => start VM in paused state (backup)
 #   resume => resume from hibernation
+#   pbs-backing => {
+#      sata0 => {
+#         repository
+#         snapshot
+#         keyfile
+#         archive
+#      },
+#      virtio2 => ...
+#   }
 # migrate_opts:
 #   nbd => volumes for NBD exports (vm_migrate_alloc_nbd_disks)
 #   migratedfrom => source node
@@ -4969,8 +5022,8 @@ sub vm_start_nolock {
 	print "Resuming suspended VM\n";
     }
 
-    my ($cmd, $vollist, $spice_port) =
-	config_to_command($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu);
+    my ($cmd, $vollist, $spice_port) = config_to_command($storecfg, $vmid,
+	$conf, $defaults, $forcemachine, $forcecpu, $params->{'pbs-backing'});
 
     my $migration_ip;
     my $get_migration_ip = sub {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH v2 qemu-server 08/11] enable live-restore for PBS
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (6 preceding siblings ...)
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 07/11] cfg2cmd: allow PBS snapshots as backing files for drives Stefan Reiter
@ 2021-03-03  9:56 ` Stefan Reiter
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 09/11] extract register_qmeventd_handle to QemuServer.pm Stefan Reiter
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Enables live-restore functionality using the 'alloc-track' QEMU driver.
This allows starting a VM immediately when restoring from a PBS
snapshot. The snapshot is attached to the VM as a backing image, so the
guest can boot from it right away, while guest reads and a 'block-stream'
job handle the restore in the background.
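
For illustration, a rough sketch of the flow this enables - the
repository/snapshot values and the single 'scsi0' disk are made up, and
this is not the exact code (see pbs_live_restore() in the diff below):

    my $pbs_backing = {
        scsi0 => {
            repository => 'backup@pbs@pbs.example.com:store1', # illustrative
            snapshot   => 'vm/100/2021-03-03T09:56:00Z',       # illustrative
            archive    => 'drive-scsi0.img.fidx',
        },
    };

    # start the VM paused, each drive backed by its PBS snapshot blockdev
    PVE::QemuServer::vm_start_nolock($storecfg, $vmid, $conf,
        { paused => 1, 'pbs-backing' => $pbs_backing }, {});

    # stream the backup data into the real disk in the background, then resume
    mon_cmd($vmid, 'block-stream',
        'job-id' => 'restore-drive-scsi0', device => 'drive-scsi0');
    mon_cmd($vmid, 'cont');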

If an error occurs, the VM is deleted and all data written during the
restore is lost.

The VM remains locked during the restore, which automatically prohibits
any modifications to the config while restoring. Some modifications
might potentially be safe; however, this is experimental enough that I
believe allowing them would cause more bad stuff(tm) than satisfy any
actual use case.

Pool handling is slightly adjusted so the VM can be added to the pool
before the restore starts.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Looks better with -w

v2:
* move pool processing of vma backups to QemuServer.pm too, to keep it in one
  file at least

 PVE/API2/Qemu.pm  |  14 ++++-
 PVE/QemuServer.pm | 150 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 138 insertions(+), 26 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index feb9ea8..28607f4 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -490,6 +490,12 @@ __PACKAGE__->register_method({
 		    description => "Assign a unique random ethernet address.",
 		    requires => 'archive',
 		},
+		'live-restore' => {
+		    optional => 1,
+		    type => 'boolean',
+		    description => "Start the VM immediately from the backup and restore in background. PBS only.",
+		    requires => 'archive',
+		},
 		pool => {
 		    optional => 1,
 		    type => 'string', format => 'pve-poolid',
@@ -531,6 +537,10 @@ __PACKAGE__->register_method({
 	my $start_after_create = extract_param($param, 'start');
 	my $storage = extract_param($param, 'storage');
 	my $unique = extract_param($param, 'unique');
+	my $live_restore = extract_param($param, 'live-restore');
+
+	raise_param_exc({ 'start' => "cannot specify 'start' with 'live-restore'" })
+	    if $start_after_create && $live_restore;
 
 	if (defined(my $ssh_keys = $param->{sshkeys})) {
 		$ssh_keys = URI::Escape::uri_unescape($ssh_keys);
@@ -613,8 +623,10 @@ __PACKAGE__->register_method({
 		    pool => $pool,
 		    unique => $unique,
 		    bwlimit => $bwlimit,
+		    live => $live_restore,
 		};
 		if ($archive->{type} eq 'file' || $archive->{type} eq 'pipe') {
+		    die "live-restore is only compatible with PBS\n" if $live_restore;
 		    PVE::QemuServer::restore_file_archive($archive->{path} // '-', $vmid, $authuser, $restore_options);
 		} elsif ($archive->{type} eq 'pbs') {
 		    PVE::QemuServer::restore_proxmox_backup_archive($archive->{volid}, $vmid, $authuser, $restore_options);
@@ -628,8 +640,6 @@ __PACKAGE__->register_method({
 		    eval { PVE::QemuServer::template_create($vmid, $restored_conf) };
 		    warn $@ if $@;
 		}
-
-		PVE::AccessControl::add_vm_to_pool($vmid, $pool) if $pool;
 	    };
 
 	    # ensure no old replication state are exists
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 22edc2a..d4ee8ec 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6128,7 +6128,7 @@ sub restore_proxmox_backup_archive {
 
     my $repo = PVE::PBSClient::get_repository($scfg);
 
-    # This is only used for `pbs-restore`!
+    # This is only used for `pbs-restore` and the QEMU PBS driver (live-restore)
     my $password = PVE::Storage::PBSPlugin::pbs_get_password($scfg, $storeid);
     local $ENV{PBS_PASSWORD} = $password;
     local $ENV{PBS_FINGERPRINT} = $fingerprint if defined($fingerprint);
@@ -6225,34 +6225,35 @@ sub restore_proxmox_backup_archive {
 	# allocate volumes
 	my $map = $restore_allocate_devices->($storecfg, $virtdev_hash, $vmid);
 
-	foreach my $virtdev (sort keys %$virtdev_hash) {
-	    my $d = $virtdev_hash->{$virtdev};
-	    next if $d->{is_cloudinit}; # no need to restore cloudinit
+	if (!$options->{live}) {
+	    foreach my $virtdev (sort keys %$virtdev_hash) {
+		my $d = $virtdev_hash->{$virtdev};
+		next if $d->{is_cloudinit}; # no need to restore cloudinit
 
-	    my $volid = $d->{volid};
+		my $volid = $d->{volid};
 
-	    my $path = PVE::Storage::path($storecfg, $volid);
+		my $path = PVE::Storage::path($storecfg, $volid);
 
-	    # This is the ONLY user of the PBS_ env vars set on top of this function!
-	    my $pbs_restore_cmd = [
-		'/usr/bin/pbs-restore',
-		'--repository', $repo,
-		$pbs_backup_name,
-		"$d->{devname}.img.fidx",
-		$path,
-		'--verbose',
-		];
+		my $pbs_restore_cmd = [
+		    '/usr/bin/pbs-restore',
+		    '--repository', $repo,
+		    $pbs_backup_name,
+		    "$d->{devname}.img.fidx",
+		    $path,
+		    '--verbose',
+		    ];
 
-	    push @$pbs_restore_cmd, '--format', $d->{format} if $d->{format};
-	    push @$pbs_restore_cmd, '--keyfile', $keyfile if -e $keyfile;
+		push @$pbs_restore_cmd, '--format', $d->{format} if $d->{format};
+		push @$pbs_restore_cmd, '--keyfile', $keyfile if -e $keyfile;
 
-	    if (PVE::Storage::volume_has_feature($storecfg, 'sparseinit', $volid)) {
-		push @$pbs_restore_cmd, '--skip-zero';
+		if (PVE::Storage::volume_has_feature($storecfg, 'sparseinit', $volid)) {
+		    push @$pbs_restore_cmd, '--skip-zero';
+		}
+
+		my $dbg_cmdstring = PVE::Tools::cmd2string($pbs_restore_cmd);
+		print "restore proxmox backup image: $dbg_cmdstring\n";
+		run_command($pbs_restore_cmd);
 	    }
-
-	    my $dbg_cmdstring = PVE::Tools::cmd2string($pbs_restore_cmd);
-	    print "restore proxmox backup image: $dbg_cmdstring\n";
-	    run_command($pbs_restore_cmd);
 	}
 
 	$fh->seek(0, 0) || die "seek failed - $!\n";
@@ -6269,7 +6270,9 @@ sub restore_proxmox_backup_archive {
     };
     my $err = $@;
 
-    $restore_deactivate_volumes->($storecfg, $devinfo);
+    if ($err || !$options->{live}) {
+	$restore_deactivate_volumes->($storecfg, $devinfo);
+    }
 
     rmtree $tmpdir;
 
@@ -6286,6 +6289,103 @@ sub restore_proxmox_backup_archive {
 
     eval { rescan($vmid, 1); };
     warn $@ if $@;
+
+    PVE::AccessControl::add_vm_to_pool($vmid, $options->{pool}) if $options->{pool};
+
+    if ($options->{live}) {
+	eval {
+	    # enable interrupts
+	    local $SIG{INT} =
+		local $SIG{TERM} =
+		local $SIG{QUIT} =
+		local $SIG{HUP} =
+		local $SIG{PIPE} = sub { die "interrupted by signal\n"; };
+
+	    my $conf = PVE::QemuConfig->load_config($vmid);
+	    die "cannot do live-restore for template\n"
+		if PVE::QemuConfig->is_template($conf);
+
+	    pbs_live_restore($vmid, $conf, $storecfg, $devinfo, $repo, $keyfile, $pbs_backup_name);
+	};
+
+	$err = $@;
+	if ($err) {
+	    warn "Destroying live-restore VM, all temporary data will be lost!\n";
+	    $restore_deactivate_volumes->($storecfg, $devinfo);
+	    $restore_destroy_volumes->($storecfg, $devinfo);
+	    unlink $conffile;
+	    die $err;
+	}
+    }
+}
+
+sub pbs_live_restore {
+    my ($vmid, $conf, $storecfg, $restored_disks, $repo, $keyfile, $snap) = @_;
+
+    print "Starting VM for live-restore\n";
+
+    my $pbs_backing = {};
+    foreach my $ds (keys %$restored_disks) {
+	$ds =~ m/^drive-(.*)$/;
+	$pbs_backing->{$1} = {
+	    repository => $repo,
+	    snapshot => $snap,
+	    archive => "$ds.img.fidx",
+	};
+	$pbs_backing->{$1}->{keyfile} = $keyfile if -e $keyfile;
+    }
+
+    eval {
+	# make sure HA doesn't interrupt our restore by stopping the VM
+	if (PVE::HA::Config::vm_is_ha_managed($vmid)) {
+	    my $cmd = ['ha-manager', 'set',  "vm:$vmid", '--state', 'started'];
+	    PVE::Tools::run_command($cmd);
+	}
+
+	# start VM with backing chain pointing to PBS backup, environment vars
+	# for PBS driver in QEMU (PBS_PASSWORD and PBS_FINGERPRINT) are already
+	# set by our caller
+	PVE::QemuServer::vm_start_nolock(
+	    $storecfg,
+	    $vmid,
+	    $conf,
+	    {
+		paused => 1,
+		'pbs-backing' => $pbs_backing,
+	    },
+	    {},
+	);
+
+	# begin streaming, i.e. data copy from PBS to target disk for every vol,
+	# this will effectively collapse the backing image chain consisting of
+	# [target <- alloc-track -> PBS snapshot] to just [target] (alloc-track
+	# removes itself once all backing images vanish with 'auto-remove=on')
+	my $jobs = {};
+	foreach my $ds (keys %$restored_disks) {
+	    my $job_id = "restore-$ds";
+	    mon_cmd($vmid, 'block-stream',
+		'job-id' => $job_id,
+		device => "$ds",
+	    );
+	    $jobs->{$job_id} = {};
+	}
+
+	mon_cmd($vmid, 'cont');
+	qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'stream');
+
+	# all jobs finished, remove blockdevs now to disconnect from PBS
+	foreach my $ds (keys %$restored_disks) {
+	    mon_cmd($vmid, 'blockdev-del', 'node-name' => "$ds-pbs");
+	}
+    };
+
+    my $err = $@;
+
+    if ($err) {
+	warn "An error occurred during live-restore: $err\n";
+	_do_vm_stop($storecfg, $vmid, 1, 1, 10, 0, 1);
+	die "live-restore failed\n";
+    }
 }
 
 sub restore_vma_archive {
@@ -6498,6 +6598,8 @@ sub restore_vma_archive {
 
     eval { rescan($vmid, 1); };
     warn $@ if $@;
+
+    PVE::AccessControl::add_vm_to_pool($vmid, $opts->{pool}) if $opts->{pool};
 }
 
 sub restore_tar_archive {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH v2 qemu-server 09/11] extract register_qmeventd_handle to QemuServer.pm
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (7 preceding siblings ...)
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 08/11] enable live-restore for PBS Stefan Reiter
@ 2021-03-03  9:56 ` Stefan Reiter
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 10/11] live-restore: register qmeventd handle Stefan Reiter
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

...to be reused by live-restore.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* unchanged

 PVE/QemuServer.pm        | 28 ++++++++++++++++++++++++++++
 PVE/VZDump/QemuServer.pm | 32 ++------------------------------
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index d4ee8ec..e420de3 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7470,6 +7470,34 @@ sub device_bootorder {
     return $bootorder;
 }
 
+sub register_qmeventd_handle {
+    my ($vmid) = @_;
+
+    my $fh;
+    my $peer = "/var/run/qmeventd.sock";
+    my $count = 0;
+
+    for (;;) {
+	$count++;
+	$fh = IO::Socket::UNIX->new(Peer => $peer, Blocking => 0, Timeout => 1);
+	last if $fh;
+	if ($! != EINTR && $! != EAGAIN) {
+	    die "unable to connect to qmeventd socket (vmid: $vmid) - $!\n";
+	}
+	if ($count > 4) {
+	    die "unable to connect to qmeventd socket (vmid: $vmid) - timeout "
+	      . "after $count retries\n";
+	}
+	usleep(25000);
+    }
+
+    # send handshake to mark VM as backing up
+    print $fh to_json({vzdump => {vmid => "$vmid"}});
+
+    # return handle to be closed later when inhibit is no longer required
+    return $fh;
+}
+
 # bash completion helper
 
 sub complete_backup_archives {
diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 901b366..aaff712 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -481,7 +481,7 @@ sub archive_pbs {
     my $devlist = _get_task_devlist($task);
 
     $self->enforce_vm_running_for_backup($vmid);
-    $self->register_qmeventd_handle($vmid);
+    $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
 
     my $backup_job_uuid;
     eval {
@@ -650,7 +650,7 @@ sub archive_vma {
     my $devlist = _get_task_devlist($task);
 
     $self->enforce_vm_running_for_backup($vmid);
-    $self->register_qmeventd_handle($vmid);
+    $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
 
     my $cpid;
     my $backup_job_uuid;
@@ -809,34 +809,6 @@ sub enforce_vm_running_for_backup {
     die $@ if $@;
 }
 
-sub register_qmeventd_handle {
-    my ($self, $vmid) = @_;
-
-    my $fh;
-    my $peer = "/var/run/qmeventd.sock";
-    my $count = 0;
-
-    for (;;) {
-	$count++;
-	$fh = IO::Socket::UNIX->new(Peer => $peer, Blocking => 0, Timeout => 1);
-	last if $fh;
-	if ($! != EINTR && $! != EAGAIN) {
-	    $self->log("warn", "unable to connect to qmeventd socket (vmid: $vmid) - $!\n");
-	    return;
-	}
-	if ($count > 4) {
-	    $self->log("warn", "unable to connect to qmeventd socket (vmid: $vmid)"
-			     . " - timeout after $count retries\n");
-	    return;
-	}
-	usleep(25000);
-    }
-
-    # send handshake to mark VM as backing up
-    print $fh to_json({vzdump => {vmid => "$vmid"}});
-    $self->{qmeventd_fh} = $fh;
-}
-
 # resume VM againe once we got in a clear state (stop mode backup of running VM)
 sub resume_vm_after_job_start {
     my ($self, $task, $vmid) = @_;
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH v2 qemu-server 10/11] live-restore: register qmeventd handle
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (8 preceding siblings ...)
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 09/11] extract register_qmeventd_handle to QemuServer.pm Stefan Reiter
@ 2021-03-03  9:56 ` Stefan Reiter
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 manager 11/11] ui: restore: add live-restore checkbox Stefan Reiter
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Similar to backups, prevent QEMU from being killed by qmeventd during
the live-restore, so a guest can shut itself down without aborting the
restore operation.

Note that the 'close' is only there to be explicit; the handle will also
be closed in case an operation errors out (i.e. when the 'eval' is left).
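
As a rough sketch (mirroring the diff below, not verbatim), the handle is
used like this:

    eval {
        # keeping the socket open is what inhibits qmeventd
        my $qmeventd_fd = register_qmeventd_handle($vmid);

        # ... start the VM, run the 'block-stream' jobs ...

        close($qmeventd_fd); # explicit, once the inhibit is no longer needed
    };
    # if anything above died, $qmeventd_fd went out of scope when the eval
    # was left, so the connection (and with it the inhibit) is dropped anyway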

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* unchanged

 PVE/QemuServer.pm | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index e420de3..233441e 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6356,6 +6356,8 @@ sub pbs_live_restore {
 	    {},
 	);
 
+	my $qmeventd_fd = register_qmeventd_handle($vmid);
+
 	# begin streaming, i.e. data copy from PBS to target disk for every vol,
 	# this will effectively collapse the backing image chain consisting of
 	# [target <- alloc-track -> PBS snapshot] to just [target] (alloc-track
@@ -6377,6 +6379,8 @@ sub pbs_live_restore {
 	foreach my $ds (keys %$restored_disks) {
 	    mon_cmd($vmid, 'blockdev-del', 'node-name' => "$ds-pbs");
 	}
+
+	close($qmeventd_fd);
     };
 
     my $err = $@;
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH v2 manager 11/11] ui: restore: add live-restore checkbox
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (9 preceding siblings ...)
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 10/11] live-restore: register qmeventd handle Stefan Reiter
@ 2021-03-03  9:56 ` Stefan Reiter
  2021-04-15 18:34   ` [pve-devel] applied: " Thomas Lamprecht
  2021-03-22 11:08 ` [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Dominic Jäger
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Stefan Reiter @ 2021-03-03  9:56 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Add an 'isPBS' parameter to the Restore window so we can detect when to
show the 'live-restore' checkbox.

Includes a warning about this feature being experimental for now.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* unchanged

 www/manager6/grid/BackupView.js    |  6 ++++-
 www/manager6/storage/BackupView.js |  5 +++-
 www/manager6/window/Restore.js     | 38 +++++++++++++++++++++++++++++-
 3 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/www/manager6/grid/BackupView.js b/www/manager6/grid/BackupView.js
index 2a496fab..38f127c9 100644
--- a/www/manager6/grid/BackupView.js
+++ b/www/manager6/grid/BackupView.js
@@ -79,6 +79,7 @@ Ext.define('PVE.grid.BackupView', {
 	    }
 	}, 100);
 
+	let isPBS = false;
 	var setStorage = function(storage) {
 	    var url = '/api2/json/nodes/' + nodename + '/storage/' + storage + '/content';
 	    url += '?content=backup';
@@ -101,13 +102,15 @@ Ext.define('PVE.grid.BackupView', {
 		change: function(f, value) {
 		    let storage = f.getStore().findRecord('storage', value, 0, false, true, true);
 		    if (storage) {
-			let isPBS = storage.data.type === 'pbs';
+			isPBS = storage.data.type === 'pbs';
 			me.getColumns().forEach((column) => {
 			    let id = column.dataIndex;
 			    if (id === 'verification' || id === 'encrypted') {
 				column.setHidden(!isPBS);
 			    }
 			});
+		    } else {
+			isPBS = false;
 		    }
 		    setStorage(value);
 		},
@@ -176,6 +179,7 @@ Ext.define('PVE.grid.BackupView', {
 		    volid: rec.data.volid,
 		    volidText: PVE.Utils.render_storage_content(rec.data.volid, {}, rec),
 		    vmtype: vmtype,
+		    isPBS: isPBS,
 		});
 		win.show();
 		win.on('destroy', reload);
diff --git a/www/manager6/storage/BackupView.js b/www/manager6/storage/BackupView.js
index 87d446ca..3dd500c2 100644
--- a/www/manager6/storage/BackupView.js
+++ b/www/manager6/storage/BackupView.js
@@ -74,6 +74,8 @@ Ext.define('PVE.storage.BackupView', {
 	    }
 	});
 
+	let isPBS = me.pluginType === 'pbs';
+
 	me.tbar = [
 	    {
 		xtype: 'proxmoxButton',
@@ -95,6 +97,7 @@ Ext.define('PVE.storage.BackupView', {
 			volid: rec.data.volid,
 			volidText: PVE.Utils.render_storage_content(rec.data.volid, {}, rec),
 			vmtype: vmtype,
+			isPBS: isPBS,
 		    });
 		    win.show();
 		    win.on('destroy', reload);
@@ -117,7 +120,7 @@ Ext.define('PVE.storage.BackupView', {
 	    pruneButton,
 	];
 
-	if (me.pluginType === 'pbs') {
+	if (isPBS) {
 	    me.extraColumns = {
 		encrypted: {
 		    header: gettext('Encrypted'),
diff --git a/www/manager6/window/Restore.js b/www/manager6/window/Restore.js
index d220c7bf..d9cb12a0 100644
--- a/www/manager6/window/Restore.js
+++ b/www/manager6/window/Restore.js
@@ -3,6 +3,20 @@ Ext.define('PVE.window.Restore', {
 
     resizable: false,
 
+    controller: {
+	xclass: 'Ext.app.ViewController',
+	control: {
+	    '#liveRestore': {
+		change: function(el, newVal) {
+		    let liveWarning = this.lookupReference('liveWarning');
+		    liveWarning.setHidden(!newVal);
+		    let start = this.lookupReference('start');
+		    start.setDisabled(newVal);
+		},
+	    },
+	},
+    },
+
     initComponent: function() {
 	var me = this;
 
@@ -84,6 +98,7 @@ Ext.define('PVE.window.Restore', {
 		{
 		    xtype: 'proxmoxcheckbox',
 		    name: 'start',
+		    reference: 'start',
 		    flex: 1,
 		    fieldLabel: gettext('Start after restore'),
 		    labelWidth: 105,
@@ -99,6 +114,26 @@ Ext.define('PVE.window.Restore', {
 		value: true,
 		fieldLabel: gettext('Unprivileged container'),
 	    });
+	} else if (me.vmtype === 'qemu') {
+	    items.push({
+		xtype: 'proxmoxcheckbox',
+		name: 'live-restore',
+		itemId: 'liveRestore',
+		flex: 1,
+		fieldLabel: gettext('Live restore'),
+		checked: false,
+		hidden: !me.isPBS,
+		// align checkbox with 'start' if 'unique' is hidden
+		labelWidth: me.vmid ? 105 : 100,
+	    });
+	    items.push({
+		xtype: 'displayfield',
+		reference: 'liveWarning',
+		// TODO: Remove once more tested/stable?
+		value: gettext('Warning: Live-restore is experimental! The VM will start immediately (with a disk performance penalty) and restore will happen in the background. If anything goes wrong, data written by the VM during the restore will be lost.'),
+		userCls: 'pmx-hint',
+		hidden: true,
+	    });
 	}
 
 	me.formPanel = Ext.create('Ext.form.Panel', {
@@ -144,7 +179,8 @@ Ext.define('PVE.window.Restore', {
 		    force: me.vmid ? 1 : 0,
 		};
 		if (values.unique) { params.unique = 1; }
-		if (values.start) { params.start = 1; }
+		if (values.start && !values['live-restore']) { params.start = 1; }
+		if (values['live-restore']) { params['live-restore'] = 1; }
 		if (values.storage) { params.storage = values.storage; }
 
 		if (values.bwlimit !== undefined) {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] applied: [PATCH v2 pve-qemu 01/11] clean up pve/ patches by merging
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 01/11] clean up pve/ patches by merging Stefan Reiter
@ 2021-03-03 16:32   ` Thomas Lamprecht
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Lamprecht @ 2021-03-03 16:32 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter, pbs-devel

On 03.03.21 10:56, Stefan Reiter wrote:
> No functional change intended.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> Unrelated to rest of series.
> 
>  ...ckup-proxmox-backup-patches-for-qemu.patch | 665 ++++++-------
>  ...estore-new-command-to-restore-from-p.patch |  18 +-
>  ...-coroutines-to-fix-AIO-freeze-cleanu.patch | 914 ------------------
>  ...-support-for-sync-bitmap-mode-never.patch} |   0
>  ...support-for-conditional-and-always-.patch} |   0
>  ...heck-for-bitmap-mode-without-bitmap.patch} |   0
>  ...to-bdrv_dirty_bitmap_merge_internal.patch} |   0
>  ...-iotests-add-test-for-bitmap-mirror.patch} |   0
>  ...0035-mirror-move-some-checks-to-qmp.patch} |   0
>  ...rty-bitmap-tracking-for-incremental.patch} |  80 +-
>  .../pve/0037-PVE-various-PBS-fixes.patch      | 218 +++++
>  ...-driver-to-map-backup-archives-into.patch} |   0
>  ...name-incremental-to-use-dirty-bitmap.patch | 126 ---
>  ...d-query_proxmox_support-QMP-command.patch} |   4 +-
>  .../pve/0039-PVE-fixup-pbs-restore-API.patch  |  44 -
>  ...-add-query-pbs-bitmap-info-QMP-call.patch} |   0
>  ...irty-counter-for-non-incremental-bac.patch |  30 -
>  ...t-stderr-to-journal-when-daemonized.patch} |   0
>  ...use-proxmox_backup_check_incremental.patch |  36 -
>  ...-sequential-job-transaction-support.patch} |  20 +-
>  ...ckup-add-compress-and-encrypt-option.patch | 103 --
>  ...transaction-to-synchronize-job-stat.patch} |   0
>  ...block-on-finishing-and-cleanup-crea.patch} | 245 +++--
>  ...grate-dirty-bitmap-state-via-savevm.patch} |   0
>  ...issing-crypt-and-compress-parameters.patch |  43 -
>  ...rite-callback-with-big-blocks-correc.patch |  76 --
>  ...irty-bitmap-migrate-other-bitmaps-e.patch} |   0
>  ...-block-handling-to-PBS-dump-callback.patch |  85 --
>  ...ll-back-to-open-iscsi-initiatorname.patch} |   0
>  ...outine-QMP-for-backup-cancel_backup.patch} |   0
>  ... => 0049-PBS-add-master-key-support.patch} |   0
>  ...n-up-error-handling-for-create_backu.patch | 187 ----
>  ...-multiple-CREATED-jobs-in-sequential.patch |  39 -
>  debian/patches/series                         |  50 +-
>  34 files changed, 830 insertions(+), 2153 deletions(-)
>  delete mode 100644 debian/patches/pve/0030-PVE-Backup-avoid-coroutines-to-fix-AIO-freeze-cleanu.patch
>  rename debian/patches/pve/{0031-drive-mirror-add-support-for-sync-bitmap-mode-never.patch => 0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch} (100%)
>  rename debian/patches/pve/{0032-drive-mirror-add-support-for-conditional-and-always-.patch => 0031-drive-mirror-add-support-for-conditional-and-always-.patch} (100%)
>  rename debian/patches/pve/{0033-mirror-add-check-for-bitmap-mode-without-bitmap.patch => 0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch} (100%)
>  rename debian/patches/pve/{0034-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch => 0033-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch} (100%)
>  rename debian/patches/pve/{0035-iotests-add-test-for-bitmap-mirror.patch => 0034-iotests-add-test-for-bitmap-mirror.patch} (100%)
>  rename debian/patches/pve/{0036-mirror-move-some-checks-to-qmp.patch => 0035-mirror-move-some-checks-to-qmp.patch} (100%)
>  rename debian/patches/pve/{0037-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch => 0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch} (88%)
>  create mode 100644 debian/patches/pve/0037-PVE-various-PBS-fixes.patch
>  rename debian/patches/pve/{0043-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch => 0038-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch} (100%)
>  delete mode 100644 debian/patches/pve/0038-PVE-backup-rename-incremental-to-use-dirty-bitmap.patch
>  rename debian/patches/pve/{0044-PVE-add-query_proxmox_support-QMP-command.patch => 0039-PVE-add-query_proxmox_support-QMP-command.patch} (94%)
>  delete mode 100644 debian/patches/pve/0039-PVE-fixup-pbs-restore-API.patch
>  rename debian/patches/pve/{0048-PVE-add-query-pbs-bitmap-info-QMP-call.patch => 0040-PVE-add-query-pbs-bitmap-info-QMP-call.patch} (100%)
>  delete mode 100644 debian/patches/pve/0040-PVE-always-set-dirty-counter-for-non-incremental-bac.patch
>  rename debian/patches/pve/{0049-PVE-redirect-stderr-to-journal-when-daemonized.patch => 0041-PVE-redirect-stderr-to-journal-when-daemonized.patch} (100%)
>  delete mode 100644 debian/patches/pve/0041-PVE-use-proxmox_backup_check_incremental.patch
>  rename debian/patches/pve/{0050-PVE-Add-sequential-job-transaction-support.patch => 0042-PVE-Add-sequential-job-transaction-support.patch} (75%)
>  delete mode 100644 debian/patches/pve/0042-PVE-fixup-pbs-backup-add-compress-and-encrypt-option.patch
>  rename debian/patches/pve/{0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch => 0043-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch} (100%)
>  rename debian/patches/pve/{0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch => 0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch} (63%)
>  rename debian/patches/pve/{0054-PVE-Migrate-dirty-bitmap-state-via-savevm.patch => 0045-PVE-Migrate-dirty-bitmap-state-via-savevm.patch} (100%)
>  delete mode 100644 debian/patches/pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch
>  delete mode 100644 debian/patches/pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch
>  rename debian/patches/pve/{0055-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch => 0046-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch} (100%)
>  delete mode 100644 debian/patches/pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch
>  rename debian/patches/pve/{0057-PVE-fall-back-to-open-iscsi-initiatorname.patch => 0047-PVE-fall-back-to-open-iscsi-initiatorname.patch} (100%)
>  rename debian/patches/pve/{0058-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch => 0048-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch} (100%)
>  rename debian/patches/pve/{0059-PBS-add-master-key-support.patch => 0049-PBS-add-master-key-support.patch} (100%)
>  delete mode 100644 debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
>  delete mode 100644 debian/patches/pve/0056-PVE-fix-aborting-multiple-CREATED-jobs-in-sequential.patch
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] applied: [PATCH v2 pve-qemu 02/11] move bitmap-mirror patches to seperate folder
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 02/11] move bitmap-mirror patches to seperate folder Stefan Reiter
@ 2021-03-03 16:32   ` Thomas Lamprecht
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Lamprecht @ 2021-03-03 16:32 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter, pbs-devel

On 03.03.21 10:56, Stefan Reiter wrote:
> ...instead of having them in the middle of the backup related patches.
> These might (hopefully) become upstream at some point as well.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> Unrelated to rest of series.
> 
>  ...-support-for-sync-bitmap-mode-never.patch} | 30 +++++++-------
>  ...support-for-conditional-and-always-.patch} |  0
>  ...heck-for-bitmap-mode-without-bitmap.patch} |  4 +-
>  ...to-bdrv_dirty_bitmap_merge_internal.patch} |  0
>  ...-iotests-add-test-for-bitmap-mirror.patch} |  0
>  ...0006-mirror-move-some-checks-to-qmp.patch} |  4 +-
>  ...le-posix-make-locking-optiono-on-cre.patch |  4 +-
>  ...-Backup-add-backup-dump-block-driver.patch |  2 +-
>  ...ckup-proxmox-backup-patches-for-qemu.patch |  6 +--
>  ...rty-bitmap-tracking-for-incremental.patch} |  0
>  ...patch => 0031-PVE-various-PBS-fixes.patch} |  0
>  ...-driver-to-map-backup-archives-into.patch} |  0
>  ...d-query_proxmox_support-QMP-command.patch} |  0
>  ...-add-query-pbs-bitmap-info-QMP-call.patch} |  0
>  ...t-stderr-to-journal-when-daemonized.patch} |  0
>  ...-sequential-job-transaction-support.patch} |  0
>  ...transaction-to-synchronize-job-stat.patch} |  0
>  ...block-on-finishing-and-cleanup-crea.patch} |  0
>  ...grate-dirty-bitmap-state-via-savevm.patch} |  0
>  ...irty-bitmap-migrate-other-bitmaps-e.patch} |  0
>  ...ll-back-to-open-iscsi-initiatorname.patch} |  0
>  ...outine-QMP-for-backup-cancel_backup.patch} |  0
>  ... => 0043-PBS-add-master-key-support.patch} |  0
>  debian/patches/series                         | 40 +++++++++----------
>  24 files changed, 45 insertions(+), 45 deletions(-)
>  rename debian/patches/{pve/0030-drive-mirror-add-support-for-sync-bitmap-mode-never.patch => bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch} (96%)
>  rename debian/patches/{pve/0031-drive-mirror-add-support-for-conditional-and-always-.patch => bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch} (100%)
>  rename debian/patches/{pve/0032-mirror-add-check-for-bitmap-mode-without-bitmap.patch => bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch} (90%)
>  rename debian/patches/{pve/0033-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch => bitmap-mirror/0004-mirror-switch-to-bdrv_dirty_bitmap_merge_internal.patch} (100%)
>  rename debian/patches/{pve/0034-iotests-add-test-for-bitmap-mirror.patch => bitmap-mirror/0005-iotests-add-test-for-bitmap-mirror.patch} (100%)
>  rename debian/patches/{pve/0035-mirror-move-some-checks-to-qmp.patch => bitmap-mirror/0006-mirror-move-some-checks-to-qmp.patch} (99%)
>  rename debian/patches/pve/{0036-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch => 0030-PVE-Backup-Add-dirty-bitmap-tracking-for-incremental.patch} (100%)
>  rename debian/patches/pve/{0037-PVE-various-PBS-fixes.patch => 0031-PVE-various-PBS-fixes.patch} (100%)
>  rename debian/patches/pve/{0038-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch => 0032-PVE-Add-PBS-block-driver-to-map-backup-archives-into.patch} (100%)
>  rename debian/patches/pve/{0039-PVE-add-query_proxmox_support-QMP-command.patch => 0033-PVE-add-query_proxmox_support-QMP-command.patch} (100%)
>  rename debian/patches/pve/{0040-PVE-add-query-pbs-bitmap-info-QMP-call.patch => 0034-PVE-add-query-pbs-bitmap-info-QMP-call.patch} (100%)
>  rename debian/patches/pve/{0041-PVE-redirect-stderr-to-journal-when-daemonized.patch => 0035-PVE-redirect-stderr-to-journal-when-daemonized.patch} (100%)
>  rename debian/patches/pve/{0042-PVE-Add-sequential-job-transaction-support.patch => 0036-PVE-Add-sequential-job-transaction-support.patch} (100%)
>  rename debian/patches/pve/{0043-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch => 0037-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch} (100%)
>  rename debian/patches/pve/{0044-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch => 0038-PVE-Backup-Don-t-block-on-finishing-and-cleanup-crea.patch} (100%)
>  rename debian/patches/pve/{0045-PVE-Migrate-dirty-bitmap-state-via-savevm.patch => 0039-PVE-Migrate-dirty-bitmap-state-via-savevm.patch} (100%)
>  rename debian/patches/pve/{0046-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch => 0040-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch} (100%)
>  rename debian/patches/pve/{0047-PVE-fall-back-to-open-iscsi-initiatorname.patch => 0041-PVE-fall-back-to-open-iscsi-initiatorname.patch} (100%)
>  rename debian/patches/pve/{0048-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch => 0042-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch} (100%)
>  rename debian/patches/pve/{0049-PBS-add-master-key-support.patch => 0043-PBS-add-master-key-support.patch} (100%)
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [pve-devel] [PATCH v2 pve-qemu 03/11] add alloc-track block driver patch
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 03/11] add alloc-track block driver patch Stefan Reiter
@ 2021-03-15 14:14   ` Wolfgang Bumiller
  2021-03-15 15:41     ` [pve-devel] [PATCH pve-qemu v3] " Stefan Reiter
  0 siblings, 1 reply; 25+ messages in thread
From: Wolfgang Bumiller @ 2021-03-15 14:14 UTC (permalink / raw)
  To: Stefan Reiter; +Cc: pve-devel, pbs-devel

series looks mostly fine, just a comment inline below...

On Wed, Mar 03, 2021 at 10:56:04AM +0100, Stefan Reiter wrote:
> See added patches for more info, overview:
> 0044: slightly increase PBS performance by reducing allocations
> 0045: slightly increase block-stream performance for Ceph
> 0046: don't crash with block-stream on RBD
> 0047: add alloc-track driver for live restore
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> works best with updated proxmox-backup-qemu, but no hard dependency
> 
> v2:
> * now sent as pve-qemu combined patch
> * 0060 is unchanged
> * 0061/0062 are new, they fix restore via user-space Ceph
> * 0063 (alloc-track) is updated with Wolfgangs feedback (track_co_block_status)
>   and updated for 5.2 compatibility (mainly track_drop)
> 
>  ...st-path-reads-without-allocation-if-.patch |  52 +++
>  ...PVE-block-stream-increase-chunk-size.patch |  23 ++
>  ...accept-NULL-qiov-in-bdrv_pad_request.patch |  42 ++
>  .../0047-block-add-alloc-track-driver.patch   | 380 ++++++++++++++++++
>  debian/patches/series                         |   4 +
>  5 files changed, 501 insertions(+)
>  create mode 100644 debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
>  create mode 100644 debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
>  create mode 100644 debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
>  create mode 100644 debian/patches/pve/0047-block-add-alloc-track-driver.patch
> 
> diff --git a/debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch b/debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
> new file mode 100644
> index 0000000..a85ebc2
> --- /dev/null
> +++ b/debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
> @@ -0,0 +1,52 @@
> +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
> +From: Stefan Reiter <s.reiter@proxmox.com>
> +Date: Wed, 9 Dec 2020 11:46:57 +0100
> +Subject: [PATCH] PVE: block/pbs: fast-path reads without allocation if
> + possible
> +
> +...and switch over to g_malloc/g_free while at it to align with other
> +QEMU code.
> +
> +Tracing shows the fast-path is taken almost all the time, though not
> +100% so the slow one is still necessary.
> +
> +Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> +---
> + block/pbs.c | 17 ++++++++++++++---
> + 1 file changed, 14 insertions(+), 3 deletions(-)
> +
> +diff --git a/block/pbs.c b/block/pbs.c
> +index 1481a2bfd1..fbf0d8d845 100644
> +--- a/block/pbs.c
> ++++ b/block/pbs.c
> +@@ -200,7 +200,16 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
> +     BDRVPBSState *s = bs->opaque;
> +     int ret;
> +     char *pbs_error = NULL;
> +-    uint8_t *buf = malloc(bytes);
> ++    uint8_t *buf;
> ++    bool inline_buf = true;
> ++
> ++    /* for single-buffer IO vectors we can fast-path the write directly to it */
> ++    if (qiov->niov == 1 && qiov->iov->iov_len >= bytes) {
> ++        buf = qiov->iov->iov_base;
> ++    } else {
> ++        inline_buf = false;
> ++        buf = g_malloc(bytes);
> ++    }
> + 
> +     ReadCallbackData rcb = {
> +         .co = qemu_coroutine_self(),
> +@@ -218,8 +227,10 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
> +         return -EIO;
> +     }
> + 
> +-    qemu_iovec_from_buf(qiov, 0, buf, bytes);
> +-    free(buf);
> ++    if (!inline_buf) {
> ++        qemu_iovec_from_buf(qiov, 0, buf, bytes);
> ++        g_free(buf);
> ++    }
> + 
> +     return ret;
> + }
> diff --git a/debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch b/debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
> new file mode 100644
> index 0000000..601f8c7
> --- /dev/null
> +++ b/debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
> @@ -0,0 +1,23 @@
> +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
> +From: Stefan Reiter <s.reiter@proxmox.com>
> +Date: Tue, 2 Mar 2021 16:34:28 +0100
> +Subject: [PATCH] PVE: block/stream: increase chunk size
> +
> +Ceph favors bigger chunks, so increase to 4M.
> +---
> + block/stream.c | 2 +-
> + 1 file changed, 1 insertion(+), 1 deletion(-)
> +
> +diff --git a/block/stream.c b/block/stream.c
> +index 236384f2f7..a5371420e3 100644
> +--- a/block/stream.c
> ++++ b/block/stream.c
> +@@ -26,7 +26,7 @@ enum {
> +      * large enough to process multiple clusters in a single call, so
> +      * that populating contiguous regions of the image is efficient.
> +      */
> +-    STREAM_CHUNK = 512 * 1024, /* in bytes */
> ++    STREAM_CHUNK = 4 * 1024 * 1024, /* in bytes */
> + };
> + 
> + typedef struct StreamBlockJob {
> diff --git a/debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch b/debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
> new file mode 100644
> index 0000000..e40fa2e
> --- /dev/null
> +++ b/debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
> @@ -0,0 +1,42 @@
> +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
> +From: Stefan Reiter <s.reiter@proxmox.com>
> +Date: Tue, 2 Mar 2021 16:11:54 +0100
> +Subject: [PATCH] block/io: accept NULL qiov in bdrv_pad_request
> +
> +Some operations, e.g. block-stream, perform reads while discarding the
> +results (only copy-on-read matters). In this case they will pass NULL as
> +the target QEMUIOVector, which will however trip bdrv_pad_request, since
> +it wants to extend its passed vector.
> +
> +Simply check for NULL and do nothing, there's no reason to pad the
> +target if it will be discarded anyway.
> +---
> + block/io.c | 13 ++++++++-----
> + 1 file changed, 8 insertions(+), 5 deletions(-)
> +
> +diff --git a/block/io.c b/block/io.c
> +index ec5e152bb7..08dee005ec 100644
> +--- a/block/io.c
> ++++ b/block/io.c
> +@@ -1613,13 +1613,16 @@ static bool bdrv_pad_request(BlockDriverState *bs,
> +         return false;
> +     }
> + 
> +-    qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
> +-                             *qiov, *qiov_offset, *bytes,
> +-                             pad->buf + pad->buf_len - pad->tail, pad->tail);
> ++    if (*qiov) {
> ++        qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
> ++                                *qiov, *qiov_offset, *bytes,
> ++                                pad->buf + pad->buf_len - pad->tail, pad->tail);
> ++        *qiov = &pad->local_qiov;
> ++        *qiov_offset = 0;
> ++    }
> ++
> +     *bytes += pad->head + pad->tail;
> +     *offset -= pad->head;
> +-    *qiov = &pad->local_qiov;
> +-    *qiov_offset = 0;
> + 
> +     return true;
> + }
> diff --git a/debian/patches/pve/0047-block-add-alloc-track-driver.patch b/debian/patches/pve/0047-block-add-alloc-track-driver.patch
> new file mode 100644
> index 0000000..6aaa186
> --- /dev/null
> +++ b/debian/patches/pve/0047-block-add-alloc-track-driver.patch
> @@ -0,0 +1,380 @@
> +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
> +From: Stefan Reiter <s.reiter@proxmox.com>
> +Date: Mon, 7 Dec 2020 15:21:03 +0100
> +Subject: [PATCH] block: add alloc-track driver
> +
> +Add a new filter node 'alloc-track', which seperates reads and writes to
> +different children, thus allowing to put a backing image behind any
> +blockdev (regardless of driver support). Since we can't detect any
> +pre-allocated blocks, we can only track new writes, hence the write
> +target ('file') for this node must always be empty.
> +
> +Intended use case is for live restoring, i.e. add a backup image as a
> +block device into a VM, then put an alloc-track on the restore target
> +and set the backup as backing. With this, one can use a regular
> +'block-stream' to restore the image, while the VM can already run in the
> +background. Copy-on-read will help make progress as the VM reads as
> +well.
> +
> +This only worked if the target supports backing images, so up until now
> +only for qcow2, with alloc-track any driver for the target can be used.
> +
> +If 'auto-remove' is set, alloc-track will automatically detach itself
> +once the backing image is removed. It will be replaced by 'file'.
> +
> +Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> +---
> + block/alloc-track.c | 331 ++++++++++++++++++++++++++++++++++++++++++++
> + block/meson.build   |   1 +
> + 2 files changed, 332 insertions(+)
> + create mode 100644 block/alloc-track.c
> +
> +diff --git a/block/alloc-track.c b/block/alloc-track.c
> +new file mode 100644
> +index 0000000000..cc06cfca13
> +--- /dev/null
> ++++ b/block/alloc-track.c
> +@@ -0,0 +1,331 @@
> ++/*
> ++ * Node to allow backing images to be applied to any node. Assumes a blank
> ++ * image to begin with, only new writes are tracked as allocated, thus this
> ++ * must never be put on a node that already contains data.
> ++ *
> ++ * Copyright (c) 2020 Proxmox Server Solutions GmbH
> ++ * Copyright (c) 2020 Stefan Reiter <s.reiter@proxmox.com>
> ++ *
> ++ * This work is licensed under the terms of the GNU GPL, version 2 or later.
> ++ * See the COPYING file in the top-level directory.
> ++ */
> ++
> ++#include "qemu/osdep.h"
> ++#include "qapi/error.h"
> ++#include "block/block_int.h"
> ++#include "qapi/qmp/qdict.h"
> ++#include "qapi/qmp/qstring.h"
> ++#include "qemu/cutils.h"
> ++#include "qemu/option.h"
> ++#include "qemu/module.h"
> ++#include "sysemu/block-backend.h"
> ++
> ++#define TRACK_OPT_AUTO_REMOVE "auto-remove"
> ++
> ++typedef struct {
> ++    BdrvDirtyBitmap *bitmap;
> ++    bool dropping;
> ++    bool auto_remove;
> ++} BDRVAllocTrackState;
> ++
> ++static QemuOptsList runtime_opts = {
> ++    .name = "alloc-track",
> ++    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
> ++    .desc = {
> ++        {
> ++            .name = TRACK_OPT_AUTO_REMOVE,
> ++            .type = QEMU_OPT_BOOL,
> ++            .help = "automatically replace this node with 'file' when 'backing'"
> ++                    "is detached",
> ++        },
> ++        { /* end of list */ }
> ++    },
> ++};
> ++
> ++static void track_refresh_limits(BlockDriverState *bs, Error **errp)
> ++{
> ++    BlockDriverInfo bdi;
> ++
> ++    if (!bs->file) {
> ++        return;
> ++    }
> ++
> ++    /* always use alignment from underlying write device so RMW cycle for
> ++     * bdrv_pwritev reads data from our backing via track_co_preadv (no partial
> ++     * cluster allocation in 'file') */
> ++    bdrv_get_info(bs->file->bs, &bdi);
> ++    bs->bl.request_alignment = MAX(bs->file->bs->bl.request_alignment,
> ++                                   MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
> ++}
> ++
> ++static int track_open(BlockDriverState *bs, QDict *options, int flags,
> ++                      Error **errp)
> ++{
> ++    BDRVAllocTrackState *s = bs->opaque;
> ++    QemuOpts *opts;
> ++    Error *local_err = NULL;
> ++    int ret = 0;
> ++
> ++    opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
> ++    qemu_opts_absorb_qdict(opts, options, &local_err);
> ++    if (local_err) {
> ++        error_propagate(errp, local_err);
> ++        ret = -EINVAL;
> ++        goto fail;
> ++    }
> ++
> ++    s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
> ++
> ++    /* open the target (write) node, backing will be attached by block layer */
> ++    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
> ++                               BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
> ++                               &local_err);
> ++    if (local_err) {
> ++        ret = -EINVAL;
> ++        error_propagate(errp, local_err);
> ++        goto fail;
> ++    }
> ++
> ++    track_refresh_limits(bs, errp);
> ++    uint64_t gran = bs->bl.request_alignment;
> ++    s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, gran, NULL, &local_err);
> ++    if (local_err) {
> ++        ret = -EIO;
> ++        error_propagate(errp, local_err);
> ++        goto fail;
> ++    }
> ++
> ++    s->dropping = false;
> ++
> ++fail:
> ++    if (ret < 0) {
> ++        bdrv_unref_child(bs, bs->file);
> ++        if (s->bitmap) {
> ++            bdrv_release_dirty_bitmap(s->bitmap);
> ++        }
> ++    }
> ++    qemu_opts_del(opts);
> ++    return ret;
> ++}
> ++
> ++static void track_close(BlockDriverState *bs)
> ++{
> ++    BDRVAllocTrackState *s = bs->opaque;
> ++    if (s->bitmap) {
> ++        bdrv_release_dirty_bitmap(s->bitmap);
> ++    }
> ++}
> ++
> ++static int64_t track_getlength(BlockDriverState *bs)
> ++{
> ++    return bdrv_getlength(bs->file->bs);
> ++}
> ++
> ++static int coroutine_fn track_co_preadv(BlockDriverState *bs,
> ++    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
> ++{
> ++    BDRVAllocTrackState *s = bs->opaque;
> ++    QEMUIOVector local_qiov;
> ++    int ret;
> ++
> ++    /* 'cur_offset' is relative to 'offset', 'local_offset' to image start */
> ++    uint64_t cur_offset, local_offset;
> ++    int64_t local_bytes;
> ++    bool alloc;
> ++
> ++    /* a read request can span multiple granularity-sized chunks, and can thus
> ++     * contain blocks with different allocation status - we could just iterate
> ++     * granularity-wise, but for better performance use bdrv_dirty_bitmap_next_X
> ++     * to find the next flip and consider everything up to that in one go */
> ++    for (cur_offset = 0; cur_offset < bytes; cur_offset += local_bytes) {
> ++        local_offset = offset + cur_offset;
> ++        alloc = bdrv_dirty_bitmap_get(s->bitmap, local_offset);
> ++        if (alloc) {
> ++            local_bytes = bdrv_dirty_bitmap_next_zero(s->bitmap, local_offset,
> ++                                                      bytes - cur_offset);
> ++        } else {
> ++            local_bytes = bdrv_dirty_bitmap_next_dirty(s->bitmap, local_offset,
> ++                                                       bytes - cur_offset);
> ++        }
> ++
> ++        /* _bitmap_next_X return is -1 if no end found within limit, otherwise
> ++         * offset of next flip (to start of image) */
> ++        local_bytes = local_bytes < 0 ?
> ++            bytes - cur_offset :
> ++            local_bytes - local_offset;
> ++
> ++        qemu_iovec_init_slice(&local_qiov, qiov, cur_offset, local_bytes);
> ++
> ++        if (alloc) {
> ++            ret = bdrv_co_preadv(bs->file, local_offset, local_bytes,
> ++                                 &local_qiov, flags);
> ++        } else if (bs->backing) {
> ++            ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
> ++                                 &local_qiov, flags);
> ++        } else {
> ++            ret = qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
> ++        }
> ++
> ++        if (ret != 0) {
> ++            break;
> ++        }
> ++    }
> ++
> ++    return ret;
> ++}
> ++
> ++static int coroutine_fn track_co_pwritev(BlockDriverState *bs,
> ++    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
> ++{
> ++    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
> ++}
> ++
> ++static int coroutine_fn track_co_pwrite_zeroes(BlockDriverState *bs,
> ++    int64_t offset, int count, BdrvRequestFlags flags)
> ++{
> ++    return bdrv_pwrite_zeroes(bs->file, offset, count, flags);
> ++}
> ++
> ++static int coroutine_fn track_co_pdiscard(BlockDriverState *bs,
> ++    int64_t offset, int count)
> ++{
> ++    return bdrv_co_pdiscard(bs->file, offset, count);
> ++}
> ++
> ++static coroutine_fn int track_co_flush(BlockDriverState *bs)
> ++{
> ++    return bdrv_co_flush(bs->file->bs);
> ++}
> ++
> ++static int coroutine_fn track_co_block_status(BlockDriverState *bs,
> ++                                              bool want_zero,
> ++                                              int64_t offset,
> ++                                              int64_t bytes,
> ++                                              int64_t *pnum,
> ++                                              int64_t *map,
> ++                                              BlockDriverState **file)
> ++{
> ++    BDRVAllocTrackState *s = bs->opaque;
> ++
> ++    bool alloc = bdrv_dirty_bitmap_get(s->bitmap, offset);
> ++    int64_t next_flipped;
> ++    if (alloc) {
> ++        next_flipped = bdrv_dirty_bitmap_next_zero(s->bitmap, offset, bytes);
> ++    } else {
> ++        next_flipped = bdrv_dirty_bitmap_next_dirty(s->bitmap, offset, bytes);
> ++    }
> ++
> ++    /* in case not the entire region has the same state, we need to set pnum to
> ++     * indicate for how many bytes our result is valid */
> ++    *pnum = next_flipped == -1 ? bytes : next_flipped - offset;
> ++    *map = offset;
> ++
> ++    if (alloc) {
> ++        *file = bs->file->bs;
> ++        return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> ++    } else if (bs->backing) {
> ++        *file = bs->backing->bs;
> ++    }
> ++    return 0;
> ++}
> ++
> ++static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
> ++                             BdrvChildRole role, BlockReopenQueue *reopen_queue,
> ++                             uint64_t perm, uint64_t shared,
> ++                             uint64_t *nperm, uint64_t *nshared)
> ++{
> ++    BDRVAllocTrackState *s = bs->opaque;
> ++
> ++    *nshared = BLK_PERM_ALL;
> ++
> ++    /* in case we're currently dropping ourselves, claim to not use any
> ++     * permissions at all - which is fine, since from this point on we will
> ++     * never issue a read or write anymore */
> ++    if (s->dropping) {
> ++        *nperm = 0;
> ++        return;
> ++    }
> ++
> ++    if (role & BDRV_CHILD_DATA) {
> ++        *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
> ++    } else {
> ++        /* 'backing' is also a child of our BDS, but we don't expect it to be
> ++         * writeable, so we only forward 'consistent read' */
> ++        *nperm = perm & BLK_PERM_CONSISTENT_READ;
> ++    }
> ++}
> ++
> ++static void track_drop(void *opaque)
> ++{
> ++    BlockDriverState *bs = (BlockDriverState*)opaque;
> ++    BlockDriverState *file = bs->file->bs;
> ++    BDRVAllocTrackState *s = bs->opaque;
> ++
> ++    assert(file);
> ++
> ++    /* we rely on the fact that we're not used anywhere else, so let's wait
> ++     * until we have the only reference to ourselves */
> ++    if (bs->refcnt > 1) {
> ++        aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, opaque);

I'm not very happy with this, but I'm not deep enough in the whole
bdrv/drain/BH/aio context logic to figure out a better way right now.
However... (see below)

> ++        return;
> ++    }
> ++
> ++    /* we do not need a bdrv_drained_end, since this is applied only to the node
> ++     * which gets removed by bdrv_replace_node */
> ++    bdrv_drained_begin(bs);
> ++
> ++    /* now that we're drained, we can safely set 'dropping' */
> ++    s->dropping = true;
> ++    bdrv_child_refresh_perms(bs, bs->file, &error_abort);
> ++
> ++    /* this will bdrv_unref() and thus drop us */
> ++    bdrv_replace_node(bs, file, &error_abort);
> ++}
> ++
> ++static int track_change_backing_file(BlockDriverState *bs,
> ++                                     const char *backing_file,
> ++                                     const char *backing_fmt)
> ++{
> ++    BDRVAllocTrackState *s = bs->opaque;
> ++    if (s->auto_remove && backing_file == NULL && backing_fmt == NULL) {

...I'd like to at least make sure that this is only triggered once here,
so either reuse `s->dropping` (and set it already at this point), or add
another boolean for this?

Also, perhaps we should give the drop callback a hard reference
(bdrv_ref(bs) and bump the count check inside track_drop), just to make
sure we don't end up crashing in the `drop` handler when we
simultaneously try to hot-unplug the disk from the VM.

> ++        /* backing file has been disconnected, there's no longer any use for
> ++         * this node, so let's remove ourselves from the block graph - we need
> ++         * to schedule this for later however, since when this function is
> ++         * called, the blockjob modifying us is probably not done yet and has a
> ++         * blocker on 'bs' */
> ++        aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, (void*)bs);
> ++    }
> ++
> ++    return 0;
> ++}
> ++
> ++static BlockDriver bdrv_alloc_track = {
> ++    .format_name                      = "alloc-track",
> ++    .instance_size                    = sizeof(BDRVAllocTrackState),
> ++
> ++    .bdrv_file_open                   = track_open,
> ++    .bdrv_close                       = track_close,
> ++    .bdrv_getlength                   = track_getlength,
> ++    .bdrv_child_perm                  = track_child_perm,
> ++    .bdrv_refresh_limits              = track_refresh_limits,
> ++
> ++    .bdrv_co_pwrite_zeroes            = track_co_pwrite_zeroes,
> ++    .bdrv_co_pwritev                  = track_co_pwritev,
> ++    .bdrv_co_preadv                   = track_co_preadv,
> ++    .bdrv_co_pdiscard                 = track_co_pdiscard,
> ++
> ++    .bdrv_co_flush                    = track_co_flush,
> ++    .bdrv_co_flush_to_disk            = track_co_flush,
> ++
> ++    .supports_backing                 = true,
> ++
> ++    .bdrv_co_block_status             = track_co_block_status,
> ++    .bdrv_change_backing_file         = track_change_backing_file,
> ++};
> ++
> ++static void bdrv_alloc_track_init(void)
> ++{
> ++    bdrv_register(&bdrv_alloc_track);
> ++}
> ++
> ++block_init(bdrv_alloc_track_init);
> +diff --git a/block/meson.build b/block/meson.build
> +index a070060e53..e387990764 100644
> +--- a/block/meson.build
> ++++ b/block/meson.build
> +@@ -2,6 +2,7 @@ block_ss.add(genh)
> + block_ss.add(files(
> +   'accounting.c',
> +   'aio_task.c',
> ++  'alloc-track.c',
> +   'amend.c',
> +   'backup.c',
> +   'backup-dump.c',
> diff --git a/debian/patches/series b/debian/patches/series
> index 1b30d97..f6de587 100644
> --- a/debian/patches/series
> +++ b/debian/patches/series
> @@ -55,3 +55,7 @@ pve/0040-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
>  pve/0041-PVE-fall-back-to-open-iscsi-initiatorname.patch
>  pve/0042-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
>  pve/0043-PBS-add-master-key-support.patch
> +pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
> +pve/0045-PVE-block-stream-increase-chunk-size.patch
> +pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
> +pve/0047-block-add-alloc-track-driver.patch
> -- 
> 2.20.1




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] [PATCH pve-qemu v3] add alloc-track block driver patch
  2021-03-15 14:14   ` Wolfgang Bumiller
@ 2021-03-15 15:41     ` Stefan Reiter
  2021-03-16 19:57       ` [pve-devel] applied: " Thomas Lamprecht
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Reiter @ 2021-03-15 15:41 UTC (permalink / raw)
  To: pve-devel, pbs-devel

See added patches for more info, overview:
0044: slightly increase PBS performance by reducing allocations
0045: slightly increase block-stream performance for Ceph
0046: don't crash with block-stream on RBD
0047: add alloc-track driver for live restore

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v3:
* improve track_drop as discussed by @Wolfgang, both on and off list
  (DropState, additional bdrv_ref/unref)

track_drop is certainly not beautiful, but it works reliably in our use-case...

 ...st-path-reads-without-allocation-if-.patch |  52 +++
 ...PVE-block-stream-increase-chunk-size.patch |  23 ++
 ...accept-NULL-qiov-in-bdrv_pad_request.patch |  42 ++
 .../0047-block-add-alloc-track-driver.patch   | 391 ++++++++++++++++++
 debian/patches/series                         |   4 +
 5 files changed, 512 insertions(+)
 create mode 100644 debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
 create mode 100644 debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
 create mode 100644 debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
 create mode 100644 debian/patches/pve/0047-block-add-alloc-track-driver.patch

diff --git a/debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch b/debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
new file mode 100644
index 0000000..a85ebc2
--- /dev/null
+++ b/debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
@@ -0,0 +1,52 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Stefan Reiter <s.reiter@proxmox.com>
+Date: Wed, 9 Dec 2020 11:46:57 +0100
+Subject: [PATCH] PVE: block/pbs: fast-path reads without allocation if
+ possible
+
+...and switch over to g_malloc/g_free while at it to align with other
+QEMU code.
+
+Tracing shows the fast-path is taken almost all the time, though not
+100%, so the slow one is still necessary.
+
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+---
+ block/pbs.c | 17 ++++++++++++++---
+ 1 file changed, 14 insertions(+), 3 deletions(-)
+
+diff --git a/block/pbs.c b/block/pbs.c
+index 1481a2bfd1..fbf0d8d845 100644
+--- a/block/pbs.c
++++ b/block/pbs.c
+@@ -200,7 +200,16 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
+     BDRVPBSState *s = bs->opaque;
+     int ret;
+     char *pbs_error = NULL;
+-    uint8_t *buf = malloc(bytes);
++    uint8_t *buf;
++    bool inline_buf = true;
++
++    /* for single-buffer IO vectors we can fast-path the write directly to it */
++    if (qiov->niov == 1 && qiov->iov->iov_len >= bytes) {
++        buf = qiov->iov->iov_base;
++    } else {
++        inline_buf = false;
++        buf = g_malloc(bytes);
++    }
+ 
+     ReadCallbackData rcb = {
+         .co = qemu_coroutine_self(),
+@@ -218,8 +227,10 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
+         return -EIO;
+     }
+ 
+-    qemu_iovec_from_buf(qiov, 0, buf, bytes);
+-    free(buf);
++    if (!inline_buf) {
++        qemu_iovec_from_buf(qiov, 0, buf, bytes);
++        g_free(buf);
++    }
+ 
+     return ret;
+ }
diff --git a/debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch b/debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
new file mode 100644
index 0000000..601f8c7
--- /dev/null
+++ b/debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
@@ -0,0 +1,23 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Stefan Reiter <s.reiter@proxmox.com>
+Date: Tue, 2 Mar 2021 16:34:28 +0100
+Subject: [PATCH] PVE: block/stream: increase chunk size
+
+Ceph favors bigger chunks, so increase to 4M.
+---
+ block/stream.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/block/stream.c b/block/stream.c
+index 236384f2f7..a5371420e3 100644
+--- a/block/stream.c
++++ b/block/stream.c
+@@ -26,7 +26,7 @@ enum {
+      * large enough to process multiple clusters in a single call, so
+      * that populating contiguous regions of the image is efficient.
+      */
+-    STREAM_CHUNK = 512 * 1024, /* in bytes */
++    STREAM_CHUNK = 4 * 1024 * 1024, /* in bytes */
+ };
+ 
+ typedef struct StreamBlockJob {
diff --git a/debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch b/debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
new file mode 100644
index 0000000..e40fa2e
--- /dev/null
+++ b/debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
@@ -0,0 +1,42 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Stefan Reiter <s.reiter@proxmox.com>
+Date: Tue, 2 Mar 2021 16:11:54 +0100
+Subject: [PATCH] block/io: accept NULL qiov in bdrv_pad_request
+
+Some operations, e.g. block-stream, perform reads while discarding the
+results (only copy-on-read matters). In this case they will pass NULL as
+the target QEMUIOVector, which will however trip bdrv_pad_request, since
+it wants to extend its passed vector.
+
+Simply check for NULL and do nothing, there's no reason to pad the
+target if it will be discarded anyway.
+---
+ block/io.c | 13 ++++++++-----
+ 1 file changed, 8 insertions(+), 5 deletions(-)
+
+diff --git a/block/io.c b/block/io.c
+index ec5e152bb7..08dee005ec 100644
+--- a/block/io.c
++++ b/block/io.c
+@@ -1613,13 +1613,16 @@ static bool bdrv_pad_request(BlockDriverState *bs,
+         return false;
+     }
+ 
+-    qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
+-                             *qiov, *qiov_offset, *bytes,
+-                             pad->buf + pad->buf_len - pad->tail, pad->tail);
++    if (*qiov) {
++        qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
++                                *qiov, *qiov_offset, *bytes,
++                                pad->buf + pad->buf_len - pad->tail, pad->tail);
++        *qiov = &pad->local_qiov;
++        *qiov_offset = 0;
++    }
++
+     *bytes += pad->head + pad->tail;
+     *offset -= pad->head;
+-    *qiov = &pad->local_qiov;
+-    *qiov_offset = 0;
+ 
+     return true;
+ }
diff --git a/debian/patches/pve/0047-block-add-alloc-track-driver.patch b/debian/patches/pve/0047-block-add-alloc-track-driver.patch
new file mode 100644
index 0000000..db46371
--- /dev/null
+++ b/debian/patches/pve/0047-block-add-alloc-track-driver.patch
@@ -0,0 +1,391 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Stefan Reiter <s.reiter@proxmox.com>
+Date: Mon, 7 Dec 2020 15:21:03 +0100
+Subject: [PATCH] block: add alloc-track driver
+
+Add a new filter node 'alloc-track', which separates reads and writes to
+different children, thus allowing one to put a backing image behind any
+blockdev (regardless of driver support). Since we can't detect any
+pre-allocated blocks, we can only track new writes, hence the write
+target ('file') for this node must always be empty.
+
+Intended use case is for live restoring, i.e. add a backup image as a
+block device into a VM, then put an alloc-track on the restore target
+and set the backup as backing. With this, one can use a regular
+'block-stream' to restore the image, while the VM can already run in the
+background. Copy-on-read will help make progress as the VM reads as
+well.
+
+This only worked if the target supports backing images, so up until now
+only for qcow2; with alloc-track, any driver can be used for the target.
+
+If 'auto-remove' is set, alloc-track will automatically detach itself
+once the backing image is removed. It will be replaced by 'file'.
+
+Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
+---
+ block/alloc-track.c | 342 ++++++++++++++++++++++++++++++++++++++++++++
+ block/meson.build   |   1 +
+ 2 files changed, 343 insertions(+)
+ create mode 100644 block/alloc-track.c
+
+diff --git a/block/alloc-track.c b/block/alloc-track.c
+new file mode 100644
+index 0000000000..b579380279
+--- /dev/null
++++ b/block/alloc-track.c
+@@ -0,0 +1,342 @@
++/*
++ * Node to allow backing images to be applied to any node. Assumes a blank
++ * image to begin with, only new writes are tracked as allocated, thus this
++ * must never be put on a node that already contains data.
++ *
++ * Copyright (c) 2020 Proxmox Server Solutions GmbH
++ * Copyright (c) 2020 Stefan Reiter <s.reiter@proxmox.com>
++ *
++ * This work is licensed under the terms of the GNU GPL, version 2 or later.
++ * See the COPYING file in the top-level directory.
++ */
++
++#include "qemu/osdep.h"
++#include "qapi/error.h"
++#include "block/block_int.h"
++#include "qapi/qmp/qdict.h"
++#include "qapi/qmp/qstring.h"
++#include "qemu/cutils.h"
++#include "qemu/option.h"
++#include "qemu/module.h"
++#include "sysemu/block-backend.h"
++
++#define TRACK_OPT_AUTO_REMOVE "auto-remove"
++
++typedef enum DropState {
++    DropNone,
++    DropRequested,
++    DropInProgress,
++} DropState;
++
++typedef struct {
++    BdrvDirtyBitmap *bitmap;
++    DropState drop_state;
++    bool auto_remove;
++} BDRVAllocTrackState;
++
++static QemuOptsList runtime_opts = {
++    .name = "alloc-track",
++    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
++    .desc = {
++        {
++            .name = TRACK_OPT_AUTO_REMOVE,
++            .type = QEMU_OPT_BOOL,
+            .help = "automatically replace this node with 'file' when 'backing' "
++                    "is detached",
++        },
++        { /* end of list */ }
++    },
++};
++
++static void track_refresh_limits(BlockDriverState *bs, Error **errp)
++{
++    BlockDriverInfo bdi;
++
++    if (!bs->file) {
++        return;
++    }
++
++    /* always use alignment from underlying write device so RMW cycle for
++     * bdrv_pwritev reads data from our backing via track_co_preadv (no partial
++     * cluster allocation in 'file') */
++    bdrv_get_info(bs->file->bs, &bdi);
++    bs->bl.request_alignment = MAX(bs->file->bs->bl.request_alignment,
++                                   MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
++}
++
++static int track_open(BlockDriverState *bs, QDict *options, int flags,
++                      Error **errp)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++    QemuOpts *opts;
++    Error *local_err = NULL;
++    int ret = 0;
++
++    opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
++    qemu_opts_absorb_qdict(opts, options, &local_err);
++    if (local_err) {
++        error_propagate(errp, local_err);
++        ret = -EINVAL;
++        goto fail;
++    }
++
++    s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
++
++    /* open the target (write) node, backing will be attached by block layer */
++    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
++                               BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
++                               &local_err);
++    if (local_err) {
++        ret = -EINVAL;
++        error_propagate(errp, local_err);
++        goto fail;
++    }
++
++    track_refresh_limits(bs, errp);
++    uint64_t gran = bs->bl.request_alignment;
++    s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, gran, NULL, &local_err);
++    if (local_err) {
++        ret = -EIO;
++        error_propagate(errp, local_err);
++        goto fail;
++    }
++
++    s->drop_state = DropNone;
++
++fail:
++    if (ret < 0) {
++        bdrv_unref_child(bs, bs->file);
++        if (s->bitmap) {
++            bdrv_release_dirty_bitmap(s->bitmap);
++        }
++    }
++    qemu_opts_del(opts);
++    return ret;
++}
++
++static void track_close(BlockDriverState *bs)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++    if (s->bitmap) {
++        bdrv_release_dirty_bitmap(s->bitmap);
++    }
++}
++
++static int64_t track_getlength(BlockDriverState *bs)
++{
++    return bdrv_getlength(bs->file->bs);
++}
++
++static int coroutine_fn track_co_preadv(BlockDriverState *bs,
++    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++    QEMUIOVector local_qiov;
++    int ret;
++
++    /* 'cur_offset' is relative to 'offset', 'local_offset' to image start */
++    uint64_t cur_offset, local_offset;
++    int64_t local_bytes;
++    bool alloc;
++
++    /* a read request can span multiple granularity-sized chunks, and can thus
++     * contain blocks with different allocation status - we could just iterate
++     * granularity-wise, but for better performance use bdrv_dirty_bitmap_next_X
++     * to find the next flip and consider everything up to that in one go */
++    for (cur_offset = 0; cur_offset < bytes; cur_offset += local_bytes) {
++        local_offset = offset + cur_offset;
++        alloc = bdrv_dirty_bitmap_get(s->bitmap, local_offset);
++        if (alloc) {
++            local_bytes = bdrv_dirty_bitmap_next_zero(s->bitmap, local_offset,
++                                                      bytes - cur_offset);
++        } else {
++            local_bytes = bdrv_dirty_bitmap_next_dirty(s->bitmap, local_offset,
++                                                       bytes - cur_offset);
++        }
++
++        /* _bitmap_next_X return is -1 if no end found within limit, otherwise
++         * offset of next flip (to start of image) */
++        local_bytes = local_bytes < 0 ?
++            bytes - cur_offset :
++            local_bytes - local_offset;
++
++        qemu_iovec_init_slice(&local_qiov, qiov, cur_offset, local_bytes);
++
++        if (alloc) {
++            ret = bdrv_co_preadv(bs->file, local_offset, local_bytes,
++                                 &local_qiov, flags);
++        } else if (bs->backing) {
++            ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
++                                 &local_qiov, flags);
++        } else {
++            ret = qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
++        }
++
++        if (ret != 0) {
++            break;
++        }
++    }
++
++    return ret;
++}
++
++static int coroutine_fn track_co_pwritev(BlockDriverState *bs,
++    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
++{
++    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
++}
++
++static int coroutine_fn track_co_pwrite_zeroes(BlockDriverState *bs,
++    int64_t offset, int count, BdrvRequestFlags flags)
++{
++    return bdrv_pwrite_zeroes(bs->file, offset, count, flags);
++}
++
++static int coroutine_fn track_co_pdiscard(BlockDriverState *bs,
++    int64_t offset, int count)
++{
++    return bdrv_co_pdiscard(bs->file, offset, count);
++}
++
++static coroutine_fn int track_co_flush(BlockDriverState *bs)
++{
++    return bdrv_co_flush(bs->file->bs);
++}
++
++static int coroutine_fn track_co_block_status(BlockDriverState *bs,
++                                              bool want_zero,
++                                              int64_t offset,
++                                              int64_t bytes,
++                                              int64_t *pnum,
++                                              int64_t *map,
++                                              BlockDriverState **file)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++
++    bool alloc = bdrv_dirty_bitmap_get(s->bitmap, offset);
++    int64_t next_flipped;
++    if (alloc) {
++        next_flipped = bdrv_dirty_bitmap_next_zero(s->bitmap, offset, bytes);
++    } else {
++        next_flipped = bdrv_dirty_bitmap_next_dirty(s->bitmap, offset, bytes);
++    }
++
++    /* in case not the entire region has the same state, we need to set pnum to
++     * indicate for how many bytes our result is valid */
++    *pnum = next_flipped == -1 ? bytes : next_flipped - offset;
++    *map = offset;
++
++    if (alloc) {
++        *file = bs->file->bs;
++        return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
++    } else if (bs->backing) {
++        *file = bs->backing->bs;
++    }
++    return 0;
++}
++
++static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
++                             BdrvChildRole role, BlockReopenQueue *reopen_queue,
++                             uint64_t perm, uint64_t shared,
++                             uint64_t *nperm, uint64_t *nshared)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++
++    *nshared = BLK_PERM_ALL;
++
++    /* in case we're currently dropping ourselves, claim to not use any
++     * permissions at all - which is fine, since from this point on we will
++     * never issue a read or write anymore */
++    if (s->drop_state == DropInProgress) {
++        *nperm = 0;
++        return;
++    }
++
++    if (role & BDRV_CHILD_DATA) {
++        *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
++    } else {
++        /* 'backing' is also a child of our BDS, but we don't expect it to be
++         * writeable, so we only forward 'consistent read' */
++        *nperm = perm & BLK_PERM_CONSISTENT_READ;
++    }
++}
++
++static void track_drop(void *opaque)
++{
++    BlockDriverState *bs = (BlockDriverState*)opaque;
++    BlockDriverState *file = bs->file->bs;
++    BDRVAllocTrackState *s = bs->opaque;
++
++    assert(file);
++
++    /* we rely on the fact that we're not used anywhere else, so let's wait
++     * until we're only used once - in the drive connected to the guest (and one
++     * ref is held by bdrv_ref in track_change_backing_file) */
++    if (bs->refcnt > 2) {
++        aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, opaque);
++        return;
++    }
++
++    /* we do not need a bdrv_drained_end, since this is applied only to the node
++     * which gets removed by bdrv_replace_node */
++    bdrv_drained_begin(bs);
++
++    /* now that we're drained, we can safely set 'DropInProgress' */
++    s->drop_state = DropInProgress;
++    bdrv_child_refresh_perms(bs, bs->file, &error_abort);
++
++    bdrv_replace_node(bs, file, &error_abort);
++    bdrv_unref(bs);
++}
++
++static int track_change_backing_file(BlockDriverState *bs,
++                                     const char *backing_file,
++                                     const char *backing_fmt)
++{
++    BDRVAllocTrackState *s = bs->opaque;
++    if (s->auto_remove && s->drop_state == DropNone &&
++        backing_file == NULL && backing_fmt == NULL)
++    {
++        /* backing file has been disconnected, there's no longer any use for
++         * this node, so let's remove ourselves from the block graph - we need
++         * to schedule this for later however, since when this function is
++         * called, the blockjob modifying us is probably not done yet and has a
++         * blocker on 'bs' */
++        s->drop_state = DropRequested;
++        bdrv_ref(bs);
++        aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, (void*)bs);
++    }
++
++    return 0;
++}
++
++static BlockDriver bdrv_alloc_track = {
++    .format_name                      = "alloc-track",
++    .instance_size                    = sizeof(BDRVAllocTrackState),
++
++    .bdrv_file_open                   = track_open,
++    .bdrv_close                       = track_close,
++    .bdrv_getlength                   = track_getlength,
++    .bdrv_child_perm                  = track_child_perm,
++    .bdrv_refresh_limits              = track_refresh_limits,
++
++    .bdrv_co_pwrite_zeroes            = track_co_pwrite_zeroes,
++    .bdrv_co_pwritev                  = track_co_pwritev,
++    .bdrv_co_preadv                   = track_co_preadv,
++    .bdrv_co_pdiscard                 = track_co_pdiscard,
++
++    .bdrv_co_flush                    = track_co_flush,
++    .bdrv_co_flush_to_disk            = track_co_flush,
++
++    .supports_backing                 = true,
++
++    .bdrv_co_block_status             = track_co_block_status,
++    .bdrv_change_backing_file         = track_change_backing_file,
++};
++
++static void bdrv_alloc_track_init(void)
++{
++    bdrv_register(&bdrv_alloc_track);
++}
++
++block_init(bdrv_alloc_track_init);
+diff --git a/block/meson.build b/block/meson.build
+index a070060e53..e387990764 100644
+--- a/block/meson.build
++++ b/block/meson.build
+@@ -2,6 +2,7 @@ block_ss.add(genh)
+ block_ss.add(files(
+   'accounting.c',
+   'aio_task.c',
++  'alloc-track.c',
+   'amend.c',
+   'backup.c',
+   'backup-dump.c',
diff --git a/debian/patches/series b/debian/patches/series
index f29a15d..da4f9c7 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -56,3 +56,7 @@ pve/0040-migration-block-dirty-bitmap-migrate-other-bitmaps-e.patch
 pve/0041-PVE-fall-back-to-open-iscsi-initiatorname.patch
 pve/0042-PVE-Use-coroutine-QMP-for-backup-cancel_backup.patch
 pve/0043-PBS-add-master-key-support.patch
+pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
+pve/0045-PVE-block-stream-increase-chunk-size.patch
+pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
+pve/0047-block-add-alloc-track-driver.patch
-- 
2.20.1





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] applied: [PATCH pve-qemu v3] add alloc-track block driver patch
  2021-03-15 15:41     ` [pve-devel] [PATCH pve-qemu v3] " Stefan Reiter
@ 2021-03-16 19:57       ` Thomas Lamprecht
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Lamprecht @ 2021-03-16 19:57 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter, pbs-devel

On 15.03.21 16:41, Stefan Reiter wrote:
> See added patches for more info, overview:
> 0044: slightly increase PBS performance by reducing allocations
> 0045: slightly increase block-stream performance for Ceph
> 0046: don't crash with block-stream on RBD
> 0047: add alloc-track driver for live restore
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> v3:
> * improve track_drop as discussed by @Wolfgang, both on and off list
>   (DropState, additional bdrv_ref/unref)
> 
> track_drop is certainly not beautiful, but it works reliably in our use-case...
> 
>  ...st-path-reads-without-allocation-if-.patch |  52 +++
>  ...PVE-block-stream-increase-chunk-size.patch |  23 ++
>  ...accept-NULL-qiov-in-bdrv_pad_request.patch |  42 ++
>  .../0047-block-add-alloc-track-driver.patch   | 391 ++++++++++++++++++
>  debian/patches/series                         |   4 +
>  5 files changed, 512 insertions(+)
>  create mode 100644 debian/patches/pve/0044-PVE-block-pbs-fast-path-reads-without-allocation-if-.patch
>  create mode 100644 debian/patches/pve/0045-PVE-block-stream-increase-chunk-size.patch
>  create mode 100644 debian/patches/pve/0046-block-io-accept-NULL-qiov-in-bdrv_pad_request.patch
>  create mode 100644 debian/patches/pve/0047-block-add-alloc-track-driver.patch
> 
>

applied, thanks!

I'd feel more comfortable with the bdrv_pad_request change being upstreamed rather
soonish... Maybe one could also build a reasonable case for the alloc-track to be
accepted upstream someday..




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [pve-devel] [PATCH v2 proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader Stefan Reiter
@ 2021-03-16 20:17   ` Thomas Lamprecht
  2021-03-17 13:37     ` Stefan Reiter
  0 siblings, 1 reply; 25+ messages in thread
From: Thomas Lamprecht @ 2021-03-16 20:17 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter, pbs-devel

On 03.03.21 10:56, Stefan Reiter wrote:
> Values chosen by fair dice roll, seems to be a good sweet spot on my
> machine where any less causes performance degradation but any more
> doesn't really make it go any faster.
> 
> Keep in mind that those values are per drive in an actual restore.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> Depends on new proxmox-backup.
> 
> v2:
> * unchanged
> 
>  src/restore.rs | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/src/restore.rs b/src/restore.rs
> index 0790d7f..a1acce4 100644
> --- a/src/restore.rs
> +++ b/src/restore.rs
> @@ -218,15 +218,16 @@ impl RestoreTask {
>  
>          let index = client.download_fixed_index(&manifest, &archive_name).await?;
>          let archive_size = index.index_bytes();
> -        let most_used = index.find_most_used_chunks(8);
> +        let most_used = index.find_most_used_chunks(16); // 64 MB most used cache



>  
>          let file_info = manifest.lookup_file_info(&archive_name)?;
>  
> -        let chunk_reader = RemoteChunkReader::new(
> +        let chunk_reader = RemoteChunkReader::new_lru_cached(
>              Arc::clone(&client),
>              self.crypt_config.clone(),
>              file_info.chunk_crypt_mode(),
>              most_used,
> +            64, // 256 MB LRU cache

how does this work with low(er) memory situations? Lots of people do not
over-dimension their memory that much, and especially the need for mass-recovery
could well coincide with reduced resource availability (a node failed, now I need
to restore X backups on my <test/old/other-already-in-use> node, so multiple
restore jobs may run in parallel, and they all may even have multiple disks),
so tens of GiB of memory just for the cache are not that unlikely.

What is the behavior, hard failure if memory is not available? Also, some archives
may be smaller than 256 MiB (EFI disk??), so there it'd be weird to have a 256 MiB
cache and get 64 MiB of most-used chunks if that's all/more than it would actually
need to be...

There may be the reversed situation too, beefy fast node with lots of memory
and restore is used as recovery or migration but network bw/latency to PBS is not
that good - so bigger cache could be wanted.

Maybe we could get the available memory and use that as a hint, I mean as memory
usage can be highly dynamic it will never be perfect, but better than just ignoring
it...

>          );
>  
>          let reader = AsyncIndexReader::new(index, chunk_reader);
> 





^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [pve-devel] [PATCH v2 proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader
  2021-03-16 20:17   ` Thomas Lamprecht
@ 2021-03-17 13:37     ` Stefan Reiter
  2021-03-17 13:59       ` Thomas Lamprecht
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Reiter @ 2021-03-17 13:37 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox VE development discussion, pbs-devel

On 16/03/2021 21:17, Thomas Lamprecht wrote:
> On 03.03.21 10:56, Stefan Reiter wrote:
>> Values chosen by fair dice roll, seems to be a good sweet spot on my
>> machine where any less causes performance degradation but any more
>> doesn't really make it go any faster.
>>
>> Keep in mind that those values are per drive in an actual restore.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> ---
>>
>> Depends on new proxmox-backup.
>>
>> v2:
>> * unchanged
>>
>>   src/restore.rs | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/restore.rs b/src/restore.rs
>> index 0790d7f..a1acce4 100644
>> --- a/src/restore.rs
>> +++ b/src/restore.rs
>> @@ -218,15 +218,16 @@ impl RestoreTask {
>>   
>>           let index = client.download_fixed_index(&manifest, &archive_name).await?;
>>           let archive_size = index.index_bytes();
>> -        let most_used = index.find_most_used_chunks(8);
>> +        let most_used = index.find_most_used_chunks(16); // 64 MB most used cache
> 
> 
> 
>>   
>>           let file_info = manifest.lookup_file_info(&archive_name)?;
>>   
>> -        let chunk_reader = RemoteChunkReader::new(
>> +        let chunk_reader = RemoteChunkReader::new_lru_cached(
>>               Arc::clone(&client),
>>               self.crypt_config.clone(),
>>               file_info.chunk_crypt_mode(),
>>               most_used,
>> +            64, // 256 MB LRU cache
> 
> how does this work with low(er) memory situations? Lots of people do not over
> dimension their memory that much, and especially the need for mass-recovery could
> seem to correlate with reduced resource availability (a node failed, now I need
> to restore X backups on my <test/old/other-already-in-use> node, so multiple
> restore jobs may run in parallel, and they all may have even multiple disks,
> so tens of GiB of memory just for the cache are not that unlikely.

This is a separate function from the regular restore, so it currently 
only affects live-restore. This is not an operation you would usually do 
under memory constraints anyway, and regular restore is unaffected if 
you just want the data.

Upcoming single-file restore too though, I suppose, where it might make 
more sense...

> 
> How is the behavior, hard failure if memory is not available? Also, some archives
> may be smaller than 256 MiB (EFI disk??) so there it'd be weird to have 256 cache
> and get 64 of most used chunks if that's all/more than it would actually need to
> be..

Yes, if memory is unavailable it is a hard error. Memory should not be 
pre-allocated however, so restoring this way will only ever use as much 
memory as the disk size (not accounting for overhead).

> 
> There may be the reversed situation too, beefy fast node with lots of memory
> and restore is used as recovery or migration but network bw/latency to PBS is not
> that good - so bigger cache could be wanted.

The reason I chose the numbers I did was that I couldn't see any real 
performance benefits by going higher, though I didn't specifically test 
with slow networking.

I don't believe more cache would improve the situation there though, this 
is mostly to keep the random access from the guest and the linear access 
from the block-stream operation from interfering with each other, and to 
allow multiple smaller guest reads within the same chunk to be served 
quickly.

> 
> Maybe we could get the available memory and use that as hint, I mean as memory
> usage can be highly dynamic it will never be perfect, but better than just ignoring
> it..

If anything, I'd make it user-configurable - I don't think a heuristic 
would be a good choice here.

This way we could also set it smaller for single-file restore for 
example - on the other hand, that adds another parameter to the already 
somewhat cluttered QEMU<->Rust interface.
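
Just to sketch what I mean (hypothetical - neither the override nor its name
exist anywhere yet), the Rust side could take the value from an optional
override instead of the hard-coded 64:

    // hypothetical override, falling back to the current default
    let cache_chunks: usize = std::env::var("PBS_RESTORE_CACHE_CHUNKS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(64); // 256 MB LRU cache with 4 MB chunks

    let chunk_reader = RemoteChunkReader::new_lru_cached(
        Arc::clone(&client),
        self.crypt_config.clone(),
        file_info.chunk_crypt_mode(),
        most_used,
        cache_chunks,
    );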

> 
>>           );
>>   
>>           let reader = AsyncIndexReader::new(index, chunk_reader);
>>
> 




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [pve-devel] [PATCH v2 proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader
  2021-03-17 13:37     ` Stefan Reiter
@ 2021-03-17 13:59       ` Thomas Lamprecht
  2021-03-17 16:03         ` [pve-devel] [pbs-devel] " Dietmar Maurer
  0 siblings, 1 reply; 25+ messages in thread
From: Thomas Lamprecht @ 2021-03-17 13:59 UTC (permalink / raw)
  To: Stefan Reiter, Proxmox VE development discussion, pbs-devel

On 17.03.21 14:37, Stefan Reiter wrote:
> On 16/03/2021 21:17, Thomas Lamprecht wrote:
>> On 03.03.21 10:56, Stefan Reiter wrote:
>>> Values chosen by fair dice roll, seems to be a good sweet spot on my
>>> machine where any less causes performance degradation but any more
>>> doesn't really make it go any faster.
>>>
>>> Keep in mind that those values are per drive in an actual restore.
>>>
>>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>>> ---
>>>
>>> Depends on new proxmox-backup.
>>>
>>> v2:
>>> * unchanged
>>>
>>>   src/restore.rs | 5 +++--
>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/restore.rs b/src/restore.rs
>>> index 0790d7f..a1acce4 100644
>>> --- a/src/restore.rs
>>> +++ b/src/restore.rs
>>> @@ -218,15 +218,16 @@ impl RestoreTask {
>>>             let index = client.download_fixed_index(&manifest, &archive_name).await?;
>>>           let archive_size = index.index_bytes();
>>> -        let most_used = index.find_most_used_chunks(8);
>>> +        let most_used = index.find_most_used_chunks(16); // 64 MB most used cache
>>
>>
>>
>>>             let file_info = manifest.lookup_file_info(&archive_name)?;
>>> -        let chunk_reader = RemoteChunkReader::new(
>>> +        let chunk_reader = RemoteChunkReader::new_lru_cached(
>>>               Arc::clone(&client),
>>>               self.crypt_config.clone(),
>>>               file_info.chunk_crypt_mode(),
>>>               most_used,
>>> +            64, // 256 MB LRU cache
>>
>> how does this work with low(er) memory situations? Lots of people do not over
>> dimension their memory that much, and especially the need for mass-recovery could
>> seem to correlate with reduced resource availability (a node failed, now I need
>> to restore X backups on my <test/old/other-already-in-use> node, so multiple
>> restore jobs may run in parallel, and they all may have even multiple disks,
>> so tens of GiB of memory just for the cache are not that unlikely.
> 
> This is a separate function from the regular restore, so it currently only affects live-restore. This is not an operation you would usually do under memory constraints anyway, and regular restore is unaffected if you just want the data.

And how exactly do you figure/argue that users won't use it if it is easily
available? Users *will* use this in a memory-constrained environment, as it gets
their guest up again faster - cue mass restore on a node with few resources left.
 
> Upcoming single-file restore too though, I suppose, where it might make more sense...
> 
>>
>> How is the behavior, hard failure if memory is not available? Also, some archives
>> may be smaller than 256 MiB (EFI disk??) so there it'd be weird to have 256 cache
>> and get 64 of most used chunks if that's all/more than it would actually need to
>> be..
> 
> Yes, if memory is unavailable it is a hard error. Memory should not be pre-allocated however, so restoring this way will only ever use as much memory as the disk size (not accounting for overhead).

So basically RSS is increased by chunk-sized blocks. But an alloc error is not
a hard error for the total operation here - couldn't we catch that and continue
with the LRU size we actually have allocated?

> 
>>
>> There may be the reversed situation too, beefy fast node with lots of memory
>> and restore is used as recovery or migration but network bw/latency to PBS is not
>> that good - so bigger cache could be wanted.
> 
> The reason I chose the numbers I did was that I couldn't see any real performance benefits by going higher, though I didn't specifically test with slow networking.
> 
> I don't believe more cache would improve the situation there though, this is mostly to avoid random access from the guest and the linear access from the block-stream operation to interfere with each other, and allow multiple smaller guest reads within the same chunk to be served quickly.

What are the workloads you tested to be so sure about this?

From the above statement I'd think that it would help for any workload with a
working set bigger than 256 MiB? So basically any production DB load (albeit that
should be handled by the DB's memory caching, so maybe not the best example).

I'm just thinking that exposing this as a knob could help - it doesn't have to
be prominently placed, but it would be nice to have.

> 
>>
>> Maybe we could get the available memory and use that as hint, I mean as memory
>> usage can be highly dynamic it will never be perfect, but better than just ignoring
>> it..
> 
> If anything, I'd make it user-configurable - I don't think a heuristic would be a good choice here.

Yeah, a heuristic is not a good option as we cannot know how the system memory
situation will be in the future.

> 
> This way we could also set it smaller for single-file restore for example - on the other hand, that adds another parameter to the already somewhat cluttered QEMU<->Rust interface.

cue versioned structs incoming ;)

> 
>>
>>>           );
>>>             let reader = AsyncIndexReader::new(index, chunk_reader);
>>>
>>





^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [pve-devel] [pbs-devel] [PATCH v2 proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader
  2021-03-17 13:59       ` Thomas Lamprecht
@ 2021-03-17 16:03         ` Dietmar Maurer
  0 siblings, 0 replies; 25+ messages in thread
From: Dietmar Maurer @ 2021-03-17 16:03 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Thomas Lamprecht,
	Stefan Reiter, Proxmox VE development discussion

What about using a memory-mapped file as cache? That way, you do not
need to care about available memory.
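
Something along those lines, as a rough sketch (assuming the memmap2 crate and a
dedicated cache file per restore - none of this exists in the series, names are
made up):

    use memmap2::MmapMut;
    use std::fs::OpenOptions;

    /// File-backed cache area: pages only occupy RAM while they are hot, and
    /// the kernel can write them back and drop them under memory pressure.
    fn open_chunk_cache(path: &str, chunks: usize, chunk_size: usize)
        -> std::io::Result<MmapMut>
    {
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .open(path)?;
        file.set_len((chunks * chunk_size) as u64)?;
        // cached chunk i would then live at offset i * chunk_size in the mapping
        unsafe { MmapMut::map_mut(&file) }
    }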

> >> Maybe we could get the available memory and use that as hint, I mean as memory
> >> usage can be highly dynamic it will never be perfect, but better than just ignoring
> >> it..
> > 
> > If anything, I'd make it user-configurable - I don't think a heuristic would be a good choice here.
> 
> Yeah, heuristic is not an good option as we cannot know how the system memory
> situation will be in the future.




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (10 preceding siblings ...)
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 manager 11/11] ui: restore: add live-restore checkbox Stefan Reiter
@ 2021-03-22 11:08 ` Dominic Jäger
  2021-04-06 19:09 ` [pve-devel] partially-applied: " Thomas Lamprecht
  2021-04-15 18:35 ` [pve-devel] " Thomas Lamprecht
  13 siblings, 0 replies; 25+ messages in thread
From: Dominic Jäger @ 2021-03-22 11:08 UTC (permalink / raw)
  To: Proxmox VE development discussion; +Cc: pbs-devel

On Wed, Mar 03, 2021 at 10:56:01AM +0100, Stefan Reiter wrote:
> 
> "live-restore" allows starting a VM immediately from a backup snapshot, no
> waiting for a long restore process.  

Live restore worked multiple times for me from ext4, xfs & zfs datastores to zfs and lvm with backups of
- a Debian VM and
- a Windows VM

--Tested-by: Dominic Jäger <d.jaeger@proxmox.com>

Possible nice-to-haves for the future:
- Notice if a storage gets full. In the good case only the restore process
  freezes. When choosing local, functions like removing VMs start to fail.
- Editing the VM configuration before the restore. Otherwise one wrong setting
  like kvm:1, or a CD-ROM mounted from some unavailable storage, can prevent the
  whole "live" part.




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] partially-applied: [PATCH v2 00/11] live-restore for PBS snapshots
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (11 preceding siblings ...)
  2021-03-22 11:08 ` [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Dominic Jäger
@ 2021-04-06 19:09 ` Thomas Lamprecht
  2021-04-15 18:35 ` [pve-devel] " Thomas Lamprecht
  13 siblings, 0 replies; 25+ messages in thread
From: Thomas Lamprecht @ 2021-04-06 19:09 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter, pbs-devel

On 03.03.21 10:56, Stefan Reiter wrote:
> qemu-server: Stefan Reiter (5):
>   make qemu_drive_mirror_monitor more generic
>   cfg2cmd: allow PBS snapshots as backing files for drives
>   enable live-restore for PBS
>   extract register_qmeventd_handle to QemuServer.pm
>   live-restore: register qmeventd handle

applied the qemu-server ones with some follow-ups for smaller cleanups and
improvements, thanks!




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [pve-devel] applied: [PATCH v2 manager 11/11] ui: restore: add live-restore checkbox
  2021-03-03  9:56 ` [pve-devel] [PATCH v2 manager 11/11] ui: restore: add live-restore checkbox Stefan Reiter
@ 2021-04-15 18:34   ` Thomas Lamprecht
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Lamprecht @ 2021-04-15 18:34 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter, pbs-devel

On 03.03.21 10:56, Stefan Reiter wrote:
> Add 'isPBS' parameter for Restore window so we can detect when to show
> the 'live-restore' checkbox.
> 
> Includes a warning about this feature being experimental for now.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> v2:
> * unchanged
> 
>  www/manager6/grid/BackupView.js    |  6 ++++-
>  www/manager6/storage/BackupView.js |  5 +++-
>  www/manager6/window/Restore.js     | 38 +++++++++++++++++++++++++++++-
>  3 files changed, 46 insertions(+), 3 deletions(-)
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots
  2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (12 preceding siblings ...)
  2021-04-06 19:09 ` [pve-devel] partially-applied: " Thomas Lamprecht
@ 2021-04-15 18:35 ` Thomas Lamprecht
  13 siblings, 0 replies; 25+ messages in thread
From: Thomas Lamprecht @ 2021-04-15 18:35 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter, pbs-devel

On 03.03.21 10:56, Stefan Reiter wrote:
> "live-restore" allows starting a VM immediately from a backup snapshot, no
> waiting for a long restore process. This is made possible with QEMU backing
> images, i.e. data is read from the backup which is attached to the VM as a
> drive, but new data is written to the destination, while a background process
> ('block-stream') copies over data in a linear fashion as well.

as mentioned off-list, it works well in general, but live-migration afterwards
seems to be broken, as resume on the target fails:

kvm: ../target/i386/kvm.c:2538: kvm_put_msr_feature_control: Assertion `ret == 1' failed.

Stack trace of thread 1240139:
#0  0x00007f96834617bb __GI_raise (libc.so.6)
#1  0x00007f968344c535 __GI_abort (libc.so.6)
#2  0x00007f968344c40f __assert_fail_base (libc.so.6)
#3  0x00007f968345a102 __GI___assert_fail (libc.so.6)
#4  0x000055e795c18b82 kvm_put_msr_feature_control (qemu-system-x86_64)
#5  0x000055e795c9be7e do_kvm_cpu_synchronize_post_init (qemu-system-x86_64)
#6  0x000055e795acc5a6 process_queued_cpu_work (qemu-system-x86_64)
#7  0x000055e795d355b8 kvm_vcpu_thread_fn (qemu-system-x86_64)
#8  0x000055e795ece91a qemu_thread_start (qemu-system-x86_64)
#9  0x00007f96835f4fa3 start_thread (libpthread.so.0)
#10 0x00007f96835234cf __clone (libc.so.6)




^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-04-15 18:36 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-03  9:56 [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Stefan Reiter
2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 01/11] clean up pve/ patches by merging Stefan Reiter
2021-03-03 16:32   ` [pve-devel] applied: " Thomas Lamprecht
2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 02/11] move bitmap-mirror patches to seperate folder Stefan Reiter
2021-03-03 16:32   ` [pve-devel] applied: " Thomas Lamprecht
2021-03-03  9:56 ` [pve-devel] [PATCH v2 pve-qemu 03/11] add alloc-track block driver patch Stefan Reiter
2021-03-15 14:14   ` Wolfgang Bumiller
2021-03-15 15:41     ` [pve-devel] [PATCH pve-qemu v3] " Stefan Reiter
2021-03-16 19:57       ` [pve-devel] applied: " Thomas Lamprecht
2021-03-03  9:56 ` [pve-devel] [PATCH v2 proxmox-backup 04/11] RemoteChunkReader: add LRU cached variant Stefan Reiter
2021-03-03  9:56 ` [pve-devel] [PATCH v2 proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader Stefan Reiter
2021-03-16 20:17   ` Thomas Lamprecht
2021-03-17 13:37     ` Stefan Reiter
2021-03-17 13:59       ` Thomas Lamprecht
2021-03-17 16:03         ` [pve-devel] [pbs-devel] " Dietmar Maurer
2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 06/11] make qemu_drive_mirror_monitor more generic Stefan Reiter
2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 07/11] cfg2cmd: allow PBS snapshots as backing files for drives Stefan Reiter
2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 08/11] enable live-restore for PBS Stefan Reiter
2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 09/11] extract register_qmeventd_handle to QemuServer.pm Stefan Reiter
2021-03-03  9:56 ` [pve-devel] [PATCH v2 qemu-server 10/11] live-restore: register qmeventd handle Stefan Reiter
2021-03-03  9:56 ` [pve-devel] [PATCH v2 manager 11/11] ui: restore: add live-restore checkbox Stefan Reiter
2021-04-15 18:34   ` [pve-devel] applied: " Thomas Lamprecht
2021-03-22 11:08 ` [pve-devel] [PATCH v2 00/11] live-restore for PBS snapshots Dominic Jäger
2021-04-06 19:09 ` [pve-devel] partially-applied: " Thomas Lamprecht
2021-04-15 18:35 ` [pve-devel] " Thomas Lamprecht
