public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH 00/11] live-restore for PBS snapshots
@ 2021-01-11 11:13 Stefan Reiter
  2021-01-11 11:13 ` [pve-devel] [PATCH qemu 01/11] PVE: explicitly add libuuid as linking dependency Stefan Reiter
                   ` (11 more replies)
  0 siblings, 12 replies; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:13 UTC (permalink / raw)
  To: pve-devel, pbs-devel

"live-restore" allows starting a VM immediately from a backup snapshot, no
waiting for a long restore process. This is made possible with QEMU backing
images, i.e. data is read from the backup which is attached to the VM as a
drive, but new data is written to the destination, while a background process
('block-stream') copies over data in a linear fashion as well.

QEMU backing images are normally only supported for qcow2 images, but since the
destination always starts out empty, we can use a dirty bitmap to achieve the
same effect - this is implemented as the 'alloc-track' driver in the 'qemu' part
of the series.
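
To make this concrete, below is a heavily simplified, self-contained C
sketch of the dispatch idea (illustrative only; the real implementation is
the 'alloc-track' driver in patch 03): one bit per cluster records whether
the destination has been written, reads are served from the destination
where the bit is set and from the backup otherwise, and writes always go
to the destination and mark the bit.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define CLUSTER_SIZE 4096

    typedef struct {
        uint8_t *bitmap;        /* one bit per cluster, set once written */
        uint8_t *file;          /* restore target, starts out empty */
        const uint8_t *backing; /* read-only backup snapshot */
    } Track;

    static bool allocated(const Track *t, size_t cluster)
    {
        return t->bitmap[cluster / 8] & (1 << (cluster % 8));
    }

    /* reads: from 'file' once written, from 'backing' otherwise */
    static void track_read(const Track *t, size_t cluster, uint8_t *buf)
    {
        const uint8_t *src = allocated(t, cluster) ? t->file : t->backing;
        memcpy(buf, src + cluster * CLUSTER_SIZE, CLUSTER_SIZE);
    }

    /* writes: always to 'file', marking the cluster allocated */
    static void track_write(Track *t, size_t cluster, const uint8_t *buf)
    {
        memcpy(t->file + cluster * CLUSTER_SIZE, buf, CLUSTER_SIZE);
        t->bitmap[cluster / 8] |= 1 << (cluster % 8);
    }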

The Rust part of the equation is adjusted to provide (quite a lot) more
caching, as mixing random reads/writes from the guest with the linear reads
from the background process (both of which may use read sizes smaller or
bigger than a single chunk) would trash performance without large buffers.
For example, without a cache, a series of small sequential guest reads
within one 4 MiB chunk would re-download that same chunk for every single
request.

I've marked the feature as 'experimental' in the GUI for now, as I'm sure
there are a lot of edge cases I haven't managed to test, and there's also
the possibility of data loss, since anything the VM writes during the
restore is lost if the restore fails.


qemu: Stefan Reiter (3):
  PVE: explicitly add libuuid as linking dependency
  PVE: block/pbs: fast-path reads without allocation if possible
  block: add alloc-track driver

 Makefile.objs       |   2 +
 block/Makefile.objs |   1 +
 block/alloc-track.c | 319 ++++++++++++++++++++++++++++++++++++++++++++
 block/pbs.c         |  17 ++-
 4 files changed, 336 insertions(+), 3 deletions(-)
 create mode 100644 block/alloc-track.c

proxmox-backup: Stefan Reiter (1):
  RemoteChunkReader: add LRU cached variant

 src/bin/proxmox_backup_client/mount.rs |  4 +-
 src/client/remote_chunk_reader.rs      | 77 ++++++++++++++++++++------
 2 files changed, 62 insertions(+), 19 deletions(-)

proxmox-backup-qemu: Stefan Reiter (1):
  access: use bigger cache and LRU chunk reader

 src/restore.rs | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

qemu-server: Stefan Reiter (5):
  make qemu_drive_mirror_monitor more generic
  cfg2cmd: allow PBS snapshots as backing files for drives
  enable live-restore for PBS
  extract register_qmeventd_handle to QemuServer.pm
  live-restore: register qmeventd handle

 PVE/API2/Qemu.pm         |  15 ++-
 PVE/QemuServer.pm        | 263 ++++++++++++++++++++++++++++++++-------
 PVE/VZDump/QemuServer.pm |  32 +----
 3 files changed, 233 insertions(+), 77 deletions(-)

manager: Stefan Reiter (1):
  ui: restore: add live-restore checkbox

 www/manager6/grid/BackupView.js    |  8 +++++--
 www/manager6/storage/BackupView.js |  7 ++++--
 www/manager6/window/Restore.js     | 38 +++++++++++++++++++++++++++++-
 3 files changed, 48 insertions(+), 5 deletions(-)

-- 
2.20.1




^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH qemu 01/11] PVE: explicitly add libuuid as linking dependency
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
@ 2021-01-11 11:13 ` Stefan Reiter
  2021-01-12 12:04   ` [pve-devel] [pbs-devel] " Thomas Lamprecht
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu 02/11] PVE: block/pbs: fast-path reads without allocation if possible Stefan Reiter
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:13 UTC (permalink / raw)
  To: pve-devel, pbs-devel

This previously only worked because linking against glusterfs pulls in
libuuid as well. Make the dependency explicit, which also allows debug
builds without glusterfs.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Unrelated to the rest of the series, it just annoyed me.
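
For context, the undefined references come from libuuid usage along these
lines (an illustrative sketch, not a quote from the affected files):

    #include <uuid/uuid.h>  /* needs -luuid at link time */

    static void make_uuid_string(char out[37])
    {
        uuid_t uuid;
        uuid_generate(uuid);           /* symbol provided by libuuid */
        uuid_unparse_upper(uuid, out);
    }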

 Makefile.objs | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Makefile.objs b/Makefile.objs
index c7ba4e11e7..9c8d17df70 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -54,6 +54,8 @@ common-obj-y += net/
 common-obj-y += qdev-monitor.o
 common-obj-$(CONFIG_WIN32) += os-win32.o
 common-obj-$(CONFIG_POSIX) += os-posix.o
+pve-backup.o-libs := -luuid
+vma-writer.o-libs := -luuid
 os-posix.o-libs := -lsystemd
 
 common-obj-$(CONFIG_LINUX) += fsdev/
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH qemu 02/11] PVE: block/pbs: fast-path reads without allocation if possible
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
  2021-01-11 11:13 ` [pve-devel] [PATCH qemu 01/11] PVE: explicitly add libuuid as linking dependency Stefan Reiter
@ 2021-01-11 11:14 ` Stefan Reiter
  2021-01-12  9:29   ` Wolfgang Bumiller
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu 03/11] block: add alloc-track driver Stefan Reiter
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:14 UTC (permalink / raw)
  To: pve-devel, pbs-devel

...and switch over to g_malloc/g_free while at it to align with other
QEMU code.

Tracing shows the fast path is taken almost all the time, though not quite
100%, so the slow path is still necessary.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

For that slight, unnoticeable performance boost :)
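
A self-contained sketch of the condition under which the fast path applies
(the actual check against QEMU's QEMUIOVector is in the hunk below):

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/uio.h>

    /* fast path: the request carries exactly one buffer that is large
     * enough, so data can be read into it directly and no bounce buffer
     * is needed */
    static bool can_read_inline(const struct iovec *iov, int niov, size_t bytes)
    {
        return niov == 1 && iov[0].iov_len >= bytes;
    }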

 block/pbs.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/block/pbs.c b/block/pbs.c
index 1481a2bfd1..fbf0d8d845 100644
--- a/block/pbs.c
+++ b/block/pbs.c
@@ -200,7 +200,16 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
     BDRVPBSState *s = bs->opaque;
     int ret;
     char *pbs_error = NULL;
-    uint8_t *buf = malloc(bytes);
+    uint8_t *buf;
+    bool inline_buf = true;
+
+    /* for single-buffer IO vectors we can fast-path the write directly to it */
+    if (qiov->niov == 1 && qiov->iov->iov_len >= bytes) {
+        buf = qiov->iov->iov_base;
+    } else {
+        inline_buf = false;
+        buf = g_malloc(bytes);
+    }
 
     ReadCallbackData rcb = {
         .co = qemu_coroutine_self(),
@@ -218,8 +227,10 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
         return -EIO;
     }
 
-    qemu_iovec_from_buf(qiov, 0, buf, bytes);
-    free(buf);
+    if (!inline_buf) {
+        qemu_iovec_from_buf(qiov, 0, buf, bytes);
+        g_free(buf);
+    }
 
     return ret;
 }
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH qemu 03/11] block: add alloc-track driver
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
  2021-01-11 11:13 ` [pve-devel] [PATCH qemu 01/11] PVE: explicitly add libuuid as linking dependency Stefan Reiter
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu 02/11] PVE: block/pbs: fast-path reads without allocation if possible Stefan Reiter
@ 2021-01-11 11:14 ` Stefan Reiter
  2021-01-12 10:54   ` Wolfgang Bumiller
  2021-01-11 11:14 ` [pve-devel] [PATCH proxmox-backup 04/11] RemoteChunkReader: add LRU cached variant Stefan Reiter
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:14 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Add a new filter node 'alloc-track', which separates reads and writes to
different children, thus allowing us to put a backing image behind any
blockdev (regardless of driver support). Since we can't detect any
pre-allocated blocks, we can only track new writes; hence the write
target ('file') for this node must always be empty.

The intended use case is live restore: attach a backup image as a block
device to a VM, put an alloc-track node on the restore target, and set the
backup as its backing image. With this, one can use a regular
'block-stream' job to restore the image while the VM is already running in
the background. Copy-on-read helps make progress as the VM reads as well.

Previously this only worked if the target supported backing images, i.e.
only for qcow2; with alloc-track, any driver can be used for the target.

If 'auto-remove' is set, alloc-track will automatically detach itself
once the backing image is removed. It will be replaced by 'file'.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Potentially for upstream as well, but let's maybe test it a bit here first ;)
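
For orientation, the (simplified) block graph during a live-restore then
looks roughly like this:

    guest device
         |
    [alloc-track] ---- 'file' ----> restore target (all writes, reads of allocated blocks)
          \----------- 'backing' -> PBS snapshot blockdev (reads of unallocated blocks)

With 'auto-remove' set, alloc-track replaces itself with 'file' once
'backing' is detached.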

 block/Makefile.objs |   1 +
 block/alloc-track.c | 319 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 320 insertions(+)
 create mode 100644 block/alloc-track.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 28fb3b7f7c..04f91b8cc9 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -13,6 +13,7 @@ block-obj-$(CONFIG_QED) += qed-check.o
 block-obj-y += vhdx.o vhdx-endian.o vhdx-log.o
 block-obj-y += quorum.o
 block-obj-y += zeroinit.o
+block-obj-y += alloc-track.o
 block-obj-y += blkdebug.o blkverify.o blkreplay.o
 block-obj-$(CONFIG_PARALLELS) += parallels.o
 block-obj-y += blklogwrites.o
diff --git a/block/alloc-track.c b/block/alloc-track.c
new file mode 100644
index 0000000000..f2a4080579
--- /dev/null
+++ b/block/alloc-track.c
@@ -0,0 +1,319 @@
+/*
+ * Filter to allow backing images to be applied to any node. Assumes a blank
+ * image to begin with, only new writes are tracked as allocated, thus this
+ * must never be put on a node that already contains data.
+ *
+ * Copyright (c) 2020 Proxmox Server Solutions GmbH
+ * Copyright (c) 2020 Stefan Reiter <s.reiter@proxmox.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "qemu/cutils.h"
+#include "qemu/option.h"
+#include "qemu/module.h"
+
+#define TRACK_OPT_AUTO_REMOVE "auto-remove"
+
+typedef struct {
+    BdrvDirtyBitmap *bitmap;
+    bool dropping;
+    bool auto_remove;
+} BDRVAllocTrackState;
+
+static QemuOptsList runtime_opts = {
+    .name = "alloc-track",
+    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+    .desc = {
+        {
+            .name = TRACK_OPT_AUTO_REMOVE,
+            .type = QEMU_OPT_BOOL,
+            .help = "automatically replace this node with 'file' when 'backing' "
+                    "is detached",
+        },
+        { /* end of list */ }
+    },
+};
+
+static void track_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    BlockDriverInfo bdi;
+
+    if (!bs->file) {
+        return;
+    }
+
+    /* always use alignment from underlying write device so RMW cycle for
+     * bdrv_pwritev reads data from our backing via track_co_preadv (no partial
+     * cluster allocation in 'file') */
+    bdrv_get_info(bs->file->bs, &bdi);
+    bs->bl.request_alignment = MAX(bs->file->bs->bl.request_alignment,
+                                   MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
+}
+
+static int track_open(BlockDriverState *bs, QDict *options, int flags,
+                      Error **errp)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+    QemuOpts *opts;
+    Error *local_err = NULL;
+    int ret = 0;
+
+    opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
+
+    /* open the target (write) node, backing will be attached by block layer */
+    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+                               BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
+                               &local_err);
+    if (local_err) {
+        ret = -EINVAL;
+        error_propagate(errp, local_err);
+        goto fail;
+    }
+
+    track_refresh_limits(bs, errp);
+    uint64_t gran = bs->bl.request_alignment;
+    s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, gran, NULL, &local_err);
+    if (local_err) {
+        ret = -EIO;
+        error_propagate(errp, local_err);
+        goto fail;
+    }
+
+    s->dropping = false;
+
+fail:
+    if (ret < 0) {
+        bdrv_unref_child(bs, bs->file);
+        if (s->bitmap) {
+            bdrv_release_dirty_bitmap(s->bitmap);
+        }
+    }
+    qemu_opts_del(opts);
+    return ret;
+}
+
+static void track_close(BlockDriverState *bs)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+    if (s->bitmap) {
+        bdrv_release_dirty_bitmap(s->bitmap);
+    }
+}
+
+static int64_t track_getlength(BlockDriverState *bs)
+{
+    return bdrv_getlength(bs->file->bs);
+}
+
+static int coroutine_fn track_co_preadv(BlockDriverState *bs,
+    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+    QEMUIOVector local_qiov;
+    int ret;
+
+    /* 'cur_offset' is relative to 'offset', 'local_offset' to image start */
+    uint64_t cur_offset, local_offset;
+    int64_t local_bytes;
+    bool alloc;
+
+    /* a read request can span multiple granularity-sized chunks, and can thus
+     * contain blocks with different allocation status - we could just iterate
+     * granularity-wise, but for better performance use bdrv_dirty_bitmap_next_X
+     * to find the next flip and consider everything up to that in one go */
+    for (cur_offset = 0; cur_offset < bytes; cur_offset += local_bytes) {
+        local_offset = offset + cur_offset;
+        alloc = bdrv_dirty_bitmap_get(s->bitmap, local_offset);
+        if (alloc) {
+            local_bytes = bdrv_dirty_bitmap_next_zero(s->bitmap, local_offset,
+                                                      bytes - cur_offset);
+        } else {
+            local_bytes = bdrv_dirty_bitmap_next_dirty(s->bitmap, local_offset,
+                                                       bytes - cur_offset);
+        }
+
+        /* _bitmap_next_X return is -1 if no end found within limit, otherwise
+         * offset of next flip (to start of image) */
+        local_bytes = local_bytes < 0 ?
+            bytes - cur_offset :
+            local_bytes - local_offset;
+
+        qemu_iovec_init_slice(&local_qiov, qiov, cur_offset, local_bytes);
+
+        if (alloc) {
+            ret = bdrv_co_preadv(bs->file, local_offset, local_bytes,
+                                 &local_qiov, flags);
+        } else if (bs->backing) {
+            ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
+                                 &local_qiov, flags);
+        } else {
+            ret = qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
+        }
+
+        if (ret != 0) {
+            break;
+        }
+    }
+
+    return ret;
+}
+
+static int coroutine_fn track_co_pwritev(BlockDriverState *bs,
+    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
+{
+    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn track_co_pwrite_zeroes(BlockDriverState *bs,
+    int64_t offset, int count, BdrvRequestFlags flags)
+{
+    return bdrv_pwrite_zeroes(bs->file, offset, count, flags);
+}
+
+static int coroutine_fn track_co_pdiscard(BlockDriverState *bs,
+    int64_t offset, int count)
+{
+    return bdrv_co_pdiscard(bs->file, offset, count);
+}
+
+static coroutine_fn int track_co_flush(BlockDriverState *bs)
+{
+    return bdrv_co_flush(bs->file->bs);
+}
+
+static int coroutine_fn track_co_block_status(BlockDriverState *bs,
+                                              bool want_zero,
+                                              int64_t offset,
+                                              int64_t bytes,
+                                              int64_t *pnum,
+                                              int64_t *map,
+                                              BlockDriverState **file)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+
+    bool alloc = bdrv_dirty_bitmap_get(s->bitmap, offset);
+    int64_t next_flipped;
+    if (alloc) {
+        next_flipped = bdrv_dirty_bitmap_next_zero(s->bitmap, offset, bytes);
+    } else {
+        next_flipped = bdrv_dirty_bitmap_next_dirty(s->bitmap, offset, bytes);
+    }
+
+    /* in case not the entire region has the same state, we need to set pnum to
+     * indicate for how many bytes our result is valid */
+    *pnum = next_flipped == -1 ? bytes : next_flipped - offset;
+    *map = offset;
+
+    if (alloc) {
+        *file = bs->file->bs;
+        return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
+    } else {
+        *file = bs->backing->bs;
+        return 0;
+    }
+}
+
+static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
+                             BdrvChildRole role, BlockReopenQueue *reopen_queue,
+                             uint64_t perm, uint64_t shared,
+                             uint64_t *nperm, uint64_t *nshared)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+
+    *nshared = BLK_PERM_ALL;
+
+    /* in case we're currently dropping ourselves, claim to not use any
+     * permissions at all - which is fine, since from this point on we will
+     * never issue a read or write anymore */
+    if (s->dropping) {
+        *nperm = 0;
+        return;
+    }
+
+    if (role & BDRV_CHILD_DATA) {
+        *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
+    } else {
+        /* 'backing' is also a child of our BDS, but we don't expect it to be
+         * writeable, so we only forward 'consistent read' */
+        *nperm = perm & BLK_PERM_CONSISTENT_READ;
+    }
+}
+
+static void track_drop(void *opaque)
+{
+    BlockDriverState *bs = (BlockDriverState*)opaque;
+    BDRVAllocTrackState *s = bs->opaque;
+
+    bdrv_drained_begin(bs);
+
+    s->dropping = true;
+
+    bdrv_child_refresh_perms(bs, bs->file, &error_abort);
+    bdrv_replace_node(bs, bs->file->bs, &error_abort);
+    bdrv_set_backing_hd(bs, NULL, &error_abort);
+
+    bdrv_drained_end(bs);
+}
+
+static int track_change_backing_file(BlockDriverState *bs,
+                                     const char *backing_file,
+                                     const char *backing_fmt)
+{
+    if (backing_file == NULL && backing_fmt == NULL) {
+        /* backing file has been disconnected, there's no longer any use for
+         * this node, so let's remove ourselves from the block graph - we need
+         * to schedule this for later however, since when this function is
+         * called, the blockjob modifying us is probably not done yet and has a
+         * blocker on 'bs' */
+        aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, (void*)bs);
+    }
+
+    return 0;
+}
+
+static BlockDriver bdrv_alloc_track = {
+    .format_name                      = "alloc-track",
+    .instance_size                    = sizeof(BDRVAllocTrackState),
+
+    .bdrv_file_open                   = track_open,
+    .bdrv_close                       = track_close,
+    .bdrv_getlength                   = track_getlength,
+    .bdrv_child_perm                  = track_child_perm,
+    .bdrv_refresh_limits              = track_refresh_limits,
+
+    .bdrv_co_pwrite_zeroes            = track_co_pwrite_zeroes,
+    .bdrv_co_pwritev                  = track_co_pwritev,
+    .bdrv_co_preadv                   = track_co_preadv,
+    .bdrv_co_pdiscard                 = track_co_pdiscard,
+
+    .bdrv_co_flush                    = track_co_flush,
+    .bdrv_co_flush_to_disk            = track_co_flush,
+
+    .is_filter                        = true,
+    .supports_backing                 = true,
+
+    .bdrv_co_block_status             = track_co_block_status,
+    .bdrv_change_backing_file         = track_change_backing_file,
+};
+
+static void bdrv_alloc_track_init(void)
+{
+    bdrv_register(&bdrv_alloc_track);
+}
+
+block_init(bdrv_alloc_track_init);
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH proxmox-backup 04/11] RemoteChunkReader: add LRU cached variant
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (2 preceding siblings ...)
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu 03/11] block: add alloc-track driver Stefan Reiter
@ 2021-01-11 11:14 ` Stefan Reiter
  2021-01-11 11:14 ` [pve-devel] [PATCH proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader Stefan Reiter
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:14 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Retain the old constructor for compatibility; most use cases don't need
an LRU cache anyway.

For now convert the 'mount' API to use the new variant, as the same set
of chunks might be accessed multiple times in a random pattern there.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

I looked at using the Accessor API of LruCache first, which would make this a
bit cleaner, but that's a trait and we use async, so...

 src/bin/proxmox_backup_client/mount.rs |  4 +-
 src/client/remote_chunk_reader.rs      | 77 ++++++++++++++++++++------
 2 files changed, 62 insertions(+), 19 deletions(-)

diff --git a/src/bin/proxmox_backup_client/mount.rs b/src/bin/proxmox_backup_client/mount.rs
index 6a22f78b..7785d812 100644
--- a/src/bin/proxmox_backup_client/mount.rs
+++ b/src/bin/proxmox_backup_client/mount.rs
@@ -251,7 +251,7 @@ async fn mount_do(param: Value, pipe: Option<Fd>) -> Result<Value, Error> {
     if server_archive_name.ends_with(".didx") {
         let index = client.download_dynamic_index(&manifest, &server_archive_name).await?;
         let most_used = index.find_most_used_chunks(8);
-        let chunk_reader = RemoteChunkReader::new(client.clone(), crypt_config, file_info.chunk_crypt_mode(), most_used);
+        let chunk_reader = RemoteChunkReader::new_lru_cached(client.clone(), crypt_config, file_info.chunk_crypt_mode(), most_used, 16);
         let reader = BufferedDynamicReader::new(index, chunk_reader);
         let archive_size = reader.archive_size();
         let reader: proxmox_backup::pxar::fuse::Reader =
@@ -277,7 +277,7 @@ async fn mount_do(param: Value, pipe: Option<Fd>) -> Result<Value, Error> {
     } else if server_archive_name.ends_with(".fidx") {
         let index = client.download_fixed_index(&manifest, &server_archive_name).await?;
         let size = index.index_bytes();
-        let chunk_reader = RemoteChunkReader::new(client.clone(), crypt_config, file_info.chunk_crypt_mode(), HashMap::new());
+        let chunk_reader = RemoteChunkReader::new_lru_cached(client.clone(), crypt_config, file_info.chunk_crypt_mode(), HashMap::new(), 16);
         let reader = AsyncIndexReader::new(index, chunk_reader);
 
         let name = &format!("{}:{}/{}", repo.to_string(), path, archive_name);
diff --git a/src/client/remote_chunk_reader.rs b/src/client/remote_chunk_reader.rs
index 06f693a2..1314bcdc 100644
--- a/src/client/remote_chunk_reader.rs
+++ b/src/client/remote_chunk_reader.rs
@@ -8,6 +8,13 @@ use anyhow::{bail, Error};
 use super::BackupReader;
 use crate::backup::{AsyncReadChunk, CryptConfig, CryptMode, DataBlob, ReadChunk};
 use crate::tools::runtime::block_on;
+use crate::tools::lru_cache::LruCache;
+
+struct Cache {
+    cache_hint: HashMap<[u8; 32], usize>,
+    hinted: HashMap<[u8; 32], Vec<u8>>,
+    lru: Option<LruCache<[u8; 32], Vec<u8>>>,
+}
 
 /// Read chunks from remote host using ``BackupReader``
 #[derive(Clone)]
@@ -15,8 +22,7 @@ pub struct RemoteChunkReader {
     client: Arc<BackupReader>,
     crypt_config: Option<Arc<CryptConfig>>,
     crypt_mode: CryptMode,
-    cache_hint: Arc<HashMap<[u8; 32], usize>>,
-    cache: Arc<Mutex<HashMap<[u8; 32], Vec<u8>>>>,
+    cache: Arc<Mutex<Cache>>,
 }
 
 impl RemoteChunkReader {
@@ -28,13 +34,30 @@ impl RemoteChunkReader {
         crypt_config: Option<Arc<CryptConfig>>,
         crypt_mode: CryptMode,
         cache_hint: HashMap<[u8; 32], usize>,
+    ) -> Self {
+        Self::new_lru_cached(client, crypt_config, crypt_mode, cache_hint, 0)
+    }
+
+    /// Create a new instance.
+    ///
+    /// Chunks listed in ``cache_hint`` are cached and kept in RAM, as well as the last
+    /// 'cache_last' accessed chunks.
+    pub fn new_lru_cached(
+        client: Arc<BackupReader>,
+        crypt_config: Option<Arc<CryptConfig>>,
+        crypt_mode: CryptMode,
+        cache_hint: HashMap<[u8; 32], usize>,
+        cache_last: usize,
     ) -> Self {
         Self {
             client,
             crypt_config,
             crypt_mode,
-            cache_hint: Arc::new(cache_hint),
-            cache: Arc::new(Mutex::new(HashMap::new())),
+            cache: Arc::new(Mutex::new(Cache {
+                hinted: HashMap::with_capacity(cache_hint.len()),
+                lru: if cache_last == 0 { None } else { Some(LruCache::new(cache_last)) },
+                cache_hint,
+            })),
         }
     }
 
@@ -64,6 +87,34 @@ impl RemoteChunkReader {
             },
         }
     }
+
+    fn cache_get(&self, digest: &[u8; 32]) -> Option<Vec<u8>> {
+        let cache = &mut *self.cache.lock().unwrap();
+        if let Some(data) = cache.hinted.get(digest) {
+            return Some(data.to_vec());
+        }
+
+        cache
+            .lru
+            .as_mut()
+            .map(|lru| lru.get_mut(*digest).map(|x| x.to_vec()))
+            .flatten()
+    }
+
+    fn cache_insert(&self, digest: &[u8; 32], raw_data: &Vec<u8>) {
+        let cache = &mut *self.cache.lock().unwrap();
+
+        // if hinted, always cache given digest
+        if cache.cache_hint.contains_key(digest) {
+            cache.hinted.insert(*digest, raw_data.to_vec());
+            return;
+        }
+
+        // otherwise put in LRU
+        if let Some(ref mut lru) = cache.lru {
+            lru.insert(*digest, raw_data.to_vec());
+        }
+    }
 }
 
 impl ReadChunk for RemoteChunkReader {
@@ -72,18 +123,14 @@ impl ReadChunk for RemoteChunkReader {
     }
 
     fn read_chunk(&self, digest: &[u8; 32]) -> Result<Vec<u8>, Error> {
-        if let Some(raw_data) = (*self.cache.lock().unwrap()).get(digest) {
-            return Ok(raw_data.to_vec());
+        if let Some(raw_data) = self.cache_get(digest) {
+            return Ok(raw_data);
         }
 
         let chunk = ReadChunk::read_raw_chunk(self, digest)?;
 
         let raw_data = chunk.decode(self.crypt_config.as_ref().map(Arc::as_ref), Some(digest))?;
-
-        let use_cache = self.cache_hint.contains_key(digest);
-        if use_cache {
-            (*self.cache.lock().unwrap()).insert(*digest, raw_data.to_vec());
-        }
+        self.cache_insert(digest, &raw_data);
 
         Ok(raw_data)
     }
@@ -102,18 +149,14 @@ impl AsyncReadChunk for RemoteChunkReader {
         digest: &'a [u8; 32],
     ) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, Error>> + Send + 'a>> {
         Box::pin(async move {
-            if let Some(raw_data) = (*self.cache.lock().unwrap()).get(digest) {
+            if let Some(raw_data) = self.cache_get(digest) {
                 return Ok(raw_data.to_vec());
             }
 
             let chunk = Self::read_raw_chunk(self, digest).await?;
 
             let raw_data = chunk.decode(self.crypt_config.as_ref().map(Arc::as_ref), Some(digest))?;
-
-            let use_cache = self.cache_hint.contains_key(digest);
-            if use_cache {
-                (*self.cache.lock().unwrap()).insert(*digest, raw_data.to_vec());
-            }
+            self.cache_insert(digest, &raw_data);
 
             Ok(raw_data)
         })
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (3 preceding siblings ...)
  2021-01-11 11:14 ` [pve-devel] [PATCH proxmox-backup 04/11] RemoteChunkReader: add LRU cached variant Stefan Reiter
@ 2021-01-11 11:14 ` Stefan Reiter
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 06/11] make qemu_drive_mirror_monitor more generic Stefan Reiter
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:14 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Values chosen by fair dice roll; this seems to be a good sweet spot on my
machine, where any less causes performance degradation but any more
doesn't make restores noticeably faster.

Keep in mind that these values apply per drive in an actual restore.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Depends on new proxmox-backup.
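
For reference: fixed-index (VM image) chunks are 4 MiB each, so this
amounts to 16 * 4 MiB = 64 MB pinned for the most-used chunks plus up to
64 * 4 MiB = 256 MB of LRU cache per restored drive.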

 src/restore.rs | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/restore.rs b/src/restore.rs
index 24983dd..fdb3d1c 100644
--- a/src/restore.rs
+++ b/src/restore.rs
@@ -218,15 +218,16 @@ impl RestoreTask {
 
         let index = client.download_fixed_index(&manifest, &archive_name).await?;
         let archive_size = index.index_bytes();
-        let most_used = index.find_most_used_chunks(8);
+        let most_used = index.find_most_used_chunks(16); // 64 MB most used cache
 
         let file_info = manifest.lookup_file_info(&archive_name)?;
 
-        let chunk_reader = RemoteChunkReader::new(
+        let chunk_reader = RemoteChunkReader::new_lru_cached(
             Arc::clone(&client),
             self.crypt_config.clone(),
             file_info.chunk_crypt_mode(),
             most_used,
+            64, // 256 MB LRU cache
         );
 
         let reader = AsyncIndexReader::new(index, chunk_reader);
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH qemu-server 06/11] make qemu_drive_mirror_monitor more generic
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (4 preceding siblings ...)
  2021-01-11 11:14 ` [pve-devel] [PATCH proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader Stefan Reiter
@ 2021-01-11 11:14 ` Stefan Reiter
  2021-01-12 13:19   ` Wolfgang Bumiller
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 07/11] cfg2cmd: allow PBS snapshots as backing files for drives Stefan Reiter
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:14 UTC (permalink / raw)
  To: pve-devel, pbs-devel

...so it works with other block jobs as well. The intended use case is
block-stream, which also requires a new "auto" (wait-only) completion
mode, since such jobs finish automatically anyway.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 PVE/QemuServer.pm | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index bca5669..d517dae 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6777,15 +6777,16 @@ sub qemu_drive_mirror {
 	die "mirroring error: $err\n";
     }
 
-    qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga);
+    qemu_drive_job_monitor("mirror", $vmid, $vmiddst, $jobs, $completion, $qga);
 }
 
 # $completion can be either
 # 'complete': wait until all jobs are ready, block-job-complete them (default)
 # 'cancel': wait until all jobs are ready, block-job-cancel them
 # 'skip': wait until all jobs are ready, return with block jobs in ready state
-sub qemu_drive_mirror_monitor {
-    my ($vmid, $vmiddst, $jobs, $completion, $qga) = @_;
+# 'auto': wait until all jobs disappear, only use for jobs which complete automatically
+sub qemu_drive_job_monitor {
+    my ($op, $vmid, $vmiddst, $jobs, $completion, $qga) = @_;
 
     $completion //= 'complete';
 
@@ -6793,46 +6794,50 @@ sub qemu_drive_mirror_monitor {
 	my $err_complete = 0;
 
 	while (1) {
-	    die "storage migration timed out\n" if $err_complete > 300;
+	    die "block job ('$op') timed out\n" if $err_complete > 300;
 
 	    my $stats = mon_cmd($vmid, "query-block-jobs");
 
-	    my $running_mirror_jobs = {};
+	    my $running_jobs = {};
 	    foreach my $stat (@$stats) {
-		next if $stat->{type} ne 'mirror';
-		$running_mirror_jobs->{$stat->{device}} = $stat;
+		next if $stat->{type} ne $op;
+		$running_jobs->{$stat->{device}} = $stat;
 	    }
 
 	    my $readycounter = 0;
 
 	    foreach my $job (keys %$jobs) {
 
-	        if(defined($jobs->{$job}->{complete}) && !defined($running_mirror_jobs->{$job})) {
-		    print "$job : finished\n";
+		my $vanished = !defined($running_jobs->{$job});
+		my $complete = defined($jobs->{$job}->{complete}) && $vanished;
+	        if($complete || ($vanished && $completion eq 'auto')) {
+		    print "$job: finished\n";
 		    delete $jobs->{$job};
 		    next;
 		}
 
-		die "$job: mirroring has been cancelled\n" if !defined($running_mirror_jobs->{$job});
+		die "$job: '$op' has been cancelled\n" if !defined($running_jobs->{$job});
 
-		my $busy = $running_mirror_jobs->{$job}->{busy};
-		my $ready = $running_mirror_jobs->{$job}->{ready};
-		if (my $total = $running_mirror_jobs->{$job}->{len}) {
-		    my $transferred = $running_mirror_jobs->{$job}->{offset} || 0;
+		my $busy = $running_jobs->{$job}->{busy};
+		my $ready = $running_jobs->{$job}->{ready};
+		if (my $total = $running_jobs->{$job}->{len}) {
+		    my $transferred = $running_jobs->{$job}->{offset} || 0;
 		    my $remaining = $total - $transferred;
 		    my $percent = sprintf "%.2f", ($transferred * 100 / $total);
 
 		    print "$job: transferred: $transferred bytes remaining: $remaining bytes total: $total bytes progression: $percent % busy: $busy ready: $ready \n";
 		}
 
-		$readycounter++ if $running_mirror_jobs->{$job}->{ready};
+		$readycounter++ if $running_jobs->{$job}->{ready};
 	    }
 
 	    last if scalar(keys %$jobs) == 0;
 
 	    if ($readycounter == scalar(keys %$jobs)) {
-		print "all mirroring jobs are ready \n";
-		last if $completion eq 'skip'; #do the complete later
+		print "all '$op' jobs are ready\n";
+
+		# do the complete later (or has already been done)
+		last if $completion eq 'skip' || $completion eq 'auto';
 
 		if ($vmiddst && $vmiddst != $vmid) {
 		    my $agent_running = $qga && qga_check_running($vmid);
@@ -6888,7 +6893,7 @@ sub qemu_drive_mirror_monitor {
 
     if ($err) {
 	eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
-	die "mirroring error: $err";
+	die "block job ('$op') error: $err";
     }
 
 }
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH qemu-server 07/11] cfg2cmd: allow PBS snapshots as backing files for drives
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (5 preceding siblings ...)
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 06/11] make qemu_drive_mirror_monitor more generic Stefan Reiter
@ 2021-01-11 11:14 ` Stefan Reiter
  2021-01-28 16:25   ` Thomas Lamprecht
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 08/11] enable live-restore for PBS Stefan Reiter
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:14 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Uses the custom 'alloc-track' filter node to redirect writes to the
original drive's target, while unwritten blocks are read from the
specified PBS snapshot.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
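
To illustrate, for a PBS-backed 'scsi0' the generated arguments end up
looking roughly as follows (repository, snapshot and path values are made
up):

    -blockdev driver=pbs,node-name=drive-scsi0-pbs,read-only=on,repository=backup@pbs@host:store,snapshot=vm/100/2021-01-11T11:13:00Z,archive=drive-scsi0.img.fidx
    -drive file.file.filename=/path/to/vm-100-disk-0.raw,file.driver=raw,format=alloc-track,backing=drive-scsi0-pbs,auto-remove=on,...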
 PVE/QemuServer.pm | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index d517dae..6de6ff6 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -3022,7 +3022,8 @@ sub query_understood_cpu_flags {
 }
 
 sub config_to_command {
-    my ($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu) = @_;
+    my ($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu,
+        $pbs_backing) = @_;
 
     my $cmd = [];
     my $globalFlags = [];
@@ -3521,6 +3522,32 @@ sub config_to_command {
 	my $drive_cmd = print_drive_commandline_full($storecfg, $vmid, $drive);
 	$drive_cmd .= ',readonly' if PVE::QemuConfig->is_template($conf);
 
+	if ($pbs_backing && defined($pbs_backing->{$ds})) {
+	    my $pbs_conf = $pbs_backing->{$ds};
+	    my $pbs_name = "drive-$ds-pbs";
+
+	    # add PBS block device to access snapshot from QEMU
+	    my $blockdev = "driver=pbs,node-name=$pbs_name,read-only=on";
+	    $blockdev .= ",repository=$pbs_conf->{repository}";
+	    $blockdev .= ",snapshot=$pbs_conf->{snapshot}";
+	    $blockdev .= ",archive=$pbs_conf->{archive}";
+	    $blockdev .= ",keyfile=$pbs_conf->{keyfile}" if $pbs_conf->{keyfile};
+	    push @$devices, '-blockdev', $blockdev;
+
+	    # modify drive command to put the desired target file behind an
+	    # 'alloc-track' node
+	    $drive_cmd =~ s/file=([^,]+)/file.file.filename=$1/;
+	    $drive_cmd =~ s/,format=([^,]+)/,file.driver=$1,format=alloc-track/;
+	    $drive_cmd .= ",backing=$pbs_name";
+	    $drive_cmd .= ",auto-remove=on";
+
+	    # note: 'cache' and 'aio' directly affect the 'drive' parameter, so
+	    # we don't need to change them to 'file.', but 'detect-zeroes' works
+	    # per blockdev and we want it to persist after the alloc-track is
+	    # removed, so put it on 'file' directly
+	    $drive_cmd =~ s/,detect-zeroes=([^,]+)/,file.detect-zeroes=$1/;
+	}
+
 	push @$devices, '-drive',$drive_cmd;
 	push @$devices, '-device', print_drivedevice_full(
 	    $storecfg, $conf, $vmid, $drive, $bridges, $arch, $machine_type);
@@ -4915,6 +4942,15 @@ sub vm_start {
 #   timeout => in seconds
 #   paused => start VM in paused state (backup)
 #   resume => resume from hibernation
+#   pbs-backing => {
+#      sata0 => {
+#         repository
+#         snapshot
+#         keyfile
+#         archive
+#      },
+#      virtio2 => ...
+#   }
 # migrate_opts:
 #   nbd => volumes for NBD exports (vm_migrate_alloc_nbd_disks)
 #   migratedfrom => source node
@@ -4961,8 +4997,8 @@ sub vm_start_nolock {
 	print "Resuming suspended VM\n";
     }
 
-    my ($cmd, $vollist, $spice_port) =
-	config_to_command($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu);
+    my ($cmd, $vollist, $spice_port) = config_to_command($storecfg, $vmid,
+	$conf, $defaults, $forcemachine, $forcecpu, $params->{'pbs-backing'});
 
     my $migration_ip;
     my $get_migration_ip = sub {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH qemu-server 08/11] enable live-restore for PBS
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (6 preceding siblings ...)
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 07/11] cfg2cmd: allow PBS snapshots as backing files for drives Stefan Reiter
@ 2021-01-11 11:14 ` Stefan Reiter
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 09/11] extract register_qmeventd_handle to QemuServer.pm Stefan Reiter
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:14 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Enables live-restore functionality using the 'alloc-track' QEMU driver.
This allows starting a VM immediately when restoring from a PBS
snapshot. The snapshot is attached to the VM as a drive, so it can boot
from it, while guest reads and a 'block-stream' job handle the restore
in the background.

If an error occurs, the VM is deleted and all data written during the
restore is lost.

The VM remains locked during the restore, which automatically prohibits
any modifications to the config while restoring. Some modifications
might potentially be safe; however, this feature is experimental enough
that I believe allowing them would cause more bad stuff(tm) than it
would satisfy actual use cases.

Pool handling is slightly adjusted so the VM can be added to the pool
before the restore starts.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Looks better with -w
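
The overall flow of a live-restore is then roughly:

 1. allocate the target volumes and write the VM config, as for a regular
    restore
 2. skip pbs-restore; instead start the VM paused, with every disk sitting
    behind an 'alloc-track' node whose backing is the corresponding PBS
    blockdev
 3. 'cont' the VM and start one 'block-stream' job per disk
 4. wait for all jobs to finish ('auto' completion mode), then
    'blockdev-del' the PBS nodes to disconnect from the server
 5. on any error, stop the VM and destroy the volumes and the config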

 PVE/API2/Qemu.pm  |  15 ++++-
 PVE/QemuServer.pm | 148 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 137 insertions(+), 26 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index e2d2d67..a02a354 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -490,6 +490,12 @@ __PACKAGE__->register_method({
 		    description => "Assign a unique random ethernet address.",
 		    requires => 'archive',
 		},
+		'live-restore' => {
+		    optional => 1,
+		    type => 'boolean',
+		    description => "Start the VM immediately from the backup and restore in background. PBS only.",
+		    requires => 'archive',
+		},
 		pool => {
 		    optional => 1,
 		    type => 'string', format => 'pve-poolid',
@@ -531,6 +537,10 @@ __PACKAGE__->register_method({
 	my $start_after_create = extract_param($param, 'start');
 	my $storage = extract_param($param, 'storage');
 	my $unique = extract_param($param, 'unique');
+	my $live_restore = extract_param($param, 'live-restore');
+
+	raise_param_exc({ 'start' => "cannot specify 'start' with 'live-restore'" })
+	    if $start_after_create && $live_restore;
 
 	if (defined(my $ssh_keys = $param->{sshkeys})) {
 		$ssh_keys = URI::Escape::uri_unescape($ssh_keys);
@@ -613,9 +623,12 @@ __PACKAGE__->register_method({
 		    pool => $pool,
 		    unique => $unique,
 		    bwlimit => $bwlimit,
+		    live => $live_restore,
 		};
 		if ($archive->{type} eq 'file' || $archive->{type} eq 'pipe') {
+		    die "live-restore is only compatible with PBS\n" if $live_restore;
 		    PVE::QemuServer::restore_file_archive($archive->{path} // '-', $vmid, $authuser, $restore_options);
+		    PVE::AccessControl::add_vm_to_pool($vmid, $pool) if $pool;
 		} elsif ($archive->{type} eq 'pbs') {
 		    PVE::QemuServer::restore_proxmox_backup_archive($archive->{volid}, $vmid, $authuser, $restore_options);
 		} else {
@@ -628,8 +641,6 @@ __PACKAGE__->register_method({
 		    eval { PVE::QemuServer::template_create($vmid, $restored_conf) };
 		    warn $@ if $@;
 		}
-
-		PVE::AccessControl::add_vm_to_pool($vmid, $pool) if $pool;
 	    };
 
 	    # ensure no old replication state are exists
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 6de6ff6..4c34595 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6103,7 +6103,7 @@ sub restore_proxmox_backup_archive {
 
     my $repo = PVE::PBSClient::get_repository($scfg);
 
-    # This is only used for `pbs-restore`!
+    # This is only used for `pbs-restore` and the QEMU PBS driver (live-restore)
     my $password = PVE::Storage::PBSPlugin::pbs_get_password($scfg, $storeid);
     local $ENV{PBS_PASSWORD} = $password;
     local $ENV{PBS_FINGERPRINT} = $fingerprint if defined($fingerprint);
@@ -6200,34 +6200,35 @@ sub restore_proxmox_backup_archive {
 	# allocate volumes
 	my $map = $restore_allocate_devices->($storecfg, $virtdev_hash, $vmid);
 
-	foreach my $virtdev (sort keys %$virtdev_hash) {
-	    my $d = $virtdev_hash->{$virtdev};
-	    next if $d->{is_cloudinit}; # no need to restore cloudinit
+	if (!$options->{live}) {
+	    foreach my $virtdev (sort keys %$virtdev_hash) {
+		my $d = $virtdev_hash->{$virtdev};
+		next if $d->{is_cloudinit}; # no need to restore cloudinit
 
-	    my $volid = $d->{volid};
+		my $volid = $d->{volid};
 
-	    my $path = PVE::Storage::path($storecfg, $volid);
+		my $path = PVE::Storage::path($storecfg, $volid);
 
-	    # This is the ONLY user of the PBS_ env vars set on top of this function!
-	    my $pbs_restore_cmd = [
-		'/usr/bin/pbs-restore',
-		'--repository', $repo,
-		$pbs_backup_name,
-		"$d->{devname}.img.fidx",
-		$path,
-		'--verbose',
-		];
+		my $pbs_restore_cmd = [
+		    '/usr/bin/pbs-restore',
+		    '--repository', $repo,
+		    $pbs_backup_name,
+		    "$d->{devname}.img.fidx",
+		    $path,
+		    '--verbose',
+		    ];
 
-	    push @$pbs_restore_cmd, '--format', $d->{format} if $d->{format};
-	    push @$pbs_restore_cmd, '--keyfile', $keyfile if -e $keyfile;
+		push @$pbs_restore_cmd, '--format', $d->{format} if $d->{format};
+		push @$pbs_restore_cmd, '--keyfile', $keyfile if -e $keyfile;
 
-	    if (PVE::Storage::volume_has_feature($storecfg, 'sparseinit', $volid)) {
-		push @$pbs_restore_cmd, '--skip-zero';
+		if (PVE::Storage::volume_has_feature($storecfg, 'sparseinit', $volid)) {
+		    push @$pbs_restore_cmd, '--skip-zero';
+		}
+
+		my $dbg_cmdstring = PVE::Tools::cmd2string($pbs_restore_cmd);
+		print "restore proxmox backup image: $dbg_cmdstring\n";
+		run_command($pbs_restore_cmd);
 	    }
-
-	    my $dbg_cmdstring = PVE::Tools::cmd2string($pbs_restore_cmd);
-	    print "restore proxmox backup image: $dbg_cmdstring\n";
-	    run_command($pbs_restore_cmd);
 	}
 
 	$fh->seek(0, 0) || die "seek failed - $!\n";
@@ -6244,7 +6245,9 @@ sub restore_proxmox_backup_archive {
     };
     my $err = $@;
 
-    $restore_deactivate_volumes->($storecfg, $devinfo);
+    if ($err || !$options->{live}) {
+	$restore_deactivate_volumes->($storecfg, $devinfo);
+    }
 
     rmtree $tmpdir;
 
@@ -6261,6 +6264,103 @@ sub restore_proxmox_backup_archive {
 
     eval { rescan($vmid, 1); };
     warn $@ if $@;
+
+    PVE::AccessControl::add_vm_to_pool($vmid, $options->{pool}) if $options->{pool};
+
+    if ($options->{live}) {
+	eval {
+	    # enable interrupts
+	    local $SIG{INT} =
+		local $SIG{TERM} =
+		local $SIG{QUIT} =
+		local $SIG{HUP} =
+		local $SIG{PIPE} = sub { die "interrupted by signal\n"; };
+
+	    my $conf = PVE::QemuConfig->load_config($vmid);
+	    die "cannot do live-restore for template\n"
+		if PVE::QemuConfig->is_template($conf);
+
+	    pbs_live_restore($vmid, $conf, $storecfg, $devinfo, $repo, $keyfile, $pbs_backup_name);
+	};
+
+	$err = $@;
+	if ($err) {
+	    warn "Destroying live-restore VM, all temporary data will be lost!\n";
+	    $restore_deactivate_volumes->($storecfg, $devinfo);
+	    $restore_destroy_volumes->($storecfg, $devinfo);
+	    unlink $conffile;
+	    die $err;
+	}
+    }
+}
+
+sub pbs_live_restore {
+    my ($vmid, $conf, $storecfg, $restored_disks, $repo, $keyfile, $snap) = @_;
+
+    print "Starting VM for live-restore\n";
+
+    my $pbs_backing = {};
+    foreach my $ds (keys %$restored_disks) {
+	$ds =~ m/^drive-(.*)$/;
+	$pbs_backing->{$1} = {
+	    repository => $repo,
+	    snapshot => $snap,
+	    archive => "$ds.img.fidx",
+	};
+	$pbs_backing->{$1}->{keyfile} = $keyfile if -e $keyfile;
+    }
+
+    eval {
+	# make sure HA doesn't interrupt our restore by stopping the VM
+	if (PVE::HA::Config::vm_is_ha_managed($vmid)) {
+	    my $cmd = ['ha-manager', 'set',  "vm:$vmid", '--state', 'started'];
+	    PVE::Tools::run_command($cmd);
+	}
+
+	# start VM with backing chain pointing to PBS backup, environment vars
+	# for PBS driver in QEMU (PBS_PASSWORD and PBS_FINGERPRINT) are already
+	# set by our caller
+	PVE::QemuServer::vm_start_nolock(
+	    $storecfg,
+	    $vmid,
+	    $conf,
+	    {
+		paused => 1,
+		'pbs-backing' => $pbs_backing,
+	    },
+	    {},
+	);
+
+	# begin streaming, i.e. data copy from PBS to target disk for every vol,
+	# this will effectively collapse the backing image chain consisting of
+	# [target <- alloc-track -> PBS snapshot] to just [target] (alloc-track
+	# removes itself once all backing images vanish with 'auto-remove=on')
+	my $jobs = {};
+	foreach my $ds (keys %$restored_disks) {
+	    my $job_id = "restore-$ds";
+	    mon_cmd($vmid, 'block-stream',
+		'job-id' => $job_id,
+		device => "$ds",
+	    );
+	    $jobs->{$job_id} = {};
+	}
+
+	mon_cmd($vmid, 'cont');
+	qemu_drive_job_monitor('stream', $vmid, undef, $jobs, 'auto');
+
+	# all jobs finished, remove blockdevs now to disconnect from PBS
+	foreach my $ds (keys %$restored_disks) {
+	    mon_cmd($vmid, 'blockdev-del', 'node-name' => "$ds-pbs");
+	}
+    };
+
+    my $err = $@;
+
+    if ($err) {
+	warn "An error occurred during live-restore: $err\n";
+	_do_vm_stop($storecfg, $vmid, 1, 1, 10, 0, 1);
+	die "live-restore failed\n";
+    }
 }
 
 sub restore_vma_archive {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH qemu-server 09/11] extract register_qmeventd_handle to QemuServer.pm
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (7 preceding siblings ...)
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 08/11] enable live-restore for PBS Stefan Reiter
@ 2021-01-11 11:14 ` Stefan Reiter
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 10/11] live-restore: register qmeventd handle Stefan Reiter
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:14 UTC (permalink / raw)
  To: pve-devel, pbs-devel

...to be reused by live-restore.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 PVE/QemuServer.pm        | 28 ++++++++++++++++++++++++++++
 PVE/VZDump/QemuServer.pm | 32 ++------------------------------
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 4c34595..aa88a9d 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7432,6 +7432,34 @@ sub device_bootorder {
     return $bootorder;
 }
 
+sub register_qmeventd_handle {
+    my ($vmid) = @_;
+
+    my $fh;
+    my $peer = "/var/run/qmeventd.sock";
+    my $count = 0;
+
+    for (;;) {
+	$count++;
+	$fh = IO::Socket::UNIX->new(Peer => $peer, Blocking => 0, Timeout => 1);
+	last if $fh;
+	if ($! != EINTR && $! != EAGAIN) {
+	    die "unable to connect to qmeventd socket (vmid: $vmid) - $!\n";
+	}
+	if ($count > 4) {
+	    die "unable to connect to qmeventd socket (vmid: $vmid) - timeout "
+	      . "after $count retries\n";
+	}
+	usleep(25000);
+    }
+
+    # send handshake to mark VM as backing up
+    print $fh to_json({vzdump => {vmid => "$vmid"}});
+
+    # return handle to be closed later when inhibit is no longer required
+    return $fh;
+}
+
 # bash completion helper
 
 sub complete_backup_archives {
diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index b322701..d689e3d 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -512,7 +512,7 @@ sub archive_pbs {
     my $devlist = _get_task_devlist($task);
 
     $self->enforce_vm_running_for_backup($vmid);
-    $self->register_qmeventd_handle($vmid);
+    $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
 
     my $backup_job_uuid;
     eval {
@@ -681,7 +681,7 @@ sub archive_vma {
     my $devlist = _get_task_devlist($task);
 
     $self->enforce_vm_running_for_backup($vmid);
-    $self->register_qmeventd_handle($vmid);
+    $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
 
     my $cpid;
     my $backup_job_uuid;
@@ -840,34 +840,6 @@ sub enforce_vm_running_for_backup {
     die $@ if $@;
 }
 
-sub register_qmeventd_handle {
-    my ($self, $vmid) = @_;
-
-    my $fh;
-    my $peer = "/var/run/qmeventd.sock";
-    my $count = 0;
-
-    for (;;) {
-	$count++;
-	$fh = IO::Socket::UNIX->new(Peer => $peer, Blocking => 0, Timeout => 1);
-	last if $fh;
-	if ($! != EINTR && $! != EAGAIN) {
-	    $self->log("warn", "unable to connect to qmeventd socket (vmid: $vmid) - $!\n");
-	    return;
-	}
-	if ($count > 4) {
-	    $self->log("warn", "unable to connect to qmeventd socket (vmid: $vmid)"
-			     . " - timeout after $count retries\n");
-	    return;
-	}
-	usleep(25000);
-    }
-
-    # send handshake to mark VM as backing up
-    print $fh to_json({vzdump => {vmid => "$vmid"}});
-    $self->{qmeventd_fh} = $fh;
-}
-
 # resume VM againe once we got in a clear state (stop mode backup of running VM)
 sub resume_vm_after_job_start {
     my ($self, $task, $vmid) = @_;
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH qemu-server 10/11] live-restore: register qmeventd handle
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (8 preceding siblings ...)
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 09/11] extract register_qmeventd_handle to QemuServer.pm Stefan Reiter
@ 2021-01-11 11:14 ` Stefan Reiter
  2021-01-11 11:14 ` [pve-devel] [PATCH manager 11/11] ui: restore: add live-restore checkbox Stefan Reiter
  2021-01-11 15:50 ` [pve-devel] [PATCH 00/11] live-restore for PBS snapshots aderumier
  11 siblings, 0 replies; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:14 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Similar to backups, prevent QEMU from being killed by qmeventd during
the live-restore, so a guest can shut itself down without aborting the
restore operation.

Note that the 'close' is only there to be explicit; the handle will also
be closed if an operation errors out (i.e. when the 'eval' is left).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 PVE/QemuServer.pm | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index aa88a9d..e22fbed 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6331,6 +6331,8 @@ sub pbs_live_restore {
 	    {},
 	);
 
+	my $qmeventd_fd = register_qmeventd_handle($vmid);
+
 	# begin streaming, i.e. data copy from PBS to target disk for every vol,
 	# this will effectively collapse the backing image chain consisting of
 	# [target <- alloc-track -> PBS snapshot] to just [target] (alloc-track
@@ -6352,6 +6354,8 @@ sub pbs_live_restore {
 	foreach my $ds (keys %$restored_disks) {
 	    mon_cmd($vmid, 'blockdev-del', 'node-name' => "$ds-pbs");
 	}
+
+	close($qmeventd_fd);
     };
 
     my $err = $@;
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH manager 11/11] ui: restore: add live-restore checkbox
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (9 preceding siblings ...)
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 10/11] live-restore: register qmeventd handle Stefan Reiter
@ 2021-01-11 11:14 ` Stefan Reiter
  2021-01-11 15:50 ` [pve-devel] [PATCH 00/11] live-restore for PBS snapshots aderumier
  11 siblings, 0 replies; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 11:14 UTC (permalink / raw)
  To: pve-devel, pbs-devel

Add an 'isPBS' parameter to the Restore window so we can detect when to
show the 'live-restore' checkbox.

Includes a warning about this feature being experimental for now.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 www/manager6/grid/BackupView.js    |  8 +++++--
 www/manager6/storage/BackupView.js |  7 ++++--
 www/manager6/window/Restore.js     | 38 +++++++++++++++++++++++++++++-
 3 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/www/manager6/grid/BackupView.js b/www/manager6/grid/BackupView.js
index 9312c59c..41708030 100644
--- a/www/manager6/grid/BackupView.js
+++ b/www/manager6/grid/BackupView.js
@@ -79,6 +79,7 @@ Ext.define('PVE.grid.BackupView', {
 	    }
 	}, 100);
 
+	let isPBS = false;
 	var setStorage = function(storage) {
 	    var url = '/api2/json/nodes/' + nodename + '/storage/' + storage + '/content';
 	    url += '?content=backup';
@@ -101,13 +102,15 @@ Ext.define('PVE.grid.BackupView', {
 		change: function(f, value) {
 		    let storage = f.getStore().findRecord('storage', value, 0, false, true, true);
 		    if (storage) {
-			let isPBS = storage.data.type === 'pbs';
+			isPBS = storage.data.type === 'pbs';
 			me.getColumns().forEach((column) => {
 			    let id = column.dataIndex;
 			    if (id === 'verification' || id === 'encrypted') {
 				column.setHidden(!isPBS);
 			    }
 			});
+		    } else {
+			isPBS = false;
 		    }
 		    setStorage(value);
 		}
@@ -175,7 +178,8 @@ Ext.define('PVE.grid.BackupView', {
 		    vmid: vmid,
 		    volid: rec.data.volid,
 		    volidText: PVE.Utils.render_storage_content(rec.data.volid, {}, rec),
-		    vmtype: vmtype
+		    vmtype: vmtype,
+		    isPBS: isPBS,
 		});
 		win.show();
 		win.on('destroy', reload);
diff --git a/www/manager6/storage/BackupView.js b/www/manager6/storage/BackupView.js
index 8c5a286d..63418ec9 100644
--- a/www/manager6/storage/BackupView.js
+++ b/www/manager6/storage/BackupView.js
@@ -74,6 +74,8 @@ Ext.define('PVE.storage.BackupView', {
 	    }
 	});
 
+	let isPBS = me.pluginType === 'pbs';
+
 	me.tbar = [
 	    {
 		xtype: 'proxmoxButton',
@@ -94,7 +96,8 @@ Ext.define('PVE.storage.BackupView', {
 			nodename: nodename,
 			volid: rec.data.volid,
 			volidText: PVE.Utils.render_storage_content(rec.data.volid, {}, rec),
-			vmtype: vmtype
+			vmtype: vmtype,
+			isPBS: isPBS,
 		    });
 		    win.show();
 		    win.on('destroy', reload);
@@ -117,7 +120,7 @@ Ext.define('PVE.storage.BackupView', {
 	    pruneButton,
 	];
 
-	if (me.pluginType === 'pbs') {
+	if (isPBS) {
 	    me.extraColumns = {
 		encrypted: {
 		    header: gettext('Encrypted'),
diff --git a/www/manager6/window/Restore.js b/www/manager6/window/Restore.js
index f720eb3b..9fe50ec2 100644
--- a/www/manager6/window/Restore.js
+++ b/www/manager6/window/Restore.js
@@ -3,6 +3,20 @@ Ext.define('PVE.window.Restore', {
 
     resizable: false,
 
+    controller: {
+	xclass: 'Ext.app.ViewController',
+	control: {
+	    '#liveRestore': {
+		change: function(el, newVal) {
+		    let liveWarning = this.lookupReference('liveWarning');
+		    liveWarning.setHidden(!newVal);
+		    let start = this.lookupReference('start');
+		    start.setDisabled(newVal);
+		}
+	    },
+	}
+    },
+
     initComponent : function() {
 	var me = this;
 
@@ -84,6 +98,7 @@ Ext.define('PVE.window.Restore', {
 		{
 		    xtype: 'proxmoxcheckbox',
 		    name: 'start',
+		    reference: 'start',
 		    flex: 1,
 		    fieldLabel: gettext('Start after restore'),
 		    labelWidth: 105,
@@ -99,6 +114,26 @@ Ext.define('PVE.window.Restore', {
 		value: true,
 		fieldLabel: gettext('Unprivileged container')
 	    });
+	} else if (me.vmtype === 'qemu') {
+	    items.push({
+		xtype: 'proxmoxcheckbox',
+		name: 'live-restore',
+		itemId: 'liveRestore',
+		flex: 1,
+		fieldLabel: gettext('Live restore'),
+		checked: false,
+		hidden: !me.isPBS,
+		// align checkbox with 'start' if 'unique' is hidden
+		labelWidth: !!me.vmid ? 105 : 100,
+	    });
+	    items.push({
+		xtype: 'displayfield',
+		reference: 'liveWarning',
+		// TODO: Remove once more tested/stable?
+		value: gettext('Warning: Live-restore is experimental! The VM will start immediately (with a disk performance penalty) and restore will happen in the background. If anything goes wrong, data written by the VM during the restore will be lost.'),
+		userCls: 'pmx-hint',
+		hidden: true,
+	    });
 	}
 
 	me.formPanel = Ext.create('Ext.form.Panel', {
@@ -144,7 +179,8 @@ Ext.define('PVE.window.Restore', {
 		    force: me.vmid ? 1 : 0
 		};
 		if (values.unique) { params.unique = 1; }
-		if (values.start) { params.start = 1; }
+		if (values.start && !values['live-restore']) { params.start = 1; }
+		if (values['live-restore']) { params['live-restore'] = 1; }
 		if (values.storage) { params.storage = values.storage; }
 
 		if (values.bwlimit !== undefined) {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [PATCH 00/11] live-restore for PBS snapshots
  2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
                   ` (10 preceding siblings ...)
  2021-01-11 11:14 ` [pve-devel] [PATCH manager 11/11] ui: restore: add live-restore checkbox Stefan Reiter
@ 2021-01-11 15:50 ` aderumier
  2021-01-11 16:42   ` Stefan Reiter
  2021-01-12 10:31   ` [pve-devel] [pbs-devel] " Thomas Lamprecht
  11 siblings, 2 replies; 26+ messages in thread
From: aderumier @ 2021-01-11 15:50 UTC (permalink / raw)
  To: Proxmox VE development discussion, pbs-devel

Hi,
that's great!

I'm not sure it's related, but do you have any plans to add support for
restoring/extracting files directly from the VM filesystem? For example,
some kind of qemu-nbd + loop mount of the backup volume.


This is the main blocking feature for me before I can switch away from
my rbd import/export based backups.

Regards,

Alexandre

On Monday, 11 January 2021 at 12:13 +0100, Stefan Reiter wrote:
> "live-restore" allows starting a VM immediately from a backup
> snapshot, no
> waiting for a long restore process. This is made possible with QEMU
> backing
> images, i.e. data is read from the backup which is attached to the VM
> as a
> drive, but new data is written to the destination, while a background
> process
> ('block-stream') copies over data in a linear fashion as well.
> 
> QEMU backing images are normally only supported for qcow2 images, but
> since the
> destination always starts out empty, we can use a dirty bitmap to
> achieve the
> same effect - this is implemented as the 'alloc-track' driver in the
> 'qemu' part
> of the series.
> 
> The Rust part of the equation is adjusted to provide (quiet a lot)
> more caching,
> as mixing random read/write from the guest with the linear reads from
> the
> background process (both of which may use read sizes smaller or
> bigger than a
> single chunk) would thrash performance without large buffers.
> 
> I've marked the feature as 'experimental' in the GUI for now, as I'm
> sure there
> are a lot of edge cases I've missed to test, and there's also the
> possibility of
> data loss, since anything the VM writes during the restore is removed
> if the
> restore fails.
> 
> 
> qemu: Stefan Reiter (3):
>   PVE: explicitly add libuuid as linking dependency
>   PVE: block/pbs: fast-path reads without allocation if possible
>   block: add alloc-track driver
> 
>  Makefile.objs       |   2 +
>  block/Makefile.objs |   1 +
>  block/alloc-track.c | 319
> ++++++++++++++++++++++++++++++++++++++++++++
>  block/pbs.c         |  17 ++-
>  4 files changed, 336 insertions(+), 3 deletions(-)
>  create mode 100644 block/alloc-track.c
> 
> proxmox-backup: Stefan Reiter (1):
>   RemoteChunkReader: add LRU cached variant
> 
>  src/bin/proxmox_backup_client/mount.rs |  4 +-
>  src/client/remote_chunk_reader.rs      | 77 ++++++++++++++++++++----
> --
>  2 files changed, 62 insertions(+), 19 deletions(-)
> 
> proxmox-backup-qemu: Stefan Reiter (1):
>   access: use bigger cache and LRU chunk reader
> 
>  src/restore.rs | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> qemu-server: Stefan Reiter (5):
>   make qemu_drive_mirror_monitor more generic
>   cfg2cmd: allow PBS snapshots as backing files for drives
>   enable live-restore for PBS
>   extract register_qmeventd_handle to QemuServer.pm
>   live-restore: register qmeventd handle
> 
>  PVE/API2/Qemu.pm         |  15 ++-
>  PVE/QemuServer.pm        | 263 ++++++++++++++++++++++++++++++++-----
> --
>  PVE/VZDump/QemuServer.pm |  32 +----
>  3 files changed, 233 insertions(+), 77 deletions(-)
> 
> manager: Stefan Reiter (1):
>   ui: restore: add live-restore checkbox
> 
>  www/manager6/grid/BackupView.js    |  8 +++++--
>  www/manager6/storage/BackupView.js |  7 ++++--
>  www/manager6/window/Restore.js     | 38
> +++++++++++++++++++++++++++++-
>  3 files changed, 48 insertions(+), 5 deletions(-)
> 





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [PATCH 00/11] live-restore for PBS snapshots
  2021-01-11 15:50 ` [pve-devel] [PATCH 00/11] live-restore for PBS snapshots aderumier
@ 2021-01-11 16:42   ` Stefan Reiter
  2021-01-12  9:10     ` aderumier
  2021-01-12 10:31   ` [pve-devel] [pbs-devel] " Thomas Lamprecht
  1 sibling, 1 reply; 26+ messages in thread
From: Stefan Reiter @ 2021-01-11 16:42 UTC (permalink / raw)
  To: pve-devel

On 1/11/21 4:50 PM, aderumier@odiso.com wrote:
> Hi,
> that's great!
> 
> I'm not sure it's related, but do you have any plans to add support for
> restoring/extracting files directly from the VM filesystem? For example,
> some kind of qemu-nbd + loop mount of the backup volume.
> 

Unrelated, but yes, we're working on it.

At the moment you can already map snapshots via loop devices using the 
map command, e.g.:
proxmox-backup-client map vm/115/2020-10-13T15:03:12Z drive-scsi0.img
# mapped as /dev/loopN
proxmox-backup-client unmap ...
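
From there it's a regular block device, so - assuming the image contains
a partition table and the partitions show up as /dev/loopNpM - something
like the following gives you read-only file access (device and mount
point names here are just examples):

mount -o ro /dev/loop0p1 /mnt
# ... copy out the files you need ...
umount /mnt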

What's planned is a fully featured file-restore, including a GUI in PVE 
(similar to what's already available for CTs in the PBS interface).

~ Stefan

> 
> This is the main blocking feature for me before I can switch away from
> my rbd import/export based backups.
> 
> Regards,
> 
> Alexandre
> [...]




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [PATCH 00/11] live-restore for PBS snapshots
  2021-01-11 16:42   ` Stefan Reiter
@ 2021-01-12  9:10     ` aderumier
  0 siblings, 0 replies; 26+ messages in thread
From: aderumier @ 2021-01-12  9:10 UTC (permalink / raw)
  To: Stefan Reiter, pve-devel; +Cc: Proxmox VE development discussion

>> What's planned is a fully featured file-restore, including a GUI in PVE
>> (similar to what's already available for CTs in the PBS interface).

Wonderful!

Thanks for the pointer to "proxmox-backup-client map", I'll try it.

On Monday, 11 January 2021 at 17:42 +0100, Stefan Reiter wrote:
> On 1/11/21 4:50 PM, aderumier@odiso.com wrote:
> > Hi,
> > that's great!
> > 
> > I'm not sure it's related, but do you have any plans to add support for
> > restoring/extracting files directly from the VM filesystem? For example,
> > some kind of qemu-nbd + loop mount of the backup volume.
> > 
> 
> Unrelated, but yes, we're working on it.
> 
> At the moment you can already map snapshots via loop devices using
> the 
> map command, e.g.:
> proxmox-backup-client map vm/115/2020-10-13T15:03:12Z drive-scsi0.img
> # mapped as /dev/loopN
> proxmox-backup-client unmap ...
> 
> What's planned is a fully featured file-restore, including a GUI in
> PVE 
> (similar to what's already available for CTs in the PBS interface).
> 
> ~ Stefan
> 
> [...]





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [PATCH qemu 02/11] PVE: block/pbs: fast-path reads without allocation if possible
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu 02/11] PVE: block/pbs: fast-path reads without allocation if possible Stefan Reiter
@ 2021-01-12  9:29   ` Wolfgang Bumiller
  0 siblings, 0 replies; 26+ messages in thread
From: Wolfgang Bumiller @ 2021-01-12  9:29 UTC (permalink / raw)
  To: Stefan Reiter; +Cc: pve-devel, pbs-devel

On Mon, Jan 11, 2021 at 12:14:00PM +0100, Stefan Reiter wrote:
> ...and switch over to g_malloc/g_free while at it to align with other
> QEMU code.
> 
> Tracing shows the fast-path is taken almost all the time, though not
> 100%, so the slow one is still necessary.

I wonder if vectored reads could be implemented directly at the
library & Rust code level... this would probably have to go all the way
from here to the hyper request code though, since it's not really part
of AsyncRead :-\

> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
For that slight unnoticeable performance boost :)
> 
>  block/pbs.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/block/pbs.c b/block/pbs.c
> index 1481a2bfd1..fbf0d8d845 100644
> --- a/block/pbs.c
> +++ b/block/pbs.c
> @@ -200,7 +200,16 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
>      BDRVPBSState *s = bs->opaque;
>      int ret;
>      char *pbs_error = NULL;
> -    uint8_t *buf = malloc(bytes);
> +    uint8_t *buf;
> +    bool inline_buf = true;
> +
> +    /* for single-buffer IO vectors we can fast-path the write directly to it */
> +    if (qiov->niov == 1 && qiov->iov->iov_len >= bytes) {
> +        buf = qiov->iov->iov_base;
> +    } else {
> +        inline_buf = false;
> +        buf = g_malloc(bytes);
> +    }
>  
>      ReadCallbackData rcb = {
>          .co = qemu_coroutine_self(),
> @@ -218,8 +227,10 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
>          return -EIO;
>      }
>  
> -    qemu_iovec_from_buf(qiov, 0, buf, bytes);
> -    free(buf);
> +    if (!inline_buf) {
> +        qemu_iovec_from_buf(qiov, 0, buf, bytes);
> +        g_free(buf);
> +    }
>  
>      return ret;
>  }
> -- 
> 2.20.1




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [pbs-devel] [PATCH 00/11] live-restore for PBS snapshots
  2021-01-11 15:50 ` [pve-devel] [PATCH 00/11] live-restore for PBS snapshots aderumier
  2021-01-11 16:42   ` Stefan Reiter
@ 2021-01-12 10:31   ` Thomas Lamprecht
  2021-01-12 11:23     ` Thomas Lamprecht
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Lamprecht @ 2021-01-12 10:31 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, aderumier,
	Proxmox VE development discussion

On 11.01.21 16:50, aderumier@odiso.com wrote:
> For example, some kind of qemu-nbd + loop mount of the backup volume.

That is already possible through the qemu-nbd tool (we have a PBS block driver backend),
see: https://lists.proxmox.com/pipermail/pve-user/2020-July/171883.html
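
From memory the invocation looks roughly like the following - the option
names come from our pbs block driver, so best double-check against the
linked mail before relying on them:

qemu-nbd -c /dev/nbd0 --image-opts driver=pbs,repository=root@pam@127.0.0.1:datastore,snapshot=vm/115/2020-10-13T15:03:12Z,archive=drive-scsi0.img
mount -o ro /dev/nbd0p1 /mnt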

Or do you mean something else?

FWIW, there's an idea (and some initial POC) to later also have a file explorer similar
to the one for file-level backups in PVE, which starts a minimal VM, directly booting a
shipped Linux initrd and a PBS Rust daemon, providing an API to list/extract files from
any block-level backup containing a filesystem a modern Linux kernel can understand.

The VM is used for isolation purposes: every kernel FS dev says mounting a filesystem
should be considered unsafe, since an image can theoretically be altered to break the
host. As backups may not be 100% trusted, a VM environment is much safer.





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [PATCH qemu 03/11] block: add alloc-track driver
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu 03/11] block: add alloc-track driver Stefan Reiter
@ 2021-01-12 10:54   ` Wolfgang Bumiller
  2021-01-12 11:29     ` Stefan Reiter
  0 siblings, 1 reply; 26+ messages in thread
From: Wolfgang Bumiller @ 2021-01-12 10:54 UTC (permalink / raw)
  To: Stefan Reiter; +Cc: pve-devel, pbs-devel

one inline comment, otherwise lgtm

On Mon, Jan 11, 2021 at 12:14:01PM +0100, Stefan Reiter wrote:
> Add a new filter node 'alloc-track', which separates reads and writes to
> different children, thus allowing to put a backing image behind any
> blockdev (regardless of driver support). Since we can't detect any
> pre-allocated blocks, we can only track new writes, hence the write
> target ('file') for this node must always be empty.
> 
> Intended use case is for live restoring, i.e. add a backup image as a
> block device into a VM, then put an alloc-track on the restore target
> and set the backup as backing. With this, one can use a regular
> 'block-stream' to restore the image, while the VM can already run in the
> background. Copy-on-read will help make progress as the VM reads as
> well.
> 
> Previously this only worked if the target supported backing images, i.e. up
> until now only for qcow2; with alloc-track, any driver can be used for the
> target.
> 
> If 'auto-remove' is set, alloc-track will automatically detach itself
> once the backing image is removed. It will be replaced by 'file'.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> Potentially for upstream as well, but let's maybe test it a bit here first ;)
> 
>  block/Makefile.objs |   1 +
>  block/alloc-track.c | 319 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 320 insertions(+)
>  create mode 100644 block/alloc-track.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 28fb3b7f7c..04f91b8cc9 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -13,6 +13,7 @@ block-obj-$(CONFIG_QED) += qed-check.o
>  block-obj-y += vhdx.o vhdx-endian.o vhdx-log.o
>  block-obj-y += quorum.o
>  block-obj-y += zeroinit.o
> +block-obj-y += alloc-track.o
>  block-obj-y += blkdebug.o blkverify.o blkreplay.o
>  block-obj-$(CONFIG_PARALLELS) += parallels.o
>  block-obj-y += blklogwrites.o
> diff --git a/block/alloc-track.c b/block/alloc-track.c
> new file mode 100644
> index 0000000000..f2a4080579
> --- /dev/null
> +++ b/block/alloc-track.c
> @@ -0,0 +1,319 @@
> +/*
> + * Filter to allow backing images to be applied to any node. Assumes a blank
> + * image to begin with, only new writes are tracked as allocated, thus this
> + * must never be put on a node that already contains data.
> + *
> + * Copyright (c) 2020 Proxmox Server Solutions GmbH
> + * Copyright (c) 2020 Stefan Reiter <s.reiter@proxmox.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "block/block_int.h"
> +#include "qapi/qmp/qdict.h"
> +#include "qapi/qmp/qstring.h"
> +#include "qemu/cutils.h"
> +#include "qemu/option.h"
> +#include "qemu/module.h"
> +
> +#define TRACK_OPT_AUTO_REMOVE "auto-remove"
> +
> +typedef struct {
> +    BdrvDirtyBitmap *bitmap;
> +    bool dropping;
> +    bool auto_remove;
> +} BDRVAllocTrackState;
> +
> +static QemuOptsList runtime_opts = {
> +    .name = "alloc-track",
> +    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
> +    .desc = {
> +        {
> +            .name = TRACK_OPT_AUTO_REMOVE,
> +            .type = QEMU_OPT_BOOL,
> +            .help = "automatically replace this node with 'file' when 'backing' "
> +                    "is detached",
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +
> +static void track_refresh_limits(BlockDriverState *bs, Error **errp)
> +{
> +    BlockDriverInfo bdi;
> +
> +    if (!bs->file) {
> +        return;
> +    }
> +
> +    /* always use alignment from underlying write device so RMW cycle for
> +     * bdrv_pwritev reads data from our backing via track_co_preadv (no partial
> +     * cluster allocation in 'file') */
> +    bdrv_get_info(bs->file->bs, &bdi);
> +    bs->bl.request_alignment = MAX(bs->file->bs->bl.request_alignment,
> +                                   MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
> +}
> +
> +static int track_open(BlockDriverState *bs, QDict *options, int flags,
> +                      Error **errp)
> +{
> +    BDRVAllocTrackState *s = bs->opaque;
> +    QemuOpts *opts;
> +    Error *local_err = NULL;
> +    int ret = 0;
> +
> +    opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
> +    qemu_opts_absorb_qdict(opts, options, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
> +
> +    /* open the target (write) node, backing will be attached by block layer */
> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
> +                               BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
> +                               &local_err);
> +    if (local_err) {
> +        ret = -EINVAL;
> +        error_propagate(errp, local_err);
> +        goto fail;
> +    }
> +
> +    track_refresh_limits(bs, errp);
> +    uint64_t gran = bs->bl.request_alignment;
> +    s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, gran, NULL, &local_err);
> +    if (local_err) {
> +        ret = -EIO;
> +        error_propagate(errp, local_err);
> +        goto fail;
> +    }
> +
> +    s->dropping = false;
> +
> +fail:
> +    if (ret < 0) {
> +        bdrv_unref_child(bs, bs->file);
> +        if (s->bitmap) {
> +            bdrv_release_dirty_bitmap(s->bitmap);
> +        }
> +    }
> +    qemu_opts_del(opts);
> +    return ret;
> +}
> +
> +static void track_close(BlockDriverState *bs)
> +{
> +    BDRVAllocTrackState *s = bs->opaque;
> +    if (s->bitmap) {
> +        bdrv_release_dirty_bitmap(s->bitmap);
> +    }
> +}
> +
> +static int64_t track_getlength(BlockDriverState *bs)
> +{
> +    return bdrv_getlength(bs->file->bs);
> +}
> +
> +static int coroutine_fn track_co_preadv(BlockDriverState *bs,
> +    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
> +{
> +    BDRVAllocTrackState *s = bs->opaque;
> +    QEMUIOVector local_qiov;
> +    int ret;
> +
> +    /* 'cur_offset' is relative to 'offset', 'local_offset' to image start */
> +    uint64_t cur_offset, local_offset;
> +    int64_t local_bytes;
> +    bool alloc;
> +
> +    /* a read request can span multiple granularity-sized chunks, and can thus
> +     * contain blocks with different allocation status - we could just iterate
> +     * granularity-wise, but for better performance use bdrv_dirty_bitmap_next_X
> +     * to find the next flip and consider everything up to that in one go */
> +    for (cur_offset = 0; cur_offset < bytes; cur_offset += local_bytes) {
> +        local_offset = offset + cur_offset;
> +        alloc = bdrv_dirty_bitmap_get(s->bitmap, local_offset);
> +        if (alloc) {
> +            local_bytes = bdrv_dirty_bitmap_next_zero(s->bitmap, local_offset,
> +                                                      bytes - cur_offset);
> +        } else {
> +            local_bytes = bdrv_dirty_bitmap_next_dirty(s->bitmap, local_offset,
> +                                                       bytes - cur_offset);
> +        }
> +
> +        /* _bitmap_next_X return is -1 if no end found within limit, otherwise
> +         * offset of next flip (to start of image) */
> +        local_bytes = local_bytes < 0 ?
> +            bytes - cur_offset :
> +            local_bytes - local_offset;
> +
> +        qemu_iovec_init_slice(&local_qiov, qiov, cur_offset, local_bytes);
> +
> +        if (alloc) {
> +            ret = bdrv_co_preadv(bs->file, local_offset, local_bytes,
> +                                 &local_qiov, flags);
> +        } else if (bs->backing) {
> +            ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
> +                                 &local_qiov, flags);
> +        } else {

Here you handle the case where there's no backing file, but other
functions (eg. track_co_block_status) simply assume that `bs->backing`
is always valid.

Is this intentional?

It makes me also wonder - since you just zero out the iov below - if
this driver could maybe also replace the current zeroinit filter?
Otherwise it's at least a good reminder that the zeroinit code could
start making use of bitmaps as well ;-)

> +            ret = qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
> +        }
> +
> +        if (ret != 0) {
> +            break;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +static int coroutine_fn track_co_pwritev(BlockDriverState *bs,
> +    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
> +{
> +    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
> +}
> +
> +static int coroutine_fn track_co_pwrite_zeroes(BlockDriverState *bs,
> +    int64_t offset, int count, BdrvRequestFlags flags)
> +{
> +    return bdrv_pwrite_zeroes(bs->file, offset, count, flags);
> +}
> +
> +static int coroutine_fn track_co_pdiscard(BlockDriverState *bs,
> +    int64_t offset, int count)
> +{
> +    return bdrv_co_pdiscard(bs->file, offset, count);
> +}
> +
> +static coroutine_fn int track_co_flush(BlockDriverState *bs)
> +{
> +    return bdrv_co_flush(bs->file->bs);
> +}
> +
> +static int coroutine_fn track_co_block_status(BlockDriverState *bs,
> +                                              bool want_zero,
> +                                              int64_t offset,
> +                                              int64_t bytes,
> +                                              int64_t *pnum,
> +                                              int64_t *map,
> +                                              BlockDriverState **file)
> +{
> +    BDRVAllocTrackState *s = bs->opaque;
> +
> +    bool alloc = bdrv_dirty_bitmap_get(s->bitmap, offset);
> +    int64_t next_flipped;
> +    if (alloc) {
> +        next_flipped = bdrv_dirty_bitmap_next_zero(s->bitmap, offset, bytes);
> +    } else {
> +        next_flipped = bdrv_dirty_bitmap_next_dirty(s->bitmap, offset, bytes);
> +    }
> +
> +    /* in case not the entire region has the same state, we need to set pnum to
> +     * indicate for how many bytes our result is valid */
> +    *pnum = next_flipped == -1 ? bytes : next_flipped - offset;
> +    *map = offset;
> +
> +    if (alloc) {
> +        *file = bs->file->bs;
> +        return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> +    } else {
> +        *file = bs->backing->bs;
> +        return 0;
> +    }
> +}
> +
> +static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
> +                             BdrvChildRole role, BlockReopenQueue *reopen_queue,
> +                             uint64_t perm, uint64_t shared,
> +                             uint64_t *nperm, uint64_t *nshared)
> +{
> +    BDRVAllocTrackState *s = bs->opaque;
> +
> +    *nshared = BLK_PERM_ALL;
> +
> +    /* in case we're currently dropping ourselves, claim to not use any
> +     * permissions at all - which is fine, since from this point on we will
> +     * never issue a read or write anymore */
> +    if (s->dropping) {
> +        *nperm = 0;
> +        return;
> +    }
> +
> +    if (role & BDRV_CHILD_DATA) {
> +        *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
> +    } else {
> +        /* 'backing' is also a child of our BDS, but we don't expect it to be
> +         * writeable, so we only forward 'consistent read' */
> +        *nperm = perm & BLK_PERM_CONSISTENT_READ;
> +    }
> +}
> +
> +static void track_drop(void *opaque)
> +{
> +    BlockDriverState *bs = (BlockDriverState*)opaque;
> +    BDRVAllocTrackState *s = bs->opaque;
> +
> +    bdrv_drained_begin(bs);
> +
> +    s->dropping = true;
> +
> +    bdrv_child_refresh_perms(bs, bs->file, &error_abort);
> +    bdrv_replace_node(bs, bs->file->bs, &error_abort);
> +    bdrv_set_backing_hd(bs, NULL, &error_abort);
> +
> +    bdrv_drained_end(bs);
> +}
> +
> +static int track_change_backing_file(BlockDriverState *bs,
> +                                     const char *backing_file,
> +                                     const char *backing_fmt)
> +{
> +    if (backing_file == NULL && backing_fmt == NULL) {
> +        /* backing file has been disconnected, there's no longer any use for
> +         * this node, so let's remove ourselves from the block graph - we need
> +         * to schedule this for later however, since when this function is
> +         * called, the blockjob modifying us is probably not done yet and has a
> +         * blocker on 'bs' */
> +        aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, (void*)bs);
> +    }
> +
> +    return 0;
> +}
> +
> +static BlockDriver bdrv_alloc_track = {
> +    .format_name                      = "alloc-track",
> +    .instance_size                    = sizeof(BDRVAllocTrackState),
> +
> +    .bdrv_file_open                   = track_open,
> +    .bdrv_close                       = track_close,
> +    .bdrv_getlength                   = track_getlength,
> +    .bdrv_child_perm                  = track_child_perm,
> +    .bdrv_refresh_limits              = track_refresh_limits,
> +
> +    .bdrv_co_pwrite_zeroes            = track_co_pwrite_zeroes,
> +    .bdrv_co_pwritev                  = track_co_pwritev,
> +    .bdrv_co_preadv                   = track_co_preadv,
> +    .bdrv_co_pdiscard                 = track_co_pdiscard,
> +
> +    .bdrv_co_flush                    = track_co_flush,
> +    .bdrv_co_flush_to_disk            = track_co_flush,
> +
> +    .is_filter                        = true,
> +    .supports_backing                 = true,
> +
> +    .bdrv_co_block_status             = track_co_block_status,
> +    .bdrv_change_backing_file         = track_change_backing_file,
> +};
> +
> +static void bdrv_alloc_track_init(void)
> +{
> +    bdrv_register(&bdrv_alloc_track);
> +}
> +
> +block_init(bdrv_alloc_track_init);
> -- 
> 2.20.1




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [pbs-devel] [PATCH 00/11] live-restore for PBS snapshots
  2021-01-12 10:31   ` [pve-devel] [pbs-devel] " Thomas Lamprecht
@ 2021-01-12 11:23     ` Thomas Lamprecht
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Lamprecht @ 2021-01-12 11:23 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, aderumier,
	Proxmox VE development discussion

On 12.01.21 11:31, Thomas Lamprecht wrote:
> On 11.01.21 16:50, aderumier@odiso.com wrote:
>> For example, some kind of qemu-nbd + loop mount of the backup volume.
> 
> That is already possible through the qemu-nbd tool (we have a PBS block driver backend),
> see: https://lists.proxmox.com/pipermail/pve-user/2020-July/171883.html

Oh, I missed checking the pve-devel list, where Stefan already pointed you to
the client's map command (at the time I wrote the message linked above, it did
not yet exist) - so you can ignore that mail.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [PATCH qemu 03/11] block: add alloc-track driver
  2021-01-12 10:54   ` Wolfgang Bumiller
@ 2021-01-12 11:29     ` Stefan Reiter
  2021-01-12 13:42       ` Wolfgang Bumiller
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Reiter @ 2021-01-12 11:29 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: pve-devel, pbs-devel

On 12/01/2021 11:54, Wolfgang Bumiller wrote:
> one inline comment, otherwise lgtm
> 
> On Mon, Jan 11, 2021 at 12:14:01PM +0100, Stefan Reiter wrote:
>> [...]
>> +        qemu_iovec_init_slice(&local_qiov, qiov, cur_offset, local_bytes);
>> +
>> +        if (alloc) {
>> +            ret = bdrv_co_preadv(bs->file, local_offset, local_bytes,
>> +                                 &local_qiov, flags);
>> +        } else if (bs->backing) {
>> +            ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
>> +                                 &local_qiov, flags);
>> +        } else {
> 
> Here you handle the case where there's no backing file, but other
> functions (eg. track_co_block_status) simply assume that `bs->backing`
> is always valid.
> 
> Is this intentional?
> 

Not intentional, but the only place where bs->backing is assumed is in
track_co_block_status. The fix is simple enough: according to the comment
in 'include/block/block_int.h', when you return 0 the block layer
automatically checks whether a backing image is attached and assumes
zeroes otherwise.

So basically just to avoid a potential NULL deref:

-----------------
diff --git a/block/alloc-track.c b/block/alloc-track.c
index f2a4080579..b1a41be05a 100644
--- a/block/alloc-track.c
+++ b/block/alloc-track.c
@@ -222,10 +222,10 @@ static int coroutine_fn 
track_co_block_status(BlockDriverState *bs,
      if (alloc) {
          *file = bs->file->bs;
          return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
-    } else {
+    } else if (bs->backing) {
          *file = bs->backing->bs;
-        return 0;
      }
+    return 0;
  }

  static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
-----------------

Yeah?

> It makes me also wonder - since you just zero out the iov below - if
> this driver could maybe also replace the current zeroinit filter?
> Otherwise it's at least a good reminder that the zeroinit code could
> start making use of bitmaps as well ;-)
> 

But zeroinit operates on writes, not reads? Not sure how this driver 
would help...

>> +            ret = qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
>> +        }
>> [...]




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [pbs-devel] [PATCH qemu 01/11] PVE: explicitly add libuuid as linking dependency
  2021-01-11 11:13 ` [pve-devel] [PATCH qemu 01/11] PVE: explicitly add libuuid as linking dependency Stefan Reiter
@ 2021-01-12 12:04   ` Thomas Lamprecht
  2021-01-18 10:27     ` [pve-devel] [PATCH pve-qemu] explicitly specify " Stefan Reiter
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Lamprecht @ 2021-01-12 12:04 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter, pve-devel

On 11.01.21 12:13, Stefan Reiter wrote:
> This previously only worked since linking against glusterfs pulls in
> libuuid as well. Make it more explicit and allow debug builds without
> glusterfs.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> Unrelated to rest of series, just annoyed me.

can you send this as a (standalone) patch on top of the patch(es) adding the respective
object to the buildsystem?




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [PATCH qemu-server 06/11] make qemu_drive_mirror_monitor more generic
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 06/11] make qemu_drive_mirror_monitor more generic Stefan Reiter
@ 2021-01-12 13:19   ` Wolfgang Bumiller
  0 siblings, 0 replies; 26+ messages in thread
From: Wolfgang Bumiller @ 2021-01-12 13:19 UTC (permalink / raw)
  To: Stefan Reiter; +Cc: pve-devel, pbs-devel

On Mon, Jan 11, 2021 at 12:14:04PM +0100, Stefan Reiter wrote:
> ...so it works with other block jobs as well. Intended use case is
> block-stream, which also requires a new "auto" (wait only) completion
> mode, since it finishes automatically anyway.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
>  PVE/QemuServer.pm | 41 +++++++++++++++++++++++------------------
>  1 file changed, 23 insertions(+), 18 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index bca5669..d517dae 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -6777,15 +6777,16 @@ sub qemu_drive_mirror {
>  	die "mirroring error: $err\n";
>      }
>  
> -    qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga);
> +    qemu_drive_job_monitor("mirror", $vmid, $vmiddst, $jobs, $completion, $qga);
>  }
>  
>  # $completion can be either
>  # 'complete': wait until all jobs are ready, block-job-complete them (default)
>  # 'cancel': wait until all jobs are ready, block-job-cancel them
>  # 'skip': wait until all jobs are ready, return with block jobs in ready state
> -sub qemu_drive_mirror_monitor {
> -    my ($vmid, $vmiddst, $jobs, $completion, $qga) = @_;
> +# 'auto': wait until all jobs disappear, only use for jobs which complete automatically
> +sub qemu_drive_job_monitor {
> +    my ($op, $vmid, $vmiddst, $jobs, $completion, $qga) = @_;

I'd feel *much* safer if $op was added to the end and initialized with

    $op //= 'mirror';

>  
>      $completion //= 'complete';
>  
> @@ -6793,46 +6794,50 @@ sub qemu_drive_mirror_monitor {
>  	my $err_complete = 0;
>  
>  	while (1) {
> -	    die "storage migration timed out\n" if $err_complete > 300;
> +	    die "block job ('$op') timed out\n" if $err_complete > 300;
>  
>  	    my $stats = mon_cmd($vmid, "query-block-jobs");
>  
> -	    my $running_mirror_jobs = {};
> +	    my $running_jobs = {};
>  	    foreach my $stat (@$stats) {
> -		next if $stat->{type} ne 'mirror';
> -		$running_mirror_jobs->{$stat->{device}} = $stat;
> +		next if $stat->{type} ne $op;
> +		$running_jobs->{$stat->{device}} = $stat;
>  	    }
>  
>  	    my $readycounter = 0;
>  
>  	    foreach my $job (keys %$jobs) {
>  
> -	        if(defined($jobs->{$job}->{complete}) && !defined($running_mirror_jobs->{$job})) {
> -		    print "$job : finished\n";
> +		my $vanished = !defined($running_jobs->{$job});
> +		my $complete = defined($jobs->{$job}->{complete}) && $vanished;
> +	        if($complete || ($vanished && $completion eq 'auto')) {
> +		    print "$job: finished\n";
>  		    delete $jobs->{$job};
>  		    next;
>  		}
>  
> -		die "$job: mirroring has been cancelled\n" if !defined($running_mirror_jobs->{$job});
> +		die "$job: '$op' has been cancelled\n" if !defined($running_jobs->{$job});
>  
> -		my $busy = $running_mirror_jobs->{$job}->{busy};
> -		my $ready = $running_mirror_jobs->{$job}->{ready};
> -		if (my $total = $running_mirror_jobs->{$job}->{len}) {
> -		    my $transferred = $running_mirror_jobs->{$job}->{offset} || 0;
> +		my $busy = $running_jobs->{$job}->{busy};
> +		my $ready = $running_jobs->{$job}->{ready};
> +		if (my $total = $running_jobs->{$job}->{len}) {
> +		    my $transferred = $running_jobs->{$job}->{offset} || 0;
>  		    my $remaining = $total - $transferred;
>  		    my $percent = sprintf "%.2f", ($transferred * 100 / $total);
>  
>  		    print "$job: transferred: $transferred bytes remaining: $remaining bytes total: $total bytes progression: $percent % busy: $busy ready: $ready \n";
>  		}
>  
> -		$readycounter++ if $running_mirror_jobs->{$job}->{ready};
> +		$readycounter++ if $running_jobs->{$job}->{ready};
>  	    }
>  
>  	    last if scalar(keys %$jobs) == 0;
>  
>  	    if ($readycounter == scalar(keys %$jobs)) {
> -		print "all mirroring jobs are ready \n";
> -		last if $completion eq 'skip'; #do the complete later
> +		print "all '$op' jobs are ready\n";
> +
> +		# do the complete later (or has already been done)
> +		last if $completion eq 'skip' || $completion eq 'auto';
>  
>  		if ($vmiddst && $vmiddst != $vmid) {
>  		    my $agent_running = $qga && qga_check_running($vmid);
> @@ -6888,7 +6893,7 @@ sub qemu_drive_mirror_monitor {
>  
>      if ($err) {
>  	eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
> -	die "mirroring error: $err";
> +	die "block job ('$op') error: $err";
>      }
>  
>  }
> -- 
> 2.20.1
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [PATCH qemu 03/11] block: add alloc-track driver
  2021-01-12 11:29     ` Stefan Reiter
@ 2021-01-12 13:42       ` Wolfgang Bumiller
  0 siblings, 0 replies; 26+ messages in thread
From: Wolfgang Bumiller @ 2021-01-12 13:42 UTC (permalink / raw)
  To: Stefan Reiter; +Cc: pve-devel, pbs-devel

On Tue, Jan 12, 2021 at 12:29:05PM +0100, Stefan Reiter wrote:
> On 12/01/2021 11:54, Wolfgang Bumiller wrote:
> > one inline comment, otherwise lgtm
> > 
> > On Mon, Jan 11, 2021 at 12:14:01PM +0100, Stefan Reiter wrote:
> > > Add a new filter node 'alloc-track', which separates reads and writes to
> > > different children, thus allowing one to put a backing image behind any
> > > blockdev (regardless of driver support). Since we can't detect any
> > > pre-allocated blocks, we can only track new writes, hence the write
> > > target ('file') for this node must always be empty.
> > > 
> > > Intended use case is for live restoring, i.e. add a backup image as a
> > > block device into a VM, then put an alloc-track on the restore target
> > > and set the backup as backing. With this, one can use a regular
> > > 'block-stream' to restore the image, while the VM can already run in the
> > > background. Copy-on-read will help make progress as the VM reads as
> > > well.
> > > 
> > > Previously this only worked if the target supported backing images, so
> > > only for qcow2; with alloc-track, any driver can be used for the target.
> > > 
> > > If 'auto-remove' is set, alloc-track will automatically detach itself
> > > once the backing image is removed. It will be replaced by 'file'.
> > > 
> > > Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> > > ---
> > > 
> > > Potentially for upstream as well, but let's maybe test it a bit here first ;)
> > > 
> > >   block/Makefile.objs |   1 +
> > >   block/alloc-track.c | 319 ++++++++++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 320 insertions(+)
> > >   create mode 100644 block/alloc-track.c
> > > 
> > > diff --git a/block/Makefile.objs b/block/Makefile.objs
> > > index 28fb3b7f7c..04f91b8cc9 100644
> > > --- a/block/Makefile.objs
> > > +++ b/block/Makefile.objs
> > > @@ -13,6 +13,7 @@ block-obj-$(CONFIG_QED) += qed-check.o
> > >   block-obj-y += vhdx.o vhdx-endian.o vhdx-log.o
> > >   block-obj-y += quorum.o
> > >   block-obj-y += zeroinit.o
> > > +block-obj-y += alloc-track.o
> > >   block-obj-y += blkdebug.o blkverify.o blkreplay.o
> > >   block-obj-$(CONFIG_PARALLELS) += parallels.o
> > >   block-obj-y += blklogwrites.o
> > > diff --git a/block/alloc-track.c b/block/alloc-track.c
> > > new file mode 100644
> > > index 0000000000..f2a4080579
> > > --- /dev/null
> > > +++ b/block/alloc-track.c
> > > @@ -0,0 +1,319 @@
> > > +/*
> > > + * Filter to allow backing images to be applied to any node. Assumes a blank
> > > + * image to begin with; only new writes are tracked as allocated, thus this
> > > + * must never be put on a node that already contains data.
> > > + *
> > > + * Copyright (c) 2020 Proxmox Server Solutions GmbH
> > > + * Copyright (c) 2020 Stefan Reiter <s.reiter@proxmox.com>
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > + * See the COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "qapi/error.h"
> > > +#include "block/block_int.h"
> > > +#include "qapi/qmp/qdict.h"
> > > +#include "qapi/qmp/qstring.h"
> > > +#include "qemu/cutils.h"
> > > +#include "qemu/option.h"
> > > +#include "qemu/module.h"
> > > +
> > > +#define TRACK_OPT_AUTO_REMOVE "auto-remove"
> > > +
> > > +typedef struct {
> > > +    BdrvDirtyBitmap *bitmap;
> > > +    bool dropping;
> > > +    bool auto_remove;
> > > +} BDRVAllocTrackState;
> > > +
> > > +static QemuOptsList runtime_opts = {
> > > +    .name = "alloc-track",
> > > +    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
> > > +    .desc = {
> > > +        {
> > > +            .name = TRACK_OPT_AUTO_REMOVE,
> > > +            .type = QEMU_OPT_BOOL,
> > > +            .help = "automatically replace this node with 'file' when 'backing' "
> > > +                    "is detached",
> > > +        },
> > > +        { /* end of list */ }
> > > +    },
> > > +};
> > > +
> > > +static void track_refresh_limits(BlockDriverState *bs, Error **errp)
> > > +{
> > > +    BlockDriverInfo bdi;
> > > +
> > > +    if (!bs->file) {
> > > +        return;
> > > +    }
> > > +
> > > +    /* always use alignment from underlying write device so RMW cycle for
> > > +     * bdrv_pwritev reads data from our backing via track_co_preadv (no partial
> > > +     * cluster allocation in 'file') */
> > > +    bdrv_get_info(bs->file->bs, &bdi);
> > > +    bs->bl.request_alignment = MAX(bs->file->bs->bl.request_alignment,
> > > +                                   MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
> > > +}
> > > +
> > > +static int track_open(BlockDriverState *bs, QDict *options, int flags,
> > > +                      Error **errp)
> > > +{
> > > +    BDRVAllocTrackState *s = bs->opaque;
> > > +    QemuOpts *opts;
> > > +    Error *local_err = NULL;
> > > +    int ret = 0;
> > > +
> > > +    opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
> > > +    qemu_opts_absorb_qdict(opts, options, &local_err);
> > > +    if (local_err) {
> > > +        error_propagate(errp, local_err);
> > > +        ret = -EINVAL;
> > > +        goto fail;
> > > +    }
> > > +
> > > +    s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
> > > +
> > > +    /* open the target (write) node, backing will be attached by block layer */
> > > +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
> > > +                               BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
> > > +                               &local_err);
> > > +    if (local_err) {
> > > +        ret = -EINVAL;
> > > +        error_propagate(errp, local_err);
> > > +        goto fail;
> > > +    }
> > > +
> > > +    track_refresh_limits(bs, errp);
> > > +    uint64_t gran = bs->bl.request_alignment;
> > > +    s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, gran, NULL, &local_err);
> > > +    if (local_err) {
> > > +        ret = -EIO;
> > > +        error_propagate(errp, local_err);
> > > +        goto fail;
> > > +    }
> > > +
> > > +    s->dropping = false;
> > > +
> > > +fail:
> > > +    if (ret < 0) {
> > > +        bdrv_unref_child(bs, bs->file);
> > > +        if (s->bitmap) {
> > > +            bdrv_release_dirty_bitmap(s->bitmap);
> > > +        }
> > > +    }
> > > +    qemu_opts_del(opts);
> > > +    return ret;
> > > +}
> > > +
> > > +static void track_close(BlockDriverState *bs)
> > > +{
> > > +    BDRVAllocTrackState *s = bs->opaque;
> > > +    if (s->bitmap) {
> > > +        bdrv_release_dirty_bitmap(s->bitmap);
> > > +    }
> > > +}
> > > +
> > > +static int64_t track_getlength(BlockDriverState *bs)
> > > +{
> > > +    return bdrv_getlength(bs->file->bs);
> > > +}
> > > +
> > > +static int coroutine_fn track_co_preadv(BlockDriverState *bs,
> > > +    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
> > > +{
> > > +    BDRVAllocTrackState *s = bs->opaque;
> > > +    QEMUIOVector local_qiov;
> > > +    int ret;
> > > +
> > > +    /* 'cur_offset' is relative to 'offset', 'local_offset' to image start */
> > > +    uint64_t cur_offset, local_offset;
> > > +    int64_t local_bytes;
> > > +    bool alloc;
> > > +
> > > +    /* a read request can span multiple granularity-sized chunks, and can thus
> > > +     * contain blocks with different allocation status - we could just iterate
> > > +     * granularity-wise, but for better performance use bdrv_dirty_bitmap_next_X
> > > +     * to find the next flip and consider everything up to that in one go */
> > > +    for (cur_offset = 0; cur_offset < bytes; cur_offset += local_bytes) {
> > > +        local_offset = offset + cur_offset;
> > > +        alloc = bdrv_dirty_bitmap_get(s->bitmap, local_offset);
> > > +        if (alloc) {
> > > +            local_bytes = bdrv_dirty_bitmap_next_zero(s->bitmap, local_offset,
> > > +                                                      bytes - cur_offset);
> > > +        } else {
> > > +            local_bytes = bdrv_dirty_bitmap_next_dirty(s->bitmap, local_offset,
> > > +                                                       bytes - cur_offset);
> > > +        }
> > > +
> > > +        /* _bitmap_next_X returns -1 if no flip is found within the limit,
> > > +         * otherwise the offset of the next flip (relative to image start) */
> > > +        local_bytes = local_bytes < 0 ?
> > > +            bytes - cur_offset :
> > > +            local_bytes - local_offset;
> > > +
> > > +        qemu_iovec_init_slice(&local_qiov, qiov, cur_offset, local_bytes);
> > > +
> > > +        if (alloc) {
> > > +            ret = bdrv_co_preadv(bs->file, local_offset, local_bytes,
> > > +                                 &local_qiov, flags);
> > > +        } else if (bs->backing) {
> > > +            ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
> > > +                                 &local_qiov, flags);
> > > +        } else {
> > 
> > Here you handle the case where there's no backing file, but other
> > functions (eg. track_co_block_status) simply assume that `bs->backing`
> > is always valid.
> > 
> > Is this intentional?
> > 
> 
> Not intentional, but the only place where bs->backing is assumed valid is
> track_co_block_status. The fix for this is simple enough: according to the
> comment in 'include/block/block_int.h', when you return 0, the block layer
> automatically checks whether a backing image is attached and assumes zero
> otherwise.
> 
> So basically just to avoid a potential NULL deref:
> 
> -----------------
> diff --git a/block/alloc-track.c b/block/alloc-track.c
> index f2a4080579..b1a41be05a 100644
> --- a/block/alloc-track.c
> +++ b/block/alloc-track.c
> @@ -222,10 +222,10 @@ static int coroutine_fn
> track_co_block_status(BlockDriverState *bs,
>      if (alloc) {
>          *file = bs->file->bs;
>          return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> -    } else {
> +    } else if (bs->backing) {
>          *file = bs->backing->bs;
> -        return 0;
>      }
> +    return 0;
>  }
> 
>  static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
> -----------------
> 
> Yeah?
> 
> > It also makes me wonder - since you just zero out the iov below - if
> > this driver could maybe also replace the current zeroinit filter?
> > Otherwise it's at least a good reminder that the zeroinit code could
> > start making use of bitmaps as well ;-)
> > 
> 
> But zeroinit operates on writes, not reads? Not sure how this driver would
> help...

Zeroinit tracks write offsets; this one tracks writes in general. The main
task of zeroinit is to make `bdrv_co_pwrite_zeroes` a no-op if the
destination isn't "dirty" (under the assumption that its initial state is
already all zeroes). So the difference would just be that there's no
backing file and that `pwrite_zeroes` checks the bitmap.
But yeah, come to think of it, that might be conflating too many things
into one driver.
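
For reference, a rough sketch of what I mean anyway - assuming the
zeroinit state struct gained a dirty bitmap ('s->bitmap') that its write
path marks, and that the destination really starts out all zeroes (names
here are just for illustration, this is not code from the series):

    static int coroutine_fn zeroinit_co_pwrite_zeroes(BlockDriverState *bs,
        int64_t offset, int bytes, BdrvRequestFlags flags)
    {
        BDRVZeroinitState *s = bs->opaque;

        /* no write ever touched this range, so the pre-zeroed destination
         * still reads as zeroes - the request can be dropped entirely */
        if (bdrv_dirty_bitmap_next_dirty(s->bitmap, offset, bytes) < 0) {
            return 0;
        }

        /* otherwise pass it down as usual */
        return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
    }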




^ permalink raw reply	[flat|nested] 26+ messages in thread

* [pve-devel] [PATCH pve-qemu] explicitly specify libuuid as linking dependency
  2021-01-12 12:04   ` [pve-devel] [pbs-devel] " Thomas Lamprecht
@ 2021-01-18 10:27     ` Stefan Reiter
  2021-01-27  8:29       ` Stefan Reiter
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Reiter @ 2021-01-18 10:27 UTC (permalink / raw)
  To: pve-devel, t.lamprecht

This previously only worked since linking against glusterfs pulls in
libuuid as well. Make it more explicit and allow debug builds without
glusterfs.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

patches in patches in patches in

 ...026-PVE-Backup-add-vma-backup-format-code.patch | 14 +++++++++++---
 ...VE-Backup-proxmox-backup-patches-for-qemu.patch | 14 +++++++++++---
 ...PVE-add-query_proxmox_support-QMP-command.patch |  2 +-
 ...fix-missing-crypt-and-compress-parameters.patch |  2 +-
 ...PBS-write-callback-with-big-blocks-correc.patch |  2 +-
 ...-zero-block-handling-to-PBS-dump-callback.patch |  2 +-
 ...48-PVE-add-query-pbs-bitmap-info-QMP-call.patch |  2 +-
 ...edirect-stderr-to-journal-when-daemonized.patch | 10 +++++-----
 ...Use-a-transaction-to-synchronize-job-stat.patch |  2 +-
 ...Use-more-coroutines-and-don-t-block-on-fi.patch |  2 +-
 ...-clean-up-error-handling-for-create_backu.patch |  2 +-
 ...PVE-Migrate-dirty-bitmap-state-via-savevm.patch |  2 +-
 12 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/debian/patches/pve/0026-PVE-Backup-add-vma-backup-format-code.patch b/debian/patches/pve/0026-PVE-Backup-add-vma-backup-format-code.patch
index bbc4562..94e27f8 100644
--- a/debian/patches/pve/0026-PVE-Backup-add-vma-backup-format-code.patch
+++ b/debian/patches/pve/0026-PVE-Backup-add-vma-backup-format-code.patch
@@ -5,12 +5,12 @@ Subject: [PATCH] PVE-Backup: add vma backup format code
 
 ---
  Makefile      |   3 +-
- Makefile.objs |   1 +
+ Makefile.objs |   2 +
  vma-reader.c  | 857 ++++++++++++++++++++++++++++++++++++++++++++++++++
  vma-writer.c  | 790 ++++++++++++++++++++++++++++++++++++++++++++++
  vma.c         | 839 ++++++++++++++++++++++++++++++++++++++++++++++++
  vma.h         | 150 +++++++++
- 6 files changed, 2639 insertions(+), 1 deletion(-)
+ 6 files changed, 2640 insertions(+), 1 deletion(-)
  create mode 100644 vma-reader.c
  create mode 100644 vma-writer.c
  create mode 100644 vma.c
@@ -38,7 +38,7 @@ index 13dd708c4a..7b8c17ce2d 100644
  qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o $(COMMON_LDADDS)
  
 diff --git a/Makefile.objs b/Makefile.objs
-index a1307c12a8..ade7b17a69 100644
+index a1307c12a8..674b7ccf3e 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -17,6 +17,7 @@ block-obj-y = block/ nbd/ scsi/
@@ -49,6 +49,14 @@ index a1307c12a8..ade7b17a69 100644
  
  block-obj-m = block/
  
+@@ -51,6 +52,7 @@ common-obj-y += net/
+ common-obj-y += qdev-monitor.o
+ common-obj-$(CONFIG_WIN32) += os-win32.o
+ common-obj-$(CONFIG_POSIX) += os-posix.o
++vma-writer.o-libs := -luuid
+ 
+ common-obj-$(CONFIG_LINUX) += fsdev/
+ 
 diff --git a/vma-reader.c b/vma-reader.c
 new file mode 100644
 index 0000000000..2b1d1cdab3
diff --git a/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch b/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
index 7896692..c669717 100644
--- a/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
+++ b/debian/patches/pve/0028-PVE-Backup-proxmox-backup-patches-for-qemu.patch
@@ -5,7 +5,7 @@ Subject: [PATCH] PVE-Backup: proxmox backup patches for qemu
 
 ---
  Makefile                       |   1 +
- Makefile.objs                  |   2 +
+ Makefile.objs                  |   3 +
  Makefile.target                |   2 +-
  block/monitor/block-hmp-cmds.c |  33 ++
  blockdev.c                     |   1 +
@@ -20,7 +20,7 @@ Subject: [PATCH] PVE-Backup: proxmox backup patches for qemu
  qapi/block-core.json           | 109 ++++
  qapi/common.json               |  13 +
  qapi/misc.json                 |  13 -
- 16 files changed, 1440 insertions(+), 15 deletions(-)
+ 16 files changed, 1441 insertions(+), 15 deletions(-)
  create mode 100644 proxmox-backup-client.c
  create mode 100644 proxmox-backup-client.h
  create mode 100644 pve-backup.c
@@ -38,7 +38,7 @@ index 7b8c17ce2d..aec216968d 100644
  
  qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o $(COMMON_LDADDS)
 diff --git a/Makefile.objs b/Makefile.objs
-index ade7b17a69..240eb503f2 100644
+index 674b7ccf3e..3c22019042 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -33,6 +33,7 @@ endif # CONFIG_SOFTMMU or CONFIG_TOOLS
@@ -57,6 +57,14 @@ index ade7b17a69..240eb503f2 100644
  common-obj-y += dump/
  common-obj-y += job-qmp.o
  common-obj-y += monitor/
+@@ -53,6 +55,7 @@ common-obj-y += qdev-monitor.o
+ common-obj-$(CONFIG_WIN32) += os-win32.o
+ common-obj-$(CONFIG_POSIX) += os-posix.o
+ vma-writer.o-libs := -luuid
++pve-backup.o-libs := -luuid
+ 
+ common-obj-$(CONFIG_LINUX) += fsdev/
+ 
 diff --git a/Makefile.target b/Makefile.target
 index ffa2657269..fd3fd6d2a7 100644
 --- a/Makefile.target
diff --git a/debian/patches/pve/0044-PVE-add-query_proxmox_support-QMP-command.patch b/debian/patches/pve/0044-PVE-add-query_proxmox_support-QMP-command.patch
index e2dc20f..5d14f08 100644
--- a/debian/patches/pve/0044-PVE-add-query_proxmox_support-QMP-command.patch
+++ b/debian/patches/pve/0044-PVE-add-query_proxmox_support-QMP-command.patch
@@ -16,7 +16,7 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
  2 files changed, 33 insertions(+)
 
 diff --git a/pve-backup.c b/pve-backup.c
-index bfb648d6b5..6bf138cfc6 100644
+index bfb648d6b5..ba9d0d8a86 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -1051,3 +1051,11 @@ BackupStatus *qmp_query_backup(Error **errp)
diff --git a/debian/patches/pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch b/debian/patches/pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch
index 34b0f51..d4a03be 100644
--- a/debian/patches/pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch
+++ b/debian/patches/pve/0045-pbs-fix-missing-crypt-and-compress-parameters.patch
@@ -9,7 +9,7 @@ Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
  1 file changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/pve-backup.c b/pve-backup.c
-index 6bf138cfc6..cd3a132d8b 100644
+index ba9d0d8a86..e1dcf10a45 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -958,6 +958,8 @@ UuidInfo *qmp_backup(
diff --git a/debian/patches/pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch b/debian/patches/pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch
index d45a455..7457eb6 100644
--- a/debian/patches/pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch
+++ b/debian/patches/pve/0046-PVE-handle-PBS-write-callback-with-big-blocks-correc.patch
@@ -17,7 +17,7 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
  1 file changed, 22 insertions(+), 8 deletions(-)
 
 diff --git a/pve-backup.c b/pve-backup.c
-index cd3a132d8b..f14273645a 100644
+index e1dcf10a45..3eba85506a 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -67,6 +67,7 @@ opts_init(pvebackup_init);
diff --git a/debian/patches/pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch b/debian/patches/pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch
index 930ec8f..3bb6b35 100644
--- a/debian/patches/pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch
+++ b/debian/patches/pve/0047-PVE-add-zero-block-handling-to-PBS-dump-callback.patch
@@ -20,7 +20,7 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
  1 file changed, 9 insertions(+), 5 deletions(-)
 
 diff --git a/pve-backup.c b/pve-backup.c
-index f14273645a..bd802c6205 100644
+index 3eba85506a..40c2697b37 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -8,6 +8,7 @@
diff --git a/debian/patches/pve/0048-PVE-add-query-pbs-bitmap-info-QMP-call.patch b/debian/patches/pve/0048-PVE-add-query-pbs-bitmap-info-QMP-call.patch
index 54f1dfa..f011610 100644
--- a/debian/patches/pve/0048-PVE-add-query-pbs-bitmap-info-QMP-call.patch
+++ b/debian/patches/pve/0048-PVE-add-query-pbs-bitmap-info-QMP-call.patch
@@ -68,7 +68,7 @@ index 3ff014d32a..c3227a1498 100644
                             info->zero_bytes, zero_per);
  
 diff --git a/pve-backup.c b/pve-backup.c
-index bd802c6205..2db4a62580 100644
+index 40c2697b37..1e7b92a950 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -46,6 +46,7 @@ static struct PVEBackupState {
diff --git a/debian/patches/pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch b/debian/patches/pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch
index 1d54282..ae92ff4 100644
--- a/debian/patches/pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch
+++ b/debian/patches/pve/0049-PVE-redirect-stderr-to-journal-when-daemonized.patch
@@ -1,6 +1,6 @@
 From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
 From: Stefan Reiter <s.reiter@proxmox.com>
-Date: Tue, 30 Jun 2020 13:10:10 +0200
+Date: Tue, 12 Jan 2021 14:12:20 +0100
 Subject: [PATCH] PVE: redirect stderr to journal when daemonized
 
 QEMU uses the logging for error messages usually, so LOG_ERR is most
@@ -13,13 +13,13 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
  2 files changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/Makefile.objs b/Makefile.objs
-index 240eb503f2..c7ba4e11e7 100644
+index 3c22019042..744c26d3b0 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
-@@ -54,6 +54,7 @@ common-obj-y += net/
- common-obj-y += qdev-monitor.o
- common-obj-$(CONFIG_WIN32) += os-win32.o
+@@ -56,6 +56,7 @@ common-obj-$(CONFIG_WIN32) += os-win32.o
  common-obj-$(CONFIG_POSIX) += os-posix.o
+ vma-writer.o-libs := -luuid
+ pve-backup.o-libs := -luuid
 +os-posix.o-libs := -lsystemd
  
  common-obj-$(CONFIG_LINUX) += fsdev/
diff --git a/debian/patches/pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch b/debian/patches/pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
index daa2498..bb3b026 100644
--- a/debian/patches/pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
+++ b/debian/patches/pve/0051-PVE-Backup-Use-a-transaction-to-synchronize-job-stat.patch
@@ -16,7 +16,7 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
  1 file changed, 49 insertions(+), 118 deletions(-)
 
 diff --git a/pve-backup.c b/pve-backup.c
-index 2db4a62580..b52f4a9364 100644
+index 1e7b92a950..f3df43eb8c 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -52,6 +52,7 @@ static struct PVEBackupState {
diff --git a/debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch b/debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch
index 49213d9..556890f 100644
--- a/debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch
+++ b/debian/patches/pve/0052-PVE-Backup-Use-more-coroutines-and-don-t-block-on-fi.patch
@@ -38,7 +38,7 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
  2 files changed, 95 insertions(+), 58 deletions(-)
 
 diff --git a/pve-backup.c b/pve-backup.c
-index b52f4a9364..4402c0cb17 100644
+index f3df43eb8c..f3b8ce1f3a 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -33,7 +33,9 @@ const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
diff --git a/debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch b/debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
index 873fd37..92dc3e0 100644
--- a/debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
+++ b/debian/patches/pve/0053-PVE-fix-and-clean-up-error-handling-for-create_backu.patch
@@ -22,7 +22,7 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
  1 file changed, 54 insertions(+), 25 deletions(-)
 
 diff --git a/pve-backup.c b/pve-backup.c
-index 4402c0cb17..c7cde0fb0e 100644
+index f3b8ce1f3a..ded90cb822 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -50,6 +50,7 @@ static struct PVEBackupState {
diff --git a/debian/patches/pve/0055-PVE-Migrate-dirty-bitmap-state-via-savevm.patch b/debian/patches/pve/0055-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
index 1c64645..f325c0a 100644
--- a/debian/patches/pve/0055-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
+++ b/debian/patches/pve/0055-PVE-Migrate-dirty-bitmap-state-via-savevm.patch
@@ -159,7 +159,7 @@ index 0000000000..29f2b3860d
 +                         NULL);
 +}
 diff --git a/pve-backup.c b/pve-backup.c
-index c7cde0fb0e..f65f1dda26 100644
+index ded90cb822..6b25292ba1 100644
 --- a/pve-backup.c
 +++ b/pve-backup.c
 @@ -1130,5 +1130,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
-- 
2.20.1





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [PATCH pve-qemu] explicitly specify libuuid as linking dependency
  2021-01-18 10:27     ` [pve-devel] [PATCH pve-qemu] explicitly specify " Stefan Reiter
@ 2021-01-27  8:29       ` Stefan Reiter
  0 siblings, 0 replies; 26+ messages in thread
From: Stefan Reiter @ 2021-01-27  8:29 UTC (permalink / raw)
  To: pve-devel

On 18/01/2021 11:27, Stefan Reiter wrote:
> This previously only worked since linking against glusterfs pulls in
> libuuid as well. Make it more explicit and allow debug builds without
> glusterfs.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 

FYI: This patch can be seen as superseded by my 5.2 series, which 
applies on top of current master and also includes the libuuid fixes, 
since I had to rewrite all the custom linking for meson anyway.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [pve-devel] [PATCH qemu-server 07/11] cfg2cmd: allow PBS snapshots as backing files for drives
  2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 07/11] cfg2cmd: allow PBS snapshots as backing files for drives Stefan Reiter
@ 2021-01-28 16:25   ` Thomas Lamprecht
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Lamprecht @ 2021-01-28 16:25 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter, pbs-devel

On 11.01.21 12:14, Stefan Reiter wrote:
> Uses the custom 'alloc-track' filter node to redirect writes to the
> original drive's target, while unwritten blocks will be read from the
> specified PBS snapshot.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
>  PVE/QemuServer.pm | 42 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index d517dae..6de6ff6 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -3022,7 +3022,8 @@ sub query_understood_cpu_flags {
>  }
>  
>  sub config_to_command {
> -    my ($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu) = @_;
> +    my ($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu,
> +        $pbs_backing) = @_;
>  
>      my $cmd = [];
>      my $globalFlags = [];
> @@ -3521,6 +3522,32 @@ sub config_to_command {
>  	my $drive_cmd = print_drive_commandline_full($storecfg, $vmid, $drive);
>  	$drive_cmd .= ',readonly' if PVE::QemuConfig->is_template($conf);
>  
> +	if ($pbs_backing && defined($pbs_backing->{$ds})) {

please avoid extending config_to_command too much; move this to its own method.

> +	    my $pbs_conf = $pbs_backing->{$ds};
> +	    my $pbs_name = "drive-$ds-pbs";
> +
> +	    # add PBS block device to access snapshot from QEMU
> +	    my $blockdev = "driver=pbs,node-name=$pbs_name,read-only=on";
> +	    $blockdev .= ",repository=$pbs_conf->{repository}";
> +	    $blockdev .= ",snapshot=$pbs_conf->{snapshot}";
> +	    $blockdev .= ",archive=$pbs_conf->{archive}";
> +	    $blockdev .= ",keyfile=$pbs_conf->{keyfile}" if $pbs_conf->{keyfile};
> +	    push @$devices, '-blockdev', $blockdev;
> +
> +	    # modify drive command to put the desired target file behind an
> +	    # 'alloc-track' node
> +	    $drive_cmd =~ s/file=([^,]+)/file.file.filename=$1/;
> +	    $drive_cmd =~ s/,format=([^,]+)/,file.driver=$1,format=alloc-track/;

This seems really fragile and hacky; please avoid string replacements on
variables we assemble ourselves.

> +	    $drive_cmd .= ",backing=$pbs_name";
> +	    $drive_cmd .= ",auto-remove=on";
> +
> +	    # note: 'cache' and 'aio' directly affect the 'drive' parameter, so
> +	    # we don't need to change them to 'file.', but 'detect-zeroes' works
> +	    # per blockdev and we want it to persist after the alloc-track is
> +	    # removed, so put it on 'file' directly
> +	    $drive_cmd =~ s/,detect-zeroes=([^,]+)/,file.detect-zeroes=$1/;
> +	}
> +
>  	push @$devices, '-drive',$drive_cmd;
>  	push @$devices, '-device', print_drivedevice_full(
>  	    $storecfg, $conf, $vmid, $drive, $bridges, $arch, $machine_type);
> @@ -4915,6 +4942,15 @@ sub vm_start {
>  #   timeout => in seconds
>  #   paused => start VM in paused state (backup)
>  #   resume => resume from hibernation
> +#   pbs-backing => {
> +#      sata0 => {
> +#         repository
> +#         snapshot
> +#         keyfile
> +#         archive
> +#      },
> +#      virtio2 => ...
> +#   }
>  # migrate_opts:
>  #   nbd => volumes for NBD exports (vm_migrate_alloc_nbd_disks)
>  #   migratedfrom => source node
> @@ -4961,8 +4997,8 @@ sub vm_start_nolock {
>  	print "Resuming suspended VM\n";
>      }
>  
> -    my ($cmd, $vollist, $spice_port) =
> -	config_to_command($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu);
> +    my ($cmd, $vollist, $spice_port) = config_to_command($storecfg, $vmid,
> +	$conf, $defaults, $forcemachine, $forcecpu, $params->{'pbs-backing'});
>  
>      my $migration_ip;
>      my $get_migration_ip = sub {
> 






^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread

Thread overview: 26+ messages
2021-01-11 11:13 [pve-devel] [PATCH 00/11] live-restore for PBS snapshots Stefan Reiter
2021-01-11 11:13 ` [pve-devel] [PATCH qemu 01/11] PVE: explicitly add libuuid as linking dependency Stefan Reiter
2021-01-12 12:04   ` [pve-devel] [pbs-devel] " Thomas Lamprecht
2021-01-18 10:27     ` [pve-devel] [PATCH pve-qemu] explicitly specify " Stefan Reiter
2021-01-27  8:29       ` Stefan Reiter
2021-01-11 11:14 ` [pve-devel] [PATCH qemu 02/11] PVE: block/pbs: fast-path reads without allocation if possible Stefan Reiter
2021-01-12  9:29   ` Wolfgang Bumiller
2021-01-11 11:14 ` [pve-devel] [PATCH qemu 03/11] block: add alloc-track driver Stefan Reiter
2021-01-12 10:54   ` Wolfgang Bumiller
2021-01-12 11:29     ` Stefan Reiter
2021-01-12 13:42       ` Wolfgang Bumiller
2021-01-11 11:14 ` [pve-devel] [PATCH proxmox-backup 04/11] RemoteChunkReader: add LRU cached variant Stefan Reiter
2021-01-11 11:14 ` [pve-devel] [PATCH proxmox-backup-qemu 05/11] access: use bigger cache and LRU chunk reader Stefan Reiter
2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 06/11] make qemu_drive_mirror_monitor more generic Stefan Reiter
2021-01-12 13:19   ` Wolfgang Bumiller
2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 07/11] cfg2cmd: allow PBS snapshots as backing files for drives Stefan Reiter
2021-01-28 16:25   ` Thomas Lamprecht
2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 08/11] enable live-restore for PBS Stefan Reiter
2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 09/11] extract register_qmeventd_handle to QemuServer.pm Stefan Reiter
2021-01-11 11:14 ` [pve-devel] [PATCH qemu-server 10/11] live-restore: register qmeventd handle Stefan Reiter
2021-01-11 11:14 ` [pve-devel] [PATCH manager 11/11] ui: restore: add live-restore checkbox Stefan Reiter
2021-01-11 15:50 ` [pve-devel] [PATCH 00/11] live-restore for PBS snapshots aderumier
2021-01-11 16:42   ` Stefan Reiter
2021-01-12  9:10     ` aderumier
2021-01-12 10:31   ` [pve-devel] [pbs-devel] " Thomas Lamprecht
2021-01-12 11:23     ` Thomas Lamprecht
