public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing
@ 2024-03-15 10:24 Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 01/21] block/copy-before-write: fix permission Fiona Ebner
                   ` (20 more replies)
  0 siblings, 21 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

Changes in v2 (thanks to Fabian and Alexandre, among others, for the
feedback!):
    * Use v3 of "discard-source" upstream series (v4 was posted in the
      meantime but without any semantic change)
    * Add patches to specify a minimum cluster size during backup, to
      allow discard to work even if the fleecing image has a larger
      cluster size than the backup target.
    * Add permission check for fleecing storage.
    * Record fleecing image in config to be able to clean up after
      hard failure.
    * Do not use "same storage as image" as default fleecing storage.
    * Use qcow2 for the fleecing image if the storage supports it.
    * Flesh out recommendations for fleecing storage in docs.

When a backup for a VM is started, QEMU will install a
"copy-before-write" filter in its block layer. This filter ensures
that upon new guest writes, old data still needed for the backup is
sent to the backup target first. The guest write blocks until this
operation is finished so guest IO to not-yet-backed-up sectors will be
limited by the speed of the backup target.

With backup fleecing, such old data is cached in a fleecing image
rather than sent directly to the backup target. This can help guest IO
performance and even prevent hangs in certain scenarios, at the cost
of requiring more storage space.

With this series it will be possible to enable backup fleecing via
e.g. `vzdump 123 --fleecing enabled=1,storage=local-lvm` with fleecing
images created on the storage `local-lvm`. The fleecing storage should
be a fast local storage which supports thin-provisioning and discard.
If the storage supports qcow2, that is used as the fleecing image
format. If the underlying file system does not support discard, using
qcow2 with preallocation=off at least allows already-allocated parts of
the image to be re-used later.

Fleecing images are created by qemu-server via pve-storage and
attached to QEMU before the backup starts, and cleaned up after the
backup finishes or fails. The naming schema for fleecing images is
'vm-ID-fleece-N(.FORMAT)'. The allocated images are recorded in the
guest configuration, so that even after a hard failure, clean-up can
be re-attempted. While not too bad, it's a non-trivial amount of code
and I'm not 100% sure about the cost-benefit, so I'm sending those
patches as an RFC.

The fleecing image needs to be the exact same size as the source, but
luckily, an explicit size can be specified when attaching a raw image
to QEMU, so there are no size issues when using storages that have a
coarser allocation granularity or that round up. For qcow2, it seems
that the virtual size can be nearly arbitrary (i.e. only restricted to
a 512-byte granularity) during allocation.
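
As a rough illustration only (not taken from the series; node name,
path and size are made up, and the actual attachment in qemu-server may
look different), one way an explicit virtual size can be given for a
raw node via QMP is the raw driver's optional 'size' option:

  { "execute": "blockdev-add",
    "arguments": {
      "driver": "raw",
      "node-name": "drive-scsi0-fleecing",
      "size": 34359738368,
      "file": { "driver": "host_device",
                "filename": "/dev/pve/vm-123-fleece-0" } } }

This clamps the size seen by QEMU to the given value even if the
underlying volume was rounded up by the storage.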

While tests seem fine so far, the most important parts to review are
the setup of the backup job and the bitmap handling inside QEMU.

QEMU patches are for the submodule for better reviewability. There are
two prerequisites (that are expected to be picked up by upstream at
some point):

1. To be able to discard the fleecing image, the addition of a
discard-source parameter [0].

2. In combination with discard, there is a cluster size issue when the
fleecing image has a larger cluster size than the backup target. The
proposed workaround is to allow specifying the minimum granularity for
the backup job [1].


Dependencies:
pve-manager -> pve-guest-common -> pve-common
            \-> qemu-server

Plus new pve-qemu-kvm to actually be able to use the feature.

[0]: https://lore.kernel.org/qemu-devel/20240228141501.455989-1-vsementsov@yandex-team.ru/
[1]: https://lore.kernel.org/qemu-devel/20240308155158.830258-1-f.ebner@proxmox.com/


qemu:

Fiona Ebner (3):
  copy-before-write: allow specifying minimum cluster size
  backup: add minimum cluster size to performance options
  PVE backup: add fleecing option

Vladimir Sementsov-Ogievskiy (4):
  block/copy-before-write: fix permission
  block/copy-before-write: support unaligned snapshot-discard
  block/copy-before-write: create block_copy bitmap in filter node
  qapi: blockdev-backup: add discard-source parameter

 block/backup.c                         |   5 +-
 block/block-copy.c                     |  29 ++++-
 block/copy-before-write.c              |  42 ++++++--
 block/copy-before-write.h              |   2 +
 block/monitor/block-hmp-cmds.c         |   1 +
 block/replication.c                    |   4 +-
 blockdev.c                             |   5 +-
 include/block/block-common.h           |   2 +
 include/block/block-copy.h             |   3 +
 include/block/block_int-global-state.h |   2 +-
 pve-backup.c                           | 143 ++++++++++++++++++++++++-
 qapi/block-core.json                   |  29 ++++-
 tests/qemu-iotests/257.out             | 112 +++++++++----------
 13 files changed, 298 insertions(+), 81 deletions(-)


common:

Fiona Ebner (1):
  json schema: add format description for pve-storage-id standard option

 src/PVE/JSONSchema.pm | 1 +
 1 file changed, 1 insertion(+)


guest-common:

Fiona Ebner (3):
  vzdump: schema: add fleecing property string
  vzdump: schema: make storage for fleecing semi-optional
  abstract config: do not copy fleecing images entry for snapshot

 src/PVE/AbstractConfig.pm |  1 +
 src/PVE/VZDump/Common.pm  | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)


manager:

Fiona Ebner (2):
  vzdump: handle new 'fleecing' property string
  api: backup/vzdump: add permission check for fleecing storage

 PVE/API2/Backup.pm | 10 ++++++++--
 PVE/API2/VZDump.pm |  9 +++++----
 PVE/VZDump.pm      | 12 ++++++++++++
 3 files changed, 25 insertions(+), 6 deletions(-)


qemu-server:

Fiona Ebner (7):
  backup: disk info: also keep track of size
  backup: implement fleecing option
  parse config: allow config keys with minus sign
  schema: add fleecing-images config property
  vzdump: better cleanup fleecing images after hard errors
  migration: attempt to clean up potential left-over fleecing images
  destroy vm: clean up potential left-over fleecing images

 PVE/API2/Qemu.pm         |   9 +++
 PVE/QemuConfig.pm        |  40 ++++++++++
 PVE/QemuMigrate.pm       |   3 +
 PVE/QemuServer.pm        |  12 ++-
 PVE/VZDump/QemuServer.pm | 163 ++++++++++++++++++++++++++++++++++++++-
 5 files changed, 224 insertions(+), 3 deletions(-)


docs:

Fiona Ebner (1):
  vzdump: add section about backup fleecing

 vzdump.adoc | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)


Summary over all repositories:
  25 files changed, 624 insertions(+), 90 deletions(-)

-- 
Generated by git-murpp 0.5.0





* [pve-devel] [PATCH qemu v2 01/21] block/copy-before-write: fix permission
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 02/21] block/copy-before-write: support unaligned snapshot-discard Fiona Ebner
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

In the case when the source node does not have any parents, the
condition still works as required: the backup job does create the
parent by

  block_job_create -> block_job_add_bdrv -> bdrv_root_attach_child

Still, in this case, checking the @perm variable doesn't work, as the
backup job creates the root blk with empty permissions (it relies on
the CBW filter to require the correct permissions and doesn't want to
create extra conflicts).

So, we should not check @perm.

The hack may be dropped entirely once transactional insertion of the
filter is merged, i.e. when we don't try to recalculate permissions in
an intermediate state in which the filter conflicts with the original
parent of the source node (the old big series is "[PATCH v5 00/45]
Transactional block-graph modifying API"[1] and its current in-flight
part is "[PATCH v8 0/7] blockdev-replace"[2]).

[1] https://patchew.org/QEMU/20220330212902.590099-1-vsementsov@openvz.org/
[2] https://patchew.org/QEMU/20231017184444.932733-1-vsementsov@yandex-team.ru/

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
    * update to latest upstream version

 block/copy-before-write.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index b866e42271..a2dddf6f57 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -364,9 +364,13 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
                            perm, shared, nperm, nshared);
 
         if (!QLIST_EMPTY(&bs->parents)) {
-            if (perm & BLK_PERM_WRITE) {
-                *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
-            }
+            /*
+             * Note, that source child may be shared with backup job. Backup job
+             * does create own blk parent on copy-before-write node, so this
+             * works even if source node does not have any parents before backup
+             * start
+             */
+            *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
             *nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
         }
     }
-- 
2.39.2






* [pve-devel] [PATCH qemu v2 02/21] block/copy-before-write: support unaligned snapshot-discard
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 01/21] block/copy-before-write: fix permission Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 03/21] block/copy-before-write: create block_copy bitmap in filter node Fiona Ebner
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

The first thing that crashes on unaligned access here is
bdrv_reset_dirty_bitmap(). The correct way is to align down the
snapshot-discard request.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
    * update to latest upstream version

 block/copy-before-write.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index a2dddf6f57..0a219c2b75 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -325,14 +325,24 @@ static int coroutine_fn GRAPH_RDLOCK
 cbw_co_pdiscard_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
     BDRVCopyBeforeWriteState *s = bs->opaque;
+    uint32_t cluster_size = block_copy_cluster_size(s->bcs);
+    int64_t aligned_offset = QEMU_ALIGN_UP(offset, cluster_size);
+    int64_t aligned_end = QEMU_ALIGN_DOWN(offset + bytes, cluster_size);
+    int64_t aligned_bytes;
+
+    if (aligned_end <= aligned_offset) {
+        return 0;
+    }
+    aligned_bytes = aligned_end - aligned_offset;
 
     WITH_QEMU_LOCK_GUARD(&s->lock) {
-        bdrv_reset_dirty_bitmap(s->access_bitmap, offset, bytes);
+        bdrv_reset_dirty_bitmap(s->access_bitmap, aligned_offset,
+                                aligned_bytes);
     }
 
-    block_copy_reset(s->bcs, offset, bytes);
+    block_copy_reset(s->bcs, aligned_offset, aligned_bytes);
 
-    return bdrv_co_pdiscard(s->target, offset, bytes);
+    return bdrv_co_pdiscard(s->target, aligned_offset, aligned_bytes);
 }
 
 static void cbw_refresh_filename(BlockDriverState *bs)
-- 
2.39.2






* [pve-devel] [PATCH qemu v2 03/21] block/copy-before-write: create block_copy bitmap in filter node
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 01/21] block/copy-before-write: fix permission Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 02/21] block/copy-before-write: support unaligned snapshot-discard Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 04/21] qapi: blockdev-backup: add discard-source parameter Fiona Ebner
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Currently, block_copy creates copy_bitmap in the source node. But that
interacts badly with .independent_close=true of the copy-before-write
filter: the source node may be detached and removed before the
.bdrv_close() handler is called, which should call
block_copy_state_free(), which in turn should remove copy_bitmap.

That's all not ideal: it would be better if the internal bitmap of the
block-copy object were not attached to any node. But that is not
possible right now.

The simplest solution is to just create copy_bitmap in the filter node,
where two other bitmaps are created anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
    * update to latest upstream version

 block/block-copy.c         |   3 +-
 block/copy-before-write.c  |   2 +-
 include/block/block-copy.h |   1 +
 tests/qemu-iotests/257.out | 112 ++++++++++++++++++-------------------
 4 files changed, 60 insertions(+), 58 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index e13d7bc6b6..b61685f1a2 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -346,6 +346,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+                                     BlockDriverState *copy_bitmap_bs,
                                      const BdrvDirtyBitmap *bitmap,
                                      Error **errp)
 {
@@ -360,7 +361,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         return NULL;
     }
 
-    copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
+    copy_bitmap = bdrv_create_dirty_bitmap(copy_bitmap_bs, cluster_size, NULL,
                                            errp);
     if (!copy_bitmap) {
         return NULL;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 0a219c2b75..d3b95bd600 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -470,7 +470,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
+    s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         ret = -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 0700953ab8..8b41643bfa 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,6 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+                                     BlockDriverState *copy_bitmap_bs,
                                      const BdrvDirtyBitmap *bitmap,
                                      Error **errp);
 
diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out
index aa76131ca9..c33dd7f3a9 100644
--- a/tests/qemu-iotests/257.out
+++ b/tests/qemu-iotests/257.out
@@ -120,16 +120,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -596,16 +596,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -865,16 +865,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -1341,16 +1341,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -1610,16 +1610,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -2086,16 +2086,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -2355,16 +2355,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -2831,16 +2831,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -3100,16 +3100,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -3576,16 +3576,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -3845,16 +3845,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -4321,16 +4321,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -4590,16 +4590,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
@@ -5066,16 +5066,16 @@ write -P0x67 0x3fe0000 0x20000
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      }
-    ],
-    "drive0": [
+      },
       {
         "busy": false,
         "count": 0,
         "granularity": 65536,
         "persistent": false,
         "recording": false
-      },
+      }
+    ],
+    "drive0": [
       {
         "busy": false,
         "count": 458752,
-- 
2.39.2






* [pve-devel] [PATCH qemu v2 04/21] qapi: blockdev-backup: add discard-source parameter
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (2 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 03/21] block/copy-before-write: create block_copy bitmap in filter node Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 05/21] copy-before-write: allow specifying minimum cluster size Fiona Ebner
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Add a parameter that enables discard-after-copy. That is mostly useful
in the "push backup with fleecing" scheme, where the source is a
snapshot-access format driver node, based on the copy-before-write
filter's snapshot-access API:

[guest]      [snapshot-access] ~~ blockdev-backup ~~> [backup target]
   |            |
   | root       | file
   v            v
[copy-before-write]
   |             |
   | file        | target
   v             v
[active disk]   [temp.img]

In this case discard-after-copy does two things:

 - discard data in temp.img to save disk space
 - avoid further copy-before-write operation in discarded area

Note that we have to declare WRITE permission on the source in the
copy-before-write filter for discard to work. Still, we can't take it
unconditionally, as that would break normal backups from a read-only
source. So, we have to add a parameter and pass it through the
bdrv_open flags.
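
As a usage sketch (node and job names here are made up), the new
parameter would be given on the backup job that reads from the
snapshot-access node in such a fleecing setup:

  { "execute": "blockdev-backup",
    "arguments": {
      "job-id": "backup-drive-scsi0",
      "device": "snapshot-access0",
      "target": "backup-target0",
      "sync": "full",
      "discard-source": true } }

With "discard-source": true, each range is discarded from the source
after it has been copied, which via the filter frees space in temp.img
and avoids further copy-before-write work for that range.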

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
    * update to latest upstream version

 block/backup.c                         |  5 +++--
 block/block-copy.c                     |  9 +++++++++
 block/copy-before-write.c              | 15 +++++++++++++--
 block/copy-before-write.h              |  1 +
 block/replication.c                    |  4 ++--
 blockdev.c                             |  2 +-
 include/block/block-common.h           |  2 ++
 include/block/block-copy.h             |  1 +
 include/block/block_int-global-state.h |  2 +-
 qapi/block-core.json                   |  4 ++++
 10 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index af87fa6aa9..3dc955f625 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -332,7 +332,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *target, int64_t speed,
                   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
                   BitmapSyncMode bitmap_mode,
-                  bool compress,
+                  bool compress, bool discard_source,
                   const char *filter_node_name,
                   BackupPerf *perf,
                   BlockdevOnError on_source_error,
@@ -429,7 +429,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         goto error;
     }
 
-    cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
+    cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
+                          &bcs, errp);
     if (!cbw) {
         goto error;
     }
diff --git a/block/block-copy.c b/block/block-copy.c
index b61685f1a2..3c61e52bae 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -137,6 +137,7 @@ typedef struct BlockCopyState {
     CoMutex lock;
     int64_t in_flight_bytes;
     BlockCopyMethod method;
+    bool discard_source;
     BlockReqList reqs;
     QLIST_HEAD(, BlockCopyCallState) calls;
     /*
@@ -348,6 +349,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                      BlockDriverState *copy_bitmap_bs,
                                      const BdrvDirtyBitmap *bitmap,
+                                     bool discard_source,
                                      Error **errp)
 {
     ERRP_GUARD();
@@ -409,6 +411,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                     cluster_size),
     };
 
+    s->discard_source = discard_source;
     block_copy_set_copy_opts(s, false, false);
 
     ratelimit_init(&s->rate_limit);
@@ -580,6 +583,12 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
     co_put_to_shres(s->mem, t->req.bytes);
     block_copy_task_end(t, ret);
 
+    if (s->discard_source && ret == 0) {
+        int64_t nbytes =
+            MIN(t->req.offset + t->req.bytes, s->len) - t->req.offset;
+        bdrv_co_pdiscard(s->source, t->req.offset, nbytes);
+    }
+
     return ret;
 }
 
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index d3b95bd600..3503702d71 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -44,6 +44,7 @@ typedef struct BDRVCopyBeforeWriteState {
     BdrvChild *target;
     OnCbwError on_cbw_error;
     uint32_t cbw_timeout_ns;
+    bool discard_source;
 
     /*
      * @lock: protects access to @access_bitmap, @done_bitmap and
@@ -357,6 +358,8 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
                            uint64_t perm, uint64_t shared,
                            uint64_t *nperm, uint64_t *nshared)
 {
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+
     if (!(role & BDRV_CHILD_FILTERED)) {
         /*
          * Target child
@@ -381,6 +384,10 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
              * start
              */
             *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+            if (s->discard_source) {
+                *nperm = *nperm | BLK_PERM_WRITE;
+            }
+
             *nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
         }
     }
@@ -470,7 +477,9 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp);
+    s->discard_source = flags & BDRV_O_CBW_DISCARD_SOURCE;
+    s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap,
+                                  flags & BDRV_O_CBW_DISCARD_SOURCE, errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         ret = -EINVAL;
@@ -544,12 +553,14 @@ BlockDriver bdrv_cbw_filter = {
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
                                   const char *filter_node_name,
+                                  bool discard_source,
                                   BlockCopyState **bcs,
                                   Error **errp)
 {
     BDRVCopyBeforeWriteState *state;
     BlockDriverState *top;
     QDict *opts;
+    int flags = BDRV_O_RDWR | (discard_source ? BDRV_O_CBW_DISCARD_SOURCE : 0);
 
     assert(source->total_sectors == target->total_sectors);
     GLOBAL_STATE_CODE();
@@ -562,7 +573,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
     qdict_put_str(opts, "file", bdrv_get_node_name(source));
     qdict_put_str(opts, "target", bdrv_get_node_name(target));
 
-    top = bdrv_insert_node(source, opts, BDRV_O_RDWR, errp);
+    top = bdrv_insert_node(source, opts, flags, errp);
     if (!top) {
         return NULL;
     }
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 6e72bb25e9..01af0cd3c4 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -39,6 +39,7 @@
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
                                   const char *filter_node_name,
+                                  bool discard_source,
                                   BlockCopyState **bcs,
                                   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/block/replication.c b/block/replication.c
index ea4bf1aa80..39ad78cf98 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -579,8 +579,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
 
         s->backup_job = backup_job_create(
                                 NULL, s->secondary_disk->bs, s->hidden_disk->bs,
-                                0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, NULL,
-                                &perf,
+                                0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, false,
+                                NULL, &perf,
                                 BLOCKDEV_ON_ERROR_REPORT,
                                 BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
                                 backup_job_completed, bs, NULL, &local_err);
diff --git a/blockdev.c b/blockdev.c
index 7793143d76..ce3fef924c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2802,7 +2802,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
 
     job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
                             backup->sync, bmap, backup->bitmap_mode,
-                            backup->compress,
+                            backup->compress, backup->discard_source,
                             backup->filter_node_name,
                             &perf,
                             backup->on_source_error,
diff --git a/include/block/block-common.h b/include/block/block-common.h
index e15395f2cb..913a8b259c 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -234,6 +234,8 @@ typedef enum {
                                       read-write fails */
 #define BDRV_O_IO_URING    0x40000 /* use io_uring instead of the thread pool */
 
+#define BDRV_O_CBW_DISCARD_SOURCE 0x80000 /* for copy-before-write filter */
+
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_NO_FLUSH)
 
 
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 8b41643bfa..bdc703bacd 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -27,6 +27,7 @@ typedef struct BlockCopyCallState BlockCopyCallState;
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                      BlockDriverState *copy_bitmap_bs,
                                      const BdrvDirtyBitmap *bitmap,
+                                     bool discard_source,
                                      Error **errp);
 
 /* Function should be called prior any actual copy request */
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index 32f0f9858a..546f2b5532 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -189,7 +189,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                             MirrorSyncMode sync_mode,
                             BdrvDirtyBitmap *sync_bitmap,
                             BitmapSyncMode bitmap_mode,
-                            bool compress,
+                            bool compress, bool discard_source,
                             const char *filter_node_name,
                             BackupPerf *perf,
                             BlockdevOnError on_source_error,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 09de550c95..4297e5beda 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1816,6 +1816,9 @@
 #     node specified by @drive.  If this option is not given, a node
 #     name is autogenerated.  (Since: 4.2)
 #
+# @discard-source: Discard blocks on source which are already copied
+#     to the target.  (Since 9.0)
+#
 # @x-perf: Performance options.  (Since 6.0)
 #
 # Features:
@@ -1837,6 +1840,7 @@
             '*on-target-error': 'BlockdevOnError',
             '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
             '*filter-node-name': 'str',
+            '*discard-source': 'bool',
             '*x-perf': { 'type': 'BackupPerf',
                          'features': [ 'unstable' ] } } }
 
-- 
2.39.2






* [pve-devel] [PATCH qemu v2 05/21] copy-before-write: allow specifying minimum cluster size
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (3 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 04/21] qapi: blockdev-backup: add discard-source parameter Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 06/21] backup: add minimum cluster size to performance options Fiona Ebner
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

Useful to make discard-source work in the context of backup fleecing
when the fleecing image has a larger granularity than the backup
target.

Copy-before-write operations will use at least this granularity and in
particular, discard requests to the source node will too. If the
granularity is too small, they will just be aligned down in
cbw_co_pdiscard_snapshot() and thus effectively ignored.

The QAPI uses uint32, so the value will be non-negative, but still
fits into a uint64_t.
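
For illustration, a minimal sketch (node names made up) of how the new
option could be set when adding the copy-before-write filter manually
via QMP, forcing at least 1 MiB chunks even if the target's own cluster
size is smaller:

  { "execute": "blockdev-add",
    "arguments": {
      "driver": "copy-before-write",
      "node-name": "cbw0",
      "file": "drive-scsi0",
      "target": "drive-scsi0-fleecing",
      "min-cluster-size": 1048576 } }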

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 block/block-copy.c         | 17 +++++++++++++----
 block/copy-before-write.c  |  3 ++-
 include/block/block-copy.h |  1 +
 qapi/block-core.json       |  8 +++++++-
 4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 3c61e52bae..c9a722a5a6 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -310,6 +310,7 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
 }
 
 static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
+                                                 int64_t min_cluster_size,
                                                  Error **errp)
 {
     int ret;
@@ -330,7 +331,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
                     "used. If the actual block size of the target exceeds "
                     "this default, the backup may be unusable",
                     BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
-        return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+        return MAX(min_cluster_size, BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
     } else if (ret < 0 && !target_does_cow) {
         error_setg_errno(errp, -ret,
             "Couldn't determine the cluster size of the target image, "
@@ -340,16 +341,18 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
         return ret;
     } else if (ret < 0 && target_does_cow) {
         /* Not fatal; just trudge on ahead. */
-        return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+        return MAX(min_cluster_size, BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
     }
 
-    return MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
+    return MAX(min_cluster_size,
+               MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size));
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                      BlockDriverState *copy_bitmap_bs,
                                      const BdrvDirtyBitmap *bitmap,
                                      bool discard_source,
+                                     int64_t min_cluster_size,
                                      Error **errp)
 {
     ERRP_GUARD();
@@ -358,7 +361,13 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
     BdrvDirtyBitmap *copy_bitmap;
     bool is_fleecing;
 
-    cluster_size = block_copy_calculate_cluster_size(target->bs, errp);
+    if (min_cluster_size && !is_power_of_2(min_cluster_size)) {
+        error_setg(errp, "min-cluster-size needs to be a power of 2");
+        return NULL;
+    }
+
+    cluster_size = block_copy_calculate_cluster_size(target->bs,
+                                                     min_cluster_size, errp);
     if (cluster_size < 0) {
         return NULL;
     }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 3503702d71..4a8c5bdb62 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -479,7 +479,8 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
 
     s->discard_source = flags & BDRV_O_CBW_DISCARD_SOURCE;
     s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap,
-                                  flags & BDRV_O_CBW_DISCARD_SOURCE, errp);
+                                  flags & BDRV_O_CBW_DISCARD_SOURCE,
+                                  opts->min_cluster_size, errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         ret = -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index bdc703bacd..77857c6c68 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -28,6 +28,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                      BlockDriverState *copy_bitmap_bs,
                                      const BdrvDirtyBitmap *bitmap,
                                      bool discard_source,
+                                     int64_t min_cluster_size,
                                      Error **errp);
 
 /* Function should be called prior any actual copy request */
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 4297e5beda..33e7e3c090 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4825,12 +4825,18 @@
 #     @on-cbw-error parameter will decide how this failure is handled.
 #     Default 0. (Since 7.1)
 #
+# @min-cluster-size: Minimum size of blocks used by copy-before-write
+#     operations.  Has to be a power of 2.  No effect if smaller than
+#     the maximum of the target's cluster size and 64 KiB.  Default 0.
+#     (Since 8.1)
+#
 # Since: 6.2
 ##
 { 'struct': 'BlockdevOptionsCbw',
   'base': 'BlockdevOptionsGenericFormat',
   'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap',
-            '*on-cbw-error': 'OnCbwError', '*cbw-timeout': 'uint32' } }
+            '*on-cbw-error': 'OnCbwError', '*cbw-timeout': 'uint32',
+            '*min-cluster-size': 'uint32' } }
 
 ##
 # @BlockdevOptions:
-- 
2.39.2






* [pve-devel] [PATCH qemu v2 06/21] backup: add minimum cluster size to performance options
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (4 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 05/21] copy-before-write: allow specifying minimum cluster size Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 07/21] PVE backup: add fleecing option Fiona Ebner
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

Useful to make discard-source work in the context of backup fleecing
when the fleecing image has a larger granularity than the backup
target.

Backup/block-copy will use at least this granularity for copy operations
and in particular, discard requests to the backup source will too. If
the granularity is too small, they will just be aligned down in
cbw_co_pdiscard_snapshot() and thus effectively ignored.
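
A hedged example (node and job names made up) of passing the new
performance option via QMP, matching what a fleecing image with 4 MiB
clusters would need so that discards are not aligned away:

  { "execute": "blockdev-backup",
    "arguments": {
      "job-id": "backup-drive-scsi0",
      "device": "snapshot-access0",
      "target": "backup-target0",
      "sync": "full",
      "discard-source": true,
      "x-perf": { "min-cluster-size": 4194304 } } }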

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 block/backup.c            | 2 +-
 block/copy-before-write.c | 2 ++
 block/copy-before-write.h | 1 +
 blockdev.c                | 3 +++
 qapi/block-core.json      | 9 +++++++--
 5 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 3dc955f625..ac5bd81338 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -430,7 +430,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     }
 
     cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
-                          &bcs, errp);
+                          perf->min_cluster_size, &bcs, errp);
     if (!cbw) {
         goto error;
     }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 4a8c5bdb62..9ca5ec5e5c 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -555,6 +555,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
                                   const char *filter_node_name,
                                   bool discard_source,
+                                  int64_t min_cluster_size,
                                   BlockCopyState **bcs,
                                   Error **errp)
 {
@@ -573,6 +574,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
     }
     qdict_put_str(opts, "file", bdrv_get_node_name(source));
     qdict_put_str(opts, "target", bdrv_get_node_name(target));
+    qdict_put_int(opts, "min-cluster-size", min_cluster_size);
 
     top = bdrv_insert_node(source, opts, flags, errp);
     if (!top) {
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 01af0cd3c4..dc6cafe7fa 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -40,6 +40,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
                                   const char *filter_node_name,
                                   bool discard_source,
+                                  int64_t min_cluster_size,
                                   BlockCopyState **bcs,
                                   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index ce3fef924c..5ae1dde73c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2729,6 +2729,9 @@ static BlockJob *do_backup_common(BackupCommon *backup,
         if (backup->x_perf->has_max_chunk) {
             perf.max_chunk = backup->x_perf->max_chunk;
         }
+        if (backup->x_perf->has_min_cluster_size) {
+            perf.min_cluster_size = backup->x_perf->min_cluster_size;
+        }
     }
 
     if ((backup->sync == MIRROR_SYNC_MODE_BITMAP) ||
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 33e7e3c090..58fd637e86 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1757,11 +1757,16 @@
 #     it should not be less than job cluster size which is calculated
 #     as maximum of target image cluster size and 64k.  Default 0.
 #
+# @min-cluster-size: Minimum size of blocks used by copy-before-write
+#     and background copy operations.  Has to be a power of 2.  No
+#     effect if smaller than the maximum of the target's cluster size
+#     and 64 KiB.  Default 0. (Since 8.1)
+#
 # Since: 6.0
 ##
 { 'struct': 'BackupPerf',
-  'data': { '*use-copy-range': 'bool',
-            '*max-workers': 'int', '*max-chunk': 'int64' } }
+  'data': { '*use-copy-range': 'bool', '*max-workers': 'int',
+            '*max-chunk': 'int64', '*min-cluster-size': 'uint32' } }
 
 ##
 # @BackupCommon:
-- 
2.39.2






* [pve-devel] [PATCH qemu v2 07/21] PVE backup: add fleecing option
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (5 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 06/21] backup: add minimum cluster size to performance options Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-04-08 12:45   ` Wolfgang Bumiller
  2024-03-15 10:24 ` [pve-devel] [PATCH common v2 08/21] json schema: add format description for pve-storage-id standard option Fiona Ebner
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

When a fleecing option is given, it is expected that each device has
a corresponding "-fleecing" block device already attached, except for
EFI disk and TPM state, where fleecing is never used.

The following graph was adapted from [0] which also contains more
details about fleecing.

[guest]
   |
   | root
   v                 file
[copy-before-write]<------[snapshot-access]
   |           |
   | file      | target
   v           v
[source] [fleecing]

For fleecing, a copy-before-write filter is inserted on top of the
source node, as well as a snapshot-access node pointing to the filter
node, which allows reading the consistent state of the image at the
time the filter was inserted. New guest writes are passed through the
copy-before-write filter which will first copy over old data to the
fleecing image in case that old data is still needed by the
snapshot-access node.

The backup process will sequentially read from the snapshot access,
which has a bitmap and knows whether to read from the original image
or the fleecing image to get the "snapshot" state, i.e. data from the
source image at the time when the copy-before-write filter was
inserted. After reading, the copied sections are discarded from the
fleecing image to reduce space usage.

All of this can be restricted by an initial dirty bitmap to parts of
the source image that are required for an incremental backup.

For discard to work, it is necessary that the fleecing image does not
have a larger cluster size than the backup job granularity. Since
querying that size does not always work (e.g. for RBD with krbd, the
cluster size will not be reported), a minimum of 4 MiB is used. A job
with a PBS target already has at least this granularity, so it's only
relevant for other targets, i.e. edge cases where this minimum is not
enough should be very rare in practice. If ever necessary in the
future, a value passed in via the backup QMP command could still be
added to override it.

Additionally, the cbw-timeout and on-cbw-error=break-snapshot options
are set when installing the copy-before-write filter and
snapshot-access. When an error or timeout occurs, the problematic (and
each further) snapshot operation will fail and thus cancel the backup
instead of breaking the guest write.

Note that job_id cannot be inferred from the snapshot-access bs because
it has no parent, so just pass the one from the original bs.

[0]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg876056.html
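
As a rough usage sketch of the new option (paths and device names made
up, other arguments omitted), the QMP backup command issued by
qemu-server could then look roughly like:

  { "execute": "backup",
    "arguments": {
      "backup-file": "/mnt/backup/vzdump-qemu-123.vma",
      "devlist": "drive-scsi0,drive-scsi1",
      "fleecing": true,
      "max-workers": 16 } }

assuming the corresponding 'drive-scsi0-fleecing' and
'drive-scsi1-fleecing' block devices were attached beforehand.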

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
    * specify minimum cluster size for backup job

 block/monitor/block-hmp-cmds.c |   1 +
 pve-backup.c                   | 143 ++++++++++++++++++++++++++++++++-
 qapi/block-core.json           |   8 +-
 3 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 6efe28cef5..ca29cc4281 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1064,6 +1064,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
         NULL, NULL,
         devlist, qdict_haskey(qdict, "speed"), speed,
         false, 0, // BackupPerf max-workers
+        false, false, // fleecing
         &error);
 
     hmp_handle_error(mon, error);
diff --git a/pve-backup.c b/pve-backup.c
index e6b17b797e..00aaff6509 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -7,8 +7,10 @@
 #include "sysemu/blockdev.h"
 #include "block/block_int-global-state.h"
 #include "block/blockjob.h"
+#include "block/copy-before-write.h"
 #include "block/dirty-bitmap.h"
 #include "qapi/qapi-commands-block.h"
+#include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/cutils.h"
 
@@ -80,8 +82,15 @@ static void pvebackup_init(void)
 // initialize PVEBackupState at startup
 opts_init(pvebackup_init);
 
+typedef struct PVEBackupFleecingInfo {
+    BlockDriverState *bs;
+    BlockDriverState *cbw;
+    BlockDriverState *snapshot_access;
+} PVEBackupFleecingInfo;
+
 typedef struct PVEBackupDevInfo {
     BlockDriverState *bs;
+    PVEBackupFleecingInfo fleecing;
     size_t size;
     uint64_t block_size;
     uint8_t dev_id;
@@ -361,6 +370,25 @@ static void pvebackup_complete_cb(void *opaque, int ret)
     PVEBackupDevInfo *di = opaque;
     di->completed_ret = ret;
 
+    /*
+     * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work
+     * won't be done as a coroutine anyways:
+     * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would
+     *   just spawn a BH calling bdrv_unref().
+     * - For cbw, draining would need to spawn a BH.
+     *
+     * Note that the AioContext lock is already acquired by our caller, i.e.
+     * job_finalize_single_locked()
+     */
+    if (di->fleecing.snapshot_access) {
+        bdrv_unref(di->fleecing.snapshot_access);
+        di->fleecing.snapshot_access = NULL;
+    }
+    if (di->fleecing.cbw) {
+        bdrv_cbw_drop(di->fleecing.cbw);
+        di->fleecing.cbw = NULL;
+    }
+
     /*
      * Schedule stream cleanup in async coroutine. close_image and finish might
      * take a while, so we can't block on them here. This way it also doesn't
@@ -521,9 +549,82 @@ static void create_backup_jobs_bh(void *opaque) {
 
         bdrv_drained_begin(di->bs);
 
+        BackupPerf perf = (BackupPerf){ .max_workers = backup_state.perf.max_workers };
+
+        BlockDriverState *source_bs = di->bs;
+        bool discard_source = false;
+        const char *job_id = bdrv_get_device_name(di->bs);
+        if (di->fleecing.bs) {
+            QDict *cbw_opts = qdict_new();
+            qdict_put_str(cbw_opts, "driver", "copy-before-write");
+            qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
+            qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs));
+
+            if (di->bitmap) {
+                /*
+                 * Only guest writes to parts relevant for the backup need to be intercepted with
+                 * old data being copied to the fleecing image.
+                 */
+                qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs));
+                qdict_put_str(cbw_opts, "bitmap.name", bdrv_dirty_bitmap_name(di->bitmap));
+            }
+            /*
+             * Fleecing storage is supposed to be fast and it's better to break backup than guest
+             * writes. Certain guest drivers like VirtIO-win have 60 seconds timeout by default, so
+             * abort a bit before that.
+             */
+            qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
+            qdict_put_int(cbw_opts, "cbw-timeout", 45);
+
+            di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, &local_err);
+
+            if (!di->fleecing.cbw) {
+                error_setg(errp, "appending cbw node for fleecing failed: %s",
+                           local_err ? error_get_pretty(local_err) : "unknown error");
+                break;
+            }
+
+            QDict *snapshot_access_opts = qdict_new();
+            qdict_put_str(snapshot_access_opts, "driver", "snapshot-access");
+            qdict_put_str(snapshot_access_opts, "file", bdrv_get_node_name(di->fleecing.cbw));
+
+            /*
+             * Holding the AioContext lock here would cause a deadlock, because bdrv_open_driver()
+             * will aquire it a second time. But it's allowed to be held exactly once when polling
+             * and that happens when the bdrv_refresh_total_sectors() call is made there.
+             */
+            aio_context_release(aio_context);
+            di->fleecing.snapshot_access =
+                bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | BDRV_O_UNMAP, &local_err);
+            aio_context_acquire(aio_context);
+            if (!di->fleecing.snapshot_access) {
+                error_setg(errp, "setting up snapshot access for fleecing failed: %s",
+                           local_err ? error_get_pretty(local_err) : "unknown error");
+                break;
+            }
+            source_bs = di->fleecing.snapshot_access;
+            discard_source = true;
+
+            /*
+             * bdrv_get_info() just returns 0 (= doesn't matter) for RBD when using krbd. But discard
+             * on the fleecing image won't work if the backup job's granularity is less than the RBD
+             * object size (default 4 MiB), so it does matter. Always use at least 4 MiB. With a PBS
+             * target, the backup job granularity would already be at least this much.
+             */
+            perf.min_cluster_size = 4 * 1024 * 1024;
+            /*
+             * For discard to work, cluster size for the backup job must be at least the same as for
+             * the fleecing image.
+             */
+            BlockDriverInfo bdi;
+            if (bdrv_get_info(di->fleecing.bs, &bdi) >= 0) {
+                perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
+            }
+        }
+
         BlockJob *job = backup_job_create(
-            NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
-            bitmap_mode, false, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
+            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
+            bitmap_mode, false, discard_source, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT,
             BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
             &local_err);
 
@@ -581,6 +682,14 @@ static void create_backup_jobs_bh(void *opaque) {
     aio_co_enter(data->ctx, data->co);
 }
 
+/*
+ * EFI disk and TPM state are small and it's just not worth setting up fleecing for them.
+ */
+static bool device_uses_fleecing(const char *device_id)
+{
+    return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14);
+}
+
 /*
  * Returns a list of device infos, which needs to be freed by the caller. In
  * case of an error, errp will be set, but the returned value might still be a
@@ -588,6 +697,7 @@ static void create_backup_jobs_bh(void *opaque) {
  */
 static GList coroutine_fn *get_device_info(
     const char *devlist,
+    bool fleecing,
     Error **errp)
 {
     gchar **devs = NULL;
@@ -611,6 +721,31 @@ static GList coroutine_fn *get_device_info(
             }
             PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
             di->bs = bs;
+
+            if (fleecing && device_uses_fleecing(*d)) {
+                g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
+                BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
+                if (!fleecing_blk) {
+                    error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+                              "Device '%s' not found", fleecing_devid);
+                    goto err;
+                }
+                BlockDriverState *fleecing_bs = blk_bs(fleecing_blk);
+                if (!bdrv_co_is_inserted(fleecing_bs)) {
+                    error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, fleecing_devid);
+                    goto err;
+                }
+                /*
+                 * Fleecing image needs to be the same size to act as a cbw target.
+                 */
+                if (bs->total_sectors != fleecing_bs->total_sectors) {
+                    error_setg(errp, "Size mismatch for '%s' - sector count %ld != %ld",
+                               fleecing_devid, fleecing_bs->total_sectors, bs->total_sectors);
+                    goto err;
+                }
+                di->fleecing.bs = fleecing_bs;
+            }
+
             di_list = g_list_append(di_list, di);
             d++;
         }
@@ -660,6 +795,7 @@ UuidInfo coroutine_fn *qmp_backup(
     const char *devlist,
     bool has_speed, int64_t speed,
     bool has_max_workers, int64_t max_workers,
+    bool has_fleecing, bool fleecing,
     Error **errp)
 {
     assert(qemu_in_coroutine());
@@ -687,7 +823,7 @@ UuidInfo coroutine_fn *qmp_backup(
     /* Todo: try to auto-detect format based on file name */
     format = has_format ? format : BACKUP_FORMAT_VMA;
 
-    di_list = get_device_info(devlist, &local_err);
+    di_list = get_device_info(devlist, has_fleecing && fleecing, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto err;
@@ -1086,5 +1222,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
     ret->query_bitmap_info = true;
     ret->pbs_masterkey = true;
     ret->backup_max_workers = true;
+    ret->backup_fleecing = true;
     return ret;
 }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 58fd637e86..0bc5f42677 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -933,6 +933,10 @@
 #
 # @max-workers: see @BackupPerf for details. Default 16.
 #
+# @fleecing: perform a backup with fleecing. For each device in @devlist, a
+#            corresponding '-fleecing' device with the same size already needs to
+#            be present.
+#
 # Returns: the uuid of the backup job
 #
 ##
@@ -953,7 +957,8 @@
                                     '*firewall-file': 'str',
                                     '*devlist': 'str',
                                     '*speed': 'int',
-                                    '*max-workers': 'int' },
+                                    '*max-workers': 'int',
+                                    '*fleecing': 'bool' },
   'returns': 'UuidInfo', 'coroutine': true }
 
 ##
@@ -1009,6 +1014,7 @@
             'pbs-dirty-bitmap-migration': 'bool',
             'pbs-masterkey': 'bool',
             'pbs-library-version': 'str',
+            'backup-fleecing': 'bool',
             'backup-max-workers': 'bool' } }
 
 ##
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [PATCH common v2 08/21] json schema: add format description for pve-storage-id standard option
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (6 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 07/21] PVE backup: add fleecing option Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH guest-common v2 09/21] vzdump: schema: add fleecing property string Fiona Ebner
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

so that the option can be used as part of a property string.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 src/PVE/JSONSchema.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm
index 7be0595..a74bbe2 100644
--- a/src/PVE/JSONSchema.pm
+++ b/src/PVE/JSONSchema.pm
@@ -84,6 +84,7 @@ register_standard_option('pve-iface', {
 register_standard_option('pve-storage-id', {
     description => "The storage identifier.",
     type => 'string', format => 'pve-storage-id',
+    format_description => 'storage ID',
 });
 
 register_standard_option('pve-bridge-id', {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [PATCH guest-common v2 09/21] vzdump: schema: add fleecing property string
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (7 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH common v2 08/21] json schema: add format description for pve-storage-id standard option Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH guest-common v2 10/21] vzdump: schema: make storage for fleecing semi-optional Fiona Ebner
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

It's a property string, because that avoids having an implicit
"enabled" as part of a 'fleecing-storage' property. And there likely
will be more options in the future, e.g. threshold/limit for the
fleecing image size.

Storage is non-optional, so the storage choice needs to be a conscious
decision. Can allow for a default later, when a good choice can be
made further down the stack. The original idea with "same storage as
VM disk" is not great, because e.g. for LVM, it would require the same
size as the disk up front.
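
For illustration, with the above format the setting might look like this
on the CLI and as a job or vzdump.conf entry (storage name is just an
example):

    vzdump 123 --fleecing enabled=1,storage=local-lvm

    fleecing: enabled=1,storage=local-lvm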

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
    * Make storage non-optional.

 src/PVE/VZDump/Common.pm | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/PVE/VZDump/Common.pm b/src/PVE/VZDump/Common.pm
index 5be87a0..30bd044 100644
--- a/src/PVE/VZDump/Common.pm
+++ b/src/PVE/VZDump/Common.pm
@@ -33,6 +33,7 @@ my $dowhash_to_dow = sub {
 };
 
 our $PROPERTY_STRINGS = {
+    'fleecing' => 'backup-fleecing',
     'performance' => 'backup-performance',
     'prune-backups' => 'prune-backups',
 };
@@ -79,6 +80,24 @@ sub parse_dow {
     return $res;
 };
 
+PVE::JSONSchema::register_format('backup-fleecing', {
+    enabled => {
+	description => "Enable backup fleecing. Cache backup data from blocks where new guest "
+	    ."writes happen on specified storage instead of copying them directly to the backup "
+	    ."target. This can help guest IO performance and even prevent hangs, at the cost of "
+	    ."requiring more storage space.",
+	type => 'boolean',
+	default => 0,
+	optional => 1,
+	default_key => 1,
+    },
+    storage => get_standard_option('pve-storage-id', {
+	description => "Use this storage to storage fleecing images. For efficient space usage, "
+	    ."it's best to use a local storage that supports discard and either thin provisioning "
+	    ."or sparse files.",
+    }),
+});
+
 PVE::JSONSchema::register_format('backup-performance', {
     'max-workers' => {
 	description => "Applies to VMs. Allow up to this many IO workers at the same time.",
@@ -262,6 +281,12 @@ my $confdesc = {
 	format => 'backup-performance',
 	optional => 1,
     },
+    fleecing => {
+	type => 'string',
+	description => "Options for backup fleecing (VM only).",
+	format => 'backup-fleecing',
+	optional => 1,
+    },
     lockwait => {
 	type => 'integer',
 	description => "Maximal time to wait for the global lock (minutes).",
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [PATCH guest-common v2 10/21] vzdump: schema: make storage for fleecing semi-optional
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (8 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH guest-common v2 09/21] vzdump: schema: add fleecing property string Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [RFC guest-common v2 11/21] abstract config: do not copy fleecing images entry for snapshot Fiona Ebner
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

so it doesn't need to be set when explicitly disabling fleecing. Needs
a custom verifier to enforce it being set when enabled.

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 src/PVE/VZDump/Common.pm | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/PVE/VZDump/Common.pm b/src/PVE/VZDump/Common.pm
index 30bd044..f7cdb66 100644
--- a/src/PVE/VZDump/Common.pm
+++ b/src/PVE/VZDump/Common.pm
@@ -95,8 +95,20 @@ PVE::JSONSchema::register_format('backup-fleecing', {
 	description => "Use this storage to storage fleecing images. For efficient space usage, "
 	    ."it's best to use a local storage that supports discard and either thin provisioning "
 	    ."or sparse files.",
+	optional => 1,
     }),
-});
+}, \&verify_backup_fleecing);
+
+sub verify_backup_fleecing {
+    my ($param, $noerr) = @_;
+
+    if (!$param->{storage} && $param->{enabled}) {
+	return if $noerr;
+	die "'storage' parameter is required when 'enabled' is set\n";
+    }
+
+    return $param;
+}
 
 PVE::JSONSchema::register_format('backup-performance', {
     'max-workers' => {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [RFC guest-common v2 11/21] abstract config: do not copy fleecing images entry for snapshot
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (9 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH guest-common v2 10/21] vzdump: schema: make storage for fleecing semi-optional Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH manager v2 12/21] vzdump: handle new 'fleecing' property string Fiona Ebner
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

RFC because belongs to the optional "improved cleanup"-part of the
series. See qemu-server patches for more info.

New in v2.

 src/PVE/AbstractConfig.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
index a286b13..a2ed142 100644
--- a/src/PVE/AbstractConfig.pm
+++ b/src/PVE/AbstractConfig.pm
@@ -684,6 +684,7 @@ sub __snapshot_copy_config {
 	next if $k eq 'lock';
 	next if $k eq 'digest';
 	next if $k eq 'description';
+	next if $k eq 'fleecing-images';
 	next if $k =~ m/^unused\d+$/;
 
 	$dest->{$k} = $source->{$k};
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [PATCH manager v2 12/21] vzdump: handle new 'fleecing' property string
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (10 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [RFC guest-common v2 11/21] abstract config: do not copy fleecing images entry for snapshot Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH manager v2 13/21] api: backup/vzdump: add permission check for fleecing storage Fiona Ebner
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes in v2.

 PVE/VZDump.pm | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index 152eb3e5..74eb0c83 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -130,6 +130,15 @@ my $generate_notes = sub {
     return $notes_template;
 };
 
+my sub parse_fleecing {
+    my ($param) = @_;
+
+    if (defined(my $fleecing = $param->{fleecing})) {
+	return if ref($fleecing) eq 'HASH'; # already parsed
+	$param->{fleecing} = PVE::JSONSchema::parse_property_string('backup-fleecing', $fleecing);
+    }
+}
+
 my sub parse_performance {
     my ($param) = @_;
 
@@ -278,6 +287,7 @@ sub read_vzdump_defaults {
 	} keys %$confdesc_for_defaults
     };
     $parse_prune_backups_maxfiles->($defaults, "defaults in VZDump schema");
+    parse_fleecing($defaults);
     parse_performance($defaults);
 
     my $raw;
@@ -304,6 +314,7 @@ sub read_vzdump_defaults {
 	$res->{mailto} = [ @mailto ];
     }
     $parse_prune_backups_maxfiles->($res, "options in '$fn'");
+    parse_fleecing($res);
     parse_performance($res);
 
     foreach my $key (keys %$defaults) {
@@ -1453,6 +1464,7 @@ sub verify_vzdump_parameters {
 	if defined($param->{'prune-backups'}) && defined($param->{maxfiles});
 
     $parse_prune_backups_maxfiles->($param, 'CLI parameters');
+    parse_fleecing($param);
     parse_performance($param);
 
     if (my $template = $param->{'notes-template'}) {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [PATCH manager v2 13/21] api: backup/vzdump: add permission check for fleecing storage
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (11 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH manager v2 12/21] vzdump: handle new 'fleecing' property string Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-04-08  8:47   ` Wolfgang Bumiller
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu-server v2 14/21] backup: disk info: also keep track of size Fiona Ebner
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

Similar to how Datastore.AllocateSpace is required for the backup
storage, it should also be required for the fleecing storage.

Removing a fleecing storage from a job does not require more
permissions than for modifying the job.

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 PVE/API2/Backup.pm | 10 ++++++++--
 PVE/API2/VZDump.pm |  9 +++++----
 PVE/VZDump.pm      |  2 +-
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/PVE/API2/Backup.pm b/PVE/API2/Backup.pm
index 70753c2e..86f7dbdd 100644
--- a/PVE/API2/Backup.pm
+++ b/PVE/API2/Backup.pm
@@ -42,7 +42,7 @@ my $vzdump_job_id_prop = {
 
 # NOTE: also used by the vzdump API call.
 sub assert_param_permission_common {
-    my ($rpcenv, $user, $param) = @_;
+    my ($rpcenv, $user, $param, $is_delete) = @_;
     return if $user eq 'root@pam'; # always OK
 
     for my $key (qw(tmpdir dumpdir script)) {
@@ -52,6 +52,12 @@ sub assert_param_permission_common {
     if (grep { defined($param->{$_}) } qw(bwlimit ionice performance)) {
 	$rpcenv->check($user, "/", [ 'Sys.Modify' ]);
     }
+
+    if ($param->{fleecing} && !$is_delete) {
+	my $fleecing = PVE::VZDump::parse_fleecing($param);
+	$rpcenv->check($user, "/storage/$fleecing->{storage}", [ 'Datastore.AllocateSpace' ])
+	    if $fleecing->{storage};
+    }
 }
 
 my sub assert_param_permission_create {
@@ -70,7 +76,7 @@ my sub assert_param_permission_update {
     return if $user eq 'root@pam'; # always OK
 
     assert_param_permission_common($rpcenv, $user, $update);
-    assert_param_permission_common($rpcenv, $user, $delete);
+    assert_param_permission_common($rpcenv, $user, $delete, 1);
 
     if ($update->{storage}) {
 	$rpcenv->check($user, "/storage/$update->{storage}", [ 'Datastore.Allocate' ])
diff --git a/PVE/API2/VZDump.pm b/PVE/API2/VZDump.pm
index f66fc740..7f92e7ec 100644
--- a/PVE/API2/VZDump.pm
+++ b/PVE/API2/VZDump.pm
@@ -41,10 +41,11 @@ __PACKAGE__->register_method ({
     description => "Create backup.",
     permissions => {
 	description => "The user needs 'VM.Backup' permissions on any VM, and "
-	    ."'Datastore.AllocateSpace' on the backup storage. The 'tmpdir', 'dumpdir' and "
-	    ."'script' parameters are restricted to the 'root\@pam' user. The 'maxfiles' and "
-	    ."'prune-backups' settings require 'Datastore.Allocate' on the backup storage. The "
-	    ."'bwlimit', 'performance' and 'ionice' parameters require 'Sys.Modify' on '/'. ",
+	    ."'Datastore.AllocateSpace' on the backup storage (and fleecing storage when fleecing "
+	    ."is used). The 'tmpdir', 'dumpdir' and 'script' parameters are restricted to the "
+	    ."'root\@pam' user. The 'maxfiles' and 'prune-backups' settings require "
+	    ."'Datastore.Allocate' on the backup storage. The 'bwlimit', 'performance' and "
+	    ."'ionice' parameters require 'Sys.Modify' on '/'.",
 	user => 'all',
     },
     protected => 1,
diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index 74eb0c83..88149d68 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -130,7 +130,7 @@ my $generate_notes = sub {
     return $notes_template;
 };
 
-my sub parse_fleecing {
+sub parse_fleecing {
     my ($param) = @_;
 
     if (defined(my $fleecing = $param->{fleecing})) {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [PATCH qemu-server v2 14/21] backup: disk info: also keep track of size
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (12 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH manager v2 13/21] api: backup/vzdump: add permission check for fleecing storage Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu-server v2 15/21] backup: implement fleecing option Fiona Ebner
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

which will be needed to allocate fleecing images.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes in v2.

 PVE/VZDump/QemuServer.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index be7d8e1e..51498dbc 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -140,6 +140,7 @@ sub prepare {
 	    path => $path,
 	    volid => $volid,
 	    storeid => $storeid,
+	    size => $size,
 	    format => $format,
 	    virtdev => $ds,
 	    qmdevice => "drive-$ds",
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [PATCH qemu-server v2 15/21] backup: implement fleecing option
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (13 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu-server v2 14/21] backup: disk info: also keep track of size Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [RFC qemu-server v2 16/21] parse config: allow config keys with minus sign Fiona Ebner
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

Management for fleecing images is implemented here. If the fleecing
option is set, for each disk (except EFI disk and TPM state) a new
fleecing image is allocated on the configured fleecing storage. The
disk is attached to QEMU with the 'size' parameter, because the block
node in QEMU has to be the exact same size, and the newly allocated
image might be bigger if the storage has a coarser allocation
granularity or rounds the size up. After backup, the
disks are detached and removed from the storage.

If the storage supports qcow2, use that as the fleecing image format.
This allows saving some space even on storages that do not properly
support discard, like, for example, older versions of NFS.

Since there can be multiple volumes with the same volume name on
different storages, the fleecing image's name cannot be just based on
the original volume's name. The schema vm-ID-fleece-N(.FORMAT) with N
incrementing for each disk is used.

Partially inspired by the existing handling of the TPM state image
during backup.
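
For illustration (made-up VMID, storage and path): a VM 123 with a
'scsi0' disk and a directory storage supporting qcow2 would get a
fleecing image named 'vm-123-fleece-0.qcow2', attached via a human
monitor command along the lines of

    drive_add auto "file=/mnt/fleecing/images/123/vm-123-fleece-0.qcow2,if=none,id=drive-scsi0-fleecing,format=qcow2,discard=unmap"

For the raw format, ",size=<bytes>" is appended instead, so the block
node in QEMU matches the source size even if the storage rounded the
allocation up.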

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
    * fleecing storage is now non-optional, so expect it to be set
    * use qcow2 as fleecing format if storage supports it
    * use vm-ID-fleece-N naming schema for fleecing disks

 PVE/VZDump/QemuServer.pm | 144 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 142 insertions(+), 2 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 51498dbc..8c97ee62 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -26,6 +26,7 @@ use PVE::Format qw(render_duration render_bytes);
 
 use PVE::QemuConfig;
 use PVE::QemuServer;
+use PVE::QemuServer::Helpers;
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 
@@ -525,6 +526,121 @@ sub get_and_check_pbs_encryption_config {
     die "internal error - unhandled case for getting & checking PBS encryption ($keyfile, $master_keyfile)!";
 }
 
+my sub cleanup_fleecing_images {
+    my ($self, $disks) = @_;
+
+    for my $di ($disks->@*) {
+	if (my $volid = $di->{'fleece-volid'}) {
+	    eval { PVE::Storage::vdisk_free($self->{storecfg}, $volid); };
+	    $self->log('warn', "error removing fleecing image '$volid' - $@") if $@;
+	}
+    }
+}
+
+my sub allocate_fleecing_images {
+    my ($self, $disks, $vmid, $fleecing_storeid, $format) = @_;
+
+    die "internal error - no fleecing storage specified\n" if !$fleecing_storeid;
+
+    # TODO what about potential left-over images from a failed attempt? Just
+    # auto-remove? While unlikely, could conflict with manually created image from user...
+
+    eval {
+	my $n = 0; # counter for fleecing image names
+
+	for my $di ($disks->@*) {
+	    next if $di->{virtdev} =~ m/^(?:tpmstate|efidisk)\d$/; # too small to be worth it
+	    if ($di->{type} eq 'block' || $di->{type} eq 'file') {
+		my $scfg = PVE::Storage::storage_config($self->{storecfg}, $fleecing_storeid);
+		my $name = "vm-$vmid-fleece-$n";
+		$name .= ".$format" if $scfg->{path};
+
+		my $size = PVE::Tools::convert_size($di->{size}, 'b' => 'kb');
+
+		$di->{'fleece-volid'} = PVE::Storage::vdisk_alloc(
+		    $self->{storecfg}, $fleecing_storeid, $vmid, $format, $name, $size);
+
+		$n++;
+	    } else {
+		die "implement me (type '$di->{type}')";
+	    }
+	}
+    };
+    if (my $err = $@) {
+	cleanup_fleecing_images($self, $disks);
+	die $err;
+    }
+}
+
+my sub detach_fleecing_images {
+    my ($disks, $vmid) = @_;
+
+    return if !PVE::QemuServer::Helpers::vm_running_locally($vmid);
+
+    for my $di ($disks->@*) {
+	if (my $volid = $di->{'fleece-volid'}) {
+	    my $devid = "$di->{qmdevice}-fleecing";
+	    $devid =~ s/^drive-//; # re-added by qemu_drivedel()
+	    eval { PVE::QemuServer::qemu_drivedel($vmid, $devid) };
+	}
+    }
+}
+
+my sub attach_fleecing_images {
+    my ($self, $disks, $vmid, $format) = @_;
+
+    # unconditionally try to remove potential left-overs from a previous backup
+    detach_fleecing_images($disks, $vmid);
+
+    my $vollist = [ map { $_->{'fleece-volid'} } grep { $_->{'fleece-volid'} } $disks->@* ];
+    PVE::Storage::activate_volumes($self->{storecfg}, $vollist);
+
+    for my $di ($disks->@*) {
+	if (my $volid = $di->{'fleece-volid'}) {
+	    $self->loginfo("$di->{qmdevice}: attaching fleecing image $volid to QEMU");
+
+	    my $path = PVE::Storage::path($self->{storecfg}, $volid);
+	    my $devid = "$di->{qmdevice}-fleecing";
+	    my $drive = "file=$path,if=none,id=$devid,format=$format,discard=unmap";
+	    # Specify size explicitly, to make it work if storage backend rounded up size for
+	    # fleecing image when allocating.
+	    $drive .= ",size=$di->{size}" if $format eq 'raw';
+	    $drive =~ s/\\/\\\\/g;
+	    my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto \"$drive\"");
+	    die "attaching fleecing image $volid failed - $ret\n" if $ret !~ m/OK/s;
+	}
+    }
+}
+
+my sub check_and_prepare_fleecing {
+    my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support) = @_;
+
+    # Even if the VM was started specifically for fleecing, it's possible that the VM is resumed and
+    # then starts doing IO. For VMs that are not resumed the fleecing images will just stay empty,
+    # so there is no big cost.
+
+    my $use_fleecing = $fleecing_opts && $fleecing_opts->{enabled} && !$is_template;
+
+    if ($use_fleecing && !defined($qemu_support->{'backup-fleecing'})) {
+	$self->log(
+	    'warn',
+	    "running QEMU version does not support backup fleecing - continuing without",
+	);
+	$use_fleecing = 0;
+    }
+
+    if ($use_fleecing) {
+	my ($default_format, $valid_formats) = PVE::Storage::storage_default_format(
+	    $self->{storecfg}, $fleecing_opts->{storage});
+	my $format = scalar(grep { $_ eq 'qcow2' } $valid_formats->@*) ? 'qcow2' : 'raw';
+
+	allocate_fleecing_images($self, $disks, $vmid, $fleecing_opts->{storage}, $format);
+	attach_fleecing_images($self, $disks, $vmid, $format);
+    }
+
+    return $use_fleecing;
+}
+
 sub archive_pbs {
     my ($self, $task, $vmid) = @_;
 
@@ -578,6 +694,7 @@ sub archive_pbs {
 
     # get list early so we die on unknown drive types before doing anything
     my $devlist = _get_task_devlist($task);
+    my $use_fleecing;
 
     $self->enforce_vm_running_for_backup($vmid);
     $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
@@ -606,6 +723,11 @@ sub archive_pbs {
 
 	$attach_tpmstate_drive->($self, $task, $vmid);
 
+	my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
+
+	$use_fleecing = check_and_prepare_fleecing(
+	    $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support);
+
 	my $fs_frozen = $self->qga_fs_freeze($task, $vmid);
 
 	my $params = {
@@ -617,6 +739,8 @@ sub archive_pbs {
 	    devlist => $devlist,
 	    'config-file' => $conffile,
 	};
+	$params->{fleecing} = JSON::true if $use_fleecing;
+
 	if (defined(my $ns = $scfg->{namespace})) {
 	    $params->{'backup-ns'} = $ns;
 	}
@@ -633,7 +757,6 @@ sub archive_pbs {
 	    $params->{"master-keyfile"} = $master_keyfile if defined($master_keyfile);
 	}
 
-	my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
 	$params->{'use-dirty-bitmap'} = JSON::true
 	    if $qemu_support->{'pbs-dirty-bitmap'} && !$is_template;
 
@@ -665,6 +788,11 @@ sub archive_pbs {
     }
     $self->restore_vm_power_state($vmid);
 
+    if ($use_fleecing) {
+	detach_fleecing_images($task->{disks}, $vmid);
+	cleanup_fleecing_images($self, $task->{disks});
+    }
+
     die $err if $err;
 }
 
@@ -724,8 +852,10 @@ sub archive_vma {
 	$speed = $opts->{bwlimit}*1024;
     }
 
+    my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
+
     my $diskcount = scalar(@{$task->{disks}});
-    if (PVE::QemuConfig->is_template($self->{vmlist}->{$vmid}) || !$diskcount) {
+    if ($is_template || !$diskcount) {
 	my @pathlist;
 	foreach my $di (@{$task->{disks}}) {
 	    if ($di->{type} eq 'block' || $di->{type} eq 'file') {
@@ -765,6 +895,7 @@ sub archive_vma {
     }
 
     my $devlist = _get_task_devlist($task);
+    my $use_fleecing;
 
     $self->enforce_vm_running_for_backup($vmid);
     $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
@@ -784,6 +915,9 @@ sub archive_vma {
 
 	$attach_tpmstate_drive->($self, $task, $vmid);
 
+	$use_fleecing = check_and_prepare_fleecing(
+	    $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support);
+
 	my $outfh;
 	if ($opts->{stdout}) {
 	    $outfh = $opts->{stdout};
@@ -812,6 +946,7 @@ sub archive_vma {
 		devlist => $devlist
 	    };
 	    $params->{'firewall-file'} = $firewall if -e $firewall;
+	    $params->{fleecing} = JSON::true if $use_fleecing;
 	    add_backup_performance_options($params, $opts->{performance}, $qemu_support);
 
 	    $qmpclient->queue_cmd($vmid, $backup_cb, 'backup', %$params);
@@ -853,6 +988,11 @@ sub archive_vma {
 
     $self->restore_vm_power_state($vmid);
 
+    if ($use_fleecing) {
+	detach_fleecing_images($task->{disks}, $vmid);
+	cleanup_fleecing_images($self, $task->{disks});
+    }
+
     if ($err) {
 	if ($cpid) {
 	    kill(9, $cpid);
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [RFC qemu-server v2 16/21] parse config: allow config keys with minus sign
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (14 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu-server v2 15/21] backup: implement fleecing option Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [RFC qemu-server v2 17/21] schema: add fleecing-images config property Fiona Ebner
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

In preparation for the upcoming 'fleecing-images' key. To avoid mixing
of options with - and options with _, which is not very user-friendly,
it would be nice to add aliases for existing options with _. And
long-term, backup restore handlers could switch to the modern keys
with -.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

RFC, because this and the rest of the qemu-server patches are for
the optional improved cleanup.

New in v2.

 PVE/QemuServer.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index ed8b054e..e3519e4c 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -2385,7 +2385,7 @@ sub parse_vm_config {
 	    } else {
 		$handle_error->("vm $vmid - property 'delete' is only allowed in [PENDING]\n");
 	    }
-	} elsif ($line =~ m/^([a-z][a-z_]*\d*):\s*(.+?)\s*$/) {
+	} elsif ($line =~ m/^([a-z][a-z_\-]*\d*):\s*(.+?)\s*$/) {
 	    my $key = $1;
 	    my $value = $2;
 	    if ($section eq 'cloudinit') {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [RFC qemu-server v2 17/21] schema: add fleecing-images config property
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (15 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [RFC qemu-server v2 16/21] parse config: allow config keys with minus sign Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:24 ` [pve-devel] [RFC qemu-server v2 18/21] vzdump: better cleanup fleecing images after hard errors Fiona Ebner
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

to be used internally to record volume IDs of fleecing images
allocated during backup.
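
For illustration (made-up IDs), such an entry would look like

    fleecing-images: local-lvm:vm-123-fleece-0,local-lvm:vm-123-fleece-1

i.e. a comma-separated list of volume IDs, which is removed again once
clean-up succeeded.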

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 PVE/API2/Qemu.pm         | 9 +++++++++
 PVE/QemuServer.pm        | 7 +++++++
 PVE/VZDump/QemuServer.pm | 1 +
 3 files changed, 17 insertions(+)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 40b6c302..977eaadf 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -898,6 +898,9 @@ __PACKAGE__->register_method({
 	$param->{cpuunits} = PVE::CGroup::clamp_cpu_shares($param->{cpuunits})
 	    if defined($param->{cpuunits}); # clamp value depending on cgroup version
 
+	raise_param_exc({ 'fleecing-images' => "Cannot set option - for internal use only." })
+	    if $param->{'fleecing-images'};
+
 	PVE::Cluster::check_cfs_quorum();
 
 	my $filename = PVE::QemuConfig->config_file($vmid);
@@ -1591,6 +1594,9 @@ my $update_vm_api  = sub {
 	push @paramarr, "-$key", $value;
     }
 
+    raise_param_exc({ 'fleecing-images' => "Cannot set option - for internal use only." })
+	if $param->{'fleecing-images'};
+
     my $skiplock = extract_param($param, 'skiplock');
     raise_param_exc({ skiplock => "Only root may use this option." })
 	if $skiplock && $authuser ne 'root@pam';
@@ -3673,6 +3679,9 @@ __PACKAGE__->register_method({
 		next if $opt eq 'snapshots' ||  $opt eq 'parent' || $opt eq 'snaptime' ||
 		    $opt eq 'vmstate' || $opt eq 'snapstate';
 
+		# left-overs, not cloned
+		next if $opt eq 'fleecing-images';
+
 		# no need to copy unused images, because VMID(owner) changes anyways
 		next if $opt =~ m/^unused\d+$/;
 
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index e3519e4c..91e34166 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -736,6 +736,13 @@ EODESCR
 	description => "List of host cores used to execute guest processes, for example: 0,5,8-11",
 	optional => 1,
     },
+    'fleecing-images' => {
+	type => 'string',
+	format => 'pve-volume-id-list',
+	description => "For internal use only. List of fleecing images allocated during backup."
+	    ." If no backup is running, these are left-overs that failed to be removed.",
+	optional => 1,
+    },
 };
 
 my $cicustom_fmt = {
diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 8c97ee62..8a44a1fd 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -249,6 +249,7 @@ sub assemble {
 	    next;
 	}
 	next if $line =~ m/^lock:/ || $line =~ m/^parent:/;
+	next if $line =~ m/^fleecing-images:/;
 
 	print $outfd $line;
     }
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [RFC qemu-server v2 18/21] vzdump: better cleanup fleecing images after hard errors
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (16 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [RFC qemu-server v2 17/21] schema: add fleecing-images config property Fiona Ebner
@ 2024-03-15 10:24 ` Fiona Ebner
  2024-03-15 10:25 ` [pve-devel] [RFC qemu-server v2 19/21] migration: attempt to clean up potential left-over fleecing images Fiona Ebner
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:24 UTC (permalink / raw)
  To: pve-devel

By recording the allocated fleecing images in the VM config, they
are not immediately orphaned, should a hard error occur during
backup that prevents cleanup.

Clean-up is attempted again during the next backup run.

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 PVE/QemuConfig.pm        | 40 ++++++++++++++++++++++++++++++++++++++++
 PVE/VZDump/QemuServer.pm | 31 ++++++++++++++++++++++++-------
 2 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
index ca30eda0..fcfce1fb 100644
--- a/PVE/QemuConfig.pm
+++ b/PVE/QemuConfig.pm
@@ -13,6 +13,7 @@ use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer;
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::Memory qw(get_current_memory);
+use PVE::RESTEnvironment qw(log_warn);
 use PVE::Storage;
 use PVE::Tools;
 use PVE::Format qw(render_bytes render_duration);
@@ -572,4 +573,43 @@ sub has_cloudinit {
     return $found;
 }
 
+# Caller is expected to deal with volumes from an already existing entry first.
+sub record_fleecing_images {
+    my ($vmid, $volids) = @_;
+
+    return if scalar($volids->@*) == 0;
+
+    PVE::QemuConfig->lock_config($vmid, sub {
+	my $conf = PVE::QemuConfig->load_config($vmid);
+	$conf->{'fleecing-images'} = join(',', $volids->@*);
+	PVE::QemuConfig->write_config($vmid, $conf);
+    });
+}
+
+sub cleanup_fleecing_images {
+    my ($vmid, $storecfg) = @_;
+
+    my $volids = [];
+    my $failed = [];
+
+    PVE::QemuConfig->lock_config($vmid, sub {
+	my $conf = PVE::QemuConfig->load_config($vmid);
+	if ($conf->{'fleecing-images'}) {
+	    $volids = [PVE::Tools::split_list($conf->{'fleecing-images'})];
+	    delete $conf->{'fleecing-images'};
+	    PVE::QemuConfig->write_config($vmid, $conf);
+	}
+    });
+
+    for my $volid ($volids->@*) {
+	eval { PVE::Storage::vdisk_free($storecfg, $volid); };
+	if (my $err = $@) {
+	    log_warn("error removing fleecing image '$volid' - $err");
+	    push $failed->@*, $volid;
+	}
+    }
+
+    record_fleecing_images($vmid, $failed);
+}
+
 1;
diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 8a44a1fd..ad0212bc 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -527,15 +527,25 @@ sub get_and_check_pbs_encryption_config {
     die "internal error - unhandled case for getting & checking PBS encryption ($keyfile, $master_keyfile)!";
 }
 
+# Helper is intended to be called from allocate_fleecing_images() only. Otherwise, fleecing volids
+# have already been recorded in the configuration and PVE::QemuConfig::cleanup_fleecing_images()
+# should be used instead.
 my sub cleanup_fleecing_images {
-    my ($self, $disks) = @_;
+    my ($self, $vmid, $disks) = @_;
+
+    my $failed = [];
 
     for my $di ($disks->@*) {
 	if (my $volid = $di->{'fleece-volid'}) {
 	    eval { PVE::Storage::vdisk_free($self->{storecfg}, $volid); };
-	    $self->log('warn', "error removing fleecing image '$volid' - $@") if $@;
+	    if (my $err = $@) {
+		$self->log('warn', "error removing fleecing image '$volid' - $err");
+		push $failed->@*, $volid;
+	    }
 	}
     }
+
+    PVE::QemuConfig::record_fleecing_images($vmid, $failed);
 }
 
 my sub allocate_fleecing_images {
@@ -543,8 +553,7 @@ my sub allocate_fleecing_images {
 
     die "internal error - no fleecing storage specified\n" if !$fleecing_storeid;
 
-    # TODO what about potential left-over images from a failed attempt? Just
-    # auto-remove? While unlikely, could conflict with manually created image from user...
+    my $fleece_volids = [];
 
     eval {
 	my $n = 0; # counter for fleecing image names
@@ -561,6 +570,8 @@ my sub allocate_fleecing_images {
 		$di->{'fleece-volid'} = PVE::Storage::vdisk_alloc(
 		    $self->{storecfg}, $fleecing_storeid, $vmid, $format, $name, $size);
 
+		push $fleece_volids->@*, $di->{'fleece-volid'};
+
 		$n++;
 	    } else {
 		die "implement me (type '$di->{type}')";
@@ -568,9 +579,11 @@ my sub allocate_fleecing_images {
 	}
     };
     if (my $err = $@) {
-	cleanup_fleecing_images($self, $disks);
+	cleanup_fleecing_images($self, $vmid, $disks);
 	die $err;
     }
+
+    PVE::QemuConfig::record_fleecing_images($vmid, $fleece_volids);
 }
 
 my sub detach_fleecing_images {
@@ -630,6 +643,10 @@ my sub check_and_prepare_fleecing {
 	$use_fleecing = 0;
     }
 
+    # clean up potential left-overs from a previous attempt
+    eval { PVE::QemuConfig::cleanup_fleecing_images($vmid, $self->{storecfg}); };
+    $self->log('warn', "attempt to clean up left-over fleecing images failed - $@") if $@;
+
     if ($use_fleecing) {
 	my ($default_format, $valid_formats) = PVE::Storage::storage_default_format(
 	    $self->{storecfg}, $fleecing_opts->{storage});
@@ -791,7 +808,7 @@ sub archive_pbs {
 
     if ($use_fleecing) {
 	detach_fleecing_images($task->{disks}, $vmid);
-	cleanup_fleecing_images($self, $task->{disks});
+	PVE::QemuConfig::cleanup_fleecing_images($vmid, $self->{storecfg});
     }
 
     die $err if $err;
@@ -991,7 +1008,7 @@ sub archive_vma {
 
     if ($use_fleecing) {
 	detach_fleecing_images($task->{disks}, $vmid);
-	cleanup_fleecing_images($self, $task->{disks});
+	PVE::QemuConfig::cleanup_fleecing_images($vmid, $self->{storecfg});
     }
 
     if ($err) {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [RFC qemu-server v2 19/21] migration: attempt to clean up potential left-over fleecing images
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (17 preceding siblings ...)
  2024-03-15 10:24 ` [pve-devel] [RFC qemu-server v2 18/21] vzdump: better cleanup fleecing images after hard errors Fiona Ebner
@ 2024-03-15 10:25 ` Fiona Ebner
  2024-03-15 10:25 ` [pve-devel] [RFC qemu-server v2 20/21] destroy vm: " Fiona Ebner
  2024-03-15 10:25 ` [pve-devel] [PATCH docs v2 21/21] vzdump: add section about backup fleecing Fiona Ebner
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:25 UTC (permalink / raw)
  To: pve-devel

Clean up before the guest is migrated to a different node, after which
the images would really become orphaned.

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 PVE/QemuMigrate.pm | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 8d9b35ae..a190ce03 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -179,6 +179,9 @@ sub prepare {
     my $version = PVE::QemuServer::Helpers::get_node_pvecfg_version($self->{node});
     my $cloudinit_config = $conf->{cloudinit};
 
+    eval { PVE::QemuConfig::cleanup_fleecing_images($vmid, $storecfg) };
+    $self->log('warn', "attempt to clean up left-over fleecing images failed - $@") if $@;
+
     if (
 	PVE::QemuConfig->has_cloudinit($conf) && defined($cloudinit_config)
 	&& scalar(keys %$cloudinit_config) > 0
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [RFC qemu-server v2 20/21] destroy vm: clean up potential left-over fleecing images
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (18 preceding siblings ...)
  2024-03-15 10:25 ` [pve-devel] [RFC qemu-server v2 19/21] migration: attempt to clean up potential left-over fleecing images Fiona Ebner
@ 2024-03-15 10:25 ` Fiona Ebner
  2024-03-15 10:25 ` [pve-devel] [PATCH docs v2 21/21] vzdump: add section about backup fleecing Fiona Ebner
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:25 UTC (permalink / raw)
  To: pve-devel

To avoid them becoming orphaned.

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 PVE/QemuServer.pm | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 91e34166..b80286f3 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -2237,6 +2237,9 @@ sub check_type {
 sub destroy_vm {
     my ($storecfg, $vmid, $skiplock, $replacement_conf, $purge_unreferenced) = @_;
 
+    eval { PVE::QemuConfig::cleanup_fleecing_images($vmid, $storecfg) };
+    log_warn("attempt to clean up left-over fleecing images failed - $@") if $@;
+
     my $conf = PVE::QemuConfig->load_config($vmid);
 
     if (!$skiplock && !PVE::QemuConfig->has_lock($conf, 'suspended')) {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pve-devel] [PATCH docs v2 21/21] vzdump: add section about backup fleecing
  2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
                   ` (19 preceding siblings ...)
  2024-03-15 10:25 ` [pve-devel] [RFC qemu-server v2 20/21] destroy vm: " Fiona Ebner
@ 2024-03-15 10:25 ` Fiona Ebner
  20 siblings, 0 replies; 28+ messages in thread
From: Fiona Ebner @ 2024-03-15 10:25 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
    * flesh out recommendations much more
    * adapt to changes in backend (i.e. using qcow2 if available, no
      storage default)

 vzdump.adoc | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/vzdump.adoc b/vzdump.adoc
index b5bbac7..b01d3c5 100644
--- a/vzdump.adoc
+++ b/vzdump.adoc
@@ -136,6 +136,44 @@ not included in backups. For volume mount points you can set the *Backup* option
 to include the mount point in the backup. Device and bind mounts are never
 backed up as their content is managed outside the {pve} storage library.
 
+VM Backup Fleecing
+~~~~~~~~~~~~~~~~~~
+
+WARNING: Backup fleecing is still being worked on (also in upstream QEMU) and is
+currently only a technology preview.
+
+When a backup for a VM is started, QEMU will install a "copy-before-write"
+filter in its block layer. This filter ensures that upon new guest writes, old
+data still needed for the backup is sent to the backup target first. The guest
+write blocks until this operation is finished so guest IO to not-yet-backed-up
+sectors will be limited by the speed of the backup target.
+
+With backup fleecing, such old data is cached in a fleecing image rather than
+sent directly to the backup target. This can help guest IO performance and even
+prevent hangs in certain scenarios, at the cost of requiring more storage space.
+Use e.g. `vzdump 123 --fleecing enabled=1,storage=local-lvm` to enable backup
+fleecing, with fleecing images created on the storage `local-lvm`.
+
+The fleecing storage should be a fast local storage, with thin provisioning and
+discard support. Examples are LVM-thin, RBD, ZFS with `sparse 1` in the storage
+configuration, and many file-based storages. Ideally, the fleecing storage is a
+dedicated storage, so that filling it up will not affect other guests and will
+only fail the backup. Parts of the fleecing image that have been backed up will be
+discarded to try and keep the space usage low.
+
+For file-based storages that do not support discard (e.g. NFS before version
+4.2), you should set `preallocation off` in the storage configuration. In
+combination with `qcow2` (used automatically as the format for the fleecing
+image when the storage supports it), this has the advantage that already
+allocated parts of the image can be re-used later, which can still help save
+quite a bit of space.
+
+WARNING: On a storage that's not thinly provisioned, e.g. LVM or ZFS without the
+`sparse` option, the full size of the original disk needs to be reserved for the
+fleecing image up-front. On a thinly provisioned storage, the fleecing image can
+grow to the same size as the original image only if the guest re-writes a whole
+disk while the backup is busy with another disk.
+
 Backup File Names
 -----------------
 
-- 
2.39.2





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pve-devel] [PATCH manager v2 13/21] api: backup/vzdump: add permission check for fleecing storage
  2024-03-15 10:24 ` [pve-devel] [PATCH manager v2 13/21] api: backup/vzdump: add permission check for fleecing storage Fiona Ebner
@ 2024-04-08  8:47   ` Wolfgang Bumiller
  2024-04-10  9:57     ` Fiona Ebner
  0 siblings, 1 reply; 28+ messages in thread
From: Wolfgang Bumiller @ 2024-04-08  8:47 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: pve-devel

On Fri, Mar 15, 2024 at 11:24:54AM +0100, Fiona Ebner wrote:
> Similar to how Datastore.AllocateSpace is required for the backup
> storage, it should also be required for the fleecing storage.
> 
> Removing a fleecing storage from a job does not require more
> permissions than for modifying the job.
> 
> Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> New in v2.
> 
>  PVE/API2/Backup.pm | 10 ++++++++--
>  PVE/API2/VZDump.pm |  9 +++++----
>  PVE/VZDump.pm      |  2 +-
>  3 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/PVE/API2/Backup.pm b/PVE/API2/Backup.pm
> index 70753c2e..86f7dbdd 100644
> --- a/PVE/API2/Backup.pm
> +++ b/PVE/API2/Backup.pm
> @@ -42,7 +42,7 @@ my $vzdump_job_id_prop = {
>  
>  # NOTE: also used by the vzdump API call.
>  sub assert_param_permission_common {
> -    my ($rpcenv, $user, $param) = @_;
> +    my ($rpcenv, $user, $param, $is_delete) = @_;
>      return if $user eq 'root@pam'; # always OK
>  
>      for my $key (qw(tmpdir dumpdir script)) {
> @@ -52,6 +52,12 @@ sub assert_param_permission_common {
>      if (grep { defined($param->{$_}) } qw(bwlimit ionice performance)) {
>  	$rpcenv->check($user, "/", [ 'Sys.Modify' ]);
>      }
> +
> +    if ($param->{fleecing} && !$is_delete) {
> +	my $fleecing = PVE::VZDump::parse_fleecing($param);

^ The parse_fleecing sub does not actually return the hash, at least not
explicitly, and when it is not set it returns undef, so the `if` guard
in the statement below tries to access `undef->{storage}`.

If the parameter does exist then the first run through the function
which performs the actual string->hash conversion will *accidentally*
also return the hash implicitly, because there's no explicit return
statement for it.
Subsequent calls on the other hand will run into the
    return if ref($fleecing) eq 'HASH';
and thus return an empty list making `$fleecing` undef again.

> +	$rpcenv->check($user, "/storage/$fleecing->{storage}", [ 'Datastore.AllocateSpace' ])
> +	    if $fleecing->{storage};
> +    }
>  }
>  
>  my sub assert_param_permission_create {
> @@ -70,7 +76,7 @@ my sub assert_param_permission_update {
>      return if $user eq 'root@pam'; # always OK
>  
>      assert_param_permission_common($rpcenv, $user, $update);
> -    assert_param_permission_common($rpcenv, $user, $delete);
> +    assert_param_permission_common($rpcenv, $user, $delete, 1);
>  
>      if ($update->{storage}) {
>  	$rpcenv->check($user, "/storage/$update->{storage}", [ 'Datastore.Allocate' ])
> diff --git a/PVE/API2/VZDump.pm b/PVE/API2/VZDump.pm
> index f66fc740..7f92e7ec 100644
> --- a/PVE/API2/VZDump.pm
> +++ b/PVE/API2/VZDump.pm
> @@ -41,10 +41,11 @@ __PACKAGE__->register_method ({
>      description => "Create backup.",
>      permissions => {
>  	description => "The user needs 'VM.Backup' permissions on any VM, and "
> -	    ."'Datastore.AllocateSpace' on the backup storage. The 'tmpdir', 'dumpdir' and "
> -	    ."'script' parameters are restricted to the 'root\@pam' user. The 'maxfiles' and "
> -	    ."'prune-backups' settings require 'Datastore.Allocate' on the backup storage. The "
> -	    ."'bwlimit', 'performance' and 'ionice' parameters require 'Sys.Modify' on '/'. ",
> +	    ."'Datastore.AllocateSpace' on the backup storage (and fleecing storage when fleecing "
> +	    ."is used). The 'tmpdir', 'dumpdir' and 'script' parameters are restricted to the "
> +	    ."'root\@pam' user. The 'maxfiles' and 'prune-backups' settings require "
> +	    ."'Datastore.Allocate' on the backup storage. The 'bwlimit', 'performance' and "
> +	    ."'ionice' parameters require 'Sys.Modify' on '/'.",
>  	user => 'all',
>      },
>      protected => 1,
> diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
> index 74eb0c83..88149d68 100644
> --- a/PVE/VZDump.pm
> +++ b/PVE/VZDump.pm
> @@ -130,7 +130,7 @@ my $generate_notes = sub {
>      return $notes_template;
>  };
>  
> -my sub parse_fleecing {
> +sub parse_fleecing {
>      my ($param) = @_;
>  
>      if (defined(my $fleecing = $param->{fleecing})) {

^ So this should be updated to actually return the hash.
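
A minimal sketch of such a change, keeping the in-place parsing
(untested, for illustration only):

    sub parse_fleecing {
        my ($param) = @_;

        if (defined(my $fleecing = $param->{fleecing})) {
            # parse the property string only if not already done
            $fleecing = PVE::JSONSchema::parse_property_string('backup-fleecing', $fleecing)
                if ref($fleecing) ne 'HASH';
            $param->{fleecing} = $fleecing;
            return $fleecing;
        }

        return undef;
    }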




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pve-devel] [PATCH qemu v2 07/21] PVE backup: add fleecing option
  2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 07/21] PVE backup: add fleecing option Fiona Ebner
@ 2024-04-08 12:45   ` Wolfgang Bumiller
  2024-04-10  9:30     ` Fiona Ebner
  0 siblings, 1 reply; 28+ messages in thread
From: Wolfgang Bumiller @ 2024-04-08 12:45 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: pve-devel

On Fri, Mar 15, 2024 at 11:24:48AM +0100, Fiona Ebner wrote:
> When a fleecing option is given, it is expected that each device has
> a corresponding "-fleecing" block device already attached, except for
> EFI disk and TPM state, where fleecing is never used.
> 
> The following graph was adapted from [0] which also contains more
> details about fleecing.
> 
> [guest]
>    |
>    | root
>    v                 file
> [copy-before-write]<------[snapshot-access]
>    |           |
>    | file      | target
>    v           v
> [source] [fleecing]
> 
> For fleecing, a copy-before-write filter is inserted on top of the
> source node, as well as a snapshot-access node pointing to the filter
> node which allows to read the consistent state of the image at the
> time it was inserted. New guest writes are passed through the
> copy-before-write filter which will first copy over old data to the
> fleecing image in case that old data is still needed by the
> snapshot-access node.
> 
> The backup process will sequentially read from the snapshot access,
> which has a bitmap and knows whether to read from the original image
> or the fleecing image to get the "snapshot" state, i.e. data from the
> source image at the time when the copy-before-write filter was
> inserted. After reading, the copied sections are discarded from the
> fleecing image to reduce space usage.
> 
> All of this can be restricted by an initial dirty bitmap to parts of
> the source image that are required for an incremental backup.
> 
> For discard to work, it is necessary that the fleecing image does not
> have a larger cluster size than the backup job granularity. Since
> querying that size does not always work, e.g. for RBD with krbd, the
> cluster size will not be reported, a minimum of 4 MiB is used. A job
> with PBS target already has at least this granularity, so it's just
> relevant for other targets. I.e. edge cases where this minimum is not
> enough should be very rare in practice. If ever necessary in the
> future, can still add a passed-in value for the backup QMP command to
> override.
> 
> Additionally, the cbw-timeout and on-cbw-error=break-snapshot options
> are set when installing the copy-before-write filter and
> snapshot-access. When an error or timeout occurs, the problematic (and
> each further) snapshot operation will fail and thus cancel the backup
> instead of breaking the guest write.
> 
> Note that job_id cannot be inferred from the snapshot-access bs because
> it has no parent, so just pass the one from the original bs.
> 
> [0]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg876056.html
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> Changes in v2:
>     * specify minimum cluster size for backup job
> 
>  block/monitor/block-hmp-cmds.c |   1 +
>  pve-backup.c                   | 143 ++++++++++++++++++++++++++++++++-
>  qapi/block-core.json           |   8 +-
>  3 files changed, 148 insertions(+), 4 deletions(-)
> 
> diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
> index 6efe28cef5..ca29cc4281 100644
> --- a/block/monitor/block-hmp-cmds.c
> +++ b/block/monitor/block-hmp-cmds.c
> @@ -1064,6 +1064,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
>          NULL, NULL,
>          devlist, qdict_haskey(qdict, "speed"), speed,
>          false, 0, // BackupPerf max-workers
> +        false, false, // fleecing
>          &error);
>  
>      hmp_handle_error(mon, error);
> diff --git a/pve-backup.c b/pve-backup.c
> index e6b17b797e..00aaff6509 100644
> --- a/pve-backup.c
> +++ b/pve-backup.c
> @@ -7,8 +7,10 @@
>  #include "sysemu/blockdev.h"
>  #include "block/block_int-global-state.h"
>  #include "block/blockjob.h"
> +#include "block/copy-before-write.h"
>  #include "block/dirty-bitmap.h"
>  #include "qapi/qapi-commands-block.h"
> +#include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qemu/cutils.h"
>  
> @@ -80,8 +82,15 @@ static void pvebackup_init(void)
>  // initialize PVEBackupState at startup
>  opts_init(pvebackup_init);
>  
> +typedef struct PVEBackupFleecingInfo {
> +    BlockDriverState *bs;
> +    BlockDriverState *cbw;
> +    BlockDriverState *snapshot_access;
> +} PVEBackupFleecingInfo;
> +
>  typedef struct PVEBackupDevInfo {
>      BlockDriverState *bs;
> +    PVEBackupFleecingInfo fleecing;
>      size_t size;
>      uint64_t block_size;
>      uint8_t dev_id;
> @@ -361,6 +370,25 @@ static void pvebackup_complete_cb(void *opaque, int ret)
>      PVEBackupDevInfo *di = opaque;
>      di->completed_ret = ret;
>  
> +    /*
> +     * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work
> +     * won't be done as a coroutine anyways:
> +     * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would
> +     *   just spawn a BH calling bdrv_unref().
> +     * - For cbw, draining would need to spawn a BH.
> +     *
> +     * Note that the AioContext lock is already acquired by our caller, i.e.
> +     * job_finalize_single_locked()
> +     */
> +    if (di->fleecing.snapshot_access) {
> +        bdrv_unref(di->fleecing.snapshot_access);
> +        di->fleecing.snapshot_access = NULL;
> +    }
> +    if (di->fleecing.cbw) {
> +        bdrv_cbw_drop(di->fleecing.cbw);
> +        di->fleecing.cbw = NULL;
> +    }
> +
>      /*
>       * Schedule stream cleanup in async coroutine. close_image and finish might
>       * take a while, so we can't block on them here. This way it also doesn't
> @@ -521,9 +549,82 @@ static void create_backup_jobs_bh(void *opaque) {
>  
>          bdrv_drained_begin(di->bs);
>  
> +        BackupPerf perf = (BackupPerf){ .max_workers = backup_state.perf.max_workers };
> +
> +        BlockDriverState *source_bs = di->bs;
> +        bool discard_source = false;
> +        const char *job_id = bdrv_get_device_name(di->bs);
> +        if (di->fleecing.bs) {
> +            QDict *cbw_opts = qdict_new();
> +            qdict_put_str(cbw_opts, "driver", "copy-before-write");
> +            qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
> +            qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs));
> +
> +            if (di->bitmap) {
> +                /*
> +                 * Only guest writes to parts relevant for the backup need to be intercepted with
> +                 * old data being copied to the fleecing image.
> +                 */
> +                qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs));
> +                qdict_put_str(cbw_opts, "bitmap.name", bdrv_dirty_bitmap_name(di->bitmap));
> +            }
> +            /*
> +             * Fleecing storage is supposed to be fast and it's better to break backup than guest
> +             * writes. Certain guest drivers like VirtIO-win have a 60-second timeout by default, so
> +             * abort a bit before that.
> +             */
> +            qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
> +            qdict_put_int(cbw_opts, "cbw-timeout", 45);
> +
> +            di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, &local_err);
> +
> +            if (!di->fleecing.cbw) {
> +                error_setg(errp, "appending cbw node for fleecing failed: %s",
> +                           local_err ? error_get_pretty(local_err) : "unknown error");
> +                break;
> +            }
> +
> +            QDict *snapshot_access_opts = qdict_new();
> +            qdict_put_str(snapshot_access_opts, "driver", "snapshot-access");
> +            qdict_put_str(snapshot_access_opts, "file", bdrv_get_node_name(di->fleecing.cbw));
> +
> +            /*
> +             * Holding the AioContext lock here would cause a deadlock, because bdrv_open_driver()
> +             * will acquire it a second time. But it's allowed to be held exactly once when polling
> +             * and that happens when the bdrv_refresh_total_sectors() call is made there.
> +             */
> +            aio_context_release(aio_context);
> +            di->fleecing.snapshot_access =
> +                bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | BDRV_O_UNMAP, &local_err);
> +            aio_context_acquire(aio_context);
> +            if (!di->fleecing.snapshot_access) {
> +                error_setg(errp, "setting up snapshot access for fleecing failed: %s",
> +                           local_err ? error_get_pretty(local_err) : "unknown error");
> +                break;
> +            }
> +            source_bs = di->fleecing.snapshot_access;
> +            discard_source = true;
> +
> +            /*
> +             * bdrv_get_info() just returns 0 (= doesn't matter) for RBD when using krbd. But discard
> +             * on the fleecing image won't work if the backup job's granularity is less than the RBD
> +             * object size (default 4 MiB), so it does matter. Always use at least 4 MiB. With a PBS
> +             * target, the backup job granularity would already be at least this much.
> +             */
> +            perf.min_cluster_size = 4 * 1024 * 1024;
> +            /*
> +             * For discard to work, cluster size for the backup job must be at least the same as for
> +             * the fleecing image.
> +             */
> +            BlockDriverInfo bdi;
> +            if (bdrv_get_info(di->fleecing.bs, &bdi) >= 0) {
> +                perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
> +            }
> +        }
> +
>          BlockJob *job = backup_job_create(
> -            NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
> -            bitmap_mode, false, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
> +            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
> +            bitmap_mode, false, discard_source, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT,
>              BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
>              &local_err);
>  
> @@ -581,6 +682,14 @@ static void create_backup_jobs_bh(void *opaque) {
>      aio_co_enter(data->ctx, data->co);
>  }
>  
> +/*
> + * EFI disk and TPM state are small and it's just not worth setting up fleecing for them.
> + */
> +static bool device_uses_fleecing(const char *device_id)

Do we really want this?

IMO we already have enough code trying to distinguish "real" disks from
efidisks and tpmstate files.

AFAICT we do check whether the hmp command to *create* the fleecing
drives actually works, so... (see below)

> +{
> +    return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14);
> +}
> +
>  /*
>   * Returns a list of device infos, which needs to be freed by the caller. In
>   * case of an error, errp will be set, but the returned value might still be a
> @@ -588,6 +697,7 @@ static void create_backup_jobs_bh(void *opaque) {
>   */
>  static GList coroutine_fn *get_device_info(
>      const char *devlist,
> +    bool fleecing,
>      Error **errp)
>  {
>      gchar **devs = NULL;
> @@ -611,6 +721,31 @@ static GList coroutine_fn *get_device_info(
>              }
>              PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
>              di->bs = bs;
> +
> +            if (fleecing && device_uses_fleecing(*d)) {
> +                g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
> +                BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
> +                if (!fleecing_blk) {

...so instead of this, we could just treat the absence of a fleecing
BlockBackend *not* as an error, but as deliberate?

> +                    error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> +                              "Device '%s' not found", fleecing_devid);
> +                    goto err;
> +                }
> +                BlockDriverState *fleecing_bs = blk_bs(fleecing_blk);
> +                if (!bdrv_co_is_inserted(fleecing_bs)) {
> +                    error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, fleecing_devid);
> +                    goto err;
> +                }
> +                /*
> +                 * Fleecing image needs to be the same size to act as a cbw target.
> +                 */
> +                if (bs->total_sectors != fleecing_bs->total_sectors) {
> +                    error_setg(errp, "Size mismatch for '%s' - sector count %ld != %ld",
> +                               fleecing_devid, fleecing_bs->total_sectors, bs->total_sectors);
> +                    goto err;
> +                }
> +                di->fleecing.bs = fleecing_bs;
> +            }
> +
>              di_list = g_list_append(di_list, di);
>              d++;
>          }
> @@ -660,6 +795,7 @@ UuidInfo coroutine_fn *qmp_backup(
>      const char *devlist,
>      bool has_speed, int64_t speed,
>      bool has_max_workers, int64_t max_workers,
> +    bool has_fleecing, bool fleecing,
>      Error **errp)
>  {
>      assert(qemu_in_coroutine());
> @@ -687,7 +823,7 @@ UuidInfo coroutine_fn *qmp_backup(
>      /* Todo: try to auto-detect format based on file name */
>      format = has_format ? format : BACKUP_FORMAT_VMA;
>  
> -    di_list = get_device_info(devlist, &local_err);
> +    di_list = get_device_info(devlist, has_fleecing && fleecing, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          goto err;
> @@ -1086,5 +1222,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
>      ret->query_bitmap_info = true;
>      ret->pbs_masterkey = true;
>      ret->backup_max_workers = true;
> +    ret->backup_fleecing = true;
>      return ret;
>  }
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 58fd637e86..0bc5f42677 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -933,6 +933,10 @@
>  #
>  # @max-workers: see @BackupPerf for details. Default 16.
>  #
> +# @fleecing: perform a backup with fleecing. For each device in @devlist, a
> +#            corresponding '-fleecing' device with the same size already needs to
> +#            be present.
> +#
>  # Returns: the uuid of the backup job
>  #
>  ##
> @@ -953,7 +957,8 @@
>                                      '*firewall-file': 'str',
>                                      '*devlist': 'str',
>                                      '*speed': 'int',
> -                                    '*max-workers': 'int' },
> +                                    '*max-workers': 'int',
> +                                    '*fleecing': 'bool' },
>    'returns': 'UuidInfo', 'coroutine': true }
>  
>  ##
> @@ -1009,6 +1014,7 @@
>              'pbs-dirty-bitmap-migration': 'bool',
>              'pbs-masterkey': 'bool',
>              'pbs-library-version': 'str',
> +            'backup-fleecing': 'bool',
>              'backup-max-workers': 'bool' } }
>  
>  ##
> -- 
> 2.39.2
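As a rough illustration of how the new optional 'fleecing' flag could be
driven from the Perl management side (this is not part of the patch; the
mon_cmd-based call and all parameter names other than 'fleecing' are
assumptions), a minimal sketch:

    use JSON ();
    use PVE::QemuServer::Monitor qw(mon_cmd);

    # Hypothetical helper: kick off a backup with fleecing enabled.
    # $devlist is the comma-separated drive list accepted by the QMP
    # 'backup' command, e.g. 'drive-scsi0,drive-virtio0'; each listed
    # drive needs a matching '<drive-id>-fleecing' drive attached before
    # this call, as required by the patch above.
    sub start_backup_with_fleecing {
        my ($vmid, $backup_file, $devlist) = @_;

        return mon_cmd(
            $vmid, 'backup',
            'backup-file' => $backup_file,
            'devlist' => $devlist,
            'fleecing' => JSON::true,
        );
    }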





* Re: [pve-devel] [PATCH qemu v2 07/21] PVE backup: add fleecing option
  2024-04-08 12:45   ` Wolfgang Bumiller
@ 2024-04-10  9:30     ` Fiona Ebner
  2024-04-10 11:38       ` Wolfgang Bumiller
  0 siblings, 1 reply; 28+ messages in thread
From: Fiona Ebner @ 2024-04-10  9:30 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: pve-devel

Am 08.04.24 um 14:45 schrieb Wolfgang Bumiller:
> On Fri, Mar 15, 2024 at 11:24:48AM +0100, Fiona Ebner wrote:
>> @@ -581,6 +682,14 @@ static void create_backup_jobs_bh(void *opaque) {
>>      aio_co_enter(data->ctx, data->co);
>>  }
>>  
>> +/*
>> + * EFI disk and TPM state are small and it's just not worth setting up fleecing for them.
>> + */
>> +static bool device_uses_fleecing(const char *device_id)
> 
> Do we really want this?
> 
> IMO we already have enough code trying to distinguish "real" disks from
> efidisks and tpmstate files.
> 
> AFAICT we do check whether the hmp command to *create* the fleecing
> drives actually works, so... (see below)
> 
>> +{
>> +    return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14);
>> +}
>> +
>>  /*
>>   * Returns a list of device infos, which needs to be freed by the caller. In
>>   * case of an error, errp will be set, but the returned value might still be a
>> @@ -588,6 +697,7 @@ static void create_backup_jobs_bh(void *opaque) {
>>   */
>>  static GList coroutine_fn *get_device_info(
>>      const char *devlist,
>> +    bool fleecing,
>>      Error **errp)
>>  {
>>      gchar **devs = NULL;
>> @@ -611,6 +721,31 @@ static GList coroutine_fn *get_device_info(
>>              }
>>              PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
>>              di->bs = bs;
>> +
>> +            if (fleecing && device_uses_fleecing(*d)) {
>> +                g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
>> +                BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
>> +                if (!fleecing_blk) {
> 
> ...so instead of this, we could just treat the absence of a fleecing
> BlockBackend *not* as an error, but as deliberate?
> 

Yes, we could. But the check gives protection against potential (future)
bugs where we can't find the fleecing device for some other reason.
Without the check, we'd just quietly continue and it would be hard to
notice that something is wrong. So I'm slightly in favor of keeping it.
If you still want me to remove it, I'll do it in v3.

>> +                    error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
>> +                              "Device '%s' not found", fleecing_devid);
>> +                    goto err;
>> +                }





* Re: [pve-devel] [PATCH manager v2 13/21] api: backup/vzdump: add permission check for fleecing storage
  2024-04-08  8:47   ` Wolfgang Bumiller
@ 2024-04-10  9:57     ` Fiona Ebner
  2024-04-10 11:37       ` Wolfgang Bumiller
  0 siblings, 1 reply; 28+ messages in thread
From: Fiona Ebner @ 2024-04-10  9:57 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: pve-devel

Am 08.04.24 um 10:47 schrieb Wolfgang Bumiller:
> On Fri, Mar 15, 2024 at 11:24:54AM +0100, Fiona Ebner wrote:
>> @@ -52,6 +52,12 @@ sub assert_param_permission_common {
>>      if (grep { defined($param->{$_}) } qw(bwlimit ionice performance)) {
>>  	$rpcenv->check($user, "/", [ 'Sys.Modify' ]);
>>      }
>> +
>> +    if ($param->{fleecing} && !$is_delete) {
>> +	my $fleecing = PVE::VZDump::parse_fleecing($param);
> 
> ^ The parse_fleecing sub does not actually return the hash, at least not
> explicitly, and when it is not set it returns undef, so the `if` guard
> in the statement below tries to access `undef->{storage}`.
> 

It can't be unset, because $param->{fleecing} is checked before entering
the if branch here.

> If the parameter does exist then the first run through the function
> which performs the actual string->hash conversion will *accidentally*
> also return the hash implicitly, because there's no explicit return
> statement for it.
> Subsequent calls on the other hand will run into the
>     return if ref($fleecing) eq 'HASH';
> and thus return an empty list making `$fleecing` undef again.
> 

Oh, good catch! It did work by chance in my testing, because of what
you describe: the implicit return, and because nobody else called
parse_fleecing() before this point. Will fix in v3!

>> +	$rpcenv->check($user, "/storage/$fleecing->{storage}", [ 'Datastore.AllocateSpace' ])
>> +	    if $fleecing->{storage};
>> +    }
>>  }
>>  
>>  my sub assert_param_permission_create {

---snip---

>> diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
>> index 74eb0c83..88149d68 100644
>> --- a/PVE/VZDump.pm
>> +++ b/PVE/VZDump.pm
>> @@ -130,7 +130,7 @@ my $generate_notes = sub {
>>      return $notes_template;
>>  };
>>  
>> -my sub parse_fleecing {
>> +sub parse_fleecing {
>>      my ($param) = @_;
>>  
>>      if (defined(my $fleecing = $param->{fleecing})) {
> 
> ^ So this should be updated to actually return the hash.

We also have parse_performance() and parse_prune_backups_maxfiles() with
similar semantics. Their callers don't actually need any return value.
If we change parse_fleecing() to return the result, we should change the
others as well for consistency. Alternatively, I can fix the wrong
caller of parse_fleecing() above and maybe add an explicit "return
undef" to these parse_* functions to avoid something like this slipping
through in the future. Which option do you prefer?
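
To make the discussed fix concrete, a minimal sketch of the
explicit-return variant follows; the property-string conversion and the
'backup-fleecing' format name are assumptions standing in for the actual
code in PVE/VZDump.pm, and the point is only that every path returns
explicitly:

    # Sketch: parse_fleecing() returning the parsed hash on every call,
    # so callers can safely do
    #     my $fleecing = PVE::VZDump::parse_fleecing($param);
    sub parse_fleecing {
        my ($param) = @_;

        if (defined(my $fleecing = $param->{fleecing})) {
            # Already converted on a previous call: hand the hash back
            # instead of falling through to an implicit empty return.
            return $fleecing if ref($fleecing) eq 'HASH';

            # Property-string -> hash conversion (format name assumed).
            $fleecing = PVE::JSONSchema::parse_property_string('backup-fleecing', $fleecing);
            $param->{fleecing} = $fleecing;
            return $fleecing;
        }

        return undef;
    }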





* Re: [pve-devel] [PATCH manager v2 13/21] api: backup/vzdump: add permission check for fleecing storage
  2024-04-10  9:57     ` Fiona Ebner
@ 2024-04-10 11:37       ` Wolfgang Bumiller
  0 siblings, 0 replies; 28+ messages in thread
From: Wolfgang Bumiller @ 2024-04-10 11:37 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: pve-devel

On Wed, Apr 10, 2024 at 11:57:37AM +0200, Fiona Ebner wrote:
> Am 08.04.24 um 10:47 schrieb Wolfgang Bumiller:
> > On Fri, Mar 15, 2024 at 11:24:54AM +0100, Fiona Ebner wrote:
> >> @@ -52,6 +52,12 @@ sub assert_param_permission_common {
> >>      if (grep { defined($param->{$_}) } qw(bwlimit ionice performance)) {
> >>  	$rpcenv->check($user, "/", [ 'Sys.Modify' ]);
> >>      }
> >> +
> >> +    if ($param->{fleecing} && !$is_delete) {
> >> +	my $fleecing = PVE::VZDump::parse_fleecing($param);
> > 
> > ^ The parse_fleecing sub does not actually return the hash, at least not
> > explicitly, and when it is not set it returns undef, so the `if` guard
> > in the statement below tries to access `undef->{storage}`.
> > 
> 
> It can't be unset, because $param->{fleecing} is checked before entering
> the if branch here.
> 
> > If the parameter does exist then the first run through the function
> > which performs the actual string->hash conversion will *accidentally*
> > also return the hash implicitly, because there's no explicit return
> > statement for it.
> > Subsequent calls on the other hand will run into the
> >     return if ref($fleecing) eq 'HASH';
> > and thus return an empty list making `$fleecing` undef again.
> > 
> 
> Oh, good catch! It did work by chance in my testing, because of what
> you describe: the implicit return, and because nobody else called
> parse_fleecing() before this point. Will fix in v3!
> 
> >> +	$rpcenv->check($user, "/storage/$fleecing->{storage}", [ 'Datastore.AllocateSpace' ])
> >> +	    if $fleecing->{storage};
> >> +    }
> >>  }
> >>  
> >>  my sub assert_param_permission_create {
> 
> ---snip---
> 
> >> diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
> >> index 74eb0c83..88149d68 100644
> >> --- a/PVE/VZDump.pm
> >> +++ b/PVE/VZDump.pm
> >> @@ -130,7 +130,7 @@ my $generate_notes = sub {
> >>      return $notes_template;
> >>  };
> >>  
> >> -my sub parse_fleecing {
> >> +sub parse_fleecing {
> >>      my ($param) = @_;
> >>  
> >>      if (defined(my $fleecing = $param->{fleecing})) {
> > 
> > ^ So this should be updated to actually return the hash.
> 
> We also have parse_performance() and parse_prune_backups_maxfiles() with
> similar semantics. Their callers don't actually need any return value.
> If we change parse_fleecing() to return the result, we should change the
> others as well for consistency. Alternatively, I can fix the wrong
> caller of parse_fleecing() above and maybe add an explicit "return
> undef" to these parse_* functions to avoid something like this slipping
> through in the future. Which option do you prefer?

Having them all return the hash shouldn't hurt and makes sense to me,
even if the others are "private" (`my sub`). A patch that just drops
the `my` usually does not include enough context lines in patch mails
to make it easy to see that they don't return anything...

Given that we need to call them before using the hashes anyway, always
returning the hash makes sense.





* Re: [pve-devel] [PATCH qemu v2 07/21] PVE backup: add fleecing option
  2024-04-10  9:30     ` Fiona Ebner
@ 2024-04-10 11:38       ` Wolfgang Bumiller
  0 siblings, 0 replies; 28+ messages in thread
From: Wolfgang Bumiller @ 2024-04-10 11:38 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: pve-devel

On Wed, Apr 10, 2024 at 11:30:59AM +0200, Fiona Ebner wrote:
> Am 08.04.24 um 14:45 schrieb Wolfgang Bumiller:
> > On Fri, Mar 15, 2024 at 11:24:48AM +0100, Fiona Ebner wrote:
> >> @@ -581,6 +682,14 @@ static void create_backup_jobs_bh(void *opaque) {
> >>      aio_co_enter(data->ctx, data->co);
> >>  }
> >>  
> >> +/*
> >> + * EFI disk and TPM state are small and it's just not worth setting up fleecing for them.
> >> + */
> >> +static bool device_uses_fleecing(const char *device_id)
> > 
> > Do we really want this?
> > 
> > IMO we already have enough code trying to distinguish "real" disks from
> > efidisks and tpmstate files.
> > 
> > AFAICT we do check whether the hmp command to *create* the fleecing
> > drives actually works, so... (see below)
> > 
> >> +{
> >> +    return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14);
> >> +}
> >> +
> >>  /*
> >>   * Returns a list of device infos, which needs to be freed by the caller. In
> >>   * case of an error, errp will be set, but the returned value might still be a
> >> @@ -588,6 +697,7 @@ static void create_backup_jobs_bh(void *opaque) {
> >>   */
> >>  static GList coroutine_fn *get_device_info(
> >>      const char *devlist,
> >> +    bool fleecing,
> >>      Error **errp)
> >>  {
> >>      gchar **devs = NULL;
> >> @@ -611,6 +721,31 @@ static GList coroutine_fn *get_device_info(
> >>              }
> >>              PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
> >>              di->bs = bs;
> >> +
> >> +            if (fleecing && device_uses_fleecing(*d)) {
> >> +                g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
> >> +                BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
> >> +                if (!fleecing_blk) {
> > 
> > ...so instead of this, we could just treat the absence of a fleecing
> > BlockBackend *not* as an error, but as deliberate?
> > 
> 
> Yes, we could. But the check gives protection against potential (future)
> bugs where we can't find the fleecing device for some other reason.
> Without the check, we'd just quietly continue and it would be hard to
> notice that something is wrong. So I'm slightly in favor of keeping it.
> If you still want me to remove it, I'll do it in v3.

Mh, makes sense. Let's keep it then.





Thread overview (28+ messages):
2024-03-15 10:24 [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 01/21] block/copy-before-write: fix permission Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 02/21] block/copy-before-write: support unligned snapshot-discard Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 03/21] block/copy-before-write: create block_copy bitmap in filter node Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 04/21] qapi: blockdev-backup: add discard-source parameter Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 05/21] copy-before-write: allow specifying minimum cluster size Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 06/21] backup: add minimum cluster size to performance options Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH qemu v2 07/21] PVE backup: add fleecing option Fiona Ebner
2024-04-08 12:45   ` Wolfgang Bumiller
2024-04-10  9:30     ` Fiona Ebner
2024-04-10 11:38       ` Wolfgang Bumiller
2024-03-15 10:24 ` [pve-devel] [PATCH common v2 08/21] json schema: add format description for pve-storage-id standard option Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH guest-common v2 09/21] vzdump: schema: add fleecing property string Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH guest-common v2 10/21] vzdump: schema: make storage for fleecing semi-optional Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [RFC guest-common v2 11/21] abstract config: do not copy fleecing images entry for snapshot Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH manager v2 12/21] vzdump: handle new 'fleecing' property string Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH manager v2 13/21] api: backup/vzdump: add permission check for fleecing storage Fiona Ebner
2024-04-08  8:47   ` Wolfgang Bumiller
2024-04-10  9:57     ` Fiona Ebner
2024-04-10 11:37       ` Wolfgang Bumiller
2024-03-15 10:24 ` [pve-devel] [PATCH qemu-server v2 14/21] backup: disk info: also keep track of size Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [PATCH qemu-server v2 15/21] backup: implement fleecing option Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [RFC qemu-server v2 16/21] parse config: allow config keys with minus sign Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [RFC qemu-server v2 17/21] schema: add fleecing-images config property Fiona Ebner
2024-03-15 10:24 ` [pve-devel] [RFC qemu-server v2 18/21] vzdump: better cleanup fleecing images after hard errors Fiona Ebner
2024-03-15 10:25 ` [pve-devel] [RFC qemu-server v2 19/21] migration: attempt to clean up potential left-over fleecing images Fiona Ebner
2024-03-15 10:25 ` [pve-devel] [RFC qemu-server v2 20/21] destroy vm: " Fiona Ebner
2024-03-15 10:25 ` [pve-devel] [PATCH docs v2 21/21] vzdump: add section about backup fleecing Fiona Ebner
