* [pve-devel] [PATCH qemu v3 01/22] block/copy-before-write: fix permission
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 02/22] block/copy-before-write: support unaligned snapshot-discard Fiona Ebner
` (21 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
In the case when the source node does not have any parents, the condition
still works as required: the backup job does create the parent via
block_job_create -> block_job_add_bdrv -> bdrv_root_attach_child.
Still, in this case checking the @perm variable doesn't work, as the
backup job creates the root blk with empty permissions (it relies on the
CBW filter to require the correct permissions and doesn't want to create
extra conflicts).
So, we should not check @perm.
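For illustration, a minimal standalone sketch (not the actual QEMU call
chain; the variables and the permission constant are only meant to show
why the old check never fired for a backup-job parent):
#include <stdbool.h>
#include <stdint.h>
#define BLK_PERM_WRITE (1 << 1) /* illustrative; see include/block/block-common.h */
int main(void)
{
    /* The backup job creates its root blk with empty permissions... */
    uint64_t perm = 0;
    /* ...but it does attach a parent via block_job_add_bdrv(). */
    bool has_parent = true;
    /* Old check: CONSISTENT_READ only required if a parent wants to write. */
    bool old_requires_cr = has_parent && (perm & BLK_PERM_WRITE); /* false */
    /* New check: required as soon as the filter node has any parent. */
    bool new_requires_cr = has_parent; /* true */
    return old_requires_cr != new_requires_cr; /* 1: the two checks disagree */
}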
The hack may be dropped entirely once transactional insertion of the
filter is merged, i.e. when we no longer try to recalculate permissions in
the intermediate state where the filter still conflicts with the original
parent of the source node (the old big series
"[PATCH v5 00/45] Transactional block-graph modifying API"[1]; its current
in-flight part is "[PATCH v8 0/7] blockdev-replace"[2]).
[1] https://patchew.org/QEMU/20220330212902.590099-1-vsementsov@openvz.org/
[2] https://patchew.org/QEMU/20231017184444.932733-1-vsementsov@yandex-team.ru/
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/copy-before-write.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index b866e42271..a2dddf6f57 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -364,9 +364,13 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
perm, shared, nperm, nshared);
if (!QLIST_EMPTY(&bs->parents)) {
- if (perm & BLK_PERM_WRITE) {
- *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
- }
+ /*
+ * Note that the source child may be shared with the backup job. The
+ * backup job does create its own blk parent on the copy-before-write
+ * node, so this works even if the source node does not have any
+ * parents before backup start.
+ */
+ *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
}
}
--
2.39.2
* [pve-devel] [PATCH qemu v3 02/22] block/copy-before-write: support unaligned snapshot-discard
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 01/22] block/copy-before-write: fix permission Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 03/22] block/copy-before-write: create block_copy bitmap in filter node Fiona Ebner
` (20 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
The first thing that crashes on unaligned access here is
bdrv_reset_dirty_bitmap(). The correct way is to align down the
snapshot-discard request.
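As a worked example (a standalone sketch; the macros mirror the semantics
of QEMU_ALIGN_UP()/QEMU_ALIGN_DOWN() from qemu/osdep.h, and the request
values are made up), an unaligned discard shrinks to the fully covered
clusters and may become a no-op:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
/* Same semantics as QEMU_ALIGN_DOWN()/QEMU_ALIGN_UP() in qemu/osdep.h. */
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))
int main(void)
{
    uint32_t cluster_size = 65536;         /* one block-copy cluster */
    int64_t offset = 4096, bytes = 200000; /* hypothetical unaligned request */
    int64_t aligned_offset = ALIGN_UP(offset, cluster_size);        /* 65536 */
    int64_t aligned_end = ALIGN_DOWN(offset + bytes, cluster_size); /* 196608 */
    if (aligned_end <= aligned_offset) {
        printf("no full cluster covered, discard becomes a no-op\n");
        return 0;
    }
    /* Only the fully covered clusters in [65536, 196608) get discarded. */
    printf("discard %" PRId64 " bytes at offset %" PRId64 "\n",
           aligned_end - aligned_offset, aligned_offset);
    return 0;
}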
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/copy-before-write.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index a2dddf6f57..0a219c2b75 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -325,14 +325,24 @@ static int coroutine_fn GRAPH_RDLOCK
cbw_co_pdiscard_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes)
{
BDRVCopyBeforeWriteState *s = bs->opaque;
+ uint32_t cluster_size = block_copy_cluster_size(s->bcs);
+ int64_t aligned_offset = QEMU_ALIGN_UP(offset, cluster_size);
+ int64_t aligned_end = QEMU_ALIGN_DOWN(offset + bytes, cluster_size);
+ int64_t aligned_bytes;
+
+ if (aligned_end <= aligned_offset) {
+ return 0;
+ }
+ aligned_bytes = aligned_end - aligned_offset;
WITH_QEMU_LOCK_GUARD(&s->lock) {
- bdrv_reset_dirty_bitmap(s->access_bitmap, offset, bytes);
+ bdrv_reset_dirty_bitmap(s->access_bitmap, aligned_offset,
+ aligned_bytes);
}
- block_copy_reset(s->bcs, offset, bytes);
+ block_copy_reset(s->bcs, aligned_offset, aligned_bytes);
- return bdrv_co_pdiscard(s->target, offset, bytes);
+ return bdrv_co_pdiscard(s->target, aligned_offset, aligned_bytes);
}
static void cbw_refresh_filename(BlockDriverState *bs)
--
2.39.2
* [pve-devel] [PATCH qemu v3 03/22] block/copy-before-write: create block_copy bitmap in filter node
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 01/22] block/copy-before-write: fix permission Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 02/22] block/copy-before-write: support unaligned snapshot-discard Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 04/22] qapi: blockdev-backup: add discard-source parameter Fiona Ebner
` (19 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Currently, block_copy creates copy_bitmap in the source node. But that
interacts badly with .independent_close=true of the copy-before-write
filter: the source node may be detached and removed before the
.bdrv_close() handler is called, which should call
block_copy_state_free(), which in turn should remove copy_bitmap.
That's all not ideal: it would be better if the internal bitmap of the
block-copy object were not attached to any node. But that is not possible
right now.
The simplest solution is to just create copy_bitmap in the filter node,
where two other bitmaps are already created anyway.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/block-copy.c | 3 +-
block/copy-before-write.c | 2 +-
include/block/block-copy.h | 1 +
tests/qemu-iotests/257.out | 112 ++++++++++++++++++-------------------
4 files changed, 60 insertions(+), 58 deletions(-)
diff --git a/block/block-copy.c b/block/block-copy.c
index e13d7bc6b6..b61685f1a2 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -346,6 +346,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
}
BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+ BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
Error **errp)
{
@@ -360,7 +361,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
return NULL;
}
- copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
+ copy_bitmap = bdrv_create_dirty_bitmap(copy_bitmap_bs, cluster_size, NULL,
errp);
if (!copy_bitmap) {
return NULL;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 0a219c2b75..d3b95bd600 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -470,7 +470,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
bs->file->bs->supported_zero_flags);
- s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
+ s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp);
if (!s->bcs) {
error_prepend(errp, "Cannot create block-copy-state: ");
ret = -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 0700953ab8..8b41643bfa 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,6 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
typedef struct BlockCopyCallState BlockCopyCallState;
BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+ BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
Error **errp);
diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out
index aa76131ca9..c33dd7f3a9 100644
--- a/tests/qemu-iotests/257.out
+++ b/tests/qemu-iotests/257.out
@@ -120,16 +120,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -596,16 +596,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -865,16 +865,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -1341,16 +1341,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -1610,16 +1610,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -2086,16 +2086,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -2355,16 +2355,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -2831,16 +2831,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -3100,16 +3100,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -3576,16 +3576,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -3845,16 +3845,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -4321,16 +4321,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -4590,16 +4590,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -5066,16 +5066,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
--
2.39.2
* [pve-devel] [PATCH qemu v3 04/22] qapi: blockdev-backup: add discard-source parameter
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (2 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 03/22] block/copy-before-write: create block_copy bitmap in filter node Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 05/22] copy-before-write: allow specifying minimum cluster size Fiona Ebner
` (18 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Add a parameter that enables discard-after-copy. That is mostly useful in
a "push backup with fleecing" scheme, where the source is a
snapshot-access format driver node, based on the copy-before-write
filter's snapshot-access API:
[guest] [snapshot-access] ~~ blockdev-backup ~~> [backup target]
| |
| root | file
v v
[copy-before-write]
| |
| file | target
v v
[active disk] [temp.img]
In this case discard-after-copy does two things:
- discard data in temp.img to save disk space
- avoid further copy-before-write operation in discarded area
Note that we have to declare WRITE permission on the source in the
copy-before-write filter for discard to work. Still, we can't take it
unconditionally, as that would break normal backup from a read-only
source. So, we have to add a parameter and pass it through the bdrv_open
flags.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/backup.c | 5 +++--
block/block-copy.c | 9 +++++++++
block/copy-before-write.c | 15 +++++++++++++--
block/copy-before-write.h | 1 +
block/replication.c | 4 ++--
blockdev.c | 2 +-
include/block/block-common.h | 2 ++
include/block/block-copy.h | 1 +
include/block/block_int-global-state.h | 2 +-
qapi/block-core.json | 4 ++++
10 files changed, 37 insertions(+), 8 deletions(-)
diff --git a/block/backup.c b/block/backup.c
index af87fa6aa9..3dc955f625 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -332,7 +332,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
BlockDriverState *target, int64_t speed,
MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
BitmapSyncMode bitmap_mode,
- bool compress,
+ bool compress, bool discard_source,
const char *filter_node_name,
BackupPerf *perf,
BlockdevOnError on_source_error,
@@ -429,7 +429,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
goto error;
}
- cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
+ cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
+ &bcs, errp);
if (!cbw) {
goto error;
}
diff --git a/block/block-copy.c b/block/block-copy.c
index b61685f1a2..3c61e52bae 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -137,6 +137,7 @@ typedef struct BlockCopyState {
CoMutex lock;
int64_t in_flight_bytes;
BlockCopyMethod method;
+ bool discard_source;
BlockReqList reqs;
QLIST_HEAD(, BlockCopyCallState) calls;
/*
@@ -348,6 +349,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
+ bool discard_source,
Error **errp)
{
ERRP_GUARD();
@@ -409,6 +411,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
cluster_size),
};
+ s->discard_source = discard_source;
block_copy_set_copy_opts(s, false, false);
ratelimit_init(&s->rate_limit);
@@ -580,6 +583,12 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
co_put_to_shres(s->mem, t->req.bytes);
block_copy_task_end(t, ret);
+ if (s->discard_source && ret == 0) {
+ int64_t nbytes =
+ MIN(t->req.offset + t->req.bytes, s->len) - t->req.offset;
+ bdrv_co_pdiscard(s->source, t->req.offset, nbytes);
+ }
+
return ret;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index d3b95bd600..3503702d71 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -44,6 +44,7 @@ typedef struct BDRVCopyBeforeWriteState {
BdrvChild *target;
OnCbwError on_cbw_error;
uint32_t cbw_timeout_ns;
+ bool discard_source;
/*
* @lock: protects access to @access_bitmap, @done_bitmap and
@@ -357,6 +358,8 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{
+ BDRVCopyBeforeWriteState *s = bs->opaque;
+
if (!(role & BDRV_CHILD_FILTERED)) {
/*
* Target child
@@ -381,6 +384,10 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
* start
*/
*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+ if (s->discard_source) {
+ *nperm = *nperm | BLK_PERM_WRITE;
+ }
+
*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
}
}
@@ -470,7 +477,9 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
bs->file->bs->supported_zero_flags);
- s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp);
+ s->discard_source = flags & BDRV_O_CBW_DISCARD_SOURCE;
+ s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap,
+ flags & BDRV_O_CBW_DISCARD_SOURCE, errp);
if (!s->bcs) {
error_prepend(errp, "Cannot create block-copy-state: ");
ret = -EINVAL;
@@ -544,12 +553,14 @@ BlockDriver bdrv_cbw_filter = {
BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
+ bool discard_source,
BlockCopyState **bcs,
Error **errp)
{
BDRVCopyBeforeWriteState *state;
BlockDriverState *top;
QDict *opts;
+ int flags = BDRV_O_RDWR | (discard_source ? BDRV_O_CBW_DISCARD_SOURCE : 0);
assert(source->total_sectors == target->total_sectors);
GLOBAL_STATE_CODE();
@@ -562,7 +573,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
qdict_put_str(opts, "file", bdrv_get_node_name(source));
qdict_put_str(opts, "target", bdrv_get_node_name(target));
- top = bdrv_insert_node(source, opts, BDRV_O_RDWR, errp);
+ top = bdrv_insert_node(source, opts, flags, errp);
if (!top) {
return NULL;
}
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 6e72bb25e9..01af0cd3c4 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -39,6 +39,7 @@
BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
+ bool discard_source,
BlockCopyState **bcs,
Error **errp);
void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/block/replication.c b/block/replication.c
index ea4bf1aa80..39ad78cf98 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -579,8 +579,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
s->backup_job = backup_job_create(
NULL, s->secondary_disk->bs, s->hidden_disk->bs,
- 0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, NULL,
- &perf,
+ 0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, false,
+ NULL, &perf,
BLOCKDEV_ON_ERROR_REPORT,
BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
backup_job_completed, bs, NULL, &local_err);
diff --git a/blockdev.c b/blockdev.c
index 7793143d76..ce3fef924c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2802,7 +2802,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
backup->sync, bmap, backup->bitmap_mode,
- backup->compress,
+ backup->compress, backup->discard_source,
backup->filter_node_name,
&perf,
backup->on_source_error,
diff --git a/include/block/block-common.h b/include/block/block-common.h
index e15395f2cb..913a8b259c 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -234,6 +234,8 @@ typedef enum {
read-write fails */
#define BDRV_O_IO_URING 0x40000 /* use io_uring instead of the thread pool */
+#define BDRV_O_CBW_DISCARD_SOURCE 0x80000 /* for copy-before-write filter */
+
#define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_NO_FLUSH)
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 8b41643bfa..bdc703bacd 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -27,6 +27,7 @@ typedef struct BlockCopyCallState BlockCopyCallState;
BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
+ bool discard_source,
Error **errp);
/* Function should be called prior any actual copy request */
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index 32f0f9858a..546f2b5532 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -189,7 +189,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
MirrorSyncMode sync_mode,
BdrvDirtyBitmap *sync_bitmap,
BitmapSyncMode bitmap_mode,
- bool compress,
+ bool compress, bool discard_source,
const char *filter_node_name,
BackupPerf *perf,
BlockdevOnError on_source_error,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 09de550c95..4297e5beda 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1816,6 +1816,9 @@
# node specified by @drive. If this option is not given, a node
# name is autogenerated. (Since: 4.2)
#
+# @discard-source: Discard blocks on source which are already copied
+# to the target. (Since 9.0)
+#
# @x-perf: Performance options. (Since 6.0)
#
# Features:
@@ -1837,6 +1840,7 @@
'*on-target-error': 'BlockdevOnError',
'*auto-finalize': 'bool', '*auto-dismiss': 'bool',
'*filter-node-name': 'str',
+ '*discard-source': 'bool',
'*x-perf': { 'type': 'BackupPerf',
'features': [ 'unstable' ] } } }
--
2.39.2
* [pve-devel] [PATCH qemu v3 05/22] copy-before-write: allow specifying minimum cluster size
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (3 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 04/22] qapi: blockdev-backup: add discard-source parameter Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 06/22] backup: add minimum cluster size to performance options Fiona Ebner
` (17 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
Useful to make discard-source work in the context of backup fleecing
when the fleecing image has a larger granularity than the backup
target.
Copy-before-write operations will use at least this granularity and in
particular, discard requests to the source node will too. If the
granularity is too small, they will just be aligned down in
cbw_co_pdiscard_snapshot() and thus effectively ignored.
The QAPI uses uint32 so the value will be non-negative, but still fit
into a uint64_t.
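As a worked example (a standalone sketch; the MAX() and power-of-2 checks
mirror the hunks below, and the input values are assumptions):
#include <stdint.h>
#include <stdio.h>
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define BLOCK_COPY_CLUSTER_SIZE_DEFAULT (1 << 16) /* 64 KiB */
/* Same test as QEMU's is_power_of_2() for the validity check. */
static int is_pow2(int64_t v) { return v && !(v & (v - 1)); }
int main(void)
{
    /* Assumed: 1 MiB fleecing granularity, target reports 4 KiB clusters. */
    int64_t min_cluster_size = 1 << 20, target_cluster = 4096;
    if (min_cluster_size && !is_pow2(min_cluster_size)) {
        fprintf(stderr, "min-cluster-size needs to be a power of 2\n");
        return 1;
    }
    /* MAX(1M, MAX(64K, 4K)) == 1 MiB: the requested minimum wins here. */
    int64_t cluster = MAX(min_cluster_size,
                          MAX((int64_t)BLOCK_COPY_CLUSTER_SIZE_DEFAULT,
                              target_cluster));
    printf("effective cluster size: %lld\n", (long long)cluster);
    return 0;
}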
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/block-copy.c | 17 +++++++++++++----
block/copy-before-write.c | 3 ++-
include/block/block-copy.h | 1 +
qapi/block-core.json | 8 +++++++-
4 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/block/block-copy.c b/block/block-copy.c
index 3c61e52bae..c9a722a5a6 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -310,6 +310,7 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
}
static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
+ int64_t min_cluster_size,
Error **errp)
{
int ret;
@@ -330,7 +331,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
"used. If the actual block size of the target exceeds "
"this default, the backup may be unusable",
BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
- return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+ return MAX(min_cluster_size, BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
} else if (ret < 0 && !target_does_cow) {
error_setg_errno(errp, -ret,
"Couldn't determine the cluster size of the target image, "
@@ -340,16 +341,18 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
return ret;
} else if (ret < 0 && target_does_cow) {
/* Not fatal; just trudge on ahead. */
- return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+ return MAX(min_cluster_size, BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
}
- return MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
+ return MAX(min_cluster_size,
+ MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size));
}
BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
bool discard_source,
+ int64_t min_cluster_size,
Error **errp)
{
ERRP_GUARD();
@@ -358,7 +361,13 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
BdrvDirtyBitmap *copy_bitmap;
bool is_fleecing;
- cluster_size = block_copy_calculate_cluster_size(target->bs, errp);
+ if (min_cluster_size && !is_power_of_2(min_cluster_size)) {
+ error_setg(errp, "min-cluster-size needs to be a power of 2");
+ return NULL;
+ }
+
+ cluster_size = block_copy_calculate_cluster_size(target->bs,
+ min_cluster_size, errp);
if (cluster_size < 0) {
return NULL;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 3503702d71..4a8c5bdb62 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -479,7 +479,8 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
s->discard_source = flags & BDRV_O_CBW_DISCARD_SOURCE;
s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap,
- flags & BDRV_O_CBW_DISCARD_SOURCE, errp);
+ flags & BDRV_O_CBW_DISCARD_SOURCE,
+ opts->min_cluster_size, errp);
if (!s->bcs) {
error_prepend(errp, "Cannot create block-copy-state: ");
ret = -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index bdc703bacd..77857c6c68 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -28,6 +28,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
bool discard_source,
+ int64_t min_cluster_size,
Error **errp);
/* Function should be called prior any actual copy request */
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 4297e5beda..33e7e3c090 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4825,12 +4825,18 @@
# @on-cbw-error parameter will decide how this failure is handled.
# Default 0. (Since 7.1)
#
+# @min-cluster-size: Minimum size of blocks used by copy-before-write
+# operations. Has to be a power of 2. No effect if smaller than
+# the maximum of the target's cluster size and 64 KiB. Default 0.
+# (Since 8.1)
+#
# Since: 6.2
##
{ 'struct': 'BlockdevOptionsCbw',
'base': 'BlockdevOptionsGenericFormat',
'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap',
- '*on-cbw-error': 'OnCbwError', '*cbw-timeout': 'uint32' } }
+ '*on-cbw-error': 'OnCbwError', '*cbw-timeout': 'uint32',
+ '*min-cluster-size': 'uint32' } }
##
# @BlockdevOptions:
--
2.39.2
* [pve-devel] [PATCH qemu v3 06/22] backup: add minimum cluster size to performance options
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (4 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 05/22] copy-before-write: allow specifying minimum cluster size Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 18:41 ` [pve-devel] partially-applied: " Thomas Lamprecht
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 07/22] PVE backup: add fleecing option Fiona Ebner
` (16 subsequent siblings)
22 siblings, 1 reply; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
Useful to make discard-source work in the context of backup fleecing
when the fleecing image has a larger granularity than the backup
target.
Backup/block-copy will use at least this granularity for copy operations
and in particular, discard requests to the backup source will too. If
the granularity is too small, they will just be aligned down in
cbw_co_pdiscard_snapshot() and thus effectively ignored.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/backup.c | 2 +-
block/copy-before-write.c | 2 ++
block/copy-before-write.h | 1 +
blockdev.c | 3 +++
qapi/block-core.json | 9 +++++++--
5 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/block/backup.c b/block/backup.c
index 3dc955f625..ac5bd81338 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -430,7 +430,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
}
cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
- &bcs, errp);
+ perf->min_cluster_size, &bcs, errp);
if (!cbw) {
goto error;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 4a8c5bdb62..9ca5ec5e5c 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -555,6 +555,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
bool discard_source,
+ int64_t min_cluster_size,
BlockCopyState **bcs,
Error **errp)
{
@@ -573,6 +574,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
}
qdict_put_str(opts, "file", bdrv_get_node_name(source));
qdict_put_str(opts, "target", bdrv_get_node_name(target));
+ qdict_put_int(opts, "min-cluster-size", min_cluster_size);
top = bdrv_insert_node(source, opts, flags, errp);
if (!top) {
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 01af0cd3c4..dc6cafe7fa 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -40,6 +40,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
bool discard_source,
+ int64_t min_cluster_size,
BlockCopyState **bcs,
Error **errp);
void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index ce3fef924c..5ae1dde73c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2729,6 +2729,9 @@ static BlockJob *do_backup_common(BackupCommon *backup,
if (backup->x_perf->has_max_chunk) {
perf.max_chunk = backup->x_perf->max_chunk;
}
+ if (backup->x_perf->has_min_cluster_size) {
+ perf.min_cluster_size = backup->x_perf->min_cluster_size;
+ }
}
if ((backup->sync == MIRROR_SYNC_MODE_BITMAP) ||
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 33e7e3c090..58fd637e86 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1757,11 +1757,16 @@
# it should not be less than job cluster size which is calculated
# as maximum of target image cluster size and 64k. Default 0.
#
+# @min-cluster-size: Minimum size of blocks used by copy-before-write
+# and background copy operations. Has to be a power of 2. No
+# effect if smaller than the maximum of the target's cluster size
+# and 64 KiB. Default 0. (Since 8.1)
+#
# Since: 6.0
##
{ 'struct': 'BackupPerf',
- 'data': { '*use-copy-range': 'bool',
- '*max-workers': 'int', '*max-chunk': 'int64' } }
+ 'data': { '*use-copy-range': 'bool', '*max-workers': 'int',
+ '*max-chunk': 'int64', '*min-cluster-size': 'uint32' } }
##
# @BackupCommon:
--
2.39.2
* [pve-devel] partially-applied: [PATCH qemu v3 06/22] backup: add minimum cluster size to performance options
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 06/22] backup: add minimum cluster size to performance options Fiona Ebner
@ 2024-04-11 18:41 ` Thomas Lamprecht
0 siblings, 0 replies; 34+ messages in thread
From: Thomas Lamprecht @ 2024-04-11 18:41 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
On 11/04/2024 11:29, Fiona Ebner wrote:
> Useful to make discard-source work in the context of backup fleecing
> when the fleecing image has a larger granularity than the backup
> target.
>
> Backup/block-copy will use at least this granularity for copy operations
> and in particular, discard requests to the backup source will too. If
> the granularity is too small, they will just be aligned down in
> cbw_co_pdiscard_snapshot() and thus effectively ignored.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> block/backup.c | 2 +-
> block/copy-before-write.c | 2 ++
> block/copy-before-write.h | 1 +
> blockdev.c | 3 +++
> qapi/block-core.json | 9 +++++++--
> 5 files changed, 14 insertions(+), 3 deletions(-)
>
>
applied the qemu part, thanks!
* [pve-devel] [PATCH qemu v3 07/22] PVE backup: add fleecing option
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (5 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 06/22] backup: add minimum cluster size to performance options Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH common v3 08/22] json schema: add format description for pve-storage-id standard option Fiona Ebner
` (15 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
When a fleecing option is given, it is expected that each device has
a corresponding "-fleecing" block device already attached, except for
EFI disk and TPM state, where fleecing is never used.
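For illustration, a standalone sketch of the naming convention (mirroring
device_uses_fleecing() and the '-fleecing' lookup added below; the device
names are examples):
#include <stdio.h>
#include <string.h>
/* EFI disk and TPM state never get fleecing (see device_uses_fleecing()). */
static int uses_fleecing(const char *id)
{
    return strncmp(id, "drive-efidisk", 13) && strncmp(id, "drive-tpmstate", 14);
}
int main(void)
{
    const char *devs[] = { "drive-scsi0", "drive-efidisk0" };
    for (int i = 0; i < 2; i++) {
        if (uses_fleecing(devs[i])) {
            /* expects e.g. "drive-scsi0-fleecing" to be attached already */
            printf("%s -> %s-fleecing\n", devs[i], devs[i]);
        } else {
            printf("%s -> no fleecing\n", devs[i]);
        }
    }
    return 0;
}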
The following graph was adapted from [0] which also contains more
details about fleecing.
[guest]
|
| root
v file
[copy-before-write]<------[snapshot-access]
| |
| file | target
v v
[source] [fleecing]
For fleecing, a copy-before-write filter is inserted on top of the
source node, as well as a snapshot-access node pointing to the filter
node which allows to read the consistent state of the image at the
time it was inserted. New guest writes are passed through the
copy-before-write filter which will first copy over old data to the
fleecing image in case that old data is still needed by the
snapshot-access node.
The backup process will sequentially read from the snapshot access,
which has a bitmap and knows whether to read from the original image
or the fleecing image to get the "snapshot" state, i.e. data from the
source image at the time when the copy-before-write filter was
inserted. After reading, the copied sections are discarded from the
fleecing image to reduce space usage.
All of this can be restricted by an initial dirty bitmap to parts of
the source image that are required for an incremental backup.
For discard to work, it is necessary that the fleecing image does not
have a larger cluster size than the backup job granularity. Since
querying that size does not always work (e.g. for RBD with krbd, the
cluster size is not reported), a minimum of 4 MiB is used. A job with a
PBS target already has at least this granularity, so it's only relevant
for other targets, i.e. edge cases where this minimum is not enough
should be very rare in practice. If ever necessary, a value passed in via
the backup QMP command could still be added in the future to override it.
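A minimal sketch of that fallback (standalone; reported_cluster_size
stands in for bdi.cluster_size from bdrv_get_info(), with 0 modeling the
krbd case where no size is reported):
#include <stdint.h>
#include <stdio.h>
#define MAX(a, b) ((a) > (b) ? (a) : (b))
int main(void)
{
    int64_t min_cluster_size = 4 * 1024 * 1024; /* 4 MiB floor */
    int64_t reported_cluster_size = 0; /* krbd: no cluster size reported */
    /* With krbd, the floor stays at 4 MiB; a fleecing image reporting
     * e.g. 8 MiB clusters would raise it to 8 MiB instead. */
    min_cluster_size = MAX(min_cluster_size, reported_cluster_size);
    printf("job cluster size floor: %lld\n", (long long)min_cluster_size);
    return 0;
}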
Additionally, the cbw-timeout and on-cbw-error=break-snapshot options
are set when installing the copy-before-write filter and
snapshot-access. When an error or timeout occurs, the problematic (and
each further) snapshot operation will fail and thus cancel the backup
instead of breaking the guest write.
Note that job_id cannot be inferred from the snapshot-access bs because
it has no parent, so just pass the one from the original bs.
[0]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg876056.html
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
pve-backup.c | 143 ++++++++++++++++++++++++++++++++-
qapi/block-core.json | 8 +-
3 files changed, 148 insertions(+), 4 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 6efe28cef5..ca29cc4281 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1064,6 +1064,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
NULL, NULL,
devlist, qdict_haskey(qdict, "speed"), speed,
false, 0, // BackupPerf max-workers
+ false, false, // fleecing
&error);
hmp_handle_error(mon, error);
diff --git a/pve-backup.c b/pve-backup.c
index e6b17b797e..00aaff6509 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -7,8 +7,10 @@
#include "sysemu/blockdev.h"
#include "block/block_int-global-state.h"
#include "block/blockjob.h"
+#include "block/copy-before-write.h"
#include "block/dirty-bitmap.h"
#include "qapi/qapi-commands-block.h"
+#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qerror.h"
#include "qemu/cutils.h"
@@ -80,8 +82,15 @@ static void pvebackup_init(void)
// initialize PVEBackupState at startup
opts_init(pvebackup_init);
+typedef struct PVEBackupFleecingInfo {
+ BlockDriverState *bs;
+ BlockDriverState *cbw;
+ BlockDriverState *snapshot_access;
+} PVEBackupFleecingInfo;
+
typedef struct PVEBackupDevInfo {
BlockDriverState *bs;
+ PVEBackupFleecingInfo fleecing;
size_t size;
uint64_t block_size;
uint8_t dev_id;
@@ -361,6 +370,25 @@ static void pvebackup_complete_cb(void *opaque, int ret)
PVEBackupDevInfo *di = opaque;
di->completed_ret = ret;
+ /*
+ * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work
+ * won't be done as a coroutine anyway:
+ * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would
+ * just spawn a BH calling bdrv_unref().
+ * - For cbw, draining would need to spawn a BH.
+ *
+ * Note that the AioContext lock is already acquired by our caller, i.e.
+ * job_finalize_single_locked()
+ */
+ if (di->fleecing.snapshot_access) {
+ bdrv_unref(di->fleecing.snapshot_access);
+ di->fleecing.snapshot_access = NULL;
+ }
+ if (di->fleecing.cbw) {
+ bdrv_cbw_drop(di->fleecing.cbw);
+ di->fleecing.cbw = NULL;
+ }
+
/*
* Schedule stream cleanup in async coroutine. close_image and finish might
* take a while, so we can't block on them here. This way it also doesn't
@@ -521,9 +549,82 @@ static void create_backup_jobs_bh(void *opaque) {
bdrv_drained_begin(di->bs);
+ BackupPerf perf = (BackupPerf){ .max_workers = backup_state.perf.max_workers };
+
+ BlockDriverState *source_bs = di->bs;
+ bool discard_source = false;
+ const char *job_id = bdrv_get_device_name(di->bs);
+ if (di->fleecing.bs) {
+ QDict *cbw_opts = qdict_new();
+ qdict_put_str(cbw_opts, "driver", "copy-before-write");
+ qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
+ qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs));
+
+ if (di->bitmap) {
+ /*
+ * Only guest writes to parts relevant for the backup need to be intercepted with
+ * old data being copied to the fleecing image.
+ */
+ qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs));
+ qdict_put_str(cbw_opts, "bitmap.name", bdrv_dirty_bitmap_name(di->bitmap));
+ }
+ /*
+ * Fleecing storage is supposed to be fast and it's better to break the backup than
+ * guest writes. Certain guest drivers like VirtIO-win have a 60 second timeout by
+ * default, so abort a bit before that.
+ */
+ qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
+ qdict_put_int(cbw_opts, "cbw-timeout", 45);
+
+ di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, &local_err);
+
+ if (!di->fleecing.cbw) {
+ error_setg(errp, "appending cbw node for fleecing failed: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ break;
+ }
+
+ QDict *snapshot_access_opts = qdict_new();
+ qdict_put_str(snapshot_access_opts, "driver", "snapshot-access");
+ qdict_put_str(snapshot_access_opts, "file", bdrv_get_node_name(di->fleecing.cbw));
+
+ /*
+ * Holding the AioContext lock here would cause a deadlock, because bdrv_open_driver()
+ * will acquire it a second time. But it's allowed to be held exactly once when polling
+ * and that happens when the bdrv_refresh_total_sectors() call is made there.
+ */
+ aio_context_release(aio_context);
+ di->fleecing.snapshot_access =
+ bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | BDRV_O_UNMAP, &local_err);
+ aio_context_acquire(aio_context);
+ if (!di->fleecing.snapshot_access) {
+ error_setg(errp, "setting up snapshot access for fleecing failed: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ break;
+ }
+ source_bs = di->fleecing.snapshot_access;
+ discard_source = true;
+
+ /*
+ * bdrv_get_info() just returns 0 (= doesn't matter) for RBD when using krbd. But discard
+ * on the fleecing image won't work if the backup job's granularity is less than the RBD
+ * object size (default 4 MiB), so it does matter. Always use at least 4 MiB. With a PBS
+ * target, the backup job granularity would already be at least this much.
+ */
+ perf.min_cluster_size = 4 * 1024 * 1024;
+ /*
+ * For discard to work, cluster size for the backup job must be at least the same as for
+ * the fleecing image.
+ */
+ BlockDriverInfo bdi;
+ if (bdrv_get_info(di->fleecing.bs, &bdi) >= 0) {
+ perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
+ }
+ }
+
BlockJob *job = backup_job_create(
- NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
- bitmap_mode, false, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
+ job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
+ bitmap_mode, false, discard_source, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT,
BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
&local_err);
@@ -581,6 +682,14 @@ static void create_backup_jobs_bh(void *opaque) {
aio_co_enter(data->ctx, data->co);
}
+/*
+ * EFI disk and TPM state are small and it's just not worth setting up fleecing for them.
+ */
+static bool device_uses_fleecing(const char *device_id)
+{
+ return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14);
+}
+
/*
* Returns a list of device infos, which needs to be freed by the caller. In
* case of an error, errp will be set, but the returned value might still be a
@@ -588,6 +697,7 @@ static void create_backup_jobs_bh(void *opaque) {
*/
static GList coroutine_fn *get_device_info(
const char *devlist,
+ bool fleecing,
Error **errp)
{
gchar **devs = NULL;
@@ -611,6 +721,31 @@ static GList coroutine_fn *get_device_info(
}
PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
di->bs = bs;
+
+ if (fleecing && device_uses_fleecing(*d)) {
+ g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
+ BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
+ if (!fleecing_blk) {
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ "Device '%s' not found", fleecing_devid);
+ goto err;
+ }
+ BlockDriverState *fleecing_bs = blk_bs(fleecing_blk);
+ if (!bdrv_co_is_inserted(fleecing_bs)) {
+ error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, fleecing_devid);
+ goto err;
+ }
+ /*
+ * Fleecing image needs to be the same size to act as a cbw target.
+ */
+ if (bs->total_sectors != fleecing_bs->total_sectors) {
+ error_setg(errp, "Size mismatch for '%s' - sector count %ld != %ld",
+ fleecing_devid, fleecing_bs->total_sectors, bs->total_sectors);
+ goto err;
+ }
+ di->fleecing.bs = fleecing_bs;
+ }
+
di_list = g_list_append(di_list, di);
d++;
}
@@ -660,6 +795,7 @@ UuidInfo coroutine_fn *qmp_backup(
const char *devlist,
bool has_speed, int64_t speed,
bool has_max_workers, int64_t max_workers,
+ bool has_fleecing, bool fleecing,
Error **errp)
{
assert(qemu_in_coroutine());
@@ -687,7 +823,7 @@ UuidInfo coroutine_fn *qmp_backup(
/* Todo: try to auto-detect format based on file name */
format = has_format ? format : BACKUP_FORMAT_VMA;
- di_list = get_device_info(devlist, &local_err);
+ di_list = get_device_info(devlist, has_fleecing && fleecing, &local_err);
if (local_err) {
error_propagate(errp, local_err);
goto err;
@@ -1086,5 +1222,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
ret->query_bitmap_info = true;
ret->pbs_masterkey = true;
ret->backup_max_workers = true;
+ ret->backup_fleecing = true;
return ret;
}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 58fd637e86..0bc5f42677 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -933,6 +933,10 @@
#
# @max-workers: see @BackupPerf for details. Default 16.
#
+# @fleecing: perform a backup with fleecing. For each device in @devlist, a
+# corresponding '-fleecing' device with the same size already needs to
+# be present.
+#
# Returns: the uuid of the backup job
#
##
@@ -953,7 +957,8 @@
'*firewall-file': 'str',
'*devlist': 'str',
'*speed': 'int',
- '*max-workers': 'int' },
+ '*max-workers': 'int',
+ '*fleecing': 'bool' },
'returns': 'UuidInfo', 'coroutine': true }
##
@@ -1009,6 +1014,7 @@
'pbs-dirty-bitmap-migration': 'bool',
'pbs-masterkey': 'bool',
'pbs-library-version': 'str',
+ 'backup-fleecing': 'bool',
'backup-max-workers': 'bool' } }
##
--
2.39.2
* [pve-devel] [PATCH common v3 08/22] json schema: add format description for pve-storage-id standard option
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (6 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH qemu v3 07/22] PVE backup: add fleecing option Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 17:58 ` [pve-devel] applied: " Thomas Lamprecht
2024-04-11 9:29 ` [pve-devel] [PATCH guest-common v3 09/22] vzdump: schema: add fleecing property string Fiona Ebner
` (14 subsequent siblings)
22 siblings, 1 reply; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
so that the option can be used as part of a property string.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
src/PVE/JSONSchema.pm | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm
index 4bf0e78..115f811 100644
--- a/src/PVE/JSONSchema.pm
+++ b/src/PVE/JSONSchema.pm
@@ -84,6 +84,7 @@ register_standard_option('pve-iface', {
register_standard_option('pve-storage-id', {
description => "The storage identifier.",
type => 'string', format => 'pve-storage-id',
+ format_description => 'storage ID',
});
register_standard_option('pve-bridge-id', {
--
2.39.2
* [pve-devel] [PATCH guest-common v3 09/22] vzdump: schema: add fleecing property string
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (7 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH common v3 08/22] json schema: add format description for pve-storage-id standard option Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 18:07 ` [pve-devel] applied: " Thomas Lamprecht
2024-04-11 9:29 ` [pve-devel] [PATCH guest-common v3 10/22] vzdump: schema: make storage for fleecing semi-optional Fiona Ebner
` (13 subsequent siblings)
22 siblings, 1 reply; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
It's a property string, because that avoids having an implicit
"enabled" as part of a 'fleecing-storage' property. And there likely
will be more options in the future, e.g. threshold/limit for the
fleecing image size.
Storage is non-optional, so the storage choice needs to be a conscious
decision. Can allow for a default later, when a good choice can be
made further down the stack. The original idea with "same storage as
VM disk" is not great, because e.g. for LVM, it would require the same
size as the disk up front.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
src/PVE/VZDump/Common.pm | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/src/PVE/VZDump/Common.pm b/src/PVE/VZDump/Common.pm
index 5be87a0..30bd044 100644
--- a/src/PVE/VZDump/Common.pm
+++ b/src/PVE/VZDump/Common.pm
@@ -33,6 +33,7 @@ my $dowhash_to_dow = sub {
};
our $PROPERTY_STRINGS = {
+ 'fleecing' => 'backup-fleecing',
'performance' => 'backup-performance',
'prune-backups' => 'prune-backups',
};
@@ -79,6 +80,24 @@ sub parse_dow {
return $res;
};
+PVE::JSONSchema::register_format('backup-fleecing', {
+ enabled => {
+ description => "Enable backup fleecing. Cache backup data from blocks where new guest "
+ ."writes happen on specified storage instead of copying them directly to the backup "
+ ."target. This can help guest IO performance and even prevent hangs, at the cost of "
+ ."requiring more storage space.",
+ type => 'boolean',
+ default => 0,
+ optional => 1,
+ default_key => 1,
+ },
+ storage => get_standard_option('pve-storage-id', {
+ description => "Use this storage to storage fleecing images. For efficient space usage, "
+ ."it's best to use a local storage that supports discard and either thin provisioning "
+ ."or sparse files.",
+ }),
+});
+
PVE::JSONSchema::register_format('backup-performance', {
'max-workers' => {
description => "Applies to VMs. Allow up to this many IO workers at the same time.",
@@ -262,6 +281,12 @@ my $confdesc = {
format => 'backup-performance',
optional => 1,
},
+ fleecing => {
+ type => 'string',
+ description => "Options for backup fleecing (VM only).",
+ format => 'backup-fleecing',
+ optional => 1,
+ },
lockwait => {
type => 'integer',
description => "Maximal time to wait for the global lock (minutes).",
--
2.39.2
* [pve-devel] applied: Re: [PATCH guest-common v3 09/22] vzdump: schema: add fleecing property string
2024-04-11 9:29 ` [pve-devel] [PATCH guest-common v3 09/22] vzdump: schema: add fleecing property string Fiona Ebner
@ 2024-04-11 18:07 ` Thomas Lamprecht
2024-04-12 8:38 ` Fiona Ebner
0 siblings, 1 reply; 34+ messages in thread
From: Thomas Lamprecht @ 2024-04-11 18:07 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
On 11/04/2024 11:29, Fiona Ebner wrote:
> It's a property string, because that avoids having an implicit
> "enabled" as part of a 'fleecing-storage' property. And there likely
> will be more options in the future, e.g. threshold/limit for the
> fleecing image size.
>
> Storage is non-optional, so the storage choice needs to be a conscious
> decision. Can allow for a default later, when a good choice can be
> made further down the stack. The original idea with "same storage as
> VM disk" is not great, because e.g. for LVM, it would require the same
> size as the disk up front.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> src/PVE/VZDump/Common.pm | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
>
applied, but massaged in having the whitespace on the continued line for
multi-line strings, not on the previous (we should actually add that to
the style guide), thanks!
* Re: [pve-devel] applied: Re: [PATCH guest-common v3 09/22] vzdump: schema: add fleecing property string
2024-04-11 18:07 ` [pve-devel] applied: " Thomas Lamprecht
@ 2024-04-12 8:38 ` Fiona Ebner
0 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-12 8:38 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox VE development discussion
Am 11.04.24 um 20:07 schrieb Thomas Lamprecht:
> On 11/04/2024 11:29, Fiona Ebner wrote:
>> It's a property string, because that avoids having an implicit
>> "enabled" as part of a 'fleecing-storage' property. And there likely
>> will be more options in the future, e.g. threshold/limit for the
>> fleecing image size.
>>
>> Storage is non-optional, so the storage choice needs to be a conscious
>> decision. Can allow for a default later, when a good choice can be
>> made further down the stack. The original idea with "same storage as
>> VM disk" is not great, because e.g. for LVM, it would require the same
>> size as the disk up front.
>>
>> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
>> ---
>> src/PVE/VZDump/Common.pm | 25 +++++++++++++++++++++++++
>> 1 file changed, 25 insertions(+)
>>
>>
>
> applied, but massaged in having the whitespace on the continued line for
> multi-line strings, not on the previous (we should actually add that to
> the style guide), thanks!
Sorry, I always forget the preferred way. Added to the style guide now:
https://pve.proxmox.com/wiki/Perl_Style_Guide#Wrapping_Strings
* [pve-devel] [PATCH guest-common v3 10/22] vzdump: schema: make storage for fleecing semi-optional
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (8 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH guest-common v3 09/22] vzdump: schema: add fleecing property string Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 18:07 ` Thomas Lamprecht
2024-04-11 18:07 ` [pve-devel] applied: " Thomas Lamprecht
2024-04-11 9:29 ` [pve-devel] [RFC guest-common v3 11/22] abstract config: do not copy fleecing images entry for snapshot Fiona Ebner
` (12 subsequent siblings)
22 siblings, 2 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
so it doesn't need to be set when explicitly disabling fleecing. Needs
a custom verifier to enforce it being set when enabled.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
src/PVE/VZDump/Common.pm | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/src/PVE/VZDump/Common.pm b/src/PVE/VZDump/Common.pm
index 30bd044..f7cdb66 100644
--- a/src/PVE/VZDump/Common.pm
+++ b/src/PVE/VZDump/Common.pm
@@ -95,8 +95,20 @@ PVE::JSONSchema::register_format('backup-fleecing', {
description => "Use this storage to storage fleecing images. For efficient space usage, "
."it's best to use a local storage that supports discard and either thin provisioning "
."or sparse files.",
+ optional => 1,
}),
-});
+}, \&verify_backup_fleecing);
+
+sub verify_backup_fleecing {
+ my ($param, $noerr) = @_;
+
+ if (!$param->{storage} && $param->{enabled}) {
+ return if $noerr;
+ die "'storage' parameter is required when 'enabled' is set\n";
+ }
+
+ return $param;
+}
PVE::JSONSchema::register_format('backup-performance', {
'max-workers' => {
--
2.39.2
* Re: [pve-devel] [PATCH guest-common v3 10/22] vzdump: schema: make storage for fleecing semi-optional
2024-04-11 9:29 ` [pve-devel] [PATCH guest-common v3 10/22] vzdump: schema: make storage for fleecing semi-optional Fiona Ebner
@ 2024-04-11 18:07 ` Thomas Lamprecht
2024-04-11 18:07 ` [pve-devel] applied: " Thomas Lamprecht
1 sibling, 0 replies; 34+ messages in thread
From: Thomas Lamprecht @ 2024-04-11 18:07 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
On 11/04/2024 11:29, Fiona Ebner wrote:
> so it doesn't need to be set when explicitly disabling fleecing. Needs
> a custom verifier to enforce it being set when enabled.
>
> Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> src/PVE/VZDump/Common.pm | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/src/PVE/VZDump/Common.pm b/src/PVE/VZDump/Common.pm
> index 30bd044..f7cdb66 100644
> --- a/src/PVE/VZDump/Common.pm
> +++ b/src/PVE/VZDump/Common.pm
> @@ -95,8 +95,20 @@ PVE::JSONSchema::register_format('backup-fleecing', {
> description => "Use this storage to storage fleecing images. For efficient space usage, "
> ."it's best to use a local storage that supports discard and either thin provisioning "
> ."or sparse files.",
> + optional => 1,
> }),
> -});
> +}, \&verify_backup_fleecing);
> +
> +sub verify_backup_fleecing {
> + my ($param, $noerr) = @_;
> +
> + if (!$param->{storage} && $param->{enabled}) {
> + return if $noerr;
> + die "'storage' parameter is required when 'enabled' is set\n";
> + }
> +
> + return $param;
> +}
>
> PVE::JSONSchema::register_format('backup-performance', {
> 'max-workers' => {
* [pve-devel] applied: [PATCH guest-common v3 10/22] vzdump: schema: make storage for fleecing semi-optional
2024-04-11 9:29 ` [pve-devel] [PATCH guest-common v3 10/22] vzdump: schema: make storage for fleecing semi-optional Fiona Ebner
2024-04-11 18:07 ` Thomas Lamprecht
@ 2024-04-11 18:07 ` Thomas Lamprecht
1 sibling, 0 replies; 34+ messages in thread
From: Thomas Lamprecht @ 2024-04-11 18:07 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
On 11/04/2024 11:29, Fiona Ebner wrote:
> so it doesn't need to be set when explicitly disabling fleecing. Needs
> a custom verifier to enforce it being set when enabled.
>
> Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> src/PVE/VZDump/Common.pm | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
>
applied, thanks!
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [RFC guest-common v3 11/22] abstract config: do not copy fleecing images entry for snapshot
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (9 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH guest-common v3 10/22] vzdump: schema: make storage for fleecing semi-optional Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH manager v3 12/22] vzdump: have property string helpers always return the result Fiona Ebner
` (11 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
RFC because belongs to the optional "improved cleanup"-part of the
series. See qemu-server patches for more info.
src/PVE/AbstractConfig.pm | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
index a286b13..a2ed142 100644
--- a/src/PVE/AbstractConfig.pm
+++ b/src/PVE/AbstractConfig.pm
@@ -684,6 +684,7 @@ sub __snapshot_copy_config {
next if $k eq 'lock';
next if $k eq 'digest';
next if $k eq 'description';
+ next if $k eq 'fleecing-images';
next if $k =~ m/^unused\d+$/;
$dest->{$k} = $source->{$k};
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [PATCH manager v3 12/22] vzdump: have property string helpers always return the result
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (10 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [RFC guest-common v3 11/22] abstract config: do not copy fleecing images entry for snapshot Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH manager v3 13/22] vzdump: handle new 'fleecing' property string Fiona Ebner
` (10 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
Previously, the result would only be returned implicitly and if not
already parsed. While callers do not strictly need the return value,
future callers might mistakenly rely on it and even work by chance in
some scenarios, because of the implicit return. Make the code more
future proof by explicitly returning the result in all cases.
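A minimal sketch of why relying on Perl's implicit return of the last
evaluated expression is fragile:

    # returned implicitly, but only because the assignment happens to
    # be the last evaluated expression
    sub implicit {
        my ($param) = @_;
        $param->{x} = 42;
    }

    # explicit return keeps working even if more code is appended later
    sub explicit {
        my ($param) = @_;
        $param->{x} = 42;
        return $param->{x};
    }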
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
New in v3.
PVE/VZDump.pm | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index 152eb3e5..72461f73 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -134,9 +134,11 @@ my sub parse_performance {
my ($param) = @_;
if (defined(my $perf = $param->{performance})) {
- return if ref($perf) eq 'HASH'; # already parsed
+ return $perf if ref($perf) eq 'HASH'; # already parsed
$param->{performance} = PVE::JSONSchema::parse_property_string('backup-performance', $perf);
}
+
+ return $param->{performance};
}
my $parse_prune_backups_maxfiles = sub {
@@ -149,7 +151,7 @@ my $parse_prune_backups_maxfiles = sub {
if defined($maxfiles) && defined($prune_backups);
if (defined($prune_backups)) {
- return if ref($prune_backups) eq 'HASH'; # already parsed
+ return $prune_backups if ref($prune_backups) eq 'HASH'; # already parsed
$param->{'prune-backups'} = PVE::JSONSchema::parse_property_string(
'prune-backups',
$prune_backups
@@ -161,6 +163,8 @@ my $parse_prune_backups_maxfiles = sub {
$param->{'prune-backups'} = { 'keep-all' => 1 };
}
}
+
+ return $param->{'prune-backups'};
};
sub storage_info {
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [PATCH manager v3 13/22] vzdump: handle new 'fleecing' property string
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (11 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH manager v3 12/22] vzdump: have property string helpers always return the result Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-22 8:15 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH manager v3 14/22] api: backup/vzdump: add permission check for fleecing storage Fiona Ebner
` (9 subsequent siblings)
22 siblings, 1 reply; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes in v3:
* return result from parsing
PVE/VZDump.pm | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index 72461f73..812357bd 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -130,6 +130,17 @@ my $generate_notes = sub {
return $notes_template;
};
+my sub parse_fleecing {
+ my ($param) = @_;
+
+ if (defined(my $fleecing = $param->{fleecing})) {
+ return $fleecing if ref($fleecing) eq 'HASH'; # already parsed
+ $param->{fleecing} = PVE::JSONSchema::parse_property_string('backup-fleecing', $fleecing);
+ }
+
+ return $param->{fleecing};
+}
+
my sub parse_performance {
my ($param) = @_;
@@ -282,6 +293,7 @@ sub read_vzdump_defaults {
} keys %$confdesc_for_defaults
};
$parse_prune_backups_maxfiles->($defaults, "defaults in VZDump schema");
+ parse_fleecing($defaults);
parse_performance($defaults);
my $raw;
@@ -308,6 +320,7 @@ sub read_vzdump_defaults {
$res->{mailto} = [ @mailto ];
}
$parse_prune_backups_maxfiles->($res, "options in '$fn'");
+ parse_fleecing($res);
parse_performance($res);
foreach my $key (keys %$defaults) {
@@ -1457,6 +1470,7 @@ sub verify_vzdump_parameters {
if defined($param->{'prune-backups'}) && defined($param->{maxfiles});
$parse_prune_backups_maxfiles->($param, 'CLI parameters');
+ parse_fleecing($param);
parse_performance($param);
if (my $template = $param->{'notes-template'}) {
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [pve-devel] [PATCH manager v3 13/22] vzdump: handle new 'fleecing' property string
2024-04-11 9:29 ` [pve-devel] [PATCH manager v3 13/22] vzdump: handle new 'fleecing' property string Fiona Ebner
@ 2024-04-22 8:15 ` Fiona Ebner
0 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-22 8:15 UTC (permalink / raw)
To: pve-devel
Am 11.04.24 um 11:29 schrieb Fiona Ebner:
> @@ -282,6 +293,7 @@ sub read_vzdump_defaults {
> } keys %$confdesc_for_defaults
> };
> $parse_prune_backups_maxfiles->($defaults, "defaults in VZDump schema");
> + parse_fleecing($defaults);
> parse_performance($defaults);
>
> my $raw;
Forgot to mention there was a merge conflict here, because of 693b10f2
("vzdump: actually honor schema defaults for performance"). To avoid the
same issue with the fleecing property string, the hunk above was changed
in the same fashion to:
> @@ -299,6 +310,13 @@ sub read_vzdump_defaults {
> defined($default) ? ($_ => $default) : ()
> } keys $performance_fmt->%*
> };
> + my $fleecing_fmt = PVE::JSONSchema::get_format('backup-fleecing');
> + $defaults->{fleecing} = {
> + map {
> + my $default = $fleecing_fmt->{$_}->{default};
> + defined($default) ? ($_ => $default) : ()
> + } keys $fleecing_fmt->%*
> + };
> $parse_prune_backups_maxfiles->($defaults, "defaults in VZDump schema");
>
> my $raw;
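The map in the quoted hunk collects only those format keys that define a
default; a standalone sketch with a hypothetical format hash:

    my $fmt = {
        enabled => { type => 'boolean', default => 0 },  # hypothetical default
        storage => { type => 'string' },                 # no default => skipped
    };
    my $defaults = {
        map {
            my $default = $fmt->{$_}->{default};
            defined($default) ? ($_ => $default) : ()
        } keys $fmt->%*
    };
    # $defaults is now { enabled => 0 }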
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [PATCH manager v3 14/22] api: backup/vzdump: add permission check for fleecing storage
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (12 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH manager v3 13/22] vzdump: handle new 'fleecing' property string Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH qemu-server v3 15/22] backup: disk info: also keep track of size Fiona Ebner
` (8 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
Similar to how Datastore.AllocateSpace is required for the backup
storage, it should also be required for the fleecing storage.
Removing a fleecing storage from a job does not require more
permissions than for modifying the job.
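For illustration, assuming the built-in PVEDatastoreUser role (which
includes Datastore.AllocateSpace), granting a backup user the needed
privilege on a hypothetical fleecing storage could look like:

    pveum acl modify /storage/fleece-lvm --users backup@pve --roles PVEDatastoreUser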
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes in v3:
* avoid potential access of $fleecing=undef as a hash (for
future-proofing, should not happen in practice).
PVE/API2/Backup.pm | 10 ++++++++--
PVE/API2/VZDump.pm | 9 +++++----
PVE/VZDump.pm | 2 +-
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/PVE/API2/Backup.pm b/PVE/API2/Backup.pm
index 70753c2e..88140323 100644
--- a/PVE/API2/Backup.pm
+++ b/PVE/API2/Backup.pm
@@ -42,7 +42,7 @@ my $vzdump_job_id_prop = {
# NOTE: also used by the vzdump API call.
sub assert_param_permission_common {
- my ($rpcenv, $user, $param) = @_;
+ my ($rpcenv, $user, $param, $is_delete) = @_;
return if $user eq 'root@pam'; # always OK
for my $key (qw(tmpdir dumpdir script)) {
@@ -52,6 +52,12 @@ sub assert_param_permission_common {
if (grep { defined($param->{$_}) } qw(bwlimit ionice performance)) {
$rpcenv->check($user, "/", [ 'Sys.Modify' ]);
}
+
+ if ($param->{fleecing} && !$is_delete) {
+ my $fleecing = PVE::VZDump::parse_fleecing($param) // {};
+ $rpcenv->check($user, "/storage/$fleecing->{storage}", [ 'Datastore.AllocateSpace' ])
+ if $fleecing->{storage};
+ }
}
my sub assert_param_permission_create {
@@ -70,7 +76,7 @@ my sub assert_param_permission_update {
return if $user eq 'root@pam'; # always OK
assert_param_permission_common($rpcenv, $user, $update);
- assert_param_permission_common($rpcenv, $user, $delete);
+ assert_param_permission_common($rpcenv, $user, $delete, 1);
if ($update->{storage}) {
$rpcenv->check($user, "/storage/$update->{storage}", [ 'Datastore.Allocate' ])
diff --git a/PVE/API2/VZDump.pm b/PVE/API2/VZDump.pm
index f66fc740..7f92e7ec 100644
--- a/PVE/API2/VZDump.pm
+++ b/PVE/API2/VZDump.pm
@@ -41,10 +41,11 @@ __PACKAGE__->register_method ({
description => "Create backup.",
permissions => {
description => "The user needs 'VM.Backup' permissions on any VM, and "
- ."'Datastore.AllocateSpace' on the backup storage. The 'tmpdir', 'dumpdir' and "
- ."'script' parameters are restricted to the 'root\@pam' user. The 'maxfiles' and "
- ."'prune-backups' settings require 'Datastore.Allocate' on the backup storage. The "
- ."'bwlimit', 'performance' and 'ionice' parameters require 'Sys.Modify' on '/'. ",
+ ."'Datastore.AllocateSpace' on the backup storage (and fleecing storage when fleecing "
+ ."is used). The 'tmpdir', 'dumpdir' and 'script' parameters are restricted to the "
+ ."'root\@pam' user. The 'maxfiles' and 'prune-backups' settings require "
+ ."'Datastore.Allocate' on the backup storage. The 'bwlimit', 'performance' and "
+ ."'ionice' parameters require 'Sys.Modify' on '/'.",
user => 'all',
},
protected => 1,
diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index 812357bd..cde20624 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -130,7 +130,7 @@ my $generate_notes = sub {
return $notes_template;
};
-my sub parse_fleecing {
+sub parse_fleecing {
my ($param) = @_;
if (defined(my $fleecing = $param->{fleecing})) {
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [PATCH qemu-server v3 15/22] backup: disk info: also keep track of size
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (13 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH manager v3 14/22] api: backup/vzdump: add permission check for fleecing storage Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH qemu-server v3 16/22] backup: implement fleecing option Fiona Ebner
` (7 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
which will be needed to allocate fleecing images.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
PVE/VZDump/QemuServer.pm | 1 +
1 file changed, 1 insertion(+)
diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index be7d8e1e..51498dbc 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -140,6 +140,7 @@ sub prepare {
path => $path,
volid => $volid,
storeid => $storeid,
+ size => $size,
format => $format,
virtdev => $ds,
qmdevice => "drive-$ds",
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [PATCH qemu-server v3 16/22] backup: implement fleecing option
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (14 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH qemu-server v3 15/22] backup: disk info: also keep track of size Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 17/22] parse config: allow config keys with minus sign Fiona Ebner
` (6 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
Management for fleecing images is implemented here. If the fleecing
option is set, for each disk (except EFI disk and TPM state) a new
fleecing image is allocated on the configured fleecing storage (same
storage as the original disk by default). The fleecing image is attached to
QEMU with an explicit 'size' parameter, because the block node in QEMU has
to have the exact same size as the original disk, while the newly allocated
image might be bigger if the storage has a coarser allocation granularity or
rounds up. After the backup, the fleecing images are detached and removed
from the storage.
If the storage supports qcow2, use that as the fleecing image format.
This allows saving some space even on storages that do not properly
support discard, like, for example, older versions of NFS.
Since there can be multiple volumes with the same volume name on
different storages, the fleecing image's name cannot be just based on
the original volume's name. The naming scheme vm-ID-fleece-N(.FORMAT),
with N incrementing for each disk, is used.
Partially inspired by the existing handling of the TPM state image
during backup.
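A sketch of the name and size computation, mirroring the allocation code
in the diff below ($di, $scfg, $format and $n as defined there):

    my $name = "vm-$vmid-fleece-$n";       # e.g. vm-123-fleece-0
    $name .= ".$format" if $scfg->{path};  # file-based: vm-123-fleece-0.qcow2
    # vdisk_alloc() takes the size in KiB, while the disk info is in bytes
    my $size = PVE::Tools::convert_size($di->{size}, 'b' => 'kb');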
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
PVE/VZDump/QemuServer.pm | 144 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 142 insertions(+), 2 deletions(-)
diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 51498dbc..8c97ee62 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -26,6 +26,7 @@ use PVE::Format qw(render_duration render_bytes);
use PVE::QemuConfig;
use PVE::QemuServer;
+use PVE::QemuServer::Helpers;
use PVE::QemuServer::Machine;
use PVE::QemuServer::Monitor qw(mon_cmd);
@@ -525,6 +526,121 @@ sub get_and_check_pbs_encryption_config {
die "internal error - unhandled case for getting & checking PBS encryption ($keyfile, $master_keyfile)!";
}
+my sub cleanup_fleecing_images {
+ my ($self, $disks) = @_;
+
+ for my $di ($disks->@*) {
+ if (my $volid = $di->{'fleece-volid'}) {
+ eval { PVE::Storage::vdisk_free($self->{storecfg}, $volid); };
+ $self->log('warn', "error removing fleecing image '$volid' - $@") if $@;
+ }
+ }
+}
+
+my sub allocate_fleecing_images {
+ my ($self, $disks, $vmid, $fleecing_storeid, $format) = @_;
+
+ die "internal error - no fleecing storage specified\n" if !$fleecing_storeid;
+
+ # TODO what about potential left-over images from a failed attempt? Just
+ # auto-remove? While unlikely, could conflict with manually created image from user...
+
+ eval {
+ my $n = 0; # counter for fleecing image names
+
+ for my $di ($disks->@*) {
+ next if $di->{virtdev} =~ m/^(?:tpmstate|efidisk)\d$/; # too small to be worth it
+ if ($di->{type} eq 'block' || $di->{type} eq 'file') {
+ my $scfg = PVE::Storage::storage_config($self->{storecfg}, $fleecing_storeid);
+ my $name = "vm-$vmid-fleece-$n";
+ $name .= ".$format" if $scfg->{path};
+
+ my $size = PVE::Tools::convert_size($di->{size}, 'b' => 'kb');
+
+ $di->{'fleece-volid'} = PVE::Storage::vdisk_alloc(
+ $self->{storecfg}, $fleecing_storeid, $vmid, $format, $name, $size);
+
+ $n++;
+ } else {
+ die "implement me (type '$di->{type}')";
+ }
+ }
+ };
+ if (my $err = $@) {
+ cleanup_fleecing_images($self, $disks);
+ die $err;
+ }
+}
+
+my sub detach_fleecing_images {
+ my ($disks, $vmid) = @_;
+
+ return if !PVE::QemuServer::Helpers::vm_running_locally($vmid);
+
+ for my $di ($disks->@*) {
+ if (my $volid = $di->{'fleece-volid'}) {
+ my $devid = "$di->{qmdevice}-fleecing";
+ $devid =~ s/^drive-//; # re-added by qemu_drivedel()
+ eval { PVE::QemuServer::qemu_drivedel($vmid, $devid) };
+ }
+ }
+}
+
+my sub attach_fleecing_images {
+ my ($self, $disks, $vmid, $format) = @_;
+
+ # unconditionally try to remove potential left-overs from a previous backup
+ detach_fleecing_images($disks, $vmid);
+
+ my $vollist = [ map { $_->{'fleece-volid'} } grep { $_->{'fleece-volid'} } $disks->@* ];
+ PVE::Storage::activate_volumes($self->{storecfg}, $vollist);
+
+ for my $di ($disks->@*) {
+ if (my $volid = $di->{'fleece-volid'}) {
+ $self->loginfo("$di->{qmdevice}: attaching fleecing image $volid to QEMU");
+
+ my $path = PVE::Storage::path($self->{storecfg}, $volid);
+ my $devid = "$di->{qmdevice}-fleecing";
+ my $drive = "file=$path,if=none,id=$devid,format=$format,discard=unmap";
+ # Specify size explicitly, to make it work if storage backend rounded up size for
+ # fleecing image when allocating.
+ $drive .= ",size=$di->{size}" if $format eq 'raw';
+ $drive =~ s/\\/\\\\/g;
+ my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto \"$drive\"");
+ die "attaching fleecing image $volid failed - $ret\n" if $ret !~ m/OK/s;
+ }
+ }
+}
+
+my sub check_and_prepare_fleecing {
+ my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support) = @_;
+
+ # Even if the VM was started specifically for fleecing, it's possible that the VM is resumed and
+ # then starts doing IO. For VMs that are not resumed the fleecing images will just stay empty,
+ # so there is no big cost.
+
+ my $use_fleecing = $fleecing_opts && $fleecing_opts->{enabled} && !$is_template;
+
+ if ($use_fleecing && !defined($qemu_support->{'backup-fleecing'})) {
+ $self->log(
+ 'warn',
+ "running QEMU version does not support backup fleecing - continuing without",
+ );
+ $use_fleecing = 0;
+ }
+
+ if ($use_fleecing) {
+ my ($default_format, $valid_formats) = PVE::Storage::storage_default_format(
+ $self->{storecfg}, $fleecing_opts->{storage});
+ my $format = scalar(grep { $_ eq 'qcow2' } $valid_formats->@*) ? 'qcow2' : 'raw';
+
+ allocate_fleecing_images($self, $disks, $vmid, $fleecing_opts->{storage}, $format);
+ attach_fleecing_images($self, $disks, $vmid, $format);
+ }
+
+ return $use_fleecing;
+}
+
sub archive_pbs {
my ($self, $task, $vmid) = @_;
@@ -578,6 +694,7 @@ sub archive_pbs {
# get list early so we die on unknown drive types before doing anything
my $devlist = _get_task_devlist($task);
+ my $use_fleecing;
$self->enforce_vm_running_for_backup($vmid);
$self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
@@ -606,6 +723,11 @@ sub archive_pbs {
$attach_tpmstate_drive->($self, $task, $vmid);
+ my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
+
+ $use_fleecing = check_and_prepare_fleecing(
+ $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support);
+
my $fs_frozen = $self->qga_fs_freeze($task, $vmid);
my $params = {
@@ -617,6 +739,8 @@ sub archive_pbs {
devlist => $devlist,
'config-file' => $conffile,
};
+ $params->{fleecing} = JSON::true if $use_fleecing;
+
if (defined(my $ns = $scfg->{namespace})) {
$params->{'backup-ns'} = $ns;
}
@@ -633,7 +757,6 @@ sub archive_pbs {
$params->{"master-keyfile"} = $master_keyfile if defined($master_keyfile);
}
- my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
$params->{'use-dirty-bitmap'} = JSON::true
if $qemu_support->{'pbs-dirty-bitmap'} && !$is_template;
@@ -665,6 +788,11 @@ sub archive_pbs {
}
$self->restore_vm_power_state($vmid);
+ if ($use_fleecing) {
+ detach_fleecing_images($task->{disks}, $vmid);
+ cleanup_fleecing_images($self, $task->{disks});
+ }
+
die $err if $err;
}
@@ -724,8 +852,10 @@ sub archive_vma {
$speed = $opts->{bwlimit}*1024;
}
+ my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
+
my $diskcount = scalar(@{$task->{disks}});
- if (PVE::QemuConfig->is_template($self->{vmlist}->{$vmid}) || !$diskcount) {
+ if ($is_template || !$diskcount) {
my @pathlist;
foreach my $di (@{$task->{disks}}) {
if ($di->{type} eq 'block' || $di->{type} eq 'file') {
@@ -765,6 +895,7 @@ sub archive_vma {
}
my $devlist = _get_task_devlist($task);
+ my $use_fleecing;
$self->enforce_vm_running_for_backup($vmid);
$self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
@@ -784,6 +915,9 @@ sub archive_vma {
$attach_tpmstate_drive->($self, $task, $vmid);
+ $use_fleecing = check_and_prepare_fleecing(
+ $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support);
+
my $outfh;
if ($opts->{stdout}) {
$outfh = $opts->{stdout};
@@ -812,6 +946,7 @@ sub archive_vma {
devlist => $devlist
};
$params->{'firewall-file'} = $firewall if -e $firewall;
+ $params->{fleecing} = JSON::true if $use_fleecing;
add_backup_performance_options($params, $opts->{performance}, $qemu_support);
$qmpclient->queue_cmd($vmid, $backup_cb, 'backup', %$params);
@@ -853,6 +988,11 @@ sub archive_vma {
$self->restore_vm_power_state($vmid);
+ if ($use_fleecing) {
+ detach_fleecing_images($task->{disks}, $vmid);
+ cleanup_fleecing_images($self, $task->{disks});
+ }
+
if ($err) {
if ($cpid) {
kill(9, $cpid);
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [RFC qemu-server v3 17/22] parse config: allow config keys with minus sign
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (15 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH qemu-server v3 16/22] backup: implement fleecing option Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 17:50 ` Thomas Lamprecht
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 18/22] schema: add fleecing-images config property Fiona Ebner
` (5 subsequent siblings)
22 siblings, 1 reply; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
In preparation for the upcoming 'fleecing-images' key. To avoid mixing
of options with - and options with _, which is not very user-friendly,
it would be nice to add aliases for existing options with _. And
long-term, backup restore handlers could switch to the modern keys
with -.
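Concretely, the change below only widens the accepted key characters:

    # old: m/^([a-z][a-z_]*\d*):/     matches e.g. 'scsi0', 'lock'
    # new: m/^([a-z][a-z_\-]*\d*):/   additionally matches 'fleecing-images'
    print "ok\n" if 'fleecing-images:' =~ m/^([a-z][a-z_\-]*\d*):/;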
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
RFC, because this and the rest of the qemu-server patches are for
the optional improved cleanup.
PVE/QemuServer.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index abe175a4..4a57b8b8 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -2379,7 +2379,7 @@ sub parse_vm_config {
} else {
$handle_error->("vm $vmid - property 'delete' is only allowed in [PENDING]\n");
}
- } elsif ($line =~ m/^([a-z][a-z_]*\d*):\s*(.+?)\s*$/) {
+ } elsif ($line =~ m/^([a-z][a-z_\-]*\d*):\s*(.+?)\s*$/) {
my $key = $1;
my $value = $2;
if ($section eq 'cloudinit') {
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [pve-devel] [RFC qemu-server v3 17/22] parse config: allow config keys with minus sign
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 17/22] parse config: allow config keys with minus sign Fiona Ebner
@ 2024-04-11 17:50 ` Thomas Lamprecht
2024-04-16 9:02 ` Fiona Ebner
0 siblings, 1 reply; 34+ messages in thread
From: Thomas Lamprecht @ 2024-04-11 17:50 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
On 11/04/2024 11:29, Fiona Ebner wrote:
> In preparation for the upcoming 'fleecing-images' key. To avoid mixing
> of options with - and options with _, which is not very user-friendly,
> it would be nice to add aliases for existing options with _. And
> long-term, backup restore handlers could switch to the modern keys
> with -.
a long-term way out could be to switch all formats over to using
kebab-case and normalize through s/_/-/ here.
Writing that out again naturally needs either actively checking that
all nodes already support the new format, or adding support for it in
a minor release now and then enabling it on a major release.
All churn, but so is having a format mess.
I've pondered doing something like this for the API, but for endpoints
not backed by a config (schema) it would be less problematic w.r.t.
accepting both variants for a while, with API docs showing only the
new one and then making warnings noisier over the time of a full
major release.
Anyhow, this on its own would be fine by me, but in any case
documenting why the formats are mixed would be good; users can be
confused by it, and cannot know that most of the "bad" options aren't
the fault of the devs currently mainly working on PVE.
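A one-line sketch of the suggested normalization, assuming keys are
rewritten when a config is read:

    (my $normalized = $key) =~ tr/_/-/;  # 'fleecing_images' -> 'fleecing-images'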
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [pve-devel] [RFC qemu-server v3 17/22] parse config: allow config keys with minus sign
2024-04-11 17:50 ` Thomas Lamprecht
@ 2024-04-16 9:02 ` Fiona Ebner
2024-10-21 13:28 ` Thomas Lamprecht
0 siblings, 1 reply; 34+ messages in thread
From: Fiona Ebner @ 2024-04-16 9:02 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox VE development discussion
Am 11.04.24 um 19:50 schrieb Thomas Lamprecht:
> On 11/04/2024 11:29, Fiona Ebner wrote:
>> In preparation for the upcoming 'fleecing-images' key. To avoid mixing
>> of options with - and options with _, which is not very user-friendly,
>> it would be nice to add aliases for existing options with _. And
>> long-term, backup restore handlers could switch to the modern keys
>> with -.
>
> a long-term way out could be to switch all formats over to using
> kebab-case and normalize through s/_/- here.
>
> Writing that out again naturally need either checking if all nodes
> support the new format already actively, or adding support to it
> in a minor release now and then enable it on a major release.
>
> All churn, but so is having an format mess.
>
> I ponder about doing something like this for the API, but for endpoints
> not being backed by a config (schema) it would be less problematic
> w.r.t. accepting both variants for a while, but API docs showing only
> the new one and then making warnings noisier over the time of a full
> major release..
>
> Anyhow, this on it's own would be fine by me, but in any case we
> should at least document why the formats are mixed would be good, users
> can be confused by it, and cannot know that most of the "bad" options
> isn't the fault from the devs currently mainly working on PVE.
Where would be a good place for this? In the appendix for the CLI in the
admin guide? What exactly is the reason, just that the underscore was
preferred when the project started out?
Or should I re-spin naming the option "fleecing_images"?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [pve-devel] [RFC qemu-server v3 17/22] parse config: allow config keys with minus sign
2024-04-16 9:02 ` Fiona Ebner
@ 2024-10-21 13:28 ` Thomas Lamprecht
0 siblings, 0 replies; 34+ messages in thread
From: Thomas Lamprecht @ 2024-10-21 13:28 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
thanks for the reminder on this one.
Am 16/04/2024 um 11:02 schrieb Fiona Ebner:
>> Anyhow, this on it's own would be fine by me, but in any case we
>> should at least document why the formats are mixed would be good, users
>> can be confused by it, and cannot know that most of the "bad" options
>> isn't the fault from the devs currently mainly working on PVE.
>
> Where would be a good place for this? In the appendix for the CLI in the
> admin guide? What exactly is the reason, just that the underscore was
> preferred when the project started out?
Hmm, maybe in the Configuration Files [0] appendix and then link to that
in the CLI one with a note that properties are generally derived from the
config schema?
And yeah, the original snake_case use can probably be attributed a bit
to Perl and its hash key accesses, because, as you undoubtedly know,
snake_case keys can be used without quotes, unlike, e.g., 'kebab-case'.
But besides that I'd just say different devs and less enforcement of
style led to the mix.
[0]: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_configuration_files_2
> Or should I re-spin naming the option "fleecing_images"?
No hard feelings either way, for new properties it is not further
restricting w.r.t. where a backup can be restored, as the new property
needs to be supported there anyway, independent of spelling, so if no
other dev has an opinion here I'd probably keep this as is; but we
really should look into s/_/-/ normalization and acceptance of either
variant for the next major release, with some long-term goal to sunset
the snake_case variants in the API; for config restore we probably want
to keep accepting them forever.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [RFC qemu-server v3 18/22] schema: add fleecing-images config property
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (16 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 17/22] parse config: allow config keys with minus sign Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 19/22] vzdump: better cleanup fleecing images after hard errors Fiona Ebner
` (4 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
to be used internally to record volume IDs of fleecing images
allocated during backup.
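For illustration, such an entry in a VM configuration file would look
like this (hypothetical storage and volume names):

    fleecing-images: local-lvm:vm-123-fleece-0,local-lvm:vm-123-fleece-1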
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
PVE/API2/Qemu.pm | 9 +++++++++
PVE/QemuServer.pm | 7 +++++++
PVE/VZDump/QemuServer.pm | 1 +
3 files changed, 17 insertions(+)
diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index f3ce83d6..e1a3a19f 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -942,6 +942,9 @@ __PACKAGE__->register_method({
$param->{cpuunits} = PVE::CGroup::clamp_cpu_shares($param->{cpuunits})
if defined($param->{cpuunits}); # clamp value depending on cgroup version
+ raise_param_exc({ 'fleecing-images' => "Cannot set option - for internal use only." })
+ if $param->{'fleecing-images'};
+
PVE::Cluster::check_cfs_quorum();
my $filename = PVE::QemuConfig->config_file($vmid);
@@ -1660,6 +1663,9 @@ my $update_vm_api = sub {
push @paramarr, "-$key", $value;
}
+ raise_param_exc({ 'fleecing-images' => "Cannot set option - for internal use only." })
+ if $param->{'fleecing-images'};
+
my $skiplock = extract_param($param, 'skiplock');
raise_param_exc({ skiplock => "Only root may use this option." })
if $skiplock && $authuser ne 'root@pam';
@@ -3747,6 +3753,9 @@ __PACKAGE__->register_method({
next if $opt eq 'snapshots' || $opt eq 'parent' || $opt eq 'snaptime' ||
$opt eq 'vmstate' || $opt eq 'snapstate';
+ # left-overs, not cloned
+ next if $opt eq 'fleecing-images';
+
# no need to copy unused images, because VMID(owner) changes anyways
next if $opt =~ m/^unused\d+$/;
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 4a57b8b8..61804ae6 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -729,6 +729,13 @@ EODESCR
description => "List of host cores used to execute guest processes, for example: 0,5,8-11",
optional => 1,
},
+ 'fleecing-images' => {
+ type => 'string',
+ format => 'pve-volume-id-list',
+ description => "For internal use only. List of fleecing images allocated during backup."
+ ." If no backup is running, these are left-overs that failed to be removed.",
+ optional => 1,
+ },
};
my $cicustom_fmt = {
diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 8c97ee62..8a44a1fd 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -249,6 +249,7 @@ sub assemble {
next;
}
next if $line =~ m/^lock:/ || $line =~ m/^parent:/;
+ next if $line =~ m/^fleecing-images:/;
print $outfd $line;
}
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [RFC qemu-server v3 19/22] vzdump: better cleanup fleecing images after hard errors
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (17 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 18/22] schema: add fleecing-images config property Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 20/22] migration: attempt to clean up potential left-over fleecing images Fiona Ebner
` (3 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
By recording the allocated fleecing images in the VM config, they
are not immediately orphaned, should a hard error occur during
backup that prevents cleanup.
Cleanup is then attempted again during the next backup run.
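The intended lifecycle, sketched with the helpers added below:

    # 1) after allocation, remember the volume IDs in the VM config
    PVE::QemuConfig::record_fleecing_images($vmid, $fleece_volids);
    # 2) after the backup (or on the next run), free them and drop the
    #    entry; volume IDs that fail to free are recorded again
    PVE::QemuConfig::cleanup_fleecing_images($vmid, $storecfg);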
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
PVE/QemuConfig.pm | 40 ++++++++++++++++++++++++++++++++++++++++
PVE/VZDump/QemuServer.pm | 31 ++++++++++++++++++++++++-------
2 files changed, 64 insertions(+), 7 deletions(-)
diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
index 8e8a7828..81bbbd2c 100644
--- a/PVE/QemuConfig.pm
+++ b/PVE/QemuConfig.pm
@@ -13,6 +13,7 @@ use PVE::QemuServer::Monitor qw(mon_cmd);
use PVE::QemuServer;
use PVE::QemuServer::Machine;
use PVE::QemuServer::Memory qw(get_current_memory);
+use PVE::RESTEnvironment qw(log_warn);
use PVE::Storage;
use PVE::Tools;
use PVE::Format qw(render_bytes render_duration);
@@ -573,4 +574,43 @@ sub has_cloudinit {
return $found;
}
+# Caller is expected to deal with volumes from an already existing entry first.
+sub record_fleecing_images {
+ my ($vmid, $volids) = @_;
+
+ return if scalar($volids->@*) == 0;
+
+ PVE::QemuConfig->lock_config($vmid, sub {
+ my $conf = PVE::QemuConfig->load_config($vmid);
+ $conf->{'fleecing-images'} = join(',', $volids->@*);
+ PVE::QemuConfig->write_config($vmid, $conf);
+ });
+}
+
+sub cleanup_fleecing_images {
+ my ($vmid, $storecfg) = @_;
+
+ my $volids = [];
+ my $failed = [];
+
+ PVE::QemuConfig->lock_config($vmid, sub {
+ my $conf = PVE::QemuConfig->load_config($vmid);
+ if ($conf->{'fleecing-images'}) {
+ $volids = [PVE::Tools::split_list($conf->{'fleecing-images'})];
+ delete $conf->{'fleecing-images'};
+ PVE::QemuConfig->write_config($vmid, $conf);
+ }
+ });
+
+ for my $volid ($volids->@*) {
+ eval { PVE::Storage::vdisk_free($storecfg, $volid); };
+ if (my $err = $@) {
+ log_warn("error removing fleecing image '$volid' - $err");
+ push $failed->@*, $volid;
+ }
+ }
+
+ record_fleecing_images($vmid, $failed);
+}
+
1;
diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 8a44a1fd..ad0212bc 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -527,15 +527,25 @@ sub get_and_check_pbs_encryption_config {
die "internal error - unhandled case for getting & checking PBS encryption ($keyfile, $master_keyfile)!";
}
+# Helper is intended to be called from allocate_fleecing_images() only. Otherwise, fleecing volids
+# have already been recorded in the configuration and PVE::QemuConfig::cleanup_fleecing_images()
+# should be used instead.
my sub cleanup_fleecing_images {
- my ($self, $disks) = @_;
+ my ($self, $vmid, $disks) = @_;
+
+ my $failed = [];
for my $di ($disks->@*) {
if (my $volid = $di->{'fleece-volid'}) {
eval { PVE::Storage::vdisk_free($self->{storecfg}, $volid); };
- $self->log('warn', "error removing fleecing image '$volid' - $@") if $@;
+ if (my $err = $@) {
+ $self->log('warn', "error removing fleecing image '$volid' - $err");
+ push $failed->@*, $volid;
+ }
}
}
+
+ PVE::QemuConfig::record_fleecing_images($vmid, $failed);
}
my sub allocate_fleecing_images {
@@ -543,8 +553,7 @@ my sub allocate_fleecing_images {
die "internal error - no fleecing storage specified\n" if !$fleecing_storeid;
- # TODO what about potential left-over images from a failed attempt? Just
- # auto-remove? While unlikely, could conflict with manually created image from user...
+ my $fleece_volids = [];
eval {
my $n = 0; # counter for fleecing image names
@@ -561,6 +570,8 @@ my sub allocate_fleecing_images {
$di->{'fleece-volid'} = PVE::Storage::vdisk_alloc(
$self->{storecfg}, $fleecing_storeid, $vmid, $format, $name, $size);
+ push $fleece_volids->@*, $di->{'fleece-volid'};
+
$n++;
} else {
die "implement me (type '$di->{type}')";
@@ -568,9 +579,11 @@ my sub allocate_fleecing_images {
}
};
if (my $err = $@) {
- cleanup_fleecing_images($self, $disks);
+ cleanup_fleecing_images($self, $vmid, $disks);
die $err;
}
+
+ PVE::QemuConfig::record_fleecing_images($vmid, $fleece_volids);
}
my sub detach_fleecing_images {
@@ -630,6 +643,10 @@ my sub check_and_prepare_fleecing {
$use_fleecing = 0;
}
+ # clean up potential left-overs from a previous attempt
+ eval { PVE::QemuConfig::cleanup_fleecing_images($vmid, $self->{storecfg}); };
+ $self->log('warn', "attempt to clean up left-over fleecing images failed - $@") if $@;
+
if ($use_fleecing) {
my ($default_format, $valid_formats) = PVE::Storage::storage_default_format(
$self->{storecfg}, $fleecing_opts->{storage});
@@ -791,7 +808,7 @@ sub archive_pbs {
if ($use_fleecing) {
detach_fleecing_images($task->{disks}, $vmid);
- cleanup_fleecing_images($self, $task->{disks});
+ PVE::QemuConfig::cleanup_fleecing_images($vmid, $self->{storecfg});
}
die $err if $err;
@@ -991,7 +1008,7 @@ sub archive_vma {
if ($use_fleecing) {
detach_fleecing_images($task->{disks}, $vmid);
- cleanup_fleecing_images($self, $task->{disks});
+ PVE::QemuConfig::cleanup_fleecing_images($vmid, $self->{storecfg});
}
if ($err) {
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [RFC qemu-server v3 20/22] migration: attempt to clean up potential left-over fleecing images
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (18 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 19/22] vzdump: better cleanup fleecing images after hard errors Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 21/22] destroy vm: " Fiona Ebner
` (2 subsequent siblings)
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
Do so before the guest is migrated to a different node, where the images
would really become orphaned.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
PVE/QemuMigrate.pm | 3 +++
1 file changed, 3 insertions(+)
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 8d9b35ae..a190ce03 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -179,6 +179,9 @@ sub prepare {
my $version = PVE::QemuServer::Helpers::get_node_pvecfg_version($self->{node});
my $cloudinit_config = $conf->{cloudinit};
+ eval { PVE::QemuConfig::cleanup_fleecing_images($vmid, $storecfg) };
+ $self->log('warn', "attempt to clean up left-over fleecing images failed - $@") if $@;
+
if (
PVE::QemuConfig->has_cloudinit($conf) && defined($cloudinit_config)
&& scalar(keys %$cloudinit_config) > 0
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [RFC qemu-server v3 21/22] destroy vm: clean up potential left-over fleecing images
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (19 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 20/22] migration: attempt to clean up potential left-over fleecing images Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-11 9:29 ` [pve-devel] [PATCH docs v3 22/22] vzdump: add section about backup fleecing Fiona Ebner
2024-04-19 15:23 ` [pve-devel] partially-applied: [PATCH-SERIES v3] fix #4136: implement " Fiona Ebner
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
To avoid them becoming orphaned.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
PVE/QemuServer.pm | 3 +++
1 file changed, 3 insertions(+)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 61804ae6..6edd1692 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -2231,6 +2231,9 @@ sub check_type {
sub destroy_vm {
my ($storecfg, $vmid, $skiplock, $replacement_conf, $purge_unreferenced) = @_;
+ eval { PVE::QemuConfig::cleanup_fleecing_images($vmid, $storecfg) };
+ log_warn("attempt to clean up left-over fleecing images failed - $@") if $@;
+
my $conf = PVE::QemuConfig->load_config($vmid);
if (!$skiplock && !PVE::QemuConfig->has_lock($conf, 'suspended')) {
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] [PATCH docs v3 22/22] vzdump: add section about backup fleecing
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (20 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [RFC qemu-server v3 21/22] destroy vm: " Fiona Ebner
@ 2024-04-11 9:29 ` Fiona Ebner
2024-04-19 15:23 ` [pve-devel] partially-applied: [PATCH-SERIES v3] fix #4136: implement " Fiona Ebner
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-11 9:29 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
vzdump.adoc | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/vzdump.adoc b/vzdump.adoc
index b5bbac7..b01d3c5 100644
--- a/vzdump.adoc
+++ b/vzdump.adoc
@@ -136,6 +136,44 @@ not included in backups. For volume mount points you can set the *Backup* option
to include the mount point in the backup. Device and bind mounts are never
backed up as their content is managed outside the {pve} storage library.
+VM Backup Fleecing
+~~~~~~~~~~~~~~~~~~
+
+WARNING: Backup fleecing is still being worked on (also in upstream QEMU) and is
+currently only a technology preview.
+
+When a backup for a VM is started, QEMU will install a "copy-before-write"
+filter in its block layer. This filter ensures that upon new guest writes, old
+data still needed for the backup is sent to the backup target first. The guest
+write blocks until this operation is finished, so guest IO to not-yet-backed-up
+sectors will be limited by the speed of the backup target.
+
+With backup fleecing, such old data is cached in a fleecing image rather than
+sent directly to the backup target. This can help guest IO performance and even
+prevent hangs in certain scenarios, at the cost of requiring more storage space.
+Use e.g. `vzdump 123 --fleecing enabled=1,storage=local-lvm` to enable backup
+fleecing, with fleecing images created on the storage `local-lvm`.
+
+The fleecing storage should be a fast local storage, with thin provisioning and
+discard support. Examples are LVM-thin, RBD, ZFS with `sparse 1` in the storage
+configuration, and many file-based storages. Ideally, the fleecing storage is a
+dedicated storage, so that it running full will not affect other guests and will
+just fail the backup. Parts of the fleecing image that have been backed up will be
+discarded to try and keep the space usage low.
+
+For file-based storages that do not support discard (e.g. NFS before version
+4.2), you should set `preallocation off` in the storage configuration. In
+combination with `qcow2` (used automatically as the format for the fleecing
+image when the storage supports it), this has the advantage that already
+allocated parts of the image can be re-used later, which can still help save
+quite a bit of space.
+
+WARNING: On a storage that's not thinly provisioned, e.g. LVM or ZFS without the
+`sparse` option, the full size of the original disk needs to be reserved for the
+fleecing image up-front. On a thinly provisioned storage, the fleecing image can
+grow to the same size as the original image only if the guest re-writes a whole
+disk while the backup is busy with another disk.
+
Backup File Names
-----------------
--
2.39.2
^ permalink raw reply [flat|nested] 34+ messages in thread
* [pve-devel] partially-applied: [PATCH-SERIES v3] fix #4136: implement backup fleecing
2024-04-11 9:29 [pve-devel] [PATCH-SERIES v3] fix #4136: implement backup fleecing Fiona Ebner
` (21 preceding siblings ...)
2024-04-11 9:29 ` [pve-devel] [PATCH docs v3 22/22] vzdump: add section about backup fleecing Fiona Ebner
@ 2024-04-19 15:23 ` Fiona Ebner
22 siblings, 0 replies; 34+ messages in thread
From: Fiona Ebner @ 2024-04-19 15:23 UTC (permalink / raw)
To: pve-devel
As discussed off-list with Thomas, applied the remaining non-RFC patches
and bumped the packages with the necessary dependency bumps. For the RFC
ones we'll have more time to decide upon the improved cleanup mechanism
(should it even go to the config or rather some file in /var) and the minus
sign in config options.
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 34+ messages in thread