public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing
@ 2024-01-25 14:41 Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [PATCH qemu 01/13] backup: factor out gathering device info into helper Fiona Ebner
                   ` (13 more replies)
  0 siblings, 14 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

When a backup for a VM is started, QEMU will install a
"copy-before-write" filter in its block layer. This filter ensures
that upon new guest writes, old data still needed for the backup is
sent to the backup target first. The guest write blocks until this
operation is finished, so guest IO to not-yet-backed-up sectors will
be limited by the speed of the backup target.

With backup fleecing, such old data is cached in a fleecing image
rather than sent directly to the backup target. This can help guest IO
performance and even prevent hangs in certain scenarios, at the cost
of requiring more storage space.

With this series it will be possible to enable backup-fleecing via
e.g. `vzdump 123 --fleecing enabled=1,storage=local-zfs` with fleecing
images created on the storage `local-zfs`. If no storage is specified,
the fleecing image will be created on the same storage as the original
image.
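
For illustration, a few invocation variants (only the first is the
verbatim example from above; the short form and the vzdump.conf line
are assumptions derived from the 'backup-fleecing' format in the
guest-common patch, where 'enabled' is the default key):

# fleecing images on a dedicated, fast storage
vzdump 123 --fleecing enabled=1,storage=local-zfs

# 'enabled' is the property string's default key, so this should be
# equivalent to enabled=1, with fleecing images on the source storages
vzdump 123 --fleecing 1

# presumably also usable as a node-wide default in /etc/vzdump.conf:
# fleecing: enabled=1,storage=local-zfs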


Fleecing images are created by qemu-server via pve-storage and
attached to QEMU before the backup starts, and cleaned up after the
backup has finished or failed. Currently, just a "-fleecing(.raw)"
suffix is added and there is no special handling yet for e.g. qm
rescan. Previous left-overs are also not cleaned up automatically,
because, while unlikely, images with such a name might have been
created by a user too. Happy to discuss alternatives!
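
A hedged sketch of how left-overs could be inspected and removed by
hand with standard pve-storage tooling (the volume name is
hypothetical and only assumes the "-fleecing" suffix from above):

# list candidate left-overs on the fleecing storage
pvesm list local-zfs | grep -- '-fleecing'

# free one after double-checking it is not a user-created volume
pvesm free local-zfs:vm-123-disk-0-fleecing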

The fleecing image needs to be exactly the same size as the source,
but luckily, an explicit size can be specified when attaching a raw
image to QEMU, so there are no size issues with storages that have a
coarser allocation granularity and round up.
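
A minimal stand-alone sketch of the raw driver's explicit 'size'
option (node name, size and zvol path are made up, and the actual
attachment done by qemu-server may look different): an over-allocated
fleecing volume is clamped to the exact source size.

qemu-system-x86_64 --qmp stdio \
 --blockdev raw,node-name=fleecing0,size=10737418240,file.driver=host_device,file.filename=/dev/zvol/rpool/data/vm-123-disk-0-fleecing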


While initial tests seem fine, bitmap handling needs to be carefully
checked for correctness. More eyeballs can't hurt there.

QEMU patches are for the submodule for better reviewability. There are
unfortunately a few prerequisites which are also still being worked on
upstream. These are:

A fix for qcow2 block status querying when a qcow2 image is used as
the source [0]. Already reviewed and in the process of being pulled.

To be able to discard the fleecing image, the addition of a
discard-source parameter [1]. This series was adapted for downstream
and I tried to address the two remaining issues:

1. Permission issue when the backup source node is read-only (e.g.
TPM state): Made the permissions conditional on discard-source being
set, via a new option for the copy-before-write block driver.
Currently, it's part of QAPI; it would be nicer to make it
internal-only.

2. Cluster size issue when the fleecing image has a larger cluster
size than the backup target: Worked around it by also considering the
source image when calculating the cluster size for block copy, and
had to hack .bdrv_co_get_info implementations for snapshot-access and
copy-before-write. Not super confident in this, so better to wait for
an answer from upstream.

Upstream reports/discussions for these can also be found at [1].


No hard dependencies AFAICS, but of course pve-manager should depend
on both the new pve-guest-common and qemu-server to actually be able
to use the option.


[0]: https://lore.kernel.org/qemu-devel/20240116154839.401030-1-f.ebner@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/20240117160737.1057513-1-vsementsov@yandex-team.ru/

qemu:

Fiona Ebner (6):
  backup: factor out gathering device info into helper
  backup: get device info: code cleanup
  block/io: clear BDRV_BLOCK_RECURSE flag after recursing in
    bdrv_co_block_status
  block/{copy-before-write,snapshot-access}: implement bdrv_co_get_info
    driver callback
  block/block-copy: always consider source cluster size too
  PVE backup: add fleecing option

Vladimir Sementsov-Ogievskiy (2):
  block/copy-before-write: create block_copy bitmap in filter node
  qapi: blockdev-backup: add discard-source parameter

 block/backup.c                         |  15 +-
 block/block-copy.c                     |  36 ++--
 block/copy-before-write.c              |  46 ++++-
 block/copy-before-write.h              |   1 +
 block/io.c                             |  10 ++
 block/monitor/block-hmp-cmds.c         |   1 +
 block/replication.c                    |   4 +-
 block/snapshot-access.c                |   7 +
 blockdev.c                             |   2 +-
 include/block/block-copy.h             |   3 +-
 include/block/block_int-global-state.h |   2 +-
 pve-backup.c                           | 234 +++++++++++++++++++------
 qapi/block-core.json                   |  18 +-
 13 files changed, 300 insertions(+), 79 deletions(-)


guest-common:

Fiona Ebner (1):
  vzdump: schema: add fleecing property string

 src/PVE/VZDump/Common.pm | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)


manager:

Fiona Ebner (1):
  vzdump: handle new 'fleecing' property string

 PVE/VZDump.pm | 12 ++++++++++++
 1 file changed, 12 insertions(+)


qemu-server:

Fiona Ebner (2):
  backup: disk info: also keep track of size
  backup: implement fleecing option

 PVE/VZDump/QemuServer.pm | 141 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 139 insertions(+), 2 deletions(-)


docs:

Fiona Ebner (1):
  vzdump: add section about backup fleecing

 vzdump.adoc | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)


Summary over all repositories:
  17 files changed, 504 insertions(+), 81 deletions(-)

-- 
Generated by git-murpp 0.5.0




^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [PATCH qemu 01/13] backup: factor out gathering device info into helper
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [PATCH qemu 02/13] backup: get device info: code cleanup Fiona Ebner
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

No functional change intended. This should make it easier to read and
to add more logic on top (for backup fleecing).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

git diff --patience to produce a better diff

Can be squashed into "PVE-Backup: Proxmox backup patches for QEMU".

 pve-backup.c | 112 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 67 insertions(+), 45 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 9c8b88d075..75af865437 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -576,54 +576,19 @@ static void create_backup_jobs_bh(void *opaque) {
     aio_co_enter(data->ctx, data->co);
 }
 
-UuidInfo coroutine_fn *qmp_backup(
-    const char *backup_file,
-    const char *password,
-    const char *keyfile,
-    const char *key_password,
-    const char *master_keyfile,
-    const char *fingerprint,
-    const char *backup_ns,
-    const char *backup_id,
-    bool has_backup_time, int64_t backup_time,
-    bool has_use_dirty_bitmap, bool use_dirty_bitmap,
-    bool has_compress, bool compress,
-    bool has_encrypt, bool encrypt,
-    bool has_format, BackupFormat format,
-    const char *config_file,
-    const char *firewall_file,
+/*
+ * Returns a list of device infos, which needs to be freed by the caller. In
+ * case of an error, errp will be set, but the returned value might still be a
+ * list.
+ */
+static GList coroutine_fn *get_device_info(
     const char *devlist,
-    bool has_speed, int64_t speed,
-    bool has_max_workers, int64_t max_workers,
     Error **errp)
 {
-    assert(qemu_in_coroutine());
-
-    qemu_co_mutex_lock(&backup_state.backup_mutex);
-
     BlockBackend *blk;
     BlockDriverState *bs = NULL;
-    Error *local_err = NULL;
-    uuid_t uuid;
-    VmaWriter *vmaw = NULL;
-    ProxmoxBackupHandle *pbs = NULL;
     gchar **devs = NULL;
     GList *di_list = NULL;
-    GList *l;
-    UuidInfo *uuid_info;
-
-    const char *config_name = "qemu-server.conf";
-    const char *firewall_name = "qemu-server.fw";
-
-    if (backup_state.di_list) {
-        error_set(errp, ERROR_CLASS_GENERIC_ERROR,
-                  "previous backup not finished");
-        qemu_co_mutex_unlock(&backup_state.backup_mutex);
-        return NULL;
-    }
-
-    /* Todo: try to auto-detect format based on file name */
-    format = has_format ? format : BACKUP_FORMAT_VMA;
 
     if (devlist) {
         devs = g_strsplit_set(devlist, ",;:", -1);
@@ -668,6 +633,67 @@ UuidInfo coroutine_fn *qmp_backup(
         goto err;
     }
 
+err:
+    if (devs) {
+        g_strfreev(devs);
+    }
+
+    return di_list;
+}
+
+UuidInfo coroutine_fn *qmp_backup(
+    const char *backup_file,
+    const char *password,
+    const char *keyfile,
+    const char *key_password,
+    const char *master_keyfile,
+    const char *fingerprint,
+    const char *backup_ns,
+    const char *backup_id,
+    bool has_backup_time, int64_t backup_time,
+    bool has_use_dirty_bitmap, bool use_dirty_bitmap,
+    bool has_compress, bool compress,
+    bool has_encrypt, bool encrypt,
+    bool has_format, BackupFormat format,
+    const char *config_file,
+    const char *firewall_file,
+    const char *devlist,
+    bool has_speed, int64_t speed,
+    bool has_max_workers, int64_t max_workers,
+    Error **errp)
+{
+    assert(qemu_in_coroutine());
+
+    qemu_co_mutex_lock(&backup_state.backup_mutex);
+
+    Error *local_err = NULL;
+    uuid_t uuid;
+    VmaWriter *vmaw = NULL;
+    ProxmoxBackupHandle *pbs = NULL;
+    GList *di_list = NULL;
+    GList *l;
+    UuidInfo *uuid_info;
+
+    const char *config_name = "qemu-server.conf";
+    const char *firewall_name = "qemu-server.fw";
+
+    if (backup_state.di_list) {
+        error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+                  "previous backup not finished");
+        qemu_co_mutex_unlock(&backup_state.backup_mutex);
+        return NULL;
+    }
+
+    /* Todo: try to auto-detect format based on file name */
+    format = has_format ? format : BACKUP_FORMAT_VMA;
+
+    di_list = get_device_info(devlist, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        goto err;
+    }
+    assert(di_list);
+
     size_t total = 0;
 
     l = di_list;
@@ -954,10 +980,6 @@ err:
     g_list_free(di_list);
     backup_state.di_list = NULL;
 
-    if (devs) {
-        g_strfreev(devs);
-    }
-
     if (vmaw) {
         Error *err = NULL;
         vma_writer_close(vmaw, &err);
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [PATCH qemu 02/13] backup: get device info: code cleanup
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [PATCH qemu 01/13] backup: factor out gathering device info into helper Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [PATCH qemu 03/13] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status Fiona Ebner
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

Make variables more local. Put the failure case for !blk first to
avoid an additional, indented else block.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Can be squashed into "PVE-Backup: Proxmox backup patches for QEMU".

 pve-backup.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 75af865437..4b0dcca246 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -585,8 +585,6 @@ static GList coroutine_fn *get_device_info(
     const char *devlist,
     Error **errp)
 {
-    BlockBackend *blk;
-    BlockDriverState *bs = NULL;
     gchar **devs = NULL;
     GList *di_list = NULL;
 
@@ -595,29 +593,26 @@ static GList coroutine_fn *get_device_info(
 
         gchar **d = devs;
         while (d && *d) {
-            blk = blk_by_name(*d);
-            if (blk) {
-                bs = blk_bs(blk);
-                if (!bdrv_co_is_inserted(bs)) {
-                    error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
-                    goto err;
-                }
-                PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
-                di->bs = bs;
-                di_list = g_list_append(di_list, di);
-            } else {
+            BlockBackend *blk = blk_by_name(*d);
+            if (!blk) {
                 error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
                           "Device '%s' not found", *d);
                 goto err;
             }
+            BlockDriverState *bs = blk_bs(blk);
+            if (!bdrv_co_is_inserted(bs)) {
+                error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
+                goto err;
+            }
+            PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
+            di->bs = bs;
+            di_list = g_list_append(di_list, di);
             d++;
         }
-
     } else {
         BdrvNextIterator it;
 
-        bs = NULL;
-        for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+        for (BlockDriverState *bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
             if (!bdrv_co_is_inserted(bs) || bdrv_is_read_only(bs)) {
                 continue;
             }
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [PATCH qemu 03/13] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [PATCH qemu 01/13] backup: factor out gathering device info into helper Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [PATCH qemu 02/13] backup: get device info: code cleanup Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [RFC qemu 04/13] block/copy-before-write: create block_copy bitmap in filter node Fiona Ebner
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

Using fleecing backup like in [0] on a qcow2 image (with metadata
preallocation) can lead to the following assertion failure:

> bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.

In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
will be set by the qcow2 driver, so the caller will recursively check
the file child. Then the BDRV_BLOCK_ZERO flag gets set too. Later up
the call chain, in bdrv_co_do_block_status() for the snapshot-access
driver, the assertion failure will happen, because both flags are set.

To fix it, clear the recurse flag after the recursive check was done.

In detail:

> #0  qcow2_co_block_status

Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID.

> #1  bdrv_co_do_block_status

Because of the data flag, bdrv_co_do_block_status() will now also set
BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
bdrv_co_do_block_status() for the bdrv_file child will be called,
which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.

Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.

> #2  bdrv_co_common_block_status_above
> #3  bdrv_co_block_status_above
> #4  bdrv_co_block_status
> #5  cbw_co_snapshot_block_status
> #6  bdrv_co_snapshot_block_status
> #7  snapshot_access_co_block_status
> #8  bdrv_co_do_block_status

Return value is propagated all the way up to here, where the assertion
failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
both set.

> #9  bdrv_co_common_block_status_above
> #10 bdrv_co_block_status_above
> #11 block_copy_block_status
> #12 block_copy_dirty_clusters
> #13 block_copy_common
> #14 block_copy_async_co_entry
> #15 coroutine_trampoline

[0]:

> #!/bin/bash
> rm /tmp/disk.qcow2
> ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
> ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
> --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node1", "sync": "full", "job-id": "backup0" } }
> EOF

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

Can be added as an extra/ patch.

 block/io.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/io.c b/block/io.c
index 63f7b3ad3e..24a3c84c93 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2589,6 +2589,16 @@ bdrv_co_block_status(BlockDriverState *bs, bool want_zero,
                 ret |= (ret2 & BDRV_BLOCK_ZERO);
             }
         }
+
+        /*
+         * Now that the recursive search was done, clear the flag. Otherwise,
+         * with more complicated block graphs like snapshot-access ->
+         * copy-before-write -> qcow2, where the return value will be propagated
+         * further up to a parent bdrv_co_do_block_status() call, both the
+         * BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO flags would be set, which is
+         * not allowed.
+         */
+        ret &= ~BDRV_BLOCK_RECURSE;
     }
 
 out:
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [RFC qemu 04/13] block/copy-before-write: create block_copy bitmap in filter node
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (2 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [PATCH qemu 03/13] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [RFC qemu 05/13] qapi: blockdev-backup: add discard-source parameter Fiona Ebner
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Currently block_copy creates copy_bitmap in the source node. But that
interacts badly with .independent_close=true of the copy-before-write
filter: the source node may be detached and removed before the
.bdrv_close() handler is called, which should call
block_copy_state_free(), which in turn should remove copy_bitmap.

That's all not ideal: it would be better if internal bitmap of
block-copy object is not attached to any node. But that is not possible
now.

The simplest solution is just create copy_bitmap in filter node, where
anyway two other bitmaps are created.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 block/block-copy.c         | 3 ++-
 block/copy-before-write.c  | 2 +-
 include/block/block-copy.h | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index e13d7bc6b6..b61685f1a2 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -346,6 +346,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+                                     BlockDriverState *copy_bitmap_bs,
                                      const BdrvDirtyBitmap *bitmap,
                                      Error **errp)
 {
@@ -360,7 +361,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         return NULL;
     }
 
-    copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
+    copy_bitmap = bdrv_create_dirty_bitmap(copy_bitmap_bs, cluster_size, NULL,
                                            errp);
     if (!copy_bitmap) {
         return NULL;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index b866e42271..41cb0955c9 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -456,7 +456,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
             ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
              bs->file->bs->supported_zero_flags);
 
-    s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
+    s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp);
     if (!s->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         ret = -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 0700953ab8..8b41643bfa 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,6 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+                                     BlockDriverState *copy_bitmap_bs,
                                      const BdrvDirtyBitmap *bitmap,
                                      Error **errp);
 
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [RFC qemu 05/13] qapi: blockdev-backup: add discard-source parameter
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (3 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [RFC qemu 04/13] block/copy-before-write: create block_copy bitmap in filter node Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [HACK qemu 06/13] block/{copy-before-write, snapshot-access}: implement bdrv_co_get_info driver callback Fiona Ebner
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

From: Vladimir Sementsov-Ogievskiy <vladimir.sementsov-ogievskiy@openvz.org>

Add a parameter that enables discard-after-copy. That is mostly useful
in the "push backup with fleecing" scheme, where the source is a
snapshot-access format driver node based on the copy-before-write
filter's snapshot-access API:

[guest]      [snapshot-access] ~~ blockdev-backup ~~> [backup target]
   |            |
   | root       | file
   v            v
[copy-before-write]
   |             |
   | file        | target
   v             v
[active disk]   [temp.img]

In this case discard-after-copy does two things:

 - discard data in temp.img to save disk space
 - avoid further copy-before-write operation in discarded area

Note that we have to declare WRITE permission on the source in the
copy-before-write filter for discard to work. This is done via a new
source-write-perm open option for copy-before-write, because it needs
to happen conditionally, only when discard_source is set. It cannot be
set for the usual backup case, because there, the associated virtual
hardware device already requires exclusive write.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
[FE: adapt for Proxmox downstream 8.1
     fix discard at end of image when not aligned to cluster size
     add an option for setting the write permission on the cbw
       source child conditionally during open and adapt commit message]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

It'd be better if the source-write-perm option for copy-before-write
were not exposed via QAPI, but kept internal only.
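
For reviewers, a rough stand-alone sketch of the scheme from the
commit message, adapted from the reproducer in patch 03/13. Node names
and paths are made up and only meant to show where the new
discard-source and source-write-perm knobs plug in; this is not how
qemu-server drives it:

qemu-img create -f qcow2 /tmp/disk.qcow2 1G
qemu-img create -f qcow2 /tmp/fleecing.qcow2 1G
qemu-img create -f qcow2 /tmp/backup.qcow2 1G
qemu-system-x86_64 --qmp stdio \
 --blockdev qcow2,node-name=disk0,file.driver=file,file.filename=/tmp/disk.qcow2 \
 --blockdev qcow2,node-name=fleecing0,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
 --blockdev qcow2,node-name=backup0,file.driver=file,file.filename=/tmp/backup.qcow2 \
<<EOF
{"execute": "qmp_capabilities"}
{"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "node-name": "cbw0", "file": "disk0", "target": "fleecing0", "source-write-perm": true } }
{"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "node-name": "snap0", "file": "cbw0" } }
{"execute": "blockdev-backup", "arguments": { "job-id": "job0", "device": "snap0", "target": "backup0", "sync": "full", "discard-source": true } }
EOF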

 block/backup.c                         |  7 ++++---
 block/block-copy.c                     | 11 +++++++++--
 block/copy-before-write.c              | 14 ++++++++++++++
 block/copy-before-write.h              |  1 +
 block/replication.c                    |  4 ++--
 blockdev.c                             |  2 +-
 include/block/block-copy.h             |  2 +-
 include/block/block_int-global-state.h |  2 +-
 pve-backup.c                           |  2 +-
 qapi/block-core.json                   | 10 +++++++++-
 10 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index af87fa6aa9..51f9d28115 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -332,7 +332,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *target, int64_t speed,
                   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
                   BitmapSyncMode bitmap_mode,
-                  bool compress,
+                  bool compress, bool discard_source,
                   const char *filter_node_name,
                   BackupPerf *perf,
                   BlockdevOnError on_source_error,
@@ -429,7 +429,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         goto error;
     }
 
-    cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
+    cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source, &bcs, errp);
     if (!cbw) {
         goto error;
     }
@@ -471,7 +471,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->len = len;
     job->perf = *perf;
 
-    block_copy_set_copy_opts(bcs, perf->use_copy_range, compress);
+    block_copy_set_copy_opts(bcs, perf->use_copy_range, compress,
+                             discard_source);
     block_copy_set_progress_meter(bcs, &job->common.job.progress);
     block_copy_set_speed(bcs, speed);
 
diff --git a/block/block-copy.c b/block/block-copy.c
index b61685f1a2..05cfccfda8 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -137,6 +137,7 @@ typedef struct BlockCopyState {
     CoMutex lock;
     int64_t in_flight_bytes;
     BlockCopyMethod method;
+    bool discard_source;
     BlockReqList reqs;
     QLIST_HEAD(, BlockCopyCallState) calls;
     /*
@@ -282,11 +283,12 @@ static uint32_t block_copy_max_transfer(BdrvChild *source, BdrvChild *target)
 }
 
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
-                              bool compress)
+                              bool compress, bool discard_source)
 {
     /* Keep BDRV_REQ_SERIALISING set (or not set) in block_copy_state_new() */
     s->write_flags = (s->write_flags & BDRV_REQ_SERIALISING) |
         (compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+    s->discard_source = discard_source;
 
     if (s->max_transfer < s->cluster_size) {
         /*
@@ -409,7 +411,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                     cluster_size),
     };
 
-    block_copy_set_copy_opts(s, false, false);
+    block_copy_set_copy_opts(s, false, false, false);
 
     ratelimit_init(&s->rate_limit);
     qemu_co_mutex_init(&s->lock);
@@ -580,6 +582,11 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
     co_put_to_shres(s->mem, t->req.bytes);
     block_copy_task_end(t, ret);
 
+    if (s->discard_source && ret == 0) {
+        int64_t nbytes = MIN(t->req.offset + t->req.bytes, s->len) - t->req.offset;
+        bdrv_co_pdiscard(s->source, t->req.offset, nbytes);
+    }
+
     return ret;
 }
 
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 41cb0955c9..87f047ad5f 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -44,6 +44,7 @@ typedef struct BDRVCopyBeforeWriteState {
     BdrvChild *target;
     OnCbwError on_cbw_error;
     uint32_t cbw_timeout_ns;
+    bool source_write_perm;
 
     /*
      * @lock: protects access to @access_bitmap, @done_bitmap and
@@ -363,6 +364,11 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
         bdrv_default_perms(bs, c, role, reopen_queue,
                            perm, shared, nperm, nshared);
 
+        BDRVCopyBeforeWriteState *s = bs->opaque;
+        if (s->source_write_perm) {
+            *nperm = *nperm | BLK_PERM_WRITE;
+        }
+
         if (!QLIST_EMPTY(&bs->parents)) {
             if (perm & BLK_PERM_WRITE) {
                 *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
@@ -422,6 +428,12 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
     assert(full_opts->driver == BLOCKDEV_DRIVER_COPY_BEFORE_WRITE);
     opts = &full_opts->u.copy_before_write;
 
+    if (opts->source_write_perm) {
+        s->source_write_perm = true;
+    } else {
+        s->source_write_perm = false;
+    }
+
     ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
     if (ret < 0) {
         return ret;
@@ -530,6 +542,7 @@ BlockDriver bdrv_cbw_filter = {
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
                                   const char *filter_node_name,
+                                  bool source_write_perm,
                                   BlockCopyState **bcs,
                                   Error **errp)
 {
@@ -547,6 +560,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
     }
     qdict_put_str(opts, "file", bdrv_get_node_name(source));
     qdict_put_str(opts, "target", bdrv_get_node_name(target));
+    qdict_put_bool(opts, "source-write-perm", source_write_perm);
 
     top = bdrv_insert_node(source, opts, BDRV_O_RDWR, errp);
     if (!top) {
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 6e72bb25e9..03ee708d10 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -39,6 +39,7 @@
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
                                   const char *filter_node_name,
+                                  bool source_write_perm,
                                   BlockCopyState **bcs,
                                   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/block/replication.c b/block/replication.c
index ea4bf1aa80..39ad78cf98 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -579,8 +579,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
 
         s->backup_job = backup_job_create(
                                 NULL, s->secondary_disk->bs, s->hidden_disk->bs,
-                                0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, NULL,
-                                &perf,
+                                0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, false,
+                                NULL, &perf,
                                 BLOCKDEV_ON_ERROR_REPORT,
                                 BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
                                 backup_job_completed, bs, NULL, &local_err);
diff --git a/blockdev.c b/blockdev.c
index 7793143d76..ce3fef924c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2802,7 +2802,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
 
     job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
                             backup->sync, bmap, backup->bitmap_mode,
-                            backup->compress,
+                            backup->compress, backup->discard_source,
                             backup->filter_node_name,
                             &perf,
                             backup->on_source_error,
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 8b41643bfa..9555de2562 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -31,7 +31,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
 
 /* Function should be called prior any actual copy request */
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
-                              bool compress);
+                              bool compress, bool discard_source);
 void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
 void block_copy_state_free(BlockCopyState *s);
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index 32f0f9858a..546f2b5532 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -189,7 +189,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                             MirrorSyncMode sync_mode,
                             BdrvDirtyBitmap *sync_bitmap,
                             BitmapSyncMode bitmap_mode,
-                            bool compress,
+                            bool compress, bool discard_source,
                             const char *filter_node_name,
                             BackupPerf *perf,
                             BlockdevOnError on_source_error,
diff --git a/pve-backup.c b/pve-backup.c
index 4b0dcca246..4bf641ce5a 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -518,7 +518,7 @@ static void create_backup_jobs_bh(void *opaque) {
 
         BlockJob *job = backup_job_create(
             NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
-            bitmap_mode, false, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
+            bitmap_mode, false, false, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
             BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
             &local_err);
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 48eb47c6ea..51ce786bc5 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1816,6 +1816,9 @@
 #     node specified by @drive.  If this option is not given, a node
 #     name is autogenerated.  (Since: 4.2)
 #
+# @discard-source: Discard blocks on source which are already copied to the
+#     target.  (Since 8.1)
+#
 # @x-perf: Performance options.  (Since 6.0)
 #
 # Features:
@@ -1837,6 +1840,7 @@
             '*on-target-error': 'BlockdevOnError',
             '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
             '*filter-node-name': 'str',
+            '*discard-source': 'bool',
             '*x-perf': { 'type': 'BackupPerf',
                          'features': [ 'unstable' ] } } }
 
@@ -4817,12 +4821,16 @@
 #     @on-cbw-error parameter will decide how this failure is handled.
 #     Default 0. (Since 7.1)
 #
+# @source-write-perm: For internal use only. Set write permission
+#     for the source child.  (Since 8.1)
+#
 # Since: 6.2
 ##
 { 'struct': 'BlockdevOptionsCbw',
   'base': 'BlockdevOptionsGenericFormat',
   'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap',
-            '*on-cbw-error': 'OnCbwError', '*cbw-timeout': 'uint32' } }
+            '*on-cbw-error': 'OnCbwError', '*cbw-timeout': 'uint32',
+            '*source-write-perm': 'bool' } }
 
 ##
 # @BlockdevOptions:
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [HACK qemu 06/13] block/{copy-before-write, snapshot-access}: implement bdrv_co_get_info driver callback
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (4 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [RFC qemu 05/13] qapi: blockdev-backup: add discard-source parameter Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-29 14:35   ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [HACK qemu 07/13] block/block-copy: always consider source cluster size too Fiona Ebner
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

In preparation to fix an issue for backup fleecing where discarding
the source would lead to an assertion failure when the fleecing image
has larger granularity than the backup target.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Still need to wait on a response from upstream. Use this hack for now,
so that the RFC as a whole doesn't have to wait.

 block/copy-before-write.c | 30 ++++++++++++++++++++++++++++++
 block/snapshot-access.c   |  7 +++++++
 2 files changed, 37 insertions(+)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 87f047ad5f..961e7439ad 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -322,6 +322,35 @@ cbw_co_snapshot_block_status(BlockDriverState *bs,
     return ret;
 }
 
+static int coroutine_fn
+cbw_co_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    BDRVCopyBeforeWriteState *s = bs->opaque;
+
+    BlockDriverInfo local_bdi;
+    int64_t cluster_size = 0;
+
+    int ret, ret2;
+
+    ret = bdrv_co_get_info(bs->file->bs, &local_bdi);
+    if (ret == 0) {
+        cluster_size = MAX(cluster_size, local_bdi.cluster_size);
+    }
+
+    ret2 = bdrv_co_get_info(s->target->bs, &local_bdi);
+    if (ret2 == 0) {
+        cluster_size = MAX(cluster_size, local_bdi.cluster_size);
+    }
+
+    if (ret != 0 && ret2 != 0) {
+        return ret;
+    }
+
+    bdi->cluster_size = cluster_size;
+
+    return 0;
+}
+
 static int coroutine_fn GRAPH_RDLOCK
 cbw_co_pdiscard_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
@@ -531,6 +560,7 @@ BlockDriver bdrv_cbw_filter = {
     .bdrv_co_preadv_snapshot       = cbw_co_preadv_snapshot,
     .bdrv_co_pdiscard_snapshot     = cbw_co_pdiscard_snapshot,
     .bdrv_co_snapshot_block_status = cbw_co_snapshot_block_status,
+    .bdrv_co_get_info              = cbw_co_get_info,
 
     .bdrv_refresh_filename      = cbw_refresh_filename,
 
diff --git a/block/snapshot-access.c b/block/snapshot-access.c
index 67ea339da9..5aa20aaa1f 100644
--- a/block/snapshot-access.c
+++ b/block/snapshot-access.c
@@ -25,6 +25,7 @@
 #include "sysemu/block-backend.h"
 #include "qemu/cutils.h"
 #include "block/block_int.h"
+#include "block/copy-before-write.h"
 
 static int coroutine_fn GRAPH_RDLOCK
 snapshot_access_co_preadv_part(BlockDriverState *bs,
@@ -72,6 +73,11 @@ snapshot_access_co_pwritev_part(BlockDriverState *bs,
     return -ENOTSUP;
 }
 
+static int coroutine_fn
+snapshot_access_co_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    return bdrv_co_get_info(bs->file->bs, bdi);
+}
 
 static void snapshot_access_refresh_filename(BlockDriverState *bs)
 {
@@ -118,6 +124,7 @@ BlockDriver bdrv_snapshot_access_drv = {
     .bdrv_co_pwrite_zeroes      = snapshot_access_co_pwrite_zeroes,
     .bdrv_co_pdiscard           = snapshot_access_co_pdiscard,
     .bdrv_co_block_status       = snapshot_access_co_block_status,
+    .bdrv_co_get_info           = snapshot_access_co_get_info,
 
     .bdrv_refresh_filename      = snapshot_access_refresh_filename,
 
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [HACK qemu 07/13] block/block-copy: always consider source cluster size too
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (5 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [HACK qemu 06/13] block/{copy-before-write, snapshot-access}: implement bdrv_co_get_info driver callback Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [RFC qemu 08/13] PVE backup: add fleecing option Fiona Ebner
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

With backup fleecing, it might be necessary to discard the source.
There will be an assertion failure if bitmaps on the source side have
a bigger granularity than the block copy's cluster size, so just
consider the source side too.

This also supersedes the hunk in block/backup.c added by
"PVE-Backup: add backup-dump block driver".

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Still need to wait on a response from upstream. Use this hack for now,
so that the RFC as a whole doesn't have to wait.
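
A hedged way to produce the problematic constellation this works
around (paths follow the reproducer in patch 03/13, the cluster sizes
are arbitrary): give the fleecing image a larger qcow2 cluster size
than the backup target's default 64k and run the fleecing setup with
discard-source on top.

qemu-img create -f qcow2 -o cluster_size=2M /tmp/fleecing.qcow2 1G
qemu-img create -f qcow2 /tmp/backup.qcow2 1G   # default 64k clusters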

 block/backup.c     |  8 --------
 block/block-copy.c | 22 ++++++++++++++--------
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 51f9d28115..7950bff27e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -435,14 +435,6 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     }
 
     cluster_size = block_copy_cluster_size(bcs);
-    if (cluster_size < 0) {
-        goto error;
-    }
-
-    BlockDriverInfo bdi;
-    if (bdrv_get_info(bs, &bdi) == 0) {
-        cluster_size = MAX(cluster_size, bdi.cluster_size);
-    }
 
     if (perf->max_chunk && perf->max_chunk < cluster_size) {
         error_setg(errp, "Required max-chunk (%" PRIi64 ") is less than backup "
diff --git a/block/block-copy.c b/block/block-copy.c
index 05cfccfda8..cf750ef670 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -310,13 +310,19 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
     }
 }
 
-static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
+static int64_t block_copy_calculate_cluster_size(BlockDriverState *source,
+                                                 BlockDriverState *target,
                                                  Error **errp)
 {
     int ret;
+    int64_t source_cluster_size = BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
     BlockDriverInfo bdi;
     bool target_does_cow = bdrv_backing_chain_next(target);
 
+    if (bdrv_get_info(source, &bdi) == 0) {
+        source_cluster_size = MAX(source_cluster_size, bdi.cluster_size);
+    }
+
     /*
      * If there is no backing file on the target, we cannot rely on COW if our
      * backup cluster size is smaller than the target cluster size. Even for
@@ -327,11 +333,11 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
         /* Cluster size is not defined */
         warn_report("The target block device doesn't provide "
                     "information about the block size and it doesn't have a "
-                    "backing file. The default block size of %u bytes is "
+                    "backing file. The source's or default block size of %ld bytes is "
                     "used. If the actual block size of the target exceeds "
-                    "this default, the backup may be unusable",
-                    BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
-        return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+                    "this value, the backup may be unusable",
+                    source_cluster_size);
+        return source_cluster_size;
     } else if (ret < 0 && !target_does_cow) {
         error_setg_errno(errp, -ret,
             "Couldn't determine the cluster size of the target image, "
@@ -341,10 +347,10 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
         return ret;
     } else if (ret < 0 && target_does_cow) {
         /* Not fatal; just trudge on ahead. */
-        return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+        return source_cluster_size;
     }
 
-    return MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
+    return MAX(source_cluster_size, bdi.cluster_size);
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
@@ -358,7 +364,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
     BdrvDirtyBitmap *copy_bitmap;
     bool is_fleecing;
 
-    cluster_size = block_copy_calculate_cluster_size(target->bs, errp);
+    cluster_size = block_copy_calculate_cluster_size(source->bs, target->bs, errp);
     if (cluster_size < 0) {
         return NULL;
     }
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [RFC qemu 08/13] PVE backup: add fleecing option
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (6 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [HACK qemu 07/13] block/block-copy: always consider source cluster size too Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string Fiona Ebner
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

When a fleecing option is given, it is expected that each device has
a corresponding "-fleecing" block device already attached, except for
EFI disk and TPM state, where fleecing is never used.

The following graph was adapted from [0] which also contains more
details about fleecing.

[guest]
   |
   | root
   v                 file
[copy-before-write]<------[snapshot-access]
   |           |
   | file      | target
   v           v
[source] [fleecing]

For fleecing, a copy-before-write filter is inserted on top of the
source node, as well as a snapshot-access node pointing to the filter
node which allows to read the consistent state of the image at the
time it was inserted. New guest writes are passed through the
copy-before-write filter which will first copy over old data to the
fleecing image in case that old data is still needed by the
snapshot-access node.

The backup process will sequentially read from the snapshot access,
which has a bitmap and knows whether to read from the original image
or the fleecing image to get the "snapshot" state, i.e. data from the
source image at the time when the copy-before-write filter was
inserted. After reading, the copied sections are discarded from the
fleecing image to reduce space usage.

All of this can be restricted by an initial dirty bitmap to parts of
the source image that are required for an incremental backup.

Additionally, the cbw-timeout and on-cbw-error=break-snapshot options
are set when installing the copy-before-write filter and
snapshot-access. When an error or timeout occurs, the problematic (and
each further) snapshot operation will fail and thus cancel the backup
instead of breaking the guest write.

Note that job_id cannot be inferred from the snapshot-access bs because
it has no parent, so just pass the one from the original bs.

[0]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg876056.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

While initial tests seem fine, bitmap handling needs to be carefully
checked for correctness. More eyeballs can't hurt there.
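
For review purposes, a hedged QMP-level approximation of the node
setup this patch performs internally per drive (the real code builds
the QDicts in C and goes through qmp_backup; VMID, drive and bitmap
names below are made up):

socat - UNIX-CONNECT:/var/run/qemu-server/123.qmp <<'QMP'
{"execute": "qmp_capabilities"}
{"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "node-name": "drive-scsi0-cbw", "file": "drive-scsi0", "target": "drive-scsi0-fleecing", "bitmap": { "node": "drive-scsi0", "name": "pbs-incremental-dirty-bitmap" }, "on-cbw-error": "break-snapshot", "cbw-timeout": 45 } }
{"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "node-name": "drive-scsi0-snap", "file": "drive-scsi0-cbw" } }
QMP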

 block/monitor/block-hmp-cmds.c |   1 +
 pve-backup.c                   | 125 ++++++++++++++++++++++++++++++++-
 qapi/block-core.json           |   8 ++-
 3 files changed, 130 insertions(+), 4 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 6efe28cef5..ca29cc4281 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1064,6 +1064,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
         NULL, NULL,
         devlist, qdict_haskey(qdict, "speed"), speed,
         false, 0, // BackupPerf max-workers
+        false, false, // fleecing
         &error);
 
     hmp_handle_error(mon, error);
diff --git a/pve-backup.c b/pve-backup.c
index 4bf641ce5a..602b857700 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -7,8 +7,10 @@
 #include "sysemu/blockdev.h"
 #include "block/block_int-global-state.h"
 #include "block/blockjob.h"
+#include "block/copy-before-write.h"
 #include "block/dirty-bitmap.h"
 #include "qapi/qapi-commands-block.h"
+#include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/cutils.h"
 
@@ -80,8 +82,15 @@ static void pvebackup_init(void)
 // initialize PVEBackupState at startup
 opts_init(pvebackup_init);
 
+typedef struct PVEBackupFleecingInfo {
+    BlockDriverState *bs;
+    BlockDriverState *cbw;
+    BlockDriverState *snapshot_access;
+} PVEBackupFleecingInfo;
+
 typedef struct PVEBackupDevInfo {
     BlockDriverState *bs;
+    PVEBackupFleecingInfo fleecing;
     size_t size;
     uint64_t block_size;
     uint8_t dev_id;
@@ -356,6 +365,25 @@ static void pvebackup_complete_cb(void *opaque, int ret)
     PVEBackupDevInfo *di = opaque;
     di->completed_ret = ret;
 
+    /*
+     * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work
+     * won't be done as a coroutine anyways:
+     * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would
+     *   just spawn a BH calling bdrv_unref().
+     * - For cbw, draining would need to spawn a BH.
+     *
+     * Note that the AioContext lock is already acquired by our caller, i.e.
+     * job_finalize_single_locked()
+     */
+    if (di->fleecing.snapshot_access) {
+        bdrv_unref(di->fleecing.snapshot_access);
+        di->fleecing.snapshot_access = NULL;
+    }
+    if (di->fleecing.cbw) {
+        bdrv_cbw_drop(di->fleecing.cbw);
+        di->fleecing.cbw = NULL;
+    }
+
     /*
      * Schedule stream cleanup in async coroutine. close_image and finish might
      * take a while, so we can't block on them here. This way it also doesn't
@@ -516,9 +544,64 @@ static void create_backup_jobs_bh(void *opaque) {
 
         bdrv_drained_begin(di->bs);
 
+        BlockDriverState *source_bs = di->bs;
+        bool discard_source = false;
+        const char *job_id = bdrv_get_device_name(di->bs);
+        if (di->fleecing.bs) {
+            QDict *cbw_opts = qdict_new();
+            qdict_put_str(cbw_opts, "driver", "copy-before-write");
+            qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
+            qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs));
+
+            if (di->bitmap) {
+                /*
+                 * Only guest writes to parts relevant for the backup need to be intercepted with
+                 * old data being copied to the fleecing image.
+                 */
+                qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs));
+                qdict_put_str(cbw_opts, "bitmap.name", bdrv_dirty_bitmap_name(di->bitmap));
+            }
+            /*
+             * Fleecing storage is supposed to be fast and it's better to break backup than guest
+             * writes. Certain guest drivers like VirtIO-win have 60 seconds timeout by default, so
+             * abort a bit before that.
+             */
+            qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
+            qdict_put_int(cbw_opts, "cbw-timeout", 45);
+
+            di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, &local_err);
+
+            if (!di->fleecing.cbw) {
+                error_setg(errp, "appending cbw node for fleecing failed: %s",
+                           local_err ? error_get_pretty(local_err) : "unknown error");
+                break;
+            }
+
+            QDict *snapshot_access_opts = qdict_new();
+            qdict_put_str(snapshot_access_opts, "driver", "snapshot-access");
+            qdict_put_str(snapshot_access_opts, "file", bdrv_get_node_name(di->fleecing.cbw));
+
+            /*
+             * Holding the AioContext lock here would cause a deadlock, because bdrv_open_driver()
+             * will aquire it a second time. But it's allowed to be held exactly once when polling
+             * and that happens when the bdrv_refresh_total_sectors() call is made there.
+             */
+            aio_context_release(aio_context);
+            di->fleecing.snapshot_access =
+                bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | BDRV_O_UNMAP, &local_err);
+            aio_context_acquire(aio_context);
+            if (!di->fleecing.snapshot_access) {
+                error_setg(errp, "setting up snapshot access for fleecing failed: %s",
+                           local_err ? error_get_pretty(local_err) : "unknown error");
+                break;
+            }
+            source_bs = di->fleecing.snapshot_access;
+            discard_source = true;
+        }
+
         BlockJob *job = backup_job_create(
-            NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
-            bitmap_mode, false, false, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
+            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
+            bitmap_mode, false, discard_source, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
             BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
             &local_err);
 
@@ -576,6 +659,14 @@ static void create_backup_jobs_bh(void *opaque) {
     aio_co_enter(data->ctx, data->co);
 }
 
+/*
+ * EFI disk and TPM state are small and it's just not worth setting up fleecing for them.
+ */
+static bool device_uses_fleecing(const char *device_id)
+{
+    return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14);
+}
+
 /*
  * Returns a list of device infos, which needs to be freed by the caller. In
  * case of an error, errp will be set, but the returned value might still be a
@@ -583,6 +674,7 @@ static void create_backup_jobs_bh(void *opaque) {
  */
 static GList coroutine_fn *get_device_info(
     const char *devlist,
+    bool fleecing,
     Error **errp)
 {
     gchar **devs = NULL;
@@ -606,6 +698,31 @@ static GList coroutine_fn *get_device_info(
             }
             PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
             di->bs = bs;
+
+            if (fleecing && device_uses_fleecing(*d)) {
+                g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
+                BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
+                if (!fleecing_blk) {
+                    error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+                              "Device '%s' not found", fleecing_devid);
+                    goto err;
+                }
+                BlockDriverState *fleecing_bs = blk_bs(fleecing_blk);
+                if (!bdrv_co_is_inserted(fleecing_bs)) {
+                    error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, fleecing_devid);
+                    goto err;
+                }
+                /*
+                 * Fleecing image needs to be the same size to act as a cbw target.
+                 */
+                if (bs->total_sectors != fleecing_bs->total_sectors) {
+                    error_setg(errp, "Size mismatch for '%s' - sector count %ld != %ld",
+                               fleecing_devid, fleecing_bs->total_sectors, bs->total_sectors);
+                    goto err;
+                }
+                di->fleecing.bs = fleecing_bs;
+            }
+
             di_list = g_list_append(di_list, di);
             d++;
         }
@@ -655,6 +772,7 @@ UuidInfo coroutine_fn *qmp_backup(
     const char *devlist,
     bool has_speed, int64_t speed,
     bool has_max_workers, int64_t max_workers,
+    bool has_fleecing, bool fleecing,
     Error **errp)
 {
     assert(qemu_in_coroutine());
@@ -682,7 +800,7 @@ UuidInfo coroutine_fn *qmp_backup(
     /* Todo: try to auto-detect format based on file name */
     format = has_format ? format : BACKUP_FORMAT_VMA;
 
-    di_list = get_device_info(devlist, &local_err);
+    di_list = get_device_info(devlist, has_fleecing && fleecing, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto err;
@@ -1081,5 +1199,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
     ret->query_bitmap_info = true;
     ret->pbs_masterkey = true;
     ret->backup_max_workers = true;
+    ret->backup_fleecing = true;
     return ret;
 }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 51ce786bc5..114145d8c0 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -933,6 +933,10 @@
 #
 # @max-workers: see @BackupPerf for details. Default 16.
 #
+# @fleecing: perform a backup with fleecing. For each device in @devlist, a
+#            corresponing '-fleecing' device with the same size already needs to
+#            be present.
+#
 # Returns: the uuid of the backup job
 #
 ##
@@ -953,7 +957,8 @@
                                     '*firewall-file': 'str',
                                     '*devlist': 'str',
                                     '*speed': 'int',
-                                    '*max-workers': 'int' },
+                                    '*max-workers': 'int',
+                                    '*fleecing': 'bool' },
   'returns': 'UuidInfo', 'coroutine': true }
 
 ##
@@ -1009,6 +1014,7 @@
             'pbs-dirty-bitmap-migration': 'bool',
             'pbs-masterkey': 'bool',
             'pbs-library-version': 'str',
+            'backup-fleecing': 'bool',
             'backup-max-workers': 'bool' } }
 
 ##
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (7 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [RFC qemu 08/13] PVE backup: add fleecing option Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-29 15:41   ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [RFC manager 10/13] vzdump: handle new 'fleecing' " Fiona Ebner
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

It's a property string, because that avoids having an implicit
"enabled" as part of a 'fleecing-storage' property. And there likely
will be more options in the future, e.g. threshold/limit for the
fleecing image size.
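
For illustration, a minimal sketch of how such a value would be parsed
(assuming the 'backup-fleecing' format registered below):

    use PVE::JSONSchema;

    # property string as it would arrive from the CLI/API, e.g.
    # vzdump ... --fleecing enabled=1,storage=local-zfs
    my $raw = 'enabled=1,storage=local-zfs';

    # parses into a hash like { enabled => 1, storage => 'local-zfs' }
    my $fleecing = PVE::JSONSchema::parse_property_string('backup-fleecing', $raw);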

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/VZDump/Common.pm | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/PVE/VZDump/Common.pm b/src/PVE/VZDump/Common.pm
index 5be87a0..60574e4 100644
--- a/src/PVE/VZDump/Common.pm
+++ b/src/PVE/VZDump/Common.pm
@@ -33,6 +33,7 @@ my $dowhash_to_dow = sub {
 };
 
 our $PROPERTY_STRINGS = {
+    'fleecing' => 'backup-fleecing',
     'performance' => 'backup-performance',
     'prune-backups' => 'prune-backups',
 };
@@ -79,6 +80,24 @@ sub parse_dow {
     return $res;
 };
 
+PVE::JSONSchema::register_format('backup-fleecing', {
+    enabled => {
+	description => "Enable backup fleecing. Cache backup data from blocks where new guest "
+	    ."writes happen on the specified storage instead of copying them directly to the backup "
+	    ."target. This can help guest IO performance and even prevent hangs, at the cost of "
+	    ."requiring more storage space.",
+	type => 'boolean',
+	default => 0,
+	optional => 1,
+	default_key => 1,
+    },
+    storage => get_standard_option('pve-storage-id', {
+	description => "Use this storage to store fleecing images. Default is to use the same "
+	    ."storage as the VM disk itself.",
+	optional => 1,
+    }),
+});
+
 PVE::JSONSchema::register_format('backup-performance', {
     'max-workers' => {
 	description => "Applies to VMs. Allow up to this many IO workers at the same time.",
@@ -262,6 +281,12 @@ my $confdesc = {
 	format => 'backup-performance',
 	optional => 1,
     },
+    fleecing => {
+	type => 'string',
+	description => "Options for backup fleecing (VM only).",
+	format => 'backup-fleecing',
+	optional => 1,
+    },
     lockwait => {
 	type => 'integer',
 	description => "Maximal time to wait for the global lock (minutes).",
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [RFC manager 10/13] vzdump: handle new 'fleecing' property string
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (8 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [RFC qemu-server 11/13] backup: disk info: also keep track of size Fiona Ebner
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 PVE/VZDump.pm | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index 4185ed62..bdf48fb2 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -130,6 +130,15 @@ my $generate_notes = sub {
     return $notes_template;
 };
 
+my sub parse_fleecing {
+    my ($param) = @_;
+
+    if (defined(my $fleecing = $param->{fleecing})) {
+	return if ref($fleecing) eq 'HASH'; # already parsed
+	$param->{fleecing} = PVE::JSONSchema::parse_property_string('backup-fleecing', $fleecing);
+    }
+}
+
 my sub parse_performance {
     my ($param) = @_;
 
@@ -278,6 +287,7 @@ sub read_vzdump_defaults {
 	} keys %$confdesc_for_defaults
     };
     $parse_prune_backups_maxfiles->($defaults, "defaults in VZDump schema");
+    parse_fleecing($defaults);
     parse_performance($defaults);
 
     my $raw;
@@ -304,6 +314,7 @@ sub read_vzdump_defaults {
 	$res->{mailto} = [ @mailto ];
     }
     $parse_prune_backups_maxfiles->($res, "options in '$fn'");
+    parse_fleecing($res);
     parse_performance($res);
 
     foreach my $key (keys %$defaults) {
@@ -1449,6 +1460,7 @@ sub verify_vzdump_parameters {
 	if defined($param->{'prune-backups'}) && defined($param->{maxfiles});
 
     $parse_prune_backups_maxfiles->($param, 'CLI parameters');
+    parse_fleecing($param);
     parse_performance($param);
 
     if (my $template = $param->{'notes-template'}) {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [RFC qemu-server 11/13] backup: disk info: also keep track of size
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (9 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [RFC manager 10/13] vzdump: handle new 'fleecing' " Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [RFC qemu-server 12/13] backup: implement fleecing option Fiona Ebner
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

which will be needed to allocate fleecing images.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 PVE/VZDump/QemuServer.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index be7d8e1e..51498dbc 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -140,6 +140,7 @@ sub prepare {
 	    path => $path,
 	    volid => $volid,
 	    storeid => $storeid,
+	    size => $size,
 	    format => $format,
 	    virtdev => $ds,
 	    qmdevice => "drive-$ds",
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [RFC qemu-server 12/13] backup: implement fleecing option
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (10 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [RFC qemu-server 11/13] backup: disk info: also keep track of size Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-29 15:28   ` Fiona Ebner
  2024-01-25 14:41 ` [pve-devel] [RFC docs 13/13] vzdump: add section about backup fleecing Fiona Ebner
  2024-01-25 16:02 ` [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement " DERUMIER, Alexandre
  13 siblings, 1 reply; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

Management for fleecing images is implemented here. If the fleecing
option is set, for each disk (except EFI disk and TPM state) a new raw
fleecing image is allocated on the configured fleecing storage (same
storage as original disk by default). The disk is attached to QEMU
with the 'size' parameter, because the block node in QEMU has to be
the exact same size and the newly allocated image might be bigger if
the storage has a coarser allocation or rounds up. After backup, the
disks are detached and removed from the storage.

Partially inspired by the existing handling of the TPM state image
during backup.
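
For illustration, for a hypothetical 32 GiB disk 'drive-scsi0' with its
fleecing image allocated on a ZFS storage, the resulting monitor command
would look roughly like this (path, VMID and size are made-up example values):

    drive_add auto "file=/dev/zvol/rpool/data/vm-123-disk-0-fleecing,if=none,id=drive-scsi0-fleecing,format=raw,discard=unmap,size=34359738368"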

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

STILL OPEN (see also the TODOs):
    * Naming for fleecing images and also filtering/prohibiting them
      with other operations. Currently using -fleecing suffix but
      those can conflict with user-created ones.
    * Should fleecing be used if the VM was started specifically for
      backup? In theory it makes sense, because the VM could be resumed
      during backup, but in most cases it won't.

 PVE/VZDump/QemuServer.pm | 140 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 138 insertions(+), 2 deletions(-)

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 51498dbc..d642ae46 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -26,6 +26,7 @@ use PVE::Format qw(render_duration render_bytes);
 
 use PVE::QemuConfig;
 use PVE::QemuServer;
+use PVE::QemuServer::Helpers;
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 
@@ -525,6 +526,117 @@ sub get_and_check_pbs_encryption_config {
     die "internal error - unhandled case for getting & checking PBS encryption ($keyfile, $master_keyfile)!";
 }
 
+my sub cleanup_fleecing_images {
+    my ($self, $disks) = @_;
+
+    for my $di ($disks->@*) {
+	if (my $volid = $di->{'fleece-volid'}) {
+	    eval { PVE::Storage::vdisk_free($self->{storecfg}, $volid); };
+	    $self->log('warn', "error removing fleecing image '$volid' - $@") if $@;
+	}
+    }
+}
+
+my sub allocate_fleecing_images {
+    my ($self, $disks, $vmid, $fleecing_storeid) = @_;
+
+    # TODO what about left-over images from previous attempts? Just
+    # auto-remove? While unlikely, could conflict with manually created image from user...
+
+    eval {
+	for my $di ($disks->@*) {
+	    next if $di->{virtdev} =~ m/^(?:tpmstate|efidisk)\d$/; # too small to be worth it
+	    if ($di->{type} eq 'block' || $di->{type} eq 'file') {
+		# TODO better named like fleecing-VMID-disk-N (needs storage plugin support...) or
+		# vm-VMID-fleecing-N?
+		# TODO filter/disallow these for qm rescan/qm set/etc. like for -state- volumes
+		my $storage = $fleecing_storeid || $di->{storeid};
+		my $scfg = PVE::Storage::storage_config($self->{storecfg}, $storage);
+		my (undef, $name) = PVE::Storage::parse_volname($self->{storecfg}, $di->{volid});
+		$name =~ s/\.(.*?)$//;
+		# TODO checking for path is only a heuristic...
+		if ($scfg->{path}) {
+		    $name .= '-fleecing.raw';
+		} else {
+		    $name .= '-fleecing';
+		}
+		my $size = PVE::Tools::convert_size($di->{size}, 'b' => 'kb');
+		$di->{'fleece-volid'} = PVE::Storage::vdisk_alloc(
+		    $self->{storecfg}, $storage, $vmid, 'raw', $name, $size);
+	    } else {
+		die "implement me (type '$di->{type}')";
+	    }
+	}
+    };
+    if (my $err = $@) {
+	cleanup_fleecing_images($self, $disks);
+	die $err;
+    }
+}
+
+my sub detach_fleecing_images {
+    my ($disks, $vmid) = @_;
+
+    return if !PVE::QemuServer::Helpers::vm_running_locally($vmid);
+
+    for my $di ($disks->@*) {
+	if (my $volid = $di->{'fleece-volid'}) {
+	    my $devid = "$di->{qmdevice}-fleecing";
+	    $devid =~ s/^drive-//; # re-added by qemu_drivedel()
+	    eval { PVE::QemuServer::qemu_drivedel($vmid, $devid) };
+	}
+    }
+}
+
+my sub attach_fleecing_images {
+    my ($self, $disks, $vmid) = @_;
+
+    # unconditionally try to remove potential left-overs from a previous backup
+    detach_fleecing_images($disks, $vmid);
+
+    my $vollist = [ map { $_->{'fleece-volid'} } grep { $_->{'fleece-volid'} } $disks->@* ];
+    PVE::Storage::activate_volumes($self->{storecfg}, $vollist);
+
+    for my $di ($disks->@*) {
+	if (my $volid = $di->{'fleece-volid'}) {
+	    $self->loginfo("$di->{qmdevice}: attaching fleecing image $volid to QEMU");
+
+	    my $path = PVE::Storage::path($self->{storecfg}, $volid);
+	    my $devid = "$di->{qmdevice}-fleecing";
+	    # Specify size explicitly, to make it work if storage backend rounded up size for
+	    # fleecing image when allocating.
+	    my $drive = "file=$path,if=none,id=$devid,format=raw,discard=unmap,size=$di->{size}";
+	    $drive =~ s/\\/\\\\/g;
+	    my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto \"$drive\"");
+	    die "attaching fleecing image $volid failed - $ret\n" if $ret !~ m/OK/s;
+	}
+    }
+}
+
+my sub check_and_prepare_fleecing {
+    my ($self, $vmid, $fleecing_opts, $disks, $is_template, $qemu_support) = @_;
+
+    # TODO what if the VM was started specifically for backup? The VM could be resumed during
+    # backup, so fleecing can in theory make sense then, but in most cases it won't...
+
+    my $use_fleecing = $fleecing_opts && $fleecing_opts->{enabled} && !$is_template;
+
+    if ($use_fleecing && !defined($qemu_support->{'backup-fleecing'})) {
+	$self->log(
+	    'warn',
+	    "running QEMU version does not support backup fleecing - continuing without",
+	);
+	$use_fleecing = 0;
+    }
+
+    if ($use_fleecing) {
+	allocate_fleecing_images($self, $disks, $vmid, $fleecing_opts->{storage});
+	attach_fleecing_images($self, $disks, $vmid);
+    }
+
+    return $use_fleecing;
+}
+
 sub archive_pbs {
     my ($self, $task, $vmid) = @_;
 
@@ -578,6 +690,7 @@ sub archive_pbs {
 
     # get list early so we die on unkown drive types before doing anything
     my $devlist = _get_task_devlist($task);
+    my $use_fleecing;
 
     $self->enforce_vm_running_for_backup($vmid);
     $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
@@ -606,6 +719,11 @@ sub archive_pbs {
 
 	$attach_tpmstate_drive->($self, $task, $vmid);
 
+	my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
+
+	$use_fleecing = check_and_prepare_fleecing(
+	    $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support);
+
 	my $fs_frozen = $self->qga_fs_freeze($task, $vmid);
 
 	my $params = {
@@ -617,6 +735,8 @@ sub archive_pbs {
 	    devlist => $devlist,
 	    'config-file' => $conffile,
 	};
+	$params->{fleecing} = JSON::true if $use_fleecing;
+
 	if (defined(my $ns = $scfg->{namespace})) {
 	    $params->{'backup-ns'} = $ns;
 	}
@@ -633,7 +753,6 @@ sub archive_pbs {
 	    $params->{"master-keyfile"} = $master_keyfile if defined($master_keyfile);
 	}
 
-	my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
 	$params->{'use-dirty-bitmap'} = JSON::true
 	    if $qemu_support->{'pbs-dirty-bitmap'} && !$is_template;
 
@@ -665,6 +784,11 @@ sub archive_pbs {
     }
     $self->restore_vm_power_state($vmid);
 
+    if ($use_fleecing) {
+	detach_fleecing_images($task->{disks}, $vmid);
+	cleanup_fleecing_images($self, $task->{disks});
+    }
+
     die $err if $err;
 }
 
@@ -724,8 +848,10 @@ sub archive_vma {
 	$speed = $opts->{bwlimit}*1024;
     }
 
+    my $is_template = PVE::QemuConfig->is_template($self->{vmlist}->{$vmid});
+
     my $diskcount = scalar(@{$task->{disks}});
-    if (PVE::QemuConfig->is_template($self->{vmlist}->{$vmid}) || !$diskcount) {
+    if ($is_template || !$diskcount) {
 	my @pathlist;
 	foreach my $di (@{$task->{disks}}) {
 	    if ($di->{type} eq 'block' || $di->{type} eq 'file') {
@@ -765,6 +891,7 @@ sub archive_vma {
     }
 
     my $devlist = _get_task_devlist($task);
+    my $use_fleecing;
 
     $self->enforce_vm_running_for_backup($vmid);
     $self->{qmeventd_fh} = PVE::QemuServer::register_qmeventd_handle($vmid);
@@ -784,6 +911,9 @@ sub archive_vma {
 
 	$attach_tpmstate_drive->($self, $task, $vmid);
 
+	$use_fleecing = check_and_prepare_fleecing(
+	    $self, $vmid, $opts->{fleecing}, $task->{disks}, $is_template, $qemu_support);
+
 	my $outfh;
 	if ($opts->{stdout}) {
 	    $outfh = $opts->{stdout};
@@ -812,6 +942,7 @@ sub archive_vma {
 		devlist => $devlist
 	    };
 	    $params->{'firewall-file'} = $firewall if -e $firewall;
+	    $params->{fleecing} = JSON::true if $use_fleecing;
 	    add_backup_performance_options($params, $opts->{performance}, $qemu_support);
 
 	    $qmpclient->queue_cmd($vmid, $backup_cb, 'backup', %$params);
@@ -853,6 +984,11 @@ sub archive_vma {
 
     $self->restore_vm_power_state($vmid);
 
+    if ($use_fleecing) {
+	detach_fleecing_images($task->{disks}, $vmid);
+	cleanup_fleecing_images($self, $task->{disks});
+    }
+
     if ($err) {
 	if ($cpid) {
 	    kill(9, $cpid);
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* [pve-devel] [RFC docs 13/13] vzdump: add section about backup fleecing
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (11 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [RFC qemu-server 12/13] backup: implement fleecing option Fiona Ebner
@ 2024-01-25 14:41 ` Fiona Ebner
  2024-01-25 16:13   ` Dietmar Maurer
  2024-01-25 16:02 ` [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement " DERUMIER, Alexandre
  13 siblings, 1 reply; 31+ messages in thread
From: Fiona Ebner @ 2024-01-25 14:41 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 vzdump.adoc | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/vzdump.adoc b/vzdump.adoc
index 24a3e80..eb67141 100644
--- a/vzdump.adoc
+++ b/vzdump.adoc
@@ -136,6 +136,34 @@ not included in backups. For volume mount points you can set the *Backup* option
 to include the mount point in the backup. Device and bind mounts are never
 backed up as their content is managed outside the {pve} storage library.
 
+VM Backup Fleecing
+~~~~~~~~~~~~~~~~~~
+
+WARNING: Backup fleecing is still being worked on (also in upstream QEMU) and is
+currently only a technology preview.
+
+When a backup for a VM is started, QEMU will install a "copy-before-write"
+filter in its block layer. This filter ensures that upon new guest writes, old
+data still needed for the backup is sent to the backup target first. The guest
+write blocks until this operation is finished so guest IO to not-yet-backed-up
+sectors will be limited by the speed of the backup target.
+
+With backup fleecing, such old data is cached in a fleecing image rather than
+sent directly to the backup target. This can help guest IO performance and even
+prevent hangs in certain scenarios, at the cost of requiring more storage space.
+Use e.g. `vzdump 123 --fleecing enabled=1,storage=local-zfs` to enable backup
+fleecing, with fleecing images created on the storage `local-zfs`. If no storage
+is specified, the fleecing image will be created on the same storage as the
+original image.
+
+WARNING: Theoretically, the fleecing image can grow to the same size as the
+original image, e.g. if the guest re-writes a whole disk while the backup is
+busy with another disk.
+
+Parts of the fleecing image that have been backed up will be discarded to try
+and keep the space usage low.
+
+
 Backup File Names
 -----------------
 
-- 
2.39.2





^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing
  2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
                   ` (12 preceding siblings ...)
  2024-01-25 14:41 ` [pve-devel] [RFC docs 13/13] vzdump: add section about backup fleecing Fiona Ebner
@ 2024-01-25 16:02 ` DERUMIER, Alexandre
  13 siblings, 0 replies; 31+ messages in thread
From: DERUMIER, Alexandre @ 2024-01-25 16:02 UTC (permalink / raw)
  To: pve-devel

Oh!!! Thank you very much Fiona!!!

This is really the blocking feature for me; I'm still not using PBS because
of this.

I'll try to build a lab for testing as soon as possible
(I'm a bit busy with FOSDEM preparation).

I'll also test VM crash/host crash while a backup is running, to see
how it's handled.


-------- Message initial --------
De: Fiona Ebner <f.ebner@proxmox.com>
Répondre à: Proxmox VE development discussion <pve-
devel@lists.proxmox.com>
À: pve-devel@lists.proxmox.com
Objet: [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs
00/13] fix #4136: implement backup fleecing
Date: 25/01/2024 15:41:36

When a backup for a VM is started, QEMU will install a
"copy-before-write" filter in its block layer. This filter ensures
that upon new guest writes, old data still needed for the backup is
sent to the backup target first. The guest write blocks until this
operation is finished so guest IO to not-yet-backed-up sectors will be
limited by the speed of the backup target.

With backup fleecing, such old data is cached in a fleecing image
rather than sent directly to the backup target. This can help guest IO
performance and even prevent hangs in certain scenarios, at the cost
of requiring more storage space.

With this series it will be possible to enable backup-fleecing via
e.g. `vzdump 123 --fleecing enabled=1,storage=local-zfs` with fleecing
images created on the storage `local-zfs`. If no storage is specified,
the fleecing image will be created on the same storage as the original
image.


Fleecing images are created by qemu-server via pve-storage and
attached to QEMU before the backup starts, and cleaned up after the
backup finished or failed. Currently, just a "-fleecing(.raw)" suffix
is added and there is no special handling yet for e.g. qm rescan/etc..
And previous left-overs are not automatically cleaned up, because
while unlikely, images with this name might've been created by a user
too. Happy to discuss alternatives!

The fleecing image needs to be the exact same size as the source, but
luckily, an explicit size can be specified when attaching a raw image
to QEMU so there are no size issues when using storages that have
coarser allocation/round up.


While initial tests seem fine, bitmap handling needs to be carefully
checked for correctness. More eyeballs can't hurt there.

QEMU patches are for the submodule for better reviewability. There are
unfortunately a few prerequisites which are also still being worked on
upstream. These are:

Fix for qcow2 block status querying when used as a source image [0].
Already reviewed and being pulled.

For being able to discard the fleecing image, addition of a
discard-source parameter[1]. This series was adapted for downstream
and I tried to address the two remaining issues:

1. Permission issue when backup source node is read-only (e.g. TMP
state): Made permissions conditional for when discard-source is set
with a new option for the copy-before-write block driver. Currently,
it's part of QAPI, nicer would be to make it internal-only.

2. Cluster size issue when fleecing image has a larger cluster size
than backup target: Made a workaround by also considering source image
when calculating cluster size for block copy and had to hack
.bdrv_co_get_info implementations for snapshot-access and
copy-before-write. Not super confident and better to wait for an
answer from upstream.

Upstream reports/discussions for these can also be found at [1].


No hard dependencies AFAICS, but of course pve-manager should depend
on both new pve-guest-common and qemu-server to actually be able to
use the option.


[0]: https://lore.kernel.org/qemu-devel/20240116154839.401030-1-f.ebner@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/20240117160737.1057513-1-vsementsov@yandex-team.ru/

qemu:

Fiona Ebner (6):
  backup: factor out gathering device info into helper
  backup: get device info: code cleanup
  block/io: clear BDRV_BLOCK_RECURSE flag after recursing in
    bdrv_co_block_status
  block/{copy-before-write,snapshot-access}: implement bdrv_co_get_info
    driver callback
  block/block-copy: always consider source cluster size too
  PVE backup: add fleecing option

Vladimir Sementsov-Ogievskiy (2):
  block/copy-before-write: create block_copy bitmap in filter node
  qapi: blockdev-backup: add discard-source parameter

 block/backup.c                         |  15 +-
 block/block-copy.c                     |  36 ++--
 block/copy-before-write.c              |  46 ++++-
 block/copy-before-write.h              |   1 +
 block/io.c                             |  10 ++
 block/monitor/block-hmp-cmds.c         |   1 +
 block/replication.c                    |   4 +-
 block/snapshot-access.c                |   7 +
 blockdev.c                             |   2 +-
 include/block/block-copy.h             |   3 +-
 include/block/block_int-global-state.h |   2 +-
 pve-backup.c                           | 234 +++++++++++++++++++------
 qapi/block-core.json                   |  18 +-
 13 files changed, 300 insertions(+), 79 deletions(-)


guest-common:

Fiona Ebner (1):
  vzdump: schema: add fleecing property string

 src/PVE/VZDump/Common.pm | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)


manager:

Fiona Ebner (1):
  vzdump: handle new 'fleecing' property string

 PVE/VZDump.pm | 12 ++++++++++++
 1 file changed, 12 insertions(+)


qemu-server:

Fiona Ebner (2):
  backup: disk info: also keep track of size
  backup: implement fleecing option

 PVE/VZDump/QemuServer.pm | 141 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 139 insertions(+), 2 deletions(-)


docs:

Fiona Ebner (1):
  vzdump: add section about backup fleecing

 vzdump.adoc | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)


Summary over all repositories:
  17 files changed, 504 insertions(+), 0 deletions(-)



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC docs 13/13] vzdump: add section about backup fleecing
  2024-01-25 14:41 ` [pve-devel] [RFC docs 13/13] vzdump: add section about backup fleecing Fiona Ebner
@ 2024-01-25 16:13   ` Dietmar Maurer
  2024-01-25 16:41     ` DERUMIER, Alexandre
  0 siblings, 1 reply; 31+ messages in thread
From: Dietmar Maurer @ 2024-01-25 16:13 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

Stupid question: Wouldn't it be much easier to add a simple IO buffer
with limited capacity, implemented inside the Rust backup code?


> +WARNING: Theoretically, the fleecing image can grow to the same size as the
> +original image, e.g. if the guest re-writes a whole disk while the backup is
> +busy with another disk.
_______________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC docs 13/13] vzdump: add section about backup fleecing
  2024-01-25 16:13   ` Dietmar Maurer
@ 2024-01-25 16:41     ` DERUMIER, Alexandre
  2024-01-25 18:18       ` Dietmar Maurer
  0 siblings, 1 reply; 31+ messages in thread
From: DERUMIER, Alexandre @ 2024-01-25 16:41 UTC (permalink / raw)
  To: pve-devel, f.ebner

Hi Dietmar !

>>Stupid question: Wouldn't It be much easier to add a simple IO-buffer
>>with limited capacity, implemented inside the RUST backup code?

At work, we are running a backup cluster in a remote location with HDDs,
and a production cluster with super fast NVMe,
and sometimes I have really big write spikes (in GB/s), so it's
impossible for the backup storage or network to handle them without
increasing latency or saturating the link.

So with limited capacity (how much? in memory?), I don't think it
solves the problem. If the buffer is full, the VM write will hang.




> +WARNING: Theoretically, the fleecing image can grow to the same size
> as the
> +original image, e.g. if the guest re-writes a whole disk while the
> backup is
> +busy with another disk.
_______________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC docs 13/13] vzdump: add section about backup fleecing
  2024-01-25 16:41     ` DERUMIER, Alexandre
@ 2024-01-25 18:18       ` Dietmar Maurer
  2024-01-26  8:39         ` Fiona Ebner
  0 siblings, 1 reply; 31+ messages in thread
From: Dietmar Maurer @ 2024-01-25 18:18 UTC (permalink / raw)
  To: Proxmox VE development discussion, DERUMIER, Alexandre, f.ebner

> >>Stupid question: Wouldn't It be much easier to add a simple IO-buffer
> >>with limited capacity, implemented inside the RUST backup code?
> 
> At work, we are running a backup cluster on remote location with hdd , 
> and a production cluster with super fast nvme,
> and sometimes I have really big write spikes (in GB/s), so it's
> impossible for the backup storage or network to handle it without
> increase latency or saturate link.
>  
> So with limited capacity (how much ? in memory ?), I don't think it
> solve the problem. If the buffer is full, the vm write will hang.

Ok, I can see the problem...




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC docs 13/13] vzdump: add section about backup fleecing
  2024-01-25 18:18       ` Dietmar Maurer
@ 2024-01-26  8:39         ` Fiona Ebner
  0 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-26  8:39 UTC (permalink / raw)
  To: Dietmar Maurer, Proxmox VE development discussion, DERUMIER, Alexandre

Am 25.01.24 um 19:18 schrieb Dietmar Maurer:
>>>> Stupid question: Wouldn't It be much easier to add a simple IO-buffer
>>>> with limited capacity, implemented inside the RUST backup code?
>>
>> At work, we are running a backup cluster on remote location with hdd , 
>> and a production cluster with super fast nvme,
>> and sometimes I have really big write spikes (in GB/s), so it's
>> impossible for the backup storage or network to handle it without
>> increase latency or saturate link.
>>  
>> So with limited capacity (how much ? in memory ?), I don't think it
>> solve the problem. If the buffer is full, the vm write will hang.
> 
> Ok, I can see the problem...

Yes, exactly. Sure, it could be done with an in-memory buffer. And while
it would help some people, it would help in far fewer scenarios
compared to fleecing, because RAM is almost always more expensive/more
limited than storage space.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [HACK qemu 06/13] block/{copy-before-write, snapshot-access}: implement bdrv_co_get_info driver callback
  2024-01-25 14:41 ` [pve-devel] [HACK qemu 06/13] block/{copy-before-write, snapshot-access}: implement bdrv_co_get_info driver callback Fiona Ebner
@ 2024-01-29 14:35   ` Fiona Ebner
  0 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-29 14:35 UTC (permalink / raw)
  To: pve-devel

Am 25.01.24 um 15:41 schrieb Fiona Ebner:
> In preparation to fix an issue for backup fleecing where discarding
> the source would lead to an assertion failure when the fleecing image
> has larger granularity than the backup target.
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> Still need to wait on a response from upstream. For now this hack, so
> that the RFC as a whole doesn't have to wait.
> 

From [0]:

> 2. if we need another logic for block_copy_calculate_cluster_size() it should be an option. May be 
> explicit "copy-cluster-size" or "granularity" option for CBW driver and for backup 
> job. And we'll just check that given cluster-size is power of two >= target_size.

I implemented a new option based on this suggestion and it also resolves
the issue. So the hacks will be gone in the next version :) Will also
try to upstream the option after v3 of the patch series from [0] is out.

[0]: https://lists.nongnu.org/archive/html/qemu-devel/2024-01/msg05131.html




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC qemu-server 12/13] backup: implement fleecing option
  2024-01-25 14:41 ` [pve-devel] [RFC qemu-server 12/13] backup: implement fleecing option Fiona Ebner
@ 2024-01-29 15:28   ` Fiona Ebner
  0 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-29 15:28 UTC (permalink / raw)
  To: pve-devel

Am 25.01.24 um 15:41 schrieb Fiona Ebner:
> Management for fleecing images is implemented here. If the fleecing
> option is set, for each disk (except EFI disk and TPM state) a new raw
> fleecing image is allocated on the configured fleecing storage (same
> storage as original disk by default). The disk is attached to QEMU
> with the 'size' parameter, because the block node in QEMU has to be
> the exact same size and the newly allocated image might be bigger if
> the storage has a coarser allocation or rounded up. After backup, the
> disks are detached and removed from the storage.
> 
> Partially inspired by the existing handling of the TPM state image
> during backup.
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> STILL OPEN (see also the TODOs):
>     * Naming for fleecing images and also filtering/prohibiting them
>       with other operations. Currently using -fleecing suffix but
>       those can conflict with user-created ones.

With the current draft, there is another issue when the VM has
A:vm-123-disk-0 and B:vm-123-disk-0, because the names for the fleecing
images will clash.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-01-25 14:41 ` [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string Fiona Ebner
@ 2024-01-29 15:41   ` Fiona Ebner
  2024-01-30 14:03     ` DERUMIER, Alexandre
  2024-02-01 12:39     ` DERUMIER, Alexandre
  0 siblings, 2 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-01-29 15:41 UTC (permalink / raw)
  To: pve-devel

Am 25.01.24 um 15:41 schrieb Fiona Ebner:
> +    storage => get_standard_option('pve-storage-id', {
> +	description => "Use this storage to storage fleecing images. Default is to use the same "
> +	    ."storage as the VM disk itself.",
> +	optional => 1,
> +    }),
> +});
> +

LVM and non-sparse ZFS need enough space for a copy of the full disk
up-front, so they are not suitable as fleecing storages in many cases. iSCSI
doesn't allow disk allocation. Should such storages be outright
forbidden as fleecing storages or should it just be documented? Should
the setting rather be VM-specific than backup job-specific? These issues
mostly defeat the purpose of the default here.

IIRC, older versions of NFS lack the ability to discard. While not quite
as bad as the above, it's still far from ideal. Might also be worth
trying to detect? Will add something to the docs in any case.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-01-29 15:41   ` Fiona Ebner
@ 2024-01-30 14:03     ` DERUMIER, Alexandre
  2024-02-01  8:28       ` Fiona Ebner
  2024-02-01 12:39     ` DERUMIER, Alexandre
  1 sibling, 1 reply; 31+ messages in thread
From: DERUMIER, Alexandre @ 2024-01-30 14:03 UTC (permalink / raw)
  To: pve-devel



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-01-30 14:03     ` DERUMIER, Alexandre
@ 2024-02-01  8:28       ` Fiona Ebner
  0 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-02-01  8:28 UTC (permalink / raw)
  To: Proxmox VE development discussion, DERUMIER, Alexandre

Did the contents of this mail get eaten somehow?

Am 30.01.24 um 15:03 schrieb DERUMIER, Alexandre:
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-01-29 15:41   ` Fiona Ebner
  2024-01-30 14:03     ` DERUMIER, Alexandre
@ 2024-02-01 12:39     ` DERUMIER, Alexandre
  2024-02-01 13:11       ` Fiona Ebner
  1 sibling, 1 reply; 31+ messages in thread
From: DERUMIER, Alexandre @ 2024-02-01 12:39 UTC (permalink / raw)
  To: pve-devel

>>LVM and non-sparse ZFS need enough space for a copy for the full disk
>>up-front, so are not suitable as fleecing storages in many cases.

Can't we force sparse for these fleecing volumes, even if the storage
doesn't have sparse enabled? (I can understand that it could make sense
for users to have non-sparse for production for performance or
allocation reservation, but for a fleecing image, it should be
exceptional to rewrite a full image.)

>>ISCSI doesn't allow disk allocation. Should such storages be outright
>>forbidden as fleecing storages or should it just be documented?

I think it should be forbidden if it's incompatible.

>>Should the setting rather be VM-specific than backup job-specific?
>>These issues
>>mostly defeat the purpose of the default here.

Can't we forbid it via storage plugin features? { fleecing => 1 }?

>>IIRC older version of NFS lack the ability to discard. While not
>>quite
>>as bad as the above, it's still far from ideal. Might also be worth
>>trying to detect? Will add something to the docs in any case.

I have never seen working discard with NFS. I think (never tested) it's
possible with 4.2, but 4.2 is really new on NAS appliances (NetApp, ...).
So I think that 90% of users don't have working discard with NFS.

Is it a problem if the VM's main storage supports discard, but the
fleecing storage doesn't? (I haven't looked yet at how exactly fleecing
works.)

If it's a problem, I think we should forbid using a fleecing storage
that doesn't support discard if the VM has discard on one disk.

-------- Message initial --------
De: Fiona Ebner <f.ebner@proxmox.com>
Répondre à: Proxmox VE development discussion <pve-
devel@lists.proxmox.com>
À: pve-devel@lists.proxmox.com
Objet: Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add
fleecing property string
Date: 29/01/2024 16:41:06

Am 25.01.24 um 15:41 schrieb Fiona Ebner:
> +    storage => get_standard_option('pve-storage-id', {
> + description => "Use this storage to storage fleecing images.
> Default is to use the same "
> +     ."storage as the VM disk itself.",
> + optional => 1,
> +    }),
> +});
> +

LVM and non-sparse ZFS need enough space for a copy for the full disk
up-front, so are not suitable as fleecing storages in many cases. ISCSI
doesn't allow disk allocation. Should such storages be outright
forbidden as fleecing storages or should it just be documented? Should
the setting rather be VM-specific than backup job-specific? These
issues
mostly defeat the purpose of the default here.

IIRC older version of NFS lack the ability to discard. While not quite
as bad as the above, it's still far from ideal. Might also be worth
trying to detect? Will add something to the docs in any case.


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-02-01 12:39     ` DERUMIER, Alexandre
@ 2024-02-01 13:11       ` Fiona Ebner
  2024-02-01 13:20         ` DERUMIER, Alexandre
  0 siblings, 1 reply; 31+ messages in thread
From: Fiona Ebner @ 2024-02-01 13:11 UTC (permalink / raw)
  To: Proxmox VE development discussion, DERUMIER, Alexandre

Am 01.02.24 um 13:39 schrieb DERUMIER, Alexandre:
>>> LVM and non-sparse ZFS need enough space for a copy for the full disk
>>> up-front, so are not suitable as fleecing storages in many cases.
> 
> can't we force sparse for theses fleecing volumes, even if the storage
> don't have sparse enabled ? (I can understand that it could make sense
> for user to have non sparse for production for performance or
> allocation reservation, but for fleecing image, it should be
> exceptionnal to rewrite a full image)
> 

For ZFS, we could always allocate fleecing images sparsely, but that would
require a change to the storage API, as you can't tell vdisk_alloc() to
do that right now. There could also be a new helper altogether,
allocate_fleecing_image(); then the storage plugin itself could decide
what the best settings are.

>>> Should the setting rather be VM-specific than backup job-specific?
>>> These issues
>>> mostly defeat the purpose of the default here.
> 
> can't we forbidden it in storage plugin features ? { fleecing => 1} ?
> 

There is no feature list for storage plugins right now, just
volume_has_feature(), and that doesn't help if you don't already have a volume.

There is storage_can_replicate() and we could either switch to a common
helper for storage features and deprecate the old one, or simply add a
storage_supports_fleecing() helper.
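
A very rough sketch of how such a helper could look (purely illustrative;
the per-plugin callback shown here is an assumption and does not exist yet):

    # hypothetical helper in PVE/Storage.pm
    sub storage_supports_fleecing {
        my ($cfg, $storeid) = @_;

        my $scfg = storage_config($cfg, $storeid);
        my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});

        # assumed new plugin method; plugins that cannot allocate suitable
        # images (e.g. iSCSI) would simply not implement it
        return $plugin->can('supports_fleecing')
            ? $plugin->supports_fleecing($scfg) : 0;
    }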

But the question remains whether the setting should be VM-specific or
job-wide. Most flexible would be both, but I'd rather not overcomplicate
things. Maybe my idea for the default with "use same storage for
fleecing" is not actually a good one and having a dedicated storage for
fleecing is better. Then it needs to be a conscious decision.

>>> IIRC older version of NFS lack the ability to discard. While not
>>> quite
>>> as bad as the above, it's still far from ideal. Might also be worth
>>> trying to detect? Will add something to the docs in any case.
> 
> I never have seen working discard with nfs, I think (never tested) it's
> possible with 4.2, but 4.2 is really new on nas appliance (netapp,...).
> So I think than 90% of user don't have working discard with nfs.
> 

With NFS 4.2 discard works nicely even with raw format. But you might be
right about most users not having new enough versions. We discussed this
off-list too and an improvement would be to use qcow2, so the discards
could happen at least internally. The qcow2 image couldn't free the allocated
blocks, but it could re-use already allocated ones.

> Is it a problem if the vm main storage support discard , but not
> fleecing storage ? (I don't have looked yet how exactly fleecing is
> working)
> 

It doesn't matter if the main storage does or not. It only depends on
the fleecing storage.

> If it's a problem, I think we should forbind to use a fleecing storage
> not supporting discard, if the vm have discard on 1 disk.
> 

The problem is that the space usage can be very high. It's not a
fundamental problem; you can still use fleecing on such storages if you
have enough space.

There are already ideas to have a limit setting, monitor the space usage
and abort when the limit is hit, but nothing concrete yet.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-02-01 13:11       ` Fiona Ebner
@ 2024-02-01 13:20         ` DERUMIER, Alexandre
  2024-02-01 13:27           ` Fiona Ebner
  2024-02-01 13:30           ` Fiona Ebner
  0 siblings, 2 replies; 31+ messages in thread
From: DERUMIER, Alexandre @ 2024-02-01 13:20 UTC (permalink / raw)
  To: pve-devel, f.ebner

-------- Message initial --------
De: Fiona Ebner <f.ebner@proxmox.com>
À: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
"DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Objet: Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add
fleecing property string
Date: 01/02/2024 14:11:20

>>But the question remains if the setting should be VM-specific or
>>job-wide. Most flexible would be both, but I'd rather not
>>overcomplicate
>>things. Maybe my idea for the default with "use same storage for
>>fleecing" is not actually a good one and having a dedicated storage
>>for
>>fleecing is better. Then it needs to be a conscious decision.

Instead, couldn't we define it in the PBS storage?
At the VM level, we can't know which storage will be used. (One could be
fast without needing fleecing, another could be slow and need fleecing.)

storage.cfg:

pbs  : mypbs
       fleecing-storage  .....

And allow overriding it at the job level?






^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-02-01 13:20         ` DERUMIER, Alexandre
@ 2024-02-01 13:27           ` Fiona Ebner
  2024-02-01 21:33             ` DERUMIER, Alexandre
  2024-02-01 13:30           ` Fiona Ebner
  1 sibling, 1 reply; 31+ messages in thread
From: Fiona Ebner @ 2024-02-01 13:27 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel

Am 01.02.24 um 14:20 schrieb DERUMIER, Alexandre:
> -------- Message initial --------
> De: Fiona Ebner <f.ebner@proxmox.com>
> À: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
> "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
> Objet: Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add
> fleecing property string
> Date: 01/02/2024 14:11:20
> 
>>> But the question remains if the setting should be VM-specific or
>>> job-wide. Most flexible would be both, but I'd rather not
>>> overcomplicate
>>> things. Maybe my idea for the default with "use same storage for
>>> fleecing" is not actually a good one and having a dedicated storage
>>> for
>>> fleecing is better. Then it needs to be a conscious decision.
> 
> Instead, couldn't we defined it in pbs storage ?   
> at vm level, we can't known which storage will be used. (one could be
> fast without need of fleecing, another could be slow and need fleecing)
> 
> storage.cfg:
> 
> pbs  : mypbs
>        fleecing-storage  .....
> 

Backup fleecing is not limited to PBS and it doesn't have anything to
do with PBS implementation-wise, so this is not the right place for the
setting IMHO.

> An allow to override at job level  ?

A job-level option alone is already more flexible than defining it as
part of the PBS-configuration.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-02-01 13:20         ` DERUMIER, Alexandre
  2024-02-01 13:27           ` Fiona Ebner
@ 2024-02-01 13:30           ` Fiona Ebner
  1 sibling, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-02-01 13:30 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel

Am 01.02.24 um 14:20 schrieb DERUMIER, Alexandre:
> at vm level, we can't known which storage will be used. (one could be
> fast without need of fleecing, another could be slow and need fleecing)
> 

To clarify, with my "VM-specific" proposal I mean: The backup job
decides whether to use fleecing or not. The VM decides which storage to
use for fleecing.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-02-01 13:27           ` Fiona Ebner
@ 2024-02-01 21:33             ` DERUMIER, Alexandre
  2024-02-02  8:30               ` Fiona Ebner
  0 siblings, 1 reply; 31+ messages in thread
From: DERUMIER, Alexandre @ 2024-02-01 21:33 UTC (permalink / raw)
  To: pve-devel, f.ebner


> 
> storage.cfg:
> 
> pbs  : mypbs
>        fleecing-storage  .....
> 

>>Backup fleecing is not limited to PBS and it's doesn't have anything
>>to
>>do with PBS implementation-wise, so this is not the right place for
>>the
>>setting IMHO.

mmm, good point. 


> An allow to override at job level  ?
>>
>>A job-level option alone is already more flexible than defining it as
part of the PBS-configuration.


>>To clarify, with my "VM-specific" proposal I mean: The backup job
>>decides whether to use fleecing or not. The VM decides which storage
>>to
>>use for fleecing.


Personally, for my usage, I prefer to define the fleecing storage in the
backup job (as we only define 1 backup storage in 1 job, and my problem
to solve is the slow backup storage).

But maybe, indeed, some users would like to choose "use same storage
as disk", or choose a specific fleecing storage for a specific VM.
So, maybe add an extra override option in VM options or directly on
the disk? I really don't know :)

Maybe try a simple option on the backup job first, then add extra options
at the VM level if users want it?






^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string
  2024-02-01 21:33             ` DERUMIER, Alexandre
@ 2024-02-02  8:30               ` Fiona Ebner
  0 siblings, 0 replies; 31+ messages in thread
From: Fiona Ebner @ 2024-02-02  8:30 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel

Am 01.02.24 um 22:33 schrieb DERUMIER, Alexandre:
> 
>>> To clarify, with my "VM-specific" proposal I mean: The backup job
>>> decides whether to use fleecing or not. The VM decides which storage
>>> to
>>> use for fleecing.
> 
> 
> Personnaly, for my usage, I prefer to define fleecing storage in backup
> job. (as we only define 1 backup storage in 1 job, and my problem to
> solve is the slow backup storage).
> 
> 
> 
> But Maybe, indeed, some users would like to choose "use same storage
> than disk" , or choose a specific fleecing storage for a specific vm.
> So, maybe add an extra override option in vm options or directly on
> disk ? I really don't known :)  
> 
> Maybe try a simple option on backup job first, then add extra options
> at vm level if users want it ?
> 

Yes, I think I'll go that route for now. And require users to choose a
specific storage instead of having a default. A default can still be
added later. "Use same storage as disk" also has the issue of
duplicating writes to the same storage with copy-before-write, which is
not the best for performance either, especially for NFS/CIFS.




^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-02-02  8:31 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-25 14:41 [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [PATCH qemu 01/13] backup: factor out gathering device info into helper Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [PATCH qemu 02/13] backup: get device info: code cleanup Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [PATCH qemu 03/13] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [RFC qemu 04/13] block/copy-before-write: create block_copy bitmap in filter node Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [RFC qemu 05/13] qapi: blockdev-backup: add discard-source parameter Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [HACK qemu 06/13] block/{copy-before-write, snapshot-access}: implement bdrv_co_get_info driver callback Fiona Ebner
2024-01-29 14:35   ` Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [HACK qemu 07/13] block/block-copy: always consider source cluster size too Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [RFC qemu 08/13] PVE backup: add fleecing option Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [RFC guest-common 09/13] vzdump: schema: add fleecing property string Fiona Ebner
2024-01-29 15:41   ` Fiona Ebner
2024-01-30 14:03     ` DERUMIER, Alexandre
2024-02-01  8:28       ` Fiona Ebner
2024-02-01 12:39     ` DERUMIER, Alexandre
2024-02-01 13:11       ` Fiona Ebner
2024-02-01 13:20         ` DERUMIER, Alexandre
2024-02-01 13:27           ` Fiona Ebner
2024-02-01 21:33             ` DERUMIER, Alexandre
2024-02-02  8:30               ` Fiona Ebner
2024-02-01 13:30           ` Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [RFC manager 10/13] vzdump: handle new 'fleecing' " Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [RFC qemu-server 11/13] backup: disk info: also keep track of size Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [RFC qemu-server 12/13] backup: implement fleecing option Fiona Ebner
2024-01-29 15:28   ` Fiona Ebner
2024-01-25 14:41 ` [pve-devel] [RFC docs 13/13] vzdump: add section about backup fleecing Fiona Ebner
2024-01-25 16:13   ` Dietmar Maurer
2024-01-25 16:41     ` DERUMIER, Alexandre
2024-01-25 18:18       ` Dietmar Maurer
2024-01-26  8:39         ` Fiona Ebner
2024-01-25 16:02 ` [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement " DERUMIER, Alexandre
