From: Fiona Ebner
To: pve-devel@lists.proxmox.com
Date: Thu, 25 Jan 2024 15:41:36 +0100
Message-Id: <20240125144149.216064-1-f.ebner@proxmox.com>
Subject: [pve-devel] [RFC qemu/guest-common/manager/qemu-server/docs 00/13] fix #4136: implement backup fleecing

When a backup for a VM is started, QEMU will install a
"copy-before-write" filter in its block layer. This filter ensures
that upon new guest writes, old data still needed for the backup is
sent to the backup target first. The guest write blocks until this
operation is finished, so guest IO to not-yet-backed-up sectors will be
limited by the speed of the backup target.

With backup fleecing, such old data is cached in a fleecing image
rather than sent directly to the backup target. This can help guest IO
performance and even prevent hangs in certain scenarios, at the cost
of requiring more storage space.

With this series it will be possible to enable backup fleecing via
e.g. `vzdump 123 --fleecing enabled=1,storage=local-zfs`, with fleecing
images created on the storage `local-zfs`. If no storage is specified,
the fleecing image will be created on the same storage as the original
image.

Fleecing images are created by qemu-server via pve-storage and
attached to QEMU before the backup starts, and cleaned up after the
backup has finished or failed. Currently, just a "-fleecing(.raw)"
suffix is added and there is no special handling yet for e.g.
qm rescan etc. Previous left-overs are not automatically cleaned up
either, because, while unlikely, images with this name might have been
created by a user too. Happy to discuss alternatives!

The fleecing image needs to be the exact same size as the source, but
luckily, an explicit size can be specified when attaching a raw image
to QEMU, so there are no size issues when using storages that have a
coarser allocation granularity or round up sizes.
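To illustrate the difference described above, here is a minimal toy model in Python. It is only a sketch and not how QEMU implements it: the function `run_backup`, the in-memory dicts standing in for the backup target and the fleecing image, and the assumption that all guest writes arrive before the backup job copies anything are all made up for illustration.

```python
def run_backup(nblocks, guest_writes, fleecing):
    """Toy model of copy-before-write, returns (backup, source, fleecing_peak)."""
    source = [f"old{i}" for i in range(nblocks)]
    backup = {}                    # stands in for the (slow) backup target
    fleece = {}                    # stands in for the (fast) fleecing image
    pending = set(range(nblocks))  # blocks the backup still needs
    fleecing_peak = 0              # max number of blocks cached at once

    def before_write(idx):
        # Preserve the old data before the guest overwrites the block.
        if idx not in pending:
            return
        if fleecing:
            # Guest write only waits for the local cache. Keep the first
            # (oldest) copy if the guest writes the same block again.
            fleece.setdefault(idx, source[idx])
        else:
            # Guest write waits until the backup target has the old data.
            backup[idx] = source[idx]
            pending.discard(idx)

    for idx, data in guest_writes:
        before_write(idx)
        source[idx] = data
        fleecing_peak = max(fleecing_peak, len(fleece))

    # Backup job copies what is left, preferring preserved old data and
    # dropping it from the fleecing image once copied.
    for idx in sorted(pending):
        backup[idx] = fleece.pop(idx) if idx in fleece else source[idx]
    return backup, source, fleecing_peak


writes = [(1, "new1"), (1, "newer1"), (3, "new3")]
plain = run_backup(4, writes, fleecing=False)
fleeced = run_backup(4, writes, fleecing=True)
```

In both modes the backup ends up with the point-in-time data; the difference is only where the guest write has to wait, and that the fleecing variant temporarily needs extra space (tracked by `fleecing_peak` here).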
While initial tests seem fine, bitmap handling needs to be carefully
checked for correctness. More eyeballs can't hurt there.

QEMU patches are for the submodule for better reviewability. There are
unfortunately a few prerequisites which are also still being worked on
upstream. These are:

- Fix for qcow2 block status querying when used as a source image [0].
  Already reviewed and being pulled.

- For being able to discard the fleecing image, addition of a
  discard-source parameter [1]. This series was adapted for downstream
  and I tried to address the two remaining issues:

  1. Permission issue when backup source node is read-only (e.g. TPM
     state): Made permissions conditional for when discard-source is
     set with a new option for the copy-before-write block driver.
     Currently, it's part of QAPI; it would be nicer to make it
     internal-only.

  2. Cluster size issue when fleecing image has a larger cluster size
     than backup target: Made a workaround by also considering the
     source image when calculating the cluster size for block copy and
     had to hack the .bdrv_co_get_info implementations for
     snapshot-access and copy-before-write. Not super confident and
     better to wait for an answer from upstream.

  Upstream reports/discussions for these can also be found at [1].

No hard dependencies AFAICS, but of course pve-manager should depend
on both new pve-guest-common and qemu-server to actually be able to
use the option.
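For the cluster size workaround (item 2 above), the idea can be sketched roughly as follows. This is a Python illustration, not the actual C code in block/block-copy.c; the helper names and the 64 KiB default are assumptions made for the example.

```python
# Assumed default copy granularity, for illustration only.
DEFAULT_CLUSTER_SIZE = 64 * 1024


def copy_cluster_size(source_cluster, target_cluster,
                      default=DEFAULT_CLUSTER_SIZE):
    """Pick a copy granularity that is no smaller than any involved
    cluster size, so that also the source (e.g. a fleecing image) is
    taken into account and not only the target."""
    return max(default, source_cluster, target_cluster)


def align_request(offset, length, cluster):
    """Widen a byte range so it starts and ends on cluster boundaries."""
    start = (offset // cluster) * cluster
    end = -(-(offset + length) // cluster) * cluster  # round up
    return start, end - start


# Fleecing image with 128 KiB clusters, backup target with 64 KiB clusters.
cluster = copy_cluster_size(128 * 1024, 64 * 1024)
aligned = align_request(70 * 1024, 4 * 1024, cluster)
```

With only the target considered, the copy granularity in this example would stay at 64 KiB, which is smaller than the cluster size of the fleecing image.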
[0]: https://lore.kernel.org/qemu-devel/20240116154839.401030-1-f.ebner@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/20240117160737.1057513-1-vsementsov@yandex-team.ru/


qemu:

Fiona Ebner (6):
  backup: factor out gathering device info into helper
  backup: get device info: code cleanup
  block/io: clear BDRV_BLOCK_RECURSE flag after recursing in
    bdrv_co_block_status
  block/{copy-before-write,snapshot-access}: implement bdrv_co_get_info
    driver callback
  block/block-copy: always consider source cluster size too
  PVE backup: add fleecing option

Vladimir Sementsov-Ogievskiy (2):
  block/copy-before-write: create block_copy bitmap in filter node
  qapi: blockdev-backup: add discard-source parameter

 block/backup.c                         |  15 +-
 block/block-copy.c                     |  36 ++--
 block/copy-before-write.c              |  46 ++++-
 block/copy-before-write.h              |   1 +
 block/io.c                             |  10 ++
 block/monitor/block-hmp-cmds.c         |   1 +
 block/replication.c                    |   4 +-
 block/snapshot-access.c                |   7 +
 blockdev.c                             |   2 +-
 include/block/block-copy.h             |   3 +-
 include/block/block_int-global-state.h |   2 +-
 pve-backup.c                           | 234 +++++++++++++++++++------
 qapi/block-core.json                   |  18 +-
 13 files changed, 300 insertions(+), 79 deletions(-)


guest-common:

Fiona Ebner (1):
  vzdump: schema: add fleecing property string

 src/PVE/VZDump/Common.pm | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)


manager:

Fiona Ebner (1):
  vzdump: handle new 'fleecing' property string

 PVE/VZDump.pm | 12 ++++++++++++
 1 file changed, 12 insertions(+)


qemu-server:

Fiona Ebner (2):
  backup: disk info: also keep track of size
  backup: implement fleecing option

 PVE/VZDump/QemuServer.pm | 141 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 139 insertions(+), 2 deletions(-)


docs:

Fiona Ebner (1):
  vzdump: add section about backup fleecing

 vzdump.adoc | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)


Summary over all repositories:
  17 files changed, 504 insertions(+), 0 deletions(-)

-- 
Generated by git-murpp 0.5.0