From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits))
	(No client certificate requested)
	by lists.proxmox.com (Postfix) with ESMTPS id 4EC3FB9890
	for ; Fri, 15 Mar 2024 11:25:09 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 2C20919193
	for ; Fri, 15 Mar 2024 11:25:09 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits))
	(No client certificate requested)
	by firstgate.proxmox.com (Proxmox) with ESMTPS
	for ; Fri, 15 Mar 2024 11:25:07 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
	by proxmox-new.maurer-it.com (Proxmox) with ESMTP id E224B48A43
	for ; Fri, 15 Mar 2024 11:25:06 +0100 (CET)
From: Fiona Ebner
To: pve-devel@lists.proxmox.com
Date: Fri, 15 Mar 2024 11:24:41 +0100
Message-Id: <20240315102502.84163-1-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
	AWL                    -0.070 Adjusted score from AWL reputation of From: address
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING             0.1 Missing DMARC policy
	KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
	SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS               -0.001 SPF: sender matches SPF record
	T_SCC_BODY_TEXT_LINE    -0.01 -
Subject: [pve-devel] [PATCH-SERIES v2] fix #4136: implement backup fleecing
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 15 Mar 2024 10:25:09 -0000

Changes in v2 (thanks - not limited to - to Fabian and Alexandre for
feedback!):
* Use v3 of "discard-source" upstream series (v4 was posted in the
  meantime but without any semantic change)
* Add patches to specify minimum cluster size during backup, to allow
  discard to work even if fleecing image has larger cluster size than
  backup target.
* Add permission check for fleecing storage.
* Record fleecing image in config to be able to clean up after hard
  failure.
* Do not use "same storage as image" as default fleecing storage.
* Use qcow2 for fleecing image if storage supports it
* Flesh out recommendations for fleecing storage in docs.

When a backup for a VM is started, QEMU will install a
"copy-before-write" filter in its block layer. This filter ensures
that upon new guest writes, old data still needed for the backup is
sent to the backup target first. The guest write blocks until this
operation is finished so guest IO to not-yet-backed-up sectors will be
limited by the speed of the backup target.

With backup fleecing, such old data is cached in a fleecing image
rather than sent directly to the backup target. This can help guest IO
performance and even prevent hangs in certain scenarios, at the cost
of requiring more storage space.

With this series it will be possible to enable backup-fleecing via
e.g. `vzdump 123 --fleecing enabled=1,storage=local-lvm` with fleecing
images created on the storage `local-lvm`. The fleecing storage should
be a fast local storage which supports thin-provisioning and discard.
If the storage supports qcow2, that is used as the fleecing image
format. If the underlying file system does not support discard, with
qcow2 and preallocation=off, at least already allocated parts of the
image can be re-used later.
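To make the difference between the two modes described above more concrete, here is a toy model in Python. It is only an illustration of the copy-before-write idea, not QEMU's implementation: the class `ToyBackup`, its fields and the `run` helper are hypothetical names, blocks are plain list entries, and the "discard" of the fleecing image is modelled by removing a dictionary entry.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBackup:
    """Toy model of a copy-before-write filter (not QEMU code)."""
    disk: list                                    # current guest-visible blocks
    use_fleecing: bool
    target: dict = field(default_factory=dict)    # blocks stored on the backup target
    fleecing: dict = field(default_factory=dict)  # old blocks cached locally
    slow_writes: int = 0                          # guest writes that waited on the target

    def guest_write(self, idx, data):
        # Old data only needs saving if the backup has not secured it yet.
        if idx not in self.target and idx not in self.fleecing:
            if self.use_fleecing:
                # Cache old data in the (fast, local) fleecing image.
                self.fleecing[idx] = self.disk[idx]
            else:
                # Send old data to the backup target; the guest write waits.
                self.target[idx] = self.disk[idx]
                self.slow_writes += 1
        self.disk[idx] = data

    def backup_block(self, idx):
        if idx in self.target:
            return
        if idx in self.fleecing:
            # Read the old data from the fleecing image, then "discard" it.
            self.target[idx] = self.fleecing.pop(idx)
        else:
            self.target[idx] = self.disk[idx]


def run(use_fleecing):
    b = ToyBackup(disk=["a", "b", "c", "d"], use_fleecing=use_fleecing)
    b.backup_block(0)
    b.guest_write(0, "A")  # already backed up, nothing to preserve
    b.guest_write(2, "C")  # not yet backed up, old "c" must be kept
    for i in range(4):
        b.backup_block(i)
    return b
```

In this model both modes end up with the same point-in-time backup contents; the only difference is whether the guest write to block 2 had to wait for the backup target.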
Fleecing images are created by qemu-server via pve-storage and
attached to QEMU before the backup starts, and cleaned up after the
backup finished or failed. The naming schema for fleecing images is
'vm-ID-fleece-N(.FORMAT)'. The allocated images are recorded in the
guest configuration, so that even after a hard failure, clean-up can
be re-attempted. While not too bad, it's a non-trivial amount of code
and I'm not 100% sure about the cost-benefit, so I'm sending those
patches as RFC.

The fleecing image needs to be the exact same size as the source, but
luckily, an explicit size can be specified when attaching a raw image
to QEMU so there are no size issues when using storages that have
coarser allocation/round up. For qcow2, it seems that the virtual size
can be nearly arbitrary (i.e. modulo 512 byte granularity) during
allocation.

While tests seem fine so far, the most important part to review is the
setup of the backup job and bitmap handling inside QEMU.

QEMU patches are for the submodule for better reviewability. There are
two prerequisites (that are expected to be picked up by upstream at
some point):

1. For being able to discard the fleecing image, addition of a
   discard-source parameter [0].
2. In combination with discard, there is a cluster size issue when the
   fleecing image has a larger cluster size than the backup target.
   Proposed workaround is to be able to specify the minimum
   granularity for the backup job [1].

Dependencies:
pve-manager -> pve-guest-common -> pve-common
            \-> qemu-server
Plus new pve-qemu-kvm to actually be able to use the feature.
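As a small illustration of the naming schema 'vm-ID-fleece-N(.FORMAT)' mentioned above, the following Python sketch builds and parses such names. It assumes that ID and N are decimal numbers and that FORMAT is an optional lower-case suffix such as qcow2; the regular expression and the helper names `fleece_name` and `parse_fleece_name` are hypothetical and not taken from the patches.

```python
import re

# Assumed pattern for 'vm-ID-fleece-N(.FORMAT)'; the format suffix is optional.
FLEECE_RE = re.compile(
    r"^vm-(?P<vmid>\d+)-fleece-(?P<index>\d+)(?:\.(?P<fmt>[a-z0-9]+))?$"
)


def fleece_name(vmid, index, fmt=None):
    """Build a fleecing image name, with or without a format suffix."""
    name = f"vm-{vmid}-fleece-{index}"
    return f"{name}.{fmt}" if fmt else name


def parse_fleece_name(name):
    """Return (vmid, index, fmt) or None if the name does not match."""
    m = FLEECE_RE.match(name)
    if m is None:
        return None
    return int(m["vmid"]), int(m["index"]), m["fmt"]
```

A name without a suffix parses with `fmt` set to `None`, and a regular disk name like 'vm-123-disk-0' is not treated as a fleecing image by this sketch.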
[0]: https://lore.kernel.org/qemu-devel/20240228141501.455989-1-vsementsov@yandex-team.ru/
[1]: https://lore.kernel.org/qemu-devel/20240308155158.830258-1-f.ebner@proxmox.com/

qemu:

Fiona Ebner (3):
  copy-before-write: allow specifying minimum cluster size
  backup: add minimum cluster size to performance options
  PVE backup: add fleecing option

Vladimir Sementsov-Ogievskiy (4):
  block/copy-before-write: fix permission
  block/copy-before-write: support unligned snapshot-discard
  block/copy-before-write: create block_copy bitmap in filter node
  qapi: blockdev-backup: add discard-source parameter

 block/backup.c                         |   5 +-
 block/block-copy.c                     |  29 ++++-
 block/copy-before-write.c              |  42 ++++++--
 block/copy-before-write.h              |   2 +
 block/monitor/block-hmp-cmds.c         |   1 +
 block/replication.c                    |   4 +-
 blockdev.c                             |   5 +-
 include/block/block-common.h           |   2 +
 include/block/block-copy.h             |   3 +
 include/block/block_int-global-state.h |   2 +-
 pve-backup.c                           | 143 ++++++++++++++++++++++++-
 qapi/block-core.json                   |  29 ++++-
 tests/qemu-iotests/257.out             | 112 +++++++++----------
 13 files changed, 298 insertions(+), 81 deletions(-)


common:

Fiona Ebner (1):
  json schema: add format description for pve-storage-id standard option

 src/PVE/JSONSchema.pm | 1 +
 1 file changed, 1 insertion(+)


guest-common:

Fiona Ebner (3):
  vzdump: schema: add fleecing property string
  vzdump: schema: make storage for fleecing semi-optional
  abstract config: do not copy fleecing images entry for snapshot

 src/PVE/AbstractConfig.pm |  1 +
 src/PVE/VZDump/Common.pm  | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)


manager:

Fiona Ebner (2):
  vzdump: handle new 'fleecing' property string
  api: backup/vzdump: add permission check for fleecing storage

 PVE/API2/Backup.pm | 10 ++++++++--
 PVE/API2/VZDump.pm |  9 +++++----
 PVE/VZDump.pm      | 12 ++++++++++++
 3 files changed, 25 insertions(+), 6 deletions(-)


qemu-server:

Fiona Ebner (7):
  backup: disk info: also keep track of size
  backup: implement fleecing option
  parse config: allow config keys with minus sign
  schema: add fleecing-images config property
  vzdump: better cleanup fleecing images after hard errors
  migration: attempt to clean up potential left-over fleecing images
  destroy vm: clean up potential left-over fleecing images

 PVE/API2/Qemu.pm         |   9 +++
 PVE/QemuConfig.pm        |  40 ++++++++++
 PVE/QemuMigrate.pm       |   3 +
 PVE/QemuServer.pm        |  12 ++-
 PVE/VZDump/QemuServer.pm | 163 ++++++++++++++++++++++++++++++++++++++-
 5 files changed, 224 insertions(+), 3 deletions(-)


docs:

Fiona Ebner (1):
  vzdump: add section about backup fleecing

 vzdump.adoc | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)


Summary over all repositories:
  25 files changed, 624 insertions(+), 90 deletions(-)

-- 
Generated by git-murpp 0.5.0