From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 3D94270F52 for ; Tue, 17 May 2022 10:30:25 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 0896D2624F for ; Tue, 17 May 2022 10:29:55 +0200 (CEST) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS id C140D26246 for ; Tue, 17 May 2022 10:29:53 +0200 (CEST) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 8ED504366E for ; Tue, 17 May 2022 10:29:53 +0200 (CEST) From: Fabian Ebner To: pve-devel@lists.proxmox.com Date: Tue, 17 May 2022 10:29:49 +0200 Message-Id: <20220517082949.60472-1-f.ebner@proxmox.com> X-Mailer: git-send-email 2.30.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 AWL 0.074 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record T_SCC_BODY_TEXT_LINE -0.01 - URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [gitlab.com, proxmox.com] Subject: [pve-devel] [PATCH qemu] add revert to work around performance regression when backing up large RBD disk X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2022 08:30:25 -0000 resulting in QMP timeouts and very slow backups. The plan is to figure out (ideally together with upstream) a way to make the implementation of bdrv_co_block_status for RBD more efficient. But for now, revert the problematic change as a stop-gap measure. Upstream bug report: https://gitlab.com/qemu-project/qemu/-/issues/1026 Forum threads: https://forum.proxmox.com/threads/109272/ https://forum.proxmox.com/threads/109448/ https://forum.proxmox.com/threads/101334/ (partially) Signed-off-by: Fabian Ebner --- I simply removed the other two patches already fixing the offending commit from the "series" file, to make it less likely that we forget to re-add them once we revert the revert and to reduce the git diff. Hope that is ok. ...k-rbd-implement-bdrv_co_block_status.patch | 161 ++++++++++++++++++ debian/patches/series | 3 +- 2 files changed, 162 insertions(+), 2 deletions(-) create mode 100644 debian/patches/pve/0053-Revert-block-rbd-implement-bdrv_co_block_status.patch diff --git a/debian/patches/pve/0053-Revert-block-rbd-implement-bdrv_co_block_status.patch b/debian/patches/pve/0053-Revert-block-rbd-implement-bdrv_co_block_status.patch new file mode 100644 index 0000000..d3c6a71 --- /dev/null +++ b/debian/patches/pve/0053-Revert-block-rbd-implement-bdrv_co_block_status.patch @@ -0,0 +1,161 @@ +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 +From: Fabian Ebner +Date: Tue, 17 May 2022 09:46:02 +0200 +Subject: [PATCH] Revert "block/rbd: implement bdrv_co_block_status" + +During backup, bdrv_co_block_status is called for each block copy +chunk. When RBD is used, the current implementation with +rbd_diff_iterate2() using whole_object=true takes about linearly more +time, depending on the image size. Since there are linearly more +chunks, the slowdown is quadratic, becoming unacceptable for large +images (starting somewhere between 500-1000 GiB in my testing). + +This reverts commit 0347a8fd4c3faaedf119be04c197804be40a384b as a +stop-gap measure, until it's clear how to make the implemenation +more efficient. + +Upstream bug report: +https://gitlab.com/qemu-project/qemu/-/issues/1026 + +Signed-off-by: Fabian Ebner +--- + block/rbd.c | 112 ---------------------------------------------------- + 1 file changed, 112 deletions(-) + +diff --git a/block/rbd.c b/block/rbd.c +index a4b8fb482c..3393b06a4e 100644 +--- a/block/rbd.c ++++ b/block/rbd.c +@@ -97,12 +97,6 @@ typedef struct RBDTask { + int64_t ret; + } RBDTask; + +-typedef struct RBDDiffIterateReq { +- uint64_t offs; +- uint64_t bytes; +- bool exists; +-} RBDDiffIterateReq; +- + static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx, + BlockdevOptionsRbd *opts, bool cache, + const char *keypairs, const char *secretid, +@@ -1267,111 +1261,6 @@ static ImageInfoSpecific *qemu_rbd_get_specific_info(BlockDriverState *bs, + return spec_info; + } + +-/* +- * rbd_diff_iterate2 allows to interrupt the exection by returning a negative +- * value in the callback routine. Choose a value that does not conflict with +- * an existing exitcode and return it if we want to prematurely stop the +- * execution because we detected a change in the allocation status. +- */ +-#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000 +- +-static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len, +- int exists, void *opaque) +-{ +- RBDDiffIterateReq *req = opaque; +- +- assert(req->offs + req->bytes <= offs); +- /* +- * we do not diff against a snapshot so we should never receive a callback +- * for a hole. +- */ +- assert(exists); +- +- if (!req->exists && offs > req->offs) { +- /* +- * we started in an unallocated area and hit the first allocated +- * block. req->bytes must be set to the length of the unallocated area +- * before the allocated area. stop further processing. +- */ +- req->bytes = offs - req->offs; +- return QEMU_RBD_EXIT_DIFF_ITERATE2; +- } +- +- if (req->exists && offs > req->offs + req->bytes) { +- /* +- * we started in an allocated area and jumped over an unallocated area, +- * req->bytes contains the length of the allocated area before the +- * unallocated area. stop further processing. +- */ +- return QEMU_RBD_EXIT_DIFF_ITERATE2; +- } +- +- req->bytes += len; +- req->exists = true; +- +- return 0; +-} +- +-static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs, +- bool want_zero, int64_t offset, +- int64_t bytes, int64_t *pnum, +- int64_t *map, +- BlockDriverState **file) +-{ +- BDRVRBDState *s = bs->opaque; +- int status, r; +- RBDDiffIterateReq req = { .offs = offset }; +- uint64_t features, flags; +- +- assert(offset + bytes <= s->image_size); +- +- /* default to all sectors allocated */ +- status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID; +- *map = offset; +- *file = bs; +- *pnum = bytes; +- +- /* check if RBD image supports fast-diff */ +- r = rbd_get_features(s->image, &features); +- if (r < 0) { +- return status; +- } +- if (!(features & RBD_FEATURE_FAST_DIFF)) { +- return status; +- } +- +- /* check if RBD fast-diff result is valid */ +- r = rbd_get_flags(s->image, &flags); +- if (r < 0) { +- return status; +- } +- if (flags & RBD_FLAG_FAST_DIFF_INVALID) { +- return status; +- } +- +- r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true, +- qemu_rbd_diff_iterate_cb, &req); +- if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) { +- return status; +- } +- assert(req.bytes <= bytes); +- if (!req.exists) { +- if (r == 0) { +- /* +- * rbd_diff_iterate2 does not invoke callbacks for unallocated +- * areas. This here catches the case where no callback was +- * invoked at all (req.bytes == 0). +- */ +- assert(req.bytes == 0); +- req.bytes = bytes; +- } +- status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID; +- } +- +- *pnum = req.bytes; +- return status; +-} +- + static int64_t qemu_rbd_getlength(BlockDriverState *bs) + { + BDRVRBDState *s = bs->opaque; +@@ -1607,7 +1496,6 @@ static BlockDriver bdrv_rbd = { + #ifdef LIBRBD_SUPPORTS_WRITE_ZEROES + .bdrv_co_pwrite_zeroes = qemu_rbd_co_pwrite_zeroes, + #endif +- .bdrv_co_block_status = qemu_rbd_co_block_status, + + .bdrv_snapshot_create = qemu_rbd_snap_create, + .bdrv_snapshot_delete = qemu_rbd_snap_remove, diff --git a/debian/patches/series b/debian/patches/series index 96a525a..821d63d 100644 --- a/debian/patches/series +++ b/debian/patches/series @@ -1,8 +1,6 @@ extra/0001-monitor-qmp-fix-race-with-clients-disconnecting-earl.patch extra/0002-monitor-hmp-add-support-for-flag-argument-with-value.patch extra/0003-monitor-refactor-set-expire_password-and-allow-VNC-d.patch -extra/0004-block-rbd-fix-handling-of-holes-in-.bdrv_co_block_st.patch -extra/0005-block-rbd-workaround-for-ceph-issue-53784.patch extra/0006-block-io-Update-BSC-only-if-want_zero-is-true.patch extra/0007-block-nbd-Delete-reconnect-delay-timer-when-done.patch extra/0008-block-nbd-Assert-there-are-no-timers-when-closed.patch @@ -77,3 +75,4 @@ pve/0049-PVE-savevm-async-register-yank-before-migration_inco.patch pve/0050-qemu-img-dd-add-l-option-for-loading-a-snapshot.patch pve/0051-vma-allow-partial-restore.patch pve/0052-pbs-namespace-support.patch +pve/0053-Revert-block-rbd-implement-bdrv_co_block_status.patch -- 2.30.2