From: Fiona Ebner
To: pve-devel@lists.proxmox.com
Date: Fri, 15 Dec 2023 14:20:59 +0100
Message-Id: <20231215132059.95280-1-f.ebner@proxmox.com>
Subject: [pve-devel] [PATCH qemu] Revert "add patch to work around stuck guest IO with iothread and VirtIO block/SCSI"

This reverts commit 6b7c1815e1c89cb66ff48fbba6da69fe6d254630.

The attempted fix has been reported to cause high CPU usage after
backup [0]. The issue is not difficult to reproduce and manifests as
iothreads getting stuck in a loop. Downgrading to pve-qemu-kvm=8.1.2-4
helps; this was also verified by Christian, thanks! The issue this
patch was supposed to fix is much rarer, so revert for now, while
upstream is still working on a proper fix.
[0]: https://forum.proxmox.com/threads/138140/

Signed-off-by: Fiona Ebner
---
 ...work-around-iothread-polling-getting.patch | 66 -------------------
 debian/patches/series                         |  1 -
 2 files changed, 67 deletions(-)
 delete mode 100644 debian/patches/pve/0046-virtio-blk-scsi-work-around-iothread-polling-getting.patch

diff --git a/debian/patches/pve/0046-virtio-blk-scsi-work-around-iothread-polling-getting.patch b/debian/patches/pve/0046-virtio-blk-scsi-work-around-iothread-polling-getting.patch
deleted file mode 100644
index 3ac10a8..0000000
--- a/debian/patches/pve/0046-virtio-blk-scsi-work-around-iothread-polling-getting.patch
+++ /dev/null
@@ -1,66 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Fiona Ebner
-Date: Tue, 5 Dec 2023 14:05:49 +0100
-Subject: [PATCH] virtio blk/scsi: work around iothread polling getting stuck
- with drain
-
-When using iothread, after commits
-1665d9326f ("virtio-blk: implement BlockDevOps->drained_begin()")
-766aa2de0f ("virtio-scsi: implement BlockDevOps->drained_begin()")
-it can happen that polling gets stuck when draining. This would cause
-IO in the guest to get completely stuck.
-
-A workaround for users is stopping and resuming the vCPUs because that
-would also stop and resume the dataplanes which would kick the host
-notifiers.
-
-This can happen with block jobs like backup and drive mirror as well
-as with hotplug [2].
-
-Reports in the community forum that might be about this issue[0][1]
-and there is also one in the enterprise support channel.
-
-As a workaround in the code, just re-enable notifications and kick the
-virt queue after draining. Draining is already costly and rare, so no
-need to worry about a performance penalty here. This was taken from
-the following comment of a QEMU developer [3] (in my debugging,
-I had already found re-enabling notification to work around the issue,
-but also kicking the queue is more complete).
-
-[0]: https://forum.proxmox.com/threads/137286/
-[1]: https://forum.proxmox.com/threads/137536/
-[2]: https://issues.redhat.com/browse/RHEL-3934
-[3]: https://issues.redhat.com/browse/RHEL-3934?focusedId=23562096&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-23562096
-
-Signed-off-by: Fiona Ebner
----
- hw/block/virtio-blk.c | 2 ++
- hw/scsi/virtio-scsi.c | 2 ++
- 2 files changed, 4 insertions(+)
-
-diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
-index 39e7f23fab..22502047d5 100644
---- a/hw/block/virtio-blk.c
-+++ b/hw/block/virtio-blk.c
-@@ -1537,6 +1537,8 @@ static void virtio_blk_drained_end(void *opaque)
-     for (uint16_t i = 0; i < s->conf.num_queues; i++) {
-         VirtQueue *vq = virtio_get_queue(vdev, i);
-         virtio_queue_aio_attach_host_notifier(vq, ctx);
-+        virtio_queue_set_notification(vq, 1);
-+        virtio_queue_notify(vdev, i);
-     }
- }
-
-diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
-index 45b95ea070..a7bddbf899 100644
---- a/hw/scsi/virtio-scsi.c
-+++ b/hw/scsi/virtio-scsi.c
-@@ -1166,6 +1166,8 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
-     for (uint32_t i = 0; i < total_queues; i++) {
-         VirtQueue *vq = virtio_get_queue(vdev, i);
-         virtio_queue_aio_attach_host_notifier(vq, s->ctx);
-+        virtio_queue_set_notification(vq, 1);
-+        virtio_queue_notify(vdev, i);
-     }
- }
-
diff --git a/debian/patches/series b/debian/patches/series
index 7dcedcb..b3da8bb 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -60,4 +60,3 @@ pve/0042-Revert-block-rbd-implement-bdrv_co_block_status.patch
 pve/0043-alloc-track-fix-deadlock-during-drop.patch
 pve/0044-migration-for-snapshots-hold-the-BQL-during-setup-ca.patch
 pve/0045-savevm-async-don-t-hold-BQL-during-setup.patch
-pve/0046-virtio-blk-scsi-work-around-iothread-polling-getting.patch
-- 
2.39.2
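
[Editor's note: for affected users, the two workarounds mentioned in the
messages above boil down to the following commands. This is a sketch, not
part of the patch itself: <vmid> is a placeholder for the affected guest's
ID, and the pinned version assumes pve-qemu-kvm 8.1.2-4 is still available
in the configured repository.]

    # Downgrade to the last known-good build (version taken from the
    # revert message; running guests pick it up after a restart).
    apt install pve-qemu-kvm=8.1.2-4

    # Or, for the original stuck-IO symptom: pause and resume the guest's
    # vCPUs, which also stops/resumes the dataplanes and kicks the host
    # notifiers. <vmid> is a placeholder.
    qm suspend <vmid>
    qm resume <vmid>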