From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 3BAB09B2BD
 for <pve-devel@lists.proxmox.com>; Tue, 17 Oct 2023 14:10:54 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 19C0132AE8
 for <pve-devel@lists.proxmox.com>; Tue, 17 Oct 2023 14:10:24 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-devel@lists.proxmox.com>; Tue, 17 Oct 2023 14:10:20 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id B47FF41B3D
 for <pve-devel@lists.proxmox.com>; Tue, 17 Oct 2023 14:10:17 +0200 (CEST)
From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Tue, 17 Oct 2023 14:10:10 +0200
Message-Id: <20231017121012.132636-6-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231017121012.132636-1-f.ebner@proxmox.com>
References: <20231017121012.132636-1-f.ebner@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.086 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [nongnu.org]
Subject: [pve-devel] [PATCH v3 qemu 5/7] add patch to disable graph locking
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 17 Oct 2023 12:10:54 -0000

There are still some issues with graph locking, e.g. deadlocks during
backup canceling [0] and initial attempts to fix it didn't work [1].
Because the AioContext locks still exist, it should still be safe to
disable graph locking.

[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-09/msg00729.html
[1]: https://lists.nongnu.org/archive/html/qemu-devel/2023-09/msg06905.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes in v3.

 ...t-graph-lock-Disable-locking-for-now.patch | 140 ++++++++++++++++++
 debian/patches/series                         |   1 +
 2 files changed, 141 insertions(+)
 create mode 100644 debian/patches/extra/0006-Revert-Revert-graph-lock-Disable-locking-for-now.patch

diff --git a/debian/patches/extra/0006-Revert-Revert-graph-lock-Disable-locking-for-now.patch b/debian/patches/extra/0006-Revert-Revert-graph-lock-Disable-locking-for-now.patch
new file mode 100644
index 0000000..f0648d2
--- /dev/null
+++ b/debian/patches/extra/0006-Revert-Revert-graph-lock-Disable-locking-for-now.patch
@@ -0,0 +1,140 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Fiona Ebner <f.ebner@proxmox.com>
+Date: Thu, 28 Sep 2023 10:07:03 +0200
+Subject: [PATCH] Revert "Revert "graph-lock: Disable locking for now""
+
+This reverts commit 3cce22defb4b0e47cf135444e30cc673cff5ebad.
+
+There are still some issues with graph locking, e.g. deadlocks during
+backup canceling [0]. Because the AioContext locks still exist, it
+should be safe to disable locking again.
+
+From the original 80fc5d2600 ("graph-lock: Disable locking for now"):
+
+> We don't currently rely on graph locking yet. It is supposed to replace
+> the AioContext lock eventually to enable multiqueue support, but as long
+> as we still have the AioContext lock, it is sufficient without the graph
+> lock. Once the AioContext lock goes away, the deadlock doesn't exist any
+> more either and this commit can be reverted. (Of course, it can also be
+> reverted while the AioContext lock still exists if the callers have been
+> fixed.)
+
+[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-09/msg00729.html
+
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
+---
+ block/graph-lock.c | 24 ++++++++++++++++++++++++
+ 1 file changed, 24 insertions(+)
+
+diff --git a/block/graph-lock.c b/block/graph-lock.c
+index 5e66f01ae8..5c2873262a 100644
+--- a/block/graph-lock.c
++++ b/block/graph-lock.c
+@@ -30,8 +30,10 @@ BdrvGraphLock graph_lock;
+ /* Protects the list of aiocontext and orphaned_reader_count */
+ static QemuMutex aio_context_list_lock;
+ 
++#if 0
+ /* Written and read with atomic operations. */
+ static int has_writer;
++#endif
+ 
+ /*
+  * A reader coroutine could move from an AioContext to another.
+@@ -88,6 +90,7 @@ void unregister_aiocontext(AioContext *ctx)
+     g_free(ctx->bdrv_graph);
+ }
+ 
++#if 0
+ static uint32_t reader_count(void)
+ {
+     BdrvGraphRWlock *brdv_graph;
+@@ -105,12 +108,19 @@ static uint32_t reader_count(void)
+     assert((int32_t)rd >= 0);
+     return rd;
+ }
++#endif
+ 
+ void bdrv_graph_wrlock(BlockDriverState *bs)
+ {
++#if 0
+     AioContext *ctx = NULL;
+ 
+     GLOBAL_STATE_CODE();
++    /*
++     * TODO Some callers hold an AioContext lock when this is called, which
++     * causes deadlocks. Reenable once the AioContext locking is cleaned up (or
++     * AioContext locks are gone).
++     */
+     assert(!qatomic_read(&has_writer));
+ 
+     /*
+@@ -158,11 +168,13 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
+     if (ctx) {
+         aio_context_acquire(bdrv_get_aio_context(bs));
+     }
++#endif
+ }
+ 
+ void bdrv_graph_wrunlock(void)
+ {
+     GLOBAL_STATE_CODE();
++#if 0
+     QEMU_LOCK_GUARD(&aio_context_list_lock);
+     assert(qatomic_read(&has_writer));
+ 
+@@ -174,10 +186,13 @@ void bdrv_graph_wrunlock(void)
+ 
+     /* Wake up all coroutine that are waiting to read the graph */
+     qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
++#endif
+ }
+ 
+ void coroutine_fn bdrv_graph_co_rdlock(void)
+ {
++    /* TODO Reenable when wrlock is reenabled */
++#if 0
+     BdrvGraphRWlock *bdrv_graph;
+     bdrv_graph = qemu_get_current_aio_context()->bdrv_graph;
+ 
+@@ -237,10 +252,12 @@ void coroutine_fn bdrv_graph_co_rdlock(void)
+             qemu_co_queue_wait(&reader_queue, &aio_context_list_lock);
+         }
+     }
++#endif
+ }
+ 
+ void coroutine_fn bdrv_graph_co_rdunlock(void)
+ {
++#if 0
+     BdrvGraphRWlock *bdrv_graph;
+     bdrv_graph = qemu_get_current_aio_context()->bdrv_graph;
+ 
+@@ -258,6 +275,7 @@ void coroutine_fn bdrv_graph_co_rdunlock(void)
+     if (qatomic_read(&has_writer)) {
+         aio_wait_kick();
+     }
++#endif
+ }
+ 
+ void bdrv_graph_rdlock_main_loop(void)
+@@ -275,13 +293,19 @@ void bdrv_graph_rdunlock_main_loop(void)
+ void assert_bdrv_graph_readable(void)
+ {
+     /* reader_count() is slow due to aio_context_list_lock lock contention */
++    /* TODO Reenable when wrlock is reenabled */
++#if 0
+ #ifdef CONFIG_DEBUG_GRAPH_LOCK
+     assert(qemu_in_main_thread() || reader_count());
+ #endif
++#endif
+ }
+ 
+ void assert_bdrv_graph_writable(void)
+ {
+     assert(qemu_in_main_thread());
++    /* TODO Reenable when wrlock is reenabled */
++#if 0
+     assert(qatomic_read(&has_writer));
++#endif
+ }
diff --git a/debian/patches/series b/debian/patches/series
index 01d4d3c..6d681da 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -3,6 +3,7 @@ extra/0002-scsi-megasas-Internal-cdbs-have-16-byte-length.patch
 extra/0003-ide-avoid-potential-deadlock-when-draining-during-tr.patch
 extra/0004-migration-block-dirty-bitmap-fix-loading-bitmap-when.patch
 extra/0005-hw-ide-reset-cancel-async-DMA-operation-before-reset.patch
+extra/0006-Revert-Revert-graph-lock-Disable-locking-for-now.patch
 bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
 bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
 bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
-- 
2.39.2