public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH qemu] add fix for crash during live migration in combination with block flush
From: Fiona Ebner @ 2025-01-08 13:03 UTC
  To: pve-devel

Setting blk->root is a graph change operation and thus needs to be
protected by the block graph write lock in blk_remove_bs(). The
assignment to blk->root in blk_insert_bs() is already protected by
the block graph write lock.

In particular, the graph read lock in blk_co_do_flush() could
previously not ensure that blk_bs(blk) would always return the same
value during the locked section, which could lead to a segfault [0] in
combination with migration [1].

From the user-provided backtraces in the forum thread [1], it seems
like blk_co_do_flush() managed to get past the
blk_co_is_available(blk) check, meaning that blk_bs(blk) returned a
non-NULL value during the check, but then, when calling
bdrv_co_flush(), blk_bs(blk) returned NULL.

[0]:

> 0  bdrv_primary_child (bs=bs@entry=0x0) at ../block.c:8287
> 1  bdrv_co_flush (bs=0x0) at ../block/io.c:2948
> 2  bdrv_co_flush_entry (opaque=0x7a610affae90) at block/block-gen.c:901

[1]: https://forum.proxmox.com/threads/158072

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Upstream submission of the same patch:
https://lore.kernel.org/qemu-devel/20250108124649.333668-1-f.ebner@proxmox.com/T/

 ...otect-setting-block-root-to-NULL-wit.patch | 51 +++++++++++++++++++
 debian/patches/series                         |  1 +
 2 files changed, 52 insertions(+)
 create mode 100644 debian/patches/extra/0007-block-backend-protect-setting-block-root-to-NULL-wit.patch

diff --git a/debian/patches/extra/0007-block-backend-protect-setting-block-root-to-NULL-wit.patch b/debian/patches/extra/0007-block-backend-protect-setting-block-root-to-NULL-wit.patch
new file mode 100644
index 0000000..7ff996f
--- /dev/null
+++ b/debian/patches/extra/0007-block-backend-protect-setting-block-root-to-NULL-wit.patch
@@ -0,0 +1,51 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Fiona Ebner <f.ebner@proxmox.com>
+Date: Wed, 8 Jan 2025 12:41:20 +0100
+Subject: [PATCH] block-backend: protect setting block root to NULL with block
+ graph write lock
+
+Setting blk->root is a graph change operation and thus needs to be
+protected by the block graph write lock in blk_remove_bs(). The
+assignment to blk->root in blk_insert_bs() is already protected by
+the block graph write lock.
+
+In particular, the graph read lock in blk_co_do_flush() could
+previously not ensure that blk_bs(blk) would always return the same
+value during the locked section, which could lead to a segfault [0] in
+combination with migration [1].
+
+From the user-provided backtraces in the forum thread [1], it seems
+like blk_co_do_flush() managed to get past the
+blk_co_is_available(blk) check, meaning that blk_bs(blk) returned a
+non-NULL value during the check, but then, when calling
+bdrv_co_flush(), blk_bs(blk) returned NULL.
+
+[0]:
+
+> 0  bdrv_primary_child (bs=bs@entry=0x0) at ../block.c:8287
+> 1  bdrv_co_flush (bs=0x0) at ../block/io.c:2948
+> 2  bdrv_co_flush_entry (opaque=0x7a610affae90) at block/block-gen.c:901
+
+[1]: https://forum.proxmox.com/threads/158072
+
+Cc: qemu-stable@nongnu.org
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
+---
+ block/block-backend.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/block/block-backend.c b/block/block-backend.c
+index db6f9b92a3..68ae681139 100644
+--- a/block/block-backend.c
++++ b/block/block-backend.c
+@@ -896,9 +896,9 @@ void blk_remove_bs(BlockBackend *blk)
+      */
+     blk_drain(blk);
+     root = blk->root;
+-    blk->root = NULL;
+ 
+     bdrv_graph_wrlock();
++    blk->root = NULL;
+     bdrv_root_unref_child(root);
+     bdrv_graph_wrunlock();
+ }
diff --git a/debian/patches/series b/debian/patches/series
index 0b48878..18bf974 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -4,6 +4,7 @@ extra/0003-ide-avoid-potential-deadlock-when-draining-during-tr.patch
 extra/0004-Revert-x86-acpi-workaround-Windows-not-handling-name.patch
 extra/0005-virtio-net-Add-queues-before-loading-them.patch
 extra/0006-virtio-net-Fix-size-check-in-dhclient-workaround.patch
+extra/0007-block-backend-protect-setting-block-root-to-NULL-wit.patch
 bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
 bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
 bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
-- 
2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



* Re: [pve-devel] [PATCH qemu] add fix for crash during live migration in combination with block flush
From: Thomas Lamprecht @ 2025-01-15 16:28 UTC
  To: Proxmox VE development discussion, Fiona Ebner

On 08.01.25 at 14:03, Fiona Ebner wrote:
> Setting blk->root is a graph change operation and thus needs to be
> protected by the block graph write lock in blk_remove_bs(). The
> assignment to blk->root in blk_insert_bs() is already protected by
> the block graph write lock.
> 
> In particular, the graph read lock in blk_co_do_flush() could
> previously not ensure that blk_bs(blk) would always return the same
> value during the locked section, which could lead to a segfault [0] in
> combination with migration [1].
> 
> From the user-provided backtraces in the forum thread [1], it seems
> like blk_co_do_flush() managed to get past the
> blk_co_is_available(blk) check, meaning that blk_bs(blk) returned a
> non-NULL value during the check, but then, when calling
> bdrv_co_flush(), blk_bs(blk) returned NULL.
> 
> [0]:
> 
>> 0  bdrv_primary_child (bs=bs@entry=0x0) at ../block.c:8287
>> 1  bdrv_co_flush (bs=0x0) at ../block/io.c:2948
>> 2  bdrv_co_flush_entry (opaque=0x7a610affae90) at block/block-gen.c:901
> 
> [1]: https://forum.proxmox.com/threads/158072
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> Upstream submission of the same patch:
> https://lore.kernel.org/qemu-devel/20250108124649.333668-1-f.ebner@proxmox.com/T/

I only skimmed the upstream discussion, but it seems that there is still
some issue left; so should I skip this version and wait for the next one?



* Re: [pve-devel] [PATCH qemu] add fix for crash during live migration in combination with block flush
From: Fiona Ebner @ 2025-01-16 10:30 UTC
  To: Thomas Lamprecht, Proxmox VE development discussion

On 15.01.25 at 17:28, Thomas Lamprecht wrote:
> On 08.01.25 at 14:03, Fiona Ebner wrote:
>> Setting blk->root is a graph change operation and thus needs to be
>> protected by the block graph write lock in blk_remove_bs(). The
>> assignment to blk->root in blk_insert_bs() is already protected by
>> the block graph write lock.
>>
>> In particular, the graph read lock in blk_co_do_flush() could
>> previously not ensure that blk_bs(blk) would always return the same
>> value during the locked section, which could lead to a segfault [0] in
>> combination with migration [1].
>>
>> From the user-provided backtraces in the forum thread [1], it seems
>> like blk_co_do_flush() managed to get past the
>> blk_co_is_available(blk) check, meaning that blk_bs(blk) returned a
>> non-NULL value during the check, but then, when calling
>> bdrv_co_flush(), blk_bs(blk) returned NULL.
>>
>> [0]:
>>
>>> 0  bdrv_primary_child (bs=bs@entry=0x0) at ../block.c:8287
>>> 1  bdrv_co_flush (bs=0x0) at ../block/io.c:2948
>>> 2  bdrv_co_flush_entry (opaque=0x7a610affae90) at block/block-gen.c:901
>>
>> [1]: https://forum.proxmox.com/threads/158072
>>
>> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
>> ---
>>
>> Upstream submission of the same patch:
>> https://lore.kernel.org/qemu-devel/20250108124649.333668-1-f.ebner@proxmox.com/T/
> 
> I only skimmed the upstream discussion, but it seems that there is still
> some issue left; so should I skip this version and wait for the next one?

Yes, we should at least also put the "root = blk->root;" assignment into
the write lock section like the upstream maintainer suggested.

That more complete change is in the package provided to the forum user.
The change should still be an improvement over the status quo. However,
the user reported that it didn't help with the specific crash. I don't
see other code paths that would fit the provided backtraces right now :/
I'll ask the user to try again with a more complete GDB script in the
hope of discovering something I missed.



* Re: [pve-devel] [PATCH qemu] add fix for crash during live migration in combination with block flush
From: Thomas Lamprecht @ 2025-01-16 13:10 UTC
  To: Proxmox VE development discussion, Fiona Ebner

On 16.01.25 at 11:30, Fiona Ebner wrote:
> That more complete change is in the package provided to the forum user.
> The change should still be an improvement over the status quo. However,
> the user reported that it didn't help with the specific crash. I don't
> see other code paths that would fit the provided backtraces right now :/
> I'll ask the user to try again with a more complete GDB script in the
> hope of discovering something I missed.

Ack, and maybe also ensure they actually restarted the VM after
installing the package ^^

