From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id A7C071FF16F
	for <inbox@lore.proxmox.com>; Thu, 16 Jan 2025 11:30:14 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 96A381CF23;
	Thu, 16 Jan 2025 11:30:11 +0100 (CET)
Message-ID: <760bc33f-0c7d-4df7-9b1d-e44f823c1df7@proxmox.com>
Date: Thu, 16 Jan 2025 11:30:03 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
 Proxmox VE development discussion <pve-devel@lists.proxmox.com>
References: <20250108130304.343460-1-f.ebner@proxmox.com>
 <3e462976-81be-4025-b7b0-b546a51c2246@proxmox.com>
Content-Language: en-US
From: Fiona Ebner <f.ebner@proxmox.com>
In-Reply-To: <3e462976-81be-4025-b7b0-b546a51c2246@proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu] add fix for crash during live
 migration in combination with block flush
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

On 15.01.25 at 17:28, Thomas Lamprecht wrote:
> On 08.01.25 at 14:03, Fiona Ebner wrote:
>> Setting blk->root is a graph change operation and thus needs to be
>> protected by the block graph write lock in blk_remove_bs(). The
>> assignment to blk->root in blk_insert_bs() is already protected by
>> the block graph write lock.
>>
>> In particular, the graph read lock in blk_co_do_flush() could
>> previously not ensure that blk_bs(blk) would always return the same
>> value during the locked section, which could lead to a segfault [0] in
>> combination with migration [1].
>>
>> From the user-provided backtraces in the forum thread [1], it seems
>> like blk_co_do_flush() managed to get past the
>> blk_co_is_available(blk) check, meaning that blk_bs(blk) returned a
>> non-NULL value during the check, but then, when calling
>> bdrv_co_flush(), blk_bs(blk) returned NULL.
>>
>> [0]:
>>
>>> 0  bdrv_primary_child (bs=bs@entry=0x0) at ../block.c:8287
>>> 1  bdrv_co_flush (bs=0x0) at ../block/io.c:2948
>>> 2  bdrv_co_flush_entry (opaque=0x7a610affae90) at block/block-gen.c:901
>>
>> [1]: https://forum.proxmox.com/threads/158072
>>
>> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
>> ---
>>
>> Upstream submission of the same patch:
>> https://lore.kernel.org/qemu-devel/20250108124649.333668-1-f.ebner@proxmox.com/T/
> 
> I only skimmed the upstream discussion, but seems that there is still some
> issue left; so should I wait this version out?

Yes, we should at least also put the "root = blk->root;" assignment into
the write lock section like the upstream maintainer suggested.
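
To illustrate the locking rule being discussed, here is a rough sketch
as a single-threaded toy model. It is not the actual QEMU code: all
toy_* names and TOY_* return codes are hypothetical, and the toy lock
returns a busy code where a real lock would wait. It only shows that
both the read of the root pointer and the assignment of NULL sit inside
the write section, so a reader that checked the pointer under the read
lock cannot see it change before it is done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this toy model only. */
enum { TOY_OK = 0, TOY_BUSY = -1, TOY_NO_MEDIUM = -2 };

/* Toy stand-ins; none of these are QEMU types or functions. */
struct toy_child { int refcnt; };
struct toy_graph_lock { int readers; int writer; };
struct toy_blk { struct toy_child *root; struct toy_graph_lock *lock; };

/* Single-threaded model: report TOY_BUSY where a real lock would block. */
static int toy_rdlock(struct toy_graph_lock *l)
{
    if (l->writer) return TOY_BUSY;
    l->readers++;
    return TOY_OK;
}

static void toy_rdunlock(struct toy_graph_lock *l)
{
    assert(l->readers > 0);
    l->readers--;
}

static int toy_wrlock(struct toy_graph_lock *l)
{
    if (l->writer || l->readers) return TOY_BUSY;
    l->writer = 1;
    return TOY_OK;
}

static void toy_wrunlock(struct toy_graph_lock *l)
{
    assert(l->writer);
    l->writer = 0;
}

/* Reader side: the availability check and the use share one read section. */
static int toy_flush_begin(struct toy_blk *blk, struct toy_child **out)
{
    int ret = toy_rdlock(blk->lock);
    if (ret != TOY_OK) return ret;
    if (blk->root == NULL) {
        toy_rdunlock(blk->lock);
        return TOY_NO_MEDIUM;
    }
    *out = blk->root;   /* stays valid until toy_flush_end() */
    return TOY_OK;
}

static void toy_flush_end(struct toy_blk *blk)
{
    toy_rdunlock(blk->lock);
}

/* Writer side: read blk->root AND clear it inside the write section. */
static int toy_remove_root(struct toy_blk *blk)
{
    struct toy_child *root;
    int ret = toy_wrlock(blk->lock);
    if (ret != TOY_OK) return ret;   /* a reader is still in its section */
    root = blk->root;
    blk->root = NULL;
    toy_wrunlock(blk->lock);
    if (root) root->refcnt--;        /* drop the reference after unlocking */
    return TOY_OK;
}

/* Walk through the interleaving; returns 0 if every step behaves as expected. */
int toy_demo(void)
{
    struct toy_graph_lock lock = { 0, 0 };
    struct toy_child child = { 1 };
    struct toy_blk blk = { &child, &lock };
    struct toy_child *seen = NULL;

    if (toy_flush_begin(&blk, &seen) != TOY_OK || seen != &child) return 1;
    /* Removal cannot change the root while the reader holds the lock. */
    if (toy_remove_root(&blk) != TOY_BUSY || blk.root != &child) return 2;
    toy_flush_end(&blk);
    if (toy_remove_root(&blk) != TOY_OK || blk.root != NULL) return 3;
    if (child.refcnt != 0) return 4;
    /* A later flush sees no root and bails out at the check. */
    if (toy_flush_begin(&blk, &seen) != TOY_NO_MEDIUM) return 5;
    return 0;
}
```

If the assignment to NULL happened outside the write section, step 2
in toy_demo() would have no lock to stop it, which is the kind of
window the backtraces suggest.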

That more complete change is in the package provided to the forum user.
The change should still be an improvement over the status quo. However,
the user reported that it didn't help with the specific crash. I don't
see any other code paths that would fit the provided backtraces right
now :/ I'll ask the user to try again with a more complete GDB script
in the hope of discovering something I missed.
