From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 662FD90D57
 for <pve-devel@lists.proxmox.com>; Thu, 25 Jan 2024 15:41:55 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 47AD81B9AD
 for <pve-devel@lists.proxmox.com>; Thu, 25 Jan 2024 15:41:55 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-devel@lists.proxmox.com>; Thu, 25 Jan 2024 15:41:54 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id A31E6492BA
 for <pve-devel@lists.proxmox.com>; Thu, 25 Jan 2024 15:41:53 +0100 (CET)
From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Thu, 25 Jan 2024 15:41:39 +0100
Message-Id: <20240125144149.216064-4-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240125144149.216064-1-f.ebner@proxmox.com>
References: <20240125144149.216064-1-f.ebner@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.075 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 T_SCC_BODY_TEXT_LINE    -0.01 -
Subject: [pve-devel] [PATCH qemu 03/13] block/io: clear BDRV_BLOCK_RECURSE
 flag after recursing in bdrv_co_block_status
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Thu, 25 Jan 2024 14:41:55 -0000

Using fleecing backup like in [0] on a qcow2 image (with metadata
preallocation) can lead to the following assertion failure:

> bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.

In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
will be set by the qcow2 driver, so the caller will recursively check
the file child. Then the BDRV_BLOCK_ZERO set too. Later up the call
chain, in bdrv_co_do_block_status() for the snapshot-access driver,
the assertion failure will happen, because both flags are set.

To fix it, clear the recurse flag after the recursive check was done.

In detail:

> #0  qcow2_co_block_status

Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID.

> #1  bdrv_co_do_block_status

Because of the data flag, bdrv_co_do_block_status() will now also set
BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
bdrv_co_do_block_status() for the bdrv_file child will be called,
which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.

Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.

> #2  bdrv_co_common_block_status_above
> #3  bdrv_co_block_status_above
> #4  bdrv_co_block_status
> #5  cbw_co_snapshot_block_status
> #6  bdrv_co_snapshot_block_status
> #7  snapshot_access_co_block_status
> #8  bdrv_co_do_block_status

Return value is propagated all the way up to here, where the assertion
failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
both set.

> #9  bdrv_co_common_block_status_above
> #10 bdrv_co_block_status_above
> #11 block_copy_block_status
> #12 block_copy_dirty_clusters
> #13 block_copy_common
> #14 block_copy_async_co_entry
> #15 coroutine_trampoline

[0]:

> #!/bin/bash
> rm /tmp/disk.qcow2
> ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
> ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
> --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node1", "sync": "full", "job-id": "backup0" } }
> EOF

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

Can be added as an extra/ patch.

 block/io.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/io.c b/block/io.c
index 63f7b3ad3e..24a3c84c93 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2589,6 +2589,16 @@ bdrv_co_block_status(BlockDriverState *bs, bool want_zero,
                 ret |= (ret2 & BDRV_BLOCK_ZERO);
             }
         }
+
+        /*
+         * Now that the recursive search was done, clear the flag. Otherwise,
+         * with more complicated block graphs like snapshot-access ->
+         * copy-before-write -> qcow2, where the return value will be propagated
+         * further up to a parent bdrv_co_do_block_status() call, both the
+         * BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO flags would be set, which is
+         * not allowed.
+         */
+        ret &= ~BDRV_BLOCK_RECURSE;
     }
 
 out:
-- 
2.39.2