From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH qemu-server v2 6/8] fix #6543: use qcow2 'discard-no-unref' option when using snapshot-as-volume-chain
Date: Fri, 25 Jul 2025 12:50:53 +0200
Message-ID: <20250725105109.54093-7-f.ebner@proxmox.com>
In-Reply-To: <20250725105109.54093-1-f.ebner@proxmox.com>

Without the 'discard-no-unref' option, a qcow2 file can grow beyond
what 'qemu-img measure' reports, because of fragmentation. This can
lead to IO errors with qcow2 on top of LVM storages, where the
containing LV is allocated with that size. For now, only enable the
option if 'snapshot-as-volume-chain' is set in the storage
configuration. Enabling it unconditionally should be evaluated further
and tested on different storages first. It is a runtime-only option
that only affects how cluster references are handled during discard in
qcow2, so it is also fine for existing images and migration streams.

While 'snapshot-as-volume-chain' is not a perfect proxy, as it is not
limited to LVM, it is an experimental feature that covers the LVM
case, and it seems like a good fit for trying out the new option on
file-based storages too.

Suggested-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
* add missing qcow2 format check for qemu_img
* fix mocking File::stat, expected test output was wrong
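
For reviewers who want to sanity-check the expected test output, here is
a minimal standalone sketch of the idea behind the new helper and the
File::stat mock. It is not the code from the patch: the function name
target_image_opts_for_mode, the VM ID 100 and the /tmp path are
hypothetical and only serve as illustration. It assumes that a raw
st_mode value is passed in directly instead of calling stat on a real
path.

```perl
use strict;
use warnings;

use Fcntl qw(S_ISBLK);

# Hypothetical helper (not part of the patch): pick the protocol driver
# based on a raw st_mode value and build a --target-image-opts string.
sub target_image_opts_for_mode {
    my ($path, $mode, @qcow2_opts) = @_;

    # Block devices (e.g. LVs) need 'host_device', regular files need 'file'.
    my $driver = S_ISBLK($mode) ? 'host_device' : 'file';

    # Joining everything in one list avoids a stray comma when no extra
    # qcow2 options are passed.
    my $opts = join(',', 'driver=qcow2', @qcow2_opts);

    return "$opts,file.driver=$driver,file.filename=$path";
}

# 25008 is the mode the test mock assigns for paths below /dev/.
my $mocked_mode = 25008;
printf "mode %d is octal %o, block device: %s\n",
    $mocked_mode, $mocked_mode, S_ISBLK($mocked_mode) ? 'yes' : 'no';

# Assumed example paths, one block device and one regular file.
print target_image_opts_for_mode(
    '/dev/pve/vm-100-disk-0.qcow2', $mocked_mode, 'discard-no-unref=true',
), "\n";
print target_image_opts_for_mode(
    '/tmp/example.qcow2', 0100644, 'discard-no-unref=true',
), "\n";
```

The strings printed by the last two calls have the same shape as the
last argument expected by the two updated convert tests, with
file.driver being 'host_device' for the block device mode and 'file'
for the regular file mode.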

 src/PVE/QemuServer/Blockdev.pm                |  7 ++++++
 src/PVE/QemuServer/QemuImage.pm               | 20 +++++++++++++++++
 src/test/cfg2cmd/simple-backingchain.conf.cmd |  2 +-
 src/test/run_qemu_img_convert_tests.pl        | 22 ++++++++++++++-----
 4 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/PVE/QemuServer/Blockdev.pm b/src/PVE/QemuServer/Blockdev.pm
index 8528a587..1487bc99 100644
--- a/src/PVE/QemuServer/Blockdev.pm
+++ b/src/PVE/QemuServer/Blockdev.pm
@@ -372,6 +372,13 @@ my sub generate_format_blockdev {
         $blockdev->{size} = int($options->{size});
     }
 
+    # see bug #6543: without this option, fragmentation can lead to the qcow2 file growing larger
+    # than what qemu-img measure reports, which is problematic for qcow2-on-top-of-LVM
+    # TODO test and consider enabling this in general
+    if ($scfg && $scfg->{'snapshot-as-volume-chain'}) {
+        $blockdev->{'discard-no-unref'} = JSON::true if $format eq 'qcow2';
+    }
+
     return $blockdev;
 }
 
diff --git a/src/PVE/QemuServer/QemuImage.pm b/src/PVE/QemuServer/QemuImage.pm
index 026c24e9..804cbc8e 100644
--- a/src/PVE/QemuServer/QemuImage.pm
+++ b/src/PVE/QemuServer/QemuImage.pm
@@ -3,6 +3,9 @@ package PVE::QemuServer::QemuImage;
 use strict;
 use warnings;
 
+use Fcntl qw(S_ISBLK);
+use File::stat;
+
 use PVE::Format qw(render_bytes);
 use PVE::Storage;
 use PVE::Tools;
@@ -27,6 +30,18 @@ sub convert_iscsi_path {
     die "cannot convert iscsi path '$path', unknown format\n";
 }
 
+my sub qcow2_target_image_opts {
+    my ($path, @qcow2_opts) = @_;
+
+    my $st = File::stat::stat($path) or die "stat for '$path' failed - $!\n";
+
+    my $driver = S_ISBLK($st->mode) ? 'host_device' : 'file';
+
+    my $qcow2_opts_str = ',' . join(',', @qcow2_opts);
+
+    return "driver=qcow2$qcow2_opts_str,file.driver=$driver,file.filename=$path";
+}
+
 # The possible options are:
 # bwlimit - The bandwidth limit in KiB/s.
 # is-zero-initialized - If the destination image is zero-initialized.
@@ -71,6 +86,8 @@ sub convert {
     my $dst_format = checked_volume_format($storecfg, $dst_volid);
     my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
     my $dst_is_iscsi = ($dst_path =~ m|^iscsi://|);
+    my $dst_needs_discard_no_unref =
+        $dst_scfg->{'snapshot-as-volume-chain'} && $dst_format eq 'qcow2';
     my $support_qemu_snapshots = PVE::Storage::volume_qemu_snapshot_method($storecfg, $src_volid);
 
     my $cmd = [];
@@ -94,6 +111,9 @@ sub convert {
     if ($dst_is_iscsi) {
         push @$cmd, '--target-image-opts';
         $dst_path = convert_iscsi_path($dst_path);
+    } elsif ($dst_needs_discard_no_unref) {
+        push @$cmd, '--target-image-opts';
+        $dst_path = qcow2_target_image_opts($dst_path, 'discard-no-unref=true');
     } else {
         push @$cmd, '-O', $dst_format;
     }
diff --git a/src/test/cfg2cmd/simple-backingchain.conf.cmd b/src/test/cfg2cmd/simple-backingchain.conf.cmd
index cae2ad4b..4ac24b93 100644
--- a/src/test/cfg2cmd/simple-backingchain.conf.cmd
+++ b/src/test/cfg2cmd/simple-backingchain.conf.cmd
@@ -26,7 +26,7 @@
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
   -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
-  -blockdev '{"detect-zeroes":"on","discard":"ignore","driver":"throttle","file":{"backing":{"backing":{"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vzsnapext/images/8006/snap1-vm-8006-disk-0.qcow2","node-name":"ea91a385a49a008a4735c0aec5c6749","read-only":false},"node-name":"fa91a385a49a008a4735c0aec5c6749","read-only":false},"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vzsnapext/images/8006/snap2-vm-8006-disk-0.qcow2","node-name":"ec0289317073959d450248d8cd7a480","read-only":false},"node-name":"fc0289317073959d450248d8cd7a480","read-only":false},"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vzsnapext/images/8006/vm-8006-disk-0.qcow2","node-name":"e74f4959037afb46eddc7313c43dfdd","read-only":false},"node-name":"f74f4959037afb46eddc7313c43dfdd","read-only":false},"node-name":"drive-scsi0","read-only":false,"throttle-group":"throttle-drive-scsi0"}' \
+  -blockdev '{"detect-zeroes":"on","discard":"ignore","driver":"throttle","file":{"backing":{"backing":{"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","discard-no-unref":true,"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vzsnapext/images/8006/snap1-vm-8006-disk-0.qcow2","node-name":"ea91a385a49a008a4735c0aec5c6749","read-only":false},"node-name":"fa91a385a49a008a4735c0aec5c6749","read-only":false},"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","discard-no-unref":true,"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vzsnapext/images/8006/snap2-vm-8006-disk-0.qcow2","node-name":"ec0289317073959d450248d8cd7a480","read-only":false},"node-name":"fc0289317073959d450248d8cd7a480","read-only":false},"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","discard-no-unref":true,"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vzsnapext/images/8006/vm-8006-disk-0.qcow2","node-name":"e74f4959037afb46eddc7313c43dfdd","read-only":false},"node-name":"f74f4959037afb46eddc7313c43dfdd","read-only":false},"node-name":"drive-scsi0","read-only":false,"throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,device_id=drive-scsi0,write-cache=on' \
   -blockdev '{"detect-zeroes":"on","discard":"ignore","driver":"throttle","file":{"backing":{"backing":{"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"qcow2","file":{"aio":"native","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"host_device","filename":"/dev/veegee/snap1-vm-8006-disk-0.qcow2","node-name":"e25f58d3e6e11f2065ad41253988915","read-only":false},"node-name":"f25f58d3e6e11f2065ad41253988915","read-only":false},"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"qcow2","file":{"aio":"native","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"host_device","filename":"/dev/veegee/snap2-vm-8006-disk-0.qcow2","node-name":"e9415bb5e484c1e25d25063b01686fe","read-only":false},"node-name":"f9415bb5e484c1e25d25063b01686fe","read-only":false},"cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"qcow2","file":{"aio":"native","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"host_device","filename":"/dev/veegee/vm-8006-disk-0.qcow2","node-name":"e87358a470ca311f94d5cc61d1eb428","read-only":false},"node-name":"f87358a470ca311f94d5cc61d1eb428","read-only":false},"node-name":"drive-scsi1","read-only":false,"throttle-group":"throttle-drive-scsi1"}' \
   -device 'scsi-hd,bus=scsihw0.0,scsi-id=1,drive=drive-scsi1,id=scsi1,device_id=drive-scsi1,write-cache=on' \
diff --git a/src/test/run_qemu_img_convert_tests.pl b/src/test/run_qemu_img_convert_tests.pl
index 64c98327..3c8f09f0 100755
--- a/src/test/run_qemu_img_convert_tests.pl
+++ b/src/test/run_qemu_img_convert_tests.pl
@@ -542,10 +542,10 @@ my $tests = [
             "-n",
             "-f",
             "raw",
-            "-O",
-            "qcow2",
+            "--target-image-opts",
             "/var/lib/vz/images/$vmid/vm-$vmid-disk-0.raw",
-            "/var/lib/vzsnapext/images/$vmid/vm-$vmid-disk-0.qcow2",
+            "driver=qcow2,discard-no-unref=true,file.driver=file,"
+                . "file.filename=/var/lib/vzsnapext/images/$vmid/vm-$vmid-disk-0.qcow2",
         ],
     },
     {
@@ -560,16 +560,26 @@ my $tests = [
             "-n",
             "-f",
             "raw",
-            "-O",
-            "qcow2",
+            "--target-image-opts",
             "/var/lib/vz/images/$vmid/vm-$vmid-disk-0.raw",
-            "/dev/pve/vm-$vmid-disk-0.qcow2",
+            "driver=qcow2,discard-no-unref=true,file.driver=host_device,"
+                . "file.filename=/dev/pve/vm-$vmid-disk-0.qcow2",
         ],
     },
 ];
 
 my $command;
 
+my $file_stat_module = Test::MockModule->new("File::stat");
+$file_stat_module->mock(
+    stat => sub {
+        my ($path) = @_;
+        my $st = $file_stat_module->original('stat')->('./run_qemu_img_convert_tests.pl');
+        $st->[2] = 25008 if $path =~ m!/dev/!; # block device
+        return $st;
+    },
+);
+
 my $storage_module = Test::MockModule->new("PVE::Storage");
 $storage_module->mock(
     config => sub {
-- 
2.47.2






Thread overview: 10+ messages
2025-07-25 10:50 [pve-devel] [PATCH-SERIES qemu-server v2 0/8] blockdev and snapshot-as-volume-chain on LVM fixes Fiona Ebner
2025-07-25 10:50 ` [pve-devel] [PATCH qemu-server v2 1/8] blockdev: helper to add common options Fiona Ebner
2025-07-25 10:50 ` [pve-devel] [PATCH qemu-server v2 2/8] blockdev: fix discard Fiona Ebner
2025-07-25 10:50 ` [pve-devel] [PATCH qemu-server v2 3/8] tests: image convert: avoid hard-coded VM ID in result Fiona Ebner
2025-07-25 10:50 ` [pve-devel] [PATCH qemu-server v2 4/8] tests: image convert: properly set snapshot-as-volume-chain option Fiona Ebner
2025-07-25 10:50 ` [pve-devel] [PATCH qemu-server v2 5/8] tests: image convert: add tests where storages with 'snapshot-as-volume-chain' are the target Fiona Ebner
2025-07-25 10:50 ` Fiona Ebner [this message]
2025-07-25 10:50 ` [pve-devel] [PATCH qemu-server v2 7/8] tests: cfg2cmd: improve mocking File::stat Fiona Ebner
2025-07-25 10:50 ` [pve-devel] [PATCH qemu-server v2 8/8] image convert: re-use generate_drive_blockdev() Fiona Ebner
2025-07-25 12:15 ` [pve-devel] applied-series: [PATCH-SERIES qemu-server v2 0/8] blockdev and snapshot-as-volume-chain on LVM fixes Fabian Grünbichler
