public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH v4 pve-qemu 1/1] add block-commit-replaces option patch
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 10097 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 pve-qemu 1/1] add block-commit-replaces option patch
Date: Tue, 11 Mar 2025 11:28:49 +0100
Message-ID: <20250311102905.2680524-2-alexandre.derumier@groupe-cyllene.com>

This is needed for live commit of external snapshots
when the top block node is not the format node
(in our case, the throttle-group node is the top node).

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
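Illustration only, not part of the patch: a minimal Perl sketch of how a
caller might build the QMP block-commit arguments with the new 'replaces'
member. The node names ('drive-scsi0', 'f-top', 'f-base'), the job id and
the helper name are hypothetical. Passing the active format node as
'replaces' is an assumption drawn from the commit message, not something
this patch enforces.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: builds a QMP block-commit command where the node
# to be replaced is the format node below the throttle (root) node.
sub build_block_commit_cmd {
    my ($device, $top_node, $base_node) = @_;

    return {
        execute   => 'block-commit',
        arguments => {
            'job-id'    => "commit-$device",  # hypothetical job id scheme
            device      => $device,           # root node (throttle node in this series)
            'top-node'  => $top_node,
            'base-node' => $base_node,
            replaces    => $top_node,         # assumption: replace the format node, not the root
        },
    };
}

my $cmd = build_block_commit_cmd('drive-scsi0', 'f-top', 'f-base');
print JSON::PP->new->canonical->encode($cmd), "\n";
```

Callers that do not need the option can leave 'replaces' out, since the
QAPI member is optional.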
 ...051-block-commit-add-replaces-option.patch | 137 ++++++++++++++++++
 debian/patches/series                         |   1 +
 2 files changed, 138 insertions(+)
 create mode 100644 debian/patches/pve/0051-block-commit-add-replaces-option.patch

diff --git a/debian/patches/pve/0051-block-commit-add-replaces-option.patch b/debian/patches/pve/0051-block-commit-add-replaces-option.patch
new file mode 100644
index 0000000..2488b5b
--- /dev/null
+++ b/debian/patches/pve/0051-block-commit-add-replaces-option.patch
@@ -0,0 +1,137 @@
+From ae39fd3bb72db440cf380978af9bf5693c12ac6c Mon Sep 17 00:00:00 2001
+From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
+Date: Wed, 11 Dec 2024 16:20:25 +0100
+Subject: [PATCH] block-commit: add replaces option
+
+Live commit uses the same code as drive-mirror, but the replaces option
+is currently not passed through.
+
+Allow replacing a node other than the root node after the block-commit
+(as we use the throttle-group node as root, and not the drive).
+---
+ block/mirror.c                         | 4 ++--
+ block/replication.c                    | 2 +-
+ blockdev.c                             | 4 ++--
+ include/block/block_int-global-state.h | 4 +++-
+ qapi/block-core.json                   | 5 ++++-
+ qemu-img.c                             | 2 +-
+ 6 files changed, 13 insertions(+), 8 deletions(-)
+
+diff --git a/block/mirror.c b/block/mirror.c
+index 2f12238..1a5e528 100644
+--- a/block/mirror.c
++++ b/block/mirror.c
+@@ -2086,7 +2086,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
+                               int64_t speed, BlockdevOnError on_error,
+                               const char *filter_node_name,
+                               BlockCompletionFunc *cb, void *opaque,
+-                              bool auto_complete, Error **errp)
++                              bool auto_complete, const char *replaces, Error **errp)
+ {
+     bool base_read_only;
+     BlockJob *job;
+@@ -2102,7 +2102,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
+     }
+ 
+     job = mirror_start_job(
+-                     job_id, bs, creation_flags, base, NULL, speed, 0, 0,
++                     job_id, bs, creation_flags, base, replaces, speed, 0, 0,
+                      MIRROR_LEAVE_BACKING_CHAIN, false,
+                      on_error, on_error, true, cb, opaque,
+                      &commit_active_job_driver, MIRROR_SYNC_MODE_FULL,
+diff --git a/block/replication.c b/block/replication.c
+index 0415a5e..debbe25 100644
+--- a/block/replication.c
++++ b/block/replication.c
+@@ -711,7 +711,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
+         s->commit_job = commit_active_start(
+                             NULL, bs->file->bs, s->secondary_disk->bs,
+                             JOB_INTERNAL, 0, BLOCKDEV_ON_ERROR_REPORT,
+-                            NULL, replication_done, bs, true, errp);
++                            NULL, replication_done, bs, true, NULL, errp);
+         bdrv_graph_rdunlock_main_loop();
+         break;
+     default:
+diff --git a/blockdev.c b/blockdev.c
+index cbe2243..349fb71 100644
+--- a/blockdev.c
++++ b/blockdev.c
+@@ -2435,7 +2435,7 @@ void qmp_block_commit(const char *job_id, const char *device,
+                       const char *filter_node_name,
+                       bool has_auto_finalize, bool auto_finalize,
+                       bool has_auto_dismiss, bool auto_dismiss,
+-                      Error **errp)
++                      const char *replaces, Error **errp)
+ {
+     BlockDriverState *bs;
+     BlockDriverState *iter;
+@@ -2596,7 +2596,7 @@ void qmp_block_commit(const char *job_id, const char *device,
+             job_id = bdrv_get_device_name(bs);
+         }
+         commit_active_start(job_id, top_bs, base_bs, job_flags, speed, on_error,
+-                            filter_node_name, NULL, NULL, false, &local_err);
++                            filter_node_name, NULL, NULL, false, replaces, &local_err);
+     } else {
+         BlockDriverState *overlay_bs = bdrv_find_overlay(bs, top_bs);
+         if (bdrv_op_is_blocked(overlay_bs, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
+diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
+index f0c642b..194b580 100644
+--- a/include/block/block_int-global-state.h
++++ b/include/block/block_int-global-state.h
+@@ -115,6 +115,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
+  * @cb: Completion function for the job.
+  * @opaque: Opaque pointer value passed to @cb.
+  * @auto_complete: Auto complete the job.
++ * @replaces: Block graph node name to replace once the commit is done.
+  * @errp: Error object.
+  *
+  */
+@@ -123,7 +124,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
+                               int64_t speed, BlockdevOnError on_error,
+                               const char *filter_node_name,
+                               BlockCompletionFunc *cb, void *opaque,
+-                              bool auto_complete, Error **errp);
++                              bool auto_complete, const char *replaces,
++                              Error **errp);
+ /*
+  * mirror_start:
+  * @job_id: The id of the newly-created job, or %NULL to use the
+diff --git a/qapi/block-core.json b/qapi/block-core.json
+index ff441d4..50564c7 100644
+--- a/qapi/block-core.json
++++ b/qapi/block-core.json
+@@ -2098,6 +2098,8 @@
+ #     disappear from the query list without user intervention.
+ #     Defaults to true.  (Since 3.1)
+ #
+# @replaces: graph node name to be replaced by the base image node.
++#
+ # Features:
+ #
+ # @deprecated: Members @base and @top are deprecated.  Use @base-node
+@@ -2125,7 +2127,8 @@
+             '*speed': 'int',
+             '*on-error': 'BlockdevOnError',
+             '*filter-node-name': 'str',
+-            '*auto-finalize': 'bool', '*auto-dismiss': 'bool' },
++            '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
++            '*replaces': 'str' },
+   'allow-preconfig': true }
+ 
+ ##
+diff --git a/qemu-img.c b/qemu-img.c
+index a6c88e0..f6c59bc 100644
+--- a/qemu-img.c
++++ b/qemu-img.c
+@@ -1079,7 +1079,7 @@ static int img_commit(int argc, char **argv)
+ 
+     commit_active_start("commit", bs, base_bs, JOB_DEFAULT, rate_limit,
+                         BLOCKDEV_ON_ERROR_REPORT, NULL, common_block_job_cb,
+-                        &cbi, false, &local_err);
++                        &cbi, false, NULL, &local_err);
+     if (local_err) {
+         goto done;
+     }
+-- 
+2.39.5
+
diff --git a/debian/patches/series b/debian/patches/series
index b780c1f..e60cd29 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -65,3 +65,4 @@ pve/0047-PVE-backup-factor-out-setting-up-snapshot-access-for.patch
 pve/0048-PVE-backup-save-device-name-in-device-info-structure.patch
 pve/0049-PVE-backup-include-device-name-in-error-when-setting.patch
 pve/0050-adapt-machine-version-deprecation-for-Proxmox-VE.patch
+pve/0051-block-commit-add-replaces-option.patch
-- 
2.39.5



[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


* [pve-devel] [PATCH v4 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support Alexandre Derumier via pve-devel
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 69367 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
Date: Tue, 11 Mar 2025 11:28:50 +0100
Message-ID: <20250311102905.2680524-3-alexandre.derumier@groupe-cyllene.com>

The blockdev chain is:
- throttle-group-node (drive-(ide|scsi|virtio)x)
    - format-node (fmt-drive-x)
        - file-node (file-drive-x)

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
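Illustration only, not part of the patch: a minimal Perl sketch of the
nested -blockdev structure for the chain described above. It uses the
node names from the commit message (the code in this patch derives the
format and file node names from a hash instead). The helper name, the
drive id and the image path are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: throttle node on top, format node below it,
# file node at the bottom.
sub sketch_drive_chain {
    my ($drive_id, $path, $format) = @_;

    my $file = {
        driver      => 'file',
        filename    => $path,
        'node-name' => "file-drive-$drive_id",
    };
    my $fmt = {
        driver      => $format,
        file        => $file,
        'node-name' => "fmt-drive-$drive_id",
    };
    return {
        driver           => 'throttle',
        'node-name'      => "drive-$drive_id",
        'throttle-group' => "throttle-drive-$drive_id",
        file             => $fmt,
    };
}

# Hypothetical drive id and image path.
my $chain = sketch_drive_chain('scsi0', '/var/lib/vz/images/100/vm-100-disk-0.qcow2', 'qcow2');
print JSON::PP->new->canonical->encode($chain), "\n";
```

The device then references the top node through drive=drive-scsi0, so
the guest-visible device stays attached to the throttle node whatever
sits below it.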
 PVE/QemuServer.pm                             | 195 +--------
 PVE/QemuServer/Drive.pm                       | 375 ++++++++++++++++++
 test/cfg2cmd/bootorder-empty.conf.cmd         |  12 +-
 test/cfg2cmd/bootorder-legacy.conf.cmd        |  12 +-
 test/cfg2cmd/bootorder.conf.cmd               |  12 +-
 ...putype-icelake-client-deprecation.conf.cmd |   6 +-
 test/cfg2cmd/ide.conf.cmd                     |  23 +-
 test/cfg2cmd/pinned-version-pxe-pve.conf.cmd  |   6 +-
 test/cfg2cmd/pinned-version-pxe.conf.cmd      |   6 +-
 test/cfg2cmd/pinned-version.conf.cmd          |   6 +-
 test/cfg2cmd/q35-ide.conf.cmd                 |  23 +-
 .../q35-linux-hostpci-template.conf.cmd       |   3 +-
 test/cfg2cmd/seabios_serial.conf.cmd          |   6 +-
 ...imple-balloon-free-page-reporting.conf.cmd |   6 +-
 test/cfg2cmd/simple-btrfs.conf.cmd            |   6 +-
 test/cfg2cmd/simple-virtio-blk.conf.cmd       |   6 +-
 test/cfg2cmd/simple1-template.conf.cmd        |  11 +-
 test/cfg2cmd/simple1-throttle.conf            |  14 +
 test/cfg2cmd/simple1-throttle.conf.cmd        |  33 ++
 test/cfg2cmd/simple1.conf.cmd                 |   6 +-
 20 files changed, 523 insertions(+), 244 deletions(-)
 create mode 100644 test/cfg2cmd/simple1-throttle.conf
 create mode 100644 test/cfg2cmd/simple1-throttle.conf.cmd

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 9d06ac8b..5fd155e5 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -54,7 +54,7 @@ use PVE::QemuServer::Helpers qw(config_aware_timeout min_version kvm_user_versio
 use PVE::QemuServer::Cloudinit;
 use PVE::QemuServer::CGroup;
 use PVE::QemuServer::CPUConfig qw(print_cpu_device get_cpu_options get_cpu_bitness is_native_arch get_amd_sev_object);
-use PVE::QemuServer::Drive qw(is_valid_drivename checked_volume_format drive_is_cloudinit drive_is_cdrom drive_is_read_only parse_drive print_drive);
+use PVE::QemuServer::Drive qw(is_valid_drivename checked_volume_format drive_is_cloudinit drive_is_cdrom drive_is_read_only parse_drive print_drive print_drive_throttle_group generate_drive_blockdev);
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::QemuServer::MetaInfo;
@@ -1386,7 +1386,10 @@ sub print_drivedevice_full {
 	} else {
 	    $device .= ",bus=ahci$controller.$unit";
 	}
-	$device .= ",drive=drive-$drive_id,id=$drive_id";
+	$device .= ",id=$drive_id";
+	# with blockdev, an empty cdrom device has no blockdev attached, so the drive param can't be
+	# declared with drive=none (and the throttle filter can't be defined without media either)
+	$device .= ",drive=drive-$drive_id" if $device_type ne 'cd' || $drive->{file} ne 'none';
 
 	if ($device_type eq 'hd') {
 	    if (my $model = $drive->{model}) {
@@ -1412,6 +1415,13 @@ sub print_drivedevice_full {
 	$device .= ",serial=$serial";
     }
 
+    my $writecache = $drive->{cache} && $drive->{cache} =~ /^(?:none|writeback|unsafe)$/  ? "on" : "off";
+    $device .= ",write-cache=$writecache" if $drive->{media} && $drive->{media} ne 'cdrom';
+
+    my @qemu_drive_options = qw(heads secs cyls trans rerror werror);
+    foreach my $o (@qemu_drive_options) {
+       $device .= ",$o=$drive->{$o}" if defined($drive->{$o});
+    }
 
     return $device;
 }
@@ -1430,150 +1440,6 @@ sub get_initiator_name {
     return $initiator;
 }
 
-my sub storage_allows_io_uring_default {
-    my ($scfg, $cache_direct) = @_;
-
-    # io_uring with cache mode writeback or writethrough on krbd will hang...
-    return if $scfg && $scfg->{type} eq 'rbd' && $scfg->{krbd} && !$cache_direct;
-
-    # io_uring with cache mode writeback or writethrough on LVM will hang, without cache only
-    # sometimes, just plain disable...
-    return if $scfg && $scfg->{type} eq 'lvm';
-
-    # io_uring causes problems when used with CIFS since kernel 5.15
-    # Some discussion: https://www.spinics.net/lists/linux-cifs/msg26734.html
-    return if $scfg && $scfg->{type} eq 'cifs';
-
-    return 1;
-}
-
-my sub drive_uses_cache_direct {
-    my ($drive, $scfg) = @_;
-
-    my $cache_direct = 0;
-
-    if (my $cache = $drive->{cache}) {
-	$cache_direct = $cache =~ /^(?:off|none|directsync)$/;
-    } elsif (!drive_is_cdrom($drive) && !($scfg && $scfg->{type} eq 'btrfs' && !$scfg->{nocow})) {
-	$cache_direct = 1;
-    }
-
-    return $cache_direct;
-}
-
-sub print_drive_commandline_full {
-    my ($storecfg, $vmid, $drive, $live_restore_name, $io_uring) = @_;
-
-    my $drive_id = PVE::QemuServer::Drive::get_drive_id($drive);
-
-    my ($storeid) = PVE::Storage::parse_volume_id($drive->{file}, 1);
-    my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
-
-    my ($path, $format) = PVE::QemuServer::Drive::get_path_and_format(
-	$storecfg, $vmid, $drive, $live_restore_name);
-
-    my $is_rbd = $path =~ m/^rbd:/;
-
-    my $opts = '';
-    my @qemu_drive_options = qw(heads secs cyls trans media cache rerror werror aio discard);
-    foreach my $o (@qemu_drive_options) {
-	$opts .= ",$o=$drive->{$o}" if defined($drive->{$o});
-    }
-
-    # snapshot only accepts on|off
-    if (defined($drive->{snapshot})) {
-	my $v = $drive->{snapshot} ? 'on' : 'off';
-	$opts .= ",snapshot=$v";
-    }
-
-    if (defined($drive->{ro})) { # ro maps to QEMUs `readonly`, which accepts `on` or `off` only
-	$opts .= ",readonly=" . ($drive->{ro} ? 'on' : 'off');
-    }
-
-    foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
-	my ($dir, $qmpname) = @$type;
-	if (my $v = $drive->{"mbps$dir"}) {
-	    $opts .= ",throttling.bps$qmpname=".int($v*1024*1024);
-	}
-	if (my $v = $drive->{"mbps${dir}_max"}) {
-	    $opts .= ",throttling.bps$qmpname-max=".int($v*1024*1024);
-	}
-	if (my $v = $drive->{"bps${dir}_max_length"}) {
-	    $opts .= ",throttling.bps$qmpname-max-length=$v";
-	}
-	if (my $v = $drive->{"iops${dir}"}) {
-	    $opts .= ",throttling.iops$qmpname=$v";
-	}
-	if (my $v = $drive->{"iops${dir}_max"}) {
-	    $opts .= ",throttling.iops$qmpname-max=$v";
-	}
-	if (my $v = $drive->{"iops${dir}_max_length"}) {
-	    $opts .= ",throttling.iops$qmpname-max-length=$v";
-	}
-    }
-
-    if ($live_restore_name) {
-	$format = "rbd" if $is_rbd;
-	die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
-	    if !$format;
-	$opts .= ",format=alloc-track,file.driver=$format";
-    } elsif ($format) {
-	$opts .= ",format=$format";
-    }
-
-    my $cache_direct = drive_uses_cache_direct($drive, $scfg);
-
-    $opts .= ",cache=none" if !$drive->{cache} && $cache_direct;
-
-    if (!$drive->{aio}) {
-	if ($io_uring && storage_allows_io_uring_default($scfg, $cache_direct)) {
-	    # io_uring supports all cache modes
-	    $opts .= ",aio=io_uring";
-	} else {
-	    # aio native works only with O_DIRECT
-	    if($cache_direct) {
-		$opts .= ",aio=native";
-	    } else {
-		$opts .= ",aio=threads";
-	    }
-	}
-    }
-
-    if (!drive_is_cdrom($drive)) {
-	my $detectzeroes;
-	if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
-	    $detectzeroes = 'off';
-	} elsif ($drive->{discard}) {
-	    $detectzeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
-	} else {
-	    # This used to be our default with discard not being specified:
-	    $detectzeroes = 'on';
-	}
-
-	# note: 'detect-zeroes' works per blockdev and we want it to persist
-	# after the alloc-track is removed, so put it on 'file' directly
-	my $dz_param = $live_restore_name ? "file.detect-zeroes" : "detect-zeroes";
-	$opts .= ",$dz_param=$detectzeroes" if $detectzeroes;
-    }
-
-    if ($live_restore_name) {
-	$opts .= ",backing=$live_restore_name";
-	$opts .= ",auto-remove=on";
-    }
-
-    # my $file_param = $live_restore_name ? "file.file.filename" : "file";
-    my $file_param = "file";
-    if ($live_restore_name) {
-	# non-rbd drivers require the underlying file to be a separate block
-	# node, so add a second .file indirection
-	$file_param .= ".file" if !$is_rbd;
-	$file_param .= ".filename";
-    }
-    my $pathinfo = $path ? "$file_param=$path," : '';
-
-    return "${pathinfo}if=none,id=drive-$drive->{interface}$drive->{index}$opts";
-}
-
 sub print_pbs_blockdev {
     my ($pbs_conf, $pbs_name) = @_;
     my $blockdev = "driver=pbs,node-name=$pbs_name,read-only=on";
@@ -3893,13 +3759,13 @@ sub config_to_command {
 	    push @$devices, '-blockdev', $live_restore->{blockdev};
 	}
 
-	my $drive_cmd = print_drive_commandline_full(
-	    $storecfg, $vmid, $drive, $live_blockdev_name, min_version($kvmver, 6, 0));
+	my $throttle_group = print_drive_throttle_group($drive);
+	push @$devices, '-object', $throttle_group if $throttle_group;
 
 	# extra protection for templates, but SATA and IDE don't support it..
-	$drive_cmd .= ',readonly=on' if drive_is_read_only($conf, $drive);
-
-	push @$devices, '-drive',$drive_cmd;
+	$drive->{ro} = 1 if drive_is_read_only($conf, $drive);
+	my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive, $live_blockdev_name);
+	push @$devices, '-blockdev', JSON->new->canonical->allow_nonref->encode($blockdev) if $blockdev;
 	push @$devices, '-device', print_drivedevice_full(
 	    $storecfg, $conf, $vmid, $drive, $bridges, $arch, $machine_type);
     });
@@ -8171,33 +8037,6 @@ sub qemu_drive_mirror_switch_to_active_mode {
     }
 }
 
-# Check for bug #4525: drive-mirror will open the target drive with the same aio setting as the
-# source, but some storages have problems with io_uring, sometimes even leading to crashes.
-my sub clone_disk_check_io_uring {
-    my ($src_drive, $storecfg, $src_storeid, $dst_storeid, $use_drive_mirror) = @_;
-
-    return if !$use_drive_mirror;
-
-    # Don't complain when not changing storage.
-    # Assume if it works for the source, it'll work for the target too.
-    return if $src_storeid eq $dst_storeid;
-
-    my $src_scfg = PVE::Storage::storage_config($storecfg, $src_storeid);
-    my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
-
-    my $cache_direct = drive_uses_cache_direct($src_drive);
-
-    my $src_uses_io_uring;
-    if ($src_drive->{aio}) {
-	$src_uses_io_uring = $src_drive->{aio} eq 'io_uring';
-    } else {
-	$src_uses_io_uring = storage_allows_io_uring_default($src_scfg, $cache_direct);
-    }
-
-    die "target storage is known to cause issues with aio=io_uring (used by current drive)\n"
-	if $src_uses_io_uring && !storage_allows_io_uring_default($dst_scfg, $cache_direct);
-}
-
 sub clone_disk {
     my ($storecfg, $source, $dest, $full, $newvollist, $jobs, $completion, $qga, $bwlimit) = @_;
 
diff --git a/PVE/QemuServer/Drive.pm b/PVE/QemuServer/Drive.pm
index 1041c1dd..9fe679dd 100644
--- a/PVE/QemuServer/Drive.pm
+++ b/PVE/QemuServer/Drive.pm
@@ -24,6 +24,8 @@ drive_is_read_only
 get_scsi_devicetype
 parse_drive
 print_drive
+print_drive_throttle_group
+generate_drive_blockdev
 );
 
 our $QEMU_FORMAT_RE = qr/raw|cow|qcow|qcow2|qed|vmdk|cloop/;
@@ -998,4 +1000,377 @@ sub get_scsi_device_type {
 
     return $devicetype;
 }
+
+my sub storage_allows_io_uring_default {
+    my ($scfg, $cache_direct) = @_;
+
+    # io_uring with cache mode writeback or writethrough on krbd will hang...
+    return if $scfg && $scfg->{type} eq 'rbd' && $scfg->{krbd} && !$cache_direct;
+
+    # io_uring with cache mode writeback or writethrough on LVM will hang, without cache only
+    # sometimes, just plain disable...
+    return if $scfg && $scfg->{type} eq 'lvm';
+
+    # io_uring causes problems when used with CIFS since kernel 5.15
+    # Some discussion: https://www.spinics.net/lists/linux-cifs/msg26734.html
+    return if $scfg && $scfg->{type} eq 'cifs';
+
+    return 1;
+}
+
+my sub drive_uses_cache_direct {
+    my ($drive, $scfg) = @_;
+
+    my $cache_direct = 0;
+
+    if (my $cache = $drive->{cache}) {
+	$cache_direct = $cache =~ /^(?:off|none|directsync)$/;
+    } elsif (!drive_is_cdrom($drive) && !($scfg && $scfg->{type} eq 'btrfs' && !$scfg->{nocow})) {
+	$cache_direct = 1;
+    }
+
+    return $cache_direct;
+}
+
+# Check for bug #4525: drive-mirror will open the target drive with the same aio setting as the
+# source, but some storages have problems with io_uring, sometimes even leading to crashes.
+sub clone_disk_check_io_uring {
+    my ($src_drive, $storecfg, $src_storeid, $dst_storeid, $use_drive_mirror) = @_;
+
+    return if !$use_drive_mirror;
+
+    # Don't complain when not changing storage.
+    # Assume if it works for the source, it'll work for the target too.
+    return if $src_storeid eq $dst_storeid;
+
+    my $src_scfg = PVE::Storage::storage_config($storecfg, $src_storeid);
+    my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
+
+    my $cache_direct = drive_uses_cache_direct($src_drive);
+
+    my $src_uses_io_uring;
+    if ($src_drive->{aio}) {
+	$src_uses_io_uring = $src_drive->{aio} eq 'io_uring';
+    } else {
+	$src_uses_io_uring = storage_allows_io_uring_default($src_scfg, $cache_direct);
+    }
+
+    die "target storage is known to cause issues with aio=io_uring (used by current drive)\n"
+	if $src_uses_io_uring && !storage_allows_io_uring_default($dst_scfg, $cache_direct);
+}
+
+sub generate_blockdev_drive_aio {
+    my ($drive, $scfg) = @_;
+
+    my $cache_direct = drive_uses_cache_direct($drive, $scfg);
+    $drive->{aio} = 'threads' if drive_is_cdrom($drive);
+    my $aio = $drive->{aio};
+    if (!$aio) {
+	if (storage_allows_io_uring_default($scfg, $cache_direct)) {
+	    # io_uring supports all cache modes
+	    $aio = "io_uring";
+	} else {
+	    # aio native works only with O_DIRECT
+	    if($cache_direct) {
+		$aio = "native";
+	    } else {
+		$aio = "threads";
+	    }
+	}
+    }
+    return $aio;
+}
+
+sub generate_blockdev_drive_cache {
+    my ($drive, $scfg) = @_;
+
+    my $cache_direct = drive_uses_cache_direct($drive, $scfg);
+    my $cache = {};
+    $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
+    $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq 'unsafe' ? JSON::true : JSON::false;
+    return $cache;
+}
+
+sub generate_throttle_group {
+    my ($drive) = @_;
+
+    my $drive_id = get_drive_id($drive);
+
+    my $throttle_group = { id => "throttle-drive-$drive_id" };
+    my $limits = {};
+
+    foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
+       my ($dir, $qmpname) = @$type;
+
+       if (my $v = $drive->{"mbps$dir"}) {
+           $limits->{"bps$qmpname"} = int($v*1024*1024);
+       }
+       if (my $v = $drive->{"mbps${dir}_max"}) {
+           $limits->{"bps$qmpname-max"} = int($v*1024*1024);
+       }
+       if (my $v = $drive->{"bps${dir}_max_length"}) {
+           $limits->{"bps$qmpname-max-length"} = int($v)
+       }
+       if (my $v = $drive->{"iops${dir}"}) {
+           $limits->{"iops$qmpname"} = int($v);
+       }
+       if (my $v = $drive->{"iops${dir}_max"}) {
+           $limits->{"iops$qmpname-max"} = int($v);
+       }
+       if (my $v = $drive->{"iops${dir}_max_length"}) {
+           $limits->{"iops$qmpname-max-length"} = int($v);
+       }
+   }
+
+   $throttle_group->{limits} = $limits;
+
+   return $throttle_group;
+}
+
+sub print_drive_throttle_group {
+    my ($drive) = @_;
+
+    return if drive_is_cdrom($drive) && $drive->{file} eq 'none';
+
+    my $group = generate_throttle_group($drive);
+    $group->{'qom-type'} = "throttle-group";
+    return JSON->new->canonical->allow_nonref->encode($group)
+}
+
+sub generate_file_blockdev {
+    my ($storecfg, $drive, $snap, $nodename) = @_;
+
+    my $volid = $drive->{file};
+    my $blockdev = {};
+
+    my $scfg = undef;
+    my $path = $volid;
+    my $storeid = undef;
+    my $volname = undef;
+
+    if(!$drive->{format} || $drive->{format} ne 'nbd') {
+	($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
+	$scfg = PVE::Storage::storage_config($storecfg, $storeid);
+	$path = PVE::Storage::path($storecfg, $volid, $snap);
+    }
+
+    if ($path =~ m/^rbd:(\S+)$/) {
+
+        $blockdev->{driver} = 'rbd';
+
+	my @rbd_options = split(/:/, $1);
+	my $keyring = undef;
+	for my $option (@rbd_options) {
+	    if ($option =~ m/^(\S+)=(\S+)$/) {
+		my $key = $1;
+		my $value = $2;
+		$blockdev->{'auth-client-required'} = [$value] if $key eq 'auth_supported';
+		$blockdev->{'conf'} = $value if $key eq 'conf';
+		$blockdev->{'user'} = $value if $key eq 'id';
+		$keyring = $value if $key eq 'keyring';
+	        if ($key eq 'mon_host') {
+		    my $server = [];
+		    my @mons = split(';', $value);
+		    for my $mon (@mons) {
+			my ($host, $port) = PVE::Tools::parse_host_and_port($mon);
+			$port = '3300' if !$port;
+			push @$server, { host => $host, port => $port };
+		    }
+		    $blockdev->{server} = $server;
+		}
+	    } elsif ($option =~ m|^(\S+)/(\S+)$|){
+                $blockdev->{pool} = $1;
+		my $image = $2;
+
+                if($image =~ m|^(\S+)/(\S+)$|) {
+                    $blockdev->{namespace} = $1;
+                    $blockdev->{image} = $2;
+                } else {
+                    $blockdev->{image} = $image;
+                }
+            }
+	}
+
+	if($keyring && $blockdev->{server}) {
+	    # QEMU devs removed passing arbitrary values to the blockdev object and did not add
+	    # keyring to the list of allowed keys, so it needs to be defined in the storage's ceph.conf.
+	    # https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02676.html
+	    # Another way could be to simply patch QEMU to allow the key.
+	    my $ceph_conf = "/etc/pve/priv/ceph/${storeid}.conf";
+	    $blockdev->{conf} = $ceph_conf;
+	    if (!-e $ceph_conf) {
+		my $content = "[global]\nkeyring = $keyring\n";
+		PVE::Tools::file_set_contents($ceph_conf, $content, 0400);
+	    }
+	}
+    } elsif ($path =~ m/^nbd:(\S+):(\d+):exportname=(\S+)$/) {
+	my $server = { type => 'inet', host => $1, port => $2 };
+	$blockdev = { driver => 'nbd', server => $server, export => $3 };
+    } elsif ($path =~ m/^nbd:unix:(\S+):exportname=(\S+)$/) {
+	my $server = { type => 'unix', path => $1 };
+	$blockdev = { driver => 'nbd', server => $server, export => $2 };
+    } elsif ($path =~ m|^gluster(\+(tcp\|unix\|rdma))?://(.*)/(.*)/(images/(\S+)/(\S+))$|) {
+	my $protocol = $2 ? $2 : 'inet';
+	$protocol = 'inet' if $protocol eq 'tcp';
+	my $server = [{ type => $protocol, host => $3, port => '24007' }];
+	$blockdev = { driver => 'gluster', server => $server, volume => $4, path => $5 };
+    } elsif ($path =~ m/^\/dev/) {
+	my $driver = drive_is_cdrom($drive) ? 'host_cdrom' : 'host_device';
+	$blockdev = { driver => $driver, filename => $path };
+    } elsif ($path =~ m/^\//) {
+	$blockdev = { driver => 'file', filename => $path};
+    } else {
+	die "unsupported path: $path\n";
+	#fixme
+	#'{"driver":"iscsi","portal":"iscsi.example.com:3260","target":"demo-target","lun":3,"transport":"tcp"}'
+    }
+
+    $blockdev->{cache} = generate_blockdev_drive_cache($drive, $scfg);
+    # non-host QEMU block drivers (rbd, gluster, iscsi, ...) don't have an aio attribute
+    $blockdev->{aio} = generate_blockdev_drive_aio($drive, $scfg) if $blockdev->{filename};
+
+    ##discard && detect-zeroes
+    my $discard = 'ignore';
+    if($drive->{discard}) {
+	$discard = $drive->{discard};
+	$discard = 'unmap' if $discard eq 'on';
+    }
+    $blockdev->{discard} = $discard if !drive_is_cdrom($drive);
+
+    my $detect_zeroes;
+    if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
+	$detect_zeroes = 'off';
+    } elsif ($drive->{discard}) {
+	$detect_zeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
+    } else {
+	# This used to be our default with discard not being specified:
+	$detect_zeroes = 'on';
+    }
+    $blockdev->{'detect-zeroes'} = $detect_zeroes if !drive_is_cdrom($drive);
+
+    $nodename = encode_nodename('file', $volid, $snap) if !$nodename;
+    $blockdev->{'node-name'} = $nodename;
+
+    return $blockdev;
+}
+
+sub generate_format_blockdev {
+    my ($storecfg, $drive, $file, $snap, $nodename) = @_;
+
+    my $volid = $drive->{file};
+    die "format_blockdev can't be used for nbd" if $volid =~ /^nbd:/;
+
+    my $scfg = undef;
+    $nodename = encode_nodename('fmt', $volid, $snap) if !$nodename;
+
+    my $drive_id = get_drive_id($drive);
+
+    if ($drive->{zeroinit}) {
+	#fixme how to handle zeroinit ? insert special blockdev filter ?
+    }
+
+    my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
+
+    # For PVE-managed volumes, use the format from the storage layer and prevent overrides via the
+    # drive's 'format' option. For unmanaged volumes, fallback to 'raw' to avoid auto-detection by
+    # QEMU.
+    my $format = undef;
+    if($storeid) {
+	$scfg = PVE::Storage::storage_config($storecfg, $storeid);
+	$format = checked_volume_format($storecfg, $volid);
+	if ($drive->{format} && $drive->{format} ne $format) {
+	    die "drive '$drive->{interface}$drive->{index}' - volume '$volid'"
+		." - 'format=$drive->{format}' option different from storage format '$format'\n";
+	}
+    } else {
+	$format = $drive->{format} // 'raw';
+    }
+
+    my $readonly = $drive->{ro} ? JSON::true : JSON::false;
+
+    # libvirt defines the cache option on both the format and the file node
+    my $cache = generate_blockdev_drive_cache($drive, $scfg);
+
+    my $blockdev = { 'node-name' => $nodename, driver => $format, file => $file, cache => $cache, 'read-only' => $readonly };
+
+    return $blockdev;
+}
+
+sub generate_drive_blockdev {
+    my ($storecfg, $vmid, $drive, $live_restore_name) = @_;
+
+    my $path;
+    my $volid = $drive->{file};
+    my $drive_id = get_drive_id($drive);
+
+    my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
+    my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
+
+    if (drive_is_cdrom($drive)) {
+        die "$drive_id: cannot back cdrom drive with a live restore image\n" if $live_restore_name;
+
+	$path = get_iso_path($storecfg, $vmid, $volid);
+	#throttle-filter can't be defined without attached disk
+	return if !$path;
+	$drive->{ro} = 1;
+    }
+
+    my $blockdev_file = generate_file_blockdev($storecfg, $drive);
+    my $blockdev_format = generate_format_blockdev($storecfg, $drive, $blockdev_file);
+
+    my $blockdev_live_restore = undef;
+    if ($live_restore_name) {
+        die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
+            if !$drive->{format};
+
+        $blockdev_live_restore = { 'node-name' => "liverestore-drive-$drive_id",
+				    backing => $live_restore_name,
+				    'auto-remove' => 'on', format => "alloc-track",
+				    file => $blockdev_format };
+    }
+
+    # this is the top filter entry point, use "drive-$drive_id" as its node-name
+    my $blockdev_throttle = { driver => "throttle", 'node-name' => "drive-$drive_id", 'throttle-group' => "throttle-drive-$drive_id" };
+    # put the live-restore filter between the throttle filter and the format node
+    $blockdev_throttle->{file} = $live_restore_name ? $blockdev_live_restore : $blockdev_format;
+    return $blockdev_throttle;
+}
+
+sub encode_base62 {
+    my ($input) = @_;
+    my @chars = ('0'..'9', 'A'..'Z', 'a'..'z');
+    my $base = 62;
+    my $value = 0;
+
+    foreach my $byte (unpack('C*', $input)) {
+        $value = $value * 256 + $byte;
+    }
+
+    my $result = '';
+    while ($value > 0) {
+        $result = $chars[$value % $base] . $result;
+        $value = int($value / $base);
+    }
+
+    return $result || '0';
+}
+
+sub encode_nodename {
+    my ($type, $volid, $snap) = @_;
+
+    my $nodename = "$volid";
+    $nodename .= "-$snap" if $snap;
+    $nodename = encode_base62(Digest::SHA::sha1($nodename));
+    my $prefix = "";
+    if ($type eq 'fmt') {
+	$prefix = 'f';
+    } elsif ($type eq 'file') {
+	$prefix = 'e';
+    } else {
+	die "unknown node type '$type'\n";
+    }
+    # a node-name must start with an alphabetic character
+    return "$prefix-$nodename";
+}
+
 1;
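
Note on encode_base62 above: a SHA-1 digest is 160 bits, so the
"$value * 256 + $byte" accumulation leaves Perl's native integer range and
continues in floating point, which on typical builds keeps only about 53
significant bits. The resulting names stay deterministic, but the low-order
digits no longer depend on the full digest. Below is a minimal sketch of an
exact variant, assuming Math::BigInt is acceptable as a dependency. The name
encode_base62_exact is hypothetical and not part of this patch, and it would
yield different node-names than the expected test output that follows.

```perl
use strict;
use warnings;

use Digest::SHA qw(sha1);
use Math::BigInt;

# Hypothetical exact variant of encode_base62 (not part of the patch).
# Treats the input bytes as one big-endian unsigned integer and converts
# it to base 62 with arbitrary-precision arithmetic.
sub encode_base62_exact {
    my ($input) = @_;
    my @chars = ('0'..'9', 'A'..'Z', 'a'..'z');

    return '0' if !length($input);

    # Build the integer from the hex representation of the bytes.
    my $value = Math::BigInt->from_hex('0x' . unpack('H*', $input));

    my $result = '';
    while (!$value->is_zero()) {
        # The remainder selects the next (least significant) digit.
        my $rem = $value->copy()->bmod(62)->numify();
        $result = $chars[$rem] . $result;
        # bdiv() modifies $value in place, leaving the quotient.
        $value->bdiv(62);
    }

    return $result eq '' ? '0' : $result;
}

# Example: derive a format node-name the same way encode_nodename does.
my $nodename = 'f-' . encode_base62_exact(sha1('local:8006/vm-8006-disk-0.qcow2'));
```

With 160 bits of input the result is at most 27 characters, so the prefixed
name still fits within QEMU's node-name length limit.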
diff --git a/test/cfg2cmd/bootorder-empty.conf.cmd b/test/cfg2cmd/bootorder-empty.conf.cmd
index 87fa6c28..4773d92a 100644
--- a/test/cfg2cmd/bootorder-empty.conf.cmd
+++ b/test/cfg2cmd/bootorder-empty.conf.cmd
@@ -25,14 +25,16 @@
   -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2' \
   -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi4,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi4","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-scsi4","throttle-group":"throttle-drive-scsi4"}' \
   -device 'scsi-hd,bus=scsihw0.0,scsi-id=4,drive=drive-scsi4,id=scsi4' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-virtio0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
   -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio1,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-virtio1","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-virtio1","throttle-group":"throttle-drive-virtio1"}' \
   -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,iothread=iothread-virtio1' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256' \
diff --git a/test/cfg2cmd/bootorder-legacy.conf.cmd b/test/cfg2cmd/bootorder-legacy.conf.cmd
index a4c3f050..4d31a46f 100644
--- a/test/cfg2cmd/bootorder-legacy.conf.cmd
+++ b/test/cfg2cmd/bootorder-legacy.conf.cmd
@@ -25,14 +25,16 @@
   -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
   -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi4,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi4","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-scsi4","throttle-group":"throttle-drive-scsi4"}' \
   -device 'scsi-hd,bus=scsihw0.0,scsi-id=4,drive=drive-scsi4,id=scsi4' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-virtio0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
   -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio1,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-virtio1","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-virtio1","throttle-group":"throttle-drive-virtio1"}' \
   -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,iothread=iothread-virtio1,bootindex=302' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=100' \
diff --git a/test/cfg2cmd/bootorder.conf.cmd b/test/cfg2cmd/bootorder.conf.cmd
index 76bd55d7..25e62e08 100644
--- a/test/cfg2cmd/bootorder.conf.cmd
+++ b/test/cfg2cmd/bootorder.conf.cmd
@@ -25,14 +25,16 @@
   -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=103' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=103' \
   -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi4,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi4","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-scsi4","throttle-group":"throttle-drive-scsi4"}' \
   -device 'scsi-hd,bus=scsihw0.0,scsi-id=4,drive=drive-scsi4,id=scsi4,bootindex=102' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-virtio0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
   -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio1,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-virtio1","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-virtio1","throttle-group":"throttle-drive-virtio1"}' \
   -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,iothread=iothread-virtio1,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=101' \
diff --git a/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd b/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd
index bf084432..7545b2f2 100644
--- a/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd
+++ b/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd
@@ -23,9 +23,9 @@
   -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/base-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/base-8006-disk-0.qcow2","node-name":"e-Nc8rhHZ7kcE2uuU2M8keyicwm0w"},"node-name":"f-Nc8rhHZ7kcE2uuU2M8keyicwm0w","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -machine 'type=pc+pve0'
diff --git a/test/cfg2cmd/ide.conf.cmd b/test/cfg2cmd/ide.conf.cmd
index 33c6aadc..8def69ba 100644
--- a/test/cfg2cmd/ide.conf.cmd
+++ b/test/cfg2cmd/ide.conf.cmd
@@ -23,16 +23,21 @@
   -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'file=/mnt/pve/cifs-store/template/iso/zero.iso,if=none,id=drive-ide0,media=cdrom,format=raw,aio=threads' \
-  -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200' \
-  -drive 'file=/mnt/pve/cifs-store/template/iso/one.iso,if=none,id=drive-ide1,media=cdrom,format=raw,aio=threads' \
-  -device 'ide-cd,bus=ide.0,unit=1,drive=drive-ide1,id=ide1,bootindex=201' \
-  -drive 'file=/mnt/pve/cifs-store/template/iso/two.iso,if=none,id=drive-ide2,media=cdrom,format=raw,aio=threads' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=202' \
-  -drive 'file=/mnt/pve/cifs-store/template/iso/three.iso,if=none,id=drive-ide3,media=cdrom,format=raw,aio=threads' \
-  -device 'ide-cd,bus=ide.1,unit=1,drive=drive-ide3,id=ide3,bootindex=203' \
+  -object '{"id":"throttle-drive-ide0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/zero.iso","node-name":"e-OO9IkxxtCYSqog6okQom0we4S48"},"node-name":"f-OO9IkxxtCYSqog6okQom0we4S48","read-only":true},"node-name":"drive-ide0","throttle-group":"throttle-drive-ide0"}' \
+  -device 'ide-cd,bus=ide.0,unit=0,id=ide0,drive=drive-ide0,bootindex=200' \
+  -object '{"id":"throttle-drive-ide1","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/one.iso","node-name":"e-OiteZ9aAusKmw6oIO8qucwmmmUU"},"node-name":"f-OiteZ9aAusKmw6oIO8qucwmmmUU","read-only":true},"node-name":"drive-ide1","throttle-group":"throttle-drive-ide1"}' \
+  -device 'ide-cd,bus=ide.0,unit=1,id=ide1,drive=drive-ide1,bootindex=201' \
+  -object '{"id":"throttle-drive-ide2","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/two.iso","node-name":"e-1Aib1Kemp2sgocAWokMGOyIQyQY"},"node-name":"f-1Aib1Kemp2sgocAWokMGOyIQyQY","read-only":true},"node-name":"drive-ide2","throttle-group":"throttle-drive-ide2"}' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,drive=drive-ide2,bootindex=202' \
+  -object '{"id":"throttle-drive-ide3","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/three.iso","node-name":"e-UKCOEDGubQ8AywsAyqqGIywCIWQ"},"node-name":"f-UKCOEDGubQ8AywsAyqqGIywCIWQ","read-only":true},"node-name":"drive-ide3","throttle-group":"throttle-drive-ide3"}' \
+  -device 'ide-cd,bus=ide.1,unit=1,id=ide3,drive=drive-ide3,bootindex=203' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/100/vm-100-disk-2.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/100/vm-100-disk-2.qcow2","node-name":"e-6zrMeiDDrkeISyGMGwACygKAISG"},"node-name":"f-6zrMeiDDrkeISyGMGwACygKAISG","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=2E:01:68:F9:9C:87,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd b/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd
index d17d4deb..67090f5b 100644
--- a/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd
+++ b/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd
@@ -23,10 +23,10 @@
   -device 'virtio-rng-pci,rng=rng0,max-bytes=1024,period=1000,bus=pci.1,addr=0x1d' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.raw,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.raw","node-name":"e-QrVmtMFNQG4wiK6key0AGkSGiE2"},"node-name":"f-QrVmtMFNQG4wiK6key0AGkSGiE2","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300,romfile=pxe-virtio.rom' \
diff --git a/test/cfg2cmd/pinned-version-pxe.conf.cmd b/test/cfg2cmd/pinned-version-pxe.conf.cmd
index 892fc148..9e14cd07 100644
--- a/test/cfg2cmd/pinned-version-pxe.conf.cmd
+++ b/test/cfg2cmd/pinned-version-pxe.conf.cmd
@@ -21,10 +21,10 @@
   -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.raw,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.raw","node-name":"e-QrVmtMFNQG4wiK6key0AGkSGiE2"},"node-name":"f-QrVmtMFNQG4wiK6key0AGkSGiE2","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300,romfile=pxe-virtio.rom' \
diff --git a/test/cfg2cmd/pinned-version.conf.cmd b/test/cfg2cmd/pinned-version.conf.cmd
index 13361edf..895f21a6 100644
--- a/test/cfg2cmd/pinned-version.conf.cmd
+++ b/test/cfg2cmd/pinned-version.conf.cmd
@@ -21,10 +21,10 @@
   -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.raw,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.raw","node-name":"e-QrVmtMFNQG4wiK6key0AGkSGiE2"},"node-name":"f-QrVmtMFNQG4wiK6key0AGkSGiE2","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
diff --git a/test/cfg2cmd/q35-ide.conf.cmd b/test/cfg2cmd/q35-ide.conf.cmd
index dd4f1bbe..406de59f 100644
--- a/test/cfg2cmd/q35-ide.conf.cmd
+++ b/test/cfg2cmd/q35-ide.conf.cmd
@@ -22,16 +22,21 @@
   -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'file=/mnt/pve/cifs-store/template/iso/zero.iso,if=none,id=drive-ide0,media=cdrom,format=raw,aio=threads' \
-  -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200' \
-  -drive 'file=/mnt/pve/cifs-store/template/iso/one.iso,if=none,id=drive-ide1,media=cdrom,format=raw,aio=threads' \
-  -device 'ide-cd,bus=ide.2,unit=0,drive=drive-ide1,id=ide1,bootindex=201' \
-  -drive 'file=/mnt/pve/cifs-store/template/iso/two.iso,if=none,id=drive-ide2,media=cdrom,format=raw,aio=threads' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=202' \
-  -drive 'file=/mnt/pve/cifs-store/template/iso/three.iso,if=none,id=drive-ide3,media=cdrom,format=raw,aio=threads' \
-  -device 'ide-cd,bus=ide.3,unit=0,drive=drive-ide3,id=ide3,bootindex=203' \
+  -object '{"id":"throttle-drive-ide0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/zero.iso","node-name":"e-OO9IkxxtCYSqog6okQom0we4S48"},"node-name":"f-OO9IkxxtCYSqog6okQom0we4S48","read-only":true},"node-name":"drive-ide0","throttle-group":"throttle-drive-ide0"}' \
+  -device 'ide-cd,bus=ide.0,unit=0,id=ide0,drive=drive-ide0,bootindex=200' \
+  -object '{"id":"throttle-drive-ide1","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/one.iso","node-name":"e-OiteZ9aAusKmw6oIO8qucwmmmUU"},"node-name":"f-OiteZ9aAusKmw6oIO8qucwmmmUU","read-only":true},"node-name":"drive-ide1","throttle-group":"throttle-drive-ide1"}' \
+  -device 'ide-cd,bus=ide.2,unit=0,id=ide1,drive=drive-ide1,bootindex=201' \
+  -object '{"id":"throttle-drive-ide2","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/two.iso","node-name":"e-1Aib1Kemp2sgocAWokMGOyIQyQY"},"node-name":"f-1Aib1Kemp2sgocAWokMGOyIQyQY","read-only":true},"node-name":"drive-ide2","throttle-group":"throttle-drive-ide2"}' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,drive=drive-ide2,bootindex=202' \
+  -object '{"id":"throttle-drive-ide3","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/three.iso","node-name":"e-UKCOEDGubQ8AywsAyqqGIywCIWQ"},"node-name":"f-UKCOEDGubQ8AywsAyqqGIywCIWQ","read-only":true},"node-name":"drive-ide3","throttle-group":"throttle-drive-ide3"}' \
+  -device 'ide-cd,bus=ide.3,unit=0,id=ide3,drive=drive-ide3,bootindex=203' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/100/vm-100-disk-2.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/100/vm-100-disk-2.qcow2","node-name":"e-6zrMeiDDrkeISyGMGwACygKAISG"},"node-name":"f-6zrMeiDDrkeISyGMGwACygKAISG","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=2E:01:68:F9:9C:87,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd b/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd
index cda10630..5f43da18 100644
--- a/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd
+++ b/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd
@@ -24,7 +24,8 @@
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/100/base-100-disk-2.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on,readonly=on' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/100/base-100-disk-2.raw","node-name":"e-3nPTM162JEOAymkwqg2Ww2QUioK"},"node-name":"f-3nPTM162JEOAymkwqg2Ww2QUioK","read-only":true},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
   -machine 'accel=tcg,type=pc+pve0' \
   -snapshot
diff --git a/test/cfg2cmd/seabios_serial.conf.cmd b/test/cfg2cmd/seabios_serial.conf.cmd
index 1c4e102c..066fb91d 100644
--- a/test/cfg2cmd/seabios_serial.conf.cmd
+++ b/test/cfg2cmd/seabios_serial.conf.cmd
@@ -23,10 +23,10 @@
   -device 'isa-serial,chardev=serial0' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd b/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd
index 097a14e1..30a8c334 100644
--- a/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd
+++ b/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd
@@ -23,10 +23,10 @@
   -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
diff --git a/test/cfg2cmd/simple-btrfs.conf.cmd b/test/cfg2cmd/simple-btrfs.conf.cmd
index c2354887..6279eaf1 100644
--- a/test/cfg2cmd/simple-btrfs.conf.cmd
+++ b/test/cfg2cmd/simple-btrfs.conf.cmd
@@ -23,10 +23,10 @@
   -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/butter/bread/images/8006/vm-8006-disk-0/disk.raw,if=none,id=drive-scsi0,discard=on,format=raw,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":false,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/butter/bread/images/8006/vm-8006-disk-0/disk.raw","node-name":"e-Dc613MAbXUuSMOUYkqCWymoyGAM"},"node-name":"f-Dc613MAbXUuSMOUYkqCWymoyGAM","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/simple-virtio-blk.conf.cmd b/test/cfg2cmd/simple-virtio-blk.conf.cmd
index d19aca6b..372ff915 100644
--- a/test/cfg2cmd/simple-virtio-blk.conf.cmd
+++ b/test/cfg2cmd/simple-virtio-blk.conf.cmd
@@ -24,9 +24,9 @@
   -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
+  -object '{"id":"throttle-drive-virtio0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
   -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/simple1-template.conf.cmd b/test/cfg2cmd/simple1-template.conf.cmd
index 35484600..6d8b405b 100644
--- a/test/cfg2cmd/simple1-template.conf.cmd
+++ b/test/cfg2cmd/simple1-template.conf.cmd
@@ -21,13 +21,14 @@
   -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/base-8006-disk-1.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap,readonly=on' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/base-8006-disk-1.qcow2","node-name":"e-ZRitpbHqRyeSoKUmIwwMc4Uq0oQ"},"node-name":"f-ZRitpbHqRyeSoKUmIwwMc4Uq0oQ","read-only":true},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
   -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
-  -drive 'file=/var/lib/vz/images/8006/base-8006-disk-0.qcow2,if=none,id=drive-sata0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
-  -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0' \
+  -object '{"id":"throttle-drive-sata0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/base-8006-disk-0.qcow2","node-name":"e-Nc8rhHZ7kcE2uuU2M8keyicwm0w"},"node-name":"f-Nc8rhHZ7kcE2uuU2M8keyicwm0w","read-only":false},"node-name":"drive-sata0","throttle-group":"throttle-drive-sata0"}' \
+  -device 'ide-hd,bus=ahci0.0,id=sata0,drive=drive-sata0' \
   -machine 'accel=tcg,smm=off,type=pc+pve0' \
   -snapshot
diff --git a/test/cfg2cmd/simple1-throttle.conf b/test/cfg2cmd/simple1-throttle.conf
new file mode 100644
index 00000000..8e0f0f2f
--- /dev/null
+++ b/test/cfg2cmd/simple1-throttle.conf
@@ -0,0 +1,14 @@
+# TEST: Simple test for a basic configuration with I/O throttle limits on scsi0
+bootdisk: scsi0
+cores: 3
+ide2: none,media=cdrom
+memory: 768
+name: simple
+net0: virtio=A2:C0:43:77:08:A0,bridge=vmbr0
+numa: 0
+ostype: l26
+scsi0: local:8006/vm-8006-disk-0.qcow2,discard=on,size=104858K,iops_rd=10,iops_rd_max=10,iops_wr=20,iops_wr_max=20,iothread=1,mbps_rd=1,mbps_rd_max=1,mbps_wr=2,mbps_wr_max=2
+scsihw: virtio-scsi-pci
+smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
+sockets: 1
+vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
diff --git a/test/cfg2cmd/simple1-throttle.conf.cmd b/test/cfg2cmd/simple1-throttle.conf.cmd
new file mode 100644
index 00000000..f996f2c7
--- /dev/null
+++ b/test/cfg2cmd/simple1-throttle.conf.cmd
@@ -0,0 +1,33 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'simple,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smbios 'type=1,uuid=7b10d7af-b932-4c66-b2c3-3996152ec465' \
+  -smp '3,sockets=1,cores=3,maxcpus=3' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 768 \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
+  -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
+  -object '{"id":"throttle-drive-scsi0","limits":{"bps-read":1048576,"bps-read-max":1048576,"bps-write":2097152,"bps-write-max":2097152,"iops-read":10,"iops-read-max":10,"iops-write":20,"iops-write-max":20},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
+  -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
+  -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
+  -device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
+  -machine 'type=pc+pve0'
diff --git a/test/cfg2cmd/simple1.conf.cmd b/test/cfg2cmd/simple1.conf.cmd
index ecd14bcc..afde6fe7 100644
--- a/test/cfg2cmd/simple1.conf.cmd
+++ b/test/cfg2cmd/simple1.conf.cmd
@@ -23,10 +23,10 @@
   -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
   -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
   -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
-  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+  -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
   -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
-  -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"e-IQHs2Stp3mYmKYSGmUACmUu8i6u"},"node-name":"f-IQHs2Stp3mYmKYSGmUACmUu8i6u","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
   -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
   -netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
   -device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
-- 
2.39.5
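
Reading aid for the simple1-throttle fixture above: a minimal sketch of how the
drive options appear to map onto the throttle-group limits in the expected
command line. The helper name and the mapping table are assumptions inferred
only from that fixture (iops_* passed through, mbps_* multiplied by 1024*1024);
this is not the patch's generate_throttle_group.

```perl
use strict;
use warnings;

# Hypothetical mapping, inferred from the simple1-throttle fixture only:
# drive option => [ throttle-group limit name, multiplier ].
my %LIMIT_MAP = (
    iops_rd     => ['iops-read',      1],
    iops_rd_max => ['iops-read-max',  1],
    iops_wr     => ['iops-write',     1],
    iops_wr_max => ['iops-write-max', 1],
    mbps_rd     => ['bps-read',       1024 * 1024],
    mbps_rd_max => ['bps-read-max',   1024 * 1024],
    mbps_wr     => ['bps-write',      1024 * 1024],
    mbps_wr_max => ['bps-write-max',  1024 * 1024],
);

# Hypothetical helper: build the "limits" hash for a throttle-group object.
# Options that are not set are left out, so an unthrottled drive yields {}.
sub throttle_limits_from_drive {
    my ($drive) = @_;

    my $limits = {};
    for my $opt (sort keys %LIMIT_MAP) {
        next if !defined($drive->{$opt});
        my ($name, $factor) = @{ $LIMIT_MAP{$opt} };
        $limits->{$name} = int($drive->{$opt} * $factor);
    }
    return $limits;
}

# Values taken from the scsi0 line of simple1-throttle.conf.
my $limits = throttle_limits_from_drive({
    iops_rd => 10, iops_rd_max => 10, iops_wr => 20, iops_wr_max => 20,
    mbps_rd => 1,  mbps_rd_max => 1,  mbps_wr => 2,  mbps_wr_max => 2,
});
print "$_=$limits->{$_}\n" for sort keys %$limits;
```

With no throttle options set, the sketch returns an empty hash, which matches
the "limits":{} objects in the other fixtures.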




* [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-04-01 13:50   ` Fabian Grünbichler
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 02/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 15030 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support
Date: Tue, 11 Mar 2025 11:28:51 +0100
Message-ID: <20250311102905.2680524-4-alexandre.derumier@groupe-cyllene.com>

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
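Reading aid: a minimal sketch of the snapshot file naming this patch relies on,
assuming the "<vmid>/snap-<snapname>-<name>" layout built by get_snap_volname
and the regex used by parse_snapname. The standalone helper snap_volname is
hypothetical; it takes the vmid and name directly instead of calling
parse_volname.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical standalone variant of get_snap_volname: no snapshot name (or
# 'current') maps to the live volume, anything else to a snap-* file.
sub snap_volname {
    my ($vmid, $name, $snapname) = @_;

    return "$vmid/$name" if !$snapname || $snapname eq 'current';
    return "$vmid/snap-$snapname-$name";
}

# Same regex as parse_snapname in the patch: returns the snapshot name, or
# undef when the file is not a snap-* file (i.e. the current volume).
sub parse_snapname {
    my ($path) = @_;

    return $1 if basename($path) =~ m/^snap-(.*)-vm(.*)$/;
    return undef;
}

my $snapvol = snap_volname(8006, 'vm-8006-disk-0.qcow2', 'snap1');
print "$snapvol\n";
print parse_snapname("/var/lib/vz/images/$snapvol"), "\n";
```

The round trip only works because volume names start with "vm-", which is what
the "-vm" part of the regex anchors on.
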
 src/PVE/Storage.pm           |   4 +-
 src/PVE/Storage/DirPlugin.pm |   1 +
 src/PVE/Storage/Plugin.pm    | 232 +++++++++++++++++++++++++++++------
 3 files changed, 196 insertions(+), 41 deletions(-)
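
Reading aid: a minimal sketch of how volume_snapshot_info turns the
"qemu-img info --backing-chain --output=json" array into a parent/child map.
The JSON below is illustrative sample data trimmed to the two keys the patch
reads (real output carries many more fields, and backing-filename is whatever
was stored in the image). The helper names are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use JSON::PP qw(decode_json);

# Same regex as parse_snapname in the patch.
sub snapname_of {
    my ($path) = @_;
    return $1 if basename($path) =~ m/^snap-(.*)-vm(.*)$/;
    return undef;
}

# Hypothetical helper: walk the chain from the active image down to the
# oldest snapshot and record order, file, parent and child for each entry.
sub chain_to_info {
    my ($json) = @_;

    my $chain = decode_json($json);
    my $info = {};
    my $order = 0;
    for my $img (@$chain) {
        my $snapname = snapname_of($img->{filename}) // 'current';
        $info->{$snapname}->{order} = $order++;
        $info->{$snapname}->{file} = $img->{filename};

        if (my $parentfile = $img->{'backing-filename'}) {
            my $parent = snapname_of($parentfile);
            $info->{$snapname}->{parent} = $parent;
            $info->{$parent}->{child} = $snapname;
        }
    }
    return $info;
}

# Illustrative chain: current -> s2 -> s1 (s1 is the oldest snapshot).
my $json = <<'EOF';
[
  {"filename": "/var/lib/vz/images/8006/vm-8006-disk-0.qcow2",
   "backing-filename": "/var/lib/vz/images/8006/snap-s2-vm-8006-disk-0.qcow2"},
  {"filename": "/var/lib/vz/images/8006/snap-s2-vm-8006-disk-0.qcow2",
   "backing-filename": "/var/lib/vz/images/8006/snap-s1-vm-8006-disk-0.qcow2"},
  {"filename": "/var/lib/vz/images/8006/snap-s1-vm-8006-disk-0.qcow2"}
]
EOF

my $info = chain_to_info($json);
for my $name (sort { $info->{$a}->{order} <=> $info->{$b}->{order} } keys %$info) {
    printf "%s parent=%s child=%s\n", $name,
        $info->{$name}->{parent} // '-', $info->{$name}->{child} // '-';
}
```

In this layout the rollback check in volume_rollback_is_possible amounts to
comparing the file of the parent of 'current' with the requested snapshot path.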

diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
index 3b4f041..79e5c3a 100755
--- a/src/PVE/Storage.pm
+++ b/src/PVE/Storage.pm
@@ -1002,7 +1002,7 @@ sub unmap_volume {
 }
 
 sub vdisk_alloc {
-    my ($cfg, $storeid, $vmid, $fmt, $name, $size) = @_;
+    my ($cfg, $storeid, $vmid, $fmt, $name, $size, $backing) = @_;
 
     die "no storage ID specified\n" if !$storeid;
 
@@ -1025,7 +1025,7 @@ sub vdisk_alloc {
     # lock shared storage
     return $plugin->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
 	my $old_umask = umask(umask|0037);
-	my $volname = eval { $plugin->alloc_image($storeid, $scfg, $vmid, $fmt, $name, $size) };
+	my $volname = eval { $plugin->alloc_image($storeid, $scfg, $vmid, $fmt, $name, $size, $backing) };
 	my $err = $@;
 	umask $old_umask;
 	die $err if $err;
diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm
index fb23e0a..1cd7ac3 100644
--- a/src/PVE/Storage/DirPlugin.pm
+++ b/src/PVE/Storage/DirPlugin.pm
@@ -81,6 +81,7 @@ sub options {
 	is_mountpoint => { optional => 1 },
 	bwlimit => { optional => 1 },
 	preallocation => { optional => 1 },
+	snapext => { optional => 1 },
    };
 }
 
diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
index 65cf43f..d7f485f 100644
--- a/src/PVE/Storage/Plugin.pm
+++ b/src/PVE/Storage/Plugin.pm
@@ -216,6 +216,11 @@ my $defaultData = {
 	    maximum => 65535,
 	    optional => 1,
 	},
+        'snapext' => {
+	    type => 'boolean',
+	    description => 'enable external snapshot.',
+	    optional => 1,
+        },
     },
 };
 
@@ -716,7 +721,11 @@ sub filesystem_path {
 
     my $dir = $class->get_subdir($scfg, $vtype);
 
-    $dir .= "/$vmid" if $vtype eq 'images';
+    if ($scfg->{snapext} && $snapname) {
+	$name = $class->get_snap_volname($volname, $snapname);
+    } else {
+	$dir .= "/$vmid" if $vtype eq 'images';
+    }
 
     my $path = "$dir/$name";
 
@@ -873,7 +882,7 @@ sub clone_image {
 }
 
 sub alloc_image {
-    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
+    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size, $backing) = @_;
 
     my $imagedir = $class->get_subdir($scfg, 'images');
     $imagedir .= "/$vmid";
@@ -901,17 +910,11 @@ sub alloc_image {
 	umask $old_umask;
 	die $err if $err;
     } else {
-	my $cmd = ['/usr/bin/qemu-img', 'create'];
-
-	my $prealloc_opt = preallocation_cmd_option($scfg, $fmt);
-	push @$cmd, '-o', $prealloc_opt if defined($prealloc_opt);
 
-	push @$cmd, '-f', $fmt, $path, "${size}K";
-
-	eval { run_command($cmd, errmsg => "unable to create image"); };
+	eval { qemu_img_create($scfg, $fmt, $size, $path, $backing) };
 	if ($@) {
 	    unlink $path;
-	    rmdir $imagedir;
+	    rmdir $imagedir if !$backing;
 	    die "$@";
 	}
     }
@@ -955,6 +958,50 @@ sub free_image {
 # TODO taken from PVE/QemuServer/Drive.pm, avoiding duplication would be nice
 my @checked_qemu_img_formats = qw(raw cow qcow qcow2 qed vmdk cloop);
 
+sub qemu_img_create {
+    my ($scfg, $fmt, $size, $path, $backing) = @_;
+
+    my $cmd = ['/usr/bin/qemu-img', 'create'];
+
+    my $options = [];
+
+    if($backing) {
+	push @$cmd, '-b', $backing, '-F', 'qcow2';
+	push @$options, 'extended_l2=on','cluster_size=128k';
+    };
+    push @$options, preallocation_cmd_option($scfg, $fmt);
+    push @$cmd, '-o', join(',', @$options) if @$options > 0;
+    push @$cmd, '-f', $fmt, $path;
+    push @$cmd, "${size}K" if !$backing;
+
+    run_command($cmd, errmsg => "unable to create image");
+}
+
+sub qemu_img_info {
+    my ($filename, $file_format, $timeout, $follow_backing_files) = @_;
+
+    my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
+    push $cmd->@*, '-f', $file_format if $file_format;
+    push $cmd->@*, '--backing-chain' if $follow_backing_files;
+
+    my $json = '';
+    my $err_output = '';
+    eval {
+        run_command($cmd,
+            timeout => $timeout,
+            outfunc => sub { $json .= shift },
+            errfunc => sub { $err_output .= shift . "\n"},
+        );
+    };
+    warn $@ if $@;
+    if ($err_output) {
+        # if qemu did not output anything to stdout we die with stderr as an error
+        die $err_output if !$json;
+        # otherwise we warn about it and try to parse the json
+        warn $err_output;
+    }
+    return $json;
+}
 # set $untrusted if the file in question might be malicious since it isn't
 # created by our stack
 # this makes certain checks fatal, and adds extra checks for known problems like
@@ -1018,25 +1065,9 @@ sub file_size_info {
 	warn "file_size_info: '$filename': falling back to 'raw' from unknown format '$file_format'\n";
 	$file_format = 'raw';
     }
-    my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
-    push $cmd->@*, '-f', $file_format if $file_format;
 
-    my $json = '';
-    my $err_output = '';
-    eval {
-	run_command($cmd,
-	    timeout => $timeout,
-	    outfunc => sub { $json .= shift },
-	    errfunc => sub { $err_output .= shift . "\n"},
-	);
-    };
-    warn $@ if $@;
-    if ($err_output) {
-	# if qemu did not output anything to stdout we die with stderr as an error
-	die $err_output if !$json;
-	# otherwise we warn about it and try to parse the json
-	warn $err_output;
-    }
+    my $json = qemu_img_info($filename, $file_format, $timeout);
+
     if (!$json) {
 	die "failed to query file information with qemu-img\n" if $untrusted;
 	# skip decoding if there was no output, e.g. if there was a timeout.
@@ -1162,11 +1193,29 @@ sub volume_snapshot {
 
     die "can't snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
 
-    my $path = $class->filesystem_path($scfg, $volname);
+    if($scfg->{snapext}) {
+
+	my $path = $class->path($scfg, $volname, $storeid);
+	my $snappath = $class->path($scfg, $volname, $storeid, $snap);
+	#rename current volume to snap volume
+	die "snapshot volume $snappath already exists\n" if -e $snappath;
+	rename($path, $snappath) if -e $path;
+
+	my ($vtype, $name, $vmid, undef, undef, $isBase, $format) = $class->parse_volname($volname);
+
+	eval { $class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $name, undef, $snappath) };
+	if (my $err = $@) {
+	    eval { $class->free_image($storeid, $scfg, $volname, 0) };
+	    warn $@ if $@;
+	    die $err;
+	}
 
-    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
+    } else {
 
-    run_command($cmd);
+	my $path = $class->filesystem_path($scfg, $volname);
+	my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
+	run_command($cmd);
+    }
 
     return undef;
 }
@@ -1177,6 +1226,21 @@ sub volume_snapshot {
 sub volume_rollback_is_possible {
     my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
 
+    if ($scfg->{snapext}) {
+	#technically, we could manage multiple branches, but it needs a lot more work for snapshot delete:
+	#we would need to implement block-stream from the deleted snapshot to all other child branches;
+	#when online, we would need a transaction across multiple disks when deleting the last snapshot,
+	#and we would need to merge into the currently running file
+
+	my $snappath = $class->path($scfg, $volname, $storeid, $snap);
+	my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+	my $parentsnap = $snapshots->{current}->{parent};
+
+	return 1 if $snapshots->{$parentsnap}->{file} eq $snappath;
+
+	die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
+    }
+
     return 1;
 }
 
@@ -1187,9 +1251,15 @@ sub volume_snapshot_rollback {
 
     my $path = $class->filesystem_path($scfg, $volname);
 
-    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
-
-    run_command($cmd);
+    if ($scfg->{snapext}) {
+	#simply delete the current snapshot and recreate it
+	my $path = $class->filesystem_path($scfg, $volname);
+	unlink($path);
+	$class->volume_snapshot($scfg, $storeid, $volname, $snap);
+    } else {
+	my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
+	run_command($cmd);
+    }
 
     return undef;
 }
@@ -1201,13 +1271,49 @@ sub volume_snapshot_delete {
 
     return 1 if $running;
 
+    my $cmd = "";
     my $path = $class->filesystem_path($scfg, $volname);
 
-    $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
+    if ($scfg->{snapext}) {
+
+	my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+	my $snappath = $snapshots->{$snap}->{file};
+	die "volume $snappath is missing" if !-e $snappath;
 
-    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
+	my $parentsnap = $snapshots->{$snap}->{parent};
+	my $childsnap = $snapshots->{$snap}->{child};
 
-    run_command($cmd);
+	my $parentpath = $parentsnap ? $snapshots->{$parentsnap}->{file} : undef;
+	my $childpath = $childsnap ? $snapshots->{$childsnap}->{file} : undef;
+
+	#if this is the first snapshot, it should be the bigger file, so we commit the child into it and rename the snapshot to the child
+	if(!$parentsnap) {
+	    print"commit $childpath\n";
+	    $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
+	    eval { run_command($cmd) };
+	    if ($@) {
+		die "error committing $childpath to $snappath; $@\n";
+	    }
+	    print"rename $snappath to $childpath\n";
+	    rename($snappath, $childpath);
+	} else {
+	    #we rebase the child image on the parent as new backing image
+	    die "missing parentsnap snapshot to rebase child $childpath\n" if !$parentpath;
+	    $cmd = ['/usr/bin/qemu-img', 'rebase', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath];
+	    eval { run_command($cmd) };
+	    if ($@) {
+		die "error rebasing $childpath onto $parentpath; $@\n";
+	    }
+	    #delete the snapshot
+	    unlink($snappath);
+	}
+
+    } else {
+	$class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
+
+	$cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
+	run_command($cmd);
+    }
 
     return undef;
 }
@@ -1246,7 +1352,7 @@ sub volume_has_feature {
 	    current => { qcow2 => 1, raw => 1, vmdk => 1 },
 	},
 	rename => {
-	    current => {qcow2 => 1, raw => 1, vmdk => 1},
+	    current => { qcow2 => 1, raw => 1, vmdk => 1},
 	},
     };
 
@@ -1481,7 +1587,37 @@ sub status {
 sub volume_snapshot_info {
     my ($class, $scfg, $storeid, $volname) = @_;
 
-    die "volume_snapshot_info is not implemented for $class";
+    my $path = $class->filesystem_path($scfg, $volname);
+
+    my $backing_chain = 1;
+    my $json = qemu_img_info($path, undef, 10, $backing_chain);
+    die "failed to query file information with qemu-img\n" if !$json;
+    my $snapshots = eval { decode_json($json) };
+
+    my $info = {};
+    my $order = 0;
+    for my $snap (@$snapshots) {
+
+	my $snapfile = $snap->{filename};
+	my $snapname = parse_snapname($snapfile);
+	$snapname = 'current' if !$snapname;
+	my $snapvolname = $class->get_snap_volname($volname, $snapname);
+
+	$info->{$snapname}->{order} = $order;
+	$info->{$snapname}->{file}= $snapfile;
+	$info->{$snapname}->{volname} = $snapvolname;
+	$info->{$snapname}->{volid} = "$storeid:$snapvolname";
+	$info->{$snapname}->{ext} = 1;
+
+	my $parentfile = $snap->{'backing-filename'};
+	if ($parentfile) {
+	    my $parentname = parse_snapname($parentfile);
+	    $info->{$snapname}->{parent} = $parentname;
+	    $info->{$parentname}->{child} = $snapname;
+	}
+	$order++;
+    }
+    return $info;
 }
 
 sub activate_storage {
@@ -1867,4 +2003,22 @@ sub config_aware_base_mkdir {
     }
 }
 
+sub get_snap_volname {
+    my ($class, $volname, $snapname) = @_;
+
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
+    $name = !$snapname || $snapname eq 'current' ? $volname : "$vmid/snap-$snapname-$name";
+    return $name;
+}
+
+sub parse_snapname {
+    my ($name) = @_;
+
+    my $basename = basename($name);
+    if ($basename =~ m/^snap-(.*)-vm(.*)$/) {
+	return $1;
+    }
+    return undef;
+}
+
 1;
-- 
2.39.5
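
Reading aid: a minimal sketch of the argument list that qemu_img_create in the
patch above assembles, with and without a backing file. The helper name
build_create_cmd is hypothetical, and the preallocation option is passed in as
a plain string here instead of being derived from the storage config; the
sketch only builds the list and does not run qemu-img.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the branching in qemu_img_create:
# - with a backing file: add -b/-F and the extended_l2/cluster_size options,
#   and omit the size (it is taken from the backing image)
# - without a backing file: append the size in KiB
sub build_create_cmd {
    my ($fmt, $size_kib, $path, $backing, $prealloc_opt) = @_;

    my @cmd = ('/usr/bin/qemu-img', 'create');
    my @options;

    if ($backing) {
        push @cmd, '-b', $backing, '-F', 'qcow2';
        push @options, 'extended_l2=on', 'cluster_size=128k';
    }
    # Skip the option entirely when none was given.
    push @options, $prealloc_opt if defined($prealloc_opt);

    push @cmd, '-o', join(',', @options) if @options;
    push @cmd, '-f', $fmt, $path;
    push @cmd, "${size_kib}K" if !$backing;

    return \@cmd;
}

my $plain = build_create_cmd(
    'qcow2', 104858, '/var/lib/vz/images/8006/vm-8006-disk-0.qcow2',
    undef, 'preallocation=metadata');
my $overlay = build_create_cmd(
    'qcow2', undef, '/var/lib/vz/images/8006/vm-8006-disk-0.qcow2',
    '/var/lib/vz/images/8006/snap-snap1-vm-8006-disk-0.qcow2');

print join(' ', @$plain), "\n";
print join(' ', @$overlay), "\n";
```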




* [pve-devel] [PATCH v4 qemu-server 02/11] blockdev : convert qemu_driveadd && qemu_drivedel
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (2 preceding siblings ...)
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support Alexandre Derumier via pve-devel
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 2/5] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 5441 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 02/11] blockdev : convert qemu_driveadd && qemu_drivedel
Date: Tue, 11 Mar 2025 11:28:52 +0100
Message-ID: <20250311102905.2680524-5-alexandre.derumier@groupe-cyllene.com>

fixme/testme :
PVE/VZDump/QemuServer.pm:    eval { PVE::QemuServer::qemu_drivedel($vmid, "tpmstate0-backup"); };

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
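Reading aid: a minimal sketch of the QMP command sequence the converted
qemu_driveadd and qemu_drivedel issue, printed as JSON instead of being sent to
a monitor. The helper names are hypothetical and the blockdev tree is a trimmed
version of the one in the cfg2cmd fixtures. The ordering reflects that the
throttle node refers to the throttle-group object by name, so the group is
created first and removed last.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: commands for hot-plugging a drive, in issue order.
sub driveadd_commands {
    my ($drive_id, $limits, $blockdev) = @_;
    return (
        ['object-add', {
            'qom-type' => 'throttle-group',
            id => "throttle-drive-$drive_id",
            limits => $limits,
        }],
        ['blockdev-add', $blockdev],
    );
}

# Hypothetical helper: commands for removing the drive again, in issue order.
sub drivedel_commands {
    my ($drive_id) = @_;
    return (
        ['blockdev-del', { 'node-name' => "drive-$drive_id" }],
        ['object-del', { id => "throttle-drive-$drive_id" }],
    );
}

# Trimmed blockdev tree: throttle filter on top of a qcow2 format node on top
# of a file node (cache, aio and discard settings omitted for brevity).
my $blockdev = {
    driver => 'throttle',
    'node-name' => 'drive-scsi0',
    'throttle-group' => 'throttle-drive-scsi0',
    file => {
        driver => 'qcow2',
        file => {
            driver => 'file',
            filename => '/var/lib/vz/images/8006/vm-8006-disk-0.qcow2',
        },
    },
};

my @add = driveadd_commands('scsi0', {}, $blockdev);
my @del = drivedel_commands('scsi0');

my $json = JSON::PP->new->canonical;
for my $cmd (@add, @del) {
    print $json->encode({ execute => $cmd->[0], arguments => $cmd->[1] }), "\n";
}
```
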
 PVE/QemuServer.pm | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 5fd155e5..9ad12186 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4144,32 +4144,25 @@ sub qemu_iothread_del {
 }
 
 sub qemu_driveadd {
-    my ($storecfg, $vmid, $device) = @_;
+    my ($storecfg, $vmid, $drive) = @_;
 
-    my $kvmver = get_running_qemu_version($vmid);
-    my $io_uring = min_version($kvmver, 6, 0);
-    my $drive = print_drive_commandline_full($storecfg, $vmid, $device, undef, $io_uring);
-    $drive =~ s/\\/\\\\/g;
-    my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto \"$drive\"", 60);
+    my $drive_id = get_drive_id($drive);
+    # always add a throttle-group, as it's mandatory for the throttle-filter root node.
+    my $throttle_group = generate_throttle_group($drive);
+    mon_cmd($vmid, 'object-add', "qom-type" => "throttle-group", %$throttle_group);
 
-    # If the command succeeds qemu prints: "OK"
-    return 1 if $ret =~ m/OK/s;
-
-    die "adding drive failed: $ret\n";
+    # The throttle filter is the root node with a stable name attached to the device,
+    # and currently it's not possible to insert it later
+    my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive);
+    mon_cmd($vmid, 'blockdev-add', %$blockdev, timeout => 10 * 60);
+    return 1;
 }
 
 sub qemu_drivedel {
     my ($vmid, $deviceid) = @_;
 
-    my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_del drive-$deviceid", 10 * 60);
-    $ret =~ s/^\s+//;
-
-    return 1 if $ret eq "";
-
-    # NB: device not found errors mean the drive was auto-deleted and we ignore the error
-    return 1 if $ret =~ m/Device \'.*?\' not found/s;
-
-    die "deleting drive $deviceid failed : $ret\n";
+    mon_cmd($vmid, 'blockdev-del', 'node-name' => "drive-$deviceid", timeout => 10 * 60);
+    mon_cmd($vmid, 'object-del', id => "throttle-drive-$deviceid");
 }
 
 sub qemu_deviceaddverify {
@@ -4404,7 +4397,7 @@ sub qemu_block_set_io_throttle {
 
     return if !check_running($vmid) ;
 
-    mon_cmd($vmid, "block_set_io_throttle", device => $deviceid,
+    mon_cmd($vmid, "block_set_io_throttle", id => $deviceid,
 	bps => int($bps),
 	bps_rd => int($bps_rd),
 	bps_wr => int($bps_wr),
-- 
2.39.5




* [pve-devel] [PATCH v4 pve-storage 2/5] lvmplugin: add qcow2 snapshot
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (3 preceding siblings ...)
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 02/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-04-01 13:50   ` Fabian Grünbichler
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 03/11] replace qemu_block_set_io_throttle with qom-set throttlegroup limits Alexandre Derumier via pve-devel
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 15304 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 pve-storage 2/5] lvmplugin: add qcow2 snapshot
Date: Tue, 11 Mar 2025 11:28:53 +0100
Message-ID: <20250311102905.2680524-6-alexandre.derumier@groupe-cyllene.com>


Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
 src/PVE/Storage/LVMPlugin.pm | 228 ++++++++++++++++++++++++++++++++---
 1 file changed, 210 insertions(+), 18 deletions(-)

diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
index 38f7fa1..19dbd7e 100644
--- a/src/PVE/Storage/LVMPlugin.pm
+++ b/src/PVE/Storage/LVMPlugin.pm
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 
 use IO::File;
+use POSIX qw/ceil/;
 
 use PVE::Tools qw(run_command trim);
 use PVE::Storage::Plugin;
@@ -218,6 +219,7 @@ sub type {
 sub plugindata {
     return {
 	content => [ {images => 1, rootdir => 1}, { images => 1 }],
+	format => [ { raw => 1, qcow2 => 1 } , 'raw' ],
     };
 }
 
@@ -293,7 +295,10 @@ sub parse_volname {
     PVE::Storage::Plugin::parse_lvm_name($volname);
 
     if ($volname =~ m/^(vm-(\d+)-\S+)$/) {
-	return ('images', $1, $2, undef, undef, undef, 'raw');
+	my $name = $1;
+	my $vmid = $2;
+	my $format = $volname =~ m/\.qcow2$/ ? 'qcow2' : 'raw';
+	return ('images', $name, $vmid, undef, undef, undef, $format);
     }
 
     die "unable to parse lvm volume name '$volname'\n";
@@ -302,11 +307,13 @@ sub parse_volname {
 sub filesystem_path {
     my ($class, $scfg, $volname, $snapname) = @_;
 
-    die "lvm snapshot is not implemented"if defined($snapname);
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+	$class->parse_volname($volname);
 
-    my ($vtype, $name, $vmid) = $class->parse_volname($volname);
+    die "snapshot is working with qcow2 format only" if defined($snapname) && $format ne 'qcow2';
 
     my $vg = $scfg->{vgname};
+    $name = $class->get_snap_volname($volname, $snapname) if $snapname;
 
     my $path = "/dev/$vg/$name";
 
@@ -334,7 +341,9 @@ sub find_free_diskname {
 
     my $disk_list = [ keys %{$lvs->{$vg}} ];
 
-    return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, undef, $scfg);
+    $add_fmt_suffix = $fmt eq 'qcow2' ? 1 : undef;
+
+    return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, $fmt, $scfg, $add_fmt_suffix);
 }
 
 sub lvcreate {
@@ -363,9 +372,9 @@ sub lvrename {
 }
 
 sub alloc_image {
-    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
+    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size, $backing) = @_;
 
-    die "unsupported format '$fmt'" if $fmt ne 'raw';
+    die "unsupported format '$fmt'" if $fmt !~ m/(raw|qcow2)/;
 
     die "illegal name '$name' - should be 'vm-$vmid-*'\n"
 	if  $name && $name !~ m/^vm-$vmid-/;
@@ -378,12 +387,36 @@ sub alloc_image {
 
     my $free = int($vgs->{$vg}->{free});
 
+
+    # add extra space for qcow2 metadata
+    # without sub-allocated clusters, for 1TB storage: l2_size = disk_size × 8 / cluster_size
+    # with sub-allocated clusters, for 1TB storage: l2_size = disk_size × 8 / cluster_size / 16
+    # i.e. 4MB overhead for 1TB with extended_l2 and cluster_size=128k
+
+    # can't use qemu-img measure, because it's not possible to define options like cluster_size && extended_l2
+    # verification has been done with: qemu-img create -f qcow2 -o extended_l2=on,cluster_size=128k test.img 1G
+
+    my $qcow2_overhead = ceil($size/1024/1024/1024) * 4096;
+
+    my $lvmsize = $size;
+    $lvmsize += $qcow2_overhead if $fmt eq 'qcow2';
+
     die "not enough free space ($free < $size)\n" if $free < $size;
 
-    $name = $class->find_free_diskname($storeid, $scfg, $vmid)
+    $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt)
 	if !$name;
 
-    lvcreate($vg, $name, $size, ["pve-vm-$vmid"]);
+    my $tags = ["pve-vm-$vmid"];
+    #tags all snapshots volumes with the main volume tag for easier activation of the whole group
+    push @$tags, "\@pve-$name" if $fmt eq 'qcow2';
+    lvcreate($vg, $name, $lvmsize, $tags);
+
+    if ($fmt eq 'qcow2') {
+	#format the lvm volume with qcow2 format
+	$class->activate_volume($storeid, $scfg, $name, undef, {});
+	my $path = $class->path($scfg, $name, $storeid);
+	PVE::Storage::Plugin::qemu_img_create($scfg, $fmt, $size, $path, $backing);
+    }
 
     return $name;
 }
@@ -538,6 +571,12 @@ sub activate_volume {
 
     my $lvm_activate_mode = 'ey';
 
+    #activate volume && all snapshots volumes by tag
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+	$class->parse_volname($volname);
+
+    $path = "\@pve-$name" if $format eq 'qcow2';
+
     my $cmd = ['/sbin/lvchange', "-a$lvm_activate_mode", $path];
     run_command($cmd, errmsg => "can't activate LV '$path'");
     $cmd = ['/sbin/lvchange', '--refresh', $path];
@@ -550,6 +589,10 @@ sub deactivate_volume {
     my $path = $class->path($scfg, $volname, $storeid, $snapname);
     return if ! -b $path;
 
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+	$class->parse_volname($volname);
+    $path = "\@pve-$name" if $format eq 'qcow2';
+
     my $cmd = ['/sbin/lvchange', '-aln', $path];
     run_command($cmd, errmsg => "can't deactivate LV '$path'");
 }
@@ -557,15 +600,27 @@ sub deactivate_volume {
 sub volume_resize {
     my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
 
-    $size = ($size/1024/1024) . "M";
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+	$class->parse_volname($volname);
+
+    my $lvmsize = $size / 1024;
+    my $qcow2_overhead = ceil($size/1024/1024/1024/1024) * 4096;
+    $lvmsize += $qcow2_overhead if $format eq 'qcow2';
+    $lvmsize = "${lvmsize}k";
 
     my $path = $class->path($scfg, $volname);
-    my $cmd = ['/sbin/lvextend', '-L', $size, $path];
+    my $cmd = ['/sbin/lvextend', '-L', $lvmsize, $path];
 
     $class->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
 	run_command($cmd, errmsg => "error resizing volume '$path'");
     });
 
+    if(!$running && $format eq 'qcow2') {
+	my $prealloc_opt = PVE::Storage::Plugin::preallocation_cmd_option($scfg, $format);
+	my $cmd = ['/usr/bin/qemu-img', 'resize', "--$prealloc_opt", '-f', $format, $path , $size];
+	run_command($cmd, timeout => 10);
+    }
+
     return 1;
 }
 
@@ -587,30 +642,159 @@ sub volume_size_info {
 sub volume_snapshot {
     my ($class, $scfg, $storeid, $volname, $snap) = @_;
 
-    die "lvm snapshot is not implemented";
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+        $class->parse_volname($volname);
+
+    die "can't snapshot this image format\n" if $format ne 'qcow2';
+
+    $class->activate_volume($storeid, $scfg, $volname, undef, {});
+
+    my $snap_volname = $class->get_snap_volname($volname, $snap);
+    my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
+
+    my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
+
+    #rename current lvm volume to snap volume
+    my $vg = $scfg->{vgname};
+    print"rename $volname to $snap_volname\n";
+    eval { lvrename($vg, $volname, $snap_volname); };
+    if ($@) {
+	die "can't rename lvm volume from $volname to $snap_volname: $@ \n";
+    }
+
+    eval { $class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024, $snap_path); };
+    if ($@) {
+        eval { $class->free_image($storeid, $scfg, $volname, 0) };
+        warn $@ if $@;
+    }
 }
 
+sub volume_rollback_is_possible {
+    my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
+
+    my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
+
+    $class->activate_volume($storeid, $scfg, $volname, undef, {});
+    my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+    my $parent_snap = $snapshots->{current}->{parent};
+
+    return 1 if $snapshots->{$parent_snap}->{file} eq $snap_path;
+    die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
+
+    return 1;
+}
+
+
 sub volume_snapshot_rollback {
     my ($class, $scfg, $storeid, $volname, $snap) = @_;
 
-    die "lvm snapshot rollback is not implemented";
+    die "can't rollback snapshot for this image format\n" if $volname !~ m/\.(qcow2)$/;
+
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+        $class->parse_volname($volname);
+
+    $class->activate_volume($storeid, $scfg, $volname, undef, {});
+    my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
+    my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
+
+    #simply delete the current snapshot and recreate it
+    eval { $class->free_image($storeid, $scfg, $volname, 0) };
+    if ($@) {
+	die "can't delete old volume $volname: $@\n";
+    }
+
+    eval { $class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024, $snap_path) };
+    if ($@) {
+	die "can't allocate new volume $volname: $@\n";
+    }
+
+    return undef;
 }
 
 sub volume_snapshot_delete {
-    my ($class, $scfg, $storeid, $volname, $snap) = @_;
+    my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
+
+   die "can't delete snapshot for this image format\n" if $volname !~ m/\.(qcow2)$/;
+
+   return 1 if $running;
+
+   my $cmd = "";
+   my $path = $class->filesystem_path($scfg, $volname);
+
+   my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+   my $snap_path = $snapshots->{$snap}->{file};
+   my $snap_volname = $snapshots->{$snap}->{volname};
+   die "volume $snap_path is missing" if !-e $snap_path;
 
-    die "lvm snapshot delete is not implemented";
+   my $parent_snap = $snapshots->{$snap}->{parent};
+   my $child_snap = $snapshots->{$snap}->{child};
+
+   my $parent_path = $snapshots->{$parent_snap}->{file} if $parent_snap;
+   my $child_path = $snapshots->{$child_snap}->{file} if $child_snap;
+   my $child_volname = $snapshots->{$child_snap}->{volname} if $child_snap;
+
+   # if this is the first snapshot (likely the biggest volume), commit the child into it, then rename the snapshot to the child
+   if(!$parent_snap) {
+	print"commit $child_path\n";
+	$cmd = ['/usr/bin/qemu-img', 'commit', $child_path];
+	eval {	run_command($cmd) };
+	if ($@) {
+	    die "error committing $child_path to $parent_path: $@\n";
+	}
+	print"delete $child_volname\n";
+	eval { $class->free_image($storeid, $scfg, $child_volname, 0) };
+	if ($@) {
+	    die "error delete old snapshot volume $child_volname: $@\n";
+	}
+	print"rename $snap_volname to $child_volname\n";
+	my $vg = $scfg->{vgname};
+	eval { lvrename($vg, $snap_volname, $child_volname) };
+	if ($@) {
+	    die "error renaming snapshot: $@\n";
+	}
+    } else {
+	#we rebase the child image on the parent as new backing image
+	die "missing parentsnap snapshot to rebase child $child_path\n" if !$parent_path;
+	print "link $child_snap to $parent_snap\n";
+	$cmd = ['/usr/bin/qemu-img', 'rebase', '-b', $parent_path, '-F', 'qcow2', '-f', 'qcow2', $child_path];
+	eval { run_command($cmd) };
+	if ($@) {
+	    die "error rebase $child_path with $parent_path; $@\n";
+	}
+	#delete the snapshot
+	eval { $class->free_image($storeid, $scfg, $snap_volname, 0); };
+	if ($@) {
+	    die "error delete old snapshot volume $snap_volname: $@\n";
+	}
+    }
 }
 
 sub volume_has_feature {
     my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
 
     my $features = {
-	copy => { base => 1, current => 1},
-	rename => {current => 1},
+        copy => {
+            base => { qcow2 => 1, raw => 1},
+            current => { qcow2 => 1, raw => 1},
+            snap => { qcow2 => 1 },
+        },
+        'rename' => {
+            current => { qcow2 => 1, raw => 1},
+        },
+        snapshot => {
+            current => { qcow2 => 1 },
+            snap => { qcow2 => 1 },
+        },
+        template => {
+            current => { qcow2 => 1, raw => 1},
+        },
+	clone => {
+	    base => { qcow2 => 1, raw => 1 },
+	},
     };
 
-    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase) =
+
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
 	$class->parse_volname($volname);
 
     my $key = undef;
@@ -619,7 +803,7 @@ sub volume_has_feature {
     }else{
 	$key =  $isBase ? 'base' : 'current';
     }
-    return 1 if $features->{$feature}->{$key};
+    return 1 if defined($features->{$feature}->{$key}->{$format});
 
     return undef;
 }
@@ -740,4 +924,12 @@ sub rename_volume {
     return "${storeid}:${target_volname}";
 }
 
+sub get_snap_volname {
+    my ($class, $volname, $snapname) = @_;
+
+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
+    $name = !$snapname || $snapname eq 'current' ? $volname : "snap-$snapname-$name";
+    return $name;
+}
+
 1;
-- 
2.39.5



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [pve-devel] [PATCH v4 qemu-server 03/11] replace qemu_block_set_io_throttle with qom-set throttlegroup limits
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (4 preceding siblings ...)
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 2/5] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 5532 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 03/11] replace qemu_block_set_io_throttle with qom-set throttlegroup limits
Date: Tue, 11 Mar 2025 11:28:54 +0100
Message-ID: <20250311102905.2680524-7-alexandre.derumier@groupe-cyllene.com>

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
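Reader's note on the key renames: the old block_set_io_throttle arguments map
onto the throttle-group `limits` keys in a regular way (`bps` becomes
`bps-total`, `_rd`/`_wr` become `-read`/`-write`, and the `_max` and
`_max_length` suffixes only swap underscores for dashes). The sketch below
derives the new key from the old one. The helper `to_throttle_limits` is
hypothetical; the patch itself spells out every key by hand.

```perl
use strict;
use warnings;

# Old argument base name => new "limits" key, as renamed in the diff.
my %rename = (
    bps     => 'bps-total',
    bps_rd  => 'bps-read',
    bps_wr  => 'bps-write',
    iops    => 'iops-total',
    iops_rd => 'iops-read',
    iops_wr => 'iops-write',
);

# Hypothetical helper: translate a hash of old-style arguments.
sub to_throttle_limits {
    my ($old) = @_;

    my $limits = {};
    for my $key (sort keys %$old) {
        # Split into the base name and an optional _max / _max_length suffix.
        my ($base, $suffix) = $key =~ m/^((?:bps|iops)(?:_rd|_wr)?)(_max(?:_length)?)?$/
            or die "unknown throttle key '$key'\n";
        $suffix //= '';
        $suffix =~ tr/_/-/;
        $limits->{ $rename{$base} . $suffix } = int($old->{$key});
    }
    return $limits;
}

my $limits = to_throttle_limits({ bps => 10 * 1024 * 1024, iops_rd_max_length => 5 });
print "$_ = $limits->{$_}\n" for sort keys %$limits;
```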
 PVE/QemuServer.pm | 49 ++++++++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 9ad12186..faa17edb 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4388,7 +4388,7 @@ sub qemu_cpu_hotplug {
     }
 }
 
-sub qemu_block_set_io_throttle {
+sub qemu_blockdev_set_throttle_limits {
     my ($vmid, $deviceid,
 	$bps, $bps_rd, $bps_wr, $iops, $iops_rd, $iops_wr,
 	$bps_max, $bps_rd_max, $bps_wr_max, $iops_max, $iops_rd_max, $iops_wr_max,
@@ -4397,27 +4397,32 @@ sub qemu_block_set_io_throttle {
 
     return if !check_running($vmid) ;
 
-    mon_cmd($vmid, "block_set_io_throttle", id => $deviceid,
-	bps => int($bps),
-	bps_rd => int($bps_rd),
-	bps_wr => int($bps_wr),
-	iops => int($iops),
-	iops_rd => int($iops_rd),
-	iops_wr => int($iops_wr),
-	bps_max => int($bps_max),
-	bps_rd_max => int($bps_rd_max),
-	bps_wr_max => int($bps_wr_max),
-	iops_max => int($iops_max),
-	iops_rd_max => int($iops_rd_max),
-	iops_wr_max => int($iops_wr_max),
-	bps_max_length => int($bps_max_length),
-	bps_rd_max_length => int($bps_rd_max_length),
-	bps_wr_max_length => int($bps_wr_max_length),
-	iops_max_length => int($iops_max_length),
-	iops_rd_max_length => int($iops_rd_max_length),
-	iops_wr_max_length => int($iops_wr_max_length),
+    mon_cmd(
+	$vmid,
+	'qom-set',
+	path => "throttle-$deviceid",
+	property => "limits",
+	value => {
+	    'bps-total' => int($bps),
+	    'bps-read' => int($bps_rd),
+	    'bps-write' => int($bps_wr),
+	    'iops-total' => int($iops),
+	    'iops-read' => int($iops_rd),
+	    'iops-write' => int($iops_wr),
+	    'bps-total-max' => int($bps_max),
+	    'bps-read-max' => int($bps_rd_max),
+	    'bps-write-max' => int($bps_wr_max),
+	    'iops-total-max' => int($iops_max),
+	    'iops-read-max' => int($iops_rd_max),
+	    'iops-write-max' => int($iops_wr_max),
+	    'bps-total-max-length' => int($bps_max_length),
+	    'bps-read-max-length' => int($bps_rd_max_length),
+	    'bps-write-max-length' => int($bps_wr_max_length),
+	    'iops-total-max-length' => int($iops_max_length),
+	    'iops-read-max-length' => int($iops_rd_max_length),
+	    'iops-write-max-length' => int($iops_wr_max_length),
+	}
     );
-
 }
 
 sub qemu_block_resize {
@@ -5181,7 +5186,7 @@ sub vmconfig_update_disk {
 		    safe_num_ne($drive->{iops_rd_max_length}, $old_drive->{iops_rd_max_length}) ||
 		    safe_num_ne($drive->{iops_wr_max_length}, $old_drive->{iops_wr_max_length})) {
 
-		    qemu_block_set_io_throttle(
+		    qemu_blockdev_set_throttle_limits(
 			$vmid,"drive-$opt",
 			($drive->{mbps} || 0)*1024*1024,
 			($drive->{mbps_rd} || 0)*1024*1024,
-- 
2.39.5



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [pve-devel] [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (5 preceding siblings ...)
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 03/11] replace qemu_block_set_io_throttle with qom-set throttlegroup limits Alexandre Derumier via pve-devel
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-04-01 13:50   ` Fabian Grünbichler
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 5021 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots
Date: Tue, 11 Mar 2025 11:28:55 +0100
Message-ID: <20250311102905.2680524-8-alexandre.derumier@groupe-cyllene.com>

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
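Reader's note on the cleanup order: the new worker frees external snapshots
from newest to oldest and only then frees the volume itself. The sketch below
isolates the selection loop. The snapshot hash and its volids are made-up
sample data in the shape the patch reads (`order`, `volid`, `ext`), and the
helper name `external_snapshot_volids` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: volids of external snapshots, newest first,
# using the same filters as the loop added to vdisk_free.
sub external_snapshot_volids {
    my ($snapshots) = @_;

    my @volids;
    for my $snapid (sort { $snapshots->{$b}->{order} <=> $snapshots->{$a}->{order} } keys %$snapshots) {
        my $snap = $snapshots->{$snapid};
        next if $snapid eq 'current';    # the active volume is freed last
        next if !$snap->{volid};
        next if !$snap->{ext};           # only external snapshots are separate volumes
        push @volids, $snap->{volid};
    }
    return @volids;
}

# Made-up sample data for illustration only.
my $snapshots = {
    snap1   => { order => 1, ext => 1, volid => 'store:snap-snap1-vm-100-disk-0.qcow2' },
    snap2   => { order => 2, ext => 1, volid => 'store:snap-snap2-vm-100-disk-0.qcow2' },
    current => { order => 3, volid => 'store:vm-100-disk-0.qcow2' },
};

print "$_\n" for external_snapshot_volids($snapshots);
```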
 src/PVE/Storage.pm                 | 18 +++++++++++++++++-
 src/test/run_test_zfspoolplugin.pl | 18 ++++++++++++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
index 79e5c3a..4012905 100755
--- a/src/PVE/Storage.pm
+++ b/src/PVE/Storage.pm
@@ -1052,7 +1052,23 @@ sub vdisk_free {
 
 	my (undef, undef, undef, undef, undef, $isBase, $format) =
 	    $plugin->parse_volname($volname);
-	$cleanup_worker = $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);
+
+        $cleanup_worker = sub {
+	    #remove external snapshots
+	    activate_volumes($cfg, [ $volid ]);
+	    my $snapshots = PVE::Storage::volume_snapshot_info($cfg, $volid);
+	    for my $snapid (sort { $snapshots->{$b}->{order} <=> $snapshots->{$a}->{order} } keys %$snapshots) {
+		my $snap = $snapshots->{$snapid};
+		next if $snapid eq 'current';
+		next if !$snap->{volid};
+		next if !$snap->{ext};
+		my ($snap_storeid, $snap_volname) = parse_volume_id($snap->{volid});
+		my (undef, undef, undef, undef, undef, $snap_isBase, $snap_format) =
+		    $plugin->parse_volname($snap_volname);
+		$plugin->free_image($snap_storeid, $scfg, $snap_volname, $snap_isBase, $snap_format);
+	    }
+	    $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);
+	};
     });
 
     return if !$cleanup_worker;
diff --git a/src/test/run_test_zfspoolplugin.pl b/src/test/run_test_zfspoolplugin.pl
index 095ccb3..4ff9f22 100755
--- a/src/test/run_test_zfspoolplugin.pl
+++ b/src/test/run_test_zfspoolplugin.pl
@@ -6,12 +6,30 @@ use strict;
 use warnings;
 
 use Data::Dumper qw(Dumper);
+use Test::MockModule;
+
 use PVE::Storage;
 use PVE::Cluster;
 use PVE::Tools qw(run_command);
+use PVE::RPCEnvironment;
 use Cwd;
 $Data::Dumper::Sortkeys = 1;
 
+my $rpcenv_module;
+$rpcenv_module = Test::MockModule->new('PVE::RPCEnvironment');
+$rpcenv_module->mock(
+    get_user => sub {
+        return 'root@pam';
+    },
+    fork_worker => sub {
+	my ($self, $dtype, $id, $user, $function, $background) = @_;
+	$function->(123456);
+	return '123456';
+    }
+);
+
+my $rpcenv = PVE::RPCEnvironment->init('pub');
+
 my $verbose = undef;
 
 my $storagename = "zfstank99";
-- 
2.39.5



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [pve-devel] [PATCH v4 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (6 preceding siblings ...)
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-04-02  8:10   ` Fabian Grünbichler
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 4/5] lvm: lvrename helper: allow path Alexandre Derumier via pve-devel
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 3689 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
Date: Tue, 11 Mar 2025 11:28:56 +0100
Message-ID: <20250311102905.2680524-9-alexandre.derumier@groupe-cyllene.com>

Look at qdev value, as cdrom drives can be empty
without any inserted media

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
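Reader's note on the qdev lookup: keying the query-block reply by `qdev`
keeps drives that have no medium inserted, which matching on the `device`
name did not. A minimal sketch follows, with a made-up reply in which the
second entry stands for an empty cdrom drive. The qdev values and the helper
name `devices_from_query_block` are assumptions for illustration.

```perl
use strict;
use warnings;

# Made-up query-block reply: the second entry is an empty cdrom drive,
# so it has a qdev but no inserted medium.
my $resblock = [
    { qdev => 'scsi0', inserted => { 'node-name' => 'drive-scsi0' } },
    { qdev => 'ide2' },
];

# Hypothetical helper: mark every block device found by its qdev value.
sub devices_from_query_block {
    my ($resblock) = @_;

    my %by_qdev = map { $_->{qdev} => $_ } @$resblock;
    return { map { $_ => 1 } keys %by_qdev };
}

my $devices = devices_from_query_block($resblock);
print join(',', sort keys %$devices), "\n";
```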
 PVE/QemuServer.pm | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index faa17edb..5ccc026a 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -3937,11 +3937,12 @@ sub vm_devices_list {
 	$devices_to_check = $to_check;
     }
 
+    # block devices need to be queried at qdev level, as a device
+    # doesn't always have a blockdev drive media attached (cdrom for example)
     my $resblock = mon_cmd($vmid, 'query-block');
-    foreach my $block (@$resblock) {
-	if($block->{device} =~ m/^drive-(\S+)/){
-		$devices->{$1} = 1;
-	}
+    $resblock = { map { $_->{qdev} => $_ } $resblock->@* };
+    foreach my $blockid (keys %$resblock) {
+	$devices->{$blockid} = 1;
     }
 
     my $resmice = mon_cmd($vmid, 'query-mice');
-- 
2.39.5



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [pve-devel] [PATCH v4 pve-storage 4/5] lvm: lvrename helper: allow path
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (7 preceding siblings ...)
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-04-01 13:50   ` Fabian Grünbichler
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 3491 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 pve-storage 4/5] lvm: lvrename helper: allow path
Date: Tue, 11 Mar 2025 11:28:57 +0100
Message-ID: <20250311102905.2680524-10-alexandre.derumier@groupe-cyllene.com>

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
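Reader's note on the optional VG argument: with this change the helper can be
called either with a volume group plus two LV names, or without a volume
group and with two paths. The sketch below only builds the argument list so
both call forms can be compared. The helper name `lvrename_cmd` and the
sample names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the lvrename argument list the way the
# patched lvrename() does, without running anything.
sub lvrename_cmd {
    my ($vg, $oldname, $newname) = @_;

    my $cmd = ['/sbin/lvrename'];
    push @$cmd, $vg if $vg;             # VG is only passed when given
    push @$cmd, $oldname, $newname;
    return $cmd;
}

# Call form 1: volume group plus LV names.
print join(' ', @{ lvrename_cmd('vg0', 'vm-100-disk-0.qcow2', 'snap-s1-vm-100-disk-0.qcow2') }), "\n";

# Call form 2: no volume group, full paths instead.
print join(' ', @{ lvrename_cmd(undef, '/dev/vg0/vm-100-disk-0.qcow2', '/dev/vg0/snap-s1-vm-100-disk-0.qcow2') }), "\n";
```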
 src/PVE/Storage/LVMPlugin.pm | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
index 19dbd7e..2431fcd 100644
--- a/src/PVE/Storage/LVMPlugin.pm
+++ b/src/PVE/Storage/LVMPlugin.pm
@@ -365,9 +365,11 @@ sub lvcreate {
 sub lvrename {
     my ($vg, $oldname, $newname) = @_;
 
-    run_command(
-	['/sbin/lvrename', $vg, $oldname, $newname],
-	errmsg => "lvrename '${vg}/${oldname}' to '${newname}' error",
+    my $cmd = ['/sbin/lvrename'];
+    push @$cmd, $vg if $vg;
+    push @$cmd, $oldname, $newname;
+
+    run_command($cmd, errmsg => "lvrename '${oldname}' to '${newname}' error",
     );
 }
 
-- 
2.39.5



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [pve-devel] [PATCH v4 qemu-server 05/11] blockdev: convert cdrom media eject/insert
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (8 preceding siblings ...)
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 4/5] lvm: lvrename helper: allow path Alexandre Derumier via pve-devel
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 5/5] lvm: add lvremove helper Alexandre Derumier via pve-devel
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 4364 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 05/11] blockdev: convert cdrom media eject/insert
Date: Tue, 11 Mar 2025 11:28:58 +0100
Message-ID: <20250311102905.2680524-11-alexandre.derumier@groupe-cyllene.com>

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
 PVE/QemuServer.pm | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 5ccc026a..db95af0a 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5214,28 +5214,28 @@ sub vmconfig_update_disk {
 		return 1;
 	    }
 
-	} else { # cdrom
+       } else { # cdrom
 
 	    if ($drive->{file} eq 'none') {
-		mon_cmd($vmid, "eject", force => JSON::true, id => "$opt");
+		mon_cmd($vmid, "blockdev-open-tray", force => JSON::true, id => $opt);
+		mon_cmd($vmid, "blockdev-remove-medium", id => $opt);
+		qemu_drivedel($vmid, $opt);
+
 		if (drive_is_cloudinit($old_drive)) {
 		    vmconfig_register_unused_drive($storecfg, $vmid, $conf, $old_drive);
 		}
 	    } else {
-		my ($path, $format) = PVE::QemuServer::Drive::get_path_and_format(
-		    $storecfg, $vmid, $drive);
+		my $path = get_iso_path($storecfg, $vmid, $drive->{file});
 
 		# force eject if locked
-		mon_cmd($vmid, "eject", force => JSON::true, id => "$opt");
+		mon_cmd($vmid, "blockdev-open-tray", force => JSON::true, id => $opt);
+		mon_cmd($vmid, "blockdev-remove-medium", id => $opt);
+		eval { qemu_drivedel($vmid, $opt) };
 
 		if ($path) {
-		    mon_cmd(
-			$vmid,
-			"blockdev-change-medium",
-			id => "$opt",
-			filename => "$path",
-			format => "$format",
-		    );
+		    qemu_driveadd($storecfg, $vmid, $drive);
+		    mon_cmd($vmid, "blockdev-insert-medium", id => $opt, 'node-name' => "drive-$opt");
+		    mon_cmd($vmid, "blockdev-close-tray", id => $opt);
 		}
 	    }
 
-- 
2.39.5



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [pve-devel] [PATCH v4 pve-storage 5/5] lvm: add lvremove helper
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (9 preceding siblings ...)
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
@ 2025-03-11 10:28 ` Alexandre Derumier via pve-devel
  2025-04-01 13:50   ` Fabian Grünbichler
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:28 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 4158 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 pve-storage 5/5] lvm: add lvremove helper
Date: Tue, 11 Mar 2025 11:28:59 +0100
Message-ID: <20250311102905.2680524-12-alexandre.derumier@groupe-cyllene.com>

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
 src/PVE/Storage/LVMPlugin.pm | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
index 2431fcd..ab3563b 100644
--- a/src/PVE/Storage/LVMPlugin.pm
+++ b/src/PVE/Storage/LVMPlugin.pm
@@ -373,6 +373,14 @@ sub lvrename {
     );
 }
 
+sub lvremove {
+   my ($name, $vg) = @_;
+
+   my $path = $vg ? "$vg/$name" : $name;
+   my $cmd = ['/sbin/lvremove', '-f', $path];
+   run_command($cmd, errmsg => "lvremove '$path' error");
+}
+
 sub alloc_image {
     my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size, $backing) = @_;
 
@@ -453,8 +461,7 @@ sub free_image {
 	warn $@ if $@;
 
 	$class->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
-	    my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$volname"];
-	    run_command($cmd, errmsg => "lvremove '$vg/del-$volname' error");
+	    lvremove("del-$volname", $vg);
 	});
 	print "successfully removed volume $volname ($vg/del-$volname)\n";
     };
@@ -470,9 +477,7 @@ sub free_image {
 	run_command($cmd, errmsg => "lvrename '$vg/$volname' error");
 	return $zero_out_worker;
     } else {
-	my $tmpvg = $scfg->{vgname};
-	$cmd = ['/sbin/lvremove', '-f', "$tmpvg/$volname"];
-	run_command($cmd, errmsg => "lvremove '$tmpvg/$volname' error");
+	lvremove($volname, $scfg->{vgname});
     }
 
     return undef;
-- 
2.39.5



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [pve-devel] [PATCH v4 qemu-server 06/11] blockdev: block_resize: convert to blockdev
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (10 preceding siblings ...)
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 5/5] lvm: add lvremove helper Alexandre Derumier via pve-devel
@ 2025-03-11 10:29 ` Alexandre Derumier via pve-devel
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:29 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 3266 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 06/11] blockdev: block_resize: convert to blockdev
Date: Tue, 11 Mar 2025 11:29:00 +0100
Message-ID: <20250311102905.2680524-13-alexandre.derumier@groupe-cyllene.com>

We need to use the top blocknode (throttle) as the node-name.

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
 PVE/QemuServer.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index db95af0a..fcfd59b3 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4441,7 +4441,7 @@ sub qemu_block_resize {
     mon_cmd(
 	$vmid,
 	"block_resize",
-	device => $deviceid,
+	'node-name' => $deviceid,
 	size => int($size),
 	timeout => 60,
     );
-- 
2.39.5



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [pve-devel] [PATCH v4 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (11 preceding siblings ...)
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
@ 2025-03-11 10:29 ` Alexandre Derumier via pve-devel
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:29 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 3751 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename
Date: Tue, 11 Mar 2025 11:29:01 +0100
Message-ID: <20250311102905.2680524-14-alexandre.derumier@groupe-cyllene.com>

The node name is now fixed (drive-$id), so the query-block lookup is
no longer needed.

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
 PVE/QemuServer.pm | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index fcfd59b3..a9c8b758 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5775,20 +5775,15 @@ sub vm_start_nolock {
 	    $migrate_storage_uri = "nbd:${localip}:${storage_migrate_port}";
 	}
 
-	my $block_info = mon_cmd($vmid, "query-block");
-	$block_info = { map { $_->{device} => $_ } $block_info->@* };
-
 	foreach my $opt (sort keys %$nbd) {
 	    my $drivestr = $nbd->{$opt}->{drivestr};
 	    my $volid = $nbd->{$opt}->{volid};
 
-	    my $block_node = $block_info->{"drive-$opt"}->{inserted}->{'node-name'};
-
 	    mon_cmd(
 		$vmid,
 		"block-export-add",
 		id => "drive-$opt",
-		'node-name' => $block_node,
+		'node-name' => "drive-$opt",
 		writable => JSON::true,
 		type => "nbd",
 		name => "drive-$opt", # NBD export name
-- 
2.39.5




* [pve-devel] [PATCH v4 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (12 preceding siblings ...)
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
@ 2025-03-11 10:29 ` Alexandre Derumier via pve-devel
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 09/11] blockdev: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:29 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 9221 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
Date: Tue, 11 Mar 2025 11:29:02 +0100
Message-ID: <20250311102905.2680524-15-alexandre.derumier@groupe-cyllene.com>

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
 PVE/QemuMigrate.pm                    |  2 +-
 PVE/QemuServer.pm                     | 65 ++++++++++++++++++---------
 test/MigrationTest/QemuMigrateMock.pm | 10 +++--
 3 files changed, 50 insertions(+), 27 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index ed5ede30..09402311 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1134,7 +1134,7 @@ sub phase2 {
 	    my $bitmap = $target->{bitmap};
 
 	    $self->log('info', "$drive: start migration to $nbd_uri");
-	    PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
+	    PVE::QemuServer::qemu_drive_mirror($vmid, $source_drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
 	}
 
 	if (PVE::QemuServer::QMPHelpers::runs_at_least_qemu_version($vmid, 8, 2)) {
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index a9c8b758..63b8f332 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7729,57 +7729,78 @@ sub qemu_img_convert {
 sub qemu_drive_mirror {
     my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $completion, $qga, $bwlimit, $src_bitmap) = @_;
 
-    $jobs = {} if !$jobs;
+    my $storecfg = PVE::Storage::config();
+
+    # copy original drive config (aio,cache,discard,...)
+    my $dst_drive = dclone($drive);
+    $dst_drive->{file} = $dst_volid;
+    $dst_drive->{zeroinit} = 1 if $is_zero_initialized;
+    #improve: if target storage don't support aio uring,change it to default native
+    #and remove clone_disk_check_io_uring()
 
-    my $qemu_target;
-    my $format;
-    $jobs->{"drive-$drive"} = {};
+    #add new block device
+    my $target_file_blockdev = generate_file_blockdev($storecfg, $dst_drive);
+    my $target_fmt_blockdev = undef;
+    my $target_nodename = undef;
 
     if ($dst_volid =~ /^nbd:/) {
-	$qemu_target = $dst_volid;
-	$format = "nbd";
+	#nbd file don't have fmt
+	$target_nodename = encode_nodename('file', $dst_volid);
+	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_file_blockdev);
     } else {
-	my $storecfg = PVE::Storage::config();
+	$target_nodename = encode_nodename('fmt', $dst_volid);
+	$target_fmt_blockdev = generate_format_blockdev($storecfg, $dst_drive, $target_file_blockdev);
+	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
+    }
 
-	$format = checked_volume_format($storecfg, $dst_volid);
+    # We replace the original src_fmt node in the blockdev graph.
+    # 'replaces' needs to be defined, or the mirror will replace the root throttle-filter
+    my $src_fmt_nodename = encode_nodename('fmt', $drive->{file});
 
-	my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
 
-	$qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
-    }
+    my $drive_id = get_drive_id($drive);
+    my $deviceid = "drive-$drive_id";
+
+    $jobs = {} if !$jobs;
+    my $jobid = "mirror-$deviceid";
+    $jobs->{$jobid} = {};
 
     my $opts = {
+	'job-id' => $jobid,
 	timeout => 10,
-	device => "drive-$drive",
-	mode => "existing",
+	device => $deviceid,
+	replaces => $src_fmt_nodename,
 	sync => "full",
-	target => $qemu_target,
+	target => $target_nodename,
 	'auto-dismiss' => JSON::false,
     };
-    $opts->{format} = $format if $format;
 
     if (defined($src_bitmap)) {
 	$opts->{sync} = 'incremental';
-	$opts->{bitmap} = $src_bitmap;
+	$opts->{bitmap} = $src_bitmap;   ##FIXME: how to handle bitmap ? special proxmox patch ?
 	print "drive mirror re-using dirty bitmap '$src_bitmap'\n";
     }
 
     if (defined($bwlimit)) {
 	$opts->{speed} = $bwlimit * 1024;
-	print "drive mirror is starting for drive-$drive with bandwidth limit: ${bwlimit} KB/s\n";
+	print "drive mirror is starting for $deviceid with bandwidth limit: ${bwlimit} KB/s\n";
     } else {
-	print "drive mirror is starting for drive-$drive\n";
+	print "drive mirror is starting for $deviceid\n";
     }
 
     # if a job already runs for this device we get an error, catch it for cleanup
-    eval { mon_cmd($vmid, "drive-mirror", %$opts); };
+    eval { mon_cmd($vmid, "blockdev-mirror", %$opts); };
+
     if (my $err = $@) {
 	eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
 	warn "$@\n" if $@;
+	eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $target_file_blockdev->{'node-name'}) };
+	warn "$@\n" if $@;
+	eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $target_fmt_blockdev->{'node-name'}) };
+	warn "$@\n" if $@;
 	die "mirroring error: $err\n";
     }
-
-    qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga);
+    qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga, 'mirror');
 }
 
 # $completion can be either
@@ -8111,7 +8132,7 @@ sub clone_disk {
 
 	my $sparseinit = PVE::Storage::volume_has_feature($storecfg, 'sparseinit', $newvolid);
 	if ($use_drive_mirror) {
-	    qemu_drive_mirror($vmid, $src_drivename, $newvolid, $newvmid, $sparseinit, $jobs,
+	    qemu_drive_mirror($vmid, $drive, $newvolid, $newvmid, $sparseinit, $jobs,
 	        $completion, $qga, $bwlimit);
 	} else {
 	    if ($dst_drivename eq 'efidisk0') {
diff --git a/test/MigrationTest/QemuMigrateMock.pm b/test/MigrationTest/QemuMigrateMock.pm
index 11c58c08..d156ff1b 100644
--- a/test/MigrationTest/QemuMigrateMock.pm
+++ b/test/MigrationTest/QemuMigrateMock.pm
@@ -132,14 +132,16 @@ $MigrationTest::Shared::qemu_server_module->mock(
     qemu_drive_mirror => sub {
 	my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $completion, $qga, $bwlimit, $src_bitmap) = @_;
 
+	my $drive_id = "$drive->{interface}$drive->{index}";
+
 	die "drive_mirror with wrong vmid: '$vmid'\n" if $vmid ne $test_vmid;
-	die "qemu_drive_mirror '$drive' error\n"
-	    if $fail_config->{qemu_drive_mirror} && $fail_config->{qemu_drive_mirror} eq $drive;
+	die "qemu_drive_mirror '$drive_id' error\n"
+	    if $fail_config->{qemu_drive_mirror} && $fail_config->{qemu_drive_mirror} eq $drive_id;
 
 	my $nbd_info = decode_json(file_get_contents("${RUN_DIR_PATH}/nbd_info"));
 	die "target does not expect drive mirror for '$drive'\n"
-	    if !defined($nbd_info->{$drive});
-	delete $nbd_info->{$drive};
+	    if !defined($nbd_info->{$drive_id});
+	delete $nbd_info->{$drive_id};
 	file_set_contents("${RUN_DIR_PATH}/nbd_info", to_json($nbd_info));
     },
     qemu_drive_mirror_monitor => sub {
-- 
2.39.5
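
The option set passed to blockdev-mirror in the patch above can be sketched
on its own. This is a minimal sketch, not the qemu-server implementation:
mirror_opts() and all node names below are hypothetical. It shows why
'replaces' points at the source format node: without it, the completed job
would replace the node named in 'device', which here is the throttle filter.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper (not in qemu-server): assemble blockdev-mirror options.
# $deviceid     - top node of the drive (the throttle filter)
# $src_fmt_node - source format node that the target replaces on completion
# $target_node  - node name of the already added target blockdev
# $bwlimit_kib  - optional bandwidth limit in KiB/s
sub mirror_opts {
    my ($deviceid, $src_fmt_node, $target_node, $bwlimit_kib) = @_;

    my $opts = {
        'job-id' => "mirror-$deviceid",
        device => $deviceid,
        replaces => $src_fmt_node,
        sync => 'full',
        target => $target_node,
        'auto-dismiss' => JSON::PP::false,
    };
    # QMP expects the speed in bytes per second.
    $opts->{speed} = $bwlimit_kib * 1024 if defined($bwlimit_kib);

    return $opts;
}

# Assumed example node names, for illustration only.
my $example_opts = mirror_opts('drive-scsi0', 'fmt-source', 'fmt-target', 512);
print JSON::PP->new->canonical->pretty->encode($example_opts);
```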




* [pve-devel] [PATCH v4 qemu-server 09/11] blockdev: change aio on target if io_uring is not default.
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (13 preceding siblings ...)
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
@ 2025-03-11 10:29 ` Alexandre Derumier via pve-devel
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
  16 siblings, 0 replies; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:29 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 5803 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 09/11] blockdev: change aio on target if io_uring is not default.
Date: Tue, 11 Mar 2025 11:29:03 +0100
Message-ID: <20250311102905.2680524-16-alexandre.derumier@groupe-cyllene.com>

This was a limitation of drive-mirror; blockdev-mirror is able
to reopen the image with a different aio setting.

Apply the change when generating the blockdev_format.

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
 PVE/QemuServer.pm       |  4 ----
 PVE/QemuServer/Drive.pm | 30 +++---------------------------
 2 files changed, 3 insertions(+), 31 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 63b8f332..d6aa5730 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7735,8 +7735,6 @@ sub qemu_drive_mirror {
     my $dst_drive = dclone($drive);
     $dst_drive->{file} = $dst_volid;
     $dst_drive->{zeroinit} = 1 if $is_zero_initialized;
-    #improve: if target storage don't support aio uring,change it to default native
-    #and remove clone_disk_check_io_uring()
 
     #add new block device
     my $target_file_blockdev = generate_file_blockdev($storecfg, $dst_drive);
@@ -8109,8 +8107,6 @@ sub clone_disk {
 	    $dst_format = 'raw';
 	    $size = PVE::QemuServer::Drive::TPMSTATE_DISK_SIZE;
 	} else {
-	    clone_disk_check_io_uring($drive, $storecfg, $src_storeid, $storeid, $use_drive_mirror);
-
 	    $size = PVE::Storage::volume_size_info($storecfg, $drive->{file}, 10);
 	}
 	$newvolid = PVE::Storage::vdisk_alloc(
diff --git a/PVE/QemuServer/Drive.pm b/PVE/QemuServer/Drive.pm
index 9fe679dd..5b281616 100644
--- a/PVE/QemuServer/Drive.pm
+++ b/PVE/QemuServer/Drive.pm
@@ -1032,33 +1032,6 @@ my sub drive_uses_cache_direct {
     return $cache_direct;
 }
 
-# Check for bug #4525: drive-mirror will open the target drive with the same aio setting as the
-# source, but some storages have problems with io_uring, sometimes even leading to crashes.
-sub clone_disk_check_io_uring {
-    my ($src_drive, $storecfg, $src_storeid, $dst_storeid, $use_drive_mirror) = @_;
-
-    return if !$use_drive_mirror;
-
-    # Don't complain when not changing storage.
-    # Assume if it works for the source, it'll work for the target too.
-    return if $src_storeid eq $dst_storeid;
-
-    my $src_scfg = PVE::Storage::storage_config($storecfg, $src_storeid);
-    my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
-
-    my $cache_direct = drive_uses_cache_direct($src_drive);
-
-    my $src_uses_io_uring;
-    if ($src_drive->{aio}) {
-	$src_uses_io_uring = $src_drive->{aio} eq 'io_uring';
-    } else {
-	$src_uses_io_uring = storage_allows_io_uring_default($src_scfg, $cache_direct);
-    }
-
-    die "target storage is known to cause issues with aio=io_uring (used by current drive)\n"
-	if $src_uses_io_uring && !storage_allows_io_uring_default($dst_scfg, $cache_direct);
-}
-
 sub generate_blockdev_drive_aio {
     my ($drive, $scfg) = @_;
 
@@ -1077,6 +1050,9 @@ sub generate_blockdev_drive_aio {
 		$aio = "threads";
 	    }
 	}
+    } elsif ($drive->{aio} eq 'io_uring' && !storage_allows_io_uring_default($scfg, $cache_direct)) {
+	#change aio if io_uring is not supported by storage
+	$aio = $cache_direct ? 'native' : 'threads';
     }
     return $aio;
 }
-- 
2.39.5
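
The fallback added to generate_blockdev_drive_aio() above can be sketched as
a standalone function. This is a simplified illustration under stated
assumptions: pick_aio() is hypothetical, the storage check is reduced to a
boolean argument (the real code calls storage_allows_io_uring_default()), and
the branch for an unset aio is only an assumption about the part of the
function that the diff does not show.

```perl
use strict;
use warnings;

# Hypothetical sketch of the aio selection.
# $requested_aio           - aio value from the drive config, may be undef
# $storage_allows_io_uring - boolean stand-in for the storage check
# $cache_direct            - true if the drive uses cache=none/directsync
sub pick_aio {
    my ($requested_aio, $storage_allows_io_uring, $cache_direct) = @_;

    my $fallback = $cache_direct ? 'native' : 'threads';

    # Assumption: with no explicit setting, prefer io_uring where allowed.
    if (!defined($requested_aio)) {
        return $storage_allows_io_uring ? 'io_uring' : $fallback;
    }

    # The case added by the patch: io_uring was requested explicitly, but
    # the (target) storage does not support it, so fall back.
    if ($requested_aio eq 'io_uring' && !$storage_allows_io_uring) {
        return $fallback;
    }

    return $requested_aio;
}

print pick_aio('io_uring', 0, 1), "\n";
print pick_aio('io_uring', 0, 0), "\n";
print pick_aio('io_uring', 1, 1), "\n";
```

Since the mirror target is opened as its own blockdev, the fallback can be
applied per target instead of refusing the clone up front, which is what the
removed clone_disk_check_io_uring() did.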




* [pve-devel] [PATCH v4 qemu-server 10/11] blockdev: add backing_chain support
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (14 preceding siblings ...)
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 09/11] blockdev: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
@ 2025-03-11 10:29 ` Alexandre Derumier via pve-devel
  2025-04-02  8:10   ` Fabian Grünbichler
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
  16 siblings, 1 reply; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:29 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 8437 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 10/11] blockdev: add backing_chain support
Date: Tue, 11 Mar 2025 11:29:04 +0100
Message-ID: <20250311102905.2680524-17-alexandre.derumier@groupe-cyllene.com>

We need to define node names for all images in the backing chain,
to be able to rename them live with blockdev-reopen.

For linked clones, we don't need to define the chain of base image(s).
They are added automatically with a #block node name.

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
 PVE/QemuServer.pm       | 26 ++----------------
 PVE/QemuServer/Drive.pm | 60 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 24 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index d6aa5730..60481acc 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -54,7 +54,7 @@ use PVE::QemuServer::Helpers qw(config_aware_timeout min_version kvm_user_versio
 use PVE::QemuServer::Cloudinit;
 use PVE::QemuServer::CGroup;
 use PVE::QemuServer::CPUConfig qw(print_cpu_device get_cpu_options get_cpu_bitness is_native_arch get_amd_sev_object);
-use PVE::QemuServer::Drive qw(is_valid_drivename checked_volume_format drive_is_cloudinit drive_is_cdrom drive_is_read_only parse_drive print_drive print_drive_throttle_group generate_drive_blockdev);
+use PVE::QemuServer::Drive qw(is_valid_drivename checked_volume_format drive_is_cloudinit drive_is_cdrom drive_is_read_only parse_drive print_drive print_drive_throttle_group generate_drive_blockdev do_snapshots_with_qemu);
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::QemuServer::MetaInfo;
@@ -3765,6 +3765,7 @@ sub config_to_command {
 	# extra protection for templates, but SATA and IDE don't support it..
 	$drive->{ro} = 1 if drive_is_read_only($conf, $drive);
 	my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive, $live_blockdev_name);
+	#FIXME: verify if external snapshot backing chain is matching config
 	push @$devices, '-blockdev', JSON->new->canonical->allow_nonref->encode($blockdev) if $blockdev;
 	push @$devices, '-device', print_drivedevice_full(
 	    $storecfg, $conf, $vmid, $drive, $bridges, $arch, $machine_type);
@@ -7559,29 +7560,6 @@ sub foreach_storage_used_by_vm {
     }
 }
 
-my $qemu_snap_storage = {
-    rbd => 1,
-};
-sub do_snapshots_with_qemu {
-    my ($storecfg, $volid, $deviceid) = @_;
-
-    return if $deviceid =~ m/tpmstate0/;
-
-    my $storage_name = PVE::Storage::parse_volume_id($volid);
-    my $scfg = $storecfg->{ids}->{$storage_name};
-    die "could not find storage '$storage_name'\n" if !defined($scfg);
-
-    if ($qemu_snap_storage->{$scfg->{type}} && !$scfg->{krbd}){
-	return 1;
-    }
-
-    if ($volid =~ m/\.(qcow2|qed)$/){
-	return 1;
-    }
-
-    return;
-}
-
 sub qga_check_running {
     my ($vmid, $nowarn) = @_;
 
diff --git a/PVE/QemuServer/Drive.pm b/PVE/QemuServer/Drive.pm
index 5b281616..51513546 100644
--- a/PVE/QemuServer/Drive.pm
+++ b/PVE/QemuServer/Drive.pm
@@ -18,6 +18,7 @@ our @EXPORT_OK = qw(
 is_valid_drivename
 checked_parse_volname
 checked_volume_format
+do_snapshots_with_qemu
 drive_is_cloudinit
 drive_is_cdrom
 drive_is_read_only
@@ -1230,6 +1231,32 @@ sub generate_file_blockdev {
     return $blockdev;
 }
 
+my $qemu_snap_storage = {
+    rbd => 1,
+};
+
+sub do_snapshots_with_qemu {
+    my ($storecfg, $volid, $deviceid) = @_;
+
+    return if $deviceid =~ m/tpmstate0/;
+
+    my $storage_name = PVE::Storage::parse_volume_id($volid);
+    my $scfg = $storecfg->{ids}->{$storage_name};
+    die "could not find storage '$storage_name'\n" if !defined($scfg);
+
+    if ($qemu_snap_storage->{$scfg->{type}} && !$scfg->{krbd}){
+        return 1;
+    }
+
+    return 2 if $scfg->{snapext} || $scfg->{type} eq 'lvm' && $volid =~ m/\.(qcow2)/;
+
+    if ($volid =~ m/\.(qcow2|qed)$/){
+        return 1;
+    }
+
+    return;
+}
+
 sub generate_format_blockdev {
     my ($storecfg, $drive, $file, $snap, $nodename) = @_;
 
@@ -1272,6 +1299,37 @@ sub generate_format_blockdev {
     return $blockdev;
 }
 
+sub generate_backing_blockdev {
+    my ($storecfg, $snapshots, $deviceid, $drive, $snap_id) = @_;
+
+    my $snapshot = $snapshots->{$snap_id};
+    my $parentid = $snapshot->{parent};
+
+    my $volid = $drive->{file};
+
+    my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap_id);
+    $snap_file_blockdev->{filename} = $snapshot->{file};
+    $drive->{ro} = 1;
+    my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_file_blockdev, $snap_id);
+    $snap_fmt_blockdev->{backing} = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
+    return $snap_fmt_blockdev;
+}
+
+sub generate_backing_chain_blockdev {
+    my ($storecfg, $deviceid, $drive) = @_;
+
+    my $volid = $drive->{file};
+    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid);
+    return if !$do_snapshots_with_qemu || $do_snapshots_with_qemu != 2;
+
+    my $chain_blockdev = undef;
+    PVE::Storage::activate_volumes($storecfg, [$volid]);
+    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
+    my $parentid = $snapshots->{'current'}->{parent};
+    $chain_blockdev = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
+    return $chain_blockdev;
+}
+
 sub generate_drive_blockdev {
     my ($storecfg, $vmid, $drive, $live_restore_name) = @_;
 
@@ -1293,6 +1351,8 @@ sub generate_drive_blockdev {
 
     my $blockdev_file = generate_file_blockdev($storecfg, $drive);
     my $blockdev_format = generate_format_blockdev($storecfg, $drive, $blockdev_file);
+    my $backing_chain  = generate_backing_chain_blockdev($storecfg, "drive-$drive_id", $drive);
+    $blockdev_format->{backing} = $backing_chain if $backing_chain;
 
     my $blockdev_live_restore = undef;
     if ($live_restore_name) {
-- 
2.39.5
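
The recursion in generate_backing_blockdev() above can be illustrated with a
reduced, standalone version. This is a sketch under assumptions: the snapshot
table mimics the shape used in the patch (a 'current' entry plus one entry
per snapshot, each with a file and an optional parent), while backing_chain(),
the "fmt-<snap>" node names and the file paths are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical sketch: build the nested 'backing' structure for a snapshot
# by following the parent pointers until a snapshot without parent is hit.
sub backing_chain {
    my ($snapshots, $snap_id) = @_;

    my $snapshot = $snapshots->{$snap_id}
        or die "unknown snapshot '$snap_id'\n";

    my $node = {
        driver => 'qcow2',
        'node-name' => "fmt-$snap_id",    # assumed naming scheme
        'read-only' => JSON::PP::true,    # backing images are opened read-only
        file => { driver => 'file', filename => $snapshot->{file} },
    };

    my $parent = $snapshot->{parent};
    $node->{backing} = backing_chain($snapshots, $parent) if $parent;

    return $node;
}

# Assumed example data: current -> snap2 -> snap1.
my $example_snapshots = {
    current => { file => '/tmp/vm-100-disk-0.qcow2', parent => 'snap2' },
    snap2 => { file => '/tmp/snap-snap2-vm-100-disk-0.qcow2', parent => 'snap1' },
    snap1 => { file => '/tmp/snap-snap1-vm-100-disk-0.qcow2' },
};

my $example_chain = backing_chain($example_snapshots, $example_snapshots->{current}->{parent});
print JSON::PP->new->canonical->pretty->encode($example_chain);
```

Every level of the chain gets an explicit node name, which is what later
allows a single image in the chain to be addressed by blockdev-reopen.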




* [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support
       [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
                   ` (15 preceding siblings ...)
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
@ 2025-03-11 10:29 ` Alexandre Derumier via pve-devel
  2025-04-02  8:10   ` Fabian Grünbichler
  16 siblings, 1 reply; 34+ messages in thread
From: Alexandre Derumier via pve-devel @ 2025-03-11 10:29 UTC (permalink / raw)
  To: pve-devel; +Cc: Alexandre Derumier

[-- Attachment #1: Type: message/rfc822, Size: 15743 bytes --]

From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support
Date: Tue, 11 Mar 2025 11:29:05 +0100
Message-ID: <20250311102905.2680524-18-alexandre.derumier@groupe-cyllene.com>

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
 PVE/QemuConfig.pm       |   4 +-
 PVE/QemuServer.pm       | 226 +++++++++++++++++++++++++++++++++++++---
 PVE/QemuServer/Drive.pm |   4 +
 3 files changed, 220 insertions(+), 14 deletions(-)

diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
index b60cc398..2b3acb15 100644
--- a/PVE/QemuConfig.pm
+++ b/PVE/QemuConfig.pm
@@ -377,7 +377,7 @@ sub __snapshot_create_vol_snapshot {
 
     print "snapshotting '$device' ($drive->{file})\n";
 
-    PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $volid, $snapname);
+    PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $drive, $snapname);
 }
 
 sub __snapshot_delete_remove_drive {
@@ -414,7 +414,7 @@ sub __snapshot_delete_vol_snapshot {
     my $storecfg = PVE::Storage::config();
     my $volid = $drive->{file};
 
-    PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $volid, $snapname);
+    PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $drive, $snapname);
 
     push @$unused, $volid;
 }
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 60481acc..6ce3e9c6 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4449,20 +4449,200 @@ sub qemu_block_resize {
 }
 
 sub qemu_volume_snapshot {
-    my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
+    my ($vmid, $deviceid, $storecfg, $drive, $snap) = @_;
 
+    my $volid = $drive->{file};
     my $running = check_running($vmid);
-
-    if ($running && do_snapshots_with_qemu($storecfg, $volid, $deviceid)) {
-	mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
+    my $do_snapshots_with_qemu = $running ? do_snapshots_with_qemu($storecfg, $volid, $deviceid) : undef;
+    if ($do_snapshots_with_qemu) {
+	if($do_snapshots_with_qemu == 2) {
+	    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
+	    my $parent_snap = $snapshots->{'current'}->{parent};
+	    my $size = PVE::Storage::volume_size_info($storecfg, $volid, 5);
+	    blockdev_rename($storecfg, $vmid, $deviceid, $drive, 'current', $snap, $parent_snap);
+	    blockdev_external_snapshot($storecfg, $vmid, $deviceid, $drive, $snap, $size);
+	} else {
+	    mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
+	}
     } else {
 	PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
     }
 }
 
+sub blockdev_external_snapshot {
+    my ($storecfg, $vmid, $deviceid, $drive, $snap, $size) = @_;
+
+    my $volid = $drive->{file};
+
+    #be sure to add drive in write mode
+    delete($drive->{ro});
+
+    my $new_file_blockdev = generate_file_blockdev($storecfg, $drive);
+    my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_file_blockdev);
+
+    my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap);
+    my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_file_blockdev, $snap);
+
+    #preallocate add a new current file with reference to backing-file
+    my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid);
+    my $name = (PVE::Storage::parse_volname($storecfg, $volid))[1];
+    PVE::Storage::vdisk_alloc($storecfg, $storeid, $vmid, 'qcow2', $name, $size/1024, $snap_file_blockdev->{filename});
+
+    #backing need to be forced to undef in blockdev, to avoid reopen of backing-file on blockdev-add
+    $new_fmt_blockdev->{backing} = undef;
+
+    PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$new_fmt_blockdev);
+
+    mon_cmd($vmid, 'blockdev-snapshot', node => $snap_fmt_blockdev->{'node-name'}, overlay => $new_fmt_blockdev->{'node-name'});
+}
+
+sub blockdev_delete {
+    my ($storecfg, $vmid, $drive, $file_blockdev, $fmt_blockdev) = @_;
+
+    #use eval: reopen automatically removes the old node only if it was created on the command line at vm start
+    eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_blockdev->{'node-name'}) };
+    eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $fmt_blockdev->{'node-name'}) };
+
+    #delete the file (don't use vdisk_free as we don't want to delete all snapshot chain)
+    print"delete old $file_blockdev->{filename}\n";
+
+    my $storage_name = PVE::Storage::parse_volume_id($drive->{file});
+    my $scfg = $storecfg->{ids}->{$storage_name};
+    if ($scfg->{type} eq 'lvm') {
+	PVE::Storage::LVMPlugin::lvremove($file_blockdev->{filename});
+    } else {
+	unlink($file_blockdev->{filename});
+    }
+}
+
+sub blockdev_rename {
+    my ($storecfg, $vmid, $deviceid, $drive, $src_snap, $target_snap, $parent_snap) = @_;
+
+    print "rename $src_snap to $target_snap\n";
+
+    my $volid = $drive->{file};
+
+    my $src_file_blockdev = generate_file_blockdev($storecfg, $drive, $src_snap);
+    my $src_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $src_file_blockdev, $src_snap);
+    my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_snap);
+    my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_file_blockdev, $target_snap);
+
+    #create a hardlink
+    link($src_file_blockdev->{filename}, $target_file_blockdev->{filename});
+
+    if($target_snap eq 'current' || $src_snap eq 'current') {
+	#rename from|to current
+
+	#add backing to target
+	if ($parent_snap) {
+	    my $parent_fmt_nodename = encode_nodename('fmt', $volid, $parent_snap);
+	    $target_fmt_blockdev->{backing} = $parent_fmt_nodename;
+	}
+	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
+
+	#reopen the current throttlefilter nodename with the target fmt nodename
+	my $drive_blockdev = generate_drive_blockdev($storecfg, $vmid, $drive);
+	delete $drive_blockdev->{file};
+	$drive_blockdev->{file} = $target_fmt_blockdev->{'node-name'};
+	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$drive_blockdev]);
+    } else {
+	#intermediate snapshot
+	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
+
+	#reopen the parent node with the new target fmt backing node
+	my $parent_file_blockdev = generate_file_blockdev($storecfg, $drive, $parent_snap);
+	my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_file_blockdev, $parent_snap);
+	$parent_fmt_blockdev->{backing} = $target_fmt_blockdev->{'node-name'};
+	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$parent_fmt_blockdev]);
+
+	#change backing-file in qcow2 metadatas
+	PVE::QemuServer::Monitor::mon_cmd($vmid, 'change-backing-file', device => $deviceid, 'image-node-name' => $parent_fmt_blockdev->{'node-name'}, 'backing-file' => $target_file_blockdev->{filename});
+    }
+
+    # delete old file|fmt nodes
+    # use eval: reopen automatically removes the old node only if it was created on the command line at vm start
+    eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_file_blockdev->{'node-name'})};
+    eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_fmt_blockdev->{'node-name'})};
+
+    unlink($src_file_blockdev->{filename});
+
+    #rename underlay
+    my $storage_name = PVE::Storage::parse_volume_id($volid);
+    my $scfg = $storecfg->{ids}->{$storage_name};
+    return if $scfg->{type} ne 'lvm';
+
+    print "rename underlay lvm volume $src_file_blockdev->{filename} to $target_file_blockdev->{filename}\n";
+    PVE::Storage::LVMPlugin::lvrename(undef, $src_file_blockdev->{filename}, $target_file_blockdev->{filename});
+}
+
+sub blockdev_commit {
+    my ($storecfg, $vmid, $deviceid, $drive, $src_snap, $target_snap) = @_;
+
+    my $volid = $drive->{file};
+
+    print "block-commit $src_snap to base:$target_snap\n";
+    $src_snap = undef if $src_snap && $src_snap eq 'current';
+
+    my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_snap);
+    my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_file_blockdev, $target_snap);
+
+    my $src_file_blockdev = generate_file_blockdev($storecfg, $drive, $src_snap);
+    my $src_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $src_file_blockdev, $src_snap);
+
+    my $job_id = "commit-$deviceid";
+    my $jobs = {};
+    my $opts = { 'job-id' => $job_id, device => $deviceid };
+
+    my $complete = undef;
+    if ($src_snap) {
+	$complete = 'auto';
+	$opts->{'top-node'} = $src_fmt_blockdev->{'node-name'};
+	$opts->{'base-node'} = $target_fmt_blockdev->{'node-name'};
+    } else {
+	$complete = 'complete';
+	$opts->{'base-node'} = $target_fmt_blockdev->{'node-name'};
+	$opts->{replaces} = $src_fmt_blockdev->{'node-name'};
+    }
+
+    mon_cmd($vmid, "block-commit", %$opts);
+    $jobs->{$job_id} = {};
+    qemu_drive_mirror_monitor ($vmid, undef, $jobs, $complete, 0, 'commit');
+
+    blockdev_delete($storecfg, $vmid, $drive, $src_file_blockdev, $src_fmt_blockdev);
+}
+
+sub blockdev_stream {
+    my ($storecfg, $vmid, $deviceid, $drive, $snap, $parent_snap, $target_snap) = @_;
+
+    my $volid = $drive->{file};
+    $target_snap = undef if $target_snap eq 'current';
+
+    my $parent_file_blockdev = generate_file_blockdev($storecfg, $drive, $parent_snap);
+    my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_file_blockdev, $parent_snap);
+
+    my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_snap);
+    my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_file_blockdev, $target_snap);
+
+    my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap);
+    my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_file_blockdev, $snap);
+
+    my $job_id = "stream-$deviceid";
+    my $jobs = {};
+    my $options = { 'job-id' => $job_id, device => $target_fmt_blockdev->{'node-name'} };
+    $options->{'base-node'} = $parent_fmt_blockdev->{'node-name'};
+    $options->{'backing-file'} = $parent_file_blockdev->{filename};
+
+    mon_cmd($vmid, 'block-stream', %$options);
+    $jobs->{$job_id} = {};
+    qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'stream');
+
+    blockdev_delete($storecfg, $vmid, $drive, $snap_file_blockdev, $snap_fmt_blockdev);
+}
+
 sub qemu_volume_snapshot_delete {
-    my ($vmid, $storecfg, $volid, $snap) = @_;
+    my ($vmid, $storecfg, $drive, $snap) = @_;
 
+    my $volid = $drive->{file};
     my $running = check_running($vmid);
     my $attached_deviceid;
 
@@ -4474,13 +4654,35 @@ sub qemu_volume_snapshot_delete {
 	});
     }
 
-    if ($attached_deviceid && do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid)) {
-	mon_cmd(
-	    $vmid,
-	    'blockdev-snapshot-delete-internal-sync',
-	    device => $attached_deviceid,
-	    name => $snap,
-	);
+    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid) if $running;
+    if ($attached_deviceid && $do_snapshots_with_qemu) {
+
+	if ($do_snapshots_with_qemu == 2) {
+
+	    my $path = PVE::Storage::path($storecfg, $volid);
+	    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
+	    my $parentsnap = $snapshots->{$snap}->{parent};
+	    my $childsnap = $snapshots->{$snap}->{child};
+
+	    # if we delete the first snapshot, we commit, because the first snapshot is the original base image and should be big.
+            # improve-me: if firstsnap > child : commit, if firstsnap < child do a stream.
+	    if(!$parentsnap) {
+		print"delete first snapshot $snap\n";
+		blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $childsnap, $snap);
+		blockdev_rename($storecfg, $vmid, $attached_deviceid, $drive, $snap, $childsnap, $snapshots->{$childsnap}->{child});
+	    } else {
+		#intermediate snapshot, we always stream the snapshot to child snapshot
+		print"stream intermediate snapshot $snap to $childsnap\n";
+		blockdev_stream($storecfg, $vmid, $attached_deviceid, $drive, $snap, $parentsnap, $childsnap);
+	    }
+	} else {
+	    mon_cmd(
+	        $vmid,
+		'blockdev-snapshot-delete-internal-sync',
+		device => $attached_deviceid,
+		name => $snap,
+	    );
+	}
     } else {
 	PVE::Storage::volume_snapshot_delete(
 	    $storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
diff --git a/PVE/QemuServer/Drive.pm b/PVE/QemuServer/Drive.pm
index 51513546..7ba401bd 100644
--- a/PVE/QemuServer/Drive.pm
+++ b/PVE/QemuServer/Drive.pm
@@ -1117,6 +1117,8 @@ sub print_drive_throttle_group {
 sub generate_file_blockdev {
     my ($storecfg, $drive, $snap, $nodename) = @_;
 
+    $snap = undef if $snap && $snap eq 'current';
+
     my $volid = $drive->{file};
     my $blockdev = {};
 
@@ -1260,6 +1262,8 @@ sub do_snapshots_with_qemu {
 sub generate_format_blockdev {
     my ($storecfg, $drive, $file, $snap, $nodename) = @_;
 
+    $snap = undef if $snap && $snap eq 'current';
+
     my $volid = $drive->{file};
     die "format_blockdev can't be used for nbd" if $volid =~ /^nbd:/;
 
-- 
2.39.5



[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


* Re: [pve-devel] [PATCH v4 pve-storage 2/5] lvmplugin: add qcow2 snapshot
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 2/5] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
@ 2025-04-01 13:50   ` Fabian Grünbichler
  0 siblings, 0 replies; 34+ messages in thread
From: Fabian Grünbichler @ 2025-04-01 13:50 UTC (permalink / raw)
  To: Proxmox VE development discussion


> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 11.03.2025 11:28 CET:

same here - please add a description of why things are implemented the way they are, else we have to dig through multiple threads of review discussion in 2 years when we want to find out ;)

> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
>  src/PVE/Storage/LVMPlugin.pm | 228 ++++++++++++++++++++++++++++++++---
>  1 file changed, 210 insertions(+), 18 deletions(-)
> 
> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
> index 38f7fa1..19dbd7e 100644
> --- a/src/PVE/Storage/LVMPlugin.pm
> +++ b/src/PVE/Storage/LVMPlugin.pm
> @@ -4,6 +4,7 @@ use strict;
>  use warnings;
>  
>  use IO::File;
> +use POSIX qw/ceil/;
>  
>  use PVE::Tools qw(run_command trim);
>  use PVE::Storage::Plugin;
> @@ -218,6 +219,7 @@ sub type {
>  sub plugindata {
>      return {
>  	content => [ {images => 1, rootdir => 1}, { images => 1 }],
> +	format => [ { raw => 1, qcow2 => 1 } , 'raw' ],
>      };
>  }
>  
> @@ -293,7 +295,10 @@ sub parse_volname {
>      PVE::Storage::Plugin::parse_lvm_name($volname);
>  
>      if ($volname =~ m/^(vm-(\d+)-\S+)$/) {
> -	return ('images', $1, $2, undef, undef, undef, 'raw');
> +	my $name = $1;
> +	my $vmid = $2;
> +	my $format = $volname =~ m/\.qcow2$/ ? 'qcow2' : 'raw';

this is really tricky, and I am afraid there are still pitfalls/bugs here unless we add checks, wherever a $format is requested, that it matches the one encoded in the name..

> +	return ('images', $name, $vmid, undef, undef, undef, $format);
>      }
>  
>      die "unable to parse lvm volume name '$volname'\n";
> @@ -302,11 +307,13 @@ sub parse_volname {
>  sub filesystem_path {
>      my ($class, $scfg, $volname, $snapname) = @_;
>  
> -    die "lvm snapshot is not implemented"if defined($snapname);
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
>  
> -    my ($vtype, $name, $vmid) = $class->parse_volname($volname);
> +    die "snapshot is working with qcow2 format only" if defined($snapname) && $format ne 'qcow2';
>  
>      my $vg = $scfg->{vgname};
> +    $name = $class->get_snap_volname($volname, $snapname) if $snapname;
>  
>      my $path = "/dev/$vg/$name";
>  
> @@ -334,7 +341,9 @@ sub find_free_diskname {
>  
>      my $disk_list = [ keys %{$lvs->{$vg}} ];
>  
> -    return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, undef, $scfg);
> +    $add_fmt_suffix = $fmt eq 'qcow2' ? 1 : undef;
> +
> +    return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, $fmt, $scfg, $add_fmt_suffix);
>  }
>  
>  sub lvcreate {
> @@ -363,9 +372,9 @@ sub lvrename {
>  }
>  
>  sub alloc_image {
> -    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
> +    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size, $backing) = @_;

same as for the dir-based one - $backing as arbitrary path is not a good idea.. this should either be a snapshot $volname, or just the $snapname itself?

>  
> -    die "unsupported format '$fmt'" if $fmt ne 'raw';
> +    die "unsupported format '$fmt'" if $fmt !~ m/(raw|qcow2)/;
>  
>      die "illegal name '$name' - should be 'vm-$vmid-*'\n"
>  	if  $name && $name !~ m/^vm-$vmid-/;
> @@ -378,12 +387,36 @@ sub alloc_image {
>  
>      my $free = int($vgs->{$vg}->{free});
>  
> +
> +    #add extra space for qcow2 metadata
> +    #without sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size
> +    #with sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size / 16

are these formulas valid for all disk sizes, or just for 1TB?

> +                                   #4MB overhead for 1TB with extended l2 clustersize=128k

so this means 1TB x 8 / 128K / 16 = 1GB / 256 = 4MB

if the formula is generic, that means 1 GB of storage == 4KB of overhead, or 1MB of storage == 4 bytes of overhead?
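
The arithmetic above can be checked with a few lines of Perl. This is only a sketch: it takes the formula from the patch comment at face value (l2_size = disk_size * 8 / cluster_size / 16, with 128k clusters) and does not verify it against what qemu-img actually allocates. The helper name `l2_overhead_bytes` is made up for illustration.

```perl
use strict;
use warnings;

# Assumption: the L2 metadata size follows the formula quoted in the patch
# comment (extended L2 entries, 128k clusters). Not verified against QEMU.
sub l2_overhead_bytes {
    my ($disk_size_bytes, $cluster_size_bytes) = @_;
    return $disk_size_bytes * 8 / $cluster_size_bytes / 16;
}

my $cluster = 128 * 1024;
for my $entry (['1 MiB', 1024**2], ['1 GiB', 1024**3], ['1 TiB', 1024**4]) {
    my ($label, $bytes) = @$entry;
    # Prints the overhead in bytes for each disk size.
    printf "%s -> %d bytes of L2 overhead\n", $label, l2_overhead_bytes($bytes, $cluster);
}
```

If the formula is generic, the overhead scales linearly with the disk size, so it can be computed for any size instead of being rounded up per TB.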

> +
> +    #can't use qemu-img measure, because it's not possible to define options like clustersize && extended_l2
> +    #verification has been done with : qemu-img create -f qcow2 -o extended_l2=on,cluster_size=128k test.img 1G
> +
> +    my $qcow2_overhead = ceil($size/1024/1024/1024) * 4096;

above you say 4MB for 1TB, but here you go down to KB and then multiply by 4K? why not go down to MB and multiply by 4?

> +
> +    my $lvmsize = $size;
> +    $lvmsize += $qcow2_overhead if $fmt eq 'qcow2';
> +
>      die "not enough free space ($free < $size)\n" if $free < $size;
>  
> -    $name = $class->find_free_diskname($storeid, $scfg, $vmid)
> +    $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt)
>  	if !$name;
>  
> -    lvcreate($vg, $name, $size, ["pve-vm-$vmid"]);
> +    my $tags = ["pve-vm-$vmid"];
> +    #tags all snapshots volumes with the main volume tag for easier activation of the whole group
> +    push @$tags, "\@pve-$name" if $fmt eq 'qcow2';
> +    lvcreate($vg, $name, $lvmsize, $tags);
> +
> +    if ($fmt eq 'qcow2') {
> +	#format the lvm volume with qcow2 format
> +	$class->activate_volume($storeid, $scfg, $name, undef, {});

the last two parameters are not needed..

> +	my $path = $class->path($scfg, $name, $storeid);
> +	PVE::Storage::Plugin::qemu_img_create($scfg, $fmt, $size, $path, $backing);
> +    }
>  
>      return $name;
>  }
> @@ -538,6 +571,12 @@ sub activate_volume {
>  
>      my $lvm_activate_mode = 'ey';
>  
> +    #activate volume && all snapshots volumes by tag
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    $path = "\@pve-$name" if $format eq 'qcow2';
> +
>      my $cmd = ['/sbin/lvchange', "-a$lvm_activate_mode", $path];
>      run_command($cmd, errmsg => "can't activate LV '$path'");
>      $cmd = ['/sbin/lvchange', '--refresh', $path];
> @@ -550,6 +589,10 @@ sub deactivate_volume {
>      my $path = $class->path($scfg, $volname, $storeid, $snapname);
>      return if ! -b $path;
>  
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
> +    $path = "\@pve-$name" if $format eq 'qcow2';
> +
>      my $cmd = ['/sbin/lvchange', '-aln', $path];
>      run_command($cmd, errmsg => "can't deactivate LV '$path'");
>  }
> @@ -557,15 +600,27 @@ sub deactivate_volume {
>  sub volume_resize {
>      my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
>  
> -    $size = ($size/1024/1024) . "M";
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    my $lvmsize = $size / 1024;

I don't really get this, see comments above for alloc_image

> +    my $qcow2_overhead = ceil($size/1024/1024/1024/1024) * 4096;
> +    $lvmsize += $qcow2_overhead if $format eq 'qcow2';

we definitely don't want to have this twice..
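
One way to avoid the duplication would be a single helper that both alloc_image and volume_resize call. The sketch below assumes the patch's rounding (4096 KiB per started TiB) is what is wanted. The names `qcow2_overhead_kib` and `lv_size_kib` are hypothetical and not part of the storage API.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: qcow2 metadata overhead in KiB for a virtual size in KiB.
# Mirrors the rounding used in the patch: 4096 KiB per started TiB.
sub qcow2_overhead_kib {
    my ($size_kib) = @_;
    my $size_tib = $size_kib / 1024 / 1024 / 1024;
    return ceil($size_tib) * 4096;
}

# Hypothetical helper: size of the LV in KiB for a given format and virtual size.
sub lv_size_kib {
    my ($fmt, $size_kib) = @_;
    return $fmt eq 'qcow2' ? $size_kib + qcow2_overhead_kib($size_kib) : $size_kib;
}

# alloc_image gets its size in KiB, volume_resize in bytes, so the caller
# converts once and the overhead logic lives in a single place.
my $alloc_size_kib    = 32 * 1024 * 1024;    # 32 GiB expressed in KiB
my $resize_size_bytes = 2 * 1024**4;         # 2 TiB expressed in bytes

print lv_size_kib('qcow2', $alloc_size_kib), "\n";
print lv_size_kib('qcow2', $resize_size_bytes / 1024), "\n";
print lv_size_kib('raw', $alloc_size_kib), "\n";
```

With the unit conversion kept at the call site, the differing divisor chains (three vs. four times 1024) disappear from the plugin code.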

> +    $lvmsize = "${lvmsize}k";
>  
>      my $path = $class->path($scfg, $volname);
> -    my $cmd = ['/sbin/lvextend', '-L', $size, $path];
> +    my $cmd = ['/sbin/lvextend', '-L', $lvmsize, $path];
>  
>      $class->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
>  	run_command($cmd, errmsg => "error resizing volume '$path'");
>      });
>  
> +    if(!$running && $format eq 'qcow2') {
> +	my $prealloc_opt = PVE::Storage::Plugin::preallocation_cmd_option($scfg, $format);
> +	my $cmd = ['/usr/bin/qemu-img', 'resize', "--$prealloc_opt", '-f', $format, $path , $size];
> +	run_command($cmd, timeout => 10);
> +    }
> +
>      return 1;
>  }
>  
> @@ -587,30 +642,159 @@ sub volume_size_info {
>  sub volume_snapshot {
>      my ($class, $scfg, $storeid, $volname, $snap) = @_;
>  
> -    die "lvm snapshot is not implemented";
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +        $class->parse_volname($volname);
> +
> +    die "can't snapshot this image format\n" if $format ne 'qcow2';
> +
> +    $class->activate_volume($storeid, $scfg, $volname, undef, {});

last two not needed

> +
> +    my $snap_volname = $class->get_snap_volname($volname, $snap);
> +    my $snap_path = $class->path($scfg, $volname, $storeid, $snap);

see above..

> +
> +    my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
> +
> +    #rename current lvm volume to snap volume
> +    my $vg = $scfg->{vgname};
> +    print"rename $volname to $snap_volname\n";
> +    eval { lvrename($vg, $volname, $snap_volname); };
> +    if ($@) {
> +	die "can't rename lvm volume from $volname to $snap_volname: $@ \n";
> +    }
> +
> +    eval { $class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024, $snap_path); };
> +    if ($@) {
> +        eval { $class->free_image($storeid, $scfg, $volname, 0) };

missing error handling, this needs to rename back? also, this might return a code-reference that needs to be executed..
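
The cleanup path could look roughly like the following. It is a sketch with stand-in functions (`fake_lvrename`, `fake_alloc_image`, `fake_free_image`, all hypothetical) so the control flow runs without LVM. In the real plugin the code reference returned by free_image may have to be handed to the caller rather than executed in place.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for lvrename/alloc_image/free_image.
# %lvs plays the role of the volume group contents.
my %lvs = ('vm-100-disk-0.qcow2' => 1);

sub fake_lvrename {
    my ($old, $new) = @_;
    die "no such LV '$old'\n" if !$lvs{$old};
    die "LV '$new' already exists\n" if $lvs{$new};
    $lvs{$new} = delete $lvs{$old};
}

sub fake_alloc_image {
    my ($name, $fail) = @_;
    $lvs{$name} = 1;    # the LV exists before the formatting step fails
    die "qemu-img create failed\n" if $fail;
}

# Like free_image, this may return a code reference with deferred cleanup work.
sub fake_free_image {
    my ($name) = @_;
    return sub { delete $lvs{$name} };
}

sub snapshot_with_rollback {
    my ($volname, $snap_volname, $fail) = @_;

    fake_lvrename($volname, $snap_volname);

    eval { fake_alloc_image($volname, $fail) };
    if (my $err = $@) {
        # Remove the half-created volume and run the deferred cleanup, if any.
        my $cleanup = eval { fake_free_image($volname) };
        warn $@ if $@;
        $cleanup->() if ref($cleanup) eq 'CODE';

        # Undo the rename so the original volume is back in place.
        eval { fake_lvrename($snap_volname, $volname) };
        warn $@ if $@;

        die "snapshot failed: $err";
    }
}

eval { snapshot_with_rollback('vm-100-disk-0.qcow2', 'snap-s1-vm-100-disk-0.qcow2', 1) };
print "error: $@" if $@;
print join(',', sort keys %lvs), "\n";
```

After the simulated failure only the original volume name remains, and the error is propagated instead of being reduced to a warning.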

> +        warn $@ if $@;
> +    }
>  }
>  
> +sub volume_rollback_is_possible {
> +    my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
> +
> +    my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> +    $class->activate_volume($storeid, $scfg, $volname, undef, {});
> +    my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> +    my $parent_snap = $snapshots->{current}->{parent};

wouldn't it be enough to check that this equals $snap?
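
A name-based check could be as small as the sketch below. It assumes the hash returned by volume_snapshot_info is keyed by snapshot name with a `parent` field, as the patch uses it. `is_latest_snapshot` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: rollback is only allowed to the snapshot that is the
# direct parent of the current (active) volume.
sub is_latest_snapshot {
    my ($snapshots, $snap) = @_;
    my $parent = $snapshots->{current}->{parent};
    return (defined($parent) && $parent eq $snap) ? 1 : 0;
}

# Example data shaped like the structure the patch reads from.
my $snapshots = {
    snap1   => { child  => 'snap2' },
    snap2   => { parent => 'snap1', child => 'current' },
    current => { parent => 'snap2' },
};

print is_latest_snapshot($snapshots, 'snap2') ? "ok\n" : "blocked\n";
print is_latest_snapshot($snapshots, 'snap1') ? "ok\n" : "blocked\n";
# A volume without any snapshot has no parent; the defined() guard avoids
# using undef as a hash key or in a string comparison.
print is_latest_snapshot({ current => {} }, 'snap1') ? "ok\n" : "blocked\n";
```

Comparing names also sidesteps the path lookup through `$snapshots->{$parent_snap}->{file}`, which uses an undefined key when the volume has no snapshot yet.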

> +
> +    return 1 if $snapshots->{$parent_snap}->{file} eq $snap_path;
> +    die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> +
> +    return 1;
> +}
> +
> +
>  sub volume_snapshot_rollback {
>      my ($class, $scfg, $storeid, $volname, $snap) = @_;
>  
> -    die "lvm snapshot rollback is not implemented";
> +    die "can't rollback snapshot for this image format\n" if $volname !~ m/\.(qcow2)$/;

this should go below parse_volname and use the format it returns..

> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +        $class->parse_volname($volname);
> +
> +    $class->activate_volume($storeid, $scfg, $volname, undef, {});

two unneeded parameters

> +    my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
> +    my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> +    #simply delete the current snapshot and recreate it
> +    eval { $class->free_image($storeid, $scfg, $volname, 0) };

might return a code reference that needs to be executed..

> +    if ($@) {
> +	die "can't delete old volume $volname: $@\n";
> +    }
> +
> +    eval { $class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024, $snap_path) };
> +    if ($@) {
> +	die "can't allocate new volume $volname: $@\n";
> +    }
> +
> +    return undef;
>  }
>  
>  sub volume_snapshot_delete {
> -    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +    my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
> +
> +   die "can't delete snapshot for this image format\n" if $volname !~ m/\.(qcow2)$/;

this should parse the volname and use the returned format!

> +
> +   return 1 if $running;
> +
> +   my $cmd = "";
> +   my $path = $class->filesystem_path($scfg, $volname);
> +
> +   my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> +   my $snap_path = $snapshots->{$snap}->{file};
> +   my $snap_volname = $snapshots->{$snap}->{volname};
> +   die "volume $snap_path is missing" if !-e $snap_path;
>  
> -    die "lvm snapshot delete is not implemented";
> +   my $parent_snap = $snapshots->{$snap}->{parent};
> +   my $child_snap = $snapshots->{$snap}->{child};
> +
> +   my $parent_path = $snapshots->{$parent_snap}->{file} if $parent_snap;
> +   my $child_path = $snapshots->{$child_snap}->{file} if $child_snap;
> +   my $child_volname = $snapshots->{$child_snap}->{volname} if $child_snap;

same as in the Plugin.pm patch, this is not allowed code style!

> +
> +   #if first snapshot,as it should be bigger,  we merge child, and rename the snapshot to child
> +   if(!$parent_snap) {
> +	print"commit $child_path\n";
> +	$cmd = ['/usr/bin/qemu-img', 'commit', $child_path];

could use `-d`, since we don't use $child_path afterwards anyway

> +	eval {	run_command($cmd) };
> +	if ($@) {
> +	    die "error commiting $child_path to $parent_path: $@\n";
> +	}
> +	print"delete $child_volname\n";
> +	eval { $class->free_image($storeid, $scfg, $child_volname, 0) };

might return a code reference that needs to be executed..

> +	if ($@) {
> +	    die "error delete old snapshot volume $child_volname: $@\n";
> +	}
> +	print"rename $snap_volname to $child_volname\n";
> +	my $vg = $scfg->{vgname};
> +	eval { lvrename($vg, $snap_volname, $child_volname) };
> +	if ($@) {
> +	    die "error renaming snapshot: $@\n";
> +	}
> +    } else {
> +	#we rebase the child image on the parent as new backing image
> +	die "missing parentsnap snapshot to rebase child $child_path\n" if !$parent_path;

how would this happen?

> +	print "link $child_snap to $parent_snap\n";
> +	$cmd = ['/usr/bin/qemu-img', 'rebase', '-b', $parent_path, '-F', 'qcow2', '-f', 'qcow2', $child_path];
> +	eval { run_command($cmd) };
> +	if ($@) {
> +	    die "error rebase $child_path with $parent_path; $@\n";
> +	}
> +	#delete the snapshot
> +	eval { $class->free_image($storeid, $scfg, $snap_volname, 0); };

might return a code reference that needs to be executed..

> +	if ($@) {
> +	    die "error delete old snapshot volume $snap_volname: $@\n";
> +	}
> +    }
>  }
>  
>  sub volume_has_feature {
>      my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
>  
>      my $features = {
> -	copy => { base => 1, current => 1},
> -	rename => {current => 1},
> +        copy => {
> +            base => { qcow2 => 1, raw => 1},
> +            current => { qcow2 => 1, raw => 1},
> +            snap => { qcow2 => 1 },

did you actually test this? AFAICT this would still fall back to internal qcow2 snapshots?

> +        },
> +        'rename' => {
> +            current => { qcow2 => 1, raw => 1},

how does this interact with snapshots?

> +        },
> +        snapshot => {
> +            current => { qcow2 => 1 },
> +            snap => { qcow2 => 1 },
> +        },
> +        template => {
> +            current => { qcow2 => 1, raw => 1},

see below..

> +        },
> +	clone => {
> +	    base => { qcow2 => 1, raw => 1 },

how can we do linked clones of raw volumes? how can we do linked clones of qcow2 volumes if we don't allow creating base volumes?

> +	},
>      };
>  
> -    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase) =
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
>  	$class->parse_volname($volname);
>  
>      my $key = undef;
> @@ -619,7 +803,7 @@ sub volume_has_feature {
>      }else{
>  	$key =  $isBase ? 'base' : 'current';
>      }
> -    return 1 if $features->{$feature}->{$key};
> +    return 1 if defined($features->{$feature}->{$key}->{$format});

why the defined?

>  
>      return undef;
>  }
> @@ -740,4 +924,12 @@ sub rename_volume {
>      return "${storeid}:${target_volname}";
>  }
>  
> +sub get_snap_volname {
> +    my ($class, $volname, $snapname) = @_;
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
> +    $name = !$snapname || $snapname eq 'current' ? $volname : "snap-$snapname-$name";

see above..

> +    return $name;
> +}
> +
>  1;
> -- 
> 2.39.5



* Re: [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support Alexandre Derumier via pve-devel
@ 2025-04-01 13:50   ` Fabian Grünbichler
  2025-04-02  8:01     ` DERUMIER, Alexandre via pve-devel
       [not found]     ` <0e2cd118f35aa8d4c410d362fea1a1b366df1570.camel@groupe-cyllene.com>
  0 siblings, 2 replies; 34+ messages in thread
From: Fabian Grünbichler @ 2025-04-01 13:50 UTC (permalink / raw)
  To: Proxmox VE development discussion

> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 11.03.2025 11:28 CET:

some sort of description here would be great ;)

> ---
>  src/PVE/Storage.pm           |   4 +-
>  src/PVE/Storage/DirPlugin.pm |   1 +
>  src/PVE/Storage/Plugin.pm    | 232 +++++++++++++++++++++++++++++------
>  3 files changed, 196 insertions(+), 41 deletions(-)
> 
> diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
> index 3b4f041..79e5c3a 100755
> --- a/src/PVE/Storage.pm
> +++ b/src/PVE/Storage.pm
> @@ -1002,7 +1002,7 @@ sub unmap_volume {
>  }
>  
>  sub vdisk_alloc {
> -    my ($cfg, $storeid, $vmid, $fmt, $name, $size) = @_;
> +    my ($cfg, $storeid, $vmid, $fmt, $name, $size, $backing) = @_;
>  
>      die "no storage ID specified\n" if !$storeid;
>  
> @@ -1025,7 +1025,7 @@ sub vdisk_alloc {
>      # lock shared storage
>      return $plugin->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
>  	my $old_umask = umask(umask|0037);
> -	my $volname = eval { $plugin->alloc_image($storeid, $scfg, $vmid, $fmt, $name, $size) };
> +	my $volname = eval { $plugin->alloc_image($storeid, $scfg, $vmid, $fmt, $name, $size, $backing) };
>  	my $err = $@;
>  	umask $old_umask;
>  	die $err if $err;
> diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm
> index fb23e0a..1cd7ac3 100644
> --- a/src/PVE/Storage/DirPlugin.pm
> +++ b/src/PVE/Storage/DirPlugin.pm
> @@ -81,6 +81,7 @@ sub options {
>  	is_mountpoint => { optional => 1 },
>  	bwlimit => { optional => 1 },
>  	preallocation => { optional => 1 },
> +	snapext => { optional => 1 },
>     };
>  }
>  
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index 65cf43f..d7f485f 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -216,6 +216,11 @@ my $defaultData = {
>  	    maximum => 65535,
>  	    optional => 1,
>  	},
> +        'snapext' => {
> +	    type => 'boolean',
> +	    description => 'enable external snapshot.',
> +	    optional => 1,
> +        },
>      },
>  };
>  
> @@ -716,7 +721,11 @@ sub filesystem_path {
>  
>      my $dir = $class->get_subdir($scfg, $vtype);
>  
> -    $dir .= "/$vmid" if $vtype eq 'images';
> +    if ($scfg->{snapext} && $snapname) {
> +	$name = $class->get_snap_volname($volname, $snapname);
> +    } else {
> +	$dir .= "/$vmid" if $vtype eq 'images';
> +    }

this is a bit weird, as it mixes volnames (with the `$vmid/` prefix) and names (without). it's only called twice in this patch, and $volname is already parsed here, so could we maybe let get_snap_volname take and return the $name part without the dir?

>  
>      my $path = "$dir/$name";
>  
> @@ -873,7 +882,7 @@ sub clone_image {
>  }
>  
>  sub alloc_image {
> -    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
> +    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size, $backing) = @_;

this extends the storage API, so it should actually do that.. and probably $backing should not be an arbitrary path, but something that is resolved locally?

>  
>      my $imagedir = $class->get_subdir($scfg, 'images');
>      $imagedir .= "/$vmid";
> @@ -901,17 +910,11 @@ sub alloc_image {
>  	umask $old_umask;
>  	die $err if $err;
>      } else {
> -	my $cmd = ['/usr/bin/qemu-img', 'create'];
> -
> -	my $prealloc_opt = preallocation_cmd_option($scfg, $fmt);
> -	push @$cmd, '-o', $prealloc_opt if defined($prealloc_opt);
>  
> -	push @$cmd, '-f', $fmt, $path, "${size}K";
> -
> -	eval { run_command($cmd, errmsg => "unable to create image"); };
> +	eval { qemu_img_create($scfg, $fmt, $size, $path, $backing) };
>  	if ($@) {
>  	    unlink $path;
> -	    rmdir $imagedir;
> +	    rmdir $imagedir if !$backing;

don't think this is needed, rmdir will fail if the dir isn't empty anyway..

>  	    die "$@";
>  	}
>      }
> @@ -955,6 +958,50 @@ sub free_image {
>  # TODO taken from PVE/QemuServer/Drive.pm, avoiding duplication would be nice
>  my @checked_qemu_img_formats = qw(raw cow qcow qcow2 qed vmdk cloop);
>  
> +sub qemu_img_create {
> +    my ($scfg, $fmt, $size, $path, $backing) = @_;
> +
> +    my $cmd = ['/usr/bin/qemu-img', 'create'];
> +
> +    my $options = [];
> +
> +    if($backing) {
> +	push @$cmd, '-b', $backing, '-F', 'qcow2';
> +	push @$options, 'extended_l2=on','cluster_size=128k';
> +    };
> +    push @$options, preallocation_cmd_option($scfg, $fmt);
> +    push @$cmd, '-o', join(',', @$options) if @$options > 0;
> +    push @$cmd, '-f', $fmt, $path;
> +    push @$cmd, "${size}K" if !$backing;

is this because it will automatically take the size from the backing image?

> +
> +    run_command($cmd, errmsg => "unable to create image");
> +}
> +
> +sub qemu_img_info {
> +    my ($filename, $file_format, $timeout, $follow_backing_files) = @_;
> +
> +    my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
> +    push $cmd->@*, '-f', $file_format if $file_format;
> +    push $cmd->@*, '--backing-chain' if $follow_backing_files;
> +
> +    my $json = '';
> +    my $err_output = '';
> +    eval {
> +        run_command($cmd,
> +            timeout => $timeout,
> +            outfunc => sub { $json .= shift },
> +            errfunc => sub { $err_output .= shift . "\n"},
> +        );
> +    };
> +    warn $@ if $@;
> +    if ($err_output) {
> +        # if qemu did not output anything to stdout we die with stderr as an error
> +        die $err_output if !$json;
> +        # otherwise we warn about it and try to parse the json
> +        warn $err_output;
> +    }
> +    return $json;
> +}
>  # set $untrusted if the file in question might be malicious since it isn't
>  # created by our stack
>  # this makes certain checks fatal, and adds extra checks for known problems like
> @@ -1018,25 +1065,9 @@ sub file_size_info {
>  	warn "file_size_info: '$filename': falling back to 'raw' from unknown format '$file_format'\n";
>  	$file_format = 'raw';
>      }
> -    my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
> -    push $cmd->@*, '-f', $file_format if $file_format;
>  
> -    my $json = '';
> -    my $err_output = '';
> -    eval {
> -	run_command($cmd,
> -	    timeout => $timeout,
> -	    outfunc => sub { $json .= shift },
> -	    errfunc => sub { $err_output .= shift . "\n"},
> -	);
> -    };
> -    warn $@ if $@;
> -    if ($err_output) {
> -	# if qemu did not output anything to stdout we die with stderr as an error
> -	die $err_output if !$json;
> -	# otherwise we warn about it and try to parse the json
> -	warn $err_output;
> -    }
> +    my $json = qemu_img_info($filename, $file_format, $timeout);
> +
>      if (!$json) {
>  	die "failed to query file information with qemu-img\n" if $untrusted;
>  	# skip decoding if there was no output, e.g. if there was a timeout.
> @@ -1162,11 +1193,29 @@ sub volume_snapshot {
>  
>      die "can't snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
>  
> -    my $path = $class->filesystem_path($scfg, $volname);
> +    if($scfg->{snapext}) {
> +
> +	my $path = $class->path($scfg, $volname, $storeid);
> +	my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> +	#rename current volume to snap volume
> +	die "snapshot volume $snappath already exist\n" if -e $snappath;
> +	rename($path, $snappath) if -e $path;

this is still looking weird.. I don't think it makes sense interface wise to allow snapshotting a volume that doesn't even exist..

> +
> +	my ($vtype, $name, $vmid, undef, undef, $isBase, $format) =
> +	    $class->parse_volname($volname);
> +
> +	$class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $name, undef, $snappath);
> +	if ($@) {
> +	    eval { $class->free_image($storeid, $scfg, $volname, 0) };
> +	    warn $@ if $@;

missing cleanup - this should undo the rename from above

> +	}
>  
> -    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> +    } else {
>  
> -    run_command($cmd);
> +	my $path = $class->filesystem_path($scfg, $volname);
> +	my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> +	run_command($cmd);
> +    }
>  
>      return undef;
>  }
> @@ -1177,6 +1226,21 @@ sub volume_snapshot {
>  sub volume_rollback_is_possible {
>      my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
>  
> +    if ($scfg->{snapext}) {
> +	#technically, we could manage multiple branches, but it needs a lot more work for snapshot delete:
> +	#we would need to implement block-stream from the deleted snapshot to all other child branches;
> +	#when online, we would need a transaction across multiple disks when deleting the last snapshot,
> +	#and would need to merge into the currently running file
> +
> +	my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> +	my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> +	my $parentsnap = $snapshots->{current}->{parent};

wouldn't it be enough to check that this equals $snap?

> +
> +	return 1 if $snapshots->{$parentsnap}->{file} eq $snappath;
> +
> +	die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> +    }
> +
>      return 1;
>  }
>  
> @@ -1187,9 +1251,15 @@ sub volume_snapshot_rollback {
>  
>      my $path = $class->filesystem_path($scfg, $volname);
>  
> -    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> -
> -    run_command($cmd);
> +    if ($scfg->{snapext}) {
> +	#simply delete the current snapshot and recreate it
> +	my $path = $class->filesystem_path($scfg, $volname);
> +	unlink($path);
> +	$class->volume_snapshot($scfg, $storeid, $volname, $snap);

instead of volume_snapshot, this could simply call alloc_image with the backing file? then volume_snapshot could always rename and always cleanup properly..

> +    } else {
> +	my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> +	run_command($cmd);
> +    }
>  
>      return undef;
>  }
> @@ -1201,13 +1271,49 @@ sub volume_snapshot_delete {
>  
>      return 1 if $running;
>  
> +    my $cmd = "";
>      my $path = $class->filesystem_path($scfg, $volname);
>  
> -    $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> +    if ($scfg->{snapext}) {
> +
> +	my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> +	my $snappath = $snapshots->{$snap}->{file};
> +	die "volume $snappath is missing" if !-e $snappath;
>  
> -    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> +	my $parentsnap = $snapshots->{$snap}->{parent};
> +	my $childsnap = $snapshots->{$snap}->{child};
>  
> -    run_command($cmd);
> +	my $parentpath = $snapshots->{$parentsnap}->{file} if $parentsnap;
> +	my $childpath = $snapshots->{$childsnap}->{file} if $childsnap;

my $foo = .. if ...; 

is forbidden in our code ;) but I think we always need to have a childsnap anyway, right?

so we could simply check for that, and then switch around the two branches below so that one of them can do

if (my $parentsnap = ...) {
...
} else {
...
}
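
For illustration, a minimal sketch of that shape. The sample hash and the helper name snapshot_neighbours are made up for this example and are not part of the patch; only the hash layout (file/parent/child keys) follows volume_snapshot_info from the diff. perlsyn documents the behaviour of `my $x = ... if ...;` as undefined, which is why the variable is declared unconditionally here:

```perl
use strict;
use warnings;

# hypothetical snapshot table, shaped like the hash built by volume_snapshot_info
my $snapshots = {
    snap1   => { file => '/tmp/snap-snap1-vm-100-disk-0.qcow2', child => 'snap2' },
    snap2   => { file => '/tmp/snap-snap2-vm-100-disk-0.qcow2', parent => 'snap1', child => 'current' },
    current => { file => '/tmp/vm-100-disk-0.qcow2', parent => 'snap2' },
};

# hypothetical helper: resolve the parent and child paths of a snapshot
sub snapshot_neighbours {
    my ($snapshots, $snap) = @_;

    # a snapshot that can be deleted always has a child (at least 'current')
    my $childsnap = $snapshots->{$snap}->{child}
        or die "snapshot '$snap' has no child\n";
    my $childpath = $snapshots->{$childsnap}->{file};

    # declare unconditionally, assign conditionally
    my $parentpath;
    if (my $parentsnap = $snapshots->{$snap}->{parent}) {
        $parentpath = $snapshots->{$parentsnap}->{file};
    }

    return ($parentpath, $childpath);
}

my ($parentpath, $childpath) = snapshot_neighbours($snapshots, 'snap2');
print "parent: $parentpath, child: $childpath\n";
```

For the first snapshot (no parent) the helper returns undef as the parent path, so the caller can branch on that instead of testing $parentsnap separately.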

> +
> +	#if first snapshot,as it should be bigger,  we merge child, and rename the snapshot to child
> +	if(!$parentsnap) {
> +	    print"commit $childpath\n";
> +	    $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];

we could pass `-d` here to skip emptying $childpath, since we rename over it below anyway..

> +	    eval { run_command($cmd) };
> +	    if ($@) {
> +		die "error commiting $childpath to $parentpath; $@\n";

this is wrong, there is no $parentpath.. we are committing into $snappath

> +	    }
> +	    print"rename $snappath to $childpath\n";
> +	    rename($snappath, $childpath);

what if this fails?
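
A minimal sketch of checking the result, assuming nothing beyond core Perl. The helper name rename_or_die and the temporary file names are invented for the example; rename() returns false on failure and sets $!:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# hypothetical helper: rename and die with the OS error on failure
sub rename_or_die {
    my ($from, $to) = @_;
    rename($from, $to)
        or die "renaming '$from' to '$to' failed - $!\n";
    return 1;
}

# throw-away directory standing in for the image directory
my $dir = tempdir(CLEANUP => 1);
my $snappath = "$dir/snap.qcow2";
open(my $fh, '>', $snappath) or die "open failed - $!\n";
close($fh);

rename_or_die($snappath, "$dir/child.qcow2");

# a second attempt fails because the source no longer exists
eval { rename_or_die($snappath, "$dir/other.qcow2") };
print "caught: $@" if $@;
```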

> +	} else {
> +	    #we rebase the child image on the parent as new backing image

should we extend this to make it clear what this means? it means copying any parts of $snap that are not in $parent and not yet overwritten by $child into $child, right?

so how expensive this is depends on:
- how many changes are between $parent and $snap (increases cost)
- how many of those are overwritten by changes between $snap and $child (decreases cost)
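
To make the two factors concrete, a toy model (not how qemu-img is implemented): each image is reduced to the list of cluster numbers it has allocated, and the rebase has to copy those clusters of $snap that $child has not overwritten. The function name rebase_copy_set and the cluster numbers are assumptions for the example only:

```perl
use strict;
use warnings;

# toy model: clusters of the deleted snapshot that are not shadowed by the child
sub rebase_copy_set {
    my ($snap_clusters, $child_clusters) = @_;
    my %in_child = map { $_ => 1 } @$child_clusters;
    return [ grep { !$in_child{$_} } @$snap_clusters ];
}

my $snap_clusters  = [1, 2, 3, 4];  # written between $parent and $snap
my $child_clusters = [3, 4, 5];     # written between $snap and $child

my $to_copy = rebase_copy_set($snap_clusters, $child_clusters);
print "clusters to copy into child: @$to_copy\n";
```

More entries in the first list raise the amount of data to copy, more overlap with the second list lowers it.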

> +	    die "missing parentsnap snapshot to rebase child $childpath\n" if !$parentpath;

how can this happen? if there is a parentsnap there must be a parentpath as well?

> +	    $cmd = ['/usr/bin/qemu-img', 'rebase', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath];
> +	    eval { run_command($cmd) };
> +	    if ($@) {
> +		die "error rebase $childpath from $parentpath; $@\n";
> +	    }
> +	    #delete the snapshot
> +	    unlink($snappath);
> +	}
> +
> +    } else {
> +	$class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> +
> +	$cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> +	run_command($cmd);
> +    }
>  
>      return undef;
>  }
> @@ -1246,7 +1352,7 @@ sub volume_has_feature {
>  	    current => { qcow2 => 1, raw => 1, vmdk => 1 },
>  	},
>  	rename => {
> -	    current => {qcow2 => 1, raw => 1, vmdk => 1},
> +	    current => { qcow2 => 1, raw => 1, vmdk => 1},
>  	},
>      };
>  
> @@ -1481,7 +1587,37 @@ sub status {
>  sub volume_snapshot_info {
>      my ($class, $scfg, $storeid, $volname) = @_;
>  
> -    die "volume_snapshot_info is not implemented for $class";
> +    my $path = $class->filesystem_path($scfg, $volname);
> +
> +    my $backing_chain = 1;
> +    my $json = qemu_img_info($path, undef, 10, $backing_chain);
> +    die "failed to query file information with qemu-img\n" if !$json;
> +    my $snapshots = eval { decode_json($json) };

missing error handling for JSON decoding..
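
A minimal sketch of one way to do this. It uses JSON::PP only because it ships with core Perl and keeps the example self-contained; the helper name decode_qemu_img_info and the array check are assumptions, not part of the patch:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# hypothetical helper: decode the backing-chain output and fail loudly
sub decode_qemu_img_info {
    my ($json) = @_;

    my $info = eval { decode_json($json) };
    die "could not parse qemu-img info output - $@" if $@;

    # with a backing chain requested, a list of images is expected
    die "unexpected qemu-img info output, expected an array\n"
        if ref($info) ne 'ARRAY';

    return $info;
}

my $chain = decode_qemu_img_info('[{"filename":"vm-100-disk-0.qcow2"}]');
print "first image: $chain->[0]->{filename}\n";
```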

> +
> +    my $info = {};
> +    my $order = 0;
> +    for my $snap (@$snapshots) {
> +
> +	my $snapfile = $snap->{filename};
> +	my $snapname = parse_snapname($snapfile);
> +	$snapname = 'current' if !$snapname;
> +	my $snapvolname = $class->get_snap_volname($volname, $snapname);
> +
> +	$info->{$snapname}->{order} = $order;
> +	$info->{$snapname}->{file}= $snapfile;
> +	$info->{$snapname}->{volname} = $snapvolname;
> +	$info->{$snapname}->{volid} = "$storeid:$snapvolname";
> +	$info->{$snapname}->{ext} = 1;
> +
> +	my $parentfile = $snap->{'backing-filename'};
> +	if ($parentfile) {
> +	    my $parentname = parse_snapname($parentfile);
> +	    $info->{$snapname}->{parent} = $parentname;
> +	    $info->{$parentname}->{child} = $snapname;
> +	}
> +	$order++;
> +    }
> +    return $info;
>  }
>  
>  sub activate_storage {
> @@ -1867,4 +2003,22 @@ sub config_aware_base_mkdir {
>      }
>  }
>  
> +sub get_snap_volname {
> +    my ($class, $volname, $snapname) = @_;
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
> +    $name = !$snapname || $snapname eq 'current' ? $volname : "$vmid/snap-$snapname-$name";

other way round would be better to group by volume first IMHO ($vmid/snap-$name-$snapname), as this is similar to how we encode snapshots often on the storage level (volume@snap). we also need to have some delimiter between snapshot and volume name that is not allowed in either (hard for volname since basically everything but '/' goes, but snapshots have a restricted character set (configid, which means alphanumeric, hyphen and underscore), so we could use something like '.' as delimiter? or we switch to directories and do $vmid/snap/$snap/$name?)

> +    return $name;
> +}
> +
> +sub parse_snapname {
> +    my ($name) = @_;
> +
> +    my $basename = basename($name);
> +    if ($basename =~ m/^snap-(.*)-vm(.*)$/) {

this is not strict enough, see above
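
A small sketch of the ambiguity. The first function uses the pattern from the patch; the second uses a purely hypothetical "snap-<volume name>.<snapshot name>" layout, assuming snapshot names are limited to letters, digits, hyphen and underscore as described above. The file names are invented, and the sketch does not settle which naming scheme should be used:

```perl
use strict;
use warnings;

# the pattern from the patch, applied to a bare file name
sub parse_snapname_loose {
    my ($basename) = @_;
    return $basename =~ m/^snap-(.*)-vm(.*)$/ ? $1 : undef;
}

# hypothetical stricter layout: volume name first, '.' delimiter, snapshot name last
sub parse_snapname_strict {
    my ($basename) = @_;
    return $basename =~ m/^snap-(.+\.(?:raw|qcow2|vmdk))\.([A-Za-z0-9_-]+)$/ ? $2 : undef;
}

# snapshot 's1' of a volume whose name happens to contain '-vm'
print parse_snapname_loose('snap-s1-vm-100-foo-vm1.qcow2'), "\n";
print parse_snapname_strict('snap-vm-100-foo-vm1.qcow2.s1'), "\n";
```

The loose pattern is greedy, so it splits at the last '-vm' and returns part of the volume name together with the snapshot name. The strict variant can only split at the final '.', because the snapshot name cannot contain one.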

> +	return $1;
> +    }
> +    return undef;
> +}
> +
>  1;
> -- 
> 2.39.5



* Re: [pve-devel] [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
@ 2025-04-01 13:50   ` Fabian Grünbichler
  2025-04-07 11:02     ` DERUMIER, Alexandre via pve-devel
  2025-04-07 11:29     ` DERUMIER, Alexandre via pve-devel
  0 siblings, 2 replies; 34+ messages in thread
From: Fabian Grünbichler @ 2025-04-01 13:50 UTC (permalink / raw)
  To: Proxmox VE development discussion


> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 11.03.2025 11:28 CET geschrieben:
> 
>  
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
>  src/PVE/Storage.pm                 | 18 +++++++++++++++++-
>  src/test/run_test_zfspoolplugin.pl | 18 ++++++++++++++++++
>  2 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
> index 79e5c3a..4012905 100755
> --- a/src/PVE/Storage.pm
> +++ b/src/PVE/Storage.pm
> @@ -1052,7 +1052,23 @@ sub vdisk_free {
>  
>  	my (undef, undef, undef, undef, undef, $isBase, $format) =
>  	    $plugin->parse_volname($volname);
> -	$cleanup_worker = $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);
> +
> +        $cleanup_worker = sub {
> +	    #remove external snapshots
> +	    activate_volumes($cfg, [ $volid ]);
> +	    my $snapshots = PVE::Storage::volume_snapshot_info($cfg, $volid);
> +	    for my $snapid (sort { $snapshots->{$b}->{order} <=> $snapshots->{$a}->{order} } keys %$snapshots) {
> +		my $snap = $snapshots->{$snapid};
> +		next if $snapid eq 'current';
> +		next if !$snap->{volid};
> +		next if !$snap->{ext};
> +		my ($snap_storeid, $snap_volname) = parse_volume_id($snap->{volid});
> +		my (undef, undef, undef, undef, undef, $snap_isBase, $snap_format) =
> +		    $plugin->parse_volname($volname);
> +		$plugin->free_image($snap_storeid, $scfg, $snap_volname, $snap_isBase, $snap_format);
> +	    }
> +	    $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);

this is the wrong place to do this, you need to handle this in the cleanup worker returned by the plugin and still execute it here.. also you need to honor saferemove when cleaning up the snapshots

> +	};
>      });
>  
>      return if !$cleanup_worker;
> diff --git a/src/test/run_test_zfspoolplugin.pl b/src/test/run_test_zfspoolplugin.pl
> index 095ccb3..4ff9f22 100755
> --- a/src/test/run_test_zfspoolplugin.pl
> +++ b/src/test/run_test_zfspoolplugin.pl
> @@ -6,12 +6,30 @@ use strict;
>  use warnings;
>  
>  use Data::Dumper qw(Dumper);
> +use Test::MockModule;
> +
>  use PVE::Storage;
>  use PVE::Cluster;
>  use PVE::Tools qw(run_command);
> +use PVE::RPCEnvironment;
>  use Cwd;
>  $Data::Dumper::Sortkeys = 1;
>  
> +my $rpcenv_module;
> +$rpcenv_module = Test::MockModule->new('PVE::RPCEnvironment');
> +$rpcenv_module->mock(
> +    get_user => sub {
> +        return 'root@pam';
> +    },
> +    fork_worker => sub {
> +	my ($self, $dtype, $id, $user, $function, $background) = @_;
> +	$function->(123456);
> +	return '123456';
> +    }
> +);
> +
> +my $rpcenv = PVE::RPCEnvironment->init('pub');
> +

what? why? no explanation?

>  my $verbose = undef;
>  
>  my $storagename = "zfstank99";
> -- 
> 2.39.5



* Re: [pve-devel] [PATCH v4 pve-storage 4/5] lvm: lvrename helper: allow path
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 4/5] lvm: lvrename helper: allow path Alexandre Derumier via pve-devel
@ 2025-04-01 13:50   ` Fabian Grünbichler
  0 siblings, 0 replies; 34+ messages in thread
From: Fabian Grünbichler @ 2025-04-01 13:50 UTC (permalink / raw)
  To: Proxmox VE development discussion


> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 11.03.2025 11:28 CET geschrieben:


> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
>  src/PVE/Storage/LVMPlugin.pm | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
> index 19dbd7e..2431fcd 100644
> --- a/src/PVE/Storage/LVMPlugin.pm
> +++ b/src/PVE/Storage/LVMPlugin.pm
> @@ -365,9 +365,11 @@ sub lvcreate {
>  sub lvrename {
>      my ($vg, $oldname, $newname) = @_;
>  
> -    run_command(
> -	['/sbin/lvrename', $vg, $oldname, $newname],
> -	errmsg => "lvrename '${vg}/${oldname}' to '${newname}' error",
> +    my $cmd = ['/sbin/lvrename'];
> +    push @$cmd, $vg if $vg;
> +    push @$cmd, $oldname, $newname;
> +
> +    run_command($cmd, errmsg => "lvrename '${oldname}' to '${newname}' error",

why? see comments on other patches, I don't think we want or need this..

the call in qemu-server is forbidden as well, plugins are off-limits for non storage code..

>      );
>  }
>  
> -- 
> 2.39.5



* Re: [pve-devel] [PATCH v4 pve-storage 5/5] lvm: add lvremove helper
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 5/5] lvm: add lvremove helper Alexandre Derumier via pve-devel
@ 2025-04-01 13:50   ` Fabian Grünbichler
  0 siblings, 0 replies; 34+ messages in thread
From: Fabian Grünbichler @ 2025-04-01 13:50 UTC (permalink / raw)
  To: Proxmox VE development discussion


> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 11.03.2025 11:28 CET geschrieben:
> 
>  
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
>  src/PVE/Storage/LVMPlugin.pm | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
> index 2431fcd..ab3563b 100644
> --- a/src/PVE/Storage/LVMPlugin.pm
> +++ b/src/PVE/Storage/LVMPlugin.pm
> @@ -373,6 +373,14 @@ sub lvrename {
>      );
>  }
>  
> +sub lvremove {
> +   my ($name, $vg) = @_;
> +
> +   my $path = $vg ? "$vg/$name" : $name;

why? it's only called twice below and both have a volume group set?

the call in qemu-server is forbidden - you must never call directly into plugin code like that..

> +   my $cmd = ['/sbin/lvremove', '-f', $path];
> +   run_command($cmd, errmsg => "lvremove '$path' error");
> +}
> +
>  sub alloc_image {
>      my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size, $backing) = @_;
>  
> @@ -453,8 +461,7 @@ sub free_image {
>  	warn $@ if $@;
>  
>  	$class->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
> -	    my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$volname"];
> -	    run_command($cmd, errmsg => "lvremove '$vg/del-$volname' error");
> +	    lvremove("del-$volname", $vg);
>  	});
>  	print "successfully removed volume $volname ($vg/del-$volname)\n";
>      };
> @@ -470,9 +477,7 @@ sub free_image {
>  	run_command($cmd, errmsg => "lvrename '$vg/$volname' error");
>  	return $zero_out_worker;
>      } else {
> -	my $tmpvg = $scfg->{vgname};
> -	$cmd = ['/sbin/lvremove', '-f', "$tmpvg/$volname"];
> -	run_command($cmd, errmsg => "lvremove '$tmpvg/$volname' error");
> +	lvremove($volname, $scfg->{vgname});
>      }
>  
>      return undef;
> -- 
> 2.39.5



* Re: [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support
  2025-04-01 13:50   ` Fabian Grünbichler
@ 2025-04-02  8:01     ` DERUMIER, Alexandre via pve-devel
       [not found]     ` <0e2cd118f35aa8d4c410d362fea1a1b366df1570.camel@groupe-cyllene.com>
  1 sibling, 0 replies; 34+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-04-02  8:01 UTC (permalink / raw)
  To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre

[-- Attachment #1: Type: message/rfc822, Size: 20335 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support
Date: Wed, 2 Apr 2025 08:01:44 +0000
Message-ID: <0e2cd118f35aa8d4c410d362fea1a1b366df1570.camel@groupe-cyllene.com>

>  
> @@ -716,7 +721,11 @@ sub filesystem_path {
>  
>      my $dir = $class->get_subdir($scfg, $vtype);
>  
> -    $dir .= "/$vmid" if $vtype eq 'images';
> +    if ($scfg->{snapext} && $snapname) {
> + $name = $class->get_snap_volname($volname, $snapname);
> +    } else {
> + $dir .= "/$vmid" if $vtype eq 'images';
> +    }

>>this is a bit weird, as it mixes volnames (with the `$vmid/` prefix)
>>and names (without), it's only called twice in this patch, and this
>>here already has $volname parsed, so could we maybe let
>>get_snap_volname take and return the $name part without the dir?

ok!

>  
>      my $path = "$dir/$name";
>  
> @@ -873,7 +882,7 @@ sub clone_image {
>  }
>  
>  sub alloc_image {
> -    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
> +    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size,
> $backing) = @_;

>>this extends the storage API, so it should actually do that.. and
>>probably $backing should not be an arbitrary path, but something that
>>is resolved locally?

I'll send the $snapname as param instead


> +
> +    if($backing) {
> + push @$cmd, '-b', $backing, '-F', 'qcow2';
> + push @$options, 'extended_l2=on','cluster_size=128k';
> +    };
> +    push @$options, preallocation_cmd_option($scfg, $fmt);
> +    push @$cmd, '-o', join(',', @$options) if @$options > 0;
> +    push @$cmd, '-f', $fmt, $path;
> +    push @$cmd, "${size}K" if !$backing;

>>is this because it will automatically take the size from the backing
>>image?

Yes, it takes the size from the backing image (and refuses if you specify
a size parameter together with a backing file)


> + my $path = $class->path($scfg, $volname, $storeid);
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + #rename current volume to snap volume
> + die "snapshot volume $snappath already exist\n" if -e $snappath;
> + rename($path, $snappath) if -e $path;

>>this is still looking weird.. I don't think it makes sense interface
>>wise to allow snapshotting a volume that doesn't even exist..

This is more of a safety measure: I'm still unsure of the behaviour if you
have multiple disks and the snapshot dies in the middle (one disk
renamed, the other not renamed).

> +
> + my ($vtype, $name, $vmid, undef, undef, $isBase, $format) =
> +     $class->parse_volname($volname);
> +
> + $class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $name, undef,
> $snappath);
> + if ($@) {
> +     eval { $class->free_image($storeid, $scfg, $volname, 0) };
> +     warn $@ if $@;

>>missing cleanup - this should undo the rename from above

Do you have an idea how to do it with multiple disks?
(revert the renaming of the other disks elsewhere in the code? or just
keep them like this?)



>  
> @@ -1187,9 +1251,15 @@ sub volume_snapshot_rollback {
>  
>      my $path = $class->filesystem_path($scfg, $volname);
>  
> -    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> -
> -    run_command($cmd);
> +    if ($scfg->{snapext}) {
> + #simply delete the current snapshot and recreate it
> + my $path = $class->filesystem_path($scfg, $volname);
> + unlink($path);
> + $class->volume_snapshot($scfg, $storeid, $volname, $snap);

>>instead of volume_snapshot, this could simply call alloc_image with
>>the backing file? then volume_snapshot could always rename and always
>>cleanup properly..

Yes, better like this indeed

> 
> + } else {
> +     #we rebase the child image on the parent as new backing image

>>should we extend this to make it clear what this means? it means
>>copying any parts of $snap that are not in $parent and not yet
>>overwritten by $child into $child, right?
>>
yes,
intermediate snapshot: (rebase)
-------------------------------
snap1 (10G) --> snap2 (1G) --> current (1G)
---> delete snap2

rebase current on snap1

snap1 (10G) --> current (2G)


or

snap1 (10G) --> snap2 (1G) --> snap3 (1G) --> current (1G)
---> delete snap2

rebase snap3 on snap1

snap1 (10G) --> snap3 (2G) --> current (1G)



>>so how expensive this is depends on:
>>- how many changes are between $parent and $snap (increases cost)
>>- how many of those are overwritten by changes between $snap and
>>$child (decreases cost)


but yes, the final size of the child is not 100% of the additional content
of the deleted snapshot if some blocks have already been overwritten in
the child


so, we could call it: "merge the diff content of the deleted snap into the
childsnap"





> +sub get_snap_volname {
> +    my ($class, $volname, $snapname) = @_;
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
> +    $name = !$snapname || $snapname eq 'current' ? $volname : "$vmid/snap-$snapname-$name";

>>other way round would be better to group by volume first IMHO
>>($vmid/snap-$name-$snapname), as this is similar to how we encode
>>snapshots often on the storage level (volume@snap). we also need to
>>have some delimiter between snapshot and volume name that is not
>>allowed in either (hard for volname since basically everything but
>>'/' goes, but snapshots have a restricted character set (configid,
>>which means alphanumeric, hyphen and underscore), so we could use
>>something like '.' as delimiter? or we switch to directories and do
>>$vmid/snap/$snap/$name?)

Personally, I prefer a '.' delimiter over a subdirectory (better to
have the info in the filename itself)





* Re: [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
@ 2025-04-02  8:10   ` Fabian Grünbichler
  2025-04-03  4:51     ` DERUMIER, Alexandre via pve-devel
                       ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Fabian Grünbichler @ 2025-04-02  8:10 UTC (permalink / raw)
  To: Proxmox VE development discussion

commit description missing here as well..

I haven't tested this (or the first patches doing the blockdev conversion) yet, but I see a few bigger design/architecture issues left (besides FIXMEs for missing pieces that previously worked ;)):

- we should probably move the decision whether a snapshot is done on the storage layer or by qemu into the control of the storage plugin, especially since we are currently cleaning that API up to allow easier implementation of external plugins
- if we do that, we can also make "uses external qcow2 snapshots" a property of the storage plugin+config to replace hard-coded checks for the snapext property or lvm+qcow2
- there are a few operations here that should not call directly into the storage plugin code or do equivalent actions, but should rather get a proper interface in that storage plugin API

the first one is the renaming of a blockdev while it is used, which is currently done like this:
-- "link" snapshot path to make it available under old and new name
-- handle blockdev additions/reopening/backing-file updates/deletions on the qemu layer
-- remove old snapshot path link
-- if LVM, rename actual volume (for non-LVM, linking followed by unlinking the source is effectively a rename already)

I wonder whether that couldn't be made more straightforward by doing
-- rename snapshot volume/image (qemu must already have the old name open anyway and should be able to continue using it)
-- do blockdev additions/reopening/backing-file updates/deletions on the qemu layer

or is there an issue/check in qemu somewhere that prevents this approach? if not, we could just introduce a "volume_snapshot_rename" or extend rename_volume with a snapshot parameter..
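
A small sketch of the filesystem side of that assumption, for the non-LVM case only: on a POSIX filesystem an already-open handle keeps working after the file is renamed. The file names are invented, the plain file handle merely stands in for the process holding the image open, and nothing here says anything about checks QEMU itself might perform:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($old, $new) = ("$dir/vm-100-disk-0.qcow2", "$dir/snap-vm-100-disk-0.qcow2");

# stand-in for the process that has the image open under its old name
open(my $fh, '>', $old) or die "open failed - $!\n";
my $inode_before = (stat($fh))[1];

rename($old, $new) or die "rename failed - $!\n";

# the already-open handle keeps working after the rename
print {$fh} "written after rename\n";
close($fh) or die "close failed - $!\n";

# the data is visible under the new name, on the same inode
my $inode_after = (stat($new))[1];
open(my $in, '<', $new) or die "open failed - $!\n";
my $line = <$in>;
close($in);

print "same inode: ", ($inode_before == $inode_after ? "yes" : "no"), "\n";
print "content: $line";
```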

the second thing that happens is deleting a snapshot volume/path without deleting the whole snapshot.. that one we could easily support in volume_snapshot_delete, either by extending the $running parameter (e.g., passing "2") or by adding a new parameter to signify that all the housekeeping was already done and just the actual snapshot volume should be deleted. this shouldn't be an issue provided all such calls are guarded by first checking that we are using external snapshots..

> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 11.03.2025 11:29 CET geschrieben:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
>  PVE/QemuConfig.pm       |   4 +-
>  PVE/QemuServer.pm       | 226 +++++++++++++++++++++++++++++++++++++---
>  PVE/QemuServer/Drive.pm |   4 +
>  3 files changed, 220 insertions(+), 14 deletions(-)
> 
> diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
> index b60cc398..2b3acb15 100644
> --- a/PVE/QemuConfig.pm
> +++ b/PVE/QemuConfig.pm
> @@ -377,7 +377,7 @@ sub __snapshot_create_vol_snapshot {
>  
>      print "snapshotting '$device' ($drive->{file})\n";
>  
> -    PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $volid, $snapname);
> +    PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $drive, $snapname);
>  }
>  
>  sub __snapshot_delete_remove_drive {
> @@ -414,7 +414,7 @@ sub __snapshot_delete_vol_snapshot {
>      my $storecfg = PVE::Storage::config();
>      my $volid = $drive->{file};
>  
> -    PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $volid, $snapname);
> +    PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $drive, $snapname);
>  
>      push @$unused, $volid;
>  }
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 60481acc..6ce3e9c6 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4449,20 +4449,200 @@ sub qemu_block_resize {
>  }
>  
>  sub qemu_volume_snapshot {
> -    my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
> +    my ($vmid, $deviceid, $storecfg, $drive, $snap) = @_;
>  
> +    my $volid = $drive->{file};
>      my $running = check_running($vmid);
> -
> -    if ($running && do_snapshots_with_qemu($storecfg, $volid, $deviceid)) {
> -	mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid) if $running;

forbidden syntax

> +    if ($do_snapshots_with_qemu) {
> +	if($do_snapshots_with_qemu == 2) {
> +	    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> +	    my $parent_snap = $snapshots->{'current'}->{parent};
> +	    my $size = PVE::Storage::volume_size_info($storecfg, $volid, 5);
> +	    blockdev_rename($storecfg, $vmid, $deviceid, $drive, 'current', $snap, $parent_snap);
> +	    blockdev_external_snapshot($storecfg, $vmid, $deviceid, $drive, $snap, $size);
> +	} else {
> +	    mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> +	}
>      } else {
>  	PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
>      }
>  }
>  
> +sub blockdev_external_snapshot {
> +    my ($storecfg, $vmid, $deviceid, $drive, $snap, $size) = @_;
> +
> +    my $volid = $drive->{file};
> +
> +    #be sure to add drive in write mode
> +    delete($drive->{ro});

why?

> +
> +    my $new_file_blockdev = generate_file_blockdev($storecfg, $drive);
> +    my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_file_blockdev);
> +
> +    my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap);
> +    my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_file_blockdev, $snap);
> +
> +    #preallocate add a new current file with reference to backing-file
> +    my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid);
> +    my $name = (PVE::Storage::parse_volname($storecfg, $volid))[1];
> +    PVE::Storage::vdisk_alloc($storecfg, $storeid, $vmid, 'qcow2', $name, $size/1024, $snap_file_blockdev->{filename});

if we instead extend volume_snapshot similarly to what I describe up top (adding a parameter that renaming was already done), we don't need to extend vdisk_alloc's interface like this.. or maybe we could even combine blockdev_rename and blockdev_external_snapshot, to just call PVE::Storage::volume_snapshot to do rename+alloc, and then do the blockdev dance? in any case, this here would be the *only* external caller of vdisk_alloc with a backing file, so I don't think this is the right interface..

> +
> +    #backing need to be forced to undef in blockdev, to avoid reopen of backing-file on blockdev-add
> +    $new_fmt_blockdev->{backing} = undef;
> +
> +    PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$new_fmt_blockdev);
> +
> +    mon_cmd($vmid, 'blockdev-snapshot', node => $snap_fmt_blockdev->{'node-name'}, overlay => $new_fmt_blockdev->{'node-name'});
> +}
> +
> +sub blockdev_delete {
> +    my ($storecfg, $vmid, $drive, $file_blockdev, $fmt_blockdev) = @_;
> +
> +    #add eval as reopen is auto removing the old nodename automatically only if it was created at vm start in command line argument
> +    eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_blockdev->{'node-name'}) };
> +    eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $fmt_blockdev->{'node-name'}) };
> +
> +    #delete the file (don't use vdisk_free as we don't want to delete all snapshot chain)
> +    print"delete old $file_blockdev->{filename}\n";
> +
> +    my $storage_name = PVE::Storage::parse_volume_id($drive->{file});
> +    my $scfg = $storecfg->{ids}->{$storage_name};
> +    if ($scfg->{type} eq 'lvm') {
> +	PVE::Storage::LVMPlugin::lvremove($file_blockdev->{filename});
> +    } else {
> +	unlink($file_blockdev->{filename});
> +    }

this really needs to be handled in the storage layer

> +}
> +
> +sub blockdev_rename {
> +    my ($storecfg, $vmid, $deviceid, $drive, $src_snap, $target_snap, $parent_snap) = @_;
> +
> +    print "rename $src_snap to $target_snap\n";
> +
> +    my $volid = $drive->{file};
> +
> +    my $src_file_blockdev = generate_file_blockdev($storecfg, $drive, $src_snap);
> +    my $src_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $src_file_blockdev, $src_snap);
> +    my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_file_blockdev, $target_snap);
> +
> +    #create a hardlink
> +    link($src_file_blockdev->{filename}, $target_file_blockdev->{filename});

this really needs to be handled in the storage layer

> +
> +    if($target_snap eq 'current' || $src_snap eq 'current') {
> +	#rename from|to current
> +
> +	#add backing to target
> +	if ($parent_snap) {
> +	    my $parent_fmt_nodename = encode_nodename('fmt', $volid, $parent_snap);
> +	    $target_fmt_blockdev->{backing} = $parent_fmt_nodename;
> +	}
> +	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
> +
> +	#reopen the current throttlefilter nodename with the target fmt nodename
> +	my $drive_blockdev = generate_drive_blockdev($storecfg, $vmid, $drive);
> +	delete $drive_blockdev->{file};
> +	$drive_blockdev->{file} = $target_fmt_blockdev->{'node-name'};

these two lines can be a single line

> +	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$drive_blockdev]);
> +    } else {
> +	#intermediate snapshot
> +	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
> +
> +	#reopen the parent node with the new target fmt backing node
> +	my $parent_file_blockdev = generate_file_blockdev($storecfg, $drive, $parent_snap);
> +	my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_file_blockdev, $parent_snap);
> +	$parent_fmt_blockdev->{backing} = $target_fmt_blockdev->{'node-name'};
> +	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$parent_fmt_blockdev]);
> +
> +	#change backing-file in qcow2 metadatas
> +	PVE::QemuServer::Monitor::mon_cmd($vmid, 'change-backing-file', device => $deviceid, 'image-node-name' => $parent_fmt_blockdev->{'node-name'}, 'backing-file' => $target_file_blockdev->{filename});
> +    }
> +
> +    # delete old file|fmt nodes
> +    # add eval as reopen is auto removing the old nodename automatically only if it was created at vm start in command line argument

ugh..

> +    eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_file_blockdev->{'node-name'})};
> +    eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_fmt_blockdev->{'node-name'})};
> +
> +    unlink($src_file_blockdev->{filename});

same as above

> +
> +    #rename underlay
> +    my $storage_name = PVE::Storage::parse_volume_id($volid);
> +    my $scfg = $storecfg->{ids}->{$storage_name};
> +    return if $scfg->{type} ne 'lvm';
> +
> +    print "rename underlay lvm volume $src_file_blockdev->{filename} to $target_file_blockdev->{filename}\n";
> +    PVE::Storage::LVMPlugin::lvrename(undef, $src_file_blockdev->{filename}, $target_file_blockdev->{filename});

absolute no-go, this needs to be handled in the storage layer

> +}
> +
> +sub blockdev_commit {
> +    my ($storecfg, $vmid, $deviceid, $drive, $src_snap, $target_snap) = @_;
> +
> +    my $volid = $drive->{file};
> +
> +    print "block-commit $src_snap to base:$target_snap\n";
> +    $src_snap = undef if $src_snap && $src_snap eq 'current';
> +
> +    my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_file_blockdev, $target_snap);
> +
> +    my $src_file_blockdev = generate_file_blockdev($storecfg, $drive, $src_snap);
> +    my $src_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $src_file_blockdev, $src_snap);
> +
> +    my $job_id = "commit-$deviceid";
> +    my $jobs = {};
> +    my $opts = { 'job-id' => $job_id, device => $deviceid };
> +
> +    my $complete = undef;
> +    if ($src_snap) {
> +	$complete = 'auto';
> +	$opts->{'top-node'} = $src_fmt_blockdev->{'node-name'};
> +	$opts->{'base-node'} = $target_fmt_blockdev->{'node-name'};
> +    } else {
> +	$complete = 'complete';
> +	$opts->{'base-node'} = $target_fmt_blockdev->{'node-name'};
> +	$opts->{replaces} = $src_fmt_blockdev->{'node-name'};
> +    }
> +
> +    mon_cmd($vmid, "block-commit", %$opts);
> +    $jobs->{$job_id} = {};
> +    qemu_drive_mirror_monitor ($vmid, undef, $jobs, $complete, 0, 'commit');
> +
> +    blockdev_delete($storecfg, $vmid, $drive, $src_file_blockdev, $src_fmt_blockdev);
> +}
> +
> +sub blockdev_stream {
> +    my ($storecfg, $vmid, $deviceid, $drive, $snap, $parent_snap, $target_snap) = @_;
> +
> +    my $volid = $drive->{file};
> +    $target_snap = undef if $target_snap eq 'current';
> +
> +    my $parent_file_blockdev = generate_file_blockdev($storecfg, $drive, $parent_snap);
> +    my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_file_blockdev, $parent_snap);
> +
> +    my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_file_blockdev, $target_snap);
> +
> +    my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap);
> +    my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_file_blockdev, $snap);
> +
> +    my $job_id = "stream-$deviceid";
> +    my $jobs = {};
> +    my $options = { 'job-id' => $job_id, device => $target_fmt_blockdev->{'node-name'} };
> +    $options->{'base-node'} = $parent_fmt_blockdev->{'node-name'};
> +    $options->{'backing-file'} = $parent_file_blockdev->{filename};
> +
> +    mon_cmd($vmid, 'block-stream', %$options);
> +    $jobs->{$job_id} = {};
> +    qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'stream');
> +
> +    blockdev_delete($storecfg, $vmid, $drive, $snap_file_blockdev, $snap_fmt_blockdev);
> +}
> +
>  sub qemu_volume_snapshot_delete {
> -    my ($vmid, $storecfg, $volid, $snap) = @_;
> +    my ($vmid, $storecfg, $drive, $snap) = @_;
>  
> +    my $volid = $drive->{file};
>      my $running = check_running($vmid);
>      my $attached_deviceid;
>  
> @@ -4474,13 +4654,35 @@ sub qemu_volume_snapshot_delete {
>  	});
>      }
>  
> -    if ($attached_deviceid && do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid)) {
> -	mon_cmd(
> -	    $vmid,
> -	    'blockdev-snapshot-delete-internal-sync',
> -	    device => $attached_deviceid,
> -	    name => $snap,
> -	);
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid) if $running;
> +    if ($attached_deviceid && $do_snapshots_with_qemu) {
> +
> +	if ($do_snapshots_with_qemu == 2) {
> +
> +	    my $path = PVE::Storage::path($storecfg, $volid);
> +	    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> +	    my $parentsnap = $snapshots->{$snap}->{parent};
> +	    my $childsnap = $snapshots->{$snap}->{child};
> +
> +	    # if we delete the first snasphot, we commit because the first snapshot original base image, it should be big.
> +            # improve-me: if firstsnap > child : commit, if firstsnap < child do a stream.
> +	    if(!$parentsnap) {
> +		print"delete first snapshot $snap\n";
> +		blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $childsnap, $snap);
> +		blockdev_rename($storecfg, $vmid, $attached_deviceid, $drive, $snap, $childsnap, $snapshots->{$childsnap}->{child});
> +	    } else {
> +		#intermediate snapshot, we always stream the snapshot to child snapshot
> +		print"stream intermediate snapshot $snap to $childsnap\n";
> +		blockdev_stream($storecfg, $vmid, $attached_deviceid, $drive, $snap, $parentsnap, $childsnap);
> +	    }
> +	} else {
> +	    mon_cmd(
> +	        $vmid,
> +		'blockdev-snapshot-delete-internal-sync',
> +		device => $attached_deviceid,
> +		name => $snap,
> +	    );
> +	}
>      } else {
>  	PVE::Storage::volume_snapshot_delete(
>  	    $storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
> diff --git a/PVE/QemuServer/Drive.pm b/PVE/QemuServer/Drive.pm
> index 51513546..7ba401bd 100644
> --- a/PVE/QemuServer/Drive.pm
> +++ b/PVE/QemuServer/Drive.pm
> @@ -1117,6 +1117,8 @@ sub print_drive_throttle_group {
>  sub generate_file_blockdev {
>      my ($storecfg, $drive, $snap, $nodename) = @_;
>  
> +    $snap = undef if $snap && $snap eq 'current';
> +
>      my $volid = $drive->{file};
>      my $blockdev = {};
>  
> @@ -1260,6 +1262,8 @@ sub do_snapshots_with_qemu {
>  sub generate_format_blockdev {
>      my ($storecfg, $drive, $file, $snap, $nodename) = @_;
>  
> +    $snap = undef if $snap && $snap eq 'current';
> +
>      my $volid = $drive->{file};
>      die "format_blockdev can't be used for nbd" if $volid =~ /^nbd:/;
>  
> -- 
> 2.39.5


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [pve-devel] [PATCH v4 qemu-server 10/11] blockdev: add backing_chain support
  2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
@ 2025-04-02  8:10   ` Fabian Grünbichler
  0 siblings, 0 replies; 34+ messages in thread
From: Fabian Grünbichler @ 2025-04-02  8:10 UTC (permalink / raw)
  To: Proxmox VE development discussion


> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 11.03.2025 11:29 CET:

> We need to define name-nodes for all backing chain images,
> to be able to live rename them with blockdev-reopen
> 
> For linked clone, we don't need to definebase image(s) chain.
> They are auto added with #block nodename.
> 
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
>  PVE/QemuServer.pm       | 26 ++----------------
>  PVE/QemuServer/Drive.pm | 60 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 62 insertions(+), 24 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index d6aa5730..60481acc 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -54,7 +54,7 @@ use PVE::QemuServer::Helpers qw(config_aware_timeout min_version kvm_user_versio
>  use PVE::QemuServer::Cloudinit;
>  use PVE::QemuServer::CGroup;
>  use PVE::QemuServer::CPUConfig qw(print_cpu_device get_cpu_options get_cpu_bitness is_native_arch get_amd_sev_object);
> -use PVE::QemuServer::Drive qw(is_valid_drivename checked_volume_format drive_is_cloudinit drive_is_cdrom drive_is_read_only parse_drive print_drive print_drive_throttle_group generate_drive_blockdev);
> +use PVE::QemuServer::Drive qw(is_valid_drivename checked_volume_format drive_is_cloudinit drive_is_cdrom drive_is_read_only parse_drive print_drive print_drive_throttle_group generate_drive_blockdev do_snapshots_with_qemu);
>  use PVE::QemuServer::Machine;
>  use PVE::QemuServer::Memory qw(get_current_memory);
>  use PVE::QemuServer::MetaInfo;
> @@ -3765,6 +3765,7 @@ sub config_to_command {
>  	# extra protection for templates, but SATA and IDE don't support it..
>  	$drive->{ro} = 1 if drive_is_read_only($conf, $drive);
>  	my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive, $live_blockdev_name);
> +	#FIXME: verify if external snapshot backing chain is matching config

this is rather important (also checking for loops?)
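
a rough sketch of what such a check could look like, assuming the hash returned by volume_snapshot_info is keyed by snapshot name, has a 'current' entry and links entries via 'parent' (that layout is taken from how the patches use it; the helper name is made up):

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the patch: walk the 'parent' links
# starting at 'current' and die on a loop or a dangling parent reference.
sub assert_snapshot_chain_sane {
    my ($snapshots) = @_;

    my $seen = {};
    my $chain = [];
    my $id = 'current';
    while (defined($id)) {
        die "snapshot chain contains a loop at '$id'\n" if $seen->{$id};
        die "snapshot '$id' is referenced but does not exist\n" if !$snapshots->{$id};
        $seen->{$id} = 1;
        push @$chain, $id;
        $id = $snapshots->{$id}->{parent};
    }
    return $chain; # 'current' first, oldest snapshot last
}
```

the returned list could then be compared against the snapshot order recorded in the VM config before any -blockdev argument is generated.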

>  	push @$devices, '-blockdev', JSON->new->canonical->allow_nonref->encode($blockdev) if $blockdev;
>  	push @$devices, '-device', print_drivedevice_full(
>  	    $storecfg, $conf, $vmid, $drive, $bridges, $arch, $machine_type);
> @@ -7559,29 +7560,6 @@ sub foreach_storage_used_by_vm {
>      }
>  }
>  
> -my $qemu_snap_storage = {
> -    rbd => 1,
> -};
> -sub do_snapshots_with_qemu {
> -    my ($storecfg, $volid, $deviceid) = @_;
> -
> -    return if $deviceid =~ m/tpmstate0/;
> -
> -    my $storage_name = PVE::Storage::parse_volume_id($volid);
> -    my $scfg = $storecfg->{ids}->{$storage_name};
> -    die "could not find storage '$storage_name'\n" if !defined($scfg);
> -
> -    if ($qemu_snap_storage->{$scfg->{type}} && !$scfg->{krbd}){
> -	return 1;
> -    }
> -
> -    if ($volid =~ m/\.(qcow2|qed)$/){
> -	return 1;
> -    }
> -
> -    return;
> -}
> -
>  sub qga_check_running {
>      my ($vmid, $nowarn) = @_;
>  
> diff --git a/PVE/QemuServer/Drive.pm b/PVE/QemuServer/Drive.pm
> index 5b281616..51513546 100644
> --- a/PVE/QemuServer/Drive.pm
> +++ b/PVE/QemuServer/Drive.pm
> @@ -18,6 +18,7 @@ our @EXPORT_OK = qw(
>  is_valid_drivename
>  checked_parse_volname
>  checked_volume_format
> +do_snapshots_with_qemu
>  drive_is_cloudinit
>  drive_is_cdrom
>  drive_is_read_only
> @@ -1230,6 +1231,32 @@ sub generate_file_blockdev {
>      return $blockdev;
>  }
>  
> +my $qemu_snap_storage = {
> +    rbd => 1,
> +};
> +
> +sub do_snapshots_with_qemu {
> +    my ($storecfg, $volid, $deviceid) = @_;
> +
> +    return if $deviceid =~ m/tpmstate0/;
> +
> +    my $storage_name = PVE::Storage::parse_volume_id($volid);
> +    my $scfg = $storecfg->{ids}->{$storage_name};
> +    die "could not find storage '$storage_name'\n" if !defined($scfg);
> +
> +    if ($qemu_snap_storage->{$scfg->{type}} && !$scfg->{krbd}){
> +        return 1;
> +    }
> +
> +    return 2 if $scfg->{snapext} || $scfg->{type} eq 'lvm' && $volid =~ m/\.(qcow2)/;
> +
> +    if ($volid =~ m/\.(qcow2|qed)$/){
> +        return 1;
> +    }

I wonder whether we want to delegate this to the storage plugin via volume_has_feature? we already tell the plugin there which volname and whether the guest is running or not.. and for example, the base Plugin will return 1 when deleting a snapshot if the guest is running, or when resizing a volume, the RBDPlugin will return 1 when resizing if librbd is used and the guest is running..

might make more sense to have a good interface and give control to the plugin rather than special casing this here and excluding external plugins?
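
to illustrate the idea only: the sketch below lets each plugin class decide how a snapshot should be taken. the package names, the snapshot_mode method and its return values are all hypothetical, this is not the existing volume_has_feature signature, just the shape of the delegation:

```perl
use strict;
use warnings;

# Hypothetical base plugin: internal qcow2/qed snapshots via QEMU while the
# guest is running, everything else is handled by the storage layer.
package Sketch::Plugin::Base;

sub snapshot_mode {
    my ($class, $scfg, $volname, $running) = @_;
    return 'storage' if !$running;
    return 'qemu-internal' if $volname =~ m/\.(qcow2|qed)$/;
    return 'storage';
}

# Hypothetical plugin that opts into external qcow2 snapshots on its own,
# without qemu-server knowing about the 'snapext' property.
package Sketch::Plugin::Snapext;
use parent -norequire, 'Sketch::Plugin::Base';

sub snapshot_mode {
    my ($class, $scfg, $volname, $running) = @_;
    return 'qemu-external'
        if $running && $scfg->{snapext} && $volname =~ m/\.qcow2$/;
    return $class->SUPER::snapshot_mode($scfg, $volname, $running);
}

package main;

# Hypothetical type-to-plugin registry, standing in for the real lookup.
my $plugins = { dir => 'Sketch::Plugin::Snapext' };

sub snapshot_mode_for {
    my ($scfg, $volname, $running) = @_;
    my $plugin = $plugins->{ $scfg->{type} } // 'Sketch::Plugin::Base';
    return $plugin->snapshot_mode($scfg, $volname, $running);
}
```

with something like this, an external plugin could return 'qemu-external' for its own volumes and qemu-server would not need a hard-coded list of storage types.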

> +
> +    return;
> +}
> +
>  sub generate_format_blockdev {
>      my ($storecfg, $drive, $file, $snap, $nodename) = @_;
>  
> @@ -1272,6 +1299,37 @@ sub generate_format_blockdev {
>      return $blockdev;
>  }
>  
> +sub generate_backing_blockdev {
> +    my ($storecfg, $snapshots, $deviceid, $drive, $snap_id) = @_;

$deviceid is only used to pass it to the recursive invocation below..

> +
> +    my $snapshot = $snapshots->{$snap_id};
> +    my $parentid = $snapshot->{parent};
> +
> +    my $volid = $drive->{file};
> +
> +    my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap_id);
> +    $snap_file_blockdev->{filename} = $snapshot->{file};
> +    $drive->{ro} = 1;
> +    my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_file_blockdev, $snap_id);
> +    $snap_fmt_blockdev->{backing} = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
> +    return $snap_fmt_blockdev;
> +}
> +
> +sub generate_backing_chain_blockdev {
> +    my ($storecfg, $deviceid, $drive) = @_;
> +
> +    my $volid = $drive->{file};
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid);
> +    return if !$do_snapshots_with_qemu || $do_snapshots_with_qemu != 2;
> +
> +    my $chain_blockdev = undef;
> +    PVE::Storage::activate_volumes($storecfg, [$volid]);
> +    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> +    my $parentid = $snapshots->{'current'}->{parent};
> +    $chain_blockdev = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
> +    return $chain_blockdev;
> +}
> +
>  sub generate_drive_blockdev {
>      my ($storecfg, $vmid, $drive, $live_restore_name) = @_;
>  
> @@ -1293,6 +1351,8 @@ sub generate_drive_blockdev {
>  
>      my $blockdev_file = generate_file_blockdev($storecfg, $drive);
>      my $blockdev_format = generate_format_blockdev($storecfg, $drive, $blockdev_file);
> +    my $backing_chain  = generate_backing_chain_blockdev($storecfg, "drive-$drive_id", $drive);
> +    $blockdev_format->{backing} = $backing_chain if $backing_chain;
>  
>      my $blockdev_live_restore = undef;
>      if ($live_restore_name) {
> -- 
> 2.39.5




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [pve-devel] [PATCH v4 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
  2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
@ 2025-04-02  8:10   ` Fabian Grünbichler
  0 siblings, 0 replies; 34+ messages in thread
From: Fabian Grünbichler @ 2025-04-02  8:10 UTC (permalink / raw)
  To: Proxmox VE development discussion


> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 11.03.2025 11:28 CET:
> Look at qdev value, as cdrom drives can be empty
> without any inserted media
> 
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
>  PVE/QemuServer.pm | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index faa17edb..5ccc026a 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -3937,11 +3937,12 @@ sub vm_devices_list {
>  	$devices_to_check = $to_check;
>      }
>  
> +    #block devices need to be queried at qdev level, as a device
> +    #don't always have a blockdev drive media attached (cdrom for example)
>      my $resblock = mon_cmd($vmid, 'query-block');
> -    foreach my $block (@$resblock) {
> -	if($block->{device} =~ m/^drive-(\S+)/){
> -		$devices->{$1} = 1;
> -	}
> +    $resblock = { map { $_->{qdev} => $_ } $resblock->@* };

here you map the full thing

> +    foreach my $blockid (keys %$resblock) {
> +	$devices->{$blockid} = 1;

but you are only interested in the IDs anyway?

so you could just do it like above for PCI devices:


foreach my $block (@$resblock) {
    my $qdev_id = $block->{qdev};
    $devices->{$qdev_id} = 1 if $qdev_id;
}

that way you don't need to loop twice ;)

this now returns more devices than before/different values for some cases? e.g., efidisk0 has a really different qdev ID compared to its device name, the UEFI firmware (pflash0) is also contained even though it is not a drive, and that's just for the first test VM I checked ;)

would it maybe make more sense to special-case the CD drive(s) here?

>      }
>  
>      my $resmice = mon_cmd($vmid, 'query-mice');
> -- 
> 2.39.5




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support
       [not found]     ` <0e2cd118f35aa8d4c410d362fea1a1b366df1570.camel@groupe-cyllene.com>
@ 2025-04-02  8:28       ` Fabian Grünbichler
  2025-04-03  4:27         ` DERUMIER, Alexandre via pve-devel
  0 siblings, 1 reply; 34+ messages in thread
From: Fabian Grünbichler @ 2025-04-02  8:28 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel


> DERUMIER, Alexandre <alexandre.derumier@groupe-cyllene.com> wrote on 02.04.2025 10:01 CEST:
> 
>  
> >  
> > @@ -716,7 +721,11 @@ sub filesystem_path {
> >  
> >      my $dir = $class->get_subdir($scfg, $vtype);
> >  
> > -    $dir .= "/$vmid" if $vtype eq 'images';
> > +    if ($scfg->{snapext} && $snapname) {
> > + $name = $class->get_snap_volname($volname, $snapname);
> > +    } else {
> > + $dir .= "/$vmid" if $vtype eq 'images';
> > +    }
> 
> >>this is a bit weird, as it mixes volnames (with the `$vmid/` prefix)
> >>and names (without), it's only called twice in this patch, and this
> >>here already has $volname parsed, so could we maybe let
> >>get_snap_volname take and return the $name part without the dir?
> 
> ok!
> 
> >  
> >      my $path = "$dir/$name";
> >  
> > @@ -873,7 +882,7 @@ sub clone_image {
> >  }
> >  
> >  sub alloc_image {
> > -    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
> > +    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size,
> > $backing) = @_;
> 
> >>this extends the storage API, so it should actually do that.. and
> >>probably $backing should not be an arbitrary path, but something that
> >>is resolved locally?
> 
> I'll send the $snapname as param instead

see my comments on the qemu-server side, I think it would be even better if we could just get rid of extending alloc_image like this, and instead always go via volume_snapshot..

> 
> 
> > +
> > +    if($backing) {
> > + push @$cmd, '-b', $backing, '-F', 'qcow2';
> > + push @$options, 'extended_l2=on','cluster_size=128k';
> > +    };
> > +    push @$options, preallocation_cmd_option($scfg, $fmt);
> > +    push @$cmd, '-o', join(',', @$options) if @$options > 0;
> > +    push @$cmd, '-f', $fmt, $path;
> > +    push @$cmd, "${size}K" if !$backing;
> 
> >>is this because it will automatically take the size from the backing
> >>image?
> 
> Yes, it take size from the backing.  (and refuse if you specify size
> param at the same time than backing file)

we pass a size and a backing file in qemu-server, so I guess that is wrong there? ;)

> 
> 
> > + my $path = $class->path($scfg, $volname, $storeid);
> > + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> > + #rename current volume to snap volume
> > + die "snapshot volume $snappath already exist\n" if -e $snappath;
> > + rename($path, $snappath) if -e $path;
> 
> >>this is still looking weird.. I don't think it makes sense interface
> >>wise to allow snapshotting a volume that doesn't even exist..
> 
> This is more by security, I'm still unsure of the behaviour if you have
> multiple disks, and that snapshot is dying in the middle. (1 disk
> rename, the other not renamed). 

I am not sure what we gain by this other than papering over issues.

for multi-disks what we'd need to do is:
- loop over volumes
-- take a snapshot of volume
-- as part of error handling directly in taking a snapshot, undo all steps done until the point of error
- if an error occurs, loop over all already correctly snapshotted volumes
-- undo snapshot of each such volume
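
the outer part of that could look roughly like the following sketch. $take and $undo are hypothetical callbacks standing in for the per-volume snapshot and its reversal, and $take is assumed to clean up its own partial work before dying:

```perl
use strict;
use warnings;

# Sketch only: snapshot every volume, and if one of them fails, undo the
# snapshots that were already completed (most recent first).
sub snapshot_all_or_nothing {
    my ($volumes, $take, $undo) = @_;

    my @done;
    for my $vol (@$volumes) {
        eval { $take->($vol) };
        if (my $err = $@) {
            for my $prev (reverse @done) {
                eval { $undo->($prev) };
                # undoing can fail as well, e.g. if the storage is broken
                warn "rollback of '$prev' failed, manual cleanup required - $@" if $@;
            }
            die "snapshot of '$vol' failed - $err";
        }
        push @done, $vol;
    }
    return;
}
```

the rollback is best-effort on purpose: a failing undo only warns, so the original error is still the one that gets propagated.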

> 
> > +
> > + my ($vtype, $name, $vmid, undef, undef, $isBase, $format) =
> > +     $class->parse_volname($volname);
> > +
> > + $class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $name, undef,
> > $snappath);
> > + if ($@) {
> > +     eval { $class->free_image($storeid, $scfg, $volname, 0) };
> > +     warn $@ if $@;
> 
> >>missing cleanup - this should undo the rename from above
> 
> Do you have an idea how to do it with mutiple disk ?  
> (revert renaming of other disks elsewhere in the code? just keep them
> like this)? 

see above, the volume_snapshot task should clean up what it did up to the point of error, and its caller is responsible for undoing already completed volume_snapshot calls. obviously this won't always work (for example, if the snapshot fails because the storage has a problem, it's very likely that undoing things is not possible either and manual cleanup will be required)

> 
> 
> >  
> > @@ -1187,9 +1251,15 @@ sub volume_snapshot_rollback {
> >  
> >      my $path = $class->filesystem_path($scfg, $volname);
> >  
> > -    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> > -
> > -    run_command($cmd);
> > +    if ($scfg->{snapext}) {
> > + #simply delete the current snapshot and recreate it
> > + my $path = $class->filesystem_path($scfg, $volname);
> > + unlink($path);
> > + $class->volume_snapshot($scfg, $storeid, $volname, $snap);
> 
> >>instead of volume_snapshot, this could simply call alloc_image with
> >>the backing file? then volume_snapshot could always rename and always
> >>cleanup properly..
> 
> Yes , better like this indeed

I think alloc_image should be split into an internal helper that takes the backing file parameter, and the existing alloc_image that shouldn't get that parameter though
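
something along these lines, as a sketch: the helper names are made up, the qemu-img arguments mirror the quoted hunk above (the preallocation option is left out for brevity), and only the command line is built here so it can be looked at without running anything:

```perl
use strict;
use warnings;

# Hypothetical internal helper: the only place that knows about backing files.
sub qemu_img_create_cmd {
    my ($path, $fmt, $size_kb, $backing) = @_;

    my $cmd = ['/usr/bin/qemu-img', 'create'];
    my $options = [];
    if (defined($backing)) {
        push @$cmd, '-b', $backing, '-F', 'qcow2';
        push @$options, 'extended_l2=on', 'cluster_size=128k';
    }
    push @$cmd, '-o', join(',', @$options) if @$options;
    push @$cmd, '-f', $fmt, $path;
    # as in the quoted hunk, the size is only passed without a backing file
    push @$cmd, "${size_kb}K" if !defined($backing);
    return $cmd;
}

# Hypothetical public entry point: same parameters as before, no $backing.
sub alloc_image_cmd {
    my ($path, $fmt, $size_kb) = @_;
    die "size is required\n" if !$size_kb;
    return qemu_img_create_cmd($path, $fmt, $size_kb, undef);
}
```

volume_snapshot and volume_snapshot_rollback would then call the internal helper with the snapshot path as backing file, and the public alloc_image signature stays untouched.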

> 
> > 
> > + } else {
> > +     #we rebase the child image on the parent as new backing image
> 
> >>should we extend this to make it clear what this means? it means
> >>copying any parts of $snap that are not in $parent and not yet
> >>overwritten by $child into $child, right?
> >>
> yes,
> intermediate snapshot: (rebase)
> -------------------------------
> snap1 (10G)-->snap2 (1G)----current(1G)
> ---> delete snap2
> 
> rebase current on snap1
> 
> snap1(10G)----->current(2G)
> 
> 
> or
> 
> snap1 (10G)-->snap2 (1G)----> snap3 (1G)--->current(1G)
> ---> delete snap2
> 
> rebase snap3 on snap1
> 
> snap1 (10G)---> snap3 (2G)--->current(1G)
> 
> 
> 
> >>so how expensive this is depends on:
> >>- how many changes are between $parent and $snap (increases cost)
> >>- how many of those are overwritten by changes between $snap and
> >>$child (decreases cost)
> 
> 
> but yes, the final size of the child is not 100% the additional content
> of the deleted snapshot, if some blocks has already been overwriten in
> the child
> 
> 
> so, we could call it: "merge diff content of the delete snap to the
> childsnap"
> 
> 
> 
> 
> 
> >>+sub get_snap_volname {
> >>+    my ($class, $volname, $snapname) = @_;
> >>+
> >>+    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase,
> >>$format) = $class->parse_volname($volname);
> +    $name = !$snapname || $snapname eq 'current' ? $volname :
> "$vmid/snap-$snapname-$name";
> 
> >>other way round would be better to group by volume first IMHO
> ($vmid/snap-$name-$snapname), as this is similar to how we encode
> >>snapshots often on the storage level (volume@snap). we also need to
> >>have some delimiter between snapshot and volume name that is not
> >>allowed in either (hard for volname since basically everything but
> >>'/' goes, but snapshots have a restricted character set (configid,
> >>which means alphanumeric, hyphen and underscore), so we could use
> >>something like '.' as delimiter? or we switch to directories and do
> >>$vmid/snap/$snap/$name?)
> 
> Personnaly, I prefer a '.' delimiter than sub directory. (better to
> have the info in the filename itself)

we can still bikeshed this later once other issues are ironed out, I just wanted to note it - it's very unfortunate that volume names currently are so unrestricted, but we have to live with it for now. I just realized that even snap-XXX.qcow2 would be recognized/returned as an image, so we need something else or filter them out anyway.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support
  2025-04-02  8:28       ` Fabian Grünbichler
@ 2025-04-03  4:27         ` DERUMIER, Alexandre via pve-devel
  0 siblings, 0 replies; 34+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-04-03  4:27 UTC (permalink / raw)
  To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre

[-- Attachment #1: Type: message/rfc822, Size: 15495 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support
Date: Thu, 3 Apr 2025 04:27:59 +0000
Message-ID: <77db5694d7964d03a0673e3b8764a54a2e5c4147.camel@groupe-cyllene.com>

> >
> > > <this extends the storage API, so it should actually do that..
> > > and
> > > probably $backing should not be an arbitrary path, but something
> > > that
> > > is resolved locally?
> 
> I'll send the $snapname as param instead

>>see my comments on the qemu-server side, I think it would be even
>>better if we could just get rid of extending alloc_image like this,
>>and instead always go via volume_snapshot..

> 
> > > is this because it will automatically take the size from the
> > > backing
> > > image?
> 
> Yes, it take size from the backing.  (and refuse if you specify size
> param at the same time than backing file)

>>we pass a size and a backing file in qemu-server, so I guess that is
>>wrong there? ;)



About this part: for the LVM plugin, both size and backing are used.
The size is used to allocate the LVM device (but not for the qcow2 format
part, where the backing file is used).


alloc_image is used by qemu-server to allocate the LVM volume, without
doing the volume rename part from volume_snapshot (because the rename must
be done in qemu-server to be able to rename the volume online).


So if you want to use volume_snapshot from qemu-server, we need an
option to tell it not to rename, because that has already been done.
(Previously I was using it, and that is why I had the (if -e
volume_...) check, to continue if the volume had already been renamed by
qemu-server.)





> 


> This is more by security, I'm still unsure of the behaviour if you
> have
> multiple disks, and that snapshot is dying in the middle. (1 disk
> rename, the other not renamed). 

>>I am not sure what we gain by this other than papering over issues.
>>
>>for multi-disks what we'd need to do is:
>>- loop over volumes
>>-- take a snapshot of volume
>>-- as part of error handling directly in taking a snapshot, undo all
>>steps done until the point of error
>>- if an error occurs, loop over all already correctly snapshotted
>>volumes
>>-- undo snapshot of each such volume

ok, got it! I'll look into it for the next patch version





^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support
  2025-04-02  8:10   ` Fabian Grünbichler
@ 2025-04-03  4:51     ` DERUMIER, Alexandre via pve-devel
  2025-04-04 11:31     ` DERUMIER, Alexandre via pve-devel
       [not found]     ` <3e516016a970e52e5a1014dbcd6cf9507581da74.camel@groupe-cyllene.com>
  2 siblings, 0 replies; 34+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-04-03  4:51 UTC (permalink / raw)
  To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre

[-- Attachment #1: Type: message/rfc822, Size: 36584 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support
Date: Thu, 3 Apr 2025 04:51:33 +0000
Message-ID: <c18b5150ae115610b545cf41ad642731783ba130.camel@groupe-cyllene.com>

>>- we should probably move the decision whether a snapshot is done on
>>the storage layer or by qemu into the control of the storage plugin,
>>especially since we are currently cleaning that API up to allow
>>easier implementation of external plugins

Agreed, I don't like having the supported storages hard-coded
either.

>>- if we do that, we can also make "uses external qcow2 snapshots" a
>>property of the storage plugin+config to replace hard-coded checks for
>>the snapext property or lvm+qcow2
>>- there are a few operations here that should not call directly into
>>the storage plugin code or do equivalent actions, but should rather get
>>a proper interface in that storage plugin API


>>the first one is the renaming of a blockdev while it is used, which
>>is currently done like this:
>>-- "link" snapshot path to make it available under old and new name
>>-- handle blockdev additions/reopening/backing-file updates/deletions
>>on the qemu layer
>>-- remove old snapshot path link
>>-- if LVM, rename actual volume (for non-LVM, linking followed by
>>unlinking the source is effectively a rename already)

>>I wonder whether that couldn't be made more straight-forward by doing
>>-- rename snapshot volume/image (qemu must already have the old name
>>open anyway and should be able to continue using it)
>>-- do blockdev additions/reopening/backing-file updates/deletions on
>>the qemu layer

>>or is there an issue/check in qemu somewhere that prevents this
>>approach? 

I'll run tests to verify.

>>if not, we could just introduce a "volume_snapshot_rename" or extend
>>rename_volume with a snapshot parameter..

ok!


>>the second thing that happens is deleting a snapshot volume/path,
>>without deleting the whole snapshot.. that one we could easily
>>support by extending volume_snapshot_delete by extending the $running
>>parameter (e.g., passing "2") or adding a new one to signify that all
>>the housekeeping was already done, and just the actual snapshot
>>volume should be deleted. this shouldn't be an issue provided all
>>such calls are guarded by first checking that we are using external
>>snapshots..
ok!

> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on
> 11.03.2025 11:29 CET:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-
> cyllene.com>
> ---
>  PVE/QemuConfig.pm       |   4 +-
>  PVE/QemuServer.pm       | 226 +++++++++++++++++++++++++++++++++++++-
> --
>  PVE/QemuServer/Drive.pm |   4 +
>  3 files changed, 220 insertions(+), 14 deletions(-)
> 
> diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
> index b60cc398..2b3acb15 100644
> --- a/PVE/QemuConfig.pm
> +++ b/PVE/QemuConfig.pm
> @@ -377,7 +377,7 @@ sub __snapshot_create_vol_snapshot {
>  
>      print "snapshotting '$device' ($drive->{file})\n";
>  
> -    PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg,
> $volid, $snapname);
> +    PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg,
> $drive, $snapname);
>  }
>  
>  sub __snapshot_delete_remove_drive {
> @@ -414,7 +414,7 @@ sub __snapshot_delete_vol_snapshot {
>      my $storecfg = PVE::Storage::config();
>      my $volid = $drive->{file};
>  
> -    PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg,
> $volid, $snapname);
> +    PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg,
> $drive, $snapname);
>  
>      push @$unused, $volid;
>  }
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 60481acc..6ce3e9c6 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4449,20 +4449,200 @@ sub qemu_block_resize {
>  }
>  
>  sub qemu_volume_snapshot {
> -    my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
> +    my ($vmid, $deviceid, $storecfg, $drive, $snap) = @_;
>  
> +    my $volid = $drive->{file};
>      my $running = check_running($vmid);
> -
> -    if ($running && do_snapshots_with_qemu($storecfg, $volid,
> $deviceid)) {
> - mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device =>
> $deviceid, name => $snap);
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg,
> $volid, $deviceid) if $running;

forbidden syntax

> +    if ($do_snapshots_with_qemu) {
> + if($do_snapshots_with_qemu == 2) {
> +     my $snapshots = PVE::Storage::volume_snapshot_info($storecfg,
> $volid);
> +     my $parent_snap = $snapshots->{'current'}->{parent};
> +     my $size = PVE::Storage::volume_size_info($storecfg, $volid,
> 5);
> +     blockdev_rename($storecfg, $vmid, $deviceid, $drive, 'current',
> $snap, $parent_snap);
> +     blockdev_external_snapshot($storecfg, $vmid, $deviceid, $drive,
> $snap, $size);
> + } else {
> +     mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device =>
> $deviceid, name => $snap);
> + }
>      } else {
>   PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
>      }
>  }
>  
> +sub blockdev_external_snapshot {
> +    my ($storecfg, $vmid, $deviceid, $drive, $snap, $size) = @_;
> +
> +    my $volid = $drive->{file};
> +
> +    #be sure to add drive in write mode
> +    delete($drive->{ro});

why?

> +
> +    my $new_file_blockdev = generate_file_blockdev($storecfg,
> $drive);
> +    my $new_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $new_file_blockdev);
> +
> +    my $snap_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $snap);
> +    my $snap_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $snap_file_blockdev, $snap);
> +
> +    #preallocate add a new current file with reference to backing-file
> +    my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid);
> +    my $name = (PVE::Storage::parse_volname($storecfg, $volid))[1];
> +    PVE::Storage::vdisk_alloc($storecfg, $storeid, $vmid, 'qcow2',
> $name, $size/1024, $snap_file_blockdev->{filename});

if we instead extend volume_snapshot similarly to what I describe up
top (adding a parameter that renaming was already done), we don't need
to extend vdisk_alloc's interface like this.. or maybe we could even
combine blockdev_rename and blockdev_external_snapshot, to just call
PVE::Storage::volume_snapshot to do rename+alloc, and then do the
blockdev dance? in any case, this here would be the *only* external
caller of vdisk_alloc with a backing file, so I don't think this is the
right interface..

> +
> +    #backing need to be forced to undef in blockdev, to avoid reopen
> of backing-file on blockdev-add
> +    $new_fmt_blockdev->{backing} = undef;
> +
> +    PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add',
> %$new_fmt_blockdev);
> +
> +    mon_cmd($vmid, 'blockdev-snapshot', node => $snap_fmt_blockdev->{'node-name'}, overlay => $new_fmt_blockdev->{'node-name'});
> +}
> +
> +sub blockdev_delete {
> +    my ($storecfg, $vmid, $drive, $file_blockdev, $fmt_blockdev) =
> @_;
> +
> +    #add eval as reopen is auto removing the old nodename
> automatically only if it was created at vm start in command line
> argument
> +    eval { mon_cmd($vmid, 'blockdev-del', 'node-name' =>
> $file_blockdev->{'node-name'}) };
> +    eval { mon_cmd($vmid, 'blockdev-del', 'node-name' =>
> $fmt_blockdev->{'node-name'}) };
> +
> +    #delete the file (don't use vdisk_free as we don't want to
> delete all snapshot chain)
> +    print"delete old $file_blockdev->{filename}\n";
> +
> +    my $storage_name = PVE::Storage::parse_volume_id($drive->{file});
> +    my $scfg = $storecfg->{ids}->{$storage_name};
> +    if ($scfg->{type} eq 'lvm') {
> + PVE::Storage::LVMPlugin::lvremove($file_blockdev->{filename});
> +    } else {
> + unlink($file_blockdev->{filename});
> +    }

this really needs to be handled in the storage layer
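To illustrate what "handled in the storage layer" could mean, here is a rough sketch in which the caller asks a single storage entry point to free a snapshot image and each plugin decides how. All package, function, and method names below are hypothetical and are not the PVE::Storage API; the methods only return a description of the action instead of touching any volume.

```perl
use strict;
use warnings;

# Hypothetical plugin classes; names and methods are illustrative only.
package Sketch::DirPlugin;
sub free_snapshot_image { my ($class, $path) = @_; return "unlink $path"; }

package Sketch::LvmPlugin;
sub free_snapshot_image { my ($class, $path) = @_; return "lvremove $path"; }

package main;

my %plugin_for = (dir => 'Sketch::DirPlugin', lvm => 'Sketch::LvmPlugin');

# Single storage-layer entry point; callers never inspect the storage type.
sub storage_free_snapshot_image {
    my ($scfg, $path) = @_;
    my $plugin = $plugin_for{ $scfg->{type} }
        or die "unsupported storage type '$scfg->{type}'\n";
    return $plugin->free_snapshot_image($path);
}

print storage_free_snapshot_image({ type => 'lvm' }, '/dev/vg/snap-vm-100-disk-0'), "\n";
```

With this shape, the qemu-side code would no longer need an "if type eq 'lvm'" branch of its own.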

> +}
> +
> +sub blockdev_rename {
> +    my ($storecfg, $vmid, $deviceid, $drive, $src_snap,
> $target_snap, $parent_snap) = @_;
> +
> +    print "rename $src_snap to $target_snap\n";
> +
> +    my $volid = $drive->{file};
> +
> +    my $src_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $src_snap);
> +    my $src_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $src_file_blockdev, $src_snap);
> +    my $target_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $target_file_blockdev, $target_snap);
> +
> +    #create a hardlink
> +    link($src_file_blockdev->{filename}, $target_file_blockdev->{filename});

this really needs to be handled in the storage layer

> +
> +    if($target_snap eq 'current' || $src_snap eq 'current') {
> + #rename from|to current
> +
> + #add backing to target
> + if ($parent_snap) {
> +     my $parent_fmt_nodename = encode_nodename('fmt', $volid,
> $parent_snap);
> +     $target_fmt_blockdev->{backing} = $parent_fmt_nodename;
> + }
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add',
> %$target_fmt_blockdev);
> +
> + #reopen the current throttlefilter nodename with the target fmt
> nodename
> + my $drive_blockdev = generate_drive_blockdev($storecfg, $vmid,
> $drive);
> + delete $drive_blockdev->{file};
> + $drive_blockdev->{file} = $target_fmt_blockdev->{'node-name'};

these two lines can be a single line
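A small sketch of why the delete is redundant: assigning to a hash key replaces whatever value was stored there. The hash contents below are made up for illustration; the real structure would come from generate_drive_blockdev().

```perl
use strict;
use warnings;

# Illustrative data only, not the real blockdev options.
my $drive_blockdev      = { driver => 'throttle', file => 'old-fmt-node' };
my $target_fmt_blockdev = { 'node-name' => 'new-fmt-node' };

# Assigning replaces the existing value, so no prior delete is needed.
$drive_blockdev->{file} = $target_fmt_blockdev->{'node-name'};

print "$drive_blockdev->{file}\n";
```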

> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options
> => [$drive_blockdev]);
> +    } else {
> + #intermediate snapshot
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add',
> %$target_fmt_blockdev);
> +
> + #reopen the parent node with the new target fmt backing node
> + my $parent_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $parent_snap);
> + my $parent_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $parent_file_blockdev, $parent_snap);
> + $parent_fmt_blockdev->{backing} = $target_fmt_blockdev->{'node-name'};
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options
> => [$parent_fmt_blockdev]);
> +
> + #change backing-file in qcow2 metadatas
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'change-backing-file', device => $deviceid, 'image-node-name' => $parent_fmt_blockdev->{'node-name'}, 'backing-file' => $target_file_blockdev->{filename});
> +    }
> +
> +    # delete old file|fmt nodes
> +    # add eval as reopen is auto removing the old nodename
> automatically only if it was created at vm start in command line
> argument

ugh..

> +    eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del',
> 'node-name' => $src_file_blockdev->{'node-name'})};
> +    eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del',
> 'node-name' => $src_fmt_blockdev->{'node-name'})};
> +
> +    unlink($src_file_blockdev->{filename});

same as above, this needs to be handled in the storage layer

> +
> +    #rename underlay
> +    my $storage_name = PVE::Storage::parse_volume_id($volid);
> +    my $scfg = $storecfg->{ids}->{$storage_name};
> +    return if $scfg->{type} ne 'lvm';
> +
> +    print "rename underlay lvm volume $src_file_blockdev->{filename}
> to $target_file_blockdev->{filename}\n";
> +    PVE::Storage::LVMPlugin::lvrename(undef, $src_file_blockdev->{filename}, $target_file_blockdev->{filename});

absolute no-go, this needs to be handled in the storage layer

> +}
> +
> +sub blockdev_commit {
> +    my ($storecfg, $vmid, $deviceid, $drive, $src_snap,
> $target_snap) = @_;
> +
> +    my $volid = $drive->{file};
> +
> +    print "block-commit $src_snap to base:$target_snap\n";
> +    $src_snap = undef if $src_snap && $src_snap eq 'current';
> +
> +    my $target_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $target_file_blockdev, $target_snap);
> +
> +    my $src_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $src_snap);
> +    my $src_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $src_file_blockdev, $src_snap);
> +
> +    my $job_id = "commit-$deviceid";
> +    my $jobs = {};
> +    my $opts = { 'job-id' => $job_id, device => $deviceid };
> +
> +    my $complete = undef;
> +    if ($src_snap) {
> + $complete = 'auto';
> + $opts->{'top-node'} = $src_fmt_blockdev->{'node-name'};
> + $opts->{'base-node'} = $target_fmt_blockdev->{'node-name'};
> +    } else {
> + $complete = 'complete';
> + $opts->{'base-node'} = $target_fmt_blockdev->{'node-name'};
> + $opts->{replaces} = $src_fmt_blockdev->{'node-name'};
> +    }
> +
> +    mon_cmd($vmid, "block-commit", %$opts);
> +    $jobs->{$job_id} = {};
> +    qemu_drive_mirror_monitor ($vmid, undef, $jobs, $complete, 0,
> 'commit');
> +
> +    blockdev_delete($storecfg, $vmid, $drive, $src_file_blockdev,
> $src_fmt_blockdev);
> +}
> +
> +sub blockdev_stream {
> +    my ($storecfg, $vmid, $deviceid, $drive, $snap, $parent_snap,
> $target_snap) = @_;
> +
> +    my $volid = $drive->{file};
> +    $target_snap = undef if $target_snap eq 'current';
> +
> +    my $parent_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $parent_snap);
> +    my $parent_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $parent_file_blockdev, $parent_snap);
> +
> +    my $target_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $target_file_blockdev, $target_snap);
> +
> +    my $snap_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $snap);
> +    my $snap_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $snap_file_blockdev, $snap);
> +
> +    my $job_id = "stream-$deviceid";
> +    my $jobs = {};
> +    my $options = { 'job-id' => $job_id, device =>
> $target_fmt_blockdev->{'node-name'} };
> +    $options->{'base-node'} = $parent_fmt_blockdev->{'node-name'};
> +    $options->{'backing-file'} = $parent_file_blockdev->{filename};
> +
> +    mon_cmd($vmid, 'block-stream', %$options);
> +    $jobs->{$job_id} = {};
> +    qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0,
> 'stream');
> +
> +    blockdev_delete($storecfg, $vmid, $drive, $snap_file_blockdev,
> $snap_fmt_blockdev);
> +}
> +
>  sub qemu_volume_snapshot_delete {
> -    my ($vmid, $storecfg, $volid, $snap) = @_;
> +    my ($vmid, $storecfg, $drive, $snap) = @_;
>  
> +    my $volid = $drive->{file};
>      my $running = check_running($vmid);
>      my $attached_deviceid;
>  
> @@ -4474,13 +4654,35 @@ sub qemu_volume_snapshot_delete {
>   });
>      }
>  
> -    if ($attached_deviceid && do_snapshots_with_qemu($storecfg,
> $volid, $attached_deviceid)) {
> - mon_cmd(
> -     $vmid,
> -     'blockdev-snapshot-delete-internal-sync',
> -     device => $attached_deviceid,
> -     name => $snap,
> - );
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg,
> $volid, $attached_deviceid) if $running;
> +    if ($attached_deviceid && $do_snapshots_with_qemu) {
> +
> + if ($do_snapshots_with_qemu == 2) {
> +
> +     my $path = PVE::Storage::path($storecfg, $volid);
> +     my $snapshots = PVE::Storage::volume_snapshot_info($storecfg,
> $volid);
> +     my $parentsnap = $snapshots->{$snap}->{parent};
> +     my $childsnap = $snapshots->{$snap}->{child};
> +
> +     # if we delete the first snasphot, we commit because the first
> snapshot original base image, it should be big.
> +            # improve-me: if firstsnap > child : commit, if
> firstsnap < child do a stream.
> +     if(!$parentsnap) {
> + print"delete first snapshot $snap\n";
> + blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive,
> $childsnap, $snap);
> + blockdev_rename($storecfg, $vmid, $attached_deviceid, $drive,
> $snap, $childsnap, $snapshots->{$childsnap}->{child});
> +     } else {
> + #intermediate snapshot, we always stream the snapshot to child
> snapshot
> + print"stream intermediate snapshot $snap to $childsnap\n";
> + blockdev_stream($storecfg, $vmid, $attached_deviceid, $drive,
> $snap, $parentsnap, $childsnap);
> +     }
> + } else {
> +     mon_cmd(
> +         $vmid,
> + 'blockdev-snapshot-delete-internal-sync',
> + device => $attached_deviceid,
> + name => $snap,
> +     );
> + }
>      } else {
>   PVE::Storage::volume_snapshot_delete(
>       $storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
> diff --git a/PVE/QemuServer/Drive.pm b/PVE/QemuServer/Drive.pm
> index 51513546..7ba401bd 100644
> --- a/PVE/QemuServer/Drive.pm
> +++ b/PVE/QemuServer/Drive.pm
> @@ -1117,6 +1117,8 @@ sub print_drive_throttle_group {
>  sub generate_file_blockdev {
>      my ($storecfg, $drive, $snap, $nodename) = @_;
>  
> +    $snap = undef if $snap && $snap eq 'current';
> +
>      my $volid = $drive->{file};
>      my $blockdev = {};
>  
> @@ -1260,6 +1262,8 @@ sub do_snapshots_with_qemu {
>  sub generate_format_blockdev {
>      my ($storecfg, $drive, $file, $snap, $nodename) = @_;
>  
> +    $snap = undef if $snap && $snap eq 'current';
> +
>      my $volid = $drive->{file};
>      die "format_blockdev can't be used for nbd" if $volid =~
> /^nbd:/;
>  
> -- 
> 2.39.5



[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support
  2025-04-02  8:10   ` Fabian Grünbichler
  2025-04-03  4:51     ` DERUMIER, Alexandre via pve-devel
@ 2025-04-04 11:31     ` DERUMIER, Alexandre via pve-devel
       [not found]     ` <3e516016a970e52e5a1014dbcd6cf9507581da74.camel@groupe-cyllene.com>
  2 siblings, 0 replies; 34+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-04-04 11:31 UTC (permalink / raw)
  To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre

[-- Attachment #1: Type: message/rfc822, Size: 36754 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support
Date: Fri, 4 Apr 2025 11:31:57 +0000
Message-ID: <3e516016a970e52e5a1014dbcd6cf9507581da74.camel@groupe-cyllene.com>

Hi Fabian,

>>the first one is the renaming of a blockdev while it is used, which
>>is currently done like this:
>>-- "link" snapshot path to make it available under old and new name
>>-- handle blockdev additions/reopening/backing-file updates/deletions
>>on the qemu layer
>>-- remove old snapshot path link
>>-- if LVM, rename actual volume (for non-LVM, linking followed by
>>unlinking the source is effectively a rename already)

>>I wonder whether that couldn't be made more straight-forward by doing
>>-- rename snapshot volume/image (qemu must already have the old name
>>open anyway and should be able to continue using it)
>>-- do blockdev additions/reopening/backing-file updates/deletions on
>>the qemu layer

>>or is there an issue/check in qemu somewhere that prevents this
>>approach? if not, we could just introduce a "volume_snapshot_rename"
>>or extend rename_volume with a snapshot parameter..

I have run tests over the last 2 days, and it is indeed working fine. (I
tested with fio running during the snapshot rename/reopen, with no
problems.)

So I'm now using Storage::rename_volume with a snapshot parameter.
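As a rough illustration of why renaming first can work for file-based images: on POSIX filesystems an open file handle keeps referring to the same file after a rename. The sketch below shows only that filesystem behavior, using made-up file names in a temporary directory; it does not involve qemu, and LVM volumes are a separate case.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($old, $new) = ("$dir/current.img", "$dir/snap1.img");

# Open the image under its old name, as a running process would.
open(my $fh, '>', $old) or die "open failed: $!";
print {$fh} "before rename\n";

# Rename while the handle is still open; the handle follows the file.
rename($old, $new) or die "rename failed: $!";
print {$fh} "after rename\n";
close($fh) or die "close failed: $!";

# Both writes end up in the file under its new name.
open(my $in, '<', $new) or die "open failed: $!";
my @lines = <$in>;
close($in);
print @lines;
```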


>>the second thing that happens is deleting a snapshot volume/path,
>>without deleting the whole snapshot.. that one we could easily
>>support by extending volume_snapshot_delete by extending the $running
>>parameter (e.g., passing "2") or adding a new one to signify that all
>>the housekeeping was already done, and just the actual snapshot
>>volume should be deleted. this shouldn't be an issue provided all
>>such calls are guarded by first checking that we are using external
>>snapshots..

I have reused vdisk_free for this one, as I see a comment about
deprecating $running in Storage.pm:

# FIXME PVE 8.x remove $running parameter (needs APIAGE reset)
sub volume_snapshot_delete {
    my ($cfg, $volid, $snap, $running) = @_;


vdisk_free also uses cluster_lock_storage, so I think it's the better
fit for LVM.

(I have introduced a $snap parameter to vdisk_free, to delete only the
specific snapshot and not the whole chain.)






^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support
       [not found]     ` <3e516016a970e52e5a1014dbcd6cf9507581da74.camel@groupe-cyllene.com>
@ 2025-04-04 11:37       ` Fabian Grünbichler
  2025-04-04 13:02         ` DERUMIER, Alexandre via pve-devel
  0 siblings, 1 reply; 34+ messages in thread
From: Fabian Grünbichler @ 2025-04-04 11:37 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel


> On 04.04.2025 13:31 CEST, DERUMIER, Alexandre <alexandre.derumier@groupe-cyllene.com> wrote:
> Hi Fabian,
> 
> >>the first one is the renaming of a blockdev while it is used, which
> >>is currently done like this:
> >>-- "link" snapshot path to make it available under old and new name
> >>-- handle blockdev additions/reopening/backing-file updates/deletions
> >>on the qemu layer
> >>-- remove old snapshot path link
> >>-- if LVM, rename actual volume (for non-LVM, linking followed by
> >>unlinking the source is effectively a rename already)
> 
> >>I wonder whether that couldn't be made more straight-forward by doing
> >>-- rename snapshot volume/image (qemu must already have the old name
> >>open anyway and should be able to continue using it)
> >>-- do blockdev additions/reopening/backing-file updates/deletions on
> >>the qemu layer
> 
> >>or is there an issue/check in qemu somewhere that prevents this
> >>approach? if not, we could just introduce a "volume_snapshot_rename"
> >>or extend rename_volume with a snapshot parameter..
> 
> I have done tests this last 2 days, and it's working fine indeed. (I
> have done test with fio running during the snapshot rename/reopen, no
> problem).
> 
> so I'm using Storage::rename_volume now with snapshot param
> 
> 
> >>the second thing that happens is deleting a snapshot volume/path,
> >>without deleting the whole snapshot.. that one we could easily
> >>support by extending volume_snapshot_delete by extending the $running
> >>parameter (e.g., passing "2") or adding a new one to signify that all
> >>the housekeeping was already done, and just the actual snapshot
> >>volume should be deleted. this shouldn't be an issue provided all
> >>such calls are guarded by first checking that we are using external
> >>snapshots..
> 
> I have reused vdisk_free for this one, as I'm seeing a comment about
> $running deprecation in Storage.pm
> 
> # FIXME PVE 8.x remove $running parameter (needs APIAGE reset)
> sub volume_snapshot_delete {
>     my ($cfg, $volid, $snap, $running) = @_;
> 
> 
> vdisk_free also takes a cluster_lock_storage, so for LVM, I think it's
> better.
> 
> (I have introduced a $snap param to vdisk_free, to only delete the
> specific snapshot, and not the whole chain)

vdisk_free is definitely wrong - you are not deleting a vdisk, just a
snapshot.. I think this might be an argument for keeping $running ;)

you can call the lock inside volume_snapshot_delete, right?
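
A minimal sketch of that idea, assuming a snapshot delete that takes the storage lock itself instead of relying on vdisk_free. Everything here is hypothetical and self-contained: with_storage_lock only stands in for a storage-wide lock such as cluster_lock_storage, volume_snapshot_delete_sketch is not the real Storage.pm function, and the meaning of $running == 2 ("housekeeping already done, only remove the snapshot volume") is taken from the suggestion quoted above, not from existing code.

```perl
use strict;
use warnings;

# Records what happened, so the locking order can be inspected.
my @log;

# Hypothetical stand-in for a storage-wide lock helper: runs $code while
# the "lock" is held and always releases it, even if $code dies.
sub with_storage_lock {
    my ($storeid, $code) = @_;
    push @log, "lock $storeid";
    my $res = eval { $code->() };
    my $err = $@;
    push @log, "unlock $storeid";
    die $err if $err;
    return $res;
}

# Sketch only: the snapshot delete acquires the lock internally.
# Assumption: $running == 2 means all qemu-side housekeeping was already
# done and only the snapshot volume itself should be removed.
sub volume_snapshot_delete_sketch {
    my ($storeid, $volname, $snap, $running) = @_;
    return with_storage_lock($storeid, sub {
        if (($running // 0) == 2) {
            push @log, "remove snapshot volume of $volname\@$snap";
            return 'volume-only';
        }
        push @log, "full delete of $volname\@$snap";
        return 'full';
    });
}
```

With this shape, callers that use external snapshots would pass 2 after finishing the blockdev work, and all other callers keep the existing behaviour; whether a numeric value or a separate parameter is nicer is left open here.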


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support
  2025-04-04 11:37       ` Fabian Grünbichler
@ 2025-04-04 13:02         ` DERUMIER, Alexandre via pve-devel
  0 siblings, 0 replies; 34+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-04-04 13:02 UTC (permalink / raw)
  To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre

[-- Attachment #1: Type: message/rfc822, Size: 13177 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support
Date: Fri, 4 Apr 2025 13:02:08 +0000
Message-ID: <2f1d349d64c7acb81eafd2c465e5d873d8d28b6d.camel@groupe-cyllene.com>

> I have reused vdisk_free for this one, as I'm seeing a comment about
> $running deprecation in Storage.pm
> 
> # FIXME PVE 8.x remove $running parameter (needs APIAGE reset)
> sub volume_snapshot_delete {
>     my ($cfg, $volid, $snap, $running) = @_;
> 
> 
> vdisk_free also takes a cluster_lock_storage, so for LVM, I think it's
> better.
> 
> (I have introduced a $snap param to vdisk_free, to only delete the
> specific snapshot, and not the whole chain)

>>vdisk_free is definitely wrong - you are not deleting a vdisk, just a
>>snapshot.. I think this might be an argument for keeping $running ;)
>>
ok, I wasn't sure about it

>>you can call the lock inside volume_snapshot_delete, right?

yes sure.


I'll work on it next week, along with all the other changes you have requested.






^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [pve-devel] [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots
  2025-04-01 13:50   ` Fabian Grünbichler
@ 2025-04-07 11:02     ` DERUMIER, Alexandre via pve-devel
  2025-04-07 11:29     ` DERUMIER, Alexandre via pve-devel
  1 sibling, 0 replies; 34+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-04-07 11:02 UTC (permalink / raw)
  To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre

[-- Attachment #1: Type: message/rfc822, Size: 14491 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots
Date: Mon, 7 Apr 2025 11:02:45 +0000
Message-ID: <b020cad7b963228e8cad678e3f0a783742e46437.camel@groupe-cyllene.com>

Hi Fabian,

> =
> +     $plugin->parse_volname($volname);
> + $plugin->free_image($snap_storeid, $scfg, $snap_volname, $snap_isBase, $snap_format);
> +     }
> +     $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);

>>this is the wrong place to do this, you need to handle this in the
>>cleanup worker returned by the plugin and still execute it here..
>>also you need to honor saferemove when cleaning up the snapshots

ok.
Currently, I'm deleting the snapshots in reverse order before the main image,
as I'm reading the snapshot chain from the image. But maybe it would be
better to scan the files/images directly based on the snap names (if
the chain is broken, for example), as we want to delete all
snapshots here anyway.



> + };
>  
> +my $rpcenv_module;
> +$rpcenv_module = Test::MockModule->new('PVE::RPCEnvironment');
> +$rpcenv_module->mock(
> +    get_user => sub {
> +        return 'root@pam';
> +    },
> +    fork_worker => sub {
> + my ($self, $dtype, $id, $user, $function, $background) = @_;
> + $function->(123456);
> + return '123456';
> +    }
> +);
> +
> +my $rpcenv = PVE::RPCEnvironment->init('pub');
> +

>>what? why? no explanation?

About this, the current vdisk_free code is

"
$plugin->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
    ....
    $cleanup_worker = $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);
});

return if !$cleanup_worker;

my $rpcenv = PVE::RPCEnvironment::get();
my $authuser = $rpcenv->get_user();

$rpcenv->fork_worker('imgdel', undef, $authuser, $cleanup_worker);
"


But if I look at the different plugins, free_image always returns undef.
(So with my change, the code was reaching the rpcenv..., and the zfs
test was failing.)

So maybe I'm missing something, but does it work?
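
To make the question concrete, here is a minimal, self-contained sketch of the pattern in the quoted code: the plugin either finishes the removal immediately and returns nothing, or returns a code ref that the caller hands to a worker. All names are hypothetical (free_image_sketch, vdisk_free_sketch, the "volume@snapshot" naming), the "removal" is only recorded in an array, and the synchronous worker mirrors the mocked fork_worker from the test above rather than the real PVE::RPCEnvironment.

```perl
use strict;
use warnings;

# Records what would have been removed, in order.
my @removed;

# Hypothetical plugin method: snapshots are handled newest first, then the
# main image. Without saferemove everything is removed right away and
# nothing is returned; with saferemove the slow part is returned as a
# cleanup worker so the caller can run it in the background.
sub free_image_sketch {
    my ($volname, $snapnames, $saferemove) = @_;

    my @todo;
    push @todo, "$volname\@$_" for reverse @$snapnames;
    push @todo, $volname;

    if (!$saferemove) {
        push @removed, @todo;
        return;
    }

    return sub {
        push @removed, "wiped $_" for @todo;
    };
}

# Hypothetical caller following the quoted vdisk_free logic: the worker is
# only started when the plugin actually returned one.
sub vdisk_free_sketch {
    my ($volname, $snapnames, $saferemove, $fork_worker) = @_;
    my $cleanup_worker = free_image_sketch($volname, $snapnames, $saferemove);
    return if !$cleanup_worker;
    return $fork_worker->($cleanup_worker);
}

# Synchronous stand-in for fork_worker, like the mock in the test diff.
my $sync_worker = sub {
    my ($function) = @_;
    $function->(123456);
    return '123456';
};
```

In this sketch the snapshot cleanup lives inside the worker returned by the plugin, so the caller's "return if !$cleanup_worker" path stays untouched for plugins that have nothing to defer.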





^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [pve-devel] [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots
  2025-04-01 13:50   ` Fabian Grünbichler
  2025-04-07 11:02     ` DERUMIER, Alexandre via pve-devel
@ 2025-04-07 11:29     ` DERUMIER, Alexandre via pve-devel
  1 sibling, 0 replies; 34+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-04-07 11:29 UTC (permalink / raw)
  To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre

[-- Attachment #1: Type: message/rfc822, Size: 13628 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots
Date: Mon, 7 Apr 2025 11:29:34 +0000
Message-ID: <470cebc0fab87d8968a2d4462051bdebbf4b6110.camel@groupe-cyllene.com>

> + next if $snapid eq 'current';
> + next if !$snap->{volid};
> + next if !$snap->{ext};
> + my ($snap_storeid, $snap_volname) = parse_volume_id($snap->{volid});
> + my (undef, undef, undef, undef, undef, $snap_isBase, $snap_format) =
> +     $plugin->parse_volname($volname);
> + $plugin->free_image($snap_storeid, $scfg, $snap_volname, $snap_isBase, $snap_format);
> +     }
> +     $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);

>>this is the wrong place to do this, you need to handle this in the
>>cleanup worker returned by the plugin and still execute it here..
>>also you need to honor saferemove when cleaning up the snapshots

Thinking about this, saferemove is actually honored already, as I'm using
free_image to delete the snapshots.




^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2025-04-07 11:30 UTC | newest]

Thread overview: 34+ messages
     [not found] <20250311102905.2680524-1-alexandre.derumier@groupe-cyllene.com>
2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-04-01 13:50   ` Fabian Grünbichler
2025-04-02  8:01     ` DERUMIER, Alexandre via pve-devel
     [not found]     ` <0e2cd118f35aa8d4c410d362fea1a1b366df1570.camel@groupe-cyllene.com>
2025-04-02  8:28       ` Fabian Grünbichler
2025-04-03  4:27         ` DERUMIER, Alexandre via pve-devel
2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 02/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 2/5] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
2025-04-01 13:50   ` Fabian Grünbichler
2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 03/11] replace qemu_block_set_io_throttle with qom-set throttlegroup limits Alexandre Derumier via pve-devel
2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 3/5] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
2025-04-01 13:50   ` Fabian Grünbichler
2025-04-07 11:02     ` DERUMIER, Alexandre via pve-devel
2025-04-07 11:29     ` DERUMIER, Alexandre via pve-devel
2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
2025-04-02  8:10   ` Fabian Grünbichler
2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 4/5] lvm: lvrename helper: allow path Alexandre Derumier via pve-devel
2025-04-01 13:50   ` Fabian Grünbichler
2025-03-11 10:28 ` [pve-devel] [PATCH v4 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
2025-03-11 10:28 ` [pve-devel] [PATCH v4 pve-storage 5/5] lvm: add lvremove helper Alexandre Derumier via pve-devel
2025-04-01 13:50   ` Fabian Grünbichler
2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 09/11] blockdev: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
2025-04-02  8:10   ` Fabian Grünbichler
2025-03-11 10:29 ` [pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-04-02  8:10   ` Fabian Grünbichler
2025-04-03  4:51     ` DERUMIER, Alexandre via pve-devel
2025-04-04 11:31     ` DERUMIER, Alexandre via pve-devel
     [not found]     ` <3e516016a970e52e5a1014dbcd6cf9507581da74.camel@groupe-cyllene.com>
2025-04-04 11:37       ` Fabian Grünbichler
2025-04-04 13:02         ` DERUMIER, Alexandre via pve-devel
