* [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 13:27 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
` (13 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 10093 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
Date: Mon, 16 Dec 2024 10:12:15 +0100
Message-ID: <20241216091229.3142660-2-alexandre.derumier@groupe-cyllene.com>
This is needed for external snapshot live commit,
when the top block node is not the format node
(in our case, the throttle-group node is the top node).
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
...052-block-commit-add-replaces-option.patch | 137 ++++++++++++++++++
debian/patches/series | 1 +
2 files changed, 138 insertions(+)
create mode 100644 debian/patches/pve/0052-block-commit-add-replaces-option.patch
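As a rough illustration of what the new option enables (all node and job names below are hypothetical, not taken from a real VM), a QMP `block-commit` call could then name the node to replace explicitly while the throttle-group node stays on top:

```python
import json

# Hypothetical QMP command using the proposed 'replaces' option: commit an
# external-snapshot overlay (fmt-drive-scsi0) into its base while the
# throttle filter (drive-scsi0) remains the top node of the chain.
cmd = {
    "execute": "block-commit",
    "arguments": {
        "job-id": "commit-drive-scsi0",
        "device": "drive-scsi0",        # top node: the throttle filter
        "top-node": "fmt-drive-scsi0",  # overlay to commit
        "base-node": "fmt-base-scsi0",  # base image node
        "replaces": "fmt-drive-scsi0",  # node replaced once the commit is done
    },
}
print(json.dumps(cmd, sort_keys=True))
```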
diff --git a/debian/patches/pve/0052-block-commit-add-replaces-option.patch b/debian/patches/pve/0052-block-commit-add-replaces-option.patch
new file mode 100644
index 0000000..2488b5b
--- /dev/null
+++ b/debian/patches/pve/0052-block-commit-add-replaces-option.patch
@@ -0,0 +1,137 @@
+From ae39fd3bb72db440cf380978af9bf5693c12ac6c Mon Sep 17 00:00:00 2001
+From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
+Date: Wed, 11 Dec 2024 16:20:25 +0100
+Subject: [PATCH] block-commit: add replaces option
+
+This uses the same code as drive-mirror for live commit, but the option
+is currently not sent.
+
+Allow replacing a different node than the root node after the block-commit
+(as we use the throttle-group as root, and not the drive)
+---
+ block/mirror.c | 4 ++--
+ block/replication.c | 2 +-
+ blockdev.c | 4 ++--
+ include/block/block_int-global-state.h | 4 +++-
+ qapi/block-core.json | 5 ++++-
+ qemu-img.c | 2 +-
+ 6 files changed, 13 insertions(+), 8 deletions(-)
+
+diff --git a/block/mirror.c b/block/mirror.c
+index 2f12238..1a5e528 100644
+--- a/block/mirror.c
++++ b/block/mirror.c
+@@ -2086,7 +2086,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
+ int64_t speed, BlockdevOnError on_error,
+ const char *filter_node_name,
+ BlockCompletionFunc *cb, void *opaque,
+- bool auto_complete, Error **errp)
++ bool auto_complete, const char *replaces, Error **errp)
+ {
+ bool base_read_only;
+ BlockJob *job;
+@@ -2102,7 +2102,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
+ }
+
+ job = mirror_start_job(
+- job_id, bs, creation_flags, base, NULL, speed, 0, 0,
++ job_id, bs, creation_flags, base, replaces, speed, 0, 0,
+ MIRROR_LEAVE_BACKING_CHAIN, false,
+ on_error, on_error, true, cb, opaque,
+ &commit_active_job_driver, MIRROR_SYNC_MODE_FULL,
+diff --git a/block/replication.c b/block/replication.c
+index 0415a5e..debbe25 100644
+--- a/block/replication.c
++++ b/block/replication.c
+@@ -711,7 +711,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
+ s->commit_job = commit_active_start(
+ NULL, bs->file->bs, s->secondary_disk->bs,
+ JOB_INTERNAL, 0, BLOCKDEV_ON_ERROR_REPORT,
+- NULL, replication_done, bs, true, errp);
++ NULL, replication_done, bs, true, NULL, errp);
+ bdrv_graph_rdunlock_main_loop();
+ break;
+ default:
+diff --git a/blockdev.c b/blockdev.c
+index cbe2243..349fb71 100644
+--- a/blockdev.c
++++ b/blockdev.c
+@@ -2435,7 +2435,7 @@ void qmp_block_commit(const char *job_id, const char *device,
+ const char *filter_node_name,
+ bool has_auto_finalize, bool auto_finalize,
+ bool has_auto_dismiss, bool auto_dismiss,
+- Error **errp)
++ const char *replaces, Error **errp)
+ {
+ BlockDriverState *bs;
+ BlockDriverState *iter;
+@@ -2596,7 +2596,7 @@ void qmp_block_commit(const char *job_id, const char *device,
+ job_id = bdrv_get_device_name(bs);
+ }
+ commit_active_start(job_id, top_bs, base_bs, job_flags, speed, on_error,
+- filter_node_name, NULL, NULL, false, &local_err);
++ filter_node_name, NULL, NULL, false, replaces, &local_err);
+ } else {
+ BlockDriverState *overlay_bs = bdrv_find_overlay(bs, top_bs);
+ if (bdrv_op_is_blocked(overlay_bs, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
+diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
+index f0c642b..194b580 100644
+--- a/include/block/block_int-global-state.h
++++ b/include/block/block_int-global-state.h
+@@ -115,6 +115,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @auto_complete: Auto complete the job.
++ * @replaces: Block graph node name to replace once the commit is done.
+ * @errp: Error object.
+ *
+ */
+@@ -123,7 +124,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
+ int64_t speed, BlockdevOnError on_error,
+ const char *filter_node_name,
+ BlockCompletionFunc *cb, void *opaque,
+- bool auto_complete, Error **errp);
++ bool auto_complete, const char *replaces,
++ Error **errp);
+ /*
+ * mirror_start:
+ * @job_id: The id of the newly-created job, or %NULL to use the
+diff --git a/qapi/block-core.json b/qapi/block-core.json
+index ff441d4..50564c7 100644
+--- a/qapi/block-core.json
++++ b/qapi/block-core.json
+@@ -2098,6 +2098,8 @@
+ # disappear from the query list without user intervention.
+ # Defaults to true. (Since 3.1)
+ #
+# @replaces: graph node name to be replaced by the base image node.
++#
+ # Features:
+ #
+ # @deprecated: Members @base and @top are deprecated. Use @base-node
+@@ -2125,7 +2127,8 @@
+ '*speed': 'int',
+ '*on-error': 'BlockdevOnError',
+ '*filter-node-name': 'str',
+- '*auto-finalize': 'bool', '*auto-dismiss': 'bool' },
++ '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
++ '*replaces': 'str' },
+ 'allow-preconfig': true }
+
+ ##
+diff --git a/qemu-img.c b/qemu-img.c
+index a6c88e0..f6c59bc 100644
+--- a/qemu-img.c
++++ b/qemu-img.c
+@@ -1079,7 +1079,7 @@ static int img_commit(int argc, char **argv)
+
+ commit_active_start("commit", bs, base_bs, JOB_DEFAULT, rate_limit,
+ BLOCKDEV_ON_ERROR_REPORT, NULL, common_block_job_cb,
+- &cbi, false, &local_err);
++ &cbi, false, NULL, &local_err);
+ if (local_err) {
+ goto done;
+ }
+--
+2.39.5
+
diff --git a/debian/patches/series b/debian/patches/series
index 93c97bf..e604a23 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -92,3 +92,4 @@ pve/0048-PVE-backup-fixup-error-handling-for-fleecing.patch
pve/0049-PVE-backup-factor-out-setting-up-snapshot-access-for.patch
pve/0050-PVE-backup-save-device-name-in-device-info-structure.patch
pve/0051-PVE-backup-include-device-name-in-error-when-setting.patch
+pve/0052-block-commit-add-replaces-option.patch
--
2.39.5
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 38+ messages in thread
* [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
2024-12-16 9:12 ` [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 14:17 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support Alexandre Derumier via pve-devel
` (12 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 19086 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
Date: Mon, 16 Dec 2024 10:12:16 +0100
Message-ID: <20241216091229.3142660-3-alexandre.derumier@groupe-cyllene.com>
The blockdev chain is:
- throttle-group node (drive-(ide|scsi|virtio)x)
  - format node (fmt-drive-x)
    - file node (file-drive-x)

fixme: implement iscsi:// path
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 351 +++++++++++++++++++++++++++++++---------------
1 file changed, 237 insertions(+), 114 deletions(-)
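A minimal sketch of the three-level chain described in the commit message, for a hypothetical scsi0 drive (the filename is an example path; node names follow the drive-x / fmt-drive-x / file-drive-x convention):

```python
# Nested blockdev dicts mirroring the chain: throttle filter on top,
# format node below it, file node at the bottom.
drive_id = "scsi0"

file_node = {
    "driver": "file",
    "node-name": f"file-drive-{drive_id}",
    "filename": "/var/lib/vz/images/100/vm-100-disk-0.qcow2",  # example path
}
format_node = {
    "driver": "qcow2",
    "node-name": f"fmt-drive-{drive_id}",
    "file": file_node,
}
throttle_node = {
    "driver": "throttle",
    "node-name": f"drive-{drive_id}",
    "throttle-group": f"throttle-drive-{drive_id}",
    "file": format_node,
}
```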
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 8192599a..2832ed09 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -1464,7 +1464,8 @@ sub print_drivedevice_full {
} else {
$device .= ",bus=ahci$controller.$unit";
}
- $device .= ",drive=drive-$drive_id,id=$drive_id";
+ $device .= ",id=$drive_id";
+ $device .= ",drive=drive-$drive_id" if $device_type ne 'cd' || $drive->{file} ne 'none';
if ($device_type eq 'hd') {
if (my $model = $drive->{model}) {
@@ -1490,6 +1491,13 @@ sub print_drivedevice_full {
$device .= ",serial=$serial";
}
+ my $writecache = $drive->{cache} && $drive->{cache} =~ /^(?:none|writeback|unsafe)$/ ? "on" : "off";
+ $device .= ",write-cache=$writecache" if $drive->{media} && $drive->{media} ne 'cdrom';
+
+ my @qemu_drive_options = qw(heads secs cyls trans rerror werror);
+ foreach my $o (@qemu_drive_options) {
+ $device .= ",$o=$drive->{$o}" if defined($drive->{$o});
+ }
return $device;
}
@@ -1539,145 +1547,256 @@ my sub drive_uses_cache_direct {
return $cache_direct;
}
-sub print_drive_commandline_full {
- my ($storecfg, $vmid, $drive, $live_restore_name, $io_uring) = @_;
+sub print_drive_throttle_group {
+ my ($drive) = @_;
+ #command line can't use the structured json limits option,
+ #so limit params need to use with x- as it's unstable api
+ return if drive_is_cdrom($drive) && $drive->{file} eq 'none';
- my $path;
- my $volid = $drive->{file};
my $drive_id = get_drive_id($drive);
+ my $throttle_group = "throttle-group,id=throttle-drive-$drive_id";
+ foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
+ my ($dir, $qmpname) = @$type;
+
+ if (my $v = $drive->{"mbps$dir"}) {
+ $throttle_group .= ",x-bps$qmpname=".int($v*1024*1024);
+ }
+ if (my $v = $drive->{"mbps${dir}_max"}) {
+ $throttle_group .= ",x-bps$qmpname-max=".int($v*1024*1024);
+ }
+ if (my $v = $drive->{"bps${dir}_max_length"}) {
+ $throttle_group .= ",x-bps$qmpname-max-length=$v";
+ }
+ if (my $v = $drive->{"iops${dir}"}) {
+ $throttle_group .= ",x-iops$qmpname=$v";
+ }
+ if (my $v = $drive->{"iops${dir}_max"}) {
+ $throttle_group .= ",x-iops$qmpname-max=$v";
+ }
+ if (my $v = $drive->{"iops${dir}_max_length"}) {
+ $throttle_group .= ",x-iops$qmpname-max-length=$v";
+ }
+ }
+
+ return $throttle_group;
+}
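The MB/s limits are converted to bytes per second for the throttle-group object's `x-bps*` properties; a sketch of that conversion (the example drive keys are illustrative):

```python
# Sketch of the mbps -> x-bps conversion done when building the
# throttle-group object above (x-bps* values are bytes per second,
# x-iops* values are passed through unchanged).
def mbps_to_bps(mbps):
    return int(mbps * 1024 * 1024)

drive = {"mbps_rd": 50, "iops_wr": 200}  # example limits from a drive config

opts = {}
if "mbps_rd" in drive:
    opts["x-bps-read"] = mbps_to_bps(drive["mbps_rd"])
if "iops_wr" in drive:
    opts["x-iops-write"] = drive["iops_wr"]
```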
+
+sub generate_file_blockdev {
+ my ($storecfg, $drive, $nodename) = @_;
+
+ my $volid = $drive->{file};
my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
- my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
- if (drive_is_cdrom($drive)) {
- $path = get_iso_path($storecfg, $vmid, $volid);
- die "$drive_id: cannot back cdrom drive with a live restore image\n" if $live_restore_name;
+ my $scfg = undef;
+ my $path = $volid;
+ if($storeid && $storeid ne 'nbd') {
+ $scfg = PVE::Storage::storage_config($storecfg, $storeid);
+ $path = PVE::Storage::path($storecfg, $volid);
+ }
+
+ my $blockdev = {};
+
+ if ($path =~ m/^rbd:(\S+)$/) {
+
+ $blockdev->{driver} = 'rbd';
+
+ my @rbd_options = split(/:/, $1);
+ my $keyring = undef;
+ for my $option (@rbd_options) {
+ if ($option =~ m/^(\S+)=(\S+)$/) {
+ my $key = $1;
+ my $value = $2;
+ $blockdev->{'auth-client-required'} = [$value] if $key eq 'auth_supported';
+ $blockdev->{'conf'} = $value if $key eq 'conf';
+ $blockdev->{'user'} = $value if $key eq 'id';
+ $keyring = $value if $key eq 'keyring';
+ if ($key eq 'mon_host') {
+ my $server = [];
+ my @mons = split(';', $value);
+ for my $mon (@mons) {
+ my ($host, $port) = PVE::Tools::parse_host_and_port($mon);
+ $port = '3300' if !$port;
+ push @$server, { host => $host, port => $port };
+ }
+ $blockdev->{server} = $server;
+ }
+ } elsif ($option =~ m|^(\S+)/(\S+)$|){
+ $blockdev->{pool} = $1;
+ my $image = $2;
+
+ if($image =~ m|^(\S+)/(\S+)$|) {
+ $blockdev->{namespace} = $1;
+ $blockdev->{image} = $2;
+ } else {
+ $blockdev->{image} = $image;
+ }
+ }
+ }
+
+ if($keyring && $blockdev->{server}) {
+ #qemu devs removed passing arbitrary values to the blockdev object, and didn't add
+ #keyring to the list of allowed keys. It needs to be defined in the storage ceph.conf.
+ #https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02676.html
+ #another way could be to simply patch qemu to allow the key
+ my $ceph_conf = "/etc/pve/priv/ceph/${storeid}.conf";
+ $blockdev->{conf} = $ceph_conf;
+ if (!-e $ceph_conf) {
+ my $content = "[global]\nkeyring = $keyring\n";
+ PVE::Tools::file_set_contents($ceph_conf, $content, 0400);
+ }
+ }
+ } elsif ($path =~ m/^nbd:(\S+):(\d+):exportname=(\S+)$/) {
+ my $server = { type => 'inet', host => $1, port => $2 };
+ $blockdev = { driver => 'nbd', server => $server, export => $3 };
+ } elsif ($path =~ m/^nbd:unix:(\S+):exportname=(\S+)$/) {
+ my $server = { type => 'unix', path => $1 };
+ $blockdev = { driver => 'nbd', server => $server, export => $2 };
+ } elsif ($path =~ m|^gluster(\+(tcp\|unix\|rdma))?://(.*)/(.*)/(images/(\S+)/(\S+))$|) {
+ my $protocol = $2 ? $2 : 'inet';
+ $protocol = 'inet' if $protocol eq 'tcp';
+ my $server = [{ type => $protocol, host => $3, port => '24007' }];
+ $blockdev = { driver => 'gluster', server => $server, volume => $4, path => $5 };
+ } elsif ($path =~ m/^\/dev/) {
+ my $driver = drive_is_cdrom($drive) ? 'host_cdrom' : 'host_device';
+ $blockdev = { driver => $driver, filename => $path };
+ } elsif ($path =~ m/^\//) {
+ $blockdev = { driver => 'file', filename => $path};
} else {
- if ($storeid) {
- $path = PVE::Storage::path($storecfg, $volid);
- } else {
- $path = $volid;
+ die "unsupported path: $path\n";
+ #fixme
+ #'{"driver":"iscsi","portal":"iscsi.example.com:3260","target":"demo-target","lun":3,"transport":"tcp"}'
+ }
+
+ my $cache_direct = drive_uses_cache_direct($drive, $scfg);
+ my $cache = {};
+ $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
+ $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq 'unsafe' ? JSON::true : JSON::false;
+ $blockdev->{cache} = $cache;
+
+ ##aio
+ if($blockdev->{filename}) {
+ $drive->{aio} = 'threads' if drive_is_cdrom($drive);
+ my $aio = $drive->{aio};
+ if (!$aio) {
+ if (storage_allows_io_uring_default($scfg, $cache_direct)) {
+ # io_uring supports all cache modes
+ $aio = "io_uring";
+ } else {
+ # aio native works only with O_DIRECT
+ if($cache_direct) {
+ $aio = "native";
+ } else {
+ $aio = "threads";
+ }
+ }
}
+ $blockdev->{aio} = $aio;
}
- # For PVE-managed volumes, use the format from the storage layer and prevent overrides via the
- # drive's 'format' option. For unmanaged volumes, fallback to 'raw' to avoid auto-detection by
- # QEMU. For the special case 'none' (get_iso_path() returns an empty $path), there should be no
- # format or QEMU won't start.
- my $format;
- if (drive_is_cdrom($drive) && !$path) {
- # no format
- } elsif ($storeid) {
- $format = checked_volume_format($storecfg, $volid);
+ ##discard && detect-zeroes
+ my $discard = 'ignore';
+ if($drive->{discard}) {
+ $discard = $drive->{discard};
+ $discard = 'unmap' if $discard eq 'on';
+ }
+ $blockdev->{discard} = $discard if !drive_is_cdrom($drive);
- if ($drive->{format} && $drive->{format} ne $format) {
- die "drive '$drive->{interface}$drive->{index}' - volume '$volid'"
- ." - 'format=$drive->{format}' option different from storage format '$format'\n";
- }
+ my $detectzeroes;
+ if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
+ $detectzeroes = 'off';
+ } elsif ($drive->{discard}) {
+ $detectzeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
} else {
- $format = $drive->{format} // 'raw';
+ # This used to be our default with discard not being specified:
+ $detectzeroes = 'on';
}
+ $blockdev->{'detect-zeroes'} = $detectzeroes if !drive_is_cdrom($drive);
+ $blockdev->{'node-name'} = $nodename if $nodename;
- my $is_rbd = $path =~ m/^rbd:/;
+ return $blockdev;
+}
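The nbd branches above translate the legacy path syntax into a structured server description; a rough Python equivalent of just those two regex branches (only nbd is covered here):

```python
import re

# Sketch of the nbd path dispatch in generate_file_blockdev: turn the
# legacy "nbd:host:port:exportname=..." / "nbd:unix:path:exportname=..."
# strings into a QEMU blockdev dict.
def nbd_blockdev(path):
    m = re.match(r'^nbd:(\S+):(\d+):exportname=(\S+)$', path)
    if m:
        return {"driver": "nbd",
                "server": {"type": "inet", "host": m.group(1), "port": m.group(2)},
                "export": m.group(3)}
    m = re.match(r'^nbd:unix:(\S+):exportname=(\S+)$', path)
    if m:
        return {"driver": "nbd",
                "server": {"type": "unix", "path": m.group(1)},
                "export": m.group(2)}
    raise ValueError(f"unsupported path: {path}")
```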
- my $opts = '';
- my @qemu_drive_options = qw(heads secs cyls trans media cache rerror werror aio discard);
- foreach my $o (@qemu_drive_options) {
- $opts .= ",$o=$drive->{$o}" if defined($drive->{$o});
- }
+sub generate_format_blockdev {
+ my ($storecfg, $drive, $nodename, $file, $force_readonly) = @_;
- # snapshot only accepts on|off
- if (defined($drive->{snapshot})) {
- my $v = $drive->{snapshot} ? 'on' : 'off';
- $opts .= ",snapshot=$v";
- }
+ my $volid = $drive->{file};
+ my $scfg = undef;
+ my $path = $volid;
+ my $format = $drive->{format};
+ $format //= "raw";
- if (defined($drive->{ro})) { # ro maps to QEMUs `readonly`, which accepts `on` or `off` only
- $opts .= ",readonly=" . ($drive->{ro} ? 'on' : 'off');
- }
+ my $drive_id = get_drive_id($drive);
- foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
- my ($dir, $qmpname) = @$type;
- if (my $v = $drive->{"mbps$dir"}) {
- $opts .= ",throttling.bps$qmpname=".int($v*1024*1024);
- }
- if (my $v = $drive->{"mbps${dir}_max"}) {
- $opts .= ",throttling.bps$qmpname-max=".int($v*1024*1024);
- }
- if (my $v = $drive->{"bps${dir}_max_length"}) {
- $opts .= ",throttling.bps$qmpname-max-length=$v";
- }
- if (my $v = $drive->{"iops${dir}"}) {
- $opts .= ",throttling.iops$qmpname=$v";
- }
- if (my $v = $drive->{"iops${dir}_max"}) {
- $opts .= ",throttling.iops$qmpname-max=$v";
- }
- if (my $v = $drive->{"iops${dir}_max_length"}) {
- $opts .= ",throttling.iops$qmpname-max-length=$v";
- }
+ if ($drive->{zeroinit}) {
+ #fixme how to handle zeroinit ? insert special blockdev filter ?
}
- if ($live_restore_name) {
- $format = "rbd" if $is_rbd;
- die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
- if !$format;
- $opts .= ",format=alloc-track,file.driver=$format";
- } elsif ($format) {
- $opts .= ",format=$format";
+ my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
+
+ if($storeid) {
+ $scfg = PVE::Storage::storage_config($storecfg, $storeid);
+ $format = checked_volume_format($storecfg, $volid);
+ $path = PVE::Storage::path($storecfg, $volid);
}
+ my $readonly = defined($drive->{ro}) || $force_readonly ? JSON::true : JSON::false;
+
+ #libvirt defines the cache option on both the format && file nodes
my $cache_direct = drive_uses_cache_direct($drive, $scfg);
+ my $cache = {};
+ $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
+ $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq 'unsafe' ? JSON::true : JSON::false;
- $opts .= ",cache=none" if !$drive->{cache} && $cache_direct;
+ my $blockdev = { driver => $format, file => $file, cache => $cache, 'read-only' => $readonly };
+ $blockdev->{'node-name'} = $nodename if $nodename;
- if (!$drive->{aio}) {
- if ($io_uring && storage_allows_io_uring_default($scfg, $cache_direct)) {
- # io_uring supports all cache modes
- $opts .= ",aio=io_uring";
- } else {
- # aio native works only with O_DIRECT
- if($cache_direct) {
- $opts .= ",aio=native";
- } else {
- $opts .= ",aio=threads";
- }
- }
- }
+ return $blockdev;
- if (!drive_is_cdrom($drive)) {
- my $detectzeroes;
- if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
- $detectzeroes = 'off';
- } elsif ($drive->{discard}) {
- $detectzeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
- } else {
- # This used to be our default with discard not being specified:
- $detectzeroes = 'on';
- }
+}
- # note: 'detect-zeroes' works per blockdev and we want it to persist
- # after the alloc-track is removed, so put it on 'file' directly
- my $dz_param = $live_restore_name ? "file.detect-zeroes" : "detect-zeroes";
- $opts .= ",$dz_param=$detectzeroes" if $detectzeroes;
- }
+sub generate_drive_blockdev {
+ my ($storecfg, $vmid, $drive, $force_readonly, $live_restore_name) = @_;
- if ($live_restore_name) {
- $opts .= ",backing=$live_restore_name";
- $opts .= ",auto-remove=on";
+ my $path;
+ my $volid = $drive->{file};
+ my $format = $drive->{format};
+ my $drive_id = get_drive_id($drive);
+
+ my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
+ my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
+
+ my $blockdevs = [];
+
+ if (drive_is_cdrom($drive)) {
+ die "$drive_id: cannot back cdrom drive with a live restore image\n" if $live_restore_name;
+
+ $path = get_iso_path($storecfg, $vmid, $volid);
+ return if !$path;
+ $force_readonly = 1;
}
- # my $file_param = $live_restore_name ? "file.file.filename" : "file";
- my $file_param = "file";
+ my $file_nodename = "file-drive-$drive_id";
+ my $blockdev_file = generate_file_blockdev($storecfg, $drive, $file_nodename);
+ my $fmt_nodename = "fmt-drive-$drive_id";
+ my $blockdev_format = generate_format_blockdev($storecfg, $drive, $fmt_nodename, $blockdev_file, $force_readonly);
+
+ my $blockdev_live_restore = undef;
if ($live_restore_name) {
- # non-rbd drivers require the underlying file to be a separate block
- # node, so add a second .file indirection
- $file_param .= ".file" if !$is_rbd;
- $file_param .= ".filename";
+ die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
+ if !$format;
+
+ $blockdev_live_restore = { 'node-name' => "liverestore-drive-$drive_id",
+ backing => $live_restore_name,
+ 'auto-remove' => 'on', format => "alloc-track",
+ file => $blockdev_format };
}
- my $pathinfo = $path ? "$file_param=$path," : '';
- return "${pathinfo}if=none,id=drive-$drive->{interface}$drive->{index}$opts";
+ #this is the top filter entry point, use drive-$drive_id as the nodename
+ my $blockdev_throttle = { driver => "throttle", 'node-name' => "drive-$drive_id", 'throttle-group' => "throttle-drive-$drive_id" };
+ #put the liverestore filter between the throttle && format filters
+ $blockdev_throttle->{file} = $live_restore_name ? $blockdev_live_restore : $blockdev_format;
+ return $blockdev_throttle;
}
sub print_pbs_blockdev {
@@ -4091,13 +4210,13 @@ sub config_to_command {
push @$devices, '-blockdev', $live_restore->{blockdev};
}
- my $drive_cmd = print_drive_commandline_full(
- $storecfg, $vmid, $drive, $live_blockdev_name, min_version($kvmver, 6, 0));
-
- # extra protection for templates, but SATA and IDE don't support it..
- $drive_cmd .= ',readonly=on' if drive_is_read_only($conf, $drive);
+ my $throttle_group = print_drive_throttle_group($drive);
+ push @$devices, '-object', $throttle_group if $throttle_group;
- push @$devices, '-drive',$drive_cmd;
+ # extra protection for templates, but SATA and IDE don't support it..
+ my $force_readonly = drive_is_read_only($conf, $drive);
+ my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive, $force_readonly, $live_blockdev_name);
+ push @$devices, '-blockdev', encode_json_ordered($blockdev) if $blockdev;
push @$devices, '-device', print_drivedevice_full(
$storecfg, $conf, $vmid, $drive, $bridges, $arch, $machine_type);
});
@@ -8986,4 +9105,8 @@ sub delete_ifaces_ipams_ips {
}
}
+sub encode_json_ordered {
+ return JSON->new->canonical->allow_nonref->encode( $_[0] );
+}
+
1;
--
2.39.5
* [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
2024-12-16 9:12 ` [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-09 12:36 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests Alexandre Derumier via pve-devel
` (11 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 13036 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
Date: Mon, 16 Dec 2024 10:12:17 +0100
Message-ID: <20241216091229.3142660-4-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
src/PVE/Storage/DirPlugin.pm | 1 +
src/PVE/Storage/Plugin.pm | 207 +++++++++++++++++++++++++++++------
2 files changed, 176 insertions(+), 32 deletions(-)
diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm
index fb23e0a..1cd7ac3 100644
--- a/src/PVE/Storage/DirPlugin.pm
+++ b/src/PVE/Storage/DirPlugin.pm
@@ -81,6 +81,7 @@ sub options {
is_mountpoint => { optional => 1 },
bwlimit => { optional => 1 },
preallocation => { optional => 1 },
+ snapext => { optional => 1 },
};
}
diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
index fececa1..aeba8d3 100644
--- a/src/PVE/Storage/Plugin.pm
+++ b/src/PVE/Storage/Plugin.pm
@@ -214,6 +214,11 @@ my $defaultData = {
maximum => 65535,
optional => 1,
},
+ 'snapext' => {
+ type => 'boolean',
+ description => 'Enable external snapshot support.',
+ optional => 1,
+ },
},
};
@@ -710,11 +715,15 @@ sub filesystem_path {
# Note: qcow2/qed has internal snapshot, so path is always
# the same (with or without snapshot => same file).
die "can't snapshot this image format\n"
- if defined($snapname) && $format !~ m/^(qcow2|qed)$/;
+ if defined($snapname) && !$scfg->{snapext} && $format !~ m/^(qcow2|qed)$/;
my $dir = $class->get_subdir($scfg, $vtype);
- $dir .= "/$vmid" if $vtype eq 'images';
+ if ($scfg->{snapext} && $snapname) {
+ $name = $class->get_snap_volname($volname, $snapname);
+ } else {
+ $dir .= "/$vmid" if $vtype eq 'images';
+ }
my $path = "$dir/$name";
@@ -953,6 +962,31 @@ sub free_image {
# TODO taken from PVE/QemuServer/Drive.pm, avoiding duplication would be nice
my @checked_qemu_img_formats = qw(raw cow qcow qcow2 qed vmdk cloop);
+sub qemu_img_info {
+ my ($filename, $file_format, $timeout, $follow_backing_files) = @_;
+
+ my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
+ push $cmd->@*, '-f', $file_format if $file_format;
+ push $cmd->@*, '--backing-chain' if $follow_backing_files;
+
+ my $json = '';
+ my $err_output = '';
+ eval {
+ run_command($cmd,
+ timeout => $timeout,
+ outfunc => sub { $json .= shift },
+ errfunc => sub { $err_output .= shift . "\n"},
+ );
+ };
+ warn $@ if $@;
+ if ($err_output) {
+ # if qemu did not output anything to stdout we die with stderr as an error
+ die $err_output if !$json;
+ # otherwise we warn about it and try to parse the json
+ warn $err_output;
+ }
+ return $json;
+}
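A rough Python equivalent of the command construction in the `qemu_img_info` helper above; running the command is separated from building it, so the builder can be checked without qemu-img installed:

```python
import json
import subprocess

# Build the qemu-img invocation: JSON output, optional explicit format,
# optional --backing-chain to follow the whole snapshot chain.
def build_info_cmd(filename, file_format=None, backing_chain=False):
    cmd = ['/usr/bin/qemu-img', 'info', '--output=json', filename]
    if file_format:
        cmd += ['-f', file_format]
    if backing_chain:
        cmd.append('--backing-chain')
    return cmd

def qemu_img_info(filename, file_format=None, backing_chain=False, timeout=10):
    res = subprocess.run(build_info_cmd(filename, file_format, backing_chain),
                         capture_output=True, text=True, timeout=timeout)
    if res.stderr and not res.stdout:
        # nothing on stdout: treat stderr as fatal, like the Perl helper
        raise RuntimeError(res.stderr)
    return json.loads(res.stdout) if res.stdout else None
```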
# set $untrusted if the file in question might be malicious since it isn't
# created by our stack
# this makes certain checks fatal, and adds extra checks for known problems like
@@ -1016,25 +1050,9 @@ sub file_size_info {
warn "file_size_info: '$filename': falling back to 'raw' from unknown format '$file_format'\n";
$file_format = 'raw';
}
- my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
- push $cmd->@*, '-f', $file_format if $file_format;
- my $json = '';
- my $err_output = '';
- eval {
- run_command($cmd,
- timeout => $timeout,
- outfunc => sub { $json .= shift },
- errfunc => sub { $err_output .= shift . "\n"},
- );
- };
- warn $@ if $@;
- if ($err_output) {
- # if qemu did not output anything to stdout we die with stderr as an error
- die $err_output if !$json;
- # otherwise we warn about it and try to parse the json
- warn $err_output;
- }
+ my $json = qemu_img_info($filename, $file_format, $timeout);
+
if (!$json) {
die "failed to query file information with qemu-img\n" if $untrusted;
# skip decoding if there was no output, e.g. if there was a timeout.
@@ -1162,11 +1180,28 @@ sub volume_snapshot {
die "can't snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
- my $path = $class->filesystem_path($scfg, $volname);
+ if($scfg->{snapext}) {
- my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
+ my $path = $class->path($scfg, $volname, $storeid);
+ my $snappath = $class->path($scfg, $volname, $storeid, $snap);
+ my $format = ($class->parse_volname($volname))[6];
+ #rename current volume to snap volume
+ rename($path, $snappath) if -e $path && !-e $snappath;
+
+ my $cmd = ['/usr/bin/qemu-img', 'create', '-b', $snappath,
+ '-F', $format, '-f', 'qcow2', $path];
+
+ my $options = "extended_l2=on,cluster_size=128k,";
+ $options .= preallocation_cmd_option($scfg, 'qcow2');
+ push @$cmd, '-o', $options;
+ run_command($cmd);
- run_command($cmd);
+ } else {
+
+ my $path = $class->filesystem_path($scfg, $volname);
+ my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
+ run_command($cmd);
+ }
return undef;
}
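The external-snapshot branch above renames the live image to the snapshot file and then creates a fresh qcow2 overlay backed by it; the same flow sketched as a plan of operations (paths and the preallocation value are illustrative):

```python
# Sketch of external snapshot creation in volume_snapshot above:
# 1. rename the current image to the snapshot file
# 2. create a new qcow2 overlay backed by the snapshot file
def snapshot_ops(path, snappath, backing_format, prealloc="off"):
    create_opts = f"extended_l2=on,cluster_size=128k,preallocation={prealloc}"
    return [
        ("rename", path, snappath),
        ("run", ["/usr/bin/qemu-img", "create",
                 "-b", snappath, "-F", backing_format,
                 "-f", "qcow2", "-o", create_opts, path]),
    ]
```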
@@ -1177,6 +1212,21 @@ sub volume_snapshot {
sub volume_rollback_is_possible {
my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
+ if ($scfg->{snapext}) {
+ #technically, we could manage multibranch, but it needs a lot more work for snapshot delete
+ #we need to implement block-stream from a deleted snapshot to all other child branches
+ #when online, we need to do a transaction for multiple disks when deleting the last snapshot
+ #and merge into the current running file
+
+ my $snappath = $class->path($scfg, $volname, $storeid, $snap);
+ my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+ my $parentsnap = $snapshots->{current}->{parent};
+
+ return 1 if !-e $snappath || $snapshots->{$parentsnap}->{file} eq $snappath;
+
+ die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
+ }
+
return 1;
}
@@ -1187,9 +1237,15 @@ sub volume_snapshot_rollback {
my $path = $class->filesystem_path($scfg, $volname);
- my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
-
- run_command($cmd);
+ if ($scfg->{snapext}) {
+ #simply delete the current snapshot and recreate it
+ my $path = $class->filesystem_path($scfg, $volname);
+ unlink($path);
+ $class->volume_snapshot($scfg, $storeid, $volname, $snap);
+ } else {
+ my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
+ run_command($cmd);
+ }
return undef;
}
@@ -1201,13 +1257,52 @@ sub volume_snapshot_delete {
return 1 if $running;
+ my $cmd = "";
my $path = $class->filesystem_path($scfg, $volname);
- $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
+ if ($scfg->{snapext}) {
- my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
+ my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+ my $snappath = $snapshots->{$snap}->{file};
+ return if !-e $snappath; #already deleted ?
+
+ my $parentsnap = $snapshots->{$snap}->{parent};
+ my $childsnap = $snapshots->{$snap}->{child};
+
+ my $parentpath = $snapshots->{$parentsnap}->{file} if $parentsnap;
+ my $childpath = $snapshots->{$childsnap}->{file} if $childsnap;
+
+
+ #if it's the first snapshot, we merge the child and rename the snapshot file to the child's
+ if(!$parentsnap) {
+ #we use commit here, as it's faster than rebase
+ #https://lists.gnu.org/archive/html/qemu-discuss/2019-08/msg00041.html
+ print"commit $childpath\n";
+ $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
+ run_command($cmd);
+ print"delete $childpath\n";
+
+ unlink($childpath);
+ print"rename $snappath to $childpath\n";
+ rename($snappath, $childpath);
+ } else {
+ # deleting an intermediate snapshot: commit it into its parent,
+ # then relink the child onto the parent and delete the merged file
+ die "missing parent snapshot to rebase child $childpath\n" if !$parentpath;
+ print "commit $snappath\n";
+ $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
+ run_command($cmd);
+ print "link $childsnap to $parentsnap\n";
+ $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath];
+ run_command($cmd);
+ unlink($snappath);
+ }
+
+ } else {
+ $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
- run_command($cmd);
+ $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
+ run_command($cmd);
+ }
return undef;
}
@@ -1246,8 +1341,8 @@ sub volume_has_feature {
current => { qcow2 => 1, raw => 1, vmdk => 1 },
},
rename => {
- current => {qcow2 => 1, raw => 1, vmdk => 1},
- },
+ current => { qcow2 => 1, raw => 1, vmdk => 1},
+ }
};
if ($feature eq 'clone') {
@@ -1481,7 +1576,37 @@ sub status {
sub volume_snapshot_info {
my ($class, $scfg, $storeid, $volname) = @_;
- die "volume_snapshot_info is not implemented for $class";
+ my $path = $class->filesystem_path($scfg, $volname);
+
+ my $backing_chain = 1;
+ my $json = qemu_img_info($path, undef, 10, $backing_chain);
+ die "failed to query file information with qemu-img\n" if !$json;
+ my $snapshots = eval { decode_json($json) };
+ die "failed to decode qemu-img output - $@\n" if $@;
+
+ my $info = {};
+ my $order = 0;
+ for my $snap (@$snapshots) {
+ my $snapfile = $snap->{filename};
+ my $snapname = parse_snapname($snapfile);
+ $snapname = 'current' if !$snapname;
+ my $snapvolname = $class->get_snap_volname($volname, $snapname);
+
+ $info->{$snapname}->{order} = $order;
+ $info->{$snapname}->{file} = $snapfile;
+ $info->{$snapname}->{volname} = $snapvolname;
+ $info->{$snapname}->{volid} = "$storeid:$snapvolname";
+ $info->{$snapname}->{ext} = 1;
+
+ my $parentfile = $snap->{'backing-filename'};
+ if ($parentfile) {
+ my $parentname = parse_snapname($parentfile);
+ $info->{$snapname}->{parent} = $parentname;
+ $info->{$parentname}->{child} = $snapname;
+ }
+ $order++;
+ }
+ return $info;
}
sub activate_storage {
@@ -1867,4 +1992,22 @@ sub config_aware_base_mkdir {
}
}
+sub get_snap_volname {
+ my ($class, $volname, $snapname) = @_;
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
+ $name = (!$snapname || $snapname eq 'current') ? $volname : "$vmid/snap-$snapname-$name";
+ return $name;
+}
+
+sub parse_snapname {
+ my ($name) = @_;
+
+ my $basename = basename($name);
+ if ($basename =~ m/^snap-(.*)-vm(.*)$/) {
+ return $1;
+ }
+ return undef;
+}
+
1;
--
2.39.5
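Note for reviewers: the backing-chain bookkeeping that volume_snapshot_info() builds can be sketched in a few lines outside Perl. This is a minimal Python model, assuming qemu-img info --backing-chain --output=json returns an array of objects carrying "filename" and "backing-filename" keys (as referenced in the patch); the helper names here are illustrative only, not part of the patch.

```python
import os
import re

def parse_snapname(path):
    # Mirror of the patch's parse_snapname(): extract the snapshot name
    # from a file named "snap-<name>-vm-<vmid>-disk-<n>.qcow2".
    m = re.match(r'^snap-(.*)-vm', os.path.basename(path))
    return m.group(1) if m else None

def snapshot_info(chain):
    # chain: qemu-img info --backing-chain style list, ordered from the
    # active image down to the base snapshot.
    info = {}
    for order, entry in enumerate(chain):
        name = parse_snapname(entry['filename']) or 'current'
        info.setdefault(name, {})
        info[name].update({'order': order, 'file': entry['filename']})
        parent_file = entry.get('backing-filename')
        if parent_file:
            # link both directions, as volume_snapshot_delete needs the
            # child to rebase after committing a snapshot into its parent
            parent = parse_snapname(parent_file)
            info[name]['parent'] = parent
            info.setdefault(parent, {})['child'] = name
    return info
```

With a two-snapshot chain (base s1 <- s2 <- current), the result gives each node its parent and child, which is exactly what the delete path walks to decide between the commit-and-rename and commit-and-rebase cases.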
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* [pve-devel] [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (2 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
` (10 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 40905 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests
Date: Mon, 16 Dec 2024 10:12:18 +0100
Message-ID: <20241216091229.3142660-5-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
test/cfg2cmd/bootorder-empty.conf.cmd | 12 ++++++----
test/cfg2cmd/bootorder-legacy.conf.cmd | 12 ++++++----
test/cfg2cmd/bootorder.conf.cmd | 12 ++++++----
...putype-icelake-client-deprecation.conf.cmd | 6 ++---
test/cfg2cmd/ide.conf.cmd | 23 +++++++++++--------
test/cfg2cmd/pinned-version-pxe-pve.conf.cmd | 6 ++---
test/cfg2cmd/pinned-version-pxe.conf.cmd | 6 ++---
test/cfg2cmd/pinned-version.conf.cmd | 6 ++---
test/cfg2cmd/q35-ide.conf.cmd | 23 +++++++++++--------
.../q35-linux-hostpci-template.conf.cmd | 3 ++-
test/cfg2cmd/seabios_serial.conf.cmd | 6 ++---
...imple-balloon-free-page-reporting.conf.cmd | 6 ++---
test/cfg2cmd/simple-btrfs.conf.cmd | 6 ++---
test/cfg2cmd/simple-virtio-blk.conf.cmd | 6 ++---
test/cfg2cmd/simple1-template.conf.cmd | 11 +++++----
test/cfg2cmd/simple1.conf.cmd | 6 ++---
16 files changed, 84 insertions(+), 66 deletions(-)
diff --git a/test/cfg2cmd/bootorder-empty.conf.cmd b/test/cfg2cmd/bootorder-empty.conf.cmd
index 87fa6c28..7a7f96cf 100644
--- a/test/cfg2cmd/bootorder-empty.conf.cmd
+++ b/test/cfg2cmd/bootorder-empty.conf.cmd
@@ -25,14 +25,16 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2' \
-device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi4,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi4' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi4"},"node-name":"fmt-drive-scsi4","read-only":false},"node-name":"drive-scsi4","throttle-group":"throttle-drive-scsi4"}' \
-device 'scsi-hd,bus=scsihw0.0,scsi-id=4,drive=drive-scsi4,id=scsi4' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio0"},"node-name":"fmt-drive-virtio0","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio1,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio1' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio1"},"node-name":"fmt-drive-virtio1","read-only":false},"node-name":"drive-virtio1","throttle-group":"throttle-drive-virtio1"}' \
-device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,iothread=iothread-virtio1' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256' \
diff --git a/test/cfg2cmd/bootorder-legacy.conf.cmd b/test/cfg2cmd/bootorder-legacy.conf.cmd
index a4c3f050..b8ba1588 100644
--- a/test/cfg2cmd/bootorder-legacy.conf.cmd
+++ b/test/cfg2cmd/bootorder-legacy.conf.cmd
@@ -25,14 +25,16 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi4,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi4' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi4"},"node-name":"fmt-drive-scsi4","read-only":false},"node-name":"drive-scsi4","throttle-group":"throttle-drive-scsi4"}' \
-device 'scsi-hd,bus=scsihw0.0,scsi-id=4,drive=drive-scsi4,id=scsi4' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio0"},"node-name":"fmt-drive-virtio0","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio1,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio1' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio1"},"node-name":"fmt-drive-virtio1","read-only":false},"node-name":"drive-virtio1","throttle-group":"throttle-drive-virtio1"}' \
-device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,iothread=iothread-virtio1,bootindex=302' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=100' \
diff --git a/test/cfg2cmd/bootorder.conf.cmd b/test/cfg2cmd/bootorder.conf.cmd
index 76bd55d7..a119579b 100644
--- a/test/cfg2cmd/bootorder.conf.cmd
+++ b/test/cfg2cmd/bootorder.conf.cmd
@@ -25,14 +25,16 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=103' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=103' \
-device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi4,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi4' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi4"},"node-name":"fmt-drive-scsi4","read-only":false},"node-name":"drive-scsi4","throttle-group":"throttle-drive-scsi4"}' \
-device 'scsi-hd,bus=scsihw0.0,scsi-id=4,drive=drive-scsi4,id=scsi4,bootindex=102' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio0"},"node-name":"fmt-drive-virtio0","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio1,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio1' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio1"},"node-name":"fmt-drive-virtio1","read-only":false},"node-name":"drive-virtio1","throttle-group":"throttle-drive-virtio1"}' \
-device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,iothread=iothread-virtio1,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=101' \
diff --git a/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd b/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd
index bf084432..6b9d587c 100644
--- a/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd
+++ b/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd
@@ -23,9 +23,9 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/base-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/base-8006-disk-0.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-machine 'type=pc+pve0'
diff --git a/test/cfg2cmd/ide.conf.cmd b/test/cfg2cmd/ide.conf.cmd
index 33c6aadc..f465d072 100644
--- a/test/cfg2cmd/ide.conf.cmd
+++ b/test/cfg2cmd/ide.conf.cmd
@@ -23,16 +23,21 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/zero.iso,if=none,id=drive-ide0,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/one.iso,if=none,id=drive-ide1,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.0,unit=1,drive=drive-ide1,id=ide1,bootindex=201' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/two.iso,if=none,id=drive-ide2,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=202' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/three.iso,if=none,id=drive-ide3,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.1,unit=1,drive=drive-ide3,id=ide3,bootindex=203' \
+ -object 'throttle-group,id=throttle-drive-ide0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/zero.iso","node-name":"file-drive-ide0"},"node-name":"fmt-drive-ide0","read-only":true},"node-name":"drive-ide0","throttle-group":"throttle-drive-ide0"}' \
+ -device 'ide-cd,bus=ide.0,unit=0,id=ide0,drive=drive-ide0,bootindex=200' \
+ -object 'throttle-group,id=throttle-drive-ide1' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/one.iso","node-name":"file-drive-ide1"},"node-name":"fmt-drive-ide1","read-only":true},"node-name":"drive-ide1","throttle-group":"throttle-drive-ide1"}' \
+ -device 'ide-cd,bus=ide.0,unit=1,id=ide1,drive=drive-ide1,bootindex=201' \
+ -object 'throttle-group,id=throttle-drive-ide2' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/two.iso","node-name":"file-drive-ide2"},"node-name":"fmt-drive-ide2","read-only":true},"node-name":"drive-ide2","throttle-group":"throttle-drive-ide2"}' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,drive=drive-ide2,bootindex=202' \
+ -object 'throttle-group,id=throttle-drive-ide3' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/three.iso","node-name":"file-drive-ide3"},"node-name":"fmt-drive-ide3","read-only":true},"node-name":"drive-ide3","throttle-group":"throttle-drive-ide3"}' \
+ -device 'ide-cd,bus=ide.1,unit=1,id=ide3,drive=drive-ide3,bootindex=203' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/100/vm-100-disk-2.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/100/vm-100-disk-2.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=2E:01:68:F9:9C:87,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd b/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd
index d17d4deb..cb880681 100644
--- a/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd
+++ b/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd
@@ -23,10 +23,10 @@
-device 'virtio-rng-pci,rng=rng0,max-bytes=1024,period=1000,bus=pci.1,addr=0x1d' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.raw,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.raw","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300,romfile=pxe-virtio.rom' \
diff --git a/test/cfg2cmd/pinned-version-pxe.conf.cmd b/test/cfg2cmd/pinned-version-pxe.conf.cmd
index 892fc148..a4dddf3e 100644
--- a/test/cfg2cmd/pinned-version-pxe.conf.cmd
+++ b/test/cfg2cmd/pinned-version-pxe.conf.cmd
@@ -21,10 +21,10 @@
-device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.raw,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.raw","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300,romfile=pxe-virtio.rom' \
diff --git a/test/cfg2cmd/pinned-version.conf.cmd b/test/cfg2cmd/pinned-version.conf.cmd
index 13361edf..cde4d273 100644
--- a/test/cfg2cmd/pinned-version.conf.cmd
+++ b/test/cfg2cmd/pinned-version.conf.cmd
@@ -21,10 +21,10 @@
-device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.raw,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.raw","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
diff --git a/test/cfg2cmd/q35-ide.conf.cmd b/test/cfg2cmd/q35-ide.conf.cmd
index dd4f1bbe..c7ca20c1 100644
--- a/test/cfg2cmd/q35-ide.conf.cmd
+++ b/test/cfg2cmd/q35-ide.conf.cmd
@@ -22,16 +22,21 @@
-device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/zero.iso,if=none,id=drive-ide0,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/one.iso,if=none,id=drive-ide1,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.2,unit=0,drive=drive-ide1,id=ide1,bootindex=201' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/two.iso,if=none,id=drive-ide2,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=202' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/three.iso,if=none,id=drive-ide3,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.3,unit=0,drive=drive-ide3,id=ide3,bootindex=203' \
+ -object 'throttle-group,id=throttle-drive-ide0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/zero.iso","node-name":"file-drive-ide0"},"node-name":"fmt-drive-ide0","read-only":true},"node-name":"drive-ide0","throttle-group":"throttle-drive-ide0"}' \
+ -device 'ide-cd,bus=ide.0,unit=0,id=ide0,drive=drive-ide0,bootindex=200' \
+ -object 'throttle-group,id=throttle-drive-ide1' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/one.iso","node-name":"file-drive-ide1"},"node-name":"fmt-drive-ide1","read-only":true},"node-name":"drive-ide1","throttle-group":"throttle-drive-ide1"}' \
+ -device 'ide-cd,bus=ide.2,unit=0,id=ide1,drive=drive-ide1,bootindex=201' \
+ -object 'throttle-group,id=throttle-drive-ide2' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/two.iso","node-name":"file-drive-ide2"},"node-name":"fmt-drive-ide2","read-only":true},"node-name":"drive-ide2","throttle-group":"throttle-drive-ide2"}' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,drive=drive-ide2,bootindex=202' \
+ -object 'throttle-group,id=throttle-drive-ide3' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/three.iso","node-name":"file-drive-ide3"},"node-name":"fmt-drive-ide3","read-only":true},"node-name":"drive-ide3","throttle-group":"throttle-drive-ide3"}' \
+ -device 'ide-cd,bus=ide.3,unit=0,id=ide3,drive=drive-ide3,bootindex=203' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/100/vm-100-disk-2.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/100/vm-100-disk-2.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=2E:01:68:F9:9C:87,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd b/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd
index cda10630..63c9fbe6 100644
--- a/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd
+++ b/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd
@@ -24,7 +24,8 @@
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/100/base-100-disk-2.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on,readonly=on' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/100/base-100-disk-2.raw","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":true},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
-machine 'accel=tcg,type=pc+pve0' \
-snapshot
diff --git a/test/cfg2cmd/seabios_serial.conf.cmd b/test/cfg2cmd/seabios_serial.conf.cmd
index 1c4e102c..c3597ad1 100644
--- a/test/cfg2cmd/seabios_serial.conf.cmd
+++ b/test/cfg2cmd/seabios_serial.conf.cmd
@@ -23,10 +23,10 @@
-device 'isa-serial,chardev=serial0' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd b/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd
index 097a14e1..d7fbe2ca 100644
--- a/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd
+++ b/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd
@@ -23,10 +23,10 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
diff --git a/test/cfg2cmd/simple-btrfs.conf.cmd b/test/cfg2cmd/simple-btrfs.conf.cmd
index c2354887..879ca729 100644
--- a/test/cfg2cmd/simple-btrfs.conf.cmd
+++ b/test/cfg2cmd/simple-btrfs.conf.cmd
@@ -23,10 +23,10 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/butter/bread/images/8006/vm-8006-disk-0/disk.raw,if=none,id=drive-scsi0,discard=on,format=raw,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":false,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/butter/bread/images/8006/vm-8006-disk-0/disk.raw","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/simple-virtio-blk.conf.cmd b/test/cfg2cmd/simple-virtio-blk.conf.cmd
index d19aca6b..bd4dc308 100644
--- a/test/cfg2cmd/simple-virtio-blk.conf.cmd
+++ b/test/cfg2cmd/simple-virtio-blk.conf.cmd
@@ -24,9 +24,9 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
+ -object 'throttle-group,id=throttle-drive-virtio0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio0"},"node-name":"fmt-drive-virtio0","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/simple1-template.conf.cmd b/test/cfg2cmd/simple1-template.conf.cmd
index 35484600..7f9ae106 100644
--- a/test/cfg2cmd/simple1-template.conf.cmd
+++ b/test/cfg2cmd/simple1-template.conf.cmd
@@ -21,13 +21,14 @@
-device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/base-8006-disk-1.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap,readonly=on' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/base-8006-disk-1.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":true},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
-device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
- -drive 'file=/var/lib/vz/images/8006/base-8006-disk-0.qcow2,if=none,id=drive-sata0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
- -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0' \
+ -object 'throttle-group,id=throttle-drive-sata0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/base-8006-disk-0.qcow2","node-name":"file-drive-sata0"},"node-name":"fmt-drive-sata0","read-only":false},"node-name":"drive-sata0","throttle-group":"throttle-drive-sata0"}' \
+ -device 'ide-hd,bus=ahci0.0,id=sata0,drive=drive-sata0' \
-machine 'accel=tcg,smm=off,type=pc+pve0' \
-snapshot
diff --git a/test/cfg2cmd/simple1.conf.cmd b/test/cfg2cmd/simple1.conf.cmd
index ecd14bcc..df35e030 100644
--- a/test/cfg2cmd/simple1.conf.cmd
+++ b/test/cfg2cmd/simple1.conf.cmd
@@ -23,10 +23,10 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
--
2.39.5
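The `-blockdev` expectations in the cfg2cmd tests above replace each flat `-drive` string with a three-level node chain: a throttle node on top (the node the guest device attaches to), a format node (`fmt-drive-*`) below it, and a file node at the bottom. A minimal Python sketch of that nesting, using the node-naming scheme visible in the tests (simplified: `discard`, `detect-zeroes`, and other per-drive options from the real command lines are omitted):

```python
import json

def drive_blockdev(drive_id, filename, fmt="qcow2", read_only=False, direct=True):
    """Build a -blockdev JSON chain shaped like the test expectations above:
    throttle node -> format node -> file node (illustrative sketch only)."""
    cache = {"direct": direct, "no-flush": False}
    file_node = {
        "driver": "file",
        "filename": filename,
        "node-name": f"file-drive-{drive_id}",
        "aio": "io_uring",
        "cache": cache,
    }
    fmt_node = {
        "driver": fmt,
        "file": file_node,
        "node-name": f"fmt-drive-{drive_id}",
        "read-only": read_only,
        "cache": cache,
    }
    return {
        "driver": "throttle",
        "file": fmt_node,
        "node-name": f"drive-{drive_id}",
        "throttle-group": f"throttle-drive-{drive_id}",
    }

bd = drive_blockdev("scsi0", "/var/lib/vz/images/8006/vm-8006-disk-0.qcow2")
print(json.dumps(bd, sort_keys=True))
```

Because the throttle node, not the format node, is now the top node, operations that previously assumed the format node was on top (such as live commit) need extra plumbing; that is the motivation for the `block-commit` `replaces` option in the pve-qemu patch.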
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (3 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-09 13:55 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
` (9 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 15525 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
Date: Mon, 16 Dec 2024 10:12:19 +0100
Message-ID: <20241216091229.3142660-6-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
src/PVE/Storage/LVMPlugin.pm | 231 ++++++++++++++++++++++++++++++++---
1 file changed, 213 insertions(+), 18 deletions(-)
diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
index 88fd612..1257cd3 100644
--- a/src/PVE/Storage/LVMPlugin.pm
+++ b/src/PVE/Storage/LVMPlugin.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use IO::File;
+use POSIX qw/ceil/;
use PVE::Tools qw(run_command trim);
use PVE::Storage::Plugin;
@@ -216,6 +217,7 @@ sub type {
sub plugindata {
return {
content => [ {images => 1, rootdir => 1}, { images => 1 }],
+ format => [ { raw => 1, qcow2 => 1 } , 'raw' ],
};
}
@@ -291,7 +293,10 @@ sub parse_volname {
PVE::Storage::Plugin::parse_lvm_name($volname);
if ($volname =~ m/^(vm-(\d+)-\S+)$/) {
- return ('images', $1, $2, undef, undef, undef, 'raw');
+ my $name = $1;
+ my $vmid = $2;
+ my $format = $volname =~ m/\.qcow2$/ ? 'qcow2' : 'raw';
+ return ('images', $name, $vmid, undef, undef, undef, $format);
}
die "unable to parse lvm volume name '$volname'\n";
@@ -300,11 +305,13 @@ sub parse_volname {
sub filesystem_path {
my ($class, $scfg, $volname, $snapname) = @_;
- die "lvm snapshot is not implemented"if defined($snapname);
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
- my ($vtype, $name, $vmid) = $class->parse_volname($volname);
+ die "snapshot is working with qcow2 format only" if defined($snapname) && $format ne 'qcow2';
my $vg = $scfg->{vgname};
+ $name = $class->get_snap_volname($volname, $snapname) if $snapname;
my $path = "/dev/$vg/$name";
@@ -332,7 +339,9 @@ sub find_free_diskname {
my $disk_list = [ keys %{$lvs->{$vg}} ];
- return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, undef, $scfg);
+ $add_fmt_suffix = $fmt eq 'qcow2' ? 1 : undef;
+
+ return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, $fmt, $scfg, $add_fmt_suffix);
}
sub lvcreate {
@@ -363,7 +372,15 @@ sub lvrename {
sub alloc_image {
my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
- die "unsupported format '$fmt'" if $fmt ne 'raw';
+ die "unsupported format '$fmt'" if $fmt !~ m/(raw|qcow2)/;
+
+ $name = $class->alloc_new_image($storeid, $scfg, $vmid, $fmt, $name, $size);
+ $class->format_qcow2($storeid, $scfg, $name, $size) if $fmt eq 'qcow2';
+ return $name;
+}
+
+sub alloc_new_image {
+ my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
die "illegal name '$name' - should be 'vm-$vmid-*'\n"
if $name && $name !~ m/^vm-$vmid-/;
@@ -376,16 +393,45 @@ sub alloc_image {
my $free = int($vgs->{$vg}->{free});
+
+ #add extra space for qcow2 metadatas
+ #without sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size
+ #with sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size / 16
+ #4MB overhead for 1TB with extented l2 clustersize=128k
+
+ my $qcow2_overhead = ceil($size/1024/1024/1024) * 4096;
+
+ my $lvmsize = $size;
+ $lvmsize += $qcow2_overhead if $fmt eq 'qcow2';
+
die "not enough free space ($free < $size)\n" if $free < $size;
- $name = $class->find_free_diskname($storeid, $scfg, $vmid)
+ $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt)
if !$name;
- lvcreate($vg, $name, $size, ["pve-vm-$vmid"]);
-
+ my $tags = ["pve-vm-$vmid"];
+ push @$tags, "\@pve-$name" if $fmt eq 'qcow2';
+ lvcreate($vg, $name, $lvmsize, $tags);
return $name;
}
+sub format_qcow2 {
+ my ($class, $storeid, $scfg, $name, $size, $backing_file) = @_;
+
+ # activate volume
+ $class->activate_volume($storeid, $scfg, $name, undef, {});
+ my $path = $class->path($scfg, $name, $storeid);
+ # create the qcow2 fs
+ my $cmd = ['/usr/bin/qemu-img', 'create'];
+ push @$cmd, '-b', $backing_file, '-F', 'qcow2' if $backing_file;
+ push @$cmd, '-f', 'qcow2', $path;
+ push @$cmd, "${size}K" if $size;
+ my $options = "extended_l2=on,";
+ $options .= PVE::Storage::Plugin::preallocation_cmd_option($scfg, 'qcow2');
+ push @$cmd, '-o', $options;
+ run_command($cmd);
+}
+
sub free_image {
my ($class, $storeid, $scfg, $volname, $isBase) = @_;
@@ -536,6 +582,12 @@ sub activate_volume {
my $lvm_activate_mode = 'ey';
+ #activate volume && all snapshots volumes by tag
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ $path = "\@pve-$name" if $format eq 'qcow2';
+
my $cmd = ['/sbin/lvchange', "-a$lvm_activate_mode", $path];
run_command($cmd, errmsg => "can't activate LV '$path'");
$cmd = ['/sbin/lvchange', '--refresh', $path];
@@ -548,6 +600,10 @@ sub deactivate_volume {
my $path = $class->path($scfg, $volname, $storeid, $snapname);
return if ! -b $path;
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+ $path = "\@pve-$name" if $format eq 'qcow2';
+
my $cmd = ['/sbin/lvchange', '-aln', $path];
run_command($cmd, errmsg => "can't deactivate LV '$path'");
}
@@ -555,15 +611,27 @@ sub deactivate_volume {
sub volume_resize {
my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
- $size = ($size/1024/1024) . "M";
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ my $lvmsize = $size / 1024;
+ my $qcow2_overhead = ceil($size/1024/1024/1024/1024) * 4096;
+ $lvmsize += $qcow2_overhead if $format eq 'qcow2';
+ $lvmsize = "${lvmsize}k";
my $path = $class->path($scfg, $volname);
- my $cmd = ['/sbin/lvextend', '-L', $size, $path];
+ my $cmd = ['/sbin/lvextend', '-L', $lvmsize, $path];
$class->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
run_command($cmd, errmsg => "error resizing volume '$path'");
});
+ if(!$running && $format eq 'qcow2') {
+ my $prealloc_opt = PVE::Storage::Plugin::preallocation_cmd_option($scfg, $format);
+ my $cmd = ['/usr/bin/qemu-img', 'resize', "--$prealloc_opt", '-f', $format, $path , $size];
+ run_command($cmd, timeout => 10);
+ }
+
return 1;
}
@@ -585,30 +653,149 @@ sub volume_size_info {
sub volume_snapshot {
my ($class, $scfg, $storeid, $volname, $snap) = @_;
- die "lvm snapshot is not implemented";
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ die "can't snapshot this image format\n" if $format ne 'qcow2';
+
+ $class->activate_volume($storeid, $scfg, $volname, undef, {});
+
+ my $snap_volname = $class->get_snap_volname($volname, $snap);
+ my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
+
+ my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
+
+ #rename current lvm volume to snap volume
+ my $vg = $scfg->{vgname};
+ print"rename $volname to $snap_volname\n";
+ eval { lvrename($vg, $volname, $snap_volname) } ;
+
+
+ #allocate a new lvm volume
+ $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024);
+ eval {
+ $class->format_qcow2($storeid, $scfg, $volname, undef, $snap_path);
+ };
+
+ if ($@) {
+ eval { $class->free_image($storeid, $scfg, $volname, 0) };
+ warn $@ if $@;
+ }
+}
+
+sub volume_rollback_is_possible {
+ my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
+
+ my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
+
+ my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+ my $parent_snap = $snapshots->{current}->{parent};
+
+ return 1 if !-e $snap_path || $snapshots->{$parent_snap}->{file} eq $snap_path;
+ die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
+
+ return 1;
}
+
sub volume_snapshot_rollback {
my ($class, $scfg, $storeid, $volname, $snap) = @_;
- die "lvm snapshot rollback is not implemented";
+ die "can't rollback snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ $class->activate_volume($storeid, $scfg, $volname, undef, {});
+ my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
+ my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
+
+ #simply delete the current snapshot and recreate it
+ $class->free_image($storeid, $scfg, $volname, 0);
+ $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024);
+ $class->format_qcow2($storeid, $scfg, $volname, undef, $snap_path);
+
+ return undef;
}
sub volume_snapshot_delete {
- my ($class, $scfg, $storeid, $volname, $snap) = @_;
+ my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
+
+ die "can't delete snapshot for this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
+
+ return 1 if $running;
+
+ my $cmd = "";
+ my $path = $class->filesystem_path($scfg, $volname);
+
+
+ my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+ my $snap_path = $snapshots->{$snap}->{file};
+ my $snap_volname = $snapshots->{$snap}->{volname};
+ return if !-e $snap_path; #already deleted ?
+
+ my $parent_snap = $snapshots->{$snap}->{parent};
+ my $child_snap = $snapshots->{$snap}->{child};
+
+ my $parent_path = $snapshots->{$parent_snap}->{file} if $parent_snap;
+ my $child_path = $snapshots->{$child_snap}->{file} if $child_snap;
+ my $child_volname = $snapshots->{$child_snap}->{volname} if $child_snap;
+
+
+ #if first snapshot, we merge child, and rename the snapshot to child
+ if(!$parent_snap) {
+ #we use commit here, as it's faster than rebase
+ #https://lists.gnu.org/archive/html/qemu-discuss/2019-08/msg00041.html
+ print"commit $child_path\n";
+ $cmd = ['/usr/bin/qemu-img', 'commit', $child_path];
+ run_command($cmd);
+ print"delete $child_volname\n";
+ $class->free_image($storeid, $scfg, $child_volname, 0);
+
+ print"rename $snap_volname to $child_volname\n";
+ my $vg = $scfg->{vgname};
+ lvrename($vg, $snap_volname, $child_volname);
+ } else {
+ print"commit $snap_path\n";
+ $cmd = ['/usr/bin/qemu-img', 'commit', $snap_path];
+ #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
+ die "missing parentsnap snapshot to rebase child $child_path\n" if !$parent_path;
+ print "link $child_snap to $parent_snap\n";
+ $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parent_path, '-F', 'qcow2', '-f', 'qcow2', $child_path];
+ run_command($cmd);
+ #delete the snapshot
+ $class->free_image($storeid, $scfg, $snap_volname, 0);
+ }
- die "lvm snapshot delete is not implemented";
}
sub volume_has_feature {
my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
my $features = {
- copy => { base => 1, current => 1},
- rename => {current => 1},
+ copy => {
+ base => { qcow2 => 1, raw => 1},
+ current => { qcow2 => 1, raw => 1},
+ snap => { qcow2 => 1 },
+ },
+ 'rename' => {
+ current => { qcow2 => 1, raw => 1},
+ },
+ snapshot => {
+ current => { qcow2 => 1 },
+ snap => { qcow2 => 1 },
+ },
+ template => {
+ current => { qcow2 => 1, raw => 1},
+ },
+# don't allow to clone as we can't activate the base on multiple host at the same time
+# clone => {
+# base => { qcow2 => 1, raw => 1},
+# },
};
- my ($vtype, $name, $vmid, $basename, $basevmid, $isBase) =
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
$class->parse_volname($volname);
my $key = undef;
@@ -617,7 +804,7 @@ sub volume_has_feature {
}else{
$key = $isBase ? 'base' : 'current';
}
- return 1 if $features->{$feature}->{$key};
+ return 1 if defined($features->{$feature}->{$key}->{$format});
return undef;
}
@@ -738,4 +925,12 @@ sub rename_volume {
return "${storeid}:${target_volname}";
}
+sub get_snap_volname {
+ my ($class, $volname, $snapname) = @_;
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
+ $name = !$snapname || $snapname eq 'current' ? $volname : "snap-$snapname-$name";
+ return $name;
+}
+
1;
--
2.39.5
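The sizing heuristic in `alloc_image` above reserves 4 MiB of qcow2 metadata headroom per started TiB of guest data (the patch comment's figure for `extended_l2=on` with 128k clusters). A sketch of that computation, assuming `size_kib` is in KiB as in `alloc_image` (`volume_resize` derives the same figure from a byte count):

```python
from math import ceil

def lvm_size_for_qcow2(size_kib):
    """LV size (KiB) needed to hold a qcow2 image with `size_kib` KiB of
    guest-visible data, following the heuristic in alloc_image above:
    reserve 4 MiB (4096 KiB) of qcow2 metadata per started TiB."""
    overhead_kib = ceil(size_kib / 1024 / 1024 / 1024) * 4096
    return size_kib + overhead_kib

# a 1 TiB disk gets one 4 MiB chunk of metadata headroom
print(lvm_size_for_qcow2(1024**3))  # → 1073745920
```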
* [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (4 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 14:26 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
` (8 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 5878 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
Date: Mon, 16 Dec 2024 10:12:20 +0100
Message-ID: <20241216091229.3142660-7-alexandre.derumier@groupe-cyllene.com>
fixme/testme :
PVE/VZDump/QemuServer.pm: eval { PVE::QemuServer::qemu_drivedel($vmid, "tpmstate0-backup"); };
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 64 +++++++++++++++++++++++++++++++++--------------
1 file changed, 45 insertions(+), 19 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 2832ed09..baf78ec0 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -1582,6 +1582,42 @@ sub print_drive_throttle_group {
return $throttle_group;
}
+sub generate_throttle_group {
+ my ($drive) = @_;
+
+ my $drive_id = get_drive_id($drive);
+
+ my $throttle_group = { id => "throttle-drive-$drive_id" };
+ my $limits = {};
+
+ foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
+ my ($dir, $qmpname) = @$type;
+
+ if (my $v = $drive->{"mbps$dir"}) {
+ $limits->{"bps$qmpname"} = int($v*1024*1024);
+ }
+ if (my $v = $drive->{"mbps${dir}_max"}) {
+ $limits->{"bps$qmpname-max"} = int($v*1024*1024);
+ }
+ if (my $v = $drive->{"bps${dir}_max_length"}) {
+ $limits->{"bps$qmpname-max-length"} = int($v)
+ }
+ if (my $v = $drive->{"iops${dir}"}) {
+ $limits->{"iops$qmpname"} = int($v);
+ }
+ if (my $v = $drive->{"iops${dir}_max"}) {
+ $limits->{"iops$qmpname-max"} = int($v);
+ }
+ if (my $v = $drive->{"iops${dir}_max_length"}) {
+ $limits->{"iops$qmpname-max-length"} = int($v);
+ }
+ }
+
+ $throttle_group->{limits} = $limits;
+
+ return $throttle_group;
+}
+
sub generate_file_blockdev {
my ($storecfg, $drive, $nodename) = @_;
@@ -4595,32 +4631,22 @@ sub qemu_iothread_del {
}
sub qemu_driveadd {
- my ($storecfg, $vmid, $device) = @_;
+ my ($storecfg, $vmid, $drive) = @_;
- my $kvmver = get_running_qemu_version($vmid);
- my $io_uring = min_version($kvmver, 6, 0);
- my $drive = print_drive_commandline_full($storecfg, $vmid, $device, undef, $io_uring);
- $drive =~ s/\\/\\\\/g;
- my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto \"$drive\"", 60);
-
- # If the command succeeds qemu prints: "OK"
- return 1 if $ret =~ m/OK/s;
+ my $drive_id = get_drive_id($drive);
+ my $throttle_group = generate_throttle_group($drive);
+ mon_cmd($vmid, 'object-add', "qom-type" => "throttle-group", %$throttle_group);
- die "adding drive failed: $ret\n";
+ my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive);
+ mon_cmd($vmid, 'blockdev-add', %$blockdev, timeout => 10 * 60);
+ return 1;
}
sub qemu_drivedel {
my ($vmid, $deviceid) = @_;
- my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_del drive-$deviceid", 10 * 60);
- $ret =~ s/^\s+//;
-
- return 1 if $ret eq "";
-
- # NB: device not found errors mean the drive was auto-deleted and we ignore the error
- return 1 if $ret =~ m/Device \'.*?\' not found/s;
-
- die "deleting drive $deviceid failed : $ret\n";
+ mon_cmd($vmid, 'blockdev-del', 'node-name' => "drive-$deviceid", timeout => 10 * 60);
+ mon_cmd($vmid, 'object-del', id => "throttle-drive-$deviceid");
}
sub qemu_deviceaddverify {
--
2.39.5
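The `generate_throttle_group` helper added above translates the PVE drive throttle options into QMP `throttle-group` limits. A Python rendering of a subset of that mapping (key names follow the patch; the `*_max_length` keys are handled the same way and are omitted here for brevity):

```python
def throttle_limits(drive):
    """Map PVE drive throttle settings to QMP throttle-group limits,
    mirroring the loop in generate_throttle_group above (sketch only)."""
    limits = {}
    for d, qmp in (("", "-total"), ("_rd", "-read"), ("_wr", "-write")):
        if v := drive.get(f"mbps{d}"):
            limits[f"bps{qmp}"] = int(v * 1024 * 1024)   # MB/s -> bytes/s
        if v := drive.get(f"mbps{d}_max"):
            limits[f"bps{qmp}-max"] = int(v * 1024 * 1024)
        if v := drive.get(f"iops{d}"):
            limits[f"iops{qmp}"] = int(v)
        if v := drive.get(f"iops{d}_max"):
            limits[f"iops{qmp}-max"] = int(v)
    return limits

print(throttle_limits({"mbps_rd": 50, "iops_wr_max": 2000}))
# → {'bps-read': 52428800, 'iops-write-max': 2000}
```

As in the Perl original, an unset or zero value simply leaves the corresponding limit out, so an empty drive config yields unlimited I/O.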
* [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (5 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
` (7 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 5021 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots
Date: Mon, 16 Dec 2024 10:12:21 +0100
Message-ID: <20241216091229.3142660-8-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
src/PVE/Storage.pm | 18 +++++++++++++++++-
src/test/run_test_zfspoolplugin.pl | 18 ++++++++++++++++++
2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
index 3b4f041..798544b 100755
--- a/src/PVE/Storage.pm
+++ b/src/PVE/Storage.pm
@@ -1052,7 +1052,23 @@ sub vdisk_free {
my (undef, undef, undef, undef, undef, $isBase, $format) =
$plugin->parse_volname($volname);
- $cleanup_worker = $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);
+
+ $cleanup_worker = sub {
+ #remove external snapshots
+ activate_volumes($cfg, [ $volid ]);
+ my $snapshots = PVE::Storage::volume_snapshot_info($cfg, $volid);
+ for my $snapid (sort { $snapshots->{$b}->{order} <=> $snapshots->{$a}->{order} } keys %$snapshots) {
+ my $snap = $snapshots->{$snapid};
+ next if $snapid eq 'current';
+ next if !$snap->{volid};
+ next if !$snap->{ext};
+ my ($snap_storeid, $snap_volname) = parse_volume_id($snap->{volid});
+ my (undef, undef, undef, undef, undef, $snap_isBase, $snap_format) =
+ $plugin->parse_volname($volname);
+ $plugin->free_image($snap_storeid, $scfg, $snap_volname, $snap_isBase, $snap_format);
+ }
+ $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);
+ };
});
return if !$cleanup_worker;
diff --git a/src/test/run_test_zfspoolplugin.pl b/src/test/run_test_zfspoolplugin.pl
index 095ccb3..4ff9f22 100755
--- a/src/test/run_test_zfspoolplugin.pl
+++ b/src/test/run_test_zfspoolplugin.pl
@@ -6,12 +6,30 @@ use strict;
use warnings;
use Data::Dumper qw(Dumper);
+use Test::MockModule;
+
use PVE::Storage;
use PVE::Cluster;
use PVE::Tools qw(run_command);
+use PVE::RPCEnvironment;
use Cwd;
$Data::Dumper::Sortkeys = 1;
+my $rpcenv_module;
+$rpcenv_module = Test::MockModule->new('PVE::RPCEnvironment');
+$rpcenv_module->mock(
+ get_user => sub {
+ return 'root@pam';
+ },
+ fork_worker => sub {
+ my ($self, $dtype, $id, $user, $function, $background) = @_;
+ $function->(123456);
+ return '123456';
+ }
+);
+
+my $rpcenv = PVE::RPCEnvironment->init('pub');
+
my $verbose = undef;
my $storagename = "zfstank99";
--
2.39.5
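The cleanup worker above walks the snapshot list newest-first and frees only external snapshots, skipping the `current` image and anything without a volid. That filtering and ordering can be sketched as follows (the snapshot-info shape is assumed from the patch; the volids below are hypothetical):

```python
def external_snapshots_to_free(snapshots):
    """Volids of external snapshots in deletion order (newest first),
    mirroring the filtering in vdisk_free's cleanup worker above:
    skip 'current', entries without a volid, and internal snapshots."""
    to_free = []
    for snapid in sorted(snapshots, key=lambda s: snapshots[s]["order"], reverse=True):
        snap = snapshots[snapid]
        if snapid == "current" or not snap.get("volid") or not snap.get("ext"):
            continue
        to_free.append(snap["volid"])
    return to_free

# hypothetical snapshot info for a volume with two external snapshots
snaps = {
    "current": {"order": 2, "volid": "lvm:vm-100-disk-0.qcow2", "ext": None},
    "s2": {"order": 1, "volid": "lvm:snap-s2-vm-100-disk-0.qcow2", "ext": 1},
    "s1": {"order": 0, "volid": "lvm:snap-s1-vm-100-disk-0.qcow2", "ext": 1},
}
print(external_snapshots_to_free(snaps))
# → ['lvm:snap-s2-vm-100-disk-0.qcow2', 'lvm:snap-s1-vm-100-disk-0.qcow2']
```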
* [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (6 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 14:31 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
` (6 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 3511 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
Date: Mon, 16 Dec 2024 10:12:22 +0100
Message-ID: <20241216091229.3142660-9-alexandre.derumier@groupe-cyllene.com>
Look at the qdev value instead, as cdrom drives can be empty
without any inserted media.
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index baf78ec0..3b33fd7d 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4425,10 +4425,9 @@ sub vm_devices_list {
}
my $resblock = mon_cmd($vmid, 'query-block');
- foreach my $block (@$resblock) {
- if($block->{device} =~ m/^drive-(\S+)/){
- $devices->{$1} = 1;
- }
+ $resblock = { map { $_->{qdev} => $_ } $resblock->@* };
+ foreach my $blockid (keys %$resblock) {
+ $devices->{$blockid} = 1;
}
my $resmice = mon_cmd($vmid, 'query-mice');
--
2.39.5
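The hunk above re-keys QEMU's `query-block` result by each entry's `qdev` id rather than parsing the legacy `device` name, so devices without an inserted medium are still counted. An illustrative sketch of that logic in Python (the patch itself is Perl; the function name here is hypothetical):

```python
def devices_from_query_block(resblock):
    """Re-key the query-block entries by their 'qdev' id and mark each
    device as present, mirroring the patched vm_devices_list() loop."""
    by_qdev = {entry["qdev"]: entry for entry in resblock}
    return {qdev: 1 for qdev in by_qdev}

# An empty cdrom has no "inserted" key (and hence no drive node to parse),
# but it still carries a qdev id, so it is no longer missed:
devices = devices_from_query_block([
    {"qdev": "scsi0", "inserted": {"node-name": "fmt-drive-scsi0"}},
    {"qdev": "ide2"},  # empty cdrom, invisible to device-name matching
])
```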
* [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (7 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 14:34 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
` (5 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 4217 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert
Date: Mon, 16 Dec 2024 10:12:23 +0100
Message-ID: <20241216091229.3142660-10-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 3b33fd7d..758c8240 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5694,7 +5694,10 @@ sub vmconfig_update_disk {
} else { # cdrom
if ($drive->{file} eq 'none') {
- mon_cmd($vmid, "eject", force => JSON::true, id => "$opt");
+ mon_cmd($vmid, "blockdev-open-tray", force => JSON::true, id => $opt);
+ mon_cmd($vmid, "blockdev-remove-medium", id => $opt);
+ qemu_drivedel($vmid, $opt);
+
if (drive_is_cloudinit($old_drive)) {
vmconfig_register_unused_drive($storecfg, $vmid, $conf, $old_drive);
}
@@ -5702,14 +5705,16 @@ sub vmconfig_update_disk {
my $path = get_iso_path($storecfg, $vmid, $drive->{file});
# force eject if locked
- mon_cmd($vmid, "eject", force => JSON::true, id => "$opt");
+ mon_cmd($vmid, "blockdev-open-tray", force => JSON::true, id => $opt);
+ mon_cmd($vmid, "blockdev-remove-medium", id => $opt);
+ eval { qemu_drivedel($vmid, $opt) };
if ($path) {
- mon_cmd($vmid, "blockdev-change-medium",
- id => "$opt", filename => "$path");
+ qemu_driveadd($storecfg, $vmid, $drive);
+ mon_cmd($vmid, "blockdev-insert-medium", id => $opt, 'node-name' => "drive-$opt");
+ mon_cmd($vmid, "blockdev-close-tray", id => $opt);
}
}
-
return 1;
}
}
--
2.39.5
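The patch above replaces the single legacy `eject` call with an explicit blockdev sequence: open the tray, remove the medium, then (for insert) add the new node, insert it, and close the tray. A minimal Python sketch of that QMP command ordering (names of the QMP commands are from the patch; the helper function itself is hypothetical):

```python
def cdrom_change_commands(dev_id, iso_node=None):
    """Return the ordered QMP command sequence replacing the old 'eject':
    always open the tray and pull the medium; if a new ISO node is given,
    insert it and close the tray again."""
    cmds = [
        ("blockdev-open-tray", {"id": dev_id, "force": True}),
        ("blockdev-remove-medium", {"id": dev_id}),
    ]
    if iso_node:
        cmds += [
            ("blockdev-insert-medium", {"id": dev_id, "node-name": iso_node}),
            ("blockdev-close-tray", {"id": dev_id}),
        ]
    return cmds
```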
* [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (8 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
` (4 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 3266 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev
Date: Mon, 16 Dec 2024 10:12:24 +0100
Message-ID: <20241216091229.3142660-11-alexandre.derumier@groupe-cyllene.com>
We need to use the top block node (throttle) as the node-name
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 758c8240..22b011e1 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4918,7 +4918,7 @@ sub qemu_block_resize {
mon_cmd(
$vmid,
"block_resize",
- device => $deviceid,
+ 'node-name' => $deviceid,
size => int($size),
timeout => 60,
);
--
2.39.5
* [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (9 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
` (3 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 3751 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename
Date: Mon, 16 Dec 2024 10:12:25 +0100
Message-ID: <20241216091229.3142660-12-alexandre.derumier@groupe-cyllene.com>
We now have fixed node names.
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 22b011e1..6bebb906 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6235,20 +6235,15 @@ sub vm_start_nolock {
$migrate_storage_uri = "nbd:${localip}:${storage_migrate_port}";
}
- my $block_info = mon_cmd($vmid, "query-block");
- $block_info = { map { $_->{device} => $_ } $block_info->@* };
-
foreach my $opt (sort keys %$nbd) {
my $drivestr = $nbd->{$opt}->{drivestr};
my $volid = $nbd->{$opt}->{volid};
- my $block_node = $block_info->{"drive-$opt"}->{inserted}->{'node-name'};
-
mon_cmd(
$vmid,
"block-export-add",
id => "drive-$opt",
- 'node-name' => $block_node,
+ 'node-name' => "drive-$opt",
writable => JSON::true,
type => "nbd",
name => "drive-$opt", # NBD export name
--
2.39.5
* [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (10 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 15:19 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
` (2 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 9269 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
Date: Mon, 16 Dec 2024 10:12:26 +0100
Message-ID: <20241216091229.3142660-13-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuMigrate.pm | 2 +-
PVE/QemuServer.pm | 106 +++++++++++++++++++++++++++++++++++----------
2 files changed, 83 insertions(+), 25 deletions(-)
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index ed5ede30..88627ce4 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1134,7 +1134,7 @@ sub phase2 {
my $bitmap = $target->{bitmap};
$self->log('info', "$drive: start migration to $nbd_uri");
- PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
+ PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $source_drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
}
if (PVE::QemuServer::QMPHelpers::runs_at_least_qemu_version($vmid, 8, 2)) {
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 6bebb906..3d7c41ee 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -8184,59 +8184,85 @@ sub qemu_img_convert {
}
sub qemu_drive_mirror {
- my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $completion, $qga, $bwlimit, $src_bitmap) = @_;
+ my ($vmid, $driveid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $completion, $qga, $bwlimit, $src_bitmap) = @_;
$jobs = {} if !$jobs;
+ my $deviceid = "drive-$driveid";
+ my $dst_format;
+ my $dst_path = $dst_volid;
+ my $jobid = "mirror-$deviceid";
+ $jobs->{$jobid} = {};
- my $qemu_target;
- my $format;
- $jobs->{"drive-$drive"} = {};
+ my $storecfg = PVE::Storage::config();
if ($dst_volid =~ /^nbd:/) {
- $qemu_target = $dst_volid;
- $format = "nbd";
+ $dst_format = "nbd";
} else {
- my $storecfg = PVE::Storage::config();
-
- $format = checked_volume_format($storecfg, $dst_volid);
-
- my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
-
- $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
+ $dst_format = checked_volume_format($storecfg, $dst_volid);
+ $dst_path = PVE::Storage::path($storecfg, $dst_volid);
+ }
+
+ # copy original drive config (aio,cache,discard,...)
+ my $dst_drive = dclone($drive);
+ $dst_drive->{format} = $dst_format;
+ $dst_drive->{file} = $dst_path;
+ $dst_drive->{zeroinit} = 1 if $is_zero_initialized;
+ #improve: if target storage don't support aio uring,change it to default native
+ #and remove clone_disk_check_io_uring()
+
+ #add new block device
+ my $nodes = get_blockdev_nodes($vmid);
+
+ my $target_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
+ my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
+ my $target_file_blockdev = generate_file_blockdev($storecfg, $dst_drive, $target_file_nodename);
+ my $target_nodename = undef;
+
+ if ($dst_format eq 'nbd') {
+ #nbd targets don't have a fmt node
+ $target_nodename = $target_file_nodename;
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_file_blockdev);
+ } else {
+ $target_nodename = $target_fmt_nodename;
+ my $target_fmt_blockdev = generate_format_blockdev($storecfg, $dst_drive, $target_fmt_nodename, $target_file_blockdev);
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
}
+ #we replace the original src_fmt node in the blockdev graph
+ my $src_fmt_nodename = find_fmt_nodename_drive($storecfg, $vmid, $drive, $nodes);
my $opts = {
+ 'job-id' => $jobid,
timeout => 10,
- device => "drive-$drive",
- mode => "existing",
+ device => $deviceid,
+ replaces => $src_fmt_nodename,
sync => "full",
- target => $qemu_target,
+ target => $target_nodename,
'auto-dismiss' => JSON::false,
};
- $opts->{format} = $format if $format;
if (defined($src_bitmap)) {
$opts->{sync} = 'incremental';
- $opts->{bitmap} = $src_bitmap;
+ $opts->{bitmap} = $src_bitmap; ##FIXME: how to handle bitmap ? special proxmox patch ?
print "drive mirror re-using dirty bitmap '$src_bitmap'\n";
}
if (defined($bwlimit)) {
$opts->{speed} = $bwlimit * 1024;
- print "drive mirror is starting for drive-$drive with bandwidth limit: ${bwlimit} KB/s\n";
+ print "drive mirror is starting for $deviceid with bandwidth limit: ${bwlimit} KB/s\n";
} else {
- print "drive mirror is starting for drive-$drive\n";
+ print "drive mirror is starting for $deviceid\n";
}
# if a job already runs for this device we get an error, catch it for cleanup
- eval { mon_cmd($vmid, "drive-mirror", %$opts); };
+ eval { mon_cmd($vmid, "blockdev-mirror", %$opts); };
+
if (my $err = $@) {
eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
+ #FIXME: delete blockdev after job cancel
warn "$@\n" if $@;
die "mirroring error: $err\n";
}
-
- qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga);
+ qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga, 'mirror');
}
# $completion can be either
@@ -8595,7 +8621,7 @@ sub clone_disk {
my $sparseinit = PVE::Storage::volume_has_feature($storecfg, 'sparseinit', $newvolid);
if ($use_drive_mirror) {
- qemu_drive_mirror($vmid, $src_drivename, $newvolid, $newvmid, $sparseinit, $jobs,
+ qemu_drive_mirror($vmid, $src_drivename, $drive, $newvolid, $newvmid, $sparseinit, $jobs,
$completion, $qga, $bwlimit);
} else {
if ($dst_drivename eq 'efidisk0') {
@@ -9130,6 +9156,38 @@ sub delete_ifaces_ipams_ips {
}
}
+sub find_fmt_nodename_drive {
+ my ($storecfg, $vmid, $drive, $nodes) = @_;
+
+ my $volid = $drive->{file};
+ my $format = checked_volume_format($storecfg, $volid);
+ my $path = PVE::Storage::path($storecfg, $volid);
+
+ my $node = find_blockdev_node($nodes, $path, 'fmt');
+ return $node->{'node-name'};
+}
+
+sub get_blockdev_nextid {
+ my ($nodename, $nodes) = @_;
+ my $version = 0;
+ for my $nodeid (keys %$nodes) {
+ if ($nodeid =~ m/^$nodename-(\d+)$/) {
+ my $current_version = $1;
+ $version = $current_version if $current_version >= $version;
+ }
+ }
+ $version++;
+ return "$nodename-$version";
+}
+
+sub get_blockdev_nodes {
+ my ($vmid) = @_;
+
+ my $nodes = PVE::QemuServer::Monitor::mon_cmd($vmid, "query-named-block-nodes");
+ $nodes = { map { $_->{'node-name'} => $_ } $nodes->@* };
+ return $nodes;
+}
+
sub encode_json_ordered {
return JSON->new->canonical->allow_nonref->encode( $_[0] );
}
--
2.39.5
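The `get_blockdev_nextid()` helper added above picks a fresh versioned node-name by scanning the existing nodes for `<prefix>-<n>` and returning `<prefix>-<max(n)+1>`. A Python translation of that Perl helper (illustrative, under the assumption that `nodes` is keyed by node-name as in `get_blockdev_nodes()`):

```python
import re

def get_blockdev_nextid(prefix, nodes):
    """Return the next free versioned node-name "<prefix>-<n+1>",
    where n is the highest version among existing node-names."""
    version = 0
    for nodeid in nodes:
        m = re.match(rf"^{re.escape(prefix)}-(\d+)$", nodeid)
        if m:
            version = max(version, int(m.group(1)))
    return f"{prefix}-{version + 1}"
```

For example, with existing nodes `fmt-drive-scsi0-1` and `fmt-drive-scsi0-3`, the next id is `fmt-drive-scsi0-4`; with no matching nodes, it is `fmt-drive-scsi0-1`.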
* [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default.
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (11 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-09 9:51 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 5748 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default.
Date: Mon, 16 Dec 2024 10:12:27 +0100
Message-ID: <20241216091229.3142660-14-alexandre.derumier@groupe-cyllene.com>
This was a limitation of drive-mirror; blockdev-mirror is able
to reopen the target image with a different aio setting
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 41 ++++++++++-------------------------------
1 file changed, 10 insertions(+), 31 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 3d7c41ee..dc12b38f 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -8207,8 +8207,16 @@ sub qemu_drive_mirror {
$dst_drive->{format} = $dst_format;
$dst_drive->{file} = $dst_path;
$dst_drive->{zeroinit} = 1 if $is_zero_initialized;
- #improve: if target storage don't support aio uring,change it to default native
- #and remove clone_disk_check_io_uring()
+
+ #change aio if io_uring is not supported on target
+ if ($dst_drive->{aio} && $dst_drive->{aio} eq 'io_uring') {
+ my ($dst_storeid) = PVE::Storage::parse_volume_id($dst_drive->{file});
+ my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
+ my $cache_direct = drive_uses_cache_direct($dst_drive, $dst_scfg);
+ if(!storage_allows_io_uring_default($dst_scfg, $cache_direct)) {
+ $dst_drive->{aio} = $cache_direct ? 'native' : 'threads';
+ }
+ }
#add new block device
my $nodes = get_blockdev_nodes($vmid);
@@ -8514,33 +8522,6 @@ sub qemu_drive_mirror_switch_to_active_mode {
}
}
-# Check for bug #4525: drive-mirror will open the target drive with the same aio setting as the
-# source, but some storages have problems with io_uring, sometimes even leading to crashes.
-my sub clone_disk_check_io_uring {
- my ($src_drive, $storecfg, $src_storeid, $dst_storeid, $use_drive_mirror) = @_;
-
- return if !$use_drive_mirror;
-
- # Don't complain when not changing storage.
- # Assume if it works for the source, it'll work for the target too.
- return if $src_storeid eq $dst_storeid;
-
- my $src_scfg = PVE::Storage::storage_config($storecfg, $src_storeid);
- my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
-
- my $cache_direct = drive_uses_cache_direct($src_drive);
-
- my $src_uses_io_uring;
- if ($src_drive->{aio}) {
- $src_uses_io_uring = $src_drive->{aio} eq 'io_uring';
- } else {
- $src_uses_io_uring = storage_allows_io_uring_default($src_scfg, $cache_direct);
- }
-
- die "target storage is known to cause issues with aio=io_uring (used by current drive)\n"
- if $src_uses_io_uring && !storage_allows_io_uring_default($dst_scfg, $cache_direct);
-}
-
sub clone_disk {
my ($storecfg, $source, $dest, $full, $newvollist, $jobs, $completion, $qga, $bwlimit) = @_;
@@ -8598,8 +8579,6 @@ sub clone_disk {
$dst_format = 'raw';
$size = PVE::QemuServer::Drive::TPMSTATE_DISK_SIZE;
} else {
- clone_disk_check_io_uring($drive, $storecfg, $src_storeid, $storeid, $use_drive_mirror);
-
$size = PVE::Storage::volume_size_info($storecfg, $drive->{file}, 10);
}
$newvolid = PVE::Storage::vdisk_alloc(
--
2.39.5
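The new target-side check above replaces `clone_disk_check_io_uring()`: instead of refusing the mirror, the requested `aio=io_uring` is silently downgraded when the destination storage is known not to support it. A small Python sketch of that decision (hypothetical function; the storage/cache predicates stand in for `storage_allows_io_uring_default()` and `drive_uses_cache_direct()`):

```python
def pick_target_aio(requested_aio, storage_allows_io_uring, cache_direct):
    """Fallback logic from the patch: if the target drive asks for
    aio=io_uring but the destination storage doesn't allow it, fall
    back to native (O_DIRECT) or threads (buffered I/O)."""
    if requested_aio == "io_uring" and not storage_allows_io_uring:
        return "native" if cache_direct else "threads"
    return requested_aio
```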
* [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (12 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-09 11:57 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 5424 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support
Date: Mon, 16 Dec 2024 10:12:28 +0100
Message-ID: <20241216091229.3142660-15-alexandre.derumier@groupe-cyllene.com>
We need to define node-names for all backing chain images,
to be able to live-rename them with blockdev-reopen.
For linked clones, we don't need to define the base image(s) chain;
they are auto-added with a #block node-name.
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index dc12b38f..3a3feadf 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -1618,6 +1618,38 @@ sub generate_throttle_group {
return $throttle_group;
}
+sub generate_backing_blockdev {
+ my ($storecfg, $snapshots, $deviceid, $drive, $id) = @_;
+
+ my $snapshot = $snapshots->{$id};
+ my $order = $snapshot->{order};
+ my $parentid = $snapshot->{parent};
+ my $snap_fmt_nodename = "fmt-$deviceid-$order";
+ my $snap_file_nodename = "file-$deviceid-$order";
+
+ my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap_file_nodename);
+ $snap_file_blockdev->{filename} = $snapshot->{file};
+ my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_fmt_nodename, $snap_file_blockdev, 1);
+ $snap_fmt_blockdev->{backing} = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
+ return $snap_fmt_blockdev;
+}
+
+sub generate_backing_chain_blockdev {
+ my ($storecfg, $deviceid, $drive) = @_;
+
+ my $volid = $drive->{file};
+ my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid);
+ return if !$do_snapshots_with_qemu || $do_snapshots_with_qemu != 2;
+
+ my $chain_blockdev = undef;
+ PVE::Storage::activate_volumes($storecfg, [$volid]);
+ #should we use qemu config to list snapshots ?
+ my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
+ my $parentid = $snapshots->{'current'}->{parent};
+ $chain_blockdev = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
+ return $chain_blockdev;
+}
+
sub generate_file_blockdev {
my ($storecfg, $drive, $nodename) = @_;
@@ -1816,6 +1848,8 @@ sub generate_drive_blockdev {
my $blockdev_file = generate_file_blockdev($storecfg, $drive, $file_nodename);
my $fmt_nodename = "fmt-drive-$drive_id";
my $blockdev_format = generate_format_blockdev($storecfg, $drive, $fmt_nodename, $blockdev_file, $force_readonly);
+ my $backing_chain = generate_backing_chain_blockdev($storecfg, "drive-$drive_id", $drive);
+ $blockdev_format->{backing} = $backing_chain if $backing_chain;
my $blockdev_live_restore = undef;
if ($live_restore_name) {
--
2.39.5
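`generate_backing_blockdev()` above walks the snapshot parent links recursively, attaching each parent as the `backing` of its child so the whole chain is named explicitly. An illustrative Python rendering of that recursion (simplified: only node-name and filename are modeled, the real Perl code builds full blockdev definitions):

```python
def build_backing_chain(snapshots, device_id, snap_id):
    """Build a nested blockdev definition whose 'backing' key follows
    the snapshot parent links down to the base image, naming each
    level "fmt-<device>-<order>" as in the patch."""
    snap = snapshots[snap_id]
    node = {
        "node-name": f"fmt-{device_id}-{snap['order']}",
        "filename": snap["file"],
    }
    parent = snap.get("parent")
    if parent:
        node["backing"] = build_backing_chain(snapshots, device_id, parent)
    return node
```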
* [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (13 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-09 11:57 ` Fabian Grünbichler
14 siblings, 1 reply; 38+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 18136 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 16 Dec 2024 10:12:29 +0100
Message-ID: <20241216091229.3142660-16-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuConfig.pm | 4 +-
PVE/QemuServer.pm | 345 ++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 335 insertions(+), 14 deletions(-)
diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
index ffdf9f03..c17edb46 100644
--- a/PVE/QemuConfig.pm
+++ b/PVE/QemuConfig.pm
@@ -375,7 +375,7 @@ sub __snapshot_create_vol_snapshot {
print "snapshotting '$device' ($drive->{file})\n";
- PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $volid, $snapname);
+ PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $drive, $snapname);
}
sub __snapshot_delete_remove_drive {
@@ -412,7 +412,7 @@ sub __snapshot_delete_vol_snapshot {
my $storecfg = PVE::Storage::config();
my $volid = $drive->{file};
- PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $volid, $snapname);
+ PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $drive, $snapname);
push @$unused, $volid;
}
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 3a3feadf..f29a8449 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4959,20 +4959,269 @@ sub qemu_block_resize {
}
sub qemu_volume_snapshot {
- my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
+ my ($vmid, $deviceid, $storecfg, $drive, $snap) = @_;
+ my $volid = $drive->{file};
my $running = check_running($vmid);
-
- if ($running && do_snapshots_with_qemu($storecfg, $volid, $deviceid)) {
- mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
+ my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid) if $running;
+ if ($do_snapshots_with_qemu) {
+ if($do_snapshots_with_qemu == 2) {
+ my $snap_path = PVE::Storage::path($storecfg, $volid, $snap);
+ my $path = PVE::Storage::path($storecfg, $volid);
+ blockdev_current_rename($storecfg, $vmid, $deviceid, $drive, $path, $snap_path, 1);
+ blockdev_external_snapshot($storecfg, $vmid, $deviceid, $drive, $snap);
+ } else {
+ mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
+ }
} else {
PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
}
}
+sub blockdev_external_snapshot {
+ my ($storecfg, $vmid, $deviceid, $drive, $snap) = @_;
+
+ my $nodes = get_blockdev_nodes($vmid);
+ my $volid = $drive->{file};
+ my $path = PVE::Storage::path($storecfg, $volid, $snap);
+ my $format_node = find_blockdev_node($nodes, $path, 'fmt');
+ my $format_nodename = $format_node->{'node-name'};
+
+ #preallocate add a new current file
+ my $new_current_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
+ my $new_current_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
+ PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
+ my $new_file_blockdev = generate_file_blockdev($storecfg, $drive, $new_current_file_nodename);
+ my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_current_fmt_nodename, $new_file_blockdev);
+
+ $new_fmt_blockdev->{backing} = undef;
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$new_fmt_blockdev);
+ mon_cmd($vmid, 'blockdev-snapshot', node => $format_nodename, overlay => $new_current_fmt_nodename);
+}
+
+sub blockdev_snap_rename {
+ my ($storecfg, $vmid, $deviceid, $drive, $src_path, $target_path) = @_;
+
+ my $nodes = get_blockdev_nodes($vmid);
+ my $volid = $drive->{file};
+
+ #copy the original drive param and change target file
+ my $target_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
+ my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
+
+ my $src_fmt_node = find_blockdev_node($nodes, $src_path, 'fmt');
+ my $src_fmt_nodename = $src_fmt_node->{'node-name'};
+ my $src_file_node = find_blockdev_node($nodes, $src_path, 'file');
+ my $src_file_nodename = $src_file_node->{'node-name'};
+
+ #untaint
+ if ($src_path =~ m/^(\S+)$/) {
+ $src_path = $1;
+ }
+ if ($target_path =~ m/^(\S+)$/) {
+ $target_path = $1;
+ }
+
+ #create a hardlink
+ link($src_path, $target_path);
+
+ #add new format blockdev
+ my $read_only = 1;
+ my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
+ $target_file_blockdev->{filename} = $target_path;
+ my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_fmt_nodename, $target_file_blockdev, $read_only);
+
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
+
+ #reopen the parent node with different backing file
+ my $parent_fmt_node = find_parent_node($nodes, $src_path);
+ my $parent_fmt_nodename = $parent_fmt_node->{'node-name'};
+ my $parent_path = $parent_fmt_node->{file};
+ my $parent_file_node = find_blockdev_node($nodes, $parent_path, 'file');
+ my $parent_file_nodename = $parent_file_node->{'node-name'};
+ my $filenode_exist = 1;
+ $read_only = $parent_fmt_node->{ro};
+ my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_fmt_nodename, $parent_file_nodename, $read_only);
+ $parent_fmt_blockdev->{backing} = $target_fmt_nodename;
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$parent_fmt_blockdev]);
+
+ #change backing-file in qcow2 metadatas
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'change-backing-file', device => $deviceid, 'image-node-name' => $parent_fmt_nodename, 'backing-file' => $target_path);
+
+ # file blockdevs seem to be auto-removed if they were created online, but not if they were created at start via the command line
+ eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_file_nodename) };
+ eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_fmt_nodename) };
+
+ #delete old $path link
+ unlink($src_path);
+
+ #rename underlay
+ my $storage_name = PVE::Storage::parse_volume_id($volid);
+ my $scfg = $storecfg->{ids}->{$storage_name};
+ if ($scfg->{type} eq 'lvm') {
+ print"lvrename $src_path to $target_path\n";
+ run_command(
+ ['/sbin/lvrename', $src_path, $target_path],
+ errmsg => "lvrename $src_path to $target_path error",
+ );
+ }
+}
+
+sub blockdev_current_rename {
+ my ($storecfg, $vmid, $deviceid, $drive, $path, $target_path, $skip_underlay) = @_;
+ ## rename current running image
+
+ my $nodes = get_blockdev_nodes($vmid);
+ my $volid = $drive->{file};
+ my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
+
+ my $file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
+ $file_blockdev->{filename} = $target_path;
+
+ my $format_node = find_blockdev_node($nodes, $path, 'fmt');
+ my $format_nodename = $format_node->{'node-name'};
+
+ my $file_node = find_blockdev_node($nodes, $path, 'file');
+ my $file_nodename = $file_node->{'node-name'};
+
+ my $backingfile = $format_node->{image}->{'backing-filename'};
+ my $backing_node = $backingfile ? find_blockdev_node($nodes, $backingfile, 'fmt') : undef;
+
+ #create a hardlink
+ link($path, $target_path);
+ #add new file blockdev
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$file_blockdev);
+
+ #reopen the current fmt nodename with a new file nodename
+ my $reopen_blockdev = generate_format_blockdev($storecfg, $drive, $format_nodename, $target_file_nodename);
+ $reopen_blockdev->{backing} = $backing_node->{'node-name'};
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$reopen_blockdev]);
+
+ # delete old file blockdev
+ # it seems the old file blockdev is auto-removed after reopen if the file nodename was autogenerated with #block ?
+ eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_nodename) };
+
+ unlink($path);
+
+ #skip_underlay: lvm will be renamed later in Storage::volume_snapshot
+ return if $skip_underlay;
+
+ #rename underlay
+ my $storage_name = PVE::Storage::parse_volume_id($volid);
+ my $scfg = $storecfg->{ids}->{$storage_name};
+ if ($scfg->{type} eq 'lvm') {
+ print"lvrename $path to $target_path\n";
+ run_command(
+ ['/sbin/lvrename', $path, $target_path],
+ errmsg => "lvrename $path to $target_path error",
+ );
+ }
+}
+
+sub blockdev_commit {
+ my ($storecfg, $vmid, $deviceid, $drive, $top_path, $base_path) = @_;
+
+ my $nodes = get_blockdev_nodes($vmid);
+ my $volid = $drive->{file};
+
+ #untaint
+ if ($top_path =~ m/^(\S+)$/) {
+ $top_path = $1;
+ }
+
+ print "block-commit top:$top_path to base:$base_path\n";
+ my $job_id = "commit-$deviceid";
+ my $jobs = {};
+
+ my $base_node = find_blockdev_node($nodes, $base_path, 'fmt');
+ my $top_node = find_blockdev_node($nodes, $top_path, 'fmt');
+
+ my $options = { 'job-id' => $job_id, device => $deviceid };
+ $options->{'top-node'} = $top_node->{'node-name'};
+ $options->{'base-node'} = $base_node->{'node-name'};
+
+ mon_cmd($vmid, 'block-commit', %$options);
+ $jobs->{$job_id} = {};
+
+ qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'commit');
+
+ #remove fmt-blockdev, file-blockdev && file
+ my $fmt_node = find_blockdev_node($nodes, $top_path, 'fmt');
+ my $fmt_nodename = $fmt_node->{'node-name'};
+ eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $fmt_nodename) };
+
+ my $file_node = find_blockdev_node($nodes, $top_path, 'file');
+ my $file_nodename = $file_node->{'node-name'};
+ eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_nodename) };
+
+ my $storage_name = PVE::Storage::parse_volume_id($volid);
+ my $scfg = $storecfg->{ids}->{$storage_name};
+ if ($scfg->{type} eq 'lvm') {
+ print"lvremove $top_path\n";
+ run_command(
+ ['/sbin/lvremove', '-f', $top_path],
+ errmsg => "lvremove $top_path",
+ );
+ } else {
+ unlink($top_path);
+ }
+
+}
+
+sub blockdev_live_commit {
+ my ($storecfg, $vmid, $deviceid, $drive, $current_path, $snapshot_path) = @_;
+
+ my $nodes = get_blockdev_nodes($vmid);
+ my $volid = $drive->{file};
+
+ #untaint
+ if ($current_path =~ m/^(\S+)$/) {
+ $current_path = $1;
+ }
+
+ print "live block-commit top:$current_path to base:$snapshot_path\n";
+ my $job_id = "commit-$deviceid";
+ my $jobs = {};
+
+ my $snapshot_node = find_blockdev_node($nodes, $snapshot_path, 'fmt');
+ my $snapshot_file_node = find_blockdev_node($nodes, $current_path, 'file');
+ my $current_node = find_blockdev_node($nodes, $current_path, 'fmt');
+
+ my $opts = { 'job-id' => $job_id,
+ device => $deviceid,
+ 'base-node' => $snapshot_node->{'node-name'},
+ replaces => $current_node->{'node-name'}
+ };
+ mon_cmd($vmid, "block-commit", %$opts);
+ $jobs->{$job_id} = {};
+
+ qemu_drive_mirror_monitor ($vmid, undef, $jobs, 'complete', 0, 'commit');
+
+ eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $current_node->{'node-name'}) };
+
+ my $storage_name = PVE::Storage::parse_volume_id($volid);
+ my $scfg = $storecfg->{ids}->{$storage_name};
+ if ($scfg->{type} eq 'lvm') {
+ print"lvremove $current_path\n";
+ run_command(
+ ['/sbin/lvremove', '-f', $current_path],
+ errmsg => "lvremove $current_path",
+ );
+ } else {
+ unlink($current_path);
+ }
+
+ return;
+
+}
+
sub qemu_volume_snapshot_delete {
- my ($vmid, $storecfg, $volid, $snap) = @_;
+ my ($vmid, $storecfg, $drive, $snap) = @_;
+ my $volid = $drive->{file};
my $running = check_running($vmid);
my $attached_deviceid;
@@ -4984,13 +5233,51 @@ sub qemu_volume_snapshot_delete {
});
}
- if ($attached_deviceid && do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid)) {
- mon_cmd(
- $vmid,
- 'blockdev-snapshot-delete-internal-sync',
- device => $attached_deviceid,
- name => $snap,
- );
+ my $do_snapshots_with_qemu = $running ? do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid) : undef;
+ if ($attached_deviceid && $do_snapshots_with_qemu) {
+
+ if ($do_snapshots_with_qemu == 2) {
+
+ my $path = PVE::Storage::path($storecfg, $volid);
+ my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
+
+ my $snappath = $snapshots->{$snap}->{file};
+ return if !-e $snappath; #already deleted ?
+
+ my $parentsnap = $snapshots->{$snap}->{parent};
+ my $childsnap = $snapshots->{$snap}->{child};
+
+ my $parentpath = $parentsnap ? $snapshots->{$parentsnap}->{file} : undef;
+ my $childpath = $childsnap ? $snapshots->{$childsnap}->{file} : undef;
+
+ #if first snapshot
+ if(!$parentsnap) {
+ print"delete first snapshot $childpath\n";
+ if($childpath eq $path) {
+ #if child is the current (last snapshot), we need to a live active-commit
+ print"commit first snapshot $snappath to current $path\n";
+ blockdev_live_commit($storecfg, $vmid, $attached_deviceid, $drive, $childpath, $snappath);
+ print" rename $snappath to $path\n";
+ blockdev_current_rename($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $path);
+ } else {
+ print"commit first snapshot $snappath to $childpath path\n";
+ blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $childpath, $snappath);
+ print" rename $snappath to $childpath\n";
+ blockdev_snap_rename($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $childpath);
+ }
+ } else {
+ #intermediate snapshot, we just need to commit the snapshot
+ print"commit intermediate snapshot $snappath to $parentpath\n";
+ blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $parentpath, 'auto');
+ }
+ } else {
+ mon_cmd(
+ $vmid,
+ 'blockdev-snapshot-delete-internal-sync',
+ device => $attached_deviceid,
+ name => $snap,
+ );
+ }
} else {
PVE::Storage::volume_snapshot_delete(
$storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
@@ -8066,6 +8353,8 @@ sub do_snapshots_with_qemu {
return 1;
}
+ return 2 if $scfg->{snapext} || ($scfg->{type} eq 'lvm' && $volid =~ m/\.qcow2$/);
+
if ($volid =~ m/\.(qcow2|qed)$/){
return 1;
}
@@ -9169,6 +9458,38 @@ sub delete_ifaces_ipams_ips {
}
}
+sub find_blockdev_node {
+ my ($nodes, $path, $type) = @_;
+
+ my $found_nodeid = undef;
+ my $found_node = undef;
+ for my $nodeid (keys %$nodes) {
+ my $node = $nodes->{$nodeid};
+ if ($nodeid =~ m/^$type-(\S+)$/ && $node->{file} eq $path) {
+ $found_node = $node;
+ last;
+ }
+ }
+ die "can't found nodeid for file $path\n" if !$found_node;
+ return $found_node;
+}
+
+sub find_parent_node {
+ my ($nodes, $backing_path) = @_;
+
+ my $found_nodeid = undef;
+ my $found_node = undef;
+ for my $nodeid (keys %$nodes) {
+ my $node = $nodes->{$nodeid};
+ if ($nodeid =~ m/^fmt-(\S+)$/ && $node->{backing_file} && $node->{backing_file} eq $backing_path) {
+ $found_node = $node;
+ last;
+ }
+ }
+ die "can't found nodeid for file $backing_path\n" if !$found_node;
+ return $found_node;
+}
+
sub find_fmt_nodename_drive {
my ($storecfg, $vmid, $drive, $nodes) = @_;
--
2.39.5
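The branching added in qemu_volume_snapshot_delete above reduces to three cases; the decision can be sketched compactly like this (function name and paths are illustrative, the logic paraphrases the three branches of the patch):

```python
# Decide how to delete external snapshot 'snap' in a backing chain.
# Returns the action plus the (top, base) paths handed to the commit
# helpers: live commit of the active layer, commit of the first
# snapshot into its child, or commit of an intermediate snapshot
# into its parent.
def plan_snapshot_delete(snapshots, snap, current_path):
    info = snapshots[snap]
    parent, child = info.get('parent'), info.get('child')
    if parent is None:
        # first snapshot in the chain
        childpath = snapshots[child]['file']
        if childpath == current_path:
            # child is the active image: live active-commit, then rename
            return ('live-commit', childpath, info['file'])
        return ('commit-to-child', childpath, info['file'])
    # intermediate snapshot: commit it into its parent
    return ('commit-to-parent', info['file'], snapshots[parent]['file'])

chain = {
    'snap1': {'file': '/images/vm-disk-snap1.qcow2', 'child': 'current'},
    'current': {'file': '/images/vm-disk.qcow2', 'parent': 'snap1'},
}
print(plan_snapshot_delete(chain, 'snap1', '/images/vm-disk.qcow2'))
```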
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
2024-12-16 9:12 ` [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
@ 2025-01-08 13:27 ` Fabian Grünbichler
2025-01-10 7:55 ` DERUMIER, Alexandre via pve-devel
[not found] ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
0 siblings, 2 replies; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 13:27 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
> This is needed for external snapshot live commit,
> when the top blocknode is not the fmt-node.
> (in our case, the throttle-group node is the topnode)
so this is needed to workaround a limitation in block-commit? I think if we need this it should probably be submitted upstream for inclusion, or we provide our own copy of block-commit with it in the meantime?
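For reference, the shape of the extended QMP call this patch enables — a minimal sketch only, with job and node names assumed from the qemu-server side of the series, not taken from the patch itself:

```python
import json

# Build the QMP 'block-commit' command for a live commit of the active
# layer into its backing node, with the new optional 'replaces' key
# naming the graph node to be replaced by the base node on completion.
def build_block_commit(job_id, device, base_node, replaces=None):
    args = {
        'job-id': job_id,
        'device': device,
        'base-node': base_node,
    }
    if replaces is not None:
        args['replaces'] = replaces  # the option added by this patch
    return {'execute': 'block-commit', 'arguments': args}

cmd = build_block_commit('commit-drive-scsi0', 'drive-scsi0',
                         'fmt-drive-scsi0-snap', replaces='fmt-drive-scsi0')
print(json.dumps(cmd, sort_keys=True))
```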
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> ...052-block-commit-add-replaces-option.patch | 137 ++++++++++++++++++
> debian/patches/series | 1 +
> 2 files changed, 138 insertions(+)
> create mode 100644 debian/patches/pve/0052-block-commit-add-replaces-option.patch
>
> diff --git a/debian/patches/pve/0052-block-commit-add-replaces-option.patch b/debian/patches/pve/0052-block-commit-add-replaces-option.patch
> new file mode 100644
> index 0000000..2488b5b
> --- /dev/null
> +++ b/debian/patches/pve/0052-block-commit-add-replaces-option.patch
> @@ -0,0 +1,137 @@
> +From ae39fd3bb72db440cf380978af9bf5693c12ac6c Mon Sep 17 00:00:00 2001
> +From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> +Date: Wed, 11 Dec 2024 16:20:25 +0100
> +Subject: [PATCH] block-commit: add replaces option
> +
> +This uses the same code as drive-mirror for live commit, but the option
> +is currently not sent.
> +
> +Allow replacing a different node than the root node after the block-commit
> +(as we use throttle-group as root, and not the drive)
> +---
> + block/mirror.c | 4 ++--
> + block/replication.c | 2 +-
> + blockdev.c | 4 ++--
> + include/block/block_int-global-state.h | 4 +++-
> + qapi/block-core.json | 5 ++++-
> + qemu-img.c | 2 +-
> + 6 files changed, 13 insertions(+), 8 deletions(-)
> +
> +diff --git a/block/mirror.c b/block/mirror.c
> +index 2f12238..1a5e528 100644
> +--- a/block/mirror.c
> ++++ b/block/mirror.c
> +@@ -2086,7 +2086,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
> + int64_t speed, BlockdevOnError on_error,
> + const char *filter_node_name,
> + BlockCompletionFunc *cb, void *opaque,
> +- bool auto_complete, Error **errp)
> ++ bool auto_complete, const char *replaces, Error **errp)
> + {
> + bool base_read_only;
> + BlockJob *job;
> +@@ -2102,7 +2102,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
> + }
> +
> + job = mirror_start_job(
> +- job_id, bs, creation_flags, base, NULL, speed, 0, 0,
> ++ job_id, bs, creation_flags, base, replaces, speed, 0, 0,
> + MIRROR_LEAVE_BACKING_CHAIN, false,
> + on_error, on_error, true, cb, opaque,
> + &commit_active_job_driver, MIRROR_SYNC_MODE_FULL,
> +diff --git a/block/replication.c b/block/replication.c
> +index 0415a5e..debbe25 100644
> +--- a/block/replication.c
> ++++ b/block/replication.c
> +@@ -711,7 +711,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
> + s->commit_job = commit_active_start(
> + NULL, bs->file->bs, s->secondary_disk->bs,
> + JOB_INTERNAL, 0, BLOCKDEV_ON_ERROR_REPORT,
> +- NULL, replication_done, bs, true, errp);
> ++ NULL, replication_done, bs, true, NULL, errp);
> + bdrv_graph_rdunlock_main_loop();
> + break;
> + default:
> +diff --git a/blockdev.c b/blockdev.c
> +index cbe2243..349fb71 100644
> +--- a/blockdev.c
> ++++ b/blockdev.c
> +@@ -2435,7 +2435,7 @@ void qmp_block_commit(const char *job_id, const char *device,
> + const char *filter_node_name,
> + bool has_auto_finalize, bool auto_finalize,
> + bool has_auto_dismiss, bool auto_dismiss,
> +- Error **errp)
> ++ const char *replaces, Error **errp)
> + {
> + BlockDriverState *bs;
> + BlockDriverState *iter;
> +@@ -2596,7 +2596,7 @@ void qmp_block_commit(const char *job_id, const char *device,
> + job_id = bdrv_get_device_name(bs);
> + }
> + commit_active_start(job_id, top_bs, base_bs, job_flags, speed, on_error,
> +- filter_node_name, NULL, NULL, false, &local_err);
> ++ filter_node_name, NULL, NULL, false, replaces, &local_err);
> + } else {
> + BlockDriverState *overlay_bs = bdrv_find_overlay(bs, top_bs);
> + if (bdrv_op_is_blocked(overlay_bs, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
> +diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
> +index f0c642b..194b580 100644
> +--- a/include/block/block_int-global-state.h
> ++++ b/include/block/block_int-global-state.h
> +@@ -115,6 +115,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
> + * @cb: Completion function for the job.
> + * @opaque: Opaque pointer value passed to @cb.
> + * @auto_complete: Auto complete the job.
> ++ * @replaces: Block graph node name to replace once the commit is done.
> + * @errp: Error object.
> + *
> + */
> +@@ -123,7 +124,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
> + int64_t speed, BlockdevOnError on_error,
> + const char *filter_node_name,
> + BlockCompletionFunc *cb, void *opaque,
> +- bool auto_complete, Error **errp);
> ++ bool auto_complete, const char *replaces,
> ++ Error **errp);
> + /*
> + * mirror_start:
> + * @job_id: The id of the newly-created job, or %NULL to use the
> +diff --git a/qapi/block-core.json b/qapi/block-core.json
> +index ff441d4..50564c7 100644
> +--- a/qapi/block-core.json
> ++++ b/qapi/block-core.json
> +@@ -2098,6 +2098,8 @@
> + # disappear from the query list without user intervention.
> + # Defaults to true. (Since 3.1)
> + #
> ++# @replaces: graph node name to be replaced by the base image node after the commit.
> ++#
> + # Features:
> + #
> + # @deprecated: Members @base and @top are deprecated. Use @base-node
> +@@ -2125,7 +2127,8 @@
> + '*speed': 'int',
> + '*on-error': 'BlockdevOnError',
> + '*filter-node-name': 'str',
> +- '*auto-finalize': 'bool', '*auto-dismiss': 'bool' },
> ++ '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
> ++ '*replaces': 'str' },
> + 'allow-preconfig': true }
> +
> + ##
> +diff --git a/qemu-img.c b/qemu-img.c
> +index a6c88e0..f6c59bc 100644
> +--- a/qemu-img.c
> ++++ b/qemu-img.c
> +@@ -1079,7 +1079,7 @@ static int img_commit(int argc, char **argv)
> +
> + commit_active_start("commit", bs, base_bs, JOB_DEFAULT, rate_limit,
> + BLOCKDEV_ON_ERROR_REPORT, NULL, common_block_job_cb,
> +- &cbi, false, &local_err);
> ++ &cbi, false, NULL, &local_err);
> + if (local_err) {
> + goto done;
> + }
> +--
> +2.39.5
> +
> diff --git a/debian/patches/series b/debian/patches/series
> index 93c97bf..e604a23 100644
> --- a/debian/patches/series
> +++ b/debian/patches/series
> @@ -92,3 +92,4 @@ pve/0048-PVE-backup-fixup-error-handling-for-fleecing.patch
> pve/0049-PVE-backup-factor-out-setting-up-snapshot-access-for.patch
> pve/0050-PVE-backup-save-device-name-in-device-info-structure.patch
> pve/0051-PVE-backup-include-device-name-in-error-when-setting.patch
> +pve/0052-block-commit-add-replaces-option.patch
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
@ 2025-01-08 14:17 ` Fabian Grünbichler
2025-01-10 13:50 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 14:17 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
> The blockdev chain is:
> -throttle-group-node (drive-(ide|scsi|virtio)x)
> - format-node (fmt-drive-x)
> - file-node (file-drive-x)
>
> fixme: implement iscsi:// path
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 351 +++++++++++++++++++++++++++++++---------------
> 1 file changed, 237 insertions(+), 114 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 8192599a..2832ed09 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -1464,7 +1464,8 @@ sub print_drivedevice_full {
> } else {
> $device .= ",bus=ahci$controller.$unit";
> }
> - $device .= ",drive=drive-$drive_id,id=$drive_id";
> + $device .= ",id=$drive_id";
> + $device .= ",drive=drive-$drive_id" if $device_type ne 'cd' || $drive->{file} ne 'none';
is this just because you remove the whole drive when ejecting? not sure whether that is really needed..
>
> if ($device_type eq 'hd') {
> if (my $model = $drive->{model}) {
> @@ -1490,6 +1491,13 @@ sub print_drivedevice_full {
> $device .= ",serial=$serial";
> }
>
> + my $writecache = $drive->{cache} && $drive->{cache} =~ /^(?:none|writeback|unsafe)$/ ? "on" : "off";
> + $device .= ",write-cache=$writecache" if $drive->{media} && $drive->{media} ne 'cdrom';
> +
> + my @qemu_drive_options = qw(heads secs cyls trans rerror werror);
> + foreach my $o (@qemu_drive_options) {
> + $device .= ",$o=$drive->{$o}" if defined($drive->{$o});
> + }
>
> return $device;
> }
> @@ -1539,145 +1547,256 @@ my sub drive_uses_cache_direct {
> return $cache_direct;
> }
>
> -sub print_drive_commandline_full {
> - my ($storecfg, $vmid, $drive, $live_restore_name, $io_uring) = @_;
> +sub print_drive_throttle_group {
> + my ($drive) = @_;
> + #the command line can't use the structured json 'limits' option,
> + #so limit params need to be passed with an x- prefix, as it's an unstable api
this comment should be below the early return, or above the whole sub.
> + return if drive_is_cdrom($drive) && $drive->{file} eq 'none';
is this needed if we keep empty cdrom drives around like before? I know throttling practically makes no sense in that case, but it might make the code in general more simple?
>
> - my $path;
> - my $volid = $drive->{file};
> my $drive_id = get_drive_id($drive);
>
> + my $throttle_group = "throttle-group,id=throttle-drive-$drive_id";
> + foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
> + my ($dir, $qmpname) = @$type;
> +
> + if (my $v = $drive->{"mbps$dir"}) {
> + $throttle_group .= ",x-bps$qmpname=".int($v*1024*1024);
> + }
> + if (my $v = $drive->{"mbps${dir}_max"}) {
> + $throttle_group .= ",x-bps$qmpname-max=".int($v*1024*1024);
> + }
> + if (my $v = $drive->{"bps${dir}_max_length"}) {
> + $throttle_group .= ",x-bps$qmpname-max-length=$v";
> + }
> + if (my $v = $drive->{"iops${dir}"}) {
> + $throttle_group .= ",x-iops$qmpname=$v";
> + }
> + if (my $v = $drive->{"iops${dir}_max"}) {
> + $throttle_group .= ",x-iops$qmpname-max=$v";
> + }
> + if (my $v = $drive->{"iops${dir}_max_length"}) {
> + $throttle_group .= ",x-iops$qmpname-max-length=$v";
> + }
> + }
> +
> + return $throttle_group;
> +}
> +
> +sub generate_file_blockdev {
> + my ($storecfg, $drive, $nodename) = @_;
> +
> + my $volid = $drive->{file};
> my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
> - my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
>
> - if (drive_is_cdrom($drive)) {
> - $path = get_iso_path($storecfg, $vmid, $volid);
> - die "$drive_id: cannot back cdrom drive with a live restore image\n" if $live_restore_name;
> + my $scfg = undef;
> + my $path = $volid;
I think this should only happen if the parse_volume_id above told us this is an absolute path and not a PVE-managed volume..
> + if($storeid && $storeid ne 'nbd') {
this is wrong.. I guess it's also somewhat wrong in the old qemu_drive_mirror code.. we should probably check using a more specific RE that the "volid" is an NBD URI, and not attempt to parse it as a regular volid in that case..
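A stricter check along the lines suggested here could look roughly like this — a hypothetical helper (name and patterns are assumptions) that only treats well-formed NBD pseudo-URIs as NBD and leaves everything else to the regular volid parser:

```python
import re

# Match only the two NBD pseudo-URI forms qemu-server actually emits,
# instead of keying off a parsed "storage id" equal to 'nbd'.
NBD_TCP_RE = re.compile(r'^nbd:(\S+):(\d+):exportname=(\S+)$')
NBD_UNIX_RE = re.compile(r'^nbd:unix:(\S+):exportname=(\S+)$')

def parse_nbd_uri(volid):
    m = NBD_UNIX_RE.match(volid)
    if m:
        return {'server': {'type': 'unix', 'path': m.group(1)},
                'export': m.group(2)}
    m = NBD_TCP_RE.match(volid)
    if m:
        return {'server': {'type': 'inet', 'host': m.group(1), 'port': m.group(2)},
                'export': m.group(3)}
    return None  # not NBD -> treat as a regular volid or path

print(parse_nbd_uri('nbd:unix:/run/pve/mirror.sock:exportname=drive-scsi0'))
print(parse_nbd_uri('local-lvm:vm-100-disk-0'))
```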
> + $scfg = PVE::Storage::storage_config($storecfg, $storeid);
> + $path = PVE::Storage::path($storecfg, $volid);
> + }
> +
> + my $blockdev = {};
> +
> + if ($path =~ m/^rbd:(\S+)$/) {
> +
> + $blockdev->{driver} = 'rbd';
> +
> + my @rbd_options = split(/:/, $1);
> + my $keyring = undef;
> + for my $option (@rbd_options) {
> + if ($option =~ m/^(\S+)=(\S+)$/) {
> + my $key = $1;
> + my $value = $2;
> + $blockdev->{'auth-client-required'} = [$value] if $key eq 'auth_supported';
> + $blockdev->{'conf'} = $value if $key eq 'conf';
> + $blockdev->{'user'} = $value if $key eq 'id';
> + $keyring = $value if $key eq 'keyring';
> + if ($key eq 'mon_host') {
> + my $server = [];
> + my @mons = split(';', $value);
> + for my $mon (@mons) {
> + my ($host, $port) = PVE::Tools::parse_host_and_port($mon);
> + $port = '3300' if !$port;
> + push @$server, { host => $host, port => $port };
> + }
> + $blockdev->{server} = $server;
> + }
> + } elsif ($option =~ m|^(\S+)/(\S+)$|){
> + $blockdev->{pool} = $1;
> + my $image = $2;
> +
> + if($image =~ m|^(\S+)/(\S+)$|) {
> + $blockdev->{namespace} = $1;
> + $blockdev->{image} = $2;
> + } else {
> + $blockdev->{image} = $image;
> + }
> + }
> + }
> +
> + if($keyring && $blockdev->{server}) {
> + #qemu devs removed the ability to pass arbitrary values to the blockdev object, and haven't added
> + #keyring to the list of allowed keys. It needs to be defined in the storage ceph.conf.
> + #https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02676.html
> + #another way could be to simply patch qemu to allow the key
I think we either want to allow the keys we need in Qemu (and upstream that), or we want to write the config out to a temporary config and clean that up after Qemu has read its contents..
> + my $ceph_conf = "/etc/pve/priv/ceph/${storeid}.conf";
this file is already taken for external Ceph clusters, we can't just re-use for this purpose without a lot of side effects I think..
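The second alternative mentioned here could look roughly like the following — write a throwaway config carrying only the keyring, point the blockdev's `conf` at it, and remove it once QEMU has read it (pure sketch; the helper name and lifecycle are assumptions, not part of the series):

```python
import os
import tempfile

# Write a minimal ceph.conf for QEMU to read at blockdev-add time;
# the caller removes it again after QEMU has parsed it.
def write_temp_ceph_conf(keyring_path):
    fd, conf_path = tempfile.mkstemp(prefix='pve-ceph-', suffix='.conf')
    with os.fdopen(fd, 'w') as f:
        f.write("[global]\nkeyring = %s\n" % keyring_path)
    os.chmod(conf_path, 0o400)  # private, like /etc/pve/priv contents
    return conf_path

conf = write_temp_ceph_conf('/etc/pve/priv/ceph/storeid.keyring')
content = open(conf).read()
os.unlink(conf)  # cleanup after QEMU has started
print(content)
```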
> + $blockdev->{conf} = $ceph_conf;
> + if (!-e $ceph_conf) {
> + my $content = "[global]\nkeyring = $keyring\n";
> + PVE::Tools::file_set_contents($ceph_conf, $content, 0400);
> + }
> + }
> + } elsif ($path =~ m/^nbd:(\S+):(\d+):exportname=(\S+)$/) {
> + my $server = { type => 'inet', host => $1, port => $2 };
> + $blockdev = { driver => 'nbd', server => $server, export => $3 };
> + } elsif ($path =~ m/^nbd:unix:(\S+):exportname=(\S+)$/) {
> + my $server = { type => 'unix', path => $1 };
> + $blockdev = { driver => 'nbd', server => $server, export => $2 };
> + } elsif ($path =~ m|^gluster(\+(tcp\|unix\|rdma))?://(.*)/(.*)/(images/(\S+)/(\S+))$|) {
> + my $protocol = $2 ? $2 : 'inet';
> + $protocol = 'inet' if $protocol eq 'tcp';
> + my $server = [{ type => $protocol, host => $3, port => '24007' }];
> + $blockdev = { driver => 'gluster', server => $server, volume => $4, path => $5 };
> + } elsif ($path =~ m/^\/dev/) {
> + my $driver = drive_is_cdrom($drive) ? 'host_cdrom' : 'host_device';
> + $blockdev = { driver => $driver, filename => $path };
> + } elsif ($path =~ m/^\//) {
> + $blockdev = { driver => 'file', filename => $path};
> } else {
> - if ($storeid) {
> - $path = PVE::Storage::path($storecfg, $volid);
> - } else {
> - $path = $volid;
> + die "unsupported path: $path\n";
> + #fixme
> + #'{"driver":"iscsi","portal":"iscsi.example.com:3260","target":"demo-target","lun":3,"transport":"tcp"}'
> + }
> +
> + my $cache_direct = drive_uses_cache_direct($drive, $scfg);
> + my $cache = {};
> + $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
> + $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq 'unsafe' ? JSON::true : JSON::false;
> + $blockdev->{cache} = $cache;
> +
> + ##aio
> + if($blockdev->{filename}) {
> + $drive->{aio} = 'threads' if drive_is_cdrom($drive);
> + my $aio = $drive->{aio};
> + if (!$aio) {
> + if (storage_allows_io_uring_default($scfg, $cache_direct)) {
> + # io_uring supports all cache modes
> + $aio = "io_uring";
> + } else {
> + # aio native works only with O_DIRECT
> + if($cache_direct) {
> + $aio = "native";
> + } else {
> + $aio = "threads";
> + }
> + }
> }
> + $blockdev->{aio} = $aio;
> }
>
> - # For PVE-managed volumes, use the format from the storage layer and prevent overrides via the
> - # drive's 'format' option. For unmanaged volumes, fallback to 'raw' to avoid auto-detection by
> - # QEMU. For the special case 'none' (get_iso_path() returns an empty $path), there should be no
> - # format or QEMU won't start.
> - my $format;
> - if (drive_is_cdrom($drive) && !$path) {
> - # no format
> - } elsif ($storeid) {
> - $format = checked_volume_format($storecfg, $volid);
> + ##discard && detect-zeroes
> + my $discard = 'ignore';
> + if($drive->{discard}) {
> + $discard = $drive->{discard};
> + $discard = 'unmap' if $discard eq 'on';
> + }
> + $blockdev->{discard} = $discard if !drive_is_cdrom($drive);
>
> - if ($drive->{format} && $drive->{format} ne $format) {
> - die "drive '$drive->{interface}$drive->{index}' - volume '$volid'"
> - ." - 'format=$drive->{format}' option different from storage format '$format'\n";
> - }
> + my $detectzeroes;
nit: detect_zeroes
> + if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
> + $detectzeroes = 'off';
> + } elsif ($drive->{discard}) {
> + $detectzeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
> } else {
> - $format = $drive->{format} // 'raw';
> + # This used to be our default with discard not being specified:
> + $detectzeroes = 'on';
> }
> + $blockdev->{'detect-zeroes'} = $detectzeroes if !drive_is_cdrom($drive);
> + $blockdev->{'node-name'} = $nodename if $nodename;
this last line could be a lot higher up?
>
> - my $is_rbd = $path =~ m/^rbd:/;
> + return $blockdev;
> +}
>
> - my $opts = '';
> - my @qemu_drive_options = qw(heads secs cyls trans media cache rerror werror aio discard);
> - foreach my $o (@qemu_drive_options) {
> - $opts .= ",$o=$drive->{$o}" if defined($drive->{$o});
> - }
> +sub generate_format_blockdev {
> + my ($storecfg, $drive, $nodename, $file, $force_readonly) = @_;
>
> - # snapshot only accepts on|off
> - if (defined($drive->{snapshot})) {
> - my $v = $drive->{snapshot} ? 'on' : 'off';
> - $opts .= ",snapshot=$v";
> - }
> + my $volid = $drive->{file};
> + my $scfg = undef;
> + my $path = $volid;
path is not used at all, other than being conditionally overwritten below..
> + my $format = $drive->{format};
> + $format //= "raw";
the format handling here is very sensitive, and I think this broke it. see the big comment this patch removed ;)
short summary: for PVE-managed volumes we want the format from the storage layer (via checked_volume_format). if the drive has a format set that disagrees, that is a hard error. for absolute paths we us the format from the drive with a fallback to raw.
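The rule summarized here can be written down as a small decision function — a sketch mirroring the removed comment's semantics, not code from the series:

```python
# Resolve the effective driver format for the fmt blockdev:
#  - PVE-managed volume: the storage layer decides; a conflicting
#    'format' option on the drive is a hard error
#  - absolute path: the drive's 'format' option, falling back to 'raw'
def resolve_format(storage_format, drive_format, managed):
    if managed:
        if drive_format is not None and drive_format != storage_format:
            raise ValueError(
                "drive format '%s' differs from storage format '%s'"
                % (drive_format, storage_format))
        return storage_format
    return drive_format or 'raw'

print(resolve_format('qcow2', None, managed=True))   # -> qcow2
print(resolve_format(None, None, managed=False))     # -> raw
```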
>
> - if (defined($drive->{ro})) { # ro maps to QEMUs `readonly`, which accepts `on` or `off` only
> - $opts .= ",readonly=" . ($drive->{ro} ? 'on' : 'off');
> - }
> + my $drive_id = get_drive_id($drive);
>
> - foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
> - my ($dir, $qmpname) = @$type;
> - if (my $v = $drive->{"mbps$dir"}) {
> - $opts .= ",throttling.bps$qmpname=".int($v*1024*1024);
> - }
> - if (my $v = $drive->{"mbps${dir}_max"}) {
> - $opts .= ",throttling.bps$qmpname-max=".int($v*1024*1024);
> - }
> - if (my $v = $drive->{"bps${dir}_max_length"}) {
> - $opts .= ",throttling.bps$qmpname-max-length=$v";
> - }
> - if (my $v = $drive->{"iops${dir}"}) {
> - $opts .= ",throttling.iops$qmpname=$v";
> - }
> - if (my $v = $drive->{"iops${dir}_max"}) {
> - $opts .= ",throttling.iops$qmpname-max=$v";
> - }
> - if (my $v = $drive->{"iops${dir}_max_length"}) {
> - $opts .= ",throttling.iops$qmpname-max-length=$v";
> - }
> + if ($drive->{zeroinit}) {
> + #fixme how to handle zeroinit ? insert special blockdev filter ?
> }
>
> - if ($live_restore_name) {
> - $format = "rbd" if $is_rbd;
> - die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
> - if !$format;
> - $opts .= ",format=alloc-track,file.driver=$format";
> - } elsif ($format) {
> - $opts .= ",format=$format";
> + my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
so I guess this should never be called with nbd-URI-volids?
nit: $volname is not used anywhere, so can be removed..
> +
> + if($storeid) {
> + $scfg = PVE::Storage::storage_config($storecfg, $storeid);
> + $format = checked_volume_format($storecfg, $volid);
this is missing the comparison against $drive->{format}
> + $path = PVE::Storage::path($storecfg, $volid);
this is not used anywhere..
> }
>
> + my $readonly = defined($drive->{ro}) || $force_readonly ? JSON::true : JSON::false;
> +
> + #libvirt define cache option on both format && file
> my $cache_direct = drive_uses_cache_direct($drive, $scfg);
> + my $cache = {};
> + $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
> + $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq 'unsafe' ? JSON::true : JSON::false;
so we have the same code in two places? should probably be a helper then to not have them go out of sync..
>
> - $opts .= ",cache=none" if !$drive->{cache} && $cache_direct;
> + my $blockdev = { driver => $format, file => $file, cache => $cache, 'read-only' => $readonly };
> + $blockdev->{'node-name'} = $nodename if $nodename;
>
> - if (!$drive->{aio}) {
> - if ($io_uring && storage_allows_io_uring_default($scfg, $cache_direct)) {
> - # io_uring supports all cache modes
> - $opts .= ",aio=io_uring";
> - } else {
> - # aio native works only with O_DIRECT
> - if($cache_direct) {
> - $opts .= ",aio=native";
> - } else {
> - $opts .= ",aio=threads";
> - }
> - }
> - }
> + return $blockdev;
>
> - if (!drive_is_cdrom($drive)) {
> - my $detectzeroes;
> - if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
> - $detectzeroes = 'off';
> - } elsif ($drive->{discard}) {
> - $detectzeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
> - } else {
> - # This used to be our default with discard not being specified:
> - $detectzeroes = 'on';
> - }
> +}
>
> - # note: 'detect-zeroes' works per blockdev and we want it to persist
> - # after the alloc-track is removed, so put it on 'file' directly
> - my $dz_param = $live_restore_name ? "file.detect-zeroes" : "detect-zeroes";
> - $opts .= ",$dz_param=$detectzeroes" if $detectzeroes;
> - }
> +sub generate_drive_blockdev {
> + my ($storecfg, $vmid, $drive, $force_readonly, $live_restore_name) = @_;
>
> - if ($live_restore_name) {
> - $opts .= ",backing=$live_restore_name";
> - $opts .= ",auto-remove=on";
> + my $path;
> + my $volid = $drive->{file};
> + my $format = $drive->{format};
this is only used once below
> + my $drive_id = get_drive_id($drive);
> +
> + my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
> + my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
> +
> + my $blockdevs = [];
> +
> + if (drive_is_cdrom($drive)) {
> + die "$drive_id: cannot back cdrom drive with a live restore image\n" if $live_restore_name;
> +
> + $path = get_iso_path($storecfg, $vmid, $volid);
> + return if !$path;
> + $force_readonly = 1;
> }
>
> - # my $file_param = $live_restore_name ? "file.file.filename" : "file";
> - my $file_param = "file";
> + my $file_nodename = "file-drive-$drive_id";
> + my $blockdev_file = generate_file_blockdev($storecfg, $drive, $file_nodename);
> + my $fmt_nodename = "fmt-drive-$drive_id";
> + my $blockdev_format = generate_format_blockdev($storecfg, $drive, $fmt_nodename, $blockdev_file, $force_readonly);
> +
> + my $blockdev_live_restore = undef;
> if ($live_restore_name) {
> - # non-rbd drivers require the underlying file to be a separate block
> - # node, so add a second .file indirection
> - $file_param .= ".file" if !$is_rbd;
> - $file_param .= ".filename";
> + die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
> + if !$format;
for this check, but it is not actually set anywhere here.. so is something missing or can the check go?
> +
> + $blockdev_live_restore = { 'node-name' => "liverestore-drive-$drive_id",
> + backing => $live_restore_name,
> + 'auto-remove' => 'on', format => "alloc-track",
> + file => $blockdev_format };
> }
> - my $pathinfo = $path ? "$file_param=$path," : '';
>
> - return "${pathinfo}if=none,id=drive-$drive->{interface}$drive->{index}$opts";
> + #this is the top filter entry point, use drive-$drive_id as nodename
> + my $blockdev_throttle = { driver => "throttle", 'node-name' => "drive-$drive_id", 'throttle-group' => "throttle-drive-$drive_id" };
> + #put liverestore filter between throttle && format filter
> + $blockdev_throttle->{file} = $live_restore_name ? $blockdev_live_restore : $blockdev_format;
> + return $blockdev_throttle,
> }
>
> sub print_pbs_blockdev {
> @@ -4091,13 +4210,13 @@ sub config_to_command {
> push @$devices, '-blockdev', $live_restore->{blockdev};
> }
>
> - my $drive_cmd = print_drive_commandline_full(
> - $storecfg, $vmid, $drive, $live_blockdev_name, min_version($kvmver, 6, 0));
> -
> - # extra protection for templates, but SATA and IDE don't support it..
> - $drive_cmd .= ',readonly=on' if drive_is_read_only($conf, $drive);
> + my $throttle_group = print_drive_throttle_group($drive);
> + push @$devices, '-object', $throttle_group if $throttle_group;
>
> - push @$devices, '-drive',$drive_cmd;
> +# # extra protection for templates, but SATA and IDE don't support it..
> + my $force_readonly = drive_is_read_only($conf, $drive);
> + my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive, $force_readonly, $live_blockdev_name);
> + push @$devices, '-blockdev', encode_json_ordered($blockdev) if $blockdev;
> push @$devices, '-device', print_drivedevice_full(
> $storecfg, $conf, $vmid, $drive, $bridges, $arch, $machine_type);
> });
> @@ -8986,4 +9105,8 @@ sub delete_ifaces_ipams_ips {
> }
> }
>
> +sub encode_json_ordered {
> + return JSON->new->canonical->allow_nonref->encode( $_[0] );
> +}
this is only used in a single place..
> +
> 1;
> --
> 2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
@ 2025-01-08 14:26 ` Fabian Grünbichler
2025-01-10 14:08 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 14:26 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> fixme/testme :
> PVE/VZDump/QemuServer.pm: eval { PVE::QemuServer::qemu_drivedel($vmid, "tpmstate0-backup"); };
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 64 +++++++++++++++++++++++++++++++++--------------
> 1 file changed, 45 insertions(+), 19 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 2832ed09..baf78ec0 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -1582,6 +1582,42 @@ sub print_drive_throttle_group {
> return $throttle_group;
> }
>
> +sub generate_throttle_group {
> + my ($drive) = @_;
> +
> + my $drive_id = get_drive_id($drive);
> +
> + my $throttle_group = { id => "throttle-drive-$drive_id" };
> + my $limits = {};
> +
> + foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
> + my ($dir, $qmpname) = @$type;
> +
> + if (my $v = $drive->{"mbps$dir"}) {
> + $limits->{"bps$qmpname"} = int($v*1024*1024);
> + }
> + if (my $v = $drive->{"mbps${dir}_max"}) {
> + $limits->{"bps$qmpname-max"} = int($v*1024*1024);
> + }
> + if (my $v = $drive->{"bps${dir}_max_length"}) {
> + $limits->{"bps$qmpname-max-length"} = int($v)
> + }
> + if (my $v = $drive->{"iops${dir}"}) {
> + $limits->{"iops$qmpname"} = int($v);
> + }
> + if (my $v = $drive->{"iops${dir}_max"}) {
> + $limits->{"iops$qmpname-max"} = int($v);
> + }
> + if (my $v = $drive->{"iops${dir}_max_length"}) {
> + $limits->{"iops$qmpname-max-length"} = int($v);
> + }
> + }
> +
> + $throttle_group->{limits} = $limits;
> +
> + return $throttle_group;
this and the corresponding print sub are exactly the same, so the print sub could call this and join the limits with the `x-` prefix added?
how does this interact with the qemu_block_set_io_throttle helper used when updating the limits at runtime?
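a rough sketch of what I mean, assuming the command-line properties really are just the QMP ones with an `x-` prefix (untested):

```perl
# hypothetical refactor: derive the command-line form from the QMP form,
# so the limit mapping only lives in generate_throttle_group
sub print_drive_throttle_group {
    my ($drive) = @_;

    my $group = generate_throttle_group($drive);
    my $limits = delete $group->{limits};

    my $opts = "throttle-group,id=$group->{id}";
    # command-line throttle-group properties carry an x- prefix, QMP ones do not
    $opts .= ",x-$_=$limits->{$_}" for sort keys %$limits;
    return $opts;
}
```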
> +}
> +
> sub generate_file_blockdev {
> my ($storecfg, $drive, $nodename) = @_;
>
> @@ -4595,32 +4631,22 @@ sub qemu_iothread_del {
> }
>
> sub qemu_driveadd {
> - my ($storecfg, $vmid, $device) = @_;
> + my ($storecfg, $vmid, $drive) = @_;
>
> - my $kvmver = get_running_qemu_version($vmid);
> - my $io_uring = min_version($kvmver, 6, 0);
> - my $drive = print_drive_commandline_full($storecfg, $vmid, $device, undef, $io_uring);
> - $drive =~ s/\\/\\\\/g;
> - my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto \"$drive\"", 60);
> -
> - # If the command succeeds qemu prints: "OK"
> - return 1 if $ret =~ m/OK/s;
> + my $drive_id = get_drive_id($drive);
> + my $throttle_group = generate_throttle_group($drive);
do we always need a throttle group? or would we benefit from only adding it when limits are set, and skip that node when I/O is unlimited?
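e.g. something like (sketch only - generate_drive_blockdev would need to cope with the filter node being absent, and the top node naming would need rethinking since `drive-$drive_id` currently is the throttle node):

```perl
# only create the throttle node when an actual limit is configured
my $throttle_group = generate_throttle_group($drive);
if (keys $throttle_group->{limits}->%*) {
    mon_cmd($vmid, 'object-add', 'qom-type' => 'throttle-group', %$throttle_group);
} else {
    # no limits -> skip the filter node entirely and attach the format node directly
    $throttle_group = undef;
}
```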
> + mon_cmd($vmid, 'object-add', "qom-type" => "throttle-group", %$throttle_group);
>
> - die "adding drive failed: $ret\n";
> + my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive);
> + mon_cmd($vmid, 'blockdev-add', %$blockdev, timeout => 10 * 60);
> + return 1;
> }
>
> sub qemu_drivedel {
> my ($vmid, $deviceid) = @_;
>
> - my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_del drive-$deviceid", 10 * 60);
> - $ret =~ s/^\s+//;
> -
> - return 1 if $ret eq "";
> -
> - # NB: device not found errors mean the drive was auto-deleted and we ignore the error
> - return 1 if $ret =~ m/Device \'.*?\' not found/s;
> -
> - die "deleting drive $deviceid failed : $ret\n";
> + mon_cmd($vmid, 'blockdev-del', 'node-name' => "drive-$deviceid", timeout => 10 * 60);
> + mon_cmd($vmid, 'object-del', id => "throttle-drive-$deviceid");
> }
>
> sub qemu_deviceaddverify {
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
@ 2025-01-08 14:31 ` Fabian Grünbichler
0 siblings, 0 replies; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 14:31 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> Look at qdev value, as cdrom drives can be empty
> without any inserted media
is this needed if we don't drive_del the cdrom drive when ejecting the medium?
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index baf78ec0..3b33fd7d 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4425,10 +4425,9 @@ sub vm_devices_list {
> }
>
> my $resblock = mon_cmd($vmid, 'query-block');
> - foreach my $block (@$resblock) {
> - if($block->{device} =~ m/^drive-(\S+)/){
> - $devices->{$1} = 1;
> - }
> + $resblock = { map { $_->{qdev} => $_ } $resblock->@* };
> + foreach my $blockid (keys %$resblock) {
> + $devices->{$blockid} = 1;
> }
>
> my $resmice = mon_cmd($vmid, 'query-mice');
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
@ 2025-01-08 14:34 ` Fabian Grünbichler
0 siblings, 0 replies; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 14:34 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 3b33fd7d..758c8240 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -5694,7 +5694,10 @@ sub vmconfig_update_disk {
> } else { # cdrom
>
> if ($drive->{file} eq 'none') {
> - mon_cmd($vmid, "eject", force => JSON::true, id => "$opt");
> + mon_cmd($vmid, "blockdev-open-tray", force => JSON::true, id => $opt);
> + mon_cmd($vmid, "blockdev-remove-medium", id => $opt);
> + qemu_drivedel($vmid, $opt);
the drivedel here
> +
> if (drive_is_cloudinit($old_drive)) {
> vmconfig_register_unused_drive($storecfg, $vmid, $conf, $old_drive);
> }
> @@ -5702,14 +5705,16 @@ sub vmconfig_update_disk {
> my $path = get_iso_path($storecfg, $vmid, $drive->{file});
>
> # force eject if locked
> - mon_cmd($vmid, "eject", force => JSON::true, id => "$opt");
> + mon_cmd($vmid, "blockdev-open-tray", force => JSON::true, id => $opt);
> + mon_cmd($vmid, "blockdev-remove-medium", id => $opt);
> + eval { qemu_drivedel($vmid, $opt) };
and here
>
> if ($path) {
> - mon_cmd($vmid, "blockdev-change-medium",
> - id => "$opt", filename => "$path");
> + qemu_driveadd($storecfg, $vmid, $drive);
and the driveadd here seem kind of weird..
are they really needed (also see comments on other patches)?
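i.e., something closer to the old flow might still work (sketch, I am not sure how blockdev-change-medium interacts with the throttle filter sitting on top):

```perl
# hypothetical: force eject if locked, then swap the medium in place,
# without any drivedel/driveadd dance
mon_cmd($vmid, "eject", force => JSON::true, id => $opt);
if ($path) {
    mon_cmd($vmid, "blockdev-change-medium", id => $opt, filename => "$path");
}
```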
> + mon_cmd($vmid, "blockdev-insert-medium", id => $opt, 'node-name' => "drive-$opt");
> + mon_cmd($vmid, "blockdev-close-tray", id => $opt);
> }
> }
> -
> return 1;
> }
> }
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
@ 2025-01-08 15:19 ` Fabian Grünbichler
0 siblings, 0 replies; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 15:19 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuMigrate.pm | 2 +-
> PVE/QemuServer.pm | 106 +++++++++++++++++++++++++++++++++++----------
> 2 files changed, 83 insertions(+), 25 deletions(-)
>
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index ed5ede30..88627ce4 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -1134,7 +1134,7 @@ sub phase2 {
> my $bitmap = $target->{bitmap};
>
> $self->log('info', "$drive: start migration to $nbd_uri");
> - PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
> + PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $source_drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
> }
>
> if (PVE::QemuServer::QMPHelpers::runs_at_least_qemu_version($vmid, 8, 2)) {
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 6bebb906..3d7c41ee 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -8184,59 +8184,85 @@ sub qemu_img_convert {
> }
>
> sub qemu_drive_mirror {
> - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $completion, $qga, $bwlimit, $src_bitmap) = @_;
> + my ($vmid, $driveid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $completion, $qga, $bwlimit, $src_bitmap) = @_;
the $driveid is contained in $drive (in the form of index and interface). this would still be a breaking change since $drive before was the $driveid, and now it's the parsed drive ;)
>
> $jobs = {} if !$jobs;
> + my $deviceid = "drive-$driveid";
> + my $dst_format;
> + my $dst_path = $dst_volid;
> + my $jobid = "mirror-$deviceid";
> + $jobs->{$jobid} = {};
>
> - my $qemu_target;
> - my $format;
> - $jobs->{"drive-$drive"} = {};
> + my $storecfg = PVE::Storage::config();
>
> if ($dst_volid =~ /^nbd:/) {
> - $qemu_target = $dst_volid;
> - $format = "nbd";
> + $dst_format = "nbd";
> } else {
> - my $storecfg = PVE::Storage::config();
> -
> - $format = checked_volume_format($storecfg, $dst_volid);
> -
> - my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
> -
> - $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
> + $dst_format = checked_volume_format($storecfg, $dst_volid);
> + $dst_path = PVE::Storage::path($storecfg, $dst_volid);
> + }
> +
> + # copy original drive config (aio,cache,discard,...)
> + my $dst_drive = dclone($drive);
> + $dst_drive->{format} = $dst_format;
> + $dst_drive->{file} = $dst_path;
> + $dst_drive->{zeroinit} = 1 if $is_zero_initialized;
> + #improve: if target storage don't support aio uring,change it to default native
> + #and remove clone_disk_check_io_uring()
> +
> + #add new block device
> + my $nodes = get_blockdev_nodes($vmid);
> +
> + my $target_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
> + my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
> + my $target_file_blockdev = generate_file_blockdev($storecfg, $dst_drive, $target_file_nodename);
> + my $target_nodename = undef;
> +
> + if ($dst_format eq 'nbd') {
> + #nbd file don't have fmt
> + $target_nodename = $target_file_nodename;
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_file_blockdev);
> + } else {
> + $target_nodename = $target_fmt_nodename;
> + my $target_fmt_blockdev = generate_format_blockdev($storecfg, $dst_drive, $target_fmt_nodename, $target_file_blockdev);
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
> }
>
> + #we replace the original src_fmt node in the blockdev graph
> + my $src_fmt_nodename = find_fmt_nodename_drive($storecfg, $vmid, $drive, $nodes);
> my $opts = {
> + 'job-id' => $jobid,
> timeout => 10,
> - device => "drive-$drive",
> - mode => "existing",
> + device => $deviceid,
> + replaces => $src_fmt_nodename,
> sync => "full",
> - target => $qemu_target,
> + target => $target_nodename,
> 'auto-dismiss' => JSON::false,
> };
> - $opts->{format} = $format if $format;
>
> if (defined($src_bitmap)) {
> $opts->{sync} = 'incremental';
> - $opts->{bitmap} = $src_bitmap;
> + $opts->{bitmap} = $src_bitmap; ##FIXME: how to handle bitmap ? special proxmox patch ?
> print "drive mirror re-using dirty bitmap '$src_bitmap'\n";
> }
>
> if (defined($bwlimit)) {
> $opts->{speed} = $bwlimit * 1024;
> - print "drive mirror is starting for drive-$drive with bandwidth limit: ${bwlimit} KB/s\n";
> + print "drive mirror is starting for $deviceid with bandwidth limit: ${bwlimit} KB/s\n";
> } else {
> - print "drive mirror is starting for drive-$drive\n";
> + print "drive mirror is starting for $deviceid\n";
> }
>
> # if a job already runs for this device we get an error, catch it for cleanup
> - eval { mon_cmd($vmid, "drive-mirror", %$opts); };
> + eval { mon_cmd($vmid, "blockdev-mirror", %$opts); };
> +
> if (my $err = $@) {
> eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
> + #FIXME: delete blockdev after job cancel
wouldn't we also need to keep track of the device IDs and pass those to the monitor invocation below? if the block job fails or gets canceled, we also need cleanup there..
> warn "$@\n" if $@;
> die "mirroring error: $err\n";
> }
> -
> - qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga);
> + qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga, 'mirror');
> }
>
> # $completion can be either
> @@ -8595,7 +8621,7 @@ sub clone_disk {
>
> my $sparseinit = PVE::Storage::volume_has_feature($storecfg, 'sparseinit', $newvolid);
> if ($use_drive_mirror) {
> - qemu_drive_mirror($vmid, $src_drivename, $newvolid, $newvmid, $sparseinit, $jobs,
> + qemu_drive_mirror($vmid, $src_drivename, $drive, $newvolid, $newvmid, $sparseinit, $jobs,
> $completion, $qga, $bwlimit);
> } else {
> if ($dst_drivename eq 'efidisk0') {
> @@ -9130,6 +9156,38 @@ sub delete_ifaces_ipams_ips {
> }
> }
>
> +sub find_fmt_nodename_drive {
> + my ($storecfg, $vmid, $drive, $nodes) = @_;
> +
> + my $volid = $drive->{file};
> + my $format = checked_volume_format($storecfg, $volid);
$format is not used?
> + my $path = PVE::Storage::path($storecfg, $volid);
is this guaranteed to be stable? also across versions? and including external storage plugins?
> +
> + my $node = find_blockdev_node($nodes, $path, 'fmt');
that one is only added in a later patch.. but I don't think lookups by path are a good idea, we should probably have a deterministic node naming concept instead? e.g., encode the drive + snapshot name?
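e.g. a hypothetical scheme along these lines (QEMU node-names are length-limited, so long snapshot names would need shortening/hashing):

```perl
use Digest::SHA qw(sha1_hex);

# derive the format node name deterministically from drive id + snapshot,
# instead of looking it up by path at runtime
sub fmt_nodename {
    my ($drive_id, $snapname) = @_;
    my $name = "fmt-$drive_id";
    # shorten the snapshot part to stay within QEMU's node-name limit
    $name .= '-' . substr(sha1_hex($snapname), 0, 8) if defined($snapname);
    return $name;
}
```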
> + return $node->{'node-name'};
> +}
> +
> +sub get_blockdev_nextid {
> + my ($nodename, $nodes) = @_;
> + my $version = 0;
> + for my $nodeid (keys %$nodes) {
> + if ($nodeid =~ m/^$nodename-(\d+)$/) {
> + my $current_version = $1;
> + $version = $current_version if $current_version >= $version;
> + }
> + }
> + $version++;
> + return "$nodename-$version";
since we shouldn't ever have more than one job for a drive running (right?), couldn't we just have a deterministic name for this? that would also simplify cleanup, including cleanup of a failed cleanup ;)
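cleanup would then just mean deleting two well-known nodes, e.g. (sketch, names are hypothetical):

```perl
# fixed target node names, no version counter needed
my $target_fmt_nodename  = "mirror-fmt-$deviceid";
my $target_file_nodename = "mirror-file-$deviceid";
# a failed or canceled job (or a failed earlier cleanup) can then be handled
# by blockdev-del on these two names, ignoring "node not found" errors
```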
> +}
> +
> +sub get_blockdev_nodes {
> + my ($vmid) = @_;
> +
> + my $nodes = PVE::QemuServer::Monitor::mon_cmd($vmid, "query-named-block-nodes");
> + $nodes = { map { $_->{'node-name'} => $_ } $nodes->@* };
> + return $nodes;
> +}
> +
> sub encode_json_ordered {
> return JSON->new->canonical->allow_nonref->encode( $_[0] );
> }
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default.
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
@ 2025-01-09 9:51 ` Fabian Grünbichler
0 siblings, 0 replies; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 9:51 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
>
> This was a limitation of drive-mirror, blockdev mirror is able
> to reopen image with a different aio
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 41 ++++++++++-------------------------------
> 1 file changed, 10 insertions(+), 31 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 3d7c41ee..dc12b38f 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -8207,8 +8207,16 @@ sub qemu_drive_mirror {
> $dst_drive->{format} = $dst_format;
> $dst_drive->{file} = $dst_path;
> $dst_drive->{zeroinit} = 1 if $is_zero_initialized;
> - #improve: if target storage don't support aio uring,change it to default native
> - #and remove clone_disk_check_io_uring()
> +
> + #change aio if io_uring is not supported on target
> + if ($dst_drive->{aio} && $dst_drive->{aio} eq 'io_uring') {
> + my ($dst_storeid) = PVE::Storage::parse_volume_id($dst_drive->{file});
> + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
> + my $cache_direct = drive_uses_cache_direct($dst_drive, $dst_scfg);
> + if(!storage_allows_io_uring_default($dst_scfg, $cache_direct)) {
> + $dst_drive->{aio} = $cache_direct ? 'native' : 'threads';
> + }
> + }
couldn't/shouldn't we just handle this in generate_file_blockdev?
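something like the following inside generate_file_blockdev (sketch only, the rest of the sub is elided, and the nbd/special-path cases would need the `1` noerr flag on parse_volume_id):

```perl
# inside generate_file_blockdev: fall back from io_uring if the target
# storage is known not to support it, instead of patching $drive at call sites
my $aio = $drive->{aio};
if ($aio && $aio eq 'io_uring') {
    my ($storeid) = PVE::Storage::parse_volume_id($drive->{file}, 1);
    if ($storeid) {
        my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
        my $cache_direct = drive_uses_cache_direct($drive, $scfg);
        $aio = $cache_direct ? 'native' : 'threads'
            if !storage_allows_io_uring_default($scfg, $cache_direct);
    }
}
```

then every caller (mirror, hotplug, cold start) would get the safe default for free.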
>
> #add new block device
> my $nodes = get_blockdev_nodes($vmid);
> @@ -8514,33 +8522,6 @@ sub qemu_drive_mirror_switch_to_active_mode {
> }
> }
>
> -# Check for bug #4525: drive-mirror will open the target drive with the same aio setting as the
> -# source, but some storages have problems with io_uring, sometimes even leading to crashes.
> -my sub clone_disk_check_io_uring {
> - my ($src_drive, $storecfg, $src_storeid, $dst_storeid, $use_drive_mirror) = @_;
> -
> - return if !$use_drive_mirror;
> -
> - # Don't complain when not changing storage.
> - # Assume if it works for the source, it'll work for the target too.
> - return if $src_storeid eq $dst_storeid;
> -
> - my $src_scfg = PVE::Storage::storage_config($storecfg, $src_storeid);
> - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
> -
> - my $cache_direct = drive_uses_cache_direct($src_drive);
> -
> - my $src_uses_io_uring;
> - if ($src_drive->{aio}) {
> - $src_uses_io_uring = $src_drive->{aio} eq 'io_uring';
> - } else {
> - $src_uses_io_uring = storage_allows_io_uring_default($src_scfg, $cache_direct);
> - }
> -
> - die "target storage is known to cause issues with aio=io_uring (used by current drive)\n"
> - if $src_uses_io_uring && !storage_allows_io_uring_default($dst_scfg, $cache_direct);
> -}
> -
> sub clone_disk {
> my ($storecfg, $source, $dest, $full, $newvollist, $jobs, $completion, $qga, $bwlimit) = @_;
>
> @@ -8598,8 +8579,6 @@ sub clone_disk {
> $dst_format = 'raw';
> $size = PVE::QemuServer::Drive::TPMSTATE_DISK_SIZE;
> } else {
> - clone_disk_check_io_uring($drive, $storecfg, $src_storeid, $storeid, $use_drive_mirror);
> -
> $size = PVE::Storage::volume_size_info($storecfg, $drive->{file}, 10);
> }
> $newvolid = PVE::Storage::vdisk_alloc(
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
@ 2025-01-09 11:57 ` Fabian Grünbichler
2025-01-09 13:19 ` Fabio Fantoni via pve-devel
0 siblings, 1 reply; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 11:57 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
it would be great if there'd be a summary of the design choices and a high level summary of what happens to the files and block-node-graph here. it's a bit hard to judge from the code below whether it would be possible to eliminate the dynamically named block nodes, for example ;)
a few more comments documenting the behaviour and ideally also some tests (mocking the QMP interactions?) would be nice
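for the QMP mocking, something along these lines could work (hypothetical skeleton - if mon_cmd is imported into PVE::QemuServer directly, that alias would need redefining as well):

```perl
use Test::More;
use Test::MockModule;

my @qmp_log;
my $monitor = Test::MockModule->new('PVE::QemuServer::Monitor');
$monitor->redefine(mon_cmd => sub {
    my ($vmid, $cmd, %args) = @_;
    push @qmp_log, [$cmd, { %args }]; # record instead of talking to a real VM
    return {};
});

# ... exercise blockdev_external_snapshot()/blockdev_snap_rename() here,
# then assert on the recorded command sequence, e.g.:
# is($qmp_log[0]->[0], 'blockdev-add', 'overlay added before switching');

done_testing();
```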
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuConfig.pm | 4 +-
> PVE/QemuServer.pm | 345 ++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 335 insertions(+), 14 deletions(-)
>
> diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
> index ffdf9f03..c17edb46 100644
> --- a/PVE/QemuConfig.pm
> +++ b/PVE/QemuConfig.pm
> @@ -375,7 +375,7 @@ sub __snapshot_create_vol_snapshot {
>
> print "snapshotting '$device' ($drive->{file})\n";
>
> - PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $volid, $snapname);
> + PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $drive, $snapname);
> }
>
> sub __snapshot_delete_remove_drive {
> @@ -412,7 +412,7 @@ sub __snapshot_delete_vol_snapshot {
> my $storecfg = PVE::Storage::config();
> my $volid = $drive->{file};
>
> - PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $volid, $snapname);
> + PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $drive, $snapname);
>
> push @$unused, $volid;
> }
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 3a3feadf..f29a8449 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4959,20 +4959,269 @@ sub qemu_block_resize {
> }
>
> sub qemu_volume_snapshot {
> - my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
> + my ($vmid, $deviceid, $storecfg, $drive, $snap) = @_;
>
> + my $volid = $drive->{file};
> my $running = check_running($vmid);
> -
> - if ($running && do_snapshots_with_qemu($storecfg, $volid, $deviceid)) {
> - mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> + my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid) if $running;
> + if ($do_snapshots_with_qemu) {
> + if($do_snapshots_with_qemu == 2) {
this could do without the additional nesting:
if ($do_snapshots_with_qemu == 1) {
...
} elsif ($do_snapshots_with_qemu == 2) {
...
} else {
...
}
> + my $snap_path = PVE::Storage::path($storecfg, $volid, $snap);
> + my $path = PVE::Storage::path($storecfg, $volid);
> + blockdev_current_rename($storecfg, $vmid, $deviceid, $drive, $path, $snap_path, 1);
> + blockdev_external_snapshot($storecfg, $vmid, $deviceid, $drive, $snap);
what about error handling?
> + } else {
> + mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> + }
> } else {
> PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
this invocation here (continued below)
> }
> }
>
> +sub blockdev_external_snapshot {
> + my ($storecfg, $vmid, $deviceid, $drive, $snap) = @_;
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> + my $path = PVE::Storage::path($storecfg, $volid, $snap);
> + my $format_node = find_blockdev_node($nodes, $path, 'fmt');
> + my $format_nodename = $format_node->{'node-name'};
> +
> + #preallocate add a new current file
> + my $new_current_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
> + my $new_current_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
okay, so here we have a dynamic node name because the desired target name is still occupied. could we rename the old block node first?
> + PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
(continued from above) and this invocation here are the same?? wouldn't this already create the snapshot on the storage layer? and didn't we just hardlink + reopen + unlink to transform the previous current volume into the snap volume?
should this maybe have been vdisk_alloc and it just works by accident?
> + my $new_file_blockdev = generate_file_blockdev($storecfg, $drive, $new_current_file_nodename);
> + my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_current_fmt_nodename, $new_file_blockdev);
> +
> + $new_fmt_blockdev->{backing} = undef;
generate_format_blockdev doesn't set backing? maybe this should be converted into an assertion?
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$new_fmt_blockdev);
> + mon_cmd($vmid, 'blockdev-snapshot', node => $format_nodename, overlay => $new_current_fmt_nodename);
> +}
> +
> +sub blockdev_snap_rename {
> + my ($storecfg, $vmid, $deviceid, $drive, $src_path, $target_path) = @_;
I think this whole thing needs more error handling and thought about how to recover from various points failing.. there's also quite some overlap with blockdev_current_rename, I wonder whether it would be possible to simplify the code further by merging the two? but see below, I think we can even get away with dropping this altogether if we switch from block-commit to block-stream..
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> +
> + #copy the original drive param and change target file
> + my $target_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
> + my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
> +
> + my $src_fmt_node = find_blockdev_node($nodes, $src_path, 'fmt');
> + my $src_fmt_nodename = $src_fmt_node->{'node-name'};
> + my $src_file_node = find_blockdev_node($nodes, $src_path, 'file');
> + my $src_file_nodename = $src_file_node->{'node-name'};
> +
> + #untaint
> + if ($src_path =~ m/^(\S+)$/) {
> + $src_path = $1;
> + }
> + if ($target_path =~ m/^(\S+)$/) {
> + $target_path = $1;
> + }
shouldn't that have happened in the storage plugin?
> +
> + #create a hardlink
> + link($src_path, $target_path);
should this maybe be done by the storage plugin?
> +
> + #add new format blockdev
> + my $read_only = 1;
> + my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
> + $target_file_blockdev->{filename} = $target_path;
> + my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_fmt_nodename, $target_file_blockdev, $read_only);
> +
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
> +
> + #reopen the parent node with different backing file
> + my $parent_fmt_node = find_parent_node($nodes, $src_path);
> + my $parent_fmt_nodename = $parent_fmt_node->{'node-name'};
> + my $parent_path = $parent_fmt_node->{file};
> + my $parent_file_node = find_blockdev_node($nodes, $parent_path, 'file');
> + my $parent_file_nodename = $parent_file_node->{'node-name'};
> + my $filenode_exist = 1;
> + $read_only = $parent_fmt_node->{ro};
> + my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_fmt_nodename, $parent_file_nodename, $read_only);
> + $parent_fmt_blockdev->{backing} = $target_fmt_nodename;
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$parent_fmt_blockdev]);
> +
> + #change backing-file in qcow2 metadatas
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'change-backing-file', device => $deviceid, 'image-node-name' => $parent_fmt_nodename, 'backing-file' => $target_path);
> +
> + # fileblockdev seem to be autoremoved, if it have been created online, but not if they are created at start with command line
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_file_nodename) };
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_fmt_nodename) };
> +
> + #delete old $path link
> + unlink($src_path);
and this
> +
> + #rename underlay
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvrename $src_path to $target_path\n";
> + run_command(
> + ['/sbin/lvrename', $src_path, $target_path],
> + errmsg => "lvrename $src_path to $target_path error",
> + );
> + }
and this as well?
> +}
> +
> +sub blockdev_current_rename {
> + my ($storecfg, $vmid, $deviceid, $drive, $path, $target_path, $skip_underlay) = @_;
> + ## rename current running image
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> + my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
here we could already incorporate the snapshot name, since we know it?
> +
> + my $file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
> + $file_blockdev->{filename} = $target_path;
> +
> + my $format_node = find_blockdev_node($nodes, $path, 'fmt');
then we'd know this is always the "current" node, however we deterministically name it?
> + my $format_nodename = $format_node->{'node-name'};
> +
> + my $file_node = find_blockdev_node($nodes, $path, 'file');
same here
> + my $file_nodename = $file_node->{'node-name'};
> +
> + my $backingfile = $format_node->{image}->{'backing-filename'};
> + my $backing_node = $backingfile ? find_blockdev_node($nodes, $backingfile, 'fmt') : undef;
> +
> + #create a hardlink
> + link($path, $target_path);
this
> + #add new file blockdev
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$file_blockdev);
> +
> + #reopen the current fmt nodename with a new file nodename
> + my $reopen_blockdev = generate_format_blockdev($storecfg, $drive, $format_nodename, $target_file_nodename);
> + $reopen_blockdev->{backing} = $backing_node->{'node-name'};
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$reopen_blockdev]);
> +
> + # delete old file blockdev
> + # it seems that the old file blockdev is auto-removed after reopen if the file nodename was autogenerated with #block ?
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_nodename) };
> +
> + unlink($path);
> +
and this should be done by the storage layer I think? how does this interact with LVM? would we maybe want to mknod instead of hardlinking the device node? did you try whether a plain rename would also work (not sure - qemu already has an open FD to the file/blockdev, but I am not sure how LVM handles this ;))?
> + #skip_underlay: lvm will be renamed later in Storage::volume_snapshot
> + return if $skip_underlay;
> +
> + #rename underlay
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvrename $path to $target_path\n";
> + run_command(
> + ['/sbin/lvrename', $path, $target_path],
> + errmsg => "lvrename $path to $target_path error",
> + );
> + }
> +}
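the hardlink / blockdev-add / blockdev-reopen / blockdev-del dance above boils down to a fixed, ordered QMP sequence; a minimal Python sketch of that ordering (node and helper names are illustrative, not the actual Perl/QMP wiring):

```python
def rename_current_qmp_sequence(fmt_node, old_file_node, new_file_node,
                                new_path, backing_node=None):
    """Ordered QMP commands that re-point a running format node at a new
    file node; the hardlink/unlink of the on-disk path happens outside QMP."""
    reopen_opts = {"node-name": fmt_node, "driver": "qcow2",
                   "file": new_file_node}
    if backing_node:
        reopen_opts["backing"] = backing_node
    return [
        # 1. add a file blockdev for the new path
        ("blockdev-add", {"driver": "file", "node-name": new_file_node,
                          "filename": new_path}),
        # 2. reopen the format node on top of the new file node
        ("blockdev-reopen", {"options": [reopen_opts]}),
        # 3. drop the now-unused old file blockdev
        ("blockdev-del", {"node-name": old_file_node}),
    ]

seq = rename_current_qmp_sequence("fmt-drive-scsi0", "file-drive-scsi0",
                                  "file-drive-scsi0-1",
                                  "/dir/100/snap-s1-vm-100-disk-0.qcow2")
print([cmd for cmd, _ in seq])  # ['blockdev-add', 'blockdev-reopen', 'blockdev-del']
```

the key property is that the format node keeps its name across the reopen, so guest I/O is never interrupted.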
> +
> +sub blockdev_commit {
see comments below for qemu_volume_snapshot_delete, I think this..
> + my ($storecfg, $vmid, $deviceid, $drive, $top_path, $base_path) = @_;
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> +
> + #untaint
> + if ($top_path =~ m/^(\S+)$/) {
> + $top_path = $1;
> + }
> +
> + print "block-commit top:$top_path to base:$base_path\n";
> + my $job_id = "commit-$deviceid";
> + my $jobs = {};
> +
> + my $base_node = find_blockdev_node($nodes, $base_path, 'fmt');
> + my $top_node = find_blockdev_node($nodes, $top_path, 'fmt');
> +
> + my $options = { 'job-id' => $job_id, device => $deviceid };
> + $options->{'top-node'} = $top_node->{'node-name'};
> + $options->{'base-node'} = $base_node->{'node-name'};
> +
> +
> + mon_cmd($vmid, 'block-commit', %$options);
> + $jobs->{$job_id} = {};
> +
> + qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'commit');
> +
> + #remove fmt-blockdev, file-blockdev && file
> + my $fmt_node = find_blockdev_node($nodes, $top_path, 'fmt');
> + my $fmt_nodename = $fmt_node->{'node-name'};
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $fmt_nodename) };
> +
> + my $file_node = find_blockdev_node($nodes, $top_path, 'file');
> + my $file_nodename = $file_node->{'node-name'};
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_nodename) };
> +
> +
> +
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvremove $top_path\n";
> + run_command(
> + ['/sbin/lvremove', '-f', $top_path],
> + errmsg => "lvremove $top_path",
> + );
> + } else {
> + unlink($top_path);
> + }
> +
> +}
> +
> +sub blockdev_live_commit {
and this can be replaced altogether with blockdev_stream..
> + my ($storecfg, $vmid, $deviceid, $drive, $current_path, $snapshot_path) = @_;
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> +
> + #untaint
> + if ($current_path =~ m/^(\S+)$/) {
> + $current_path = $1;
> + }
> +
> + print "live block-commit top:$current_path to base:$snapshot_path\n";
> + my $job_id = "commit-$deviceid";
> + my $jobs = {};
> +
> + my $snapshot_node = find_blockdev_node($nodes, $snapshot_path, 'fmt');
> + my $snapshot_file_node = find_blockdev_node($nodes, $current_path, 'file');
> + my $current_node = find_blockdev_node($nodes, $current_path, 'fmt');
> +
> + my $opts = { 'job-id' => $job_id,
> + device => $deviceid,
> + 'base-node' => $snapshot_node->{'node-name'},
> + replaces => $current_node->{'node-name'}
> + };
> + mon_cmd($vmid, "block-commit", %$opts);
> + $jobs->{$job_id} = {};
> +
> + qemu_drive_mirror_monitor ($vmid, undef, $jobs, 'complete', 0, 'commit');
> +
> + eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $current_node->{'node-name'}) };
> +
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvremove $current_path\n";
> + run_command(
> + ['/sbin/lvremove', '-f', $current_path],
> + errmsg => "lvremove $current_path",
> + );
> + } else {
> + unlink($current_path);
> + }
> +
> + return;
> +
> +}
> +
> sub qemu_volume_snapshot_delete {
> - my ($vmid, $storecfg, $volid, $snap) = @_;
> + my ($vmid, $storecfg, $drive, $snap) = @_;
>
> + my $volid = $drive->{file};
> my $running = check_running($vmid);
> my $attached_deviceid;
>
> @@ -4984,13 +5233,51 @@ sub qemu_volume_snapshot_delete {
> });
> }
>
> - if ($attached_deviceid && do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid)) {
> - mon_cmd(
> - $vmid,
> - 'blockdev-snapshot-delete-internal-sync',
> - device => $attached_deviceid,
> - name => $snap,
> - );
> + my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid) if $running;
my + post-if is forbidden, but otherwise, the check for $attached_deviceid could move into the $running condition above.
> + if ($attached_deviceid && $do_snapshots_with_qemu) {
> +
> + if ($do_snapshots_with_qemu == 2) {
these ifs could be collapsed as well..
> +
> + my $path = PVE::Storage::path($storecfg, $volid);
> + my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> +
> + my $snappath = $snapshots->{$snap}->{file};
> + return if !-e $snappath; #already deleted ?
> +
> + my $parentsnap = $snapshots->{$snap}->{parent};
> + my $childsnap = $snapshots->{$snap}->{child};
> +
> + my $parentpath = $snapshots->{$parentsnap}->{file} if $parentsnap;
> + my $childpath = $snapshots->{$childsnap}->{file} if $childsnap;
> +
> + #if first snapshot
> + if(!$parentsnap) {
> + print"delete first snapshot $childpath\n";
> + if($childpath eq $path) {
> + #if child is the current (last) snapshot, we need to do a live active-commit
wouldn't it make more sense to use block-stream to merge the contents of the to-be-deleted snapshot into the current overlay? that way we wouldn't need to rename anything, AFAICT..
see https://www.qemu.org/docs/master/interop/live-block-operations.html#brief-overview-of-live-block-qmp-primitives
> + print"commit first snapshot $snappath to current $path\n";
> + blockdev_live_commit($storecfg, $vmid, $attached_deviceid, $drive, $childpath, $snappath);
> + print" rename $snappath to $path\n";
> + blockdev_current_rename($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $path);
> + } else {
> + print"commit first snapshot $snappath to $childpath path\n";
> + blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $childpath, $snappath);
> + print" rename $snappath to $childpath\n";
> + blockdev_snap_rename($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $childpath);
same here, instead of commiting from the child into the to-be-deleted snapshot, and then renaming, why not just block-stream from the to-be-deleted snapshot into the child, and then discard the snapshot that is no longer needed?
> + }
> + } else {
> + #intermediate snapshot, we just need to commit the snapshot
> + print"commit intermediate snapshot $snappath to $parentpath\n";
> + blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $parentpath, 'auto');
commit is the wrong direction though?
if we have A -> B -> C, and B is deleted, the delta previously contained in B should be merged into C, not into A?
so IMHO a simple block-stream + removal of the to-be-deleted snapshot should be the right choice here as well?
that would effectively make all the paths identical AFAICT (stream from to-be-deleted snapshot to child, followed by deletion of the no longer used volume corresponding to the deleted/streamed snapshot) and no longer require any renaming..
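the unified block-stream approach can be modeled on a plain ordered list; a sketch (pure logic, not real QMP) of which (base, top) pair a block-stream job would get when deleting a snapshot:

```python
def delete_snapshot_via_stream(chain, snap):
    """chain: ordered list of image names, oldest -> current.
    Streaming copies the deleted image's delta into its child, after which
    the image can be dropped and the child's backing link moves to the
    deleted image's parent.  Returns (new_chain, (base, top)) where the
    pair is what a block-stream job would use; (chain, None) if snap is
    the active image, which cannot be deleted this way."""
    i = chain.index(snap)
    if i == len(chain) - 1:
        return chain, None
    base = chain[i - 1] if i > 0 else None  # stream down to snap's parent
    top = chain[i + 1]                      # child absorbs snap's delta
    return chain[:i] + chain[i + 1:], (base, top)

chain = ["A", "B", "C", "current"]
new_chain, job = delete_snapshot_via_stream(chain, "B")
print(new_chain, job)  # ['A', 'C', 'current'] ('A', 'C')
```

note that deleting the first snapshot is just the base=None case (stream everything below the child into it), so all paths really do collapse into one.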
> + }
> + } else {
> + mon_cmd(
> + $vmid,
> + 'blockdev-snapshot-delete-internal-sync',
> + device => $attached_deviceid,
> + name => $snap,
> + );
> + }
> } else {
> PVE::Storage::volume_snapshot_delete(
> $storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
> @@ -8066,6 +8353,8 @@ sub do_snapshots_with_qemu {
> return 1;
> }
>
> + return 2 if $scfg->{snapext} || $scfg->{type} eq 'lvm' && $volid =~ m/\.(qcow2)/;
> +
> if ($volid =~ m/\.(qcow2|qed)$/){
> return 1;
> }
> @@ -9169,6 +9458,38 @@ sub delete_ifaces_ipams_ips {
> }
> }
>
> +sub find_blockdev_node {
like I mentioned in another patch comment, this is already used by earlier patches. but if at all possible, it would be good to avoid the need for this in the first place..
> + my ($nodes, $path, $type) = @_;
> +
> + my $found_nodeid = undef;
> + my $found_node = undef;
> + for my $nodeid (keys %$nodes) {
> + my $node = $nodes->{$nodeid};
> + if ($nodeid =~ m/^$type-(\S+)$/ && $node->{file} eq $path ) {
because $path encoding might change over time/versions..
> + $found_node = $node;
> + last;
> + }
> + }
> + die "can't found nodeid for file $path\n" if !$found_node;
> + return $found_node;
> +}
> +
> +sub find_parent_node {
> + my ($nodes, $backing_path) = @_;
> +
> + my $found_nodeid = undef;
> + my $found_node = undef;
> + for my $nodeid (keys %$nodes) {
> + my $node = $nodes->{$nodeid};
> + if ($nodeid =~ m/^fmt-(\S+)$/ && $node->{backing_file} && $node->{backing_file} eq $backing_path) {
same applies here, but if we switch to block-stream, the only call site for this goes away anyway..
> + $found_node = $node;
> + last;
> + }
> + }
> + die "can't found nodeid for file $backing_path\n" if !$found_node;
> + return $found_node;
> +}
> +
> sub find_fmt_nodename_drive {
> my ($storecfg, $vmid, $drive, $nodes) = @_;
>
> --
> 2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
@ 2025-01-09 11:57 ` Fabian Grünbichler
0 siblings, 0 replies; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 11:57 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> We need to define name-nodes for all backing chain images,
> to be able to live rename them with blockdev-reopen
>
> For linked clone, we don't need to define the base image(s) chain.
> They are auto added with #block nodename.
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 34 ++++++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index dc12b38f..3a3feadf 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -1618,6 +1618,38 @@ sub generate_throttle_group {
> return $throttle_group;
> }
>
> +sub generate_backing_blockdev {
> + my ($storecfg, $snapshots, $deviceid, $drive, $id) = @_;
> +
> + my $snapshot = $snapshots->{$id};
> + my $order = $snapshot->{order};
> + my $parentid = $snapshot->{parent};
> + my $snap_fmt_nodename = "fmt-$deviceid-$order";
> + my $snap_file_nodename = "file-$deviceid-$order";
would it make sense to use the snapshot name here instead of the order? that would allow a deterministic mapping even when snapshots are removed..
> +
> + my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap_file_nodename);
> + $snap_file_blockdev->{filename} = $snapshot->{file};
> + my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_fmt_nodename, $snap_file_blockdev, 1);
> + $snap_fmt_blockdev->{backing} = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
> + return $snap_fmt_blockdev;
> +}
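using the snapshot name in the node names, as suggested, the recursive chain construction could look like this sketch (the dict structure approximates the -blockdev JSON; names are hypothetical):

```python
def backing_blockdev(snapshots, device, snap_name):
    """Build the nested blockdev structure for one snapshot and its
    ancestors.  snapshots maps name -> {'file': path, 'parent': name|None}.
    Node names embed the snapshot name so they stay stable even when
    other snapshots are removed from the chain."""
    snap = snapshots[snap_name]
    node = {
        "driver": "qcow2",
        "node-name": f"fmt-{device}-{snap_name}",
        "read-only": True,
        "file": {
            "driver": "file",
            "node-name": f"file-{device}-{snap_name}",
            "filename": snap["file"],
        },
    }
    if snap["parent"]:
        node["backing"] = backing_blockdev(snapshots, device, snap["parent"])
    return node

snaps = {
    "s1": {"file": "/d/100/snap-s1-vm-100-disk-0.qcow2", "parent": None},
    "s2": {"file": "/d/100/snap-s2-vm-100-disk-0.qcow2", "parent": "s1"},
}
tree = backing_blockdev(snaps, "drive-scsi0", "s2")
print(tree["node-name"], tree["backing"]["node-name"])
# fmt-drive-scsi0-s2 fmt-drive-scsi0-s1
```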
> +
> +sub generate_backing_chain_blockdev {
> + my ($storecfg, $deviceid, $drive) = @_;
> +
> + my $volid = $drive->{file};
> + my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid);
> + return if !$do_snapshots_with_qemu || $do_snapshots_with_qemu != 2;
> +
> + my $chain_blockdev = undef;
> + PVE::Storage::activate_volumes($storecfg, [$volid]);
> + #should we use qemu config to list snapshots ?
from a data consistency PoV, trusting the qcow2 metadata is probably safer.. but we could check that the storage and the config agree, and error out otherwise?
> + my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> + my $parentid = $snapshots->{'current'}->{parent};
> + $chain_blockdev = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
> + return $chain_blockdev;
> +}
> +
> sub generate_file_blockdev {
> my ($storecfg, $drive, $nodename) = @_;
>
> @@ -1816,6 +1848,8 @@ sub generate_drive_blockdev {
> my $blockdev_file = generate_file_blockdev($storecfg, $drive, $file_nodename);
> my $fmt_nodename = "fmt-drive-$drive_id";
> my $blockdev_format = generate_format_blockdev($storecfg, $drive, $fmt_nodename, $blockdev_file, $force_readonly);
> + my $backing_chain = generate_backing_chain_blockdev($storecfg, "drive-$drive_id", $drive);
> + $blockdev_format->{backing} = $backing_chain if $backing_chain;
>
> my $blockdev_live_restore = undef;
> if ($live_restore_name) {
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support Alexandre Derumier via pve-devel
@ 2025-01-09 12:36 ` Fabian Grünbichler
2025-01-10 9:10 ` DERUMIER, Alexandre via pve-devel
[not found] ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
0 siblings, 2 replies; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 12:36 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> src/PVE/Storage/DirPlugin.pm | 1 +
> src/PVE/Storage/Plugin.pm | 207 +++++++++++++++++++++++++++++------
> 2 files changed, 176 insertions(+), 32 deletions(-)
>
> diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm
> index fb23e0a..1cd7ac3 100644
> --- a/src/PVE/Storage/DirPlugin.pm
> +++ b/src/PVE/Storage/DirPlugin.pm
> @@ -81,6 +81,7 @@ sub options {
> is_mountpoint => { optional => 1 },
> bwlimit => { optional => 1 },
> preallocation => { optional => 1 },
> + snapext => { optional => 1 },
> };
> }
>
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index fececa1..aeba8d3 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -214,6 +214,11 @@ my $defaultData = {
> maximum => 65535,
> optional => 1,
> },
> + 'snapext' => {
> + type => 'boolean',
> + description => 'Enable external snapshots.',
> + optional => 1,
> + },
> },
> };
>
> @@ -710,11 +715,15 @@ sub filesystem_path {
> # Note: qcow2/qed has internal snapshot, so path is always
> # the same (with or without snapshot => same file).
> die "can't snapshot this image format\n"
> - if defined($snapname) && $format !~ m/^(qcow2|qed)$/;
> + if defined($snapname) && !$scfg->{snapext} && $format !~ m/^(qcow2|qed)$/;
I am not sure if we want to allow snapshots for non-qcow2 files just because snapext is enabled? I know it's technically possible to have a raw base image and then a qcow2 backing chain on top, but this quickly becomes confusing (how is the volume named then? which format does it have in which context)..
>
> my $dir = $class->get_subdir($scfg, $vtype);
>
> - $dir .= "/$vmid" if $vtype eq 'images';
> + if ($scfg->{snapext} && $snapname) {
> + $name = $class->get_snap_volname($volname, $snapname);
> + } else {
> + $dir .= "/$vmid" if $vtype eq 'images';
> + }
>
> my $path = "$dir/$name";
>
> @@ -953,6 +962,31 @@ sub free_image {
> # TODO taken from PVE/QemuServer/Drive.pm, avoiding duplication would be nice
> my @checked_qemu_img_formats = qw(raw cow qcow qcow2 qed vmdk cloop);
>
> +sub qemu_img_info {
> + my ($filename, $file_format, $timeout, $follow_backing_files) = @_;
> +
> + my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
> + push $cmd->@*, '-f', $file_format if $file_format;
> + push $cmd->@*, '--backing-chain' if $follow_backing_files;
> +
> + my $json = '';
> + my $err_output = '';
> + eval {
> + run_command($cmd,
> + timeout => $timeout,
> + outfunc => sub { $json .= shift },
> + errfunc => sub { $err_output .= shift . "\n"},
> + );
> + };
> + warn $@ if $@;
> + if ($err_output) {
> + # if qemu did not output anything to stdout we die with stderr as an error
> + die $err_output if !$json;
> + # otherwise we warn about it and try to parse the json
> + warn $err_output;
> + }
> + return $json;
> +}
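a hedged Python equivalent of the helper above (the qemu-img flags are the same; the stderr/stdout handling mirrors the Perl logic):

```python
import json
import subprocess

def qemu_img_info_cmd(filename, file_format=None, backing_chain=False):
    # same command line the Perl helper builds
    cmd = ["/usr/bin/qemu-img", "info", "--output=json", filename]
    if file_format:
        cmd += ["-f", file_format]
    if backing_chain:
        cmd.append("--backing-chain")
    return cmd

def qemu_img_info(filename, file_format=None, timeout=10, backing_chain=False):
    res = subprocess.run(qemu_img_info_cmd(filename, file_format, backing_chain),
                         capture_output=True, text=True, timeout=timeout)
    if res.stderr and not res.stdout:
        # no JSON at all on stdout: treat stderr as fatal
        raise RuntimeError(res.stderr)
    return json.loads(res.stdout) if res.stdout else None

print(qemu_img_info_cmd("/d/100/vm-100-disk-0.qcow2", backing_chain=True))
# ['/usr/bin/qemu-img', 'info', '--output=json', '/d/100/vm-100-disk-0.qcow2', '--backing-chain']
```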
> # set $untrusted if the file in question might be malicious since it isn't
> # created by our stack
> # this makes certain checks fatal, and adds extra checks for known problems like
> @@ -1016,25 +1050,9 @@ sub file_size_info {
> warn "file_size_info: '$filename': falling back to 'raw' from unknown format '$file_format'\n";
> $file_format = 'raw';
> }
> - my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
> - push $cmd->@*, '-f', $file_format if $file_format;
>
> - my $json = '';
> - my $err_output = '';
> - eval {
> - run_command($cmd,
> - timeout => $timeout,
> - outfunc => sub { $json .= shift },
> - errfunc => sub { $err_output .= shift . "\n"},
> - );
> - };
> - warn $@ if $@;
> - if ($err_output) {
> - # if qemu did not output anything to stdout we die with stderr as an error
> - die $err_output if !$json;
> - # otherwise we warn about it and try to parse the json
> - warn $err_output;
> - }
> + my $json = qemu_img_info($filename, $file_format, $timeout);
> +
> if (!$json) {
> die "failed to query file information with qemu-img\n" if $untrusted;
> # skip decoding if there was no output, e.g. if there was a timeout.
> @@ -1162,11 +1180,28 @@ sub volume_snapshot {
>
> die "can't snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
>
> - my $path = $class->filesystem_path($scfg, $volname);
> + if($scfg->{snapext}) {
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> + my $path = $class->path($scfg, $volname, $storeid);
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + my $format = ($class->parse_volname($volname))[6];
> + #rename current volume to snap volume
> + rename($path, $snappath) if -e $path && !-e $snappath;
I think this should die if the snappath already exists, and the one (IMHO wrong) call in qemu-server should switch to vdisk_alloc/alloc_image.. this is rather dangerous otherwise!
> + my $cmd = ['/usr/bin/qemu-img', 'create', '-b', $snappath,
> + '-F', $format, '-f', 'qcow2', $path];
> +
> + my $options = "extended_l2=on,cluster_size=128k,";
> + $options .= preallocation_cmd_option($scfg, 'qcow2');
> + push @$cmd, '-o', $options;
> + run_command($cmd);
>
> - run_command($cmd);
> + } else {
> +
> + my $path = $class->filesystem_path($scfg, $volname);
> + my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
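the external snapshot-taking sequence above (rename the current image to the snapshot path, then create a new overlay on top of it) can be sketched as a command builder; the paths and the `mv` stand-in for rename(2) are purely illustrative:

```python
def external_snapshot_cmds(path, snappath, backing_format, preallocation=None):
    """Commands to take an external snapshot: the running image is renamed
    to the snapshot path (rename(2) in the Perl; a plain 'mv' here for
    illustration), then a fresh qcow2 overlay backed by it is created at
    the original path."""
    opts = "extended_l2=on,cluster_size=128k"
    if preallocation:
        opts += f",preallocation={preallocation}"
    return [
        ["mv", path, snappath],
        ["/usr/bin/qemu-img", "create",
         "-b", snappath, "-F", backing_format,
         "-f", "qcow2", "-o", opts, path],
    ]

cmds = external_snapshot_cmds(
    "/dir/100/vm-100-disk-0.qcow2",
    "/dir/100/snap-s1-vm-100-disk-0.qcow2",
    "qcow2", preallocation="metadata")
print(cmds[1][-1])  # /dir/100/vm-100-disk-0.qcow2 (overlay recreated in place)
```

the invariant worth noting: the volume's path never changes from the guest/config point of view; only which file sits at that path does.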
> @@ -1177,6 +1212,21 @@ sub volume_snapshot {
> sub volume_rollback_is_possible {
> my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
>
> + if ($scfg->{snapext}) {
> + #technically, we could manage multi-branch, but it needs a lot more work for snapshot delete
> + #we would need to implement block-stream from the deleted snapshot to all other child branches
see my comments in qemu-server - I think we actually want block-stream anyway, since it has the semantics we want..
> + #when online, we need to do a transaction for multiple disk when delete the last snapshot
> + #and need to merge in current running file
> +
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $parentsnap = $snapshots->{current}->{parent};
> +
> + return 1 if !-e $snappath || $snapshots->{$parentsnap}->{file} eq $snappath;
why do we return 1 here if the snapshot doesn't exist? if we only allow rollback to the most recent snapshot for now, then we could just query the current path and see if it is backed by our snapshot?
> +
> + die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> + }
> +
> return 1;
> }
>
> @@ -1187,9 +1237,15 @@ sub volume_snapshot_rollback {
>
> my $path = $class->filesystem_path($scfg, $volname);
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> -
> - run_command($cmd);
> + if ($scfg->{snapext}) {
> + #simply delete the current snapshot and recreate it
> + my $path = $class->filesystem_path($scfg, $volname);
> + unlink($path);
> + $class->volume_snapshot($scfg, $storeid, $volname, $snap);
> + } else {
> + my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
> @@ -1201,13 +1257,52 @@ sub volume_snapshot_delete {
>
> return 1 if $running;
>
> + my $cmd = "";
> my $path = $class->filesystem_path($scfg, $volname);
>
> - $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> + if ($scfg->{snapext}) {
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $snappath = $snapshots->{$snap}->{file};
> + return if !-e $snappath; #already deleted ?
shouldn't this be an error?
> +
> + my $parentsnap = $snapshots->{$snap}->{parent};
> + my $childsnap = $snapshots->{$snap}->{child};
> +
> + my $parentpath = $snapshots->{$parentsnap}->{file} if $parentsnap;
> + my $childpath = $snapshots->{$childsnap}->{file} if $childsnap;
> +
> +
> + #if first snapshot, we merge child, and rename the snapshot to child
> + if(!$parentsnap) {
> + #we use commit here, as it's faster than rebase
> + #https://lists.gnu.org/archive/html/qemu-discuss/2019-08/msg00041.html
> + print"commit $childpath\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
> + run_command($cmd);
> + print"delete $childpath\n";
> +
> + unlink($childpath);
this unlink can be skipped?
> + print"rename $snappath to $childpath\n";
> + rename($snappath, $childpath);
since this will overwrite $childpath anyway.. this also reduces the chance of something going wrong:
- if the commit fails halfway through, nothing bad should have happened, other than some data is now stored in two snapshots and takes up extra space
- if the rename fails, then all of the data of $snap is stored twice, but the backing chain is still valid
notable, there is no longer a gap where $childpath doesn't exist, which would break the backing chain!
> + } else {
> + print"commit $snappath\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
leftover from previous version? not used/overwritten below ;)
> + #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
> + die "missing parentsnap snapshot to rebase child $childpath\n" if !$parentpath;
> + print "link $childsnap to $parentsnap\n";
> + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath];
does this work? I would read the qemu-img manpage to say that '-u' is for when you've moved/converted the backing file, and want to update the reference in its overlay, and that it doesn't copy any data.. but we need to copy the data from $snap to $childpath (we just want to delete the snapshot, we don't want to drop all its changes from the history, that would corrupt the contents of the image).
note the description of the "safe" variant:
" This is the default mode and performs a real rebase operation. The new backing file may differ from the old one and qemu-img rebase will take care of keeping the
guest-visible content of FILENAME unchanged."
IMHO this is the behaviour we need here?
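the difference can be illustrated with a toy cluster map (assumption: each image is a dict of cluster -> data, and reads fall through to the backing image):

```python
def safe_rebase(child, snap):
    """Model of qemu-img rebase's default (safe) mode when the backing
    file switches from `snap` to the image below it: clusters the child
    doesn't own, but `snap` provided, must be copied into the child so
    its guest-visible content is unchanged."""
    for cluster, data in snap.items():
        if cluster not in child:
            child[cluster] = data
    return child

parent = {0: "p0", 1: "p1"}
snap   = {1: "s1", 2: "s2"}   # delta of the to-be-deleted snapshot
child  = {2: "c2"}

# unsafe rebase (-u) just re-points child at parent, copying nothing:
unsafe_view = {**parent, **child}   # cluster 1 now reads "p1" -- corrupt!
safe_rebase(child, snap)
safe_view = {**parent, **child}     # cluster 1 still reads "s1"
print(unsafe_view[1], safe_view[1])  # p1 s1
```

so with `-u` the child silently loses every cluster the deleted snapshot was providing, which is exactly the corruption described above.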
> + run_command($cmd);
> + #delete the snapshot
> + unlink($snappath);
> + }
> +
> + } else {
> + $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
>
> - run_command($cmd);
> + $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
> @@ -1246,8 +1341,8 @@ sub volume_has_feature {
> current => { qcow2 => 1, raw => 1, vmdk => 1 },
> },
> rename => {
> - current => {qcow2 => 1, raw => 1, vmdk => 1},
> - },
> + current => { qcow2 => 1, raw => 1, vmdk => 1},
> + }
nit: unrelated change?
> };
>
> if ($feature eq 'clone') {
> @@ -1481,7 +1576,37 @@ sub status {
> sub volume_snapshot_info {
> my ($class, $scfg, $storeid, $volname) = @_;
>
> - die "volume_snapshot_info is not implemented for $class";
should this be guarded with $snapext being enabled?
> + my $path = $class->filesystem_path($scfg, $volname);
> +
> + my $backing_chain = 1;
> + my $json = qemu_img_info($path, undef, 10, $backing_chain);
> + die "failed to query file information with qemu-img\n" if !$json;
> + my $snapshots = eval { decode_json($json) };
> +
> + my $info = {};
> + my $order = 0;
> + for my $snap (@$snapshots) {
> +
> + my $snapfile = $snap->{filename};
> + my $snapname = parse_snapname($snapfile);
> + $snapname = 'current' if !$snapname;
> + my $snapvolname = $class->get_snap_volname($volname, $snapname);
> +
> + $info->{$snapname}->{order} = $order;
> + $info->{$snapname}->{file}= $snapfile;
> + $info->{$snapname}->{volname} = $snapvolname;
> + $info->{$snapname}->{volid} = "$storeid:$snapvolname";
> + $info->{$snapname}->{ext} = 1;
> +
> + my $parentfile = $snap->{'backing-filename'};
> + if ($parentfile) {
> + my $parentname = parse_snapname($parentfile);
> + $info->{$snapname}->{parent} = $parentname;
> + $info->{$parentname}->{child} = $snapname;
> + }
> + $order++;
> + }
> + return $info;
> }
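the loop above, which turns `qemu-img info --backing-chain` output (a list, active image first) into an order/parent/child map, can be sketched like this (field names match the qemu-img JSON; the parse callback is a stand-in for parse_snapname):

```python
def snapshot_info(qemu_img_chain, parse_snapname):
    """Map snapshot name -> {order, file, parent?, child?}, mirroring
    the Perl: the active image gets the pseudo-name 'current'."""
    info = {}
    for order, entry in enumerate(qemu_img_chain):
        name = parse_snapname(entry["filename"]) or "current"
        info.setdefault(name, {})
        info[name]["order"] = order
        info[name]["file"] = entry["filename"]
        backing = entry.get("backing-filename")
        if backing:
            parent = parse_snapname(backing) or "current"
            info[name]["parent"] = parent
            info.setdefault(parent, {})["child"] = name
    return info

chain = [
    {"filename": "vm-100-disk-0.qcow2",
     "backing-filename": "snap-s2-vm-100-disk-0.qcow2"},
    {"filename": "snap-s2-vm-100-disk-0.qcow2",
     "backing-filename": "snap-s1-vm-100-disk-0.qcow2"},
    {"filename": "snap-s1-vm-100-disk-0.qcow2"},
]

def parse(n):
    import re
    m = re.match(r"^snap-(.*)-vm(.*)$", n.rsplit("/", 1)[-1])
    return m.group(1) if m else None

info = snapshot_info(chain, parse)
print(info["current"]["parent"], info["s2"]["child"])  # s2 current
```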
>
> sub activate_storage {
> @@ -1867,4 +1992,22 @@ sub config_aware_base_mkdir {
> }
> }
>
> +sub get_snap_volname {
> + my ($class, $volname, $snapname) = @_;
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
> + $name = !$snapname || $snapname eq 'current' ? $volname : "$vmid/snap-$snapname-$name";
> + return $name;
> +}
> +
> +sub parse_snapname {
> + my ($name) = @_;
> +
> + my $basename = basename($name);
> + if ($basename =~ m/^snap-(.*)-vm(.*)$/) {
> + return $1;
> + }
> + return undef;
> +}
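the naming convention and its parser can be exercised quickly; a small sketch using Python's `re` instead of Perl (the round-trip behaviour, including greedy matching, is the point):

```python
import re

def snap_basename(snapname, volname):
    # file name produced by get_snap_volname, minus the "$vmid/" directory
    return f"snap-{snapname}-{volname}"

def parse_snapname(basename):
    m = re.match(r"^snap-(.*)-vm(.*)$", basename)
    return m.group(1) if m else None

vol = "vm-100-disk-0.qcow2"
print(parse_snapname(snap_basename("daily", vol)))        # daily
print(parse_snapname(vol))                                # None (active image)
# the greedy ".*" stops at the *last* "-vm", i.e. the start of the
# volname, so snapshot names containing "-vm" still round-trip:
print(parse_snapname(snap_basename("pre-vmotion", vol)))  # pre-vmotion
```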
> +
> 1;
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
2025-01-09 11:57 ` Fabian Grünbichler
@ 2025-01-09 13:19 ` Fabio Fantoni via pve-devel
0 siblings, 0 replies; 38+ messages in thread
From: Fabio Fantoni via pve-devel @ 2025-01-09 13:19 UTC (permalink / raw)
To: Proxmox VE development discussion, Fabian Grünbichler; +Cc: Fabio Fantoni
[-- Attachment #1: Type: message/rfc822, Size: 8210 bytes --]
From: Fabio Fantoni <fabio.fantoni@m2r.biz>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>, "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Thu, 9 Jan 2025 14:19:38 +0100
Message-ID: <483058af-44d9-441c-98df-fd7150184ebe@m2r.biz>
On 09/01/2025 12:57, Fabian Grünbichler wrote:
>> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> it would be great if there'd be a summary of the design choices and a high level summary of what happens to the files and block-node-graph here. it's a bit hard to judge from the code below whether it would be possible to eliminate the dynamically named block nodes, for example ;)
>
> a few more comments documenting the behaviour and ideally also some tests (mocking the QMP interactions?) would be nice
@Alexandre Derumier: thanks for adding external snapshot support. I have
not looked at the implementation in detail because I do not have enough
time, but I think external snapshot support would be useful.
I used external snapshots outside of Proxmox years ago, on Debian
servers with VMs managed with libvirt. I managed them completely
manually from the CLI with multiple commands, because they were not
implemented in virt-manager, and they saved a lot of time (compared to
backup/restore) in some high-risk operations on VMs with large disks,
raw pre-allocated on HDDs.
I used them very little and kept them only for the minimum time needed
for delicate maintenance operations: if something unforeseen happened I
rolled back to the pre-snapshot state, deleted the external snapshot and
created another one to try again; if everything went well I did the
commit in the end and went back to using only the pre-allocated raw
image. With heavy disk usage, as in the operations I was doing, the
performance drop with external qcow2 snapshots compared to plain
pre-allocated raw disks was huge, if I remember correctly (which is why
I kept them for the minimum time possible).
If it hasn't already been planned, I think it would be useful to warn
users (at least in the documentation) not to underestimate the possible
performance impact (especially if they rely on pre-allocated raw disks
on HDDs for better performance and minimal fragmentation), and to avoid
using or keeping snapshots for a long time without real need. Another
important thing to tell users is the increase in space usage (again,
mainly relevant for those used to pre-allocated disks, which normally
cannot grow).
From a quick look I don't see the possibility of using external
snapshots on raw (file-based) disks in this implementation, or am I
wrong? If so, why? I think the main use case would be exactly those
setups where snapshot support is not available by default.
* Re: [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
@ 2025-01-09 13:55 ` Fabian Grünbichler
2025-01-10 10:16 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 13:55 UTC (permalink / raw)
To: Proxmox VE development discussion
one downside with this part in particular - we have to always allocate full-size LVs (+qcow2 overhead), even if most of them will end up storing just a single snapshot delta which might be a tiny part of that full-size.. hopefully if discard is working across the whole stack this doesn't actually explode space usage on the storage side, but it makes everything a bit hard to track.. OTOH, while we could in theory extend/reduce the LVs and qcow2 images on them when modifying the backing chain, the additional complexity is probably not worth it at the moment..
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> src/PVE/Storage/LVMPlugin.pm | 231 ++++++++++++++++++++++++++++++++---
> 1 file changed, 213 insertions(+), 18 deletions(-)
>
> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
> index 88fd612..1257cd3 100644
> --- a/src/PVE/Storage/LVMPlugin.pm
> +++ b/src/PVE/Storage/LVMPlugin.pm
> @@ -4,6 +4,7 @@ use strict;
> use warnings;
>
> use IO::File;
> +use POSIX qw/ceil/;
>
> use PVE::Tools qw(run_command trim);
> use PVE::Storage::Plugin;
> @@ -216,6 +217,7 @@ sub type {
> sub plugindata {
> return {
> content => [ {images => 1, rootdir => 1}, { images => 1 }],
> + format => [ { raw => 1, qcow2 => 1 } , 'raw' ],
I wonder if we want to guard the snapshotting-related parts below with an additional "snapext" option here as well? or even the usage of qcow2 altogether?
> };
> }
>
> @@ -291,7 +293,10 @@ sub parse_volname {
> PVE::Storage::Plugin::parse_lvm_name($volname);
>
> if ($volname =~ m/^(vm-(\d+)-\S+)$/) {
> - return ('images', $1, $2, undef, undef, undef, 'raw');
> + my $name = $1;
> + my $vmid = $2;
> + my $format = $volname =~ m/\.qcow2$/ ? 'qcow2' : 'raw';
> + return ('images', $name, $vmid, undef, undef, undef, $format);
> }
>
> die "unable to parse lvm volume name '$volname'\n";
> @@ -300,11 +305,13 @@ sub parse_volname {
> sub filesystem_path {
> my ($class, $scfg, $volname, $snapname) = @_;
>
> - die "lvm snapshot is not implemented"if defined($snapname);
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
>
> - my ($vtype, $name, $vmid) = $class->parse_volname($volname);
> + die "snapshot is working with qcow2 format only" if defined($snapname) && $format ne 'qcow2';
>
> my $vg = $scfg->{vgname};
> + $name = $class->get_snap_volname($volname, $snapname) if $snapname;
>
> my $path = "/dev/$vg/$name";
>
> @@ -332,7 +339,9 @@ sub find_free_diskname {
>
> my $disk_list = [ keys %{$lvs->{$vg}} ];
>
> - return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, undef, $scfg);
> + $add_fmt_suffix = $fmt eq 'qcow2' ? 1 : undef;
> +
> + return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, $fmt, $scfg, $add_fmt_suffix);
> }
>
> sub lvcreate {
> @@ -363,7 +372,15 @@ sub lvrename {
> sub alloc_image {
> my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
>
> - die "unsupported format '$fmt'" if $fmt ne 'raw';
> + die "unsupported format '$fmt'" if $fmt !~ m/(raw|qcow2)/;
> +
> + $name = $class->alloc_new_image($storeid, $scfg, $vmid, $fmt, $name, $size);
> + $class->format_qcow2($storeid, $scfg, $name, $size) if $fmt eq 'qcow2';
> + return $name;
> +}
> +
> +sub alloc_new_image {
> + my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
>
> die "illegal name '$name' - should be 'vm-$vmid-*'\n"
> if $name && $name !~ m/^vm-$vmid-/;
> @@ -376,16 +393,45 @@ sub alloc_image {
>
> my $free = int($vgs->{$vg}->{free});
>
> +
> + #add extra space for qcow2 metadatas
> + #without sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size
> + #with sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size / 16
> + #4MB overhead for 1TB with extented l2 clustersize=128k
> +
> + my $qcow2_overhead = ceil($size/1024/1024/1024) * 4096;
there's "qemu-img measure", which seems like it would do exactly what we want ;)
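For comparison, the heuristic in the patch can be sketched as follows (Python for illustration only; the plugin itself is Perl, and `qemu-img measure` would replace this estimate with an exact figure):

```python
import math

def qcow2_overhead_kib(size_kib: int) -> int:
    """Approximate qcow2 metadata overhead as in the patch:
    4 MiB (4096 KiB) per started TiB of virtual disk size,
    assuming extended L2 entries with a 128k cluster size."""
    tib = size_kib / 1024 / 1024 / 1024  # KiB -> TiB
    return math.ceil(tib) * 4096

# 1 TiB disk -> 4 MiB of metadata headroom
assert qcow2_overhead_kib(1024 ** 3) == 4096
```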
> +
> + my $lvmsize = $size;
> + $lvmsize += $qcow2_overhead if $fmt eq 'qcow2';
> +
> die "not enough free space ($free < $size)\n" if $free < $size;
>
> - $name = $class->find_free_diskname($storeid, $scfg, $vmid)
> + $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt)
> if !$name;
>
> - lvcreate($vg, $name, $size, ["pve-vm-$vmid"]);
> -
> + my $tags = ["pve-vm-$vmid"];
> + push @$tags, "\@pve-$name" if $fmt eq 'qcow2';
that's a creative way to avoid the need to discover and activate snapshots one by one below, but it might warrant a comment ;)
> + lvcreate($vg, $name, $lvmsize, $tags);
> return $name;
> }
>
> +sub format_qcow2 {
> + my ($class, $storeid, $scfg, $name, $size, $backing_file) = @_;
> +
> + # activate volume
> + $class->activate_volume($storeid, $scfg, $name, undef, {});
> + my $path = $class->path($scfg, $name, $storeid);
> + # create the qcow2 fs
> + my $cmd = ['/usr/bin/qemu-img', 'create'];
> + push @$cmd, '-b', $backing_file, '-F', 'qcow2' if $backing_file;
> + push @$cmd, '-f', 'qcow2', $path;
> + push @$cmd, "${size}K" if $size;
> + my $options = "extended_l2=on,";
> + $options .= PVE::Storage::Plugin::preallocation_cmd_option($scfg, 'qcow2');
> + push @$cmd, '-o', $options;
> + run_command($cmd);
> +}
> +
> sub free_image {
> my ($class, $storeid, $scfg, $volname, $isBase) = @_;
>
> @@ -536,6 +582,12 @@ sub activate_volume {
>
> my $lvm_activate_mode = 'ey';
>
> + #activate volume && all snapshots volumes by tag
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + $path = "\@pve-$name" if $format eq 'qcow2';
> +
> my $cmd = ['/sbin/lvchange', "-a$lvm_activate_mode", $path];
> run_command($cmd, errmsg => "can't activate LV '$path'");
> $cmd = ['/sbin/lvchange', '--refresh', $path];
> @@ -548,6 +600,10 @@ sub deactivate_volume {
> my $path = $class->path($scfg, $volname, $storeid, $snapname);
> return if ! -b $path;
>
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> + $path = "\@pve-$name" if $format eq 'qcow2';
> +
> my $cmd = ['/sbin/lvchange', '-aln', $path];
> run_command($cmd, errmsg => "can't deactivate LV '$path'");
> }
> @@ -555,15 +611,27 @@ sub deactivate_volume {
> sub volume_resize {
> my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
>
> - $size = ($size/1024/1024) . "M";
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + my $lvmsize = $size / 1024;
> + my $qcow2_overhead = ceil($size/1024/1024/1024/1024) * 4096;
see above
> + $lvmsize += $qcow2_overhead if $format eq 'qcow2';
> + $lvmsize = "${lvmsize}k";
>
> my $path = $class->path($scfg, $volname);
> - my $cmd = ['/sbin/lvextend', '-L', $size, $path];
> + my $cmd = ['/sbin/lvextend', '-L', $lvmsize, $path];
>
> $class->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
> run_command($cmd, errmsg => "error resizing volume '$path'");
> });
>
> + if(!$running && $format eq 'qcow2') {
> + my $prealloc_opt = PVE::Storage::Plugin::preallocation_cmd_option($scfg, $format);
> + my $cmd = ['/usr/bin/qemu-img', 'resize', "--$prealloc_opt", '-f', $format, $path , $size];
> + run_command($cmd, timeout => 10);
> + }
> +
> return 1;
> }
>
> @@ -585,30 +653,149 @@ sub volume_size_info {
> sub volume_snapshot {
> my ($class, $scfg, $storeid, $volname, $snap) = @_;
>
> - die "lvm snapshot is not implemented";
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + die "can't snapshot this image format\n" if $format ne 'qcow2';
> +
> + $class->activate_volume($storeid, $scfg, $volname, undef, {});
> +
> + my $snap_volname = $class->get_snap_volname($volname, $snap);
> + my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> + my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
> +
> + #rename current lvm volume to snap volume
> + my $vg = $scfg->{vgname};
> + print"rename $volname to $snap_volname\n";
> + eval { lvrename($vg, $volname, $snap_volname) } ;
missing error handling..
> +
> +
> + #allocate a new lvm volume
> + $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024);
missing error handling
> + eval {
> + $class->format_qcow2($storeid, $scfg, $volname, undef, $snap_path);
> + };
> +
> + if ($@) {
> + eval { $class->free_image($storeid, $scfg, $volname, 0) };
> + warn $@ if $@;
> + }
> +}
> +
> +sub volume_rollback_is_possible {
> + my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
> +
> + my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $parent_snap = $snapshots->{current}->{parent};
> +
> + return 1 if !-e $snap_path || $snapshots->{$parent_snap}->{file} eq $snap_path;
the first condition here seems wrong, see storage patch #1
> + die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> +
> + return 1;
> }
>
> +
> sub volume_snapshot_rollback {
> my ($class, $scfg, $storeid, $volname, $snap) = @_;
>
> - die "lvm snapshot rollback is not implemented";
> + die "can't rollback snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
above we only have qcow2, which IMHO makes more sense..
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + $class->activate_volume($storeid, $scfg, $volname, undef, {});
> + my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
> + my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> + #simply delete the current snapshot and recreate it
> + $class->free_image($storeid, $scfg, $volname, 0);
> + $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024);
> + $class->format_qcow2($storeid, $scfg, $volname, undef, $snap_path);
missing error handling..
> +
> + return undef;
> }
>
> sub volume_snapshot_delete {
> - my ($class, $scfg, $storeid, $volname, $snap) = @_;
> + my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
> +
> + die "can't delete snapshot for this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
same as above
> +
> + return 1 if $running;
> +
> + my $cmd = "";
> + my $path = $class->filesystem_path($scfg, $volname);
> +
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $snap_path = $snapshots->{$snap}->{file};
> + my $snap_volname = $snapshots->{$snap}->{volname};
> + return if !-e $snap_path; #already deleted ?
should maybe be a die?
> +
> + my $parent_snap = $snapshots->{$snap}->{parent};
> + my $child_snap = $snapshots->{$snap}->{child};
> +
> + my $parent_path = $snapshots->{$parent_snap}->{file} if $parent_snap;
> + my $child_path = $snapshots->{$child_snap}->{file} if $child_snap;
> + my $child_volname = $snapshots->{$child_snap}->{volname} if $child_snap;
> +
> +
> + #if first snapshot, we merge child, and rename the snapshot to child
> + if(!$parent_snap) {
> + #we use commit here, as it's faster than rebase
> + #https://lists.gnu.org/archive/html/qemu-discuss/2019-08/msg00041.html
> + print"commit $child_path\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $child_path];
> + run_command($cmd);
> + print"delete $child_volname\n";
> + $class->free_image($storeid, $scfg, $child_volname, 0);
> +
> + print"rename $snap_volname to $child_volname\n";
> + my $vg = $scfg->{vgname};
> + lvrename($vg, $snap_volname, $child_volname);
missing error handling..
> + } else {
> + print"commit $snap_path\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $snap_path];
leftover?
> + #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
> + die "missing parentsnap snapshot to rebase child $child_path\n" if !$parent_path;
> + print "link $child_snap to $parent_snap\n";
> + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parent_path, '-F', 'qcow2', '-f', 'qcow2', $child_path];
> + run_command($cmd);
same as for patch #1, I am not sure the -u here is correct..
> + #delete the snapshot
> + $class->free_image($storeid, $scfg, $snap_volname, 0);
> + }
>
> - die "lvm snapshot delete is not implemented";
> }
>
> sub volume_has_feature {
> my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
>
> my $features = {
> - copy => { base => 1, current => 1},
> - rename => {current => 1},
> + copy => {
> + base => { qcow2 => 1, raw => 1},
> + current => { qcow2 => 1, raw => 1},
> + snap => { qcow2 => 1 },
> + },
> + 'rename' => {
> + current => { qcow2 => 1, raw => 1},
> + },
> + snapshot => {
> + current => { qcow2 => 1 },
> + snap => { qcow2 => 1 },
> + },
> + template => {
> + current => { qcow2 => 1, raw => 1},
> + },
> +# don't allow to clone as we can't activate the base on multiple host at the same time
> +# clone => {
> +# base => { qcow2 => 1, raw => 1},
> +# },
I think activating the base would actually be okay, we just must never write to it? ;)
> };
>
> - my ($vtype, $name, $vmid, $basename, $basevmid, $isBase) =
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> $class->parse_volname($volname);
>
> my $key = undef;
> @@ -617,7 +804,7 @@ sub volume_has_feature {
> }else{
> $key = $isBase ? 'base' : 'current';
> }
> - return 1 if $features->{$feature}->{$key};
> + return 1 if defined($features->{$feature}->{$key}->{$format});
>
> return undef;
> }
> @@ -738,4 +925,12 @@ sub rename_volume {
> return "${storeid}:${target_volname}";
> }
>
> +sub get_snap_volname {
> + my ($class, $volname, $snapname) = @_;
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
> + $name = !$snapname || $snapname eq 'current' ? $volname : "snap-$snapname-$name";
> + return $name;
> +}
> +
> 1;
> --
> 2.39.5
* Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
2025-01-08 13:27 ` Fabian Grünbichler
@ 2025-01-10 7:55 ` DERUMIER, Alexandre via pve-devel
[not found] ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
1 sibling, 0 replies; 38+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 7:55 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 13639 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
Date: Fri, 10 Jan 2025 07:55:31 +0000
Message-ID: <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
-------- Original message --------
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>, Fiona
Ebner <f.ebner@proxmox.com>
Subject: Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
Date: 08/01/2025 14:27:02
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am
> 16.12.2024 10:12 CET geschrieben:
> This is needed for external snapshot live commit,
> when the top blocknode is not the fmt-node.
> (in our case, the throttle-group node is the topnode)
>>so this is needed to workaround a limitation in block-commit? I think
>>if we need this it should probably be submitted upstream for
>>inclusion, or we provide our own copy of block-commit with it in the
>>meantime?
Yes, it could be submitted upstream (after a little bit of review, I'm
not too good in C ;)).
It's more a missing option in the QMP syntax, as it's already using the
blockdev-mirror code in the background.
(Red Hat didn't use the throttle group feature until recently, so I think
they never hit this problem with block-commit, as their top root
node was the disk directly, and not the throttle group.)
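To make the intent concrete, a hypothetical QMP invocation with the new option might look like the following (node names are invented for illustration; only the `replaces` argument is new in this patch, the rest follows block-commit's existing parameters):

```json
{ "execute": "block-commit",
  "arguments": {
    "job-id": "commit-drive0",
    "device": "throttle-drive0",
    "top-node": "snap-fmt-node",
    "base-node": "base-fmt-node",
    "replaces": "snap-fmt-node"
  } }
```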
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
2025-01-09 12:36 ` Fabian Grünbichler
@ 2025-01-10 9:10 ` DERUMIER, Alexandre via pve-devel
[not found] ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
1 sibling, 0 replies; 38+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 9:10 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
Date: Fri, 10 Jan 2025 09:10:54 +0000
Message-ID: <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
> @@ -710,11 +715,15 @@ sub filesystem_path {
> # Note: qcow2/qed has internal snapshot, so path is always
> # the same (with or without snapshot => same file).
> die "can't snapshot this image format\n"
> - if defined($snapname) && $format !~ m/^(qcow2|qed)$/;
>>I am not sure if we want to allow snapshots for non-qcow2 files just
>>because snapext is enabled? I know it's technically possible to have
>>a raw base image and then a qcow2 backing chain on top, but this
>>quickly becomes confusing (how is the volume named then? which format
>>does it have in which context)..
In the V2 I was allowing it, but for this V3 series I only manage
external snapshots with qcow2 files. (With the snapshot file renaming,
it'd be too complex to manage, and confusing for the user indeed...)
I think I forgot to clean this up in the V3; the check should simply be
die "can't snapshot this image format\n" if defined($snapname) && $format !~ m/^(qcow2|qed)$/;
>
> die "can't snapshot this image format\n" if $volname !~
> m/\.(qcow2|qed)$/;
>
> - my $path = $class->filesystem_path($scfg, $volname);
> + if($scfg->{snapext}) {
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> + my $path = $class->path($scfg, $volname, $storeid);
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + my $format = ($class->parse_volname($volname))[6];
> + #rename current volume to snap volume
> + rename($path, $snappath) if -e $path && !-e $snappath;
>>I think this should die if the snappath already exists, and the one
>>(IMHO wrong) call in qemu-server should switch to
>>vdisk_alloc/alloc_image.. this is rather dangerous otherwise!
right !
> + if ($scfg->{snapext}) {
> + #technically, we could manage multibranch, but it needs a lot more work for snapshot delete
> + #we need to implement block-stream from the deleted snapshot to all other child branches
>>see my comments in qemu-server - I think we actually want block-
>>stream anyway, since it has the semantics we want..
I don't agree, we don't always want that, because with block-stream you
need to copy the parent into the child.
For example: you have a 1TB image, you take a snapshot, write 5MB into
the snapshot, then delete the snapshot; you'll need to read/copy 1TB of
data from the parent into the snapshot file.
I haven't read your qemu-server comment yet ;)
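A quick back-of-the-envelope illustration of the cost asymmetry described above (the sizes are hypothetical, matching the example in the mail):

```python
# Deleting a snapshot merges data between parent and child; the amount
# of data copied depends on the direction:
#  - commit: copy the *snapshot's* allocated data down into its parent
#  - stream: copy the *parent's* allocated data up into the child
parent_allocated = 1024 ** 4      # 1 TiB fully-allocated base image
snap_allocated   = 5 * 1024 ** 2  # 5 MiB written after taking the snapshot

commit_cost = snap_allocated      # only 5 MiB moved
stream_cost = parent_allocated    # the whole 1 TiB moved

assert commit_cost < stream_cost
```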
> + #when online, we need to do a transaction for multiple disk when
> delete the last snapshot
> + #and need to merge in current running file
> +
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid,
> $volname);
> + my $parentsnap = $snapshots->{current}->{parent};
> +
> + return 1 if !-e $snappath || $snapshots->{$parentsnap}->{file} eq
> $snappath;
>>why do we return 1 here if the snapshot doesn't exist? if we only
>>allow rollback to the most recent snapshot for now, then we could
>>just query the current path and see if it is backed by our snapshot?
I think I forgot to remove this from the V2. But the idea is indeed to
check whether the snapshot backs the current image (with
$snapshots->{current}->{parent}).
> +
> + die "can't rollback, '$snap' is not most recent snapshot on
> '$volname'\n";
> + }
> +
> return 1;
> }
>
> @@ -1201,13 +1257,52 @@ sub volume_snapshot_delete {
>
> return 1 if $running;
>
> + my $cmd = "";
> my $path = $class->filesystem_path($scfg, $volname);
>
> - $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> + if ($scfg->{snapext}) {
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid,
> $volname);
> + my $snappath = $snapshots->{$snap}->{file};
> + return if !-e $snappath; #already deleted ?
>>shouldn't this be an error?
This one was in case we want to retry on error when we have
multiple disks (for example: on the first snapshot-delete API call, the
first disk removes its snapshot, but a bug occurs and the second disk
doesn't remove its snapshot).
The user could want to release the vm snapshot lock and fix it manually
by calling the snapshot delete again.
I'm not sure how to handle this correctly?
> + print"commit $childpath\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
> + run_command($cmd);
> + print"delete $childpath\n";
> +
> + unlink($childpath);
this unlink can be skipped?
> + print"rename $snappath to $childpath\n";
> + rename($snappath, $childpath);
>>since this will overwrite $childpath anyway.. this also reduces the
>>chance of something going wrong:
>>
>>- if the commit fails halfway through, nothing bad should have
>>happened, other than some data is now stored in two snapshots and
>>takes up extra space
>>- if the rename fails, then all of the data of $snap is stored twice,
>>but the backing chain is still valid
>>
>>notable, there is no longer a gap where $childpath doesn't exist,
>>which would break the backing chain!
yes you are right, better to have it atomic indeed
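The "no gap" property argued for above comes from rename(2) replacing an existing destination atomically. A minimal self-contained sketch (plain files stand in for the qcow2 volumes):

```python
import os
import tempfile

# the destination is replaced in a single step, so a backing chain that
# references child.qcow2 never sees it missing
d = tempfile.mkdtemp()
snap = os.path.join(d, "snap.qcow2")
child = os.path.join(d, "child.qcow2")
with open(snap, "w") as f:
    f.write("snap-data")   # stands in for the snapshot file
with open(child, "w") as f:
    f.write("child-data")  # stands in for the already-committed child

os.rename(snap, child)     # atomically replaces child

with open(child) as f:
    assert f.read() == "snap-data"
assert not os.path.exists(snap)
```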
> + } else {
> + print"commit $snappath\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
>>leftover from previous version? not used/overwritten below ;)
no, this is really to commit the snapshot to the parent
> + #if we delete an intermediate snapshot, we need to link upper
> snapshot to base snapshot
> + die "missing parentsnap snapshot to rebase child $childpath\n"
> if !$parentpath;
> + print "link $childsnap to $parentsnap\n";
> + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parentpath,
> '-F', 'qcow2', '-f', 'qcow2', $childpath];
>>does this work? I would read the qemu-img manpage to say that '-u' is
>>for when you've moved/converted the backing file, and want to update
>>the reference in its overlay, and that it doesn't copy any data.. but
>>we need to copy the data from $snap to $childpath (we just want to
>>delete the snapshot, we don't want to drop all its changes from the
>>history, that would corrupt the contents of the image).
>>note the description of the "safe" variant:
>>
>>" This is the default mode and performs a real
>>rebase operation. The new backing file may differ from the old one
>>and qemu-img rebase will take care of keeping the
>> guest-visible content of FILENAME unchanged."
>>
>>IMHO this is the behaviour we need here?
This is only to change the backing chain ref in the qcow2 snapshot.
(This is the only way to do it; there was a qemu-img amend command in the
past, but it was removed in
2020 https://patchwork.kernel.org/project/qemu-devel/patch/20200403175859.863248-5-eblake@redhat.com/,
so rebase is the right way to do it.)
The merge is done by the previous qemu-img commit. (qemu-img commit
can't automatically change the backing chain of the upper snapshot,
because it doesn't have any idea that an upper snapshot could exist.)
This is for this usecase:
A<----B<----C.
You commit B to A, then you need to change the backing file of C to A
(instead of B):
A<----C
(When done live, the qemu QMP block-commit is able to change the
backing chain of the upper snapshot automatically, because qemu knows
the whole chain.)
This is how libvirt does it too:
https://kashyapc.fedorapeople.org/virt/lc-2012/snapshots-handout.html
see "Deleting snapshots (and 'offline commit')"
Method (1): base <- sn1 <- sn3 (by copying sn2 into sn1)
Method (2): base <- sn1 <- sn3 (by copying sn2 into sn3)
(This is commit vs stream.)
I think that we should look at the used space of parent vs child
to choose the correct direction/method.
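The A<----B<----C example above can be modeled as a small sketch of the offline delete path (the chain is just a list of names; the qemu-img command sequence follows the mail, with `-u` being safe only because the preceding commit already merged the data):

```python
def delete_intermediate(chain, victim):
    """Offline deletion of an intermediate snapshot from a backing chain
    (ordered base -> top): commit the victim's data into its parent, then
    unsafe-rebase the child onto that parent ('-u' only rewrites the
    backing reference; the data was already merged by the commit)."""
    i = chain.index(victim)
    parent, child = chain[i - 1], chain[i + 1]
    cmds = [
        ["qemu-img", "commit", victim],
        ["qemu-img", "rebase", "-u", "-b", parent,
         "-F", "qcow2", "-f", "qcow2", child],
    ]
    return [c for c in chain if c != victim], cmds

# deleting B from A<----B<----C leaves A<----C
chain, cmds = delete_intermediate(["A", "B", "C"], "B")
assert chain == ["A", "C"]
```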
> + run_command($cmd);
> + #delete the snapshot
> + unlink($snappath);
> + }
> +
> + } else {
> + $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
>
> - run_command($cmd);
> + $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
> @@ -1246,8 +1341,8 @@ sub volume_has_feature {
> current => { qcow2 => 1, raw => 1, vmdk => 1 },
> },
> rename => {
> - current => {qcow2 => 1, raw => 1, vmdk => 1},
> - },
> + current => { qcow2 => 1, raw => 1, vmdk => 1},
> + }
>>nit: unrelated change?
yep
> };
>
> if ($feature eq 'clone') {
> @@ -1481,7 +1576,37 @@ sub status {
> sub volume_snapshot_info {
> my ($class, $scfg, $storeid, $volname) = @_;
>
> - die "volume_snapshot_info is not implemented for $class";
>>should this be guarded with $snapext being enabled?
yes indeed
* Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
[not found] ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
@ 2025-01-10 9:15 ` Fiona Ebner
2025-01-10 9:32 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 38+ messages in thread
From: Fiona Ebner @ 2025-01-10 9:15 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel, f.gruenbichler
On 10.01.25 at 08:55, DERUMIER, Alexandre wrote:
> -------- Original message --------
> From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
> Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>, Fiona
> Ebner <f.ebner@proxmox.com>
> Subject: Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
> Date: 08/01/2025 14:27:02
>
>> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am
>> 16.12.2024 10:12 CET geschrieben:
>
>> This is needed for external snapshot live commit,
>> when the top blocknode is not the fmt-node.
>> (in our case, the throttle-group node is the topnode)
>
>>> so this is needed to workaround a limitation in block-commit? I think
>>> if we need this it should probably be submitted upstream for
>>> inclusion, or we provide our own copy of block-commit with it in the
>>> meantime?
> Yes, it could be submitted upstream (after a little bit of review, I'm
> not too good in C;)).
>
> It's more a missing option in the qmp syntax, as it's already using
> blockdev-mirror code in background.
>
> (redhat don't used throttle group feature until recently, so I think
> they never had seen this problem with block-commit, as their top root
> node was the disk directly, and not the throttle group)
Maybe it could even be a bug then? In many situations, the filter nodes
on top (like throttle groups) are ignored/skipped to get to the actually
interesting block node for certain block operations. Are there any
situations where you wouldn't want to do that in the block-commit case?
There is a dedicated bdrv_skip_filters() function, e.g. used in
stream_prepare(). Would be good to hear what upstream thinks.
* Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
2025-01-10 9:15 ` Fiona Ebner
@ 2025-01-10 9:32 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 38+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 9:32 UTC (permalink / raw)
To: pve-devel, f.ebner, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
Date: Fri, 10 Jan 2025 09:32:15 +0000
Message-ID: <1e45e756801843dd46eb6ce2958d30885ad73bc2.camel@groupe-cyllene.com>
>>Maybe it could even be a bug then?
Yes, it's a bug. I just think that libvirt currently only implements
block-commit with the disk blockdev as the top node.
Throttle groups are not currently implemented in libvirt (but I have seen
some commits adding support recently); they still use the old throttle
method.
>>In many situations, the filter >>nodes
>>on top (like throttle groups) are ignored/skipped to get to the
>>actually
>>interesting block node for certain block operations.
yes, and this option exists in the QMP blockdev-mirror (and block-commit
is reusing the blockdev-mirror code behind the scenes).
>>Are there any situations where you wouldn't want to do that in the
>>block-commit case?
mmm, I think it should always be reattached to the disk (the format
blocknode, or the file blocknode if no format node exists). I really
don't know how to code this; I have just reused the blockdev-mirror way.
Feel free to clean up this patch and submit it to the qemu devs, you are
a better C developer than me ^_^
* Re: [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
2025-01-09 13:55 ` Fabian Grünbichler
@ 2025-01-10 10:16 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 38+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 10:16 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
Date: Fri, 10 Jan 2025 10:16:44 +0000
Message-ID: <86852ee45321ae5fad3ab9ae0c6cc23bed203de8.camel@groupe-cyllene.com>
>>one downside with this part in particular - we have to always
>>allocate full-size LVs (+qcow2 overhead), even if most of them will
>>end up storing just a single snapshot delta which might be a tiny
>>part of that full-size.. hopefully if discard is working across the
>>whole stack this doesn't actually explode space usage on the storage
>>side, but it makes everything a bit hard to track.. OTOH, while we
>>could in theory extend/reduce the LVs and qcow2 images on them when
>>modifying the backing chain, the additional complexity is probably
>>not worth it at the moment..
See this RFC with dynamic extend (no shrink/no discard):
https://lore.proxmox.com/pve-devel/mailman.475.1725007456.302.pve-devel@lists.proxmox.com/t/
(I think the tricky part (as Dominic's quick review noted) is to
handle the resize cluster lock correctly, and to handle timeout/retry with
a queue through a specific daemon.)
But technically, this is how oVirt manages it (and it works in
production; I have customers who have been using it for multiple years).
>
> sub plugindata {
> return {
> content => [ {images => 1, rootdir => 1}, { images => 1 }],
> + format => [ { raw => 1, qcow2 => 1 } , 'raw' ],
>>I wonder if we want to guard the snapshotting-related parts below
>>with an additional "snapext" option here as well?
I really don't know; it's not possible to do snapshots with .raw
anyway.
On the GUI side, it could allow enabling/displaying the format field, for
example when snapext is defined on the storage.
>>or even the usage of qcow2 altogether?
I think we should keep the possibility to choose .raw vs .qcow2 on the same
storage, because a user might really need maximum performance for a specific
VM without needing snapshots.
>
> +
> + #add extra space for qcow2 metadatas
> + #without sub-allocated clusters : For 1TB storage : l2_size =
> disk_size × 8 / cluster_size
> + #with sub-allocated clusters : For 1TB storage : l2_size =
> disk_size × 8 / cluster_size / 16
> + #4MB overhead for 1TB with
> extented l2 clustersize=128k
> +
> + my $qcow2_overhead = ceil($size/1024/1024/1024) * 4096;
>>there's "qemu-img measure", which seems like it would do exactly what
>>we want ;)
"Calculate the file size required for a new image. This information can
be used to size logical volumes or SAN LUNs appropriately for the image
that will be placed in them."
Indeed, lol. I knew the command, but I thought it was for measuring the
content of an existing file. I'll run tests to see whether I get the same
results (and whether sub-allocated clusters are handled correctly)
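For reference, the overhead formulas quoted above can be sanity-checked with a
quick calculation. This is a standalone sketch, not PVE code: the helper names
`qcow2_l2_overhead` and `lvm_alloc_size_kib` are made up, and it assumes
`$size` is in KiB as elsewhere in the PVE storage layer.

```python
import math

MiB = 1024 ** 2

def qcow2_l2_overhead(disk_size: int, cluster_size: int = 128 * 1024,
                      extended_l2: bool = True) -> int:
    """Bytes of L2 tables needed to fully map disk_size bytes.

    Without extended L2: disk_size * 8 / cluster_size (8-byte entries).
    With extended L2 (sub-allocated clusters), a further factor of 16
    is saved, matching the comment in the quoted patch.
    """
    overhead = disk_size * 8 // cluster_size
    if extended_l2:
        overhead //= 16
    return overhead

def lvm_alloc_size_kib(size_kib: int, fmt: str) -> int:
    """Mirror of the quoted allocation logic: add 4096 KiB of qcow2
    metadata headroom per started TiB (size in KiB)."""
    if fmt != 'qcow2':
        return size_kib
    return size_kib + math.ceil(size_kib / (1024 ** 3)) * 4096

one_tib = 1024 ** 4
print(qcow2_l2_overhead(one_tib) // MiB)   # 4 MiB for 1 TiB, as the patch comment says
print(lvm_alloc_size_kib(one_tib // 1024, 'qcow2') - one_tib // 1024)  # 4096 KiB headroom
```

As noted in the review, `qemu-img measure` should give the authoritative
number; this only checks that the hand-rolled formula is self-consistent.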
> +
> + my $lvmsize = $size;
> + $lvmsize += $qcow2_overhead if $fmt eq 'qcow2';
> +
> die "not enough free space ($free < $size)\n" if $free < $size;
>
> - $name = $class->find_free_diskname($storeid, $scfg, $vmid)
> + $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt)
> if !$name;
>
> - lvcreate($vg, $name, $size, ["pve-vm-$vmid"]);
> -
> + my $tags = ["pve-vm-$vmid"];
> + push @$tags, "\@pve-$name" if $fmt eq 'qcow2';
>>that's a creative way to avoid the need to discover and activate
>>snapshots one by one below, but it might warrant a comment ;)
Ah, sorry (but yes, the idea was to activate/deactivate the whole
chain in one call)
> +
> + #rename current lvm volume to snap volume
> + my $vg = $scfg->{vgname};
> + print"rename $volname to $snap_volname\n";
> + eval { lvrename($vg, $volname, $snap_volname) } ;
>> missing error handling..
> +
> +
> + #allocate a new lvm volume
> + $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2',
> $volname, $size/1024);
>>missing error handling
Ah, sorry, it should be included in the following eval.
> + eval {
> + $class->format_qcow2($storeid, $scfg, $volname, undef,
> $snap_path);
> + };
> +
> + if ($@) {
> + eval { $class->free_image($storeid, $scfg, $volname, 0) };
> + warn $@ if $@;
> + }
> +}
> +
> +sub volume_rollback_is_possible {
> + my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
> +
> + my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid,
> $volname);
> + my $parent_snap = $snapshots->{current}->{parent};
> +
> + return 1 if !-e $snap_path || $snapshots->{$parent_snap}->{file}
> eq $snap_path;
>>the first condition here seems wrong, see storage patch #1
yes
> + die "can't rollback, '$snap' is not most recent snapshot on
> '$volname'\n";
> +
> + return 1;
> }
>
> +
> sub volume_snapshot_rollback {
> my ($class, $scfg, $storeid, $volname, $snap) = @_;
>
> - die "lvm snapshot rollback is not implemented";
> + die "can't rollback snapshot this image format\n" if $volname !~
> m/\.(qcow2|qed)$/;
>>above we only have qcow2, which IMHO makes more sense..
We could remove .qed everywhere; it has been deprecated since 2017 and we
have never exposed it in the GUI.
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid,
> $volname);
> + my $snap_path = $snapshots->{$snap}->{file};
> + my $snap_volname = $snapshots->{$snap}->{volname};
> + return if !-e $snap_path; #already deleted ?
>>should maybe be a die?
Same as my comment on patch #1: this was for snapshot-delete retries with
multiple disks.
> + } else {
> + print"commit $snap_path\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $snap_path];
>> leftover?
still no ;) see my patch#1 reply
> + #if we delete an intermediate snapshot, we need to link
> upper snapshot to base snapshot
> + die "missing parentsnap snapshot to rebase child
> $child_path\n" if !$parent_path;
> + print "link $child_snap to $parent_snap\n";
> + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b',
> $parent_path, '-F', 'qcow2', '-f', 'qcow2', $child_path];
> + run_command($cmd);
>>same as for patch #1, I am not sure the -u here is correct..
This is correct, see my patch#1 reply
>
> +# don't allow to clone as we can't activate the base on multiple
> host at the same time
> +# clone => {
> +# base => { qcow2 => 1, raw => 1},
> +# },
>>I think activating the base would actually be okay, we just must
>>never write to it? ;)
Ah, that's a good remark. I thought we couldn't activate an LV on
multiple nodes at the same time. I'll look into this; it would add the
possibility of linked clones. (I need to check the external snapshot code
with backing chains first.)
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
[not found] ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
@ 2025-01-10 11:02 ` Fabian Grünbichler
2025-01-10 11:51 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
0 siblings, 2 replies; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-10 11:02 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel
> DERUMIER, Alexandre <alexandre.derumier@groupe-cyllene.com> hat am 10.01.2025 10:10 CET geschrieben:
> > + if ($scfg->{snapext}) {
> > + #technically, we could manage multibranch, we it need lot more work
> > for snapshot delete
> > + #we need to implemente block-stream from deleted snapshot to all
> > others child branchs
>
> >>see my comments in qemu-server - I think we actually want block-
> >>stream anyway, since it has the semantics we want..
>
> I don't agree, we don't want always, because with block-stream, you
> need to copy parent to child.
>
> for example, you have a 1TB image, you take a snapshot, writing 5MB in
> the snapshot, delete the snapshot, you'll need to read/copy 1TB data
> from parent to the snapshot file.
> I don't read your qemu-server comment yet ;)
yes, for the "first" snapshot that is true (since that one is basically the baseline data, which will often be huge compared to the snapshot delta). but streaming (rebasing) saves us the rename, which makes the error handling a lot easier/less risky. maybe we could special case the first snapshot as a performance optimization? ;)
> > @@ -1201,13 +1257,52 @@ sub volume_snapshot_delete {
> >
> > return 1 if $running;
> >
> > + my $cmd = "";
> > my $path = $class->filesystem_path($scfg, $volname);
> >
> > - $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> > + if ($scfg->{snapext}) {
> >
> > - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> > + my $snapshots = $class->volume_snapshot_info($scfg, $storeid,
> > $volname);
> > + my $snappath = $snapshots->{$snap}->{file};
> > + return if !-e $snappath; #already deleted ?
>
> >>shouldn't this be an error?
>
> This one was if we want to do retry in case of error, if we have
> multiple disks. (for example, first snapshot delete api call, the
> first disk remove the snapshot, but a bug occur and second disk don't
> remove the snapshot).
>
> User could want to unlock the vm-snaphot lock and and fix it manually
> with calling again the snapshot delete.
>
> I'm not sure how to handle this correctly ?
I think the force parameter for snapshot deletion covers this already, and it should be fine for this to die..
>
> > + print"commit $childpath\n";
> > + $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
> > + run_command($cmd);
> > + print"delete $childpath\n";
> > +
> > + unlink($childpath);
>
> this unlink can be skipped?
>
> > + print"rename $snappath to $childpath\n";
> > + rename($snappath, $childpath);
>
> >>since this will overwrite $childpath anyway.. this also reduces the
> >>chance of something going wrong:
> >>
> >>- if the commit fails halfway through, nothing bad should have
> >>happened, other than some data is now stored in two snapshots and
> >>takes up extra space
> >>- if the rename fails, then all of the data of $snap is stored twice,
> >>but the backing chain is still valid
> >>
> >>notable, there is no longer a gap where $childpath doesn't exist,
> >>which would break the backing chain!
>
> yes you are right, better to have it atomic indeed
>
>
> > + } else {
> > + print"commit $snappath\n";
> > + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
>
> >>leftover from previous version? not used/overwritten below ;)
>
> no, this is really to commit the the snapshot to parent
but it is not executed..
>
> > + #if we delete an intermediate snapshot, we need to link upper
> > snapshot to base snapshot
> > + die "missing parentsnap snapshot to rebase child $childpath\n"
> > if !$parentpath;
> > + print "link $childsnap to $parentsnap\n";
> > + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parentpath,
> > '-F', 'qcow2', '-f', 'qcow2', $childpath];
>
> >>does this work? I would read the qemu-img manpage to say that '-u' is
> >>for when you've moved/converted the backing file, and want to update
> >>the reference in its overlay, and that it doesn't copy any data.. but
> >>we need to copy the data from $snap to $childpath (we just want to
> >>delete the snapshot, we don't want to drop all its changes from the
> >>history, that would corrupt the contents of the image).
> >>note the description of the "safe" variant:
> >>
> >>" This is the default mode and performs a real
> >>rebase operation. The new backing file may differ from the old one
> >>and qemu-img rebase will take care of keeping the
> >> guest-visible content of FILENAME unchanged."
> >>
> >>IMHO this is the behaviour we need here?
>
> This is only to change the backing chain ref in the qcow2 snapshot.
> (this is the only way to do it, they was a qemu-img ammend command in
> past, but it has been removed in
> 2020 https://patchwork.kernel.org/project/qemu-devel/patch/20200403175859.863248-5-eblake@redhat.com/,
> so the rebase is the good way to do it)
>
> The merge is done by the previous qemu-img commit. (qemu-img commit
> can't change change automatically the backing chain of the upper
> snapshot, because it don't have any idea than an upper snapshot could
> exist).
see above and below ;)
> this is for this usecase :
>
> A<----B<----C.
>
> you commit B to A, then you need to change the backing file of C to A
> (instead B)
>
> A<----C
but this is the wrong semantics.. the writes/delta in B need to go to C (they happened after A), not to A!
> (when done it live, qemu qmp block-commit is able to change
> automatically the backing chain of the upper snapshot, because qemu
> known the whole chain)
I think it's wrong there as well, see my comments on those patches ;)
> This is how libvirt is doing too
> https://kashyapc.fedorapeople.org/virt/lc-2012/snapshots-handout.html
> see "Deleting snapshots (and 'offline commit')"
> Method (1): base <- sn1 <- sn3 (by copying sn2 into sn1)
> Method (2): base <- sn1 <- sn3 (by copying sn2 into sn3)
> (This is commit vs stream)
but they use the "wrong" (v1) naming scheme where the name of the snapshot and the content don't line up..
> I think that we should look at used space of parent vs child,
> to choose the correct direction/method.
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
2025-01-10 11:02 ` Fabian Grünbichler
@ 2025-01-10 11:51 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
1 sibling, 0 replies; 38+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 11:51 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 15532 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
Date: Fri, 10 Jan 2025 11:51:35 +0000
Message-ID: <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
>>yes, for the "first" snapshot that is true (since that one is
>>basically the baseline data, which will often be huge compared to the
>>snapshot delta). but streaming (rebasing) saves us the rename, which
>>makes the error handling a lot easier/less risky. maybe we could
>special case the first snapshot as a performance optimization? ;)
Ah, that's a good point indeed. Yes, I think it's a good idea: commit
for the "first" snapshot, and stream/rebase for the others. I'll look into
implementing this.
>
>
> This one was if we want to do retry in case of error, if we have
> multiple disks. (for example, first snapshot delete api call, the
> first disk remove the snapshot, but a bug occur and second disk don't
> remove the snapshot).
>
> User could want to unlock the vm-snaphot lock and and fix it
> manually
> with calling again the snapshot delete.
>
> I'm not sure how to handle this correctly ?
>>I think the force parameter for snapshot deletion covers this
>>already, and it should be fine for this to die..
Ah, OK, I was not aware of this parameter! Thanks.
>
>
> > + } else {
> > + print"commit $snappath\n";
> > + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
>
> > > leftover from previous version? not used/overwritten below ;)
>
> no, this is really to commit the the snapshot to parent
>>but it is not executed..
Ah, OK! Sorry! I think I dropped some code during a rebase before
sending the patches, because I had tested it many times!
> this is for this usecase :
>
> A<----B<----C.
>
> you commit B to A, then you need to change the backing file of C to
> A
> (instead B)
>
> A<----C
>>but this is the wrong semantics.. the writes/delta in B need to go to
>>C (they happened after A), not to A!
I think they can go to A (commit) or C (stream)
here an example:
current (1TB)
- take snap A
(A (1TB)<------new current 500MB (backing file A))
- take snap B
(A (1TB)<------B 500MB (backingfile A)<------new current 10MB
(backingfile B))
Then, you want to delete B.
So, you stream it into current (copying 500MB into current in this
example).
Then, you want to delete snapshot A.
You don't want to stream A into current, because A is the big initial image.
So, instead, you need to commit current into A (with the extra 500MB).
So, if you have a lot of snapshots to delete, you are going to copy the
same data each time into the upper snapshot for nothing, because in the
end we are going to commit into the initial "first" snapshot/image.
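As a toy calculation of that asymmetry (the helper below is made up for
illustration, not PVE code): removing a snapshot by streaming copies its own
delta up into its child, while removing it by committing copies the child's
delta down into it (followed by a rename).

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

def removal_io_cost(deltas, i, method):
    """Bytes copied to drop image i from a chain of delta sizes
    (ordered base..overlay).

    'stream': image i's delta is copied up into its child   -> deltas[i]
    'commit': the child's delta is copied down into image i -> deltas[i + 1]
    """
    return deltas[i] if method == 'stream' else deltas[i + 1]

# A (initial 1 TiB) <- B (500 MiB) <- current (10 MiB)
deltas = [1024 * GiB, 500 * MiB, 10 * MiB]

print(removal_io_cost(deltas, 1, 'stream') // MiB)  # delete B by streaming: 500
print(removal_io_cost(deltas, 0, 'stream') // GiB)  # delete A by streaming: 1024
print(removal_io_cost(deltas, 0, 'commit') // MiB)  # delete A by committing B: 500
```

This is why deleting the big first image by streaming is so expensive, and
why special-casing it as a commit is attractive.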
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
[not found] ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
@ 2025-01-10 12:20 ` Fabian Grünbichler
2025-01-10 13:14 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 38+ messages in thread
From: Fabian Grünbichler @ 2025-01-10 12:20 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel
> DERUMIER, Alexandre <alexandre.derumier@groupe-cyllene.com> hat am 10.01.2025 12:51 CET geschrieben:
> > > + } else {
> > > + print"commit $snappath\n";
> > > + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
> >
> > > > leftover from previous version? not used/overwritten below ;)
> >
> > no, this is really to commit the the snapshot to parent
>
> >>but it is not executed..
>
> Ah, ok ! sorrry ! I think I have dropped some code during rebase before
> sending patches, because I had tested it a lot of time !
>
>
>
> > this is for this usecase :
> >
> > A<----B<----C.
> >
> > you commit B to A, then you need to change the backing file of C to
> > A
> > (instead B)
> >
> > A<----C
>
> >>but this is the wrong semantics.. the writes/delta in B need to go to
> >>C (they happened after A), not to A!
>
> I think they can go to A (commit) or C (stream)
>
> here an example:
>
> current (1TB)
> - take snap A
>
> (A (1TB)<------new current 500MB (backing file A))
>
> - take snap B
>
> (A (1TB)<------B 500MB (backingfile A)<------new current 10MB
> (backingfile B))
>
>
> Then, you want to delete B.
>
>
> so, you stream it to current. (so copy 500MB to current in this
> example)
>
> Then, you want to delete snapshot A
> you don't want stream A to current, because A is the big initial image.
> So, instead, you need to commit the current to A (with the extra 500MB)
>
>
> So, if you have a lot of snapshot to delete, you are going do a copy
> same datas each time to the upper snapshot for nothing, because at the
> end we are going to commit to the initial "first" snapshot/image.
but you don't know up front that you want to collapse all the snapshots. for each single removal, you have to merge the delta towards the overlay, not the base, else the base contents is no longer matching its name.
think about it this way:
you take a snapshot B at time X. this snapshot must never contain a modification that happened after X. that means you cannot ever commit a newer snapshot into B, unless you are removing and renaming B.
if you start with a chain A -> B -> C -> D (with A being the first snapshot/base, and D being the current active overlay. if you want to remove B, you can either
- stream B into C, remove B
- commit C into B, remove C, rename B to C
in both cases you will end up with a chain A -> C' -> D where C' is the combination of the old B and C.
the downside of the streaming variant is that if B's delta is bigger than C's, you have more I/O. the upside is that there is no inbetween state where the backing chain is broken and error handling can go very wrong.
what you are doing right now is:
chain A->B->C->D as before. remove B by commiting B into A and then rebasing C on top of A. that means you end up with:
A'->C->D where A' is A+B. but now this snapshot A contains writes that happened after the original A was taken. this is wrong.
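The two valid strategies (and the broken one) can be illustrated with a tiny
model of a backing chain, where each image is just a dict of block writes.
This is a toy sketch for illustration only, not qemu or PVE code; all names
are invented.

```python
# Toy model of a qcow2 backing chain: each image is a dict of
# {block: data}. Reading walks from the active overlay down to the base.

def read_block(chain, block):
    # chain is ordered base..overlay; later images override earlier ones
    for img in reversed(chain):
        if block in img:
            return img[block]
    return None

def remove_by_stream(chain, i):
    """Remove snapshot i by streaming its delta up into its child."""
    merged = dict(chain[i])
    merged.update(chain[i + 1])     # the child's newer writes win
    return chain[:i] + [merged] + chain[i + 2:]

def remove_by_commit_into_parent(chain, i):
    """The broken variant: commit snapshot i down into its parent."""
    merged = dict(chain[i - 1])
    merged.update(chain[i])         # pollutes the parent with newer writes
    return chain[:i - 1] + [merged] + chain[i + 1:]

A = {1: 'A', 2: 'A'}    # base, state at 12:00
B = {2: 'B'}            # delta written between 12:00 and 13:00
C = {1: 'C'}            # active overlay, written after 13:00
chain = [A, B, C]

streamed = remove_by_stream(chain, 1)
assert [read_block(streamed, b) for b in (1, 2)] == ['C', 'B']  # guest view intact
assert streamed[0] == {1: 'A', 2: 'A'}   # base A still matches the 12:00 state

committed = remove_by_commit_into_parent(chain, 1)
assert [read_block(committed, b) for b in (1, 2)] == ['C', 'B']  # guest view also intact
assert committed[0][2] == 'B'            # but "A" now contains a post-12:00 write
```

Both directions keep the guest-visible data identical; only streaming keeps
the remaining snapshots true to the point in time their names claim.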
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
2025-01-10 12:20 ` Fabian Grünbichler
@ 2025-01-10 13:14 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 38+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 13:14 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 14788 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
Date: Fri, 10 Jan 2025 13:14:48 +0000
Message-ID: <32c2f6a7978e2c37bb5b51f44de2261dc12446e8.camel@groupe-cyllene.com>
>>but you don't know up front that you want to collapse all the
>>snapshots. for each single removal, you have to merge the delta
>>towards the overlay, not the base, else the base contents is no
>>longer matching its name.
>>
>>think about it this way:
>>
>>you take a snapshot B at time X. this snapshot must never contain a
>>modification that happened after X. that means you cannot ever commit
>>a newer snapshot into B, unless you are removing and renaming B.
>>if you start with a chain A -> B -> C -> D (with A being the first
>>snapshot/base, and D being the current active overlay. if you want to
>>remove B, you can either
>>- stream B into C, remove B
>>- commit C into B, remove C, rename B to C
>>
>>in both cases you will end up with a chain A -> C' -> D where C' is
>>the combination of the old B and C.
>>
>>the downside of the streaming variant is that if B's delta is bigger
>>than C's, you have more I/O. the upside is that there is no inbetween
>>state where the backing chain is broken and error handling can go
>>very wrong.
>>
>>what you are doing right now is:
>>
>>chain A->B->C->D as before. remove B by commiting B into A and then
>>rebasing C on top of A. that means you end up with:
>>A'->C->D where A' is A+B. but now this snapshot A contains writes
>>that happened after the original A was taken. this is wrong.
Ah yes, you are right.
I was just thinking about it, and came to the same conclusion.
For example:
A 1TB (12:00) ----> B 500MB (13:00) ----> C 10MB (14:00) ---> current
(now)
If I delete B and merge it into A, A will no longer be the 12:00 view,
but the 13:00 one.
So we indeed need to merge it into C.
(Sorry, I have been using ZFS/Ceph for too long, where merges never occur
and blocks are only referenced && destroyed in the background.)
OK, I'll rework this with the stream implementation. (I need to do it anyway
for multi-branch, but later, please.)
* Re: [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
2025-01-08 14:17 ` Fabian Grünbichler
@ 2025-01-10 13:50 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 38+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 13:50 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 19012 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
Date: Fri, 10 Jan 2025 13:50:33 +0000
Message-ID: <b4cfffe2a2bfe601affef4f5aab63f6beb72cb97.camel@groupe-cyllene.com>
> - $device .= ",drive=drive-$drive_id,id=$drive_id";
> + $device .= ",id=$drive_id";
> + $device .= ",drive=drive-$drive_id" if $device_type ne 'cd' ||
> $drive->{file} ne 'none';
>>is this just because you remove the whole drive when ejecting? not
>>sure whether that is really needed..
With blockdev, no drive (no disc inserted in the CD-ROM device) really
means no blockdev is defined at all.
So we don't pass a drive/CD-ROM medium to the CD-ROM device.
>
> -sub print_drive_commandline_full {
> - my ($storecfg, $vmid, $drive, $live_restore_name, $io_uring) =
> @_;
> +sub print_drive_throttle_group {
> + my ($drive) = @_;
> + #command line can't use the structured json limits option,
> + #so limit params need to use with x- as it's unstable api
>>this comment should be below the early return, or above the whole
>>sub.
ok
> + return if drive_is_cdrom($drive) && $drive->{file} eq 'none';
>>is this needed if we keep empty cdrom drives around like before? I
>>know throttling practically makes no sense in that case, but it might
>>make the code in general more simple?
Yes, this is to keep it like before, but I can put it behind a
throttle group, no problem.
>
> +sub generate_file_blockdev {
> + my ($storecfg, $drive, $nodename) = @_;
> +
> + my $volid = $drive->{file};
> my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid,
> 1);
> - my $scfg = $storeid ? PVE::Storage::storage_config($storecfg,
> $storeid) : undef;
>
> - if (drive_is_cdrom($drive)) {
> - $path = get_iso_path($storecfg, $vmid, $volid);
> - die "$drive_id: cannot back cdrom drive with a live restore
> image\n" if $live_restore_name;
> + my $scfg = undef;
> + my $path = $volid;
>>I think this should only happen if the parse_volume_id above told us
>>this is an absolute path and not a PVE-managed volume..
> + if($storeid && $storeid ne 'nbd') {
>>this is wrong.. I guess it's also somewhat wrong in the old
>>qemu_drive_mirror code.. we should probably check using a more
>>specific RE that the "volid" is an NBD URI, and not attempt to parse
>>it as a regular volid in that case..
OK. I'm already parsing the NBD URI later; I'll adapt the code.
> + my $format = $drive->{format};
> + $format //= "raw";
>>the format handling here is very sensitive, and I think this broke
>>it. see the big comment this patch removed ;)
>>
>>short summary: for PVE-managed volumes we want the format from the
>>storage layer (via checked_volume_format). if the drive has a format
>>set that disagrees, that is a hard error. for absolute paths we us
>>the format from the drive with a fallback to raw.
Yes, I saw those commits during my rebase before sending the patches.
I'll fix that.
>
> - if ($live_restore_name) {
> - $format = "rbd" if $is_rbd;
> - die "$drive_id: Proxmox Backup Server backed drive cannot auto-
> detect the format\n"
> - if !$format;
> - $opts .= ",format=alloc-track,file.driver=$format";
> - } elsif ($format) {
> - $opts .= ",format=$format";
> + my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid,
> 1);
>>so I guess this should never be called with nbd-URI-volids?
until we want to live restore to an nbd uri, no ^_^
> + my $readonly = defined($drive->{ro}) || $force_readonly ?
> JSON::true : JSON::false;
> +
> + #libvirt define cache option on both format && file
> my $cache_direct = drive_uses_cache_direct($drive, $scfg);
> + my $cache = {};
> + $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
> + $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq
> 'unsafe' ? JSON::true : JSON::false;
>>so we have the same code in two places? should probably be a helper
>>then to not have them go out of sync..
Ah, yes, I forgot to make the helper. Libvirt defines it on both the
file && format blockdevs, not sure why exactly.
>
> - # my $file_param = $live_restore_name ? "file.file.filename" :
> "file";
> - my $file_param = "file";
> + my $file_nodename = "file-drive-$drive_id";
> + my $blockdev_file = generate_file_blockdev($storecfg, $drive,
> $file_nodename);
> + my $fmt_nodename = "fmt-drive-$drive_id";
> + my $blockdev_format = generate_format_blockdev($storecfg,
> $drive, $fmt_nodename, $blockdev_file, $force_readonly);
> +
> + my $blockdev_live_restore = undef;
> if ($live_restore_name) {
> - # non-rbd drivers require the underlying file to be a separate
> block
> - # node, so add a second .file indirection
> - $file_param .= ".file" if !$is_rbd;
> - $file_param .= ".filename";
> + die "$drive_id: Proxmox Backup Server backed drive cannot
> auto-detect the format\n"
> + if !$format;
>>for this check, but it is not actually set anywhere here.. so is
>>something missing or can the check go?
It can be removed; this is older code that I forgot to remove.
(I haven't tested backup/restore yet, as backup is not working.)
>
* Re: [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
2025-01-08 14:26 ` Fabian Grünbichler
@ 2025-01-10 14:08 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 38+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 14:08 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 17858 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
Date: Fri, 10 Jan 2025 14:08:20 +0000
Message-ID: <a26394024a9767a4d602c07e362e670808b17fbb.camel@groupe-cyllene.com>
-------- Original message --------
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert
qemu_driveadd && qemu_drivedel
Date: 08/01/2025 15:26:37
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am
> 16.12.2024 10:12 CET geschrieben:
> fixme/testme :
> PVE/VZDump/QemuServer.pm: eval {
> PVE::QemuServer::qemu_drivedel($vmid, "tpmstate0-backup"); };
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-
> cyllene.com>
> ---
> PVE/QemuServer.pm | 64 +++++++++++++++++++++++++++++++++------------
> --
> 1 file changed, 45 insertions(+), 19 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 2832ed09..baf78ec0 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -1582,6 +1582,42 @@ sub print_drive_throttle_group {
> return $throttle_group;
> }
>
> +sub generate_throttle_group {
> + my ($drive) = @_;
> +
> + my $drive_id = get_drive_id($drive);
> +
> + my $throttle_group = { id => "throttle-drive-$drive_id" };
> + my $limits = {};
> +
> + foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-
> write']) {
> + my ($dir, $qmpname) = @$type;
> +
> + if (my $v = $drive->{"mbps$dir"}) {
> + $limits->{"bps$qmpname"} = int($v*1024*1024);
> + }
> + if (my $v = $drive->{"mbps${dir}_max"}) {
> + $limits->{"bps$qmpname-max"} = int($v*1024*1024);
> + }
> + if (my $v = $drive->{"bps${dir}_max_length"}) {
> + $limits->{"bps$qmpname-max-length"} = int($v)
> + }
> + if (my $v = $drive->{"iops${dir}"}) {
> + $limits->{"iops$qmpname"} = int($v);
> + }
> + if (my $v = $drive->{"iops${dir}_max"}) {
> + $limits->{"iops$qmpname-max"} = int($v);
> + }
> + if (my $v = $drive->{"iops${dir}_max_length"}) {
> + $limits->{"iops$qmpname-max-length"} = int($v);
> + }
> + }
> +
> + $throttle_group->{limits} = $limits;
> +
> + return $throttle_group;
>>this and the corresponding print sub are exactly the same, so the
>>print sub could call this and join the limits with the `x-` prefix
>>added?
Yes, we could merge them.
Currently, the command line can't define complex QOM objects (this should
be available soon; the QEMU devs are working on it). That's why it uses
the different syntax with the x- prefix.
>>how does this interact with the qemu_block_set_io_throttle helper
>>used when updating the limits at runtime?
It still works with block_set_io_throttle, where you specify the
device (the throttling values are applied to the top node attached to the
device).
> +}
> +
> sub generate_file_blockdev {
> my ($storecfg, $drive, $nodename) = @_;
>
> @@ -4595,32 +4631,22 @@ sub qemu_iothread_del {
> }
>
> sub qemu_driveadd {
> - my ($storecfg, $vmid, $device) = @_;
> + my ($storecfg, $vmid, $drive) = @_;
>
> - my $kvmver = get_running_qemu_version($vmid);
> - my $io_uring = min_version($kvmver, 6, 0);
> - my $drive = print_drive_commandline_full($storecfg, $vmid,
> $device, undef, $io_uring);
> - $drive =~ s/\\/\\\\/g;
> - my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add
> auto \"$drive\"", 60);
> -
> - # If the command succeeds qemu prints: "OK"
> - return 1 if $ret =~ m/OK/s;
> + my $drive_id = get_drive_id($drive);
> + my $throttle_group = generate_throttle_group($drive);
>>do we always need a throttle group? or would we benefit from only
>>adding it when limits are set, and skip that node when I/O is
>>unlimited?
Skipping it would add a lot of complexity, because it's not always
possible to insert a new blockdev (here the throttle group) between
the device and the drive blockdev once a blockdev is already the top
node attached to the device.
The other benefit is having a stable name for the top block node
(drive node names can change when you switch images), so fewer lookups
are needed for some QMP actions, like mirror/commit, where you need to
know the top node's name.
There is no performance impact from having a throttle group without
limits.
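To illustrate the graph shape being described (Python sketch with
illustrative node names, not the actual Perl code): the throttle node
sits on top with a name derived only from the drive id, while the
format and file node names below it may change across snapshot or
mirror operations.

```python
def build_blockdev_chain(drive_id, filename, fmt="qcow2"):
    # Bottom-up: file node -> format node -> throttle node.
    # Only the throttle node's name depends solely on the drive id,
    # so it stays stable even when the nodes below are swapped out.
    file_node = {"driver": "file",
                 "node-name": f"file-{drive_id}-0",
                 "filename": filename}
    fmt_node = {"driver": fmt,
                "node-name": f"fmt-{drive_id}-0",
                "file": file_node}
    top_node = {"driver": "throttle",
                "node-name": f"drive-{drive_id}",
                "throttle-group": f"throttle-drive-{drive_id}",
                "file": fmt_node}
    return top_node
```

A mirror or commit can then always target drive-$id without first
querying the graph for the current format node's name.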
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Thread overview: 38+ messages
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
2024-12-16 9:12 ` [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
2025-01-08 13:27 ` Fabian Grünbichler
2025-01-10 7:55 ` DERUMIER, Alexandre via pve-devel
[not found] ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
2025-01-10 9:15 ` Fiona Ebner
2025-01-10 9:32 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
2025-01-08 14:17 ` Fabian Grünbichler
2025-01-10 13:50 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-01-09 12:36 ` Fabian Grünbichler
2025-01-10 9:10 ` DERUMIER, Alexandre via pve-devel
[not found] ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
2025-01-10 11:02 ` Fabian Grünbichler
2025-01-10 11:51 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
2025-01-10 12:20 ` Fabian Grünbichler
2025-01-10 13:14 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
2025-01-09 13:55 ` Fabian Grünbichler
2025-01-10 10:16 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
2025-01-08 14:26 ` Fabian Grünbichler
2025-01-10 14:08 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
2025-01-08 14:31 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
2025-01-08 14:34 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
2025-01-08 15:19 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
2025-01-09 9:51 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
2025-01-09 11:57 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-01-09 11:57 ` Fabian Grünbichler
2025-01-09 13:19 ` Fabio Fantoni via pve-devel