* [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 13:27 ` Fabian Grünbichler
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 10093 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
Date: Mon, 16 Dec 2024 10:12:15 +0100
Message-ID: <20241216091229.3142660-2-alexandre.derumier@groupe-cyllene.com>
This is needed for live commit with external snapshots, when the top
block node is not the format node (in our case, the throttle-group
node is the top node).
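A minimal Perl sketch of a QMP call using the new option, assuming the node naming introduced by the qemu-server blockdev patch in this series (throttle filter `drive-$drive_id` as the root node, format node `fmt-drive-$drive_id` below it). The helper `block_commit_cmd`, the job ID and the base node name `snap1-drive-scsi0` are hypothetical; `job-id`, `device` and `base-node` are existing block-commit arguments, and `replaces` is the one this patch adds:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: QMP block-commit of the active layer for a drive
# whose root node is the throttle filter "drive-<id>".
sub block_commit_cmd {
    my ($drive_id, $base_node) = @_;

    return {
        execute   => 'block-commit',
        arguments => {
            'job-id'    => "commit-drive-$drive_id",    # hypothetical job name
            device      => "drive-$drive_id",           # root (throttle) node
            'base-node' => $base_node,
            # option added by this patch (assumed usage): the node that the
            # base takes the place of once the commit completes, i.e. the
            # format node below the throttle filter
            replaces    => "fmt-drive-$drive_id",
        },
    };
}

# 'snap1-drive-scsi0' is a hypothetical base node name
my $cmd = block_commit_cmd('scsi0', 'snap1-drive-scsi0');
print JSON::PP->new->canonical->encode($cmd), "\n";
```

Without `replaces`, `commit_active_start()` passes NULL to `mirror_start_job()` (see the block/mirror.c hunk), so the node replaced on completion is the job's own top node.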
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
...052-block-commit-add-replaces-option.patch | 137 ++++++++++++++++++
debian/patches/series | 1 +
2 files changed, 138 insertions(+)
create mode 100644 debian/patches/pve/0052-block-commit-add-replaces-option.patch
diff --git a/debian/patches/pve/0052-block-commit-add-replaces-option.patch b/debian/patches/pve/0052-block-commit-add-replaces-option.patch
new file mode 100644
index 0000000..2488b5b
--- /dev/null
+++ b/debian/patches/pve/0052-block-commit-add-replaces-option.patch
@@ -0,0 +1,137 @@
+From ae39fd3bb72db440cf380978af9bf5693c12ac6c Mon Sep 17 00:00:00 2001
+From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
+Date: Wed, 11 Dec 2024 16:20:25 +0100
+Subject: [PATCH] block-commit: add replaces option
+
+This uses the same code as drive-mirror for live commit, but the option
+is currently not passed.
+
+Allow replacing a node other than the root node after the block-commit
+(as we use a throttle-group filter as the root node, not the drive).
+---
+ block/mirror.c | 4 ++--
+ block/replication.c | 2 +-
+ blockdev.c | 4 ++--
+ include/block/block_int-global-state.h | 4 +++-
+ qapi/block-core.json | 5 ++++-
+ qemu-img.c | 2 +-
+ 6 files changed, 13 insertions(+), 8 deletions(-)
+
+diff --git a/block/mirror.c b/block/mirror.c
+index 2f12238..1a5e528 100644
+--- a/block/mirror.c
++++ b/block/mirror.c
+@@ -2086,7 +2086,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
+ int64_t speed, BlockdevOnError on_error,
+ const char *filter_node_name,
+ BlockCompletionFunc *cb, void *opaque,
+- bool auto_complete, Error **errp)
++ bool auto_complete, const char *replaces, Error **errp)
+ {
+ bool base_read_only;
+ BlockJob *job;
+@@ -2102,7 +2102,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
+ }
+
+ job = mirror_start_job(
+- job_id, bs, creation_flags, base, NULL, speed, 0, 0,
++ job_id, bs, creation_flags, base, replaces, speed, 0, 0,
+ MIRROR_LEAVE_BACKING_CHAIN, false,
+ on_error, on_error, true, cb, opaque,
+ &commit_active_job_driver, MIRROR_SYNC_MODE_FULL,
+diff --git a/block/replication.c b/block/replication.c
+index 0415a5e..debbe25 100644
+--- a/block/replication.c
++++ b/block/replication.c
+@@ -711,7 +711,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
+ s->commit_job = commit_active_start(
+ NULL, bs->file->bs, s->secondary_disk->bs,
+ JOB_INTERNAL, 0, BLOCKDEV_ON_ERROR_REPORT,
+- NULL, replication_done, bs, true, errp);
++ NULL, replication_done, bs, true, NULL, errp);
+ bdrv_graph_rdunlock_main_loop();
+ break;
+ default:
+diff --git a/blockdev.c b/blockdev.c
+index cbe2243..349fb71 100644
+--- a/blockdev.c
++++ b/blockdev.c
+@@ -2435,7 +2435,7 @@ void qmp_block_commit(const char *job_id, const char *device,
+ const char *filter_node_name,
+ bool has_auto_finalize, bool auto_finalize,
+ bool has_auto_dismiss, bool auto_dismiss,
+- Error **errp)
++ const char *replaces, Error **errp)
+ {
+ BlockDriverState *bs;
+ BlockDriverState *iter;
+@@ -2596,7 +2596,7 @@ void qmp_block_commit(const char *job_id, const char *device,
+ job_id = bdrv_get_device_name(bs);
+ }
+ commit_active_start(job_id, top_bs, base_bs, job_flags, speed, on_error,
+- filter_node_name, NULL, NULL, false, &local_err);
++ filter_node_name, NULL, NULL, false, replaces, &local_err);
+ } else {
+ BlockDriverState *overlay_bs = bdrv_find_overlay(bs, top_bs);
+ if (bdrv_op_is_blocked(overlay_bs, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
+diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
+index f0c642b..194b580 100644
+--- a/include/block/block_int-global-state.h
++++ b/include/block/block_int-global-state.h
+@@ -115,6 +115,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @auto_complete: Auto complete the job.
++ * @replaces: Block graph node name to replace once the commit is done.
+ * @errp: Error object.
+ *
+ */
+@@ -123,7 +124,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
+ int64_t speed, BlockdevOnError on_error,
+ const char *filter_node_name,
+ BlockCompletionFunc *cb, void *opaque,
+- bool auto_complete, Error **errp);
++ bool auto_complete, const char *replaces,
++ Error **errp);
+ /*
+ * mirror_start:
+ * @job_id: The id of the newly-created job, or %NULL to use the
+diff --git a/qapi/block-core.json b/qapi/block-core.json
+index ff441d4..50564c7 100644
+--- a/qapi/block-core.json
++++ b/qapi/block-core.json
+@@ -2098,6 +2098,8 @@
+ # disappear from the query list without user intervention.
+ # Defaults to true. (Since 3.1)
+ #
++# @replaces: graph node name to be replaced by the base image node.
++#
+ # Features:
+ #
+ # @deprecated: Members @base and @top are deprecated. Use @base-node
+@@ -2125,7 +2127,8 @@
+ '*speed': 'int',
+ '*on-error': 'BlockdevOnError',
+ '*filter-node-name': 'str',
+- '*auto-finalize': 'bool', '*auto-dismiss': 'bool' },
++ '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
++ '*replaces': 'str' },
+ 'allow-preconfig': true }
+
+ ##
+diff --git a/qemu-img.c b/qemu-img.c
+index a6c88e0..f6c59bc 100644
+--- a/qemu-img.c
++++ b/qemu-img.c
+@@ -1079,7 +1079,7 @@ static int img_commit(int argc, char **argv)
+
+ commit_active_start("commit", bs, base_bs, JOB_DEFAULT, rate_limit,
+ BLOCKDEV_ON_ERROR_REPORT, NULL, common_block_job_cb,
+- &cbi, false, &local_err);
++ &cbi, false, NULL, &local_err);
+ if (local_err) {
+ goto done;
+ }
+--
+2.39.5
+
diff --git a/debian/patches/series b/debian/patches/series
index 93c97bf..e604a23 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -92,3 +92,4 @@ pve/0048-PVE-backup-fixup-error-handling-for-fleecing.patch
pve/0049-PVE-backup-factor-out-setting-up-snapshot-access-for.patch
pve/0050-PVE-backup-save-device-name-in-device-info-structure.patch
pve/0051-PVE-backup-include-device-name-in-error-when-setting.patch
+pve/0052-block-commit-add-replaces-option.patch
--
2.39.5
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 14:17 ` Fabian Grünbichler
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 19086 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
Date: Mon, 16 Dec 2024 10:12:16 +0100
Message-ID: <20241216091229.3142660-3-alexandre.derumier@groupe-cyllene.com>
The blockdev chain is:
- throttle-group node (drive-(ide|scsi|virtio)x)
  - format node (fmt-drive-x)
    - file node (file-drive-x)

fixme: implement the iscsi:// path
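A simplified Perl sketch of the nesting described above for a plain local qcow2 image, using the same node names as generate_drive_blockdev() below. The helper `drive_blockdev_chain` and the image path are hypothetical, and cache, aio, discard/detect-zeroes, read-only and the rbd/nbd/gluster/host_device branches are left out:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: nest file -> format -> throttle nodes for one drive.
sub drive_blockdev_chain {
    my ($drive_id, $path, $format) = @_;

    # protocol node: the image file itself
    my $file = {
        driver      => 'file',
        'node-name' => "file-drive-$drive_id",
        filename    => $path,
    };

    # format node: interprets the image contents (qcow2, raw, ...)
    my $fmt = {
        driver      => $format,
        'node-name' => "fmt-drive-$drive_id",
        file        => $file,
    };

    # top node: throttle filter bound to the throttle-group object, named
    # drive-<id> so the -device drive= reference keeps the same name
    return {
        driver           => 'throttle',
        'node-name'      => "drive-$drive_id",
        'throttle-group' => "throttle-drive-$drive_id",
        file             => $fmt,
    };
}

# hypothetical image path, for illustration only
my $chain = drive_blockdev_chain(
    'scsi0', '/var/lib/vz/images/100/vm-100-disk-0.qcow2', 'qcow2');

print "-object throttle-group,id=throttle-drive-scsi0\n";
print "-blockdev ", JSON::PP->new->canonical->encode($chain), "\n";
```

The throttle node references the group created by the matching `-object throttle-group`, which is why config_to_command() pushes that object before the `-blockdev`.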
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 351 +++++++++++++++++++++++++++++++---------------
1 file changed, 237 insertions(+), 114 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 8192599a..2832ed09 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -1464,7 +1464,8 @@ sub print_drivedevice_full {
} else {
$device .= ",bus=ahci$controller.$unit";
}
- $device .= ",drive=drive-$drive_id,id=$drive_id";
+ $device .= ",id=$drive_id";
+ $device .= ",drive=drive-$drive_id" if $device_type ne 'cd' || $drive->{file} ne 'none';
if ($device_type eq 'hd') {
if (my $model = $drive->{model}) {
@@ -1490,6 +1491,13 @@ sub print_drivedevice_full {
$device .= ",serial=$serial";
}
+ my $writecache = $drive->{cache} && $drive->{cache} =~ /^(?:none|writeback|unsafe)$/ ? "on" : "off";
+ $device .= ",write-cache=$writecache" if $drive->{media} && $drive->{media} ne 'cdrom';
+
+ my @qemu_drive_options = qw(heads secs cyls trans rerror werror);
+ foreach my $o (@qemu_drive_options) {
+ $device .= ",$o=$drive->{$o}" if defined($drive->{$o});
+ }
return $device;
}
@@ -1539,145 +1547,256 @@ my sub drive_uses_cache_direct {
return $cache_direct;
}
-sub print_drive_commandline_full {
- my ($storecfg, $vmid, $drive, $live_restore_name, $io_uring) = @_;
+sub print_drive_throttle_group {
+ my ($drive) = @_;
+ #the command line can't use the structured JSON 'limits' option,
+ #so the limit params need the x- prefix, as this is unstable API
+ return if drive_is_cdrom($drive) && $drive->{file} eq 'none';
- my $path;
- my $volid = $drive->{file};
my $drive_id = get_drive_id($drive);
+ my $throttle_group = "throttle-group,id=throttle-drive-$drive_id";
+ foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
+ my ($dir, $qmpname) = @$type;
+
+ if (my $v = $drive->{"mbps$dir"}) {
+ $throttle_group .= ",x-bps$qmpname=".int($v*1024*1024);
+ }
+ if (my $v = $drive->{"mbps${dir}_max"}) {
+ $throttle_group .= ",x-bps$qmpname-max=".int($v*1024*1024);
+ }
+ if (my $v = $drive->{"bps${dir}_max_length"}) {
+ $throttle_group .= ",x-bps$qmpname-max-length=$v";
+ }
+ if (my $v = $drive->{"iops${dir}"}) {
+ $throttle_group .= ",x-iops$qmpname=$v";
+ }
+ if (my $v = $drive->{"iops${dir}_max"}) {
+ $throttle_group .= ",x-iops$qmpname-max=$v";
+ }
+ if (my $v = $drive->{"iops${dir}_max_length"}) {
+ $throttle_group .= ",x-iops$qmpname-max-length=$v";
+ }
+ }
+
+ return $throttle_group;
+}
+
+sub generate_file_blockdev {
+ my ($storecfg, $drive, $nodename) = @_;
+
+ my $volid = $drive->{file};
my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
- my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
- if (drive_is_cdrom($drive)) {
- $path = get_iso_path($storecfg, $vmid, $volid);
- die "$drive_id: cannot back cdrom drive with a live restore image\n" if $live_restore_name;
+ my $scfg = undef;
+ my $path = $volid;
+ if($storeid && $storeid ne 'nbd') {
+ $scfg = PVE::Storage::storage_config($storecfg, $storeid);
+ $path = PVE::Storage::path($storecfg, $volid);
+ }
+
+ my $blockdev = {};
+
+ if ($path =~ m/^rbd:(\S+)$/) {
+
+ $blockdev->{driver} = 'rbd';
+
+ my @rbd_options = split(/:/, $1);
+ my $keyring = undef;
+ for my $option (@rbd_options) {
+ if ($option =~ m/^(\S+)=(\S+)$/) {
+ my $key = $1;
+ my $value = $2;
+ $blockdev->{'auth-client-required'} = [$value] if $key eq 'auth_supported';
+ $blockdev->{'conf'} = $value if $key eq 'conf';
+ $blockdev->{'user'} = $value if $key eq 'id';
+ $keyring = $value if $key eq 'keyring';
+ if ($key eq 'mon_host') {
+ my $server = [];
+ my @mons = split(';', $value);
+ for my $mon (@mons) {
+ my ($host, $port) = PVE::Tools::parse_host_and_port($mon);
+ $port = '3300' if !$port;
+ push @$server, { host => $host, port => $port };
+ }
+ $blockdev->{server} = $server;
+ }
+ } elsif ($option =~ m|^(\S+)/(\S+)$|){
+ $blockdev->{pool} = $1;
+ my $image = $2;
+
+ if($image =~ m|^(\S+)/(\S+)$|) {
+ $blockdev->{namespace} = $1;
+ $blockdev->{image} = $2;
+ } else {
+ $blockdev->{image} = $image;
+ }
+ }
+ }
+
+ if($keyring && $blockdev->{server}) {
+ #qemu devs removed the ability to pass arbitrary values to the blockdev object, and did not add
+ #keyring to the list of allowed keys, so it needs to be defined in the storage's ceph.conf.
+ #https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02676.html
+ #another way could be to simply patch qemu to allow the key
+ my $ceph_conf = "/etc/pve/priv/ceph/${storeid}.conf";
+ $blockdev->{conf} = $ceph_conf;
+ if (!-e $ceph_conf) {
+ my $content = "[global]\nkeyring = $keyring\n";
+ PVE::Tools::file_set_contents($ceph_conf, $content, 0400);
+ }
+ }
+ } elsif ($path =~ m/^nbd:(\S+):(\d+):exportname=(\S+)$/) {
+ my $server = { type => 'inet', host => $1, port => $2 };
+ $blockdev = { driver => 'nbd', server => $server, export => $3 };
+ } elsif ($path =~ m/^nbd:unix:(\S+):exportname=(\S+)$/) {
+ my $server = { type => 'unix', path => $1 };
+ $blockdev = { driver => 'nbd', server => $server, export => $2 };
+ } elsif ($path =~ m|^gluster(\+(tcp\|unix\|rdma))?://(.*)/(.*)/(images/(\S+)/(\S+))$|) {
+ my $protocol = $2 ? $2 : 'inet';
+ $protocol = 'inet' if $protocol eq 'tcp';
+ my $server = [{ type => $protocol, host => $3, port => '24007' }];
+ $blockdev = { driver => 'gluster', server => $server, volume => $4, path => $5 };
+ } elsif ($path =~ m/^\/dev/) {
+ my $driver = drive_is_cdrom($drive) ? 'host_cdrom' : 'host_device';
+ $blockdev = { driver => $driver, filename => $path };
+ } elsif ($path =~ m/^\//) {
+ $blockdev = { driver => 'file', filename => $path};
} else {
- if ($storeid) {
- $path = PVE::Storage::path($storecfg, $volid);
- } else {
- $path = $volid;
+ die "unsupported path: $path\n";
+ #fixme
+ #'{"driver":"iscsi","portal":"iscsi.example.com:3260","target":"demo-target","lun":3,"transport":"tcp"}'
+ }
+
+ my $cache_direct = drive_uses_cache_direct($drive, $scfg);
+ my $cache = {};
+ $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
+ $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq 'unsafe' ? JSON::true : JSON::false;
+ $blockdev->{cache} = $cache;
+
+ ##aio
+ if($blockdev->{filename}) {
+ $drive->{aio} = 'threads' if drive_is_cdrom($drive);
+ my $aio = $drive->{aio};
+ if (!$aio) {
+ if (storage_allows_io_uring_default($scfg, $cache_direct)) {
+ # io_uring supports all cache modes
+ $aio = "io_uring";
+ } else {
+ # aio native works only with O_DIRECT
+ if($cache_direct) {
+ $aio = "native";
+ } else {
+ $aio = "threads";
+ }
+ }
}
+ $blockdev->{aio} = $aio;
}
- # For PVE-managed volumes, use the format from the storage layer and prevent overrides via the
- # drive's 'format' option. For unmanaged volumes, fallback to 'raw' to avoid auto-detection by
- # QEMU. For the special case 'none' (get_iso_path() returns an empty $path), there should be no
- # format or QEMU won't start.
- my $format;
- if (drive_is_cdrom($drive) && !$path) {
- # no format
- } elsif ($storeid) {
- $format = checked_volume_format($storecfg, $volid);
+ ##discard && detect-zeroes
+ my $discard = 'ignore';
+ if($drive->{discard}) {
+ $discard = $drive->{discard};
+ $discard = 'unmap' if $discard eq 'on';
+ }
+ $blockdev->{discard} = $discard if !drive_is_cdrom($drive);
- if ($drive->{format} && $drive->{format} ne $format) {
- die "drive '$drive->{interface}$drive->{index}' - volume '$volid'"
- ." - 'format=$drive->{format}' option different from storage format '$format'\n";
- }
+ my $detectzeroes;
+ if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
+ $detectzeroes = 'off';
+ } elsif ($drive->{discard}) {
+ $detectzeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
} else {
- $format = $drive->{format} // 'raw';
+ # This used to be our default with discard not being specified:
+ $detectzeroes = 'on';
}
+ $blockdev->{'detect-zeroes'} = $detectzeroes if !drive_is_cdrom($drive);
+ $blockdev->{'node-name'} = $nodename if $nodename;
- my $is_rbd = $path =~ m/^rbd:/;
+ return $blockdev;
+}
- my $opts = '';
- my @qemu_drive_options = qw(heads secs cyls trans media cache rerror werror aio discard);
- foreach my $o (@qemu_drive_options) {
- $opts .= ",$o=$drive->{$o}" if defined($drive->{$o});
- }
+sub generate_format_blockdev {
+ my ($storecfg, $drive, $nodename, $file, $force_readonly) = @_;
- # snapshot only accepts on|off
- if (defined($drive->{snapshot})) {
- my $v = $drive->{snapshot} ? 'on' : 'off';
- $opts .= ",snapshot=$v";
- }
+ my $volid = $drive->{file};
+ my $scfg = undef;
+ my $path = $volid;
+ my $format = $drive->{format};
+ $format //= "raw";
- if (defined($drive->{ro})) { # ro maps to QEMUs `readonly`, which accepts `on` or `off` only
- $opts .= ",readonly=" . ($drive->{ro} ? 'on' : 'off');
- }
+ my $drive_id = get_drive_id($drive);
- foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
- my ($dir, $qmpname) = @$type;
- if (my $v = $drive->{"mbps$dir"}) {
- $opts .= ",throttling.bps$qmpname=".int($v*1024*1024);
- }
- if (my $v = $drive->{"mbps${dir}_max"}) {
- $opts .= ",throttling.bps$qmpname-max=".int($v*1024*1024);
- }
- if (my $v = $drive->{"bps${dir}_max_length"}) {
- $opts .= ",throttling.bps$qmpname-max-length=$v";
- }
- if (my $v = $drive->{"iops${dir}"}) {
- $opts .= ",throttling.iops$qmpname=$v";
- }
- if (my $v = $drive->{"iops${dir}_max"}) {
- $opts .= ",throttling.iops$qmpname-max=$v";
- }
- if (my $v = $drive->{"iops${dir}_max_length"}) {
- $opts .= ",throttling.iops$qmpname-max-length=$v";
- }
+ if ($drive->{zeroinit}) {
+ #fixme how to handle zeroinit ? insert special blockdev filter ?
}
- if ($live_restore_name) {
- $format = "rbd" if $is_rbd;
- die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
- if !$format;
- $opts .= ",format=alloc-track,file.driver=$format";
- } elsif ($format) {
- $opts .= ",format=$format";
+ my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
+
+ if($storeid) {
+ $scfg = PVE::Storage::storage_config($storecfg, $storeid);
+ $format = checked_volume_format($storecfg, $volid);
+ $path = PVE::Storage::path($storecfg, $volid);
}
+ my $readonly = $drive->{ro} || $force_readonly ? JSON::true : JSON::false;
+
+ #libvirt defines the cache option on both the format and file nodes
my $cache_direct = drive_uses_cache_direct($drive, $scfg);
+ my $cache = {};
+ $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
+ $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq 'unsafe' ? JSON::true : JSON::false;
- $opts .= ",cache=none" if !$drive->{cache} && $cache_direct;
+ my $blockdev = { driver => $format, file => $file, cache => $cache, 'read-only' => $readonly };
+ $blockdev->{'node-name'} = $nodename if $nodename;
- if (!$drive->{aio}) {
- if ($io_uring && storage_allows_io_uring_default($scfg, $cache_direct)) {
- # io_uring supports all cache modes
- $opts .= ",aio=io_uring";
- } else {
- # aio native works only with O_DIRECT
- if($cache_direct) {
- $opts .= ",aio=native";
- } else {
- $opts .= ",aio=threads";
- }
- }
- }
+ return $blockdev;
- if (!drive_is_cdrom($drive)) {
- my $detectzeroes;
- if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
- $detectzeroes = 'off';
- } elsif ($drive->{discard}) {
- $detectzeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
- } else {
- # This used to be our default with discard not being specified:
- $detectzeroes = 'on';
- }
+}
- # note: 'detect-zeroes' works per blockdev and we want it to persist
- # after the alloc-track is removed, so put it on 'file' directly
- my $dz_param = $live_restore_name ? "file.detect-zeroes" : "detect-zeroes";
- $opts .= ",$dz_param=$detectzeroes" if $detectzeroes;
- }
+sub generate_drive_blockdev {
+ my ($storecfg, $vmid, $drive, $force_readonly, $live_restore_name) = @_;
- if ($live_restore_name) {
- $opts .= ",backing=$live_restore_name";
- $opts .= ",auto-remove=on";
+ my $path;
+ my $volid = $drive->{file};
+ my $format = $drive->{format};
+ my $drive_id = get_drive_id($drive);
+
+ my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
+ my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
+
+ my $blockdevs = [];
+
+ if (drive_is_cdrom($drive)) {
+ die "$drive_id: cannot back cdrom drive with a live restore image\n" if $live_restore_name;
+
+ $path = get_iso_path($storecfg, $vmid, $volid);
+ return if !$path;
+ $force_readonly = 1;
}
- # my $file_param = $live_restore_name ? "file.file.filename" : "file";
- my $file_param = "file";
+ my $file_nodename = "file-drive-$drive_id";
+ my $blockdev_file = generate_file_blockdev($storecfg, $drive, $file_nodename);
+ my $fmt_nodename = "fmt-drive-$drive_id";
+ my $blockdev_format = generate_format_blockdev($storecfg, $drive, $fmt_nodename, $blockdev_file, $force_readonly);
+
+ my $blockdev_live_restore = undef;
if ($live_restore_name) {
- # non-rbd drivers require the underlying file to be a separate block
- # node, so add a second .file indirection
- $file_param .= ".file" if !$is_rbd;
- $file_param .= ".filename";
+ die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
+ if !$format;
+
+ $blockdev_live_restore = { 'node-name' => "liverestore-drive-$drive_id",
+ backing => $live_restore_name,
+ 'auto-remove' => 'on', driver => "alloc-track",
+ file => $blockdev_format };
}
- my $pathinfo = $path ? "$file_param=$path," : '';
- return "${pathinfo}if=none,id=drive-$drive->{interface}$drive->{index}$opts";
+ #this is the top filter entry point, use drive-$drive_id as the node name
+ my $blockdev_throttle = { driver => "throttle", 'node-name' => "drive-$drive_id", 'throttle-group' => "throttle-drive-$drive_id" };
+ #put the live-restore filter between the throttle and format nodes
+ $blockdev_throttle->{file} = $live_restore_name ? $blockdev_live_restore : $blockdev_format;
+ return $blockdev_throttle;
}
sub print_pbs_blockdev {
@@ -4091,13 +4210,13 @@ sub config_to_command {
push @$devices, '-blockdev', $live_restore->{blockdev};
}
- my $drive_cmd = print_drive_commandline_full(
- $storecfg, $vmid, $drive, $live_blockdev_name, min_version($kvmver, 6, 0));
-
- # extra protection for templates, but SATA and IDE don't support it..
- $drive_cmd .= ',readonly=on' if drive_is_read_only($conf, $drive);
+ my $throttle_group = print_drive_throttle_group($drive);
+ push @$devices, '-object', $throttle_group if $throttle_group;
- push @$devices, '-drive',$drive_cmd;
+ # extra protection for templates, but SATA and IDE don't support it.
+ my $force_readonly = drive_is_read_only($conf, $drive);
+ my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive, $force_readonly, $live_blockdev_name);
+ push @$devices, '-blockdev', encode_json_ordered($blockdev) if $blockdev;
push @$devices, '-device', print_drivedevice_full(
$storecfg, $conf, $vmid, $drive, $bridges, $arch, $machine_type);
});
@@ -8986,4 +9105,8 @@ sub delete_ifaces_ipams_ips {
}
}
+sub encode_json_ordered {
+ return JSON->new->canonical->allow_nonref->encode( $_[0] );
+}
+
1;
--
2.39.5
* [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-09 12:36 ` Fabian Grünbichler
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 13036 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
Date: Mon, 16 Dec 2024 10:12:17 +0100
Message-ID: <20241216091229.3142660-4-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
src/PVE/Storage/DirPlugin.pm | 1 +
src/PVE/Storage/Plugin.pm | 207 +++++++++++++++++++++++++++++------
2 files changed, 176 insertions(+), 32 deletions(-)
diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm
index fb23e0a..1cd7ac3 100644
--- a/src/PVE/Storage/DirPlugin.pm
+++ b/src/PVE/Storage/DirPlugin.pm
@@ -81,6 +81,7 @@ sub options {
is_mountpoint => { optional => 1 },
bwlimit => { optional => 1 },
preallocation => { optional => 1 },
+ snapext => { optional => 1 },
};
}
diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
index fececa1..aeba8d3 100644
--- a/src/PVE/Storage/Plugin.pm
+++ b/src/PVE/Storage/Plugin.pm
@@ -214,6 +214,11 @@ my $defaultData = {
maximum => 65535,
optional => 1,
},
+ 'snapext' => {
+ type => 'boolean',
+ description => 'Enable external snapshots.',
+ optional => 1,
+ },
},
};
@@ -710,11 +715,15 @@ sub filesystem_path {
# Note: qcow2/qed has internal snapshot, so path is always
# the same (with or without snapshot => same file).
die "can't snapshot this image format\n"
- if defined($snapname) && $format !~ m/^(qcow2|qed)$/;
+ if defined($snapname) && !$scfg->{snapext} && $format !~ m/^(qcow2|qed)$/;
my $dir = $class->get_subdir($scfg, $vtype);
- $dir .= "/$vmid" if $vtype eq 'images';
+ if ($scfg->{snapext} && $snapname) {
+ $name = $class->get_snap_volname($volname, $snapname);
+ } else {
+ $dir .= "/$vmid" if $vtype eq 'images';
+ }
my $path = "$dir/$name";
@@ -953,6 +962,31 @@ sub free_image {
# TODO taken from PVE/QemuServer/Drive.pm, avoiding duplication would be nice
my @checked_qemu_img_formats = qw(raw cow qcow qcow2 qed vmdk cloop);
+sub qemu_img_info {
+ my ($filename, $file_format, $timeout, $follow_backing_files) = @_;
+
+ my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
+ push $cmd->@*, '-f', $file_format if $file_format;
+ push $cmd->@*, '--backing-chain' if $follow_backing_files;
+
+ my $json = '';
+ my $err_output = '';
+ eval {
+ run_command($cmd,
+ timeout => $timeout,
+ outfunc => sub { $json .= shift },
+ errfunc => sub { $err_output .= shift . "\n"},
+ );
+ };
+ warn $@ if $@;
+ if ($err_output) {
+ # if qemu did not output anything to stdout we die with stderr as an error
+ die $err_output if !$json;
+ # otherwise we warn about it and try to parse the json
+ warn $err_output;
+ }
+ return $json;
+}
# set $untrusted if the file in question might be malicious since it isn't
# created by our stack
# this makes certain checks fatal, and adds extra checks for known problems like
@@ -1016,25 +1050,9 @@ sub file_size_info {
warn "file_size_info: '$filename': falling back to 'raw' from unknown format '$file_format'\n";
$file_format = 'raw';
}
- my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
- push $cmd->@*, '-f', $file_format if $file_format;
- my $json = '';
- my $err_output = '';
- eval {
- run_command($cmd,
- timeout => $timeout,
- outfunc => sub { $json .= shift },
- errfunc => sub { $err_output .= shift . "\n"},
- );
- };
- warn $@ if $@;
- if ($err_output) {
- # if qemu did not output anything to stdout we die with stderr as an error
- die $err_output if !$json;
- # otherwise we warn about it and try to parse the json
- warn $err_output;
- }
+ my $json = qemu_img_info($filename, $file_format, $timeout);
+
if (!$json) {
die "failed to query file information with qemu-img\n" if $untrusted;
# skip decoding if there was no output, e.g. if there was a timeout.
@@ -1162,11 +1180,28 @@ sub volume_snapshot {
die "can't snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
- my $path = $class->filesystem_path($scfg, $volname);
+ if($scfg->{snapext}) {
- my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
+ my $path = $class->path($scfg, $volname, $storeid);
+ my $snappath = $class->path($scfg, $volname, $storeid, $snap);
+ my $format = ($class->parse_volname($volname))[6];
+ #rename current volume to snap volume
+ rename($path, $snappath) if -e $path && !-e $snappath;
+
+ my $cmd = ['/usr/bin/qemu-img', 'create', '-b', $snappath,
+ '-F', $format, '-f', 'qcow2', $path];
+
+ my $options = "extended_l2=on,cluster_size=128k,";
+ $options .= preallocation_cmd_option($scfg, 'qcow2');
+ push @$cmd, '-o', $options;
+ run_command($cmd);
- run_command($cmd);
+ } else {
+
+ my $path = $class->filesystem_path($scfg, $volname);
+ my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
+ run_command($cmd);
+ }
return undef;
}
@@ -1177,6 +1212,21 @@ sub volume_snapshot {
sub volume_rollback_is_possible {
my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
+ if ($scfg->{snapext}) {
+ #technically, we could manage multiple branches, but it needs a lot more work for snapshot delete:
+ #we would need to implement block-stream from the deleted snapshot to all other child branches;
+ #when online, we would need a transaction for multiple disks when deleting the last snapshot
+ #and would need to merge into the currently running file
+
+ my $snappath = $class->path($scfg, $volname, $storeid, $snap);
+ my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+ my $parentsnap = $snapshots->{current}->{parent};
+
+ return 1 if !-e $snappath || $snapshots->{$parentsnap}->{file} eq $snappath;
+
+ die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
+ }
+
return 1;
}
@@ -1187,9 +1237,15 @@ sub volume_snapshot_rollback {
my $path = $class->filesystem_path($scfg, $volname);
- my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
-
- run_command($cmd);
+ if ($scfg->{snapext}) {
+ #simply delete the current snapshot and recreate it
+ my $path = $class->filesystem_path($scfg, $volname);
+ unlink($path);
+ $class->volume_snapshot($scfg, $storeid, $volname, $snap);
+ } else {
+ my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
+ run_command($cmd);
+ }
return undef;
}
@@ -1201,13 +1257,52 @@ sub volume_snapshot_delete {
return 1 if $running;
+ my $cmd = "";
my $path = $class->filesystem_path($scfg, $volname);
- $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
+ if ($scfg->{snapext}) {
- my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
+ my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+ my $snappath = $snapshots->{$snap}->{file};
+ return if !-e $snappath; #already deleted ?
+
+ my $parentsnap = $snapshots->{$snap}->{parent};
+ my $childsnap = $snapshots->{$snap}->{child};
+
+ my $parentpath = $parentsnap ? $snapshots->{$parentsnap}->{file} : undef;
+ my $childpath = $childsnap ? $snapshots->{$childsnap}->{file} : undef;
+
+
+ #if this is the first snapshot, merge the child into it, then rename the snapshot file to the child's name
+ if (!$parentsnap) {
+ #we use commit here, as it's faster than rebase
+ #https://lists.gnu.org/archive/html/qemu-discuss/2019-08/msg00041.html
+ print "commit $childpath\n";
+ $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
+ run_command($cmd);
+ print "delete $childpath\n";
+
+ unlink($childpath);
+ print "rename $snappath to $childpath\n";
+ rename($snappath, $childpath);
+ } else {
+ #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
+ die "missing parentsnap snapshot to rebase child $childpath\n" if !$parentpath;
+ #a safe rebase (no -u) copies the data only present in $snappath into the child,
+ #so neither the child's content nor the parent snapshot changes
+ print "link $childsnap to $parentsnap\n";
+ $cmd = ['/usr/bin/qemu-img', 'rebase', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath];
+ run_command($cmd);
+ #delete the snapshot
+ unlink($snappath);
+ }
+
+ } else {
+ $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
- run_command($cmd);
+ $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
+ run_command($cmd);
+ }
return undef;
}
@@ -1246,8 +1341,8 @@ sub volume_has_feature {
current => { qcow2 => 1, raw => 1, vmdk => 1 },
},
rename => {
- current => {qcow2 => 1, raw => 1, vmdk => 1},
- },
+ current => { qcow2 => 1, raw => 1, vmdk => 1 },
+ },
};
if ($feature eq 'clone') {
@@ -1481,7 +1576,37 @@ sub status {
sub volume_snapshot_info {
my ($class, $scfg, $storeid, $volname) = @_;
- die "volume_snapshot_info is not implemented for $class";
+ my $path = $class->filesystem_path($scfg, $volname);
+
+ my $backing_chain = 1;
+ my $json = qemu_img_info($path, undef, 10, $backing_chain);
+ die "failed to query file information with qemu-img\n" if !$json;
+ my $snapshots = eval { decode_json($json) } or die "failed to parse qemu-img info output - $@\n";
+
+ my $info = {};
+ my $order = 0;
+ for my $snap (@$snapshots) {
+
+ my $snapfile = $snap->{filename};
+ my $snapname = parse_snapname($snapfile);
+ $snapname = 'current' if !$snapname;
+ my $snapvolname = $class->get_snap_volname($volname, $snapname);
+
+ $info->{$snapname}->{order} = $order;
+ $info->{$snapname}->{file} = $snapfile;
+ $info->{$snapname}->{volname} = $snapvolname;
+ $info->{$snapname}->{volid} = "$storeid:$snapvolname";
+ $info->{$snapname}->{ext} = 1;
+
+ my $parentfile = $snap->{'backing-filename'};
+ if ($parentfile) {
+ my $parentname = parse_snapname($parentfile);
+ $info->{$snapname}->{parent} = $parentname;
+ $info->{$parentname}->{child} = $snapname;
+ }
+ $order++;
+ }
+ return $info;
}
sub activate_storage {
@@ -1867,4 +1992,22 @@ sub config_aware_base_mkdir {
}
}
+sub get_snap_volname {
+ my ($class, $volname, $snapname) = @_;
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
+ $name = !$snapname || $snapname eq 'current' ? $volname : "$vmid/snap-$snapname-$name";
+ return $name;
+}
+
+sub parse_snapname {
+ my ($name) = @_;
+
+ my $basename = basename($name);
+ if ($basename =~ m/^snap-(.*)-vm(.*)$/) {
+ return $1;
+ }
+ return undef;
+}
+
1;
--
2.39.5
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
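The volume_snapshot_info() and parse_snapname() hunks above derive the snapshot tree purely from the qcow2 backing chain. As a reading aid, here is a minimal standalone Perl sketch of that mapping, assuming the JSON array printed by `qemu-img info --backing-chain --output=json` (top image first) and the patch's `snap-<name>-vm...` file naming. `snapshot_chain` is a hypothetical name, and unlike the patch it skips backing files that are not snapshots (for example a linked-clone base image).

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use JSON::PP qw(decode_json);

# Same rule as parse_snapname() in the patch:
# ".../snap-<name>-vm-<...>" -> "<name>", anything else -> undef
sub parse_snapname {
    my ($name) = @_;
    return $1 if basename($name) =~ m/^snap-(.*)-vm(.*)$/;
    return undef;
}

# Hypothetical helper: turn the array printed by
# `qemu-img info --backing-chain --output=json` (top image first) into
# { name => { order, file, parent, child } }, naming the top image 'current'.
sub snapshot_chain {
    my ($json) = @_;

    my $images = decode_json($json);
    my $info = {};
    my $order = 0;
    for my $img (@$images) {
        my $name = parse_snapname($img->{filename}) // 'current';
        $info->{$name}->{order} = $order++;
        $info->{$name}->{file} = $img->{filename};

        my $backing = $img->{'backing-filename'} or next;
        my $parent = parse_snapname($backing);
        # assumption: ignore backing files that are not snapshots
        next if !defined($parent);
        $info->{$name}->{parent} = $parent;
        $info->{$parent}->{child} = $name;
    }
    return $info;
}

# Usage: chain s1 <- s2 <- current, listed top image first as qemu-img does
my $json = '[' . join(',',
    '{"filename":"/img/vm-100-disk-0.qcow2","backing-filename":"/img/snap-s2-vm-100-disk-0.qcow2"}',
    '{"filename":"/img/snap-s2-vm-100-disk-0.qcow2","backing-filename":"/img/snap-s1-vm-100-disk-0.qcow2"}',
    '{"filename":"/img/snap-s1-vm-100-disk-0.qcow2"}',
) . ']';

my $chain = snapshot_chain($json);
for my $name (sort { $chain->{$a}{order} <=> $chain->{$b}{order} } keys %$chain) {
    printf "%-8s parent=%s\n", $name, $chain->{$name}{parent} // '-';
}
```

Because `parent` always points at the older file, deleting a snapshot has to preserve the content seen through every newer entry in this map, not only the running image.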
* [pve-devel] [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 40905 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests
Date: Mon, 16 Dec 2024 10:12:18 +0100
Message-ID: <20241216091229.3142660-5-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
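The expected command lines in this patch replace each `-drive` with a throttle-group `-object` plus a nested `-blockdev` JSON tree: throttle node `drive-<id>`, format node `fmt-drive-<id>`, protocol node `file-drive-<id>`. As a reading aid, here is a standalone Perl sketch (core JSON::PP only) that rebuilds one such argument pair. `blockdev_args` and its option names are hypothetical, not qemu-server's generator, and it only covers the keys visible in these test files.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper, not the qemu-server implementation: build the
# -object/-blockdev pair for one drive as a three-node chain
#   throttle "<id>"  ->  format "fmt-<id>"  ->  protocol "file-<id>"
sub blockdev_args {
    my ($id, $filename, $format, %opt) = @_;

    my $bool = sub { $_[0] ? JSON::PP::true() : JSON::PP::false() };
    my $cache = { direct => $bool->($opt{direct}), 'no-flush' => JSON::PP::false() };

    my $file = {
        driver      => 'file',
        filename    => $filename,
        'node-name' => "file-$id",
        aio         => $opt{aio},
        cache       => $cache,
    };
    # only emitted when set, as in the CD-ROM entries of ide.conf.cmd
    for my $key ('detect-zeroes', 'discard') {
        $file->{$key} = $opt{$key} if defined $opt{$key};
    }

    my $fmt = {
        driver      => $format,
        file        => $file,
        'node-name' => "fmt-$id",
        'read-only' => $bool->($opt{'read-only'}),
        cache       => $cache,
    };

    my $throttle = {
        driver           => 'throttle',
        file             => $fmt,
        'node-name'      => $id,
        'throttle-group' => "throttle-$id",
    };

    # canonical(1) sorts keys, giving the key order used in the .cmd files
    my $json = JSON::PP->new->canonical(1)->encode($throttle);
    return ('-object', "throttle-group,id=throttle-$id", '-blockdev', $json);
}

# Usage: reproduce the virtio0 disk from bootorder-empty.conf.cmd
my @args = blockdev_args(
    'drive-virtio0', '/var/lib/vz/images/8006/vm-8006-disk-0.qcow2', 'qcow2',
    aio => 'io_uring', direct => 1, 'detect-zeroes' => 'unmap', discard => 'unmap',
);
for (my $i = 0; $i < @args; $i += 2) {
    print " $args[$i] '$args[$i + 1]' \\\n";
}
```

The device then references the top (throttle) node by `drive=drive-<id>`, which is why a live block-commit has to replace a node other than the format node.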
test/cfg2cmd/bootorder-empty.conf.cmd | 12 ++++++----
test/cfg2cmd/bootorder-legacy.conf.cmd | 12 ++++++----
test/cfg2cmd/bootorder.conf.cmd | 12 ++++++----
...putype-icelake-client-deprecation.conf.cmd | 6 ++---
test/cfg2cmd/ide.conf.cmd | 23 +++++++++++--------
test/cfg2cmd/pinned-version-pxe-pve.conf.cmd | 6 ++---
test/cfg2cmd/pinned-version-pxe.conf.cmd | 6 ++---
test/cfg2cmd/pinned-version.conf.cmd | 6 ++---
test/cfg2cmd/q35-ide.conf.cmd | 23 +++++++++++--------
.../q35-linux-hostpci-template.conf.cmd | 3 ++-
test/cfg2cmd/seabios_serial.conf.cmd | 6 ++---
...imple-balloon-free-page-reporting.conf.cmd | 6 ++---
test/cfg2cmd/simple-btrfs.conf.cmd | 6 ++---
test/cfg2cmd/simple-virtio-blk.conf.cmd | 6 ++---
test/cfg2cmd/simple1-template.conf.cmd | 11 +++++----
test/cfg2cmd/simple1.conf.cmd | 6 ++---
16 files changed, 84 insertions(+), 66 deletions(-)
diff --git a/test/cfg2cmd/bootorder-empty.conf.cmd b/test/cfg2cmd/bootorder-empty.conf.cmd
index 87fa6c28..7a7f96cf 100644
--- a/test/cfg2cmd/bootorder-empty.conf.cmd
+++ b/test/cfg2cmd/bootorder-empty.conf.cmd
@@ -25,14 +25,16 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2' \
-device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi4,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi4' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi4"},"node-name":"fmt-drive-scsi4","read-only":false},"node-name":"drive-scsi4","throttle-group":"throttle-drive-scsi4"}' \
-device 'scsi-hd,bus=scsihw0.0,scsi-id=4,drive=drive-scsi4,id=scsi4' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio0"},"node-name":"fmt-drive-virtio0","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio1,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio1' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio1"},"node-name":"fmt-drive-virtio1","read-only":false},"node-name":"drive-virtio1","throttle-group":"throttle-drive-virtio1"}' \
-device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,iothread=iothread-virtio1' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256' \
diff --git a/test/cfg2cmd/bootorder-legacy.conf.cmd b/test/cfg2cmd/bootorder-legacy.conf.cmd
index a4c3f050..b8ba1588 100644
--- a/test/cfg2cmd/bootorder-legacy.conf.cmd
+++ b/test/cfg2cmd/bootorder-legacy.conf.cmd
@@ -25,14 +25,16 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi4,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi4' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi4"},"node-name":"fmt-drive-scsi4","read-only":false},"node-name":"drive-scsi4","throttle-group":"throttle-drive-scsi4"}' \
-device 'scsi-hd,bus=scsihw0.0,scsi-id=4,drive=drive-scsi4,id=scsi4' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio0"},"node-name":"fmt-drive-virtio0","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio1,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio1' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio1"},"node-name":"fmt-drive-virtio1","read-only":false},"node-name":"drive-virtio1","throttle-group":"throttle-drive-virtio1"}' \
-device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,iothread=iothread-virtio1,bootindex=302' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=100' \
diff --git a/test/cfg2cmd/bootorder.conf.cmd b/test/cfg2cmd/bootorder.conf.cmd
index 76bd55d7..a119579b 100644
--- a/test/cfg2cmd/bootorder.conf.cmd
+++ b/test/cfg2cmd/bootorder.conf.cmd
@@ -25,14 +25,16 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=103' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=103' \
-device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi4,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi4' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi4"},"node-name":"fmt-drive-scsi4","read-only":false},"node-name":"drive-scsi4","throttle-group":"throttle-drive-scsi4"}' \
-device 'scsi-hd,bus=scsihw0.0,scsi-id=4,drive=drive-scsi4,id=scsi4,bootindex=102' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio0"},"node-name":"fmt-drive-virtio0","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio1,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-virtio1' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio1"},"node-name":"fmt-drive-virtio1","read-only":false},"node-name":"drive-virtio1","throttle-group":"throttle-drive-virtio1"}' \
-device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb,iothread=iothread-virtio1,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=101' \
diff --git a/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd b/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd
index bf084432..6b9d587c 100644
--- a/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd
+++ b/test/cfg2cmd/cputype-icelake-client-deprecation.conf.cmd
@@ -23,9 +23,9 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/base-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/base-8006-disk-0.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-machine 'type=pc+pve0'
diff --git a/test/cfg2cmd/ide.conf.cmd b/test/cfg2cmd/ide.conf.cmd
index 33c6aadc..f465d072 100644
--- a/test/cfg2cmd/ide.conf.cmd
+++ b/test/cfg2cmd/ide.conf.cmd
@@ -23,16 +23,21 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/zero.iso,if=none,id=drive-ide0,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/one.iso,if=none,id=drive-ide1,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.0,unit=1,drive=drive-ide1,id=ide1,bootindex=201' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/two.iso,if=none,id=drive-ide2,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=202' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/three.iso,if=none,id=drive-ide3,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.1,unit=1,drive=drive-ide3,id=ide3,bootindex=203' \
+ -object 'throttle-group,id=throttle-drive-ide0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/zero.iso","node-name":"file-drive-ide0"},"node-name":"fmt-drive-ide0","read-only":true},"node-name":"drive-ide0","throttle-group":"throttle-drive-ide0"}' \
+ -device 'ide-cd,bus=ide.0,unit=0,id=ide0,drive=drive-ide0,bootindex=200' \
+ -object 'throttle-group,id=throttle-drive-ide1' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/one.iso","node-name":"file-drive-ide1"},"node-name":"fmt-drive-ide1","read-only":true},"node-name":"drive-ide1","throttle-group":"throttle-drive-ide1"}' \
+ -device 'ide-cd,bus=ide.0,unit=1,id=ide1,drive=drive-ide1,bootindex=201' \
+ -object 'throttle-group,id=throttle-drive-ide2' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/two.iso","node-name":"file-drive-ide2"},"node-name":"fmt-drive-ide2","read-only":true},"node-name":"drive-ide2","throttle-group":"throttle-drive-ide2"}' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,drive=drive-ide2,bootindex=202' \
+ -object 'throttle-group,id=throttle-drive-ide3' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/three.iso","node-name":"file-drive-ide3"},"node-name":"fmt-drive-ide3","read-only":true},"node-name":"drive-ide3","throttle-group":"throttle-drive-ide3"}' \
+ -device 'ide-cd,bus=ide.1,unit=1,id=ide3,drive=drive-ide3,bootindex=203' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/100/vm-100-disk-2.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/100/vm-100-disk-2.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=2E:01:68:F9:9C:87,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd b/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd
index d17d4deb..cb880681 100644
--- a/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd
+++ b/test/cfg2cmd/pinned-version-pxe-pve.conf.cmd
@@ -23,10 +23,10 @@
-device 'virtio-rng-pci,rng=rng0,max-bytes=1024,period=1000,bus=pci.1,addr=0x1d' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.raw,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.raw","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300,romfile=pxe-virtio.rom' \
diff --git a/test/cfg2cmd/pinned-version-pxe.conf.cmd b/test/cfg2cmd/pinned-version-pxe.conf.cmd
index 892fc148..a4dddf3e 100644
--- a/test/cfg2cmd/pinned-version-pxe.conf.cmd
+++ b/test/cfg2cmd/pinned-version-pxe.conf.cmd
@@ -21,10 +21,10 @@
-device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.raw,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.raw","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300,romfile=pxe-virtio.rom' \
diff --git a/test/cfg2cmd/pinned-version.conf.cmd b/test/cfg2cmd/pinned-version.conf.cmd
index 13361edf..cde4d273 100644
--- a/test/cfg2cmd/pinned-version.conf.cmd
+++ b/test/cfg2cmd/pinned-version.conf.cmd
@@ -21,10 +21,10 @@
-device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.raw,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.raw","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
diff --git a/test/cfg2cmd/q35-ide.conf.cmd b/test/cfg2cmd/q35-ide.conf.cmd
index dd4f1bbe..c7ca20c1 100644
--- a/test/cfg2cmd/q35-ide.conf.cmd
+++ b/test/cfg2cmd/q35-ide.conf.cmd
@@ -22,16 +22,21 @@
-device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/zero.iso,if=none,id=drive-ide0,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/one.iso,if=none,id=drive-ide1,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.2,unit=0,drive=drive-ide1,id=ide1,bootindex=201' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/two.iso,if=none,id=drive-ide2,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=202' \
- -drive 'file=/mnt/pve/cifs-store/template/iso/three.iso,if=none,id=drive-ide3,media=cdrom,format=raw,aio=threads' \
- -device 'ide-cd,bus=ide.3,unit=0,drive=drive-ide3,id=ide3,bootindex=203' \
+ -object 'throttle-group,id=throttle-drive-ide0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/zero.iso","node-name":"file-drive-ide0"},"node-name":"fmt-drive-ide0","read-only":true},"node-name":"drive-ide0","throttle-group":"throttle-drive-ide0"}' \
+ -device 'ide-cd,bus=ide.0,unit=0,id=ide0,drive=drive-ide0,bootindex=200' \
+ -object 'throttle-group,id=throttle-drive-ide1' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/one.iso","node-name":"file-drive-ide1"},"node-name":"fmt-drive-ide1","read-only":true},"node-name":"drive-ide1","throttle-group":"throttle-drive-ide1"}' \
+ -device 'ide-cd,bus=ide.2,unit=0,id=ide1,drive=drive-ide1,bootindex=201' \
+ -object 'throttle-group,id=throttle-drive-ide2' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/two.iso","node-name":"file-drive-ide2"},"node-name":"fmt-drive-ide2","read-only":true},"node-name":"drive-ide2","throttle-group":"throttle-drive-ide2"}' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,drive=drive-ide2,bootindex=202' \
+ -object 'throttle-group,id=throttle-drive-ide3' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"threads","cache":{"direct":false,"no-flush":false},"driver":"file","filename":"/mnt/pve/cifs-store/template/iso/three.iso","node-name":"file-drive-ide3"},"node-name":"fmt-drive-ide3","read-only":true},"node-name":"drive-ide3","throttle-group":"throttle-drive-ide3"}' \
+ -device 'ide-cd,bus=ide.3,unit=0,id=ide3,drive=drive-ide3,bootindex=203' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/100/vm-100-disk-2.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/100/vm-100-disk-2.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=2E:01:68:F9:9C:87,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd b/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd
index cda10630..63c9fbe6 100644
--- a/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd
+++ b/test/cfg2cmd/q35-linux-hostpci-template.conf.cmd
@@ -24,7 +24,8 @@
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/100/base-100-disk-2.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on,readonly=on' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/100/base-100-disk-2.raw","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":true},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
-machine 'accel=tcg,type=pc+pve0' \
-snapshot
diff --git a/test/cfg2cmd/seabios_serial.conf.cmd b/test/cfg2cmd/seabios_serial.conf.cmd
index 1c4e102c..c3597ad1 100644
--- a/test/cfg2cmd/seabios_serial.conf.cmd
+++ b/test/cfg2cmd/seabios_serial.conf.cmd
@@ -23,10 +23,10 @@
-device 'isa-serial,chardev=serial0' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd b/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd
index 097a14e1..d7fbe2ca 100644
--- a/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd
+++ b/test/cfg2cmd/simple-balloon-free-page-reporting.conf.cmd
@@ -23,10 +23,10 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
diff --git a/test/cfg2cmd/simple-btrfs.conf.cmd b/test/cfg2cmd/simple-btrfs.conf.cmd
index c2354887..879ca729 100644
--- a/test/cfg2cmd/simple-btrfs.conf.cmd
+++ b/test/cfg2cmd/simple-btrfs.conf.cmd
@@ -23,10 +23,10 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/butter/bread/images/8006/vm-8006-disk-0/disk.raw,if=none,id=drive-scsi0,discard=on,format=raw,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":false,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":false,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/butter/bread/images/8006/vm-8006-disk-0/disk.raw","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/simple-virtio-blk.conf.cmd b/test/cfg2cmd/simple-virtio-blk.conf.cmd
index d19aca6b..bd4dc308 100644
--- a/test/cfg2cmd/simple-virtio-blk.conf.cmd
+++ b/test/cfg2cmd/simple-virtio-blk.conf.cmd
@@ -24,9 +24,9 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
+ -object 'throttle-group,id=throttle-drive-virtio0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-virtio0"},"node-name":"fmt-drive-virtio0","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
diff --git a/test/cfg2cmd/simple1-template.conf.cmd b/test/cfg2cmd/simple1-template.conf.cmd
index 35484600..7f9ae106 100644
--- a/test/cfg2cmd/simple1-template.conf.cmd
+++ b/test/cfg2cmd/simple1-template.conf.cmd
@@ -21,13 +21,14 @@
-device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/base-8006-disk-1.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap,readonly=on' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/base-8006-disk-1.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":true},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
-device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
- -drive 'file=/var/lib/vz/images/8006/base-8006-disk-0.qcow2,if=none,id=drive-sata0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
- -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0' \
+ -object 'throttle-group,id=throttle-drive-sata0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/base-8006-disk-0.qcow2","node-name":"file-drive-sata0"},"node-name":"fmt-drive-sata0","read-only":false},"node-name":"drive-sata0","throttle-group":"throttle-drive-sata0"}' \
+ -device 'ide-hd,bus=ahci0.0,id=sata0,drive=drive-sata0' \
-machine 'accel=tcg,smm=off,type=pc+pve0' \
-snapshot
diff --git a/test/cfg2cmd/simple1.conf.cmd b/test/cfg2cmd/simple1.conf.cmd
index ecd14bcc..df35e030 100644
--- a/test/cfg2cmd/simple1.conf.cmd
+++ b/test/cfg2cmd/simple1.conf.cmd
@@ -23,10 +23,10 @@
-device 'VGA,id=vga,bus=pci.0,addr=0x2' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
- -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
- -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
+ -device 'ide-cd,bus=ide.1,unit=0,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
- -drive 'file=/var/lib/vz/images/8006/vm-8006-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' \
+ -object 'throttle-group,id=throttle-drive-scsi0' \
+ -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"unmap","discard":"unmap","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.qcow2","node-name":"file-drive-scsi0"},"node-name":"fmt-drive-scsi0","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap8006i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=A2:C0:43:77:08:A0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=300' \
--
2.39.5
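Each updated expectation above replaces a single `-drive` with a `-object throttle-group` plus a `-blockdev` whose JSON nests three nodes: a `throttle` node named `drive-<id>` (the name the `-device ... drive=` option points at), a format node `fmt-drive-<id>` and a protocol node `file-drive-<id>`. The Perl sketch below rebuilds two of those expected values with core `JSON::PP`. `sketch_blockdev` and its options are hypothetical and only illustrate the node layout; this is not qemu-server's generator. It assumes `cache.direct=true` for the old `cache=none` case and always emits `no-flush=false`, as the test files do.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: mirrors the node chain in the updated cfg2cmd tests,
# throttle ("drive-$id") -> format ("fmt-drive-$id") -> file ("file-drive-$id").
sub sketch_blockdev {
    my ($id, $filename, $format, %opt) = @_;

    my $direct = $opt{cache_direct} // 1;    # 1 corresponds to the old cache=none
    my $cache = {
        direct => $direct ? JSON::PP::true() : JSON::PP::false(),
        'no-flush' => JSON::PP::false(),
    };

    my $file_node = {
        driver => 'file',
        filename => $filename,
        'node-name' => "file-drive-$id",
        aio => $opt{aio} // 'io_uring',
        cache => $cache,
        'detect-zeroes' => $opt{detect_zeroes} // 'unmap',
        discard => $opt{discard} // 'unmap',
    };
    my $fmt_node = {
        driver => $format,
        file => $file_node,
        'node-name' => "fmt-drive-$id",
        cache => $cache,
        'read-only' => $opt{read_only} ? JSON::PP::true() : JSON::PP::false(),
    };
    my $throttle_node = {
        driver => 'throttle',
        file => $fmt_node,
        'node-name' => "drive-$id",
        'throttle-group' => "throttle-drive-$id",
    };

    # canonical() sorts hash keys, which matches the key order in the .cmd files
    return JSON::PP->new->canonical->encode($throttle_node);
}

print "-object 'throttle-group,id=throttle-drive-scsi0'\n";
print "-blockdev '",
    sketch_blockdev('scsi0', '/var/lib/vz/images/8006/vm-8006-disk-0.qcow2', 'qcow2'),
    "'\n";
```

Sorting keys keeps the generated command line stable, which is what makes string comparison in the cfg2cmd tests workable.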
* [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (3 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-09 13:55 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
` (9 subsequent siblings)
14 siblings, 1 reply; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 15525 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
Date: Mon, 16 Dec 2024 10:12:19 +0100
Message-ID: <20241216091229.3142660-6-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
src/PVE/Storage/LVMPlugin.pm | 231 ++++++++++++++++++++++++++++++++---
1 file changed, 213 insertions(+), 18 deletions(-)
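The naming rules in this patch are simple enough to state on their own: `parse_volname` takes the format from the LV name suffix (`*.qcow2` is qcow2, anything else raw), and `get_snap_volname` names a snapshot LV `snap-<snapname>-<volname>`, with no snapshot or `current` meaning the volume itself. The Perl sketch below restates those two rules standalone; the `sketch_*` names are hypothetical. One consequence worth knowing: a snapshot LV name does not match `^vm-`, so it would be rejected if passed to `parse_volname` directly.

```perl
use strict;
use warnings;

# Illustrative restatement of the LVM naming rules (not the plugin itself):
# the image format is derived from the LV name suffix.
sub sketch_volname_format {
    my ($volname) = @_;
    die "unable to parse lvm volume name '$volname'\n"
        if $volname !~ m/^vm-\d+-\S+$/;
    return $volname =~ m/\.qcow2$/ ? 'qcow2' : 'raw';
}

# Snapshot LVs are "snap-<snapname>-<volname>"; no snapshot or 'current'
# refers to the active (top) volume itself.
sub sketch_snap_volname {
    my ($volname, $snapname) = @_;
    sketch_volname_format($volname);    # validate the base name first
    return $volname if !$snapname || $snapname eq 'current';
    return "snap-$snapname-$volname";
}

for my $v ('vm-100-disk-0', 'vm-100-disk-1.qcow2') {
    printf "%s => %s\n", $v, sketch_volname_format($v);
}
print sketch_snap_volname('vm-100-disk-1.qcow2', 'before-upgrade'), "\n";
```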
diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
index 88fd612..1257cd3 100644
--- a/src/PVE/Storage/LVMPlugin.pm
+++ b/src/PVE/Storage/LVMPlugin.pm
@@ -4,6 +4,7 @@ use strict;
use warnings;
use IO::File;
+use POSIX qw/ceil/;
use PVE::Tools qw(run_command trim);
use PVE::Storage::Plugin;
@@ -216,6 +217,7 @@ sub type {
sub plugindata {
return {
content => [ {images => 1, rootdir => 1}, { images => 1 }],
+ format => [ { raw => 1, qcow2 => 1 } , 'raw' ],
};
}
@@ -291,7 +293,10 @@ sub parse_volname {
PVE::Storage::Plugin::parse_lvm_name($volname);
if ($volname =~ m/^(vm-(\d+)-\S+)$/) {
- return ('images', $1, $2, undef, undef, undef, 'raw');
+ my $name = $1;
+ my $vmid = $2;
+ my $format = $volname =~ m/\.qcow2$/ ? 'qcow2' : 'raw';
+ return ('images', $name, $vmid, undef, undef, undef, $format);
}
die "unable to parse lvm volume name '$volname'\n";
@@ -300,11 +305,13 @@ sub parse_volname {
sub filesystem_path {
my ($class, $scfg, $volname, $snapname) = @_;
- die "lvm snapshot is not implemented"if defined($snapname);
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
- my ($vtype, $name, $vmid) = $class->parse_volname($volname);
+ die "snapshots are only supported with qcow2 format\n" if defined($snapname) && $format ne 'qcow2';
my $vg = $scfg->{vgname};
+ $name = $class->get_snap_volname($volname, $snapname) if $snapname;
my $path = "/dev/$vg/$name";
@@ -332,7 +339,9 @@ sub find_free_diskname {
my $disk_list = [ keys %{$lvs->{$vg}} ];
- return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, undef, $scfg);
+ $add_fmt_suffix = $fmt eq 'qcow2' ? 1 : undef;
+
+ return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, $fmt, $scfg, $add_fmt_suffix);
}
sub lvcreate {
@@ -363,7 +372,15 @@ sub lvrename {
sub alloc_image {
my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
- die "unsupported format '$fmt'" if $fmt ne 'raw';
+ die "unsupported format '$fmt'" if $fmt !~ m/^(raw|qcow2)$/;
+
+ $name = $class->alloc_new_image($storeid, $scfg, $vmid, $fmt, $name, $size);
+ $class->format_qcow2($storeid, $scfg, $name, $size) if $fmt eq 'qcow2';
+ return $name;
+}
+
+sub alloc_new_image {
+ my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
die "illegal name '$name' - should be 'vm-$vmid-*'\n"
if $name && $name !~ m/^vm-$vmid-/;
@@ -376,16 +393,45 @@ sub alloc_image {
my $free = int($vgs->{$vg}->{free});
+
+ #add extra space for qcow2 metadata
+ #without sub-allocated clusters: l2_size = disk_size × 8 / cluster_size
+ #with sub-allocated clusters: l2_size = disk_size × 8 / cluster_size / 16
+ #4MB overhead for 1TB with extended l2 and cluster_size=128k
+
+ my $qcow2_overhead = ceil($size/1024/1024/1024) * 4096;
+
+ my $lvmsize = $size;
+ $lvmsize += $qcow2_overhead if $fmt eq 'qcow2';
+
die "not enough free space ($free < $size)\n" if $free < $size;
- $name = $class->find_free_diskname($storeid, $scfg, $vmid)
+ $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt)
if !$name;
- lvcreate($vg, $name, $size, ["pve-vm-$vmid"]);
-
+ my $tags = ["pve-vm-$vmid"];
+ push @$tags, "\@pve-$name" if $fmt eq 'qcow2';
+ lvcreate($vg, $name, $lvmsize, $tags);
return $name;
}
+sub format_qcow2 {
+ my ($class, $storeid, $scfg, $name, $size, $backing_file) = @_;
+
+ # activate volume
+ $class->activate_volume($storeid, $scfg, $name, undef, {});
+ my $path = $class->path($scfg, $name, $storeid);
+ # create the qcow2 fs
+ my $cmd = ['/usr/bin/qemu-img', 'create'];
+ push @$cmd, '-b', $backing_file, '-F', 'qcow2' if $backing_file;
+ push @$cmd, '-f', 'qcow2', $path;
+ push @$cmd, "${size}K" if $size;
+ my $options = "extended_l2=on,";
+ $options .= PVE::Storage::Plugin::preallocation_cmd_option($scfg, 'qcow2');
+ push @$cmd, '-o', $options;
+ run_command($cmd);
+}
+
sub free_image {
my ($class, $storeid, $scfg, $volname, $isBase) = @_;
@@ -536,6 +582,12 @@ sub activate_volume {
my $lvm_activate_mode = 'ey';
+ #activate the volume and all its snapshot volumes by tag
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ $path = "\@pve-$name" if $format eq 'qcow2';
+
my $cmd = ['/sbin/lvchange', "-a$lvm_activate_mode", $path];
run_command($cmd, errmsg => "can't activate LV '$path'");
$cmd = ['/sbin/lvchange', '--refresh', $path];
@@ -548,6 +600,10 @@ sub deactivate_volume {
my $path = $class->path($scfg, $volname, $storeid, $snapname);
return if ! -b $path;
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+ $path = "\@pve-$name" if $format eq 'qcow2';
+
my $cmd = ['/sbin/lvchange', '-aln', $path];
run_command($cmd, errmsg => "can't deactivate LV '$path'");
}
@@ -555,15 +611,27 @@ sub deactivate_volume {
sub volume_resize {
my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
- $size = ($size/1024/1024) . "M";
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ my $lvmsize = $size / 1024;
+ my $qcow2_overhead = ceil($size/1024/1024/1024/1024) * 4096;
+ $lvmsize += $qcow2_overhead if $format eq 'qcow2';
+ $lvmsize = "${lvmsize}k";
my $path = $class->path($scfg, $volname);
- my $cmd = ['/sbin/lvextend', '-L', $size, $path];
+ my $cmd = ['/sbin/lvextend', '-L', $lvmsize, $path];
$class->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
run_command($cmd, errmsg => "error resizing volume '$path'");
});
+ if(!$running && $format eq 'qcow2') {
+ my $prealloc_opt = PVE::Storage::Plugin::preallocation_cmd_option($scfg, $format);
+ my $cmd = ['/usr/bin/qemu-img', 'resize', "--$prealloc_opt", '-f', $format, $path , $size];
+ run_command($cmd, timeout => 10);
+ }
+
return 1;
}
@@ -585,30 +653,149 @@ sub volume_size_info {
sub volume_snapshot {
my ($class, $scfg, $storeid, $volname, $snap) = @_;
- die "lvm snapshot is not implemented";
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ die "can't snapshot this image format\n" if $format ne 'qcow2';
+
+ $class->activate_volume($storeid, $scfg, $volname, undef, {});
+
+ my $snap_volname = $class->get_snap_volname($volname, $snap);
+ my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
+
+ my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
+
+ #rename current lvm volume to snap volume
+ my $vg = $scfg->{vgname};
+ print"rename $volname to $snap_volname\n";
+ eval { lvrename($vg, $volname, $snap_volname) } ;
+
+
+ #allocate a new lvm volume
+ $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024);
+ eval {
+ $class->format_qcow2($storeid, $scfg, $volname, undef, $snap_path);
+ };
+ if (my $err = $@) {
+ eval { $class->free_image($storeid, $scfg, $volname, 0) };
+ warn $@ if $@;
+ die $err;
+ }
+}
+
+sub volume_rollback_is_possible {
+ my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
+
+ my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
+
+ my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+ my $parent_snap = $snapshots->{current}->{parent};
+
+ return 1 if !-e $snap_path || $snapshots->{$parent_snap}->{file} eq $snap_path;
+ die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
+
+ return 1;
}
+
sub volume_snapshot_rollback {
my ($class, $scfg, $storeid, $volname, $snap) = @_;
- die "lvm snapshot rollback is not implemented";
+ die "can't rollback snapshot for this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
+ $class->parse_volname($volname);
+
+ $class->activate_volume($storeid, $scfg, $volname, undef, {});
+ my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
+ my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
+
+ #simply delete the current snapshot and recreate it
+ $class->free_image($storeid, $scfg, $volname, 0);
+ $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024);
+ $class->format_qcow2($storeid, $scfg, $volname, undef, $snap_path);
+
+ return undef;
}
sub volume_snapshot_delete {
- my ($class, $scfg, $storeid, $volname, $snap) = @_;
+ my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
+
+ die "can't delete snapshot for this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
+
+ return 1 if $running;
+
+ my $cmd = "";
+ my $path = $class->filesystem_path($scfg, $volname);
+
+
+ my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
+ my $snap_path = $snapshots->{$snap}->{file};
+ my $snap_volname = $snapshots->{$snap}->{volname};
+ return if !-e $snap_path; #already deleted ?
+
+ my $parent_snap = $snapshots->{$snap}->{parent};
+ my $child_snap = $snapshots->{$snap}->{child};
+
+ my $parent_path = $parent_snap ? $snapshots->{$parent_snap}->{file} : undef;
+ my $child_path = $child_snap ? $snapshots->{$child_snap}->{file} : undef;
+ my $child_volname = $child_snap ? $snapshots->{$child_snap}->{volname} : undef;
+
+
+ #if first snapshot, we merge child, and rename the snapshot to child
+ if(!$parent_snap) {
+ #we use commit here, as it's faster than rebase
+ #https://lists.gnu.org/archive/html/qemu-discuss/2019-08/msg00041.html
+ print"commit $child_path\n";
+ $cmd = ['/usr/bin/qemu-img', 'commit', $child_path];
+ run_command($cmd);
+ print"delete $child_volname\n";
+ $class->free_image($storeid, $scfg, $child_volname, 0);
+
+ print"rename $snap_volname to $child_volname\n";
+ my $vg = $scfg->{vgname};
+ lvrename($vg, $snap_volname, $child_volname);
+ } else {
+ print"commit $snap_path\n";
+ $cmd = ['/usr/bin/qemu-img', 'commit', $snap_path];
+ #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
+ die "missing parent snapshot to rebase child $child_path\n" if !$parent_path;
+ print "link $child_snap to $parent_snap\n";
+ $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parent_path, '-F', 'qcow2', '-f', 'qcow2', $child_path];
+ run_command($cmd);
+ #delete the snapshot
+ $class->free_image($storeid, $scfg, $snap_volname, 0);
+ }
- die "lvm snapshot delete is not implemented";
}
sub volume_has_feature {
my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
my $features = {
- copy => { base => 1, current => 1},
- rename => {current => 1},
+ copy => {
+ base => { qcow2 => 1, raw => 1},
+ current => { qcow2 => 1, raw => 1},
+ snap => { qcow2 => 1 },
+ },
+ 'rename' => {
+ current => { qcow2 => 1, raw => 1},
+ },
+ snapshot => {
+ current => { qcow2 => 1 },
+ snap => { qcow2 => 1 },
+ },
+ template => {
+ current => { qcow2 => 1, raw => 1},
+ },
+# don't allow cloning, as the base can't be activated on multiple hosts at the same time
+# clone => {
+# base => { qcow2 => 1, raw => 1},
+# },
};
- my ($vtype, $name, $vmid, $basename, $basevmid, $isBase) =
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
$class->parse_volname($volname);
my $key = undef;
@@ -617,7 +804,7 @@ sub volume_has_feature {
}else{
$key = $isBase ? 'base' : 'current';
}
- return 1 if $features->{$feature}->{$key};
+ return 1 if defined($features->{$feature}->{$key}->{$format});
return undef;
}
@@ -738,4 +925,12 @@ sub rename_volume {
return "${storeid}:${target_volname}";
}
+sub get_snap_volname {
+ my ($class, $volname, $snapname) = @_;
+
+ my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
+ $name = !$snapname || $snapname eq 'current' ? $volname : "snap-$snapname-$name";
+ return $name;
+}
+
1;
--
2.39.5
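The qcow2 metadata reserve added by this patch works out to 4096 KiB (4 MiB) per started TiB. The two call sites look different (`/1024/1024/1024` in `alloc_new_image`, `/1024/1024/1024/1024` in `volume_resize`) only because, in the PVE storage API, `alloc_image` sizes are in KiB while `volume_resize` sizes are in bytes. The Perl sketch below, using POSIX `ceil` as the patch does, checks that both agree and reproduces the 4 MiB figure from the source comment's extended-L2 formula with 128 KiB clusters. The function names are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# alloc_image()/alloc_new_image(): $size is in KiB, so /1024^3 gives TiB.
sub overhead_kib_at_alloc {
    my ($size_kib) = @_;
    return ceil($size_kib / 1024 / 1024 / 1024) * 4096;
}

# volume_resize(): $size is in bytes, so /1024^4 gives TiB.
sub overhead_kib_at_resize {
    my ($size_bytes) = @_;
    return ceil($size_bytes / 1024 / 1024 / 1024 / 1024) * 4096;
}

# The source comment's estimate with sub-allocated clusters, in bytes:
# disk_size * 8 / cluster_size / 16
sub comment_l2_estimate_bytes {
    my ($disk_bytes, $cluster_bytes) = @_;
    return $disk_bytes * 8 / $cluster_bytes / 16;
}

my $tib = 1024**4;
printf "1 TiB at alloc:  %d KiB\n", overhead_kib_at_alloc($tib / 1024);
printf "1 TiB at resize: %d KiB\n", overhead_kib_at_resize($tib);
printf "1.5 TiB:         %d KiB\n", overhead_kib_at_resize(1.5 * $tib);
printf "comment, 128k:   %d MiB\n", comment_l2_estimate_bytes($tib, 128 * 1024) / 1024**2;
```

The estimate depends on the cluster size assumed in the comment. As posted, `format_qcow2` passes only `extended_l2=on` and a preallocation option to `qemu-img create`, not a `cluster_size`, so qemu-img would use its default cluster size (64 KiB). Also note that the unchanged free-space check still compares `$free` against `$size`, not `$lvmsize`.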
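`volume_has_feature` in this patch keys its table by feature, then by `base`/`current`/`snap`, then by format, so the answer now depends on the volume's format: a raw LV reports no snapshot support while a qcow2 one does. The Perl sketch below is a standalone version of that lookup with the table copied from the patch (clone left out, as there). The `snap` key for a snapshot name is an assumption, because that branch of the key selection is not visible in the hunk; the `sketch_*` names are hypothetical.

```perl
use strict;
use warnings;

# Per-format feature table, copied from the patch (clone left out, as there).
my $features = {
    copy => {
        base => { qcow2 => 1, raw => 1 },
        current => { qcow2 => 1, raw => 1 },
        snap => { qcow2 => 1 },
    },
    'rename' => {
        current => { qcow2 => 1, raw => 1 },
    },
    snapshot => {
        current => { qcow2 => 1 },
        snap => { qcow2 => 1 },
    },
    template => {
        current => { qcow2 => 1, raw => 1 },
    },
};

# Assumption: 'snap' when a snapshot name is given (branch not shown in the hunk).
sub sketch_feature_key {
    my ($snapname, $is_base) = @_;
    return 'snap' if $snapname;
    return $is_base ? 'base' : 'current';
}

# Same test as the patch: the format must be listed under feature -> key.
sub sketch_has_feature {
    my ($feature, $key, $format) = @_;
    return defined($features->{$feature}->{$key}->{$format}) ? 1 : undef;
}

for my $fmt (qw(raw qcow2)) {
    printf "snapshot of current %s volume: %s\n", $fmt,
        sketch_has_feature('snapshot', sketch_feature_key(undef, 0), $fmt) ? 'yes' : 'no';
}
```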
* [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (4 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 14:26 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
` (8 subsequent siblings)
14 siblings, 1 reply; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 5878 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
Date: Mon, 16 Dec 2024 10:12:20 +0100
Message-ID: <20241216091229.3142660-7-alexandre.derumier@groupe-cyllene.com>
fixme/testme :
PVE/VZDump/QemuServer.pm: eval { PVE::QemuServer::qemu_drivedel($vmid, "tpmstate0-backup"); };
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 64 +++++++++++++++++++++++++++++++++--------------
1 file changed, 45 insertions(+), 19 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 2832ed09..baf78ec0 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -1582,6 +1582,42 @@ sub print_drive_throttle_group {
return $throttle_group;
}
+sub generate_throttle_group {
+ my ($drive) = @_;
+
+ my $drive_id = get_drive_id($drive);
+
+ my $throttle_group = { id => "throttle-drive-$drive_id" };
+ my $limits = {};
+
+ foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
+ my ($dir, $qmpname) = @$type;
+
+ if (my $v = $drive->{"mbps$dir"}) {
+ $limits->{"bps$qmpname"} = int($v*1024*1024);
+ }
+ if (my $v = $drive->{"mbps${dir}_max"}) {
+ $limits->{"bps$qmpname-max"} = int($v*1024*1024);
+ }
+ if (my $v = $drive->{"bps${dir}_max_length"}) {
+ $limits->{"bps$qmpname-max-length"} = int($v)
+ }
+ if (my $v = $drive->{"iops${dir}"}) {
+ $limits->{"iops$qmpname"} = int($v);
+ }
+ if (my $v = $drive->{"iops${dir}_max"}) {
+ $limits->{"iops$qmpname-max"} = int($v);
+ }
+ if (my $v = $drive->{"iops${dir}_max_length"}) {
+ $limits->{"iops$qmpname-max-length"} = int($v);
+ }
+ }
+
+ $throttle_group->{limits} = $limits;
+
+ return $throttle_group;
+}
+
sub generate_file_blockdev {
my ($storecfg, $drive, $nodename) = @_;
@@ -4595,32 +4631,22 @@ sub qemu_iothread_del {
}
sub qemu_driveadd {
- my ($storecfg, $vmid, $device) = @_;
+ my ($storecfg, $vmid, $drive) = @_;
- my $kvmver = get_running_qemu_version($vmid);
- my $io_uring = min_version($kvmver, 6, 0);
- my $drive = print_drive_commandline_full($storecfg, $vmid, $device, undef, $io_uring);
- $drive =~ s/\\/\\\\/g;
- my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto \"$drive\"", 60);
-
- # If the command succeeds qemu prints: "OK"
- return 1 if $ret =~ m/OK/s;
+ my $drive_id = get_drive_id($drive);
+ my $throttle_group = generate_throttle_group($drive);
+ mon_cmd($vmid, 'object-add', "qom-type" => "throttle-group", %$throttle_group);
- die "adding drive failed: $ret\n";
+ my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive);
+ mon_cmd($vmid, 'blockdev-add', %$blockdev, timeout => 10 * 60);
+ return 1;
}
sub qemu_drivedel {
my ($vmid, $deviceid) = @_;
- my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_del drive-$deviceid", 10 * 60);
- $ret =~ s/^\s+//;
-
- return 1 if $ret eq "";
-
- # NB: device not found errors mean the drive was auto-deleted and we ignore the error
- return 1 if $ret =~ m/Device \'.*?\' not found/s;
-
- die "deleting drive $deviceid failed : $ret\n";
+ mon_cmd($vmid, 'blockdev-del', 'node-name' => "drive-$deviceid", timeout => 10 * 60);
+ mon_cmd($vmid, 'object-del', id => "throttle-drive-$deviceid");
}
sub qemu_deviceaddverify {
--
2.39.5
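`generate_throttle_group` maps PVE's MB/s and IOPS drive options onto QMP throttle limit names (`bps-total`, `bps-read-max`, `iops-write-max-length`, and so on), and `qemu_driveadd` then creates the group with `object-add` before issuing `blockdev-add`. The Perl sketch below restates that mapping standalone and prints the resulting command as JSON. It assumes `mon_cmd` wraps its arguments in the usual QMP `{"execute": ..., "arguments": ...}` envelope; `sketch_throttle_limits` and `sketch_object_add` are hypothetical names.

```perl
use strict;
use warnings;
use JSON::PP;

# Standalone restatement of generate_throttle_group()'s option mapping.
# PVE option infix ('' / _rd / _wr) -> QMP limit suffix (-total / -read / -write).
sub sketch_throttle_limits {
    my ($drive) = @_;
    my $limits = {};
    for my $type (['', '-total'], ['_rd', '-read'], ['_wr', '-write']) {
        my ($dir, $qmp) = @$type;
        # MB/s options are converted to bytes/s using 1024 * 1024, as in the patch
        if (my $v = $drive->{"mbps$dir"}) { $limits->{"bps$qmp"} = int($v * 1024 * 1024) }
        if (my $v = $drive->{"mbps${dir}_max"}) { $limits->{"bps$qmp-max"} = int($v * 1024 * 1024) }
        if (my $v = $drive->{"bps${dir}_max_length"}) { $limits->{"bps$qmp-max-length"} = int($v) }
        if (my $v = $drive->{"iops$dir"}) { $limits->{"iops$qmp"} = int($v) }
        if (my $v = $drive->{"iops${dir}_max"}) { $limits->{"iops$qmp-max"} = int($v) }
        if (my $v = $drive->{"iops${dir}_max_length"}) { $limits->{"iops$qmp-max-length"} = int($v) }
    }
    return $limits;
}

# Hypothetical: the QMP command that mon_cmd($vmid, 'object-add', ...) would send.
sub sketch_object_add {
    my ($drive_id, $drive) = @_;
    return {
        execute => 'object-add',
        arguments => {
            'qom-type' => 'throttle-group',
            id => "throttle-drive-$drive_id",
            limits => sketch_throttle_limits($drive),
        },
    };
}

my $json = JSON::PP->new->canonical;
my $out = $json->encode(sketch_object_add('scsi0', { mbps => 10, iops_rd => 500 }));
print "$out\n";
```

Because the group is created even when no limits are set, every drive gets its own (initially unlimited) group, which matches the unconditional `-object throttle-group` lines in the updated cfg2cmd tests.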
* [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (5 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
` (7 subsequent siblings)
14 siblings, 0 replies; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 5021 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots
Date: Mon, 16 Dec 2024 10:12:21 +0100
Message-ID: <20241216091229.3142660-8-alexandre.derumier@groupe-cyllene.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
src/PVE/Storage.pm | 18 +++++++++++++++++-
src/test/run_test_zfspoolplugin.pl | 18 ++++++++++++++++++
2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
index 3b4f041..798544b 100755
--- a/src/PVE/Storage.pm
+++ b/src/PVE/Storage.pm
@@ -1052,7 +1052,23 @@ sub vdisk_free {
my (undef, undef, undef, undef, undef, $isBase, $format) =
$plugin->parse_volname($volname);
- $cleanup_worker = $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);
+
+ $cleanup_worker = sub {
+ #remove external snapshots
+ activate_volumes($cfg, [ $volid ]);
+ my $snapshots = PVE::Storage::volume_snapshot_info($cfg, $volid);
+ for my $snapid (sort { $snapshots->{$b}->{order} <=> $snapshots->{$a}->{order} } keys %$snapshots) {
+ my $snap = $snapshots->{$snapid};
+ next if $snapid eq 'current';
+ next if !$snap->{volid};
+ next if !$snap->{ext};
+ my ($snap_storeid, $snap_volname) = parse_volume_id($snap->{volid});
+ my (undef, undef, undef, undef, undef, $snap_isBase, $snap_format) =
+ $plugin->parse_volname($snap_volname);
+ $plugin->free_image($snap_storeid, $scfg, $snap_volname, $snap_isBase, $snap_format);
+ }
+ $plugin->free_image($storeid, $scfg, $volname, $isBase, $format);
+ };
});
return if !$cleanup_worker;
diff --git a/src/test/run_test_zfspoolplugin.pl b/src/test/run_test_zfspoolplugin.pl
index 095ccb3..4ff9f22 100755
--- a/src/test/run_test_zfspoolplugin.pl
+++ b/src/test/run_test_zfspoolplugin.pl
@@ -6,12 +6,30 @@ use strict;
use warnings;
use Data::Dumper qw(Dumper);
+use Test::MockModule;
+
use PVE::Storage;
use PVE::Cluster;
use PVE::Tools qw(run_command);
+use PVE::RPCEnvironment;
use Cwd;
$Data::Dumper::Sortkeys = 1;
+my $rpcenv_module;
+$rpcenv_module = Test::MockModule->new('PVE::RPCEnvironment');
+$rpcenv_module->mock(
+ get_user => sub {
+ return 'root@pam';
+ },
+ fork_worker => sub {
+ my ($self, $dtype, $id, $user, $function, $background) = @_;
+ $function->(123456);
+ return '123456';
+ }
+);
+
+my $rpcenv = PVE::RPCEnvironment->init('pub');
+
my $verbose = undef;
my $storagename = "zfstank99";
--
2.39.5
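With this patch `vdisk_free` frees external snapshot volumes before the volume itself. It walks the `volume_snapshot_info` result in descending `order` and skips `current` and any entry without `volid` or `ext`. The Perl sketch below isolates that selection step. The snapshot hash is made up: only the field names (`order`, `ext`, `volid`) come from the patch, and the values do not claim to match what the plugins actually return.

```perl
use strict;
use warnings;

# Made-up volume_snapshot_info()-style data; only the field names are from the patch.
my $snapshots = {
    current => { order => 0, ext => 1, volid => 'lvm:vm-100-disk-0.qcow2' },
    snap1 => { order => 2, ext => 1, volid => 'lvm:snap-snap1-vm-100-disk-0.qcow2' },
    snap2 => { order => 1, ext => 1, volid => 'lvm:snap-snap2-vm-100-disk-0.qcow2' },
    internal => { order => 3 },    # no volid/ext, so it is skipped
};

# Returns the volids in the order the patch would free them.
sub sketch_free_order {
    my ($snaps) = @_;
    my @volids;
    for my $id (sort { $snaps->{$b}->{order} <=> $snaps->{$a}->{order} } keys %$snaps) {
        my $snap = $snaps->{$id};
        next if $id eq 'current';    # the volume itself is freed last, separately
        next if !$snap->{volid} || !$snap->{ext};
        push @volids, $snap->{volid};
    }
    return @volids;
}

print "$_\n" for sketch_free_order($snapshots);
```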
* [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (6 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 14:31 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
` (6 subsequent siblings)
14 siblings, 1 reply; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 3511 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
Date: Mon, 16 Dec 2024 10:12:22 +0100
Message-ID: <20241216091229.3142660-9-alexandre.derumier@groupe-cyllene.com>
Look at the qdev value instead, as cdrom drives can be empty,
without any inserted media.
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index baf78ec0..3b33fd7d 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4425,10 +4425,9 @@ sub vm_devices_list {
}
my $resblock = mon_cmd($vmid, 'query-block');
- foreach my $block (@$resblock) {
- if($block->{device} =~ m/^drive-(\S+)/){
- $devices->{$1} = 1;
- }
+ $resblock = { map { $_->{qdev} => $_ } $resblock->@* };
+ foreach my $blockid (keys %$resblock) {
+ $devices->{$blockid} = 1;
}
my $resmice = mon_cmd($vmid, 'query-mice');
--
2.39.5
* [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (7 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 14:34 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
` (5 subsequent siblings)
14 siblings, 1 reply; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 4217 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert
Date: Mon, 16 Dec 2024 10:12:23 +0100
Message-ID: <20241216091229.3142660-10-alexandre.derumier@groupe-cyllene.com>
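This patch replaces the legacy `eject`/`blockdev-change-medium` calls with the finer-grained QMP tray and medium commands. Here is a minimal sketch of the resulting command order, built as plain Perl data and encoded with the core `JSON::PP` module. The helper name `cdrom_change_commands` and the `ide2`/`drive-ide2` names are illustrative. The sketch also leaves out the blockdev-add/blockdev-del steps that `qemu_driveadd`/`qemu_drivedel` perform around it.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: build the QMP commands that swap the medium of the
# cdrom device $id. If $new_node is undef, the drive is only emptied and
# the tray is left open, as in the 'none' branch of the patch.
sub cdrom_change_commands {
    my ($id, $new_node) = @_;
    my @cmds = (
        { execute => 'blockdev-open-tray', arguments => { id => $id, force => JSON::PP::true } },
        { execute => 'blockdev-remove-medium', arguments => { id => $id } },
    );
    if (defined $new_node) {
        # the new node must already exist (added with blockdev-add beforehand)
        push @cmds,
            { execute => 'blockdev-insert-medium', arguments => { id => $id, 'node-name' => $new_node } },
            { execute => 'blockdev-close-tray', arguments => { id => $id } };
    }
    return \@cmds;
}

my $json = JSON::PP->new->canonical;
print $json->encode($_), "\n" for @{ cdrom_change_commands('ide2', 'drive-ide2') };
```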
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 3b33fd7d..758c8240 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5694,7 +5694,10 @@ sub vmconfig_update_disk {
} else { # cdrom
if ($drive->{file} eq 'none') {
- mon_cmd($vmid, "eject", force => JSON::true, id => "$opt");
+ mon_cmd($vmid, "blockdev-open-tray", force => JSON::true, id => $opt);
+ mon_cmd($vmid, "blockdev-remove-medium", id => $opt);
+ qemu_drivedel($vmid, $opt);
+
if (drive_is_cloudinit($old_drive)) {
vmconfig_register_unused_drive($storecfg, $vmid, $conf, $old_drive);
}
@@ -5702,14 +5705,16 @@ sub vmconfig_update_disk {
my $path = get_iso_path($storecfg, $vmid, $drive->{file});
# force eject if locked
- mon_cmd($vmid, "eject", force => JSON::true, id => "$opt");
+ mon_cmd($vmid, "blockdev-open-tray", force => JSON::true, id => $opt);
+ mon_cmd($vmid, "blockdev-remove-medium", id => $opt);
+ eval { qemu_drivedel($vmid, $opt) };
if ($path) {
- mon_cmd($vmid, "blockdev-change-medium",
- id => "$opt", filename => "$path");
+ qemu_driveadd($storecfg, $vmid, $drive);
+ mon_cmd($vmid, "blockdev-insert-medium", id => $opt, 'node-name' => "drive-$opt");
+ mon_cmd($vmid, "blockdev-close-tray", id => $opt);
}
}
-
return 1;
}
}
--
2.39.5
* [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (8 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
` (4 subsequent siblings)
14 siblings, 0 replies; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 3266 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev
Date: Mon, 16 Dec 2024 10:12:24 +0100
Message-ID: <20241216091229.3142660-11-alexandre.derumier@groupe-cyllene.com>
We need to address the top block node (the throttle filter) via 'node-name' instead of 'device'.
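For illustration, here is the QMP message this change produces, built as plain Perl data and encoded with `JSON::PP`. QMP's `block_resize` accepts either `device` or `node-name`. The sketch assumes, as elsewhere in this series, that the top (throttle) node is named `drive-<id>`. The helper name and the 10 GiB size are made-up example values.

```perl
use strict;
use warnings;
use JSON::PP;

# Build a block_resize request addressed by node name rather than by the
# device name. Values used below are illustrative.
sub block_resize_request {
    my ($node_name, $size) = @_;
    return {
        execute   => 'block_resize',
        arguments => { 'node-name' => $node_name, size => int($size) },
    };
}

my $req = block_resize_request('drive-scsi0', 10 * 1024**3);
print JSON::PP->new->canonical->encode($req), "\n";
# prints {"arguments":{"node-name":"drive-scsi0","size":10737418240},"execute":"block_resize"}
```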
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 758c8240..22b011e1 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4918,7 +4918,7 @@ sub qemu_block_resize {
mon_cmd(
$vmid,
"block_resize",
- device => $deviceid,
+ 'node-name' => $deviceid,
size => int($size),
timeout => 60,
);
--
2.39.5
* [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (9 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
` (3 subsequent siblings)
14 siblings, 0 replies; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 3751 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename
Date: Mon, 16 Dec 2024 10:12:25 +0100
Message-ID: <20241216091229.3142660-12-alexandre.derumier@groupe-cyllene.com>
We now have fixed node names, so use "drive-$id" directly instead of looking up the node name via query-block.
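With a fixed node name, the export target can be derived from the drive key alone. A minimal sketch of the resulting `block-export-add` arguments follows. The helper name `nbd_export_request` is hypothetical, and the argument set simply mirrors the hunk below.

```perl
use strict;
use warnings;
use JSON::PP;

# Derive the NBD export request from the drive key alone; no query-block
# round trip is needed to find the node name.
sub nbd_export_request {
    my ($opt) = @_;
    my $name = "drive-$opt";
    return {
        execute   => 'block-export-add',
        arguments => {
            id          => $name,
            type        => 'nbd',
            'node-name' => $name,
            name        => $name,    # NBD export name
            writable    => JSON::PP::true,
        },
    };
}

print JSON::PP->new->canonical->encode(nbd_export_request('scsi0')), "\n";
```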
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 22b011e1..6bebb906 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6235,20 +6235,15 @@ sub vm_start_nolock {
$migrate_storage_uri = "nbd:${localip}:${storage_migrate_port}";
}
- my $block_info = mon_cmd($vmid, "query-block");
- $block_info = { map { $_->{device} => $_ } $block_info->@* };
-
foreach my $opt (sort keys %$nbd) {
my $drivestr = $nbd->{$opt}->{drivestr};
my $volid = $nbd->{$opt}->{volid};
- my $block_node = $block_info->{"drive-$opt"}->{inserted}->{'node-name'};
-
mon_cmd(
$vmid,
"block-export-add",
id => "drive-$opt",
- 'node-name' => $block_node,
+ 'node-name' => "drive-$opt",
writable => JSON::true,
type => "nbd",
name => "drive-$opt", # NBD export name
--
2.39.5
* [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (10 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-08 15:19 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
` (2 subsequent siblings)
14 siblings, 1 reply; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 9269 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
Date: Mon, 16 Dec 2024 10:12:26 +0100
Message-ID: <20241216091229.3142660-13-alexandre.derumier@groupe-cyllene.com>
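This patch names the mirror target nodes with `get_blockdev_nextid`, which scans the current node names for `<prefix>-<N>` and returns the next free suffix. Below is a standalone sketch of that logic, run on a hypothetical set of node names. Unlike the hunk, it quotes the prefix with `\Q...\E` so that regex metacharacters in a prefix cannot change the match. The name `next_node_name` is illustrative.

```perl
use strict;
use warnings;

# Return "<prefix>-<N+1>", where N is the highest numeric suffix already
# used by a node named "<prefix>-<N>" (or 0 if there is none).
sub next_node_name {
    my ($prefix, $nodes) = @_;
    my $max = 0;
    for my $name (keys %$nodes) {
        if ($name =~ m/^\Q$prefix\E-(\d+)$/) {
            $max = $1 if $1 > $max;
        }
    }
    return "$prefix-" . ($max + 1);
}

# Hypothetical node names, as a query-named-block-nodes lookup might return.
my %nodes = map { $_ => {} } qw(fmt-drive-scsi0 fmt-drive-scsi0-1 fmt-drive-scsi0-3 file-drive-scsi0);
print next_node_name('fmt-drive-scsi0', \%nodes), "\n";     # fmt-drive-scsi0-4
print next_node_name('file-drive-scsi0', \%nodes), "\n";    # file-drive-scsi0-1
```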
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuMigrate.pm | 2 +-
PVE/QemuServer.pm | 106 +++++++++++++++++++++++++++++++++++----------
2 files changed, 83 insertions(+), 25 deletions(-)
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index ed5ede30..88627ce4 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1134,7 +1134,7 @@ sub phase2 {
my $bitmap = $target->{bitmap};
$self->log('info', "$drive: start migration to $nbd_uri");
- PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
+ PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $source_drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
}
if (PVE::QemuServer::QMPHelpers::runs_at_least_qemu_version($vmid, 8, 2)) {
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 6bebb906..3d7c41ee 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -8184,59 +8184,85 @@ sub qemu_img_convert {
}
sub qemu_drive_mirror {
- my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $completion, $qga, $bwlimit, $src_bitmap) = @_;
+ my ($vmid, $driveid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $completion, $qga, $bwlimit, $src_bitmap) = @_;
$jobs = {} if !$jobs;
+ my $deviceid = "drive-$driveid";
+ my $dst_format;
+ my $dst_path = $dst_volid;
+ my $jobid = "mirror-$deviceid";
+ $jobs->{$jobid} = {};
- my $qemu_target;
- my $format;
- $jobs->{"drive-$drive"} = {};
+ my $storecfg = PVE::Storage::config();
if ($dst_volid =~ /^nbd:/) {
- $qemu_target = $dst_volid;
- $format = "nbd";
+ $dst_format = "nbd";
} else {
- my $storecfg = PVE::Storage::config();
-
- $format = checked_volume_format($storecfg, $dst_volid);
-
- my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
-
- $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
+ $dst_format = checked_volume_format($storecfg, $dst_volid);
+ $dst_path = PVE::Storage::path($storecfg, $dst_volid);
+ }
+
+ # copy original drive config (aio,cache,discard,...)
+ my $dst_drive = dclone($drive);
+ $dst_drive->{format} = $dst_format;
+ $dst_drive->{file} = $dst_path;
+ $dst_drive->{zeroinit} = 1 if $is_zero_initialized;
+ #improve: if target storage don't support aio uring,change it to default native
+ #and remove clone_disk_check_io_uring()
+
+ #add new block device
+ my $nodes = get_blockdev_nodes($vmid);
+
+ my $target_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
+ my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
+ my $target_file_blockdev = generate_file_blockdev($storecfg, $dst_drive, $target_file_nodename);
+ my $target_nodename = undef;
+
+ if ($dst_format eq 'nbd') {
+ #nbd file don't have fmt
+ $target_nodename = $target_file_nodename;
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_file_blockdev);
+ } else {
+ $target_nodename = $target_fmt_nodename;
+ my $target_fmt_blockdev = generate_format_blockdev($storecfg, $dst_drive, $target_fmt_nodename, $target_file_blockdev);
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
}
+ #we replace the original src_fmt node in the blockdev graph
+ my $src_fmt_nodename = find_fmt_nodename_drive($storecfg, $vmid, $drive, $nodes);
my $opts = {
+ 'job-id' => $jobid,
timeout => 10,
- device => "drive-$drive",
- mode => "existing",
+ device => $deviceid,
+ replaces => $src_fmt_nodename,
sync => "full",
- target => $qemu_target,
+ target => $target_nodename,
'auto-dismiss' => JSON::false,
};
- $opts->{format} = $format if $format;
if (defined($src_bitmap)) {
$opts->{sync} = 'incremental';
- $opts->{bitmap} = $src_bitmap;
+ $opts->{bitmap} = $src_bitmap; ##FIXME: how to handle bitmap ? special proxmox patch ?
print "drive mirror re-using dirty bitmap '$src_bitmap'\n";
}
if (defined($bwlimit)) {
$opts->{speed} = $bwlimit * 1024;
- print "drive mirror is starting for drive-$drive with bandwidth limit: ${bwlimit} KB/s\n";
+ print "drive mirror is starting for $deviceid with bandwidth limit: ${bwlimit} KB/s\n";
} else {
- print "drive mirror is starting for drive-$drive\n";
+ print "drive mirror is starting for $deviceid\n";
}
# if a job already runs for this device we get an error, catch it for cleanup
- eval { mon_cmd($vmid, "drive-mirror", %$opts); };
+ eval { mon_cmd($vmid, "blockdev-mirror", %$opts); };
+
if (my $err = $@) {
eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
+ #FIXME: delete blockdev after job cancel
warn "$@\n" if $@;
die "mirroring error: $err\n";
}
-
- qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga);
+ qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga, 'mirror');
}
# $completion can be either
@@ -8595,7 +8621,7 @@ sub clone_disk {
my $sparseinit = PVE::Storage::volume_has_feature($storecfg, 'sparseinit', $newvolid);
if ($use_drive_mirror) {
- qemu_drive_mirror($vmid, $src_drivename, $newvolid, $newvmid, $sparseinit, $jobs,
+ qemu_drive_mirror($vmid, $src_drivename, $drive, $newvolid, $newvmid, $sparseinit, $jobs,
$completion, $qga, $bwlimit);
} else {
if ($dst_drivename eq 'efidisk0') {
@@ -9130,6 +9156,38 @@ sub delete_ifaces_ipams_ips {
}
}
+sub find_fmt_nodename_drive {
+ my ($storecfg, $vmid, $drive, $nodes) = @_;
+
+ my $volid = $drive->{file};
+ my $format = checked_volume_format($storecfg, $volid);
+ my $path = PVE::Storage::path($storecfg, $volid);
+
+ my $node = find_blockdev_node($nodes, $path, 'fmt');
+ return $node->{'node-name'};
+}
+
+sub get_blockdev_nextid {
+ my ($nodename, $nodes) = @_;
+ my $version = 0;
+ for my $nodeid (keys %$nodes) {
+ if ($nodeid =~ m/^$nodename-(\d+)$/) {
+ my $current_version = $1;
+ $version = $current_version if $current_version >= $version;
+ }
+ }
+ $version++;
+ return "$nodename-$version";
+}
+
+sub get_blockdev_nodes {
+ my ($vmid) = @_;
+
+ my $nodes = PVE::QemuServer::Monitor::mon_cmd($vmid, "query-named-block-nodes");
+ $nodes = { map { $_->{'node-name'} => $_ } $nodes->@* };
+ return $nodes;
+}
+
sub encode_json_ordered {
return JSON->new->canonical->allow_nonref->encode( $_[0] );
}
--
2.39.5
* [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default.
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (11 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-09 9:51 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
14 siblings, 1 reply; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 5748 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default.
Date: Mon, 16 Dec 2024 10:12:27 +0100
Message-ID: <20241216091229.3142660-14-alexandre.derumier@groupe-cyllene.com>
This was a limitation of drive-mirror; blockdev-mirror is able
to open the target image with a different aio setting.
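The decision made in the hunk below can be reduced to a pure function for illustration. `target_aio` is a hypothetical name, and the storage-specific check (`storage_allows_io_uring_default` in the patch) is passed in as a boolean. The `threads` fallback without `cache.direct` is needed because QEMU only accepts `aio=native` together with `cache.direct=on`.

```perl
use strict;
use warnings;

# Pick the aio mode for a mirror target. $allows_io_uring stands in for the
# per-storage check done in the patch.
sub target_aio {
    my ($aio, $cache_direct, $allows_io_uring) = @_;
    return $aio if !defined($aio) || $aio ne 'io_uring';
    return $aio if $allows_io_uring;
    # aio=native requires O_DIRECT (cache.direct=on), otherwise use threads
    return $cache_direct ? 'native' : 'threads';
}

printf "%s\n", target_aio('io_uring', 1, 0);    # native
printf "%s\n", target_aio('io_uring', 0, 0);    # threads
printf "%s\n", target_aio('io_uring', 1, 1);    # io_uring
```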
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 41 ++++++++++-------------------------------
1 file changed, 10 insertions(+), 31 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 3d7c41ee..dc12b38f 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -8207,8 +8207,16 @@ sub qemu_drive_mirror {
$dst_drive->{format} = $dst_format;
$dst_drive->{file} = $dst_path;
$dst_drive->{zeroinit} = 1 if $is_zero_initialized;
- #improve: if target storage don't support aio uring,change it to default native
- #and remove clone_disk_check_io_uring()
+
+ #change aio if io_uring is not supported on target
+ if ($dst_drive->{aio} && $dst_drive->{aio} eq 'io_uring') {
+ my ($dst_storeid) = PVE::Storage::parse_volume_id($dst_volid);
+ my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
+ my $cache_direct = drive_uses_cache_direct($dst_drive, $dst_scfg);
+ if(!storage_allows_io_uring_default($dst_scfg, $cache_direct)) {
+ $dst_drive->{aio} = $cache_direct ? 'native' : 'threads';
+ }
+ }
#add new block device
my $nodes = get_blockdev_nodes($vmid);
@@ -8514,33 +8522,6 @@ sub qemu_drive_mirror_switch_to_active_mode {
}
}
-# Check for bug #4525: drive-mirror will open the target drive with the same aio setting as the
-# source, but some storages have problems with io_uring, sometimes even leading to crashes.
-my sub clone_disk_check_io_uring {
- my ($src_drive, $storecfg, $src_storeid, $dst_storeid, $use_drive_mirror) = @_;
-
- return if !$use_drive_mirror;
-
- # Don't complain when not changing storage.
- # Assume if it works for the source, it'll work for the target too.
- return if $src_storeid eq $dst_storeid;
-
- my $src_scfg = PVE::Storage::storage_config($storecfg, $src_storeid);
- my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
-
- my $cache_direct = drive_uses_cache_direct($src_drive);
-
- my $src_uses_io_uring;
- if ($src_drive->{aio}) {
- $src_uses_io_uring = $src_drive->{aio} eq 'io_uring';
- } else {
- $src_uses_io_uring = storage_allows_io_uring_default($src_scfg, $cache_direct);
- }
-
- die "target storage is known to cause issues with aio=io_uring (used by current drive)\n"
- if $src_uses_io_uring && !storage_allows_io_uring_default($dst_scfg, $cache_direct);
-}
-
sub clone_disk {
my ($storecfg, $source, $dest, $full, $newvollist, $jobs, $completion, $qga, $bwlimit) = @_;
@@ -8598,8 +8579,6 @@ sub clone_disk {
$dst_format = 'raw';
$size = PVE::QemuServer::Drive::TPMSTATE_DISK_SIZE;
} else {
- clone_disk_check_io_uring($drive, $storecfg, $src_storeid, $storeid, $use_drive_mirror);
-
$size = PVE::Storage::volume_size_info($storecfg, $drive->{file}, 10);
}
$newvolid = PVE::Storage::vdisk_alloc(
--
2.39.5
* [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (12 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-09 11:57 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
14 siblings, 1 reply; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 5424 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support
Date: Mon, 16 Dec 2024 10:12:28 +0100
Message-ID: <20241216091229.3142660-15-alexandre.derumier@groupe-cyllene.com>
We need to define node names for all images in the backing chain,
to be able to live-rename them with blockdev-reopen.
For linked clones, we don't need to define the base image(s) chain:
those nodes are added automatically with auto-generated "#block" node names.
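Here is a standalone sketch of the recursion that `generate_backing_blockdev` performs. The per-storage helpers are replaced by fixed `file`/`qcow2` drivers, and the snapshot data and paths are made up; both are assumptions. Each snapshot contributes a format node and a file node, and the `backing` key nests the parent's definition until a snapshot without a parent is reached.

```perl
use strict;
use warnings;
use JSON::PP;

# Recursively build a nested blockdev definition for a backing chain.
# $snapshots maps a snapshot id to { order, parent, file }; the 'qcow2' and
# 'file' drivers are assumptions (the real code derives them per storage).
sub backing_blockdev {
    my ($snapshots, $deviceid, $id) = @_;
    my $snap  = $snapshots->{$id};
    my $order = $snap->{order};
    my $fmt = {
        'node-name' => "fmt-$deviceid-$order",
        driver      => 'qcow2',
        'read-only' => JSON::PP::true,
        file        => {
            'node-name' => "file-$deviceid-$order",
            driver      => 'file',
            filename    => $snap->{file},
        },
    };
    my $parent = $snap->{parent};
    $fmt->{backing} = backing_blockdev($snapshots, $deviceid, $parent) if $parent;
    return $fmt;
}

# Hypothetical snapshot info: current -> snap2 -> snap1
my $snapshots = {
    current => { order => 0, parent => 'snap2', file => '/images/vm-100-disk-0.qcow2' },
    snap2   => { order => 1, parent => 'snap1', file => '/images/snap2-vm-100-disk-0.qcow2' },
    snap1   => { order => 2, parent => undef,   file => '/images/snap1-vm-100-disk-0.qcow2' },
};

# As in the patch, the chain starts at the parent of the current image.
my $chain = backing_blockdev($snapshots, 'drive-scsi0', $snapshots->{current}{parent});
for (my $n = $chain; $n; $n = $n->{backing}) {
    print "$n->{'node-name'} -> $n->{file}{filename}\n";
}
```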
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuServer.pm | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index dc12b38f..3a3feadf 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -1618,6 +1618,38 @@ sub generate_throttle_group {
return $throttle_group;
}
+sub generate_backing_blockdev {
+ my ($storecfg, $snapshots, $deviceid, $drive, $id) = @_;
+
+ my $snapshot = $snapshots->{$id};
+ my $order = $snapshot->{order};
+ my $parentid = $snapshot->{parent};
+ my $snap_fmt_nodename = "fmt-$deviceid-$order";
+ my $snap_file_nodename = "file-$deviceid-$order";
+
+ my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap_file_nodename);
+ $snap_file_blockdev->{filename} = $snapshot->{file};
+ my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_fmt_nodename, $snap_file_blockdev, 1);
+ $snap_fmt_blockdev->{backing} = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
+ return $snap_fmt_blockdev;
+}
+
+sub generate_backing_chain_blockdev {
+ my ($storecfg, $deviceid, $drive) = @_;
+
+ my $volid = $drive->{file};
+ my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid);
+ return if !$do_snapshots_with_qemu || $do_snapshots_with_qemu != 2;
+
+ my $chain_blockdev = undef;
+ PVE::Storage::activate_volumes($storecfg, [$volid]);
+ #should we use qemu config to list snapshots ?
+ my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
+ my $parentid = $snapshots->{'current'}->{parent};
+ $chain_blockdev = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
+ return $chain_blockdev;
+}
+
sub generate_file_blockdev {
my ($storecfg, $drive, $nodename) = @_;
@@ -1816,6 +1848,8 @@ sub generate_drive_blockdev {
my $blockdev_file = generate_file_blockdev($storecfg, $drive, $file_nodename);
my $fmt_nodename = "fmt-drive-$drive_id";
my $blockdev_format = generate_format_blockdev($storecfg, $drive, $fmt_nodename, $blockdev_file, $force_readonly);
+ my $backing_chain = generate_backing_chain_blockdev($storecfg, "drive-$drive_id", $drive);
+ $blockdev_format->{backing} = $backing_chain if $backing_chain;
my $blockdev_live_restore = undef;
if ($live_restore_name) {
--
2.39.5
* [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
` (13 preceding siblings ...)
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
@ 2024-12-16 9:12 ` Alexandre Derumier via pve-devel
2025-01-09 11:57 ` Fabian Grünbichler
14 siblings, 1 reply; 68+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
[-- Attachment #1: Type: message/rfc822, Size: 18136 bytes --]
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 16 Dec 2024 10:12:29 +0100
Message-ID: <20241216091229.3142660-16-alexandre.derumier@groupe-cyllene.com>
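`blockdev_external_snapshot` below takes the snapshot in two QMP steps. First, it adds the new, already allocated overlay image with `backing => undef` (JSON null), so that no backing file is opened. Then it calls `blockdev-snapshot` to put that overlay on top of the current format node. QMP requires the overlay to have no backing file attached at that point. A minimal sketch of that command pair follows. The helper name, node names, path and the `qcow2`/`file` drivers are illustrative assumptions, and the rename of the current image that precedes this step is not shown.

```perl
use strict;
use warnings;
use JSON::PP;

# Build the QMP commands that make a new, empty overlay the active layer on
# top of the current format node. All names here are illustrative.
sub external_snapshot_commands {
    my ($current_fmt_node, $overlay_fmt_node, $overlay_file_node, $overlay_path) = @_;
    return [
        {
            execute   => 'blockdev-add',
            arguments => {
                driver      => 'qcow2',
                'node-name' => $overlay_fmt_node,
                backing     => undef,    # encoded as JSON null: do not open a backing file
                file        => {
                    driver      => 'file',
                    'node-name' => $overlay_file_node,
                    filename    => $overlay_path,
                },
            },
        },
        {
            execute   => 'blockdev-snapshot',
            arguments => { node => $current_fmt_node, overlay => $overlay_fmt_node },
        },
    ];
}

my $json = JSON::PP->new->canonical;
my $cmds = external_snapshot_commands('fmt-drive-scsi0', 'fmt-drive-scsi0-1',
    'file-drive-scsi0-1', '/images/vm-100-disk-0.qcow2');
print $json->encode($_), "\n" for @$cmds;
```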
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
PVE/QemuConfig.pm | 4 +-
PVE/QemuServer.pm | 345 ++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 335 insertions(+), 14 deletions(-)
diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
index ffdf9f03..c17edb46 100644
--- a/PVE/QemuConfig.pm
+++ b/PVE/QemuConfig.pm
@@ -375,7 +375,7 @@ sub __snapshot_create_vol_snapshot {
print "snapshotting '$device' ($drive->{file})\n";
- PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $volid, $snapname);
+ PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $drive, $snapname);
}
sub __snapshot_delete_remove_drive {
@@ -412,7 +412,7 @@ sub __snapshot_delete_vol_snapshot {
my $storecfg = PVE::Storage::config();
my $volid = $drive->{file};
- PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $volid, $snapname);
+ PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $drive, $snapname);
push @$unused, $volid;
}
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 3a3feadf..f29a8449 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4959,20 +4959,269 @@ sub qemu_block_resize {
}
sub qemu_volume_snapshot {
- my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
+ my ($vmid, $deviceid, $storecfg, $drive, $snap) = @_;
+ my $volid = $drive->{file};
my $running = check_running($vmid);
-
- if ($running && do_snapshots_with_qemu($storecfg, $volid, $deviceid)) {
- mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
+ my $do_snapshots_with_qemu = $running ? do_snapshots_with_qemu($storecfg, $volid, $deviceid) : undef;
+ if ($do_snapshots_with_qemu) {
+ if($do_snapshots_with_qemu == 2) {
+ my $snap_path = PVE::Storage::path($storecfg, $volid, $snap);
+ my $path = PVE::Storage::path($storecfg, $volid);
+ blockdev_current_rename($storecfg, $vmid, $deviceid, $drive, $path, $snap_path, 1);
+ blockdev_external_snapshot($storecfg, $vmid, $deviceid, $drive, $snap);
+ } else {
+ mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
+ }
} else {
PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
}
}
+sub blockdev_external_snapshot {
+ my ($storecfg, $vmid, $deviceid, $drive, $snap) = @_;
+
+ my $nodes = get_blockdev_nodes($vmid);
+ my $volid = $drive->{file};
+ my $path = PVE::Storage::path($storecfg, $volid, $snap);
+ my $format_node = find_blockdev_node($nodes, $path, 'fmt');
+ my $format_nodename = $format_node->{'node-name'};
+
+ #preallocate add a new current file
+ my $new_current_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
+ my $new_current_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
+ PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
+ my $new_file_blockdev = generate_file_blockdev($storecfg, $drive, $new_current_file_nodename);
+ my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_current_fmt_nodename, $new_file_blockdev);
+
+ $new_fmt_blockdev->{backing} = undef;
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$new_fmt_blockdev);
+ mon_cmd($vmid, 'blockdev-snapshot', node => $format_nodename, overlay => $new_current_fmt_nodename);
+}
+
+sub blockdev_snap_rename {
+ my ($storecfg, $vmid, $deviceid, $drive, $src_path, $target_path) = @_;
+
+ my $nodes = get_blockdev_nodes($vmid);
+ my $volid = $drive->{file};
+
+ #copy the original drive param and change target file
+ my $target_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
+ my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
+
+ my $src_fmt_node = find_blockdev_node($nodes, $src_path, 'fmt');
+ my $src_fmt_nodename = $src_fmt_node->{'node-name'};
+ my $src_file_node = find_blockdev_node($nodes, $src_path, 'file');
+ my $src_file_nodename = $src_file_node->{'node-name'};
+
+ #untaint
+ if ($src_path =~ m/^(\S+)$/) {
+ $src_path = $1;
+ }
+ if ($target_path =~ m/^(\S+)$/) {
+ $target_path = $1;
+ }
+
+ #create a hardlink
+ link($src_path, $target_path);
+
+ #add new format blockdev
+ my $read_only = 1;
+ my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
+ $target_file_blockdev->{filename} = $target_path;
+ my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_fmt_nodename, $target_file_blockdev, $read_only);
+
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
+
+ #reopen the parent node with different backing file
+ my $parent_fmt_node = find_parent_node($nodes, $src_path);
+ my $parent_fmt_nodename = $parent_fmt_node->{'node-name'};
+ my $parent_path = $parent_fmt_node->{file};
+ my $parent_file_node = find_blockdev_node($nodes, $parent_path, 'file');
+ my $parent_file_nodename = $parent_file_node->{'node-name'};
+ my $filenode_exist = 1;
+ $read_only = $parent_fmt_node->{ro};
+ my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_fmt_nodename, $parent_file_nodename, $read_only);
+ $parent_fmt_blockdev->{backing} = $target_fmt_nodename;
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$parent_fmt_blockdev]);
+
+ #change backing-file in qcow2 metadatas
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'change-backing-file', device => $deviceid, 'image-node-name' => $parent_fmt_nodename, 'backing-file' => $target_path);
+
+ # file blockdevs seem to be auto-removed if they were created online, but not if they were created at start via the command line
+ eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_file_nodename) };
+ eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_fmt_nodename) };
+
+ #delete old $path link
+ unlink($src_path);
+
+ #rename underlay
+ my $storage_name = PVE::Storage::parse_volume_id($volid);
+ my $scfg = $storecfg->{ids}->{$storage_name};
+ if ($scfg->{type} eq 'lvm') {
+ print"lvrename $src_path to $target_path\n";
+ run_command(
+ ['/sbin/lvrename', $src_path, $target_path],
+ errmsg => "lvrename $src_path to $target_path error",
+ );
+ }
+}
+
+sub blockdev_current_rename {
+ my ($storecfg, $vmid, $deviceid, $drive, $path, $target_path, $skip_underlay) = @_;
+ ## rename current running image
+
+ my $nodes = get_blockdev_nodes($vmid);
+ my $volid = $drive->{file};
+ my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
+
+ my $file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
+ $file_blockdev->{filename} = $target_path;
+
+ my $format_node = find_blockdev_node($nodes, $path, 'fmt');
+ my $format_nodename = $format_node->{'node-name'};
+
+ my $file_node = find_blockdev_node($nodes, $path, 'file');
+ my $file_nodename = $file_node->{'node-name'};
+
+ my $backingfile = $format_node->{image}->{'backing-filename'};
+ my $backing_node = $backingfile ? find_blockdev_node($nodes, $backingfile, 'fmt') : undef;
+
+ #create a hardlink
+ link($path, $target_path);
+ #add new file blockdev
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$file_blockdev);
+
+ #reopen the current fmt nodename with a new file nodename
+ my $reopen_blockdev = generate_format_blockdev($storecfg, $drive, $format_nodename, $target_file_nodename);
+ $reopen_blockdev->{backing} = $backing_node->{'node-name'};
+ PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$reopen_blockdev]);
+
+ # delete old file blockdev
+ # it seems the old file blockdev is auto-removed after reopen if the file nodename was autogenerated (#block...)?
+ eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_nodename) };
+
+ unlink($path);
+
+ #skip_underlay: lvm will be renamed later in Storage::volume_snapshot
+ return if $skip_underlay;
+
+ #rename underlay
+ my $storage_name = PVE::Storage::parse_volume_id($volid);
+ my $scfg = $storecfg->{ids}->{$storage_name};
+ if ($scfg->{type} eq 'lvm') {
+ print"lvrename $path to $target_path\n";
+ run_command(
+ ['/sbin/lvrename', $path, $target_path],
+ errmsg => "lvrename $path to $target_path error",
+ );
+ }
+}
+
+sub blockdev_commit {
+ my ($storecfg, $vmid, $deviceid, $drive, $top_path, $base_path) = @_;
+
+ my $nodes = get_blockdev_nodes($vmid);
+ my $volid = $drive->{file};
+
+ #untaint
+ if ($top_path =~ m/^(\S+)$/) {
+ $top_path = $1;
+ }
+
+ print "block-commit top:$top_path to base:$base_path\n";
+ my $job_id = "commit-$deviceid";
+ my $jobs = {};
+
+ my $base_node = find_blockdev_node($nodes, $base_path, 'fmt');
+ my $top_node = find_blockdev_node($nodes, $top_path, 'fmt');
+
+ my $options = { 'job-id' => $job_id, device => $deviceid };
+ $options->{'top-node'} = $top_node->{'node-name'};
+ $options->{'base-node'} = $base_node->{'node-name'};
+
+
+ mon_cmd($vmid, 'block-commit', %$options);
+ $jobs->{$job_id} = {};
+
+ qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'commit');
+
+ #remove fmt-blockdev, file-blockdev && file
+ my $fmt_node = find_blockdev_node($nodes, $top_path, 'fmt');
+ my $fmt_nodename = $fmt_node->{'node-name'};
+ eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $fmt_nodename) };
+
+ my $file_node = find_blockdev_node($nodes, $top_path, 'file');
+ my $file_nodename = $file_node->{'node-name'};
+ eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_nodename) };
+
+
+
+ my $storage_name = PVE::Storage::parse_volume_id($volid);
+ my $scfg = $storecfg->{ids}->{$storage_name};
+ if ($scfg->{type} eq 'lvm') {
+ print"lvremove $top_path\n";
+ run_command(
+ ['/sbin/lvremove', '-f', $top_path],
+ errmsg => "lvremove $top_path",
+ );
+ } else {
+ unlink($top_path);
+ }
+
+}
+
+sub blockdev_live_commit {
+ my ($storecfg, $vmid, $deviceid, $drive, $current_path, $snapshot_path) = @_;
+
+ my $nodes = get_blockdev_nodes($vmid);
+ my $volid = $drive->{file};
+
+ #untaint
+ if ($current_path =~ m/^(\S+)$/) {
+ $current_path = $1;
+ }
+
+ print "live block-commit top:$current_path to base:$snapshot_path\n";
+ my $job_id = "commit-$deviceid";
+ my $jobs = {};
+
+ my $snapshot_node = find_blockdev_node($nodes, $snapshot_path, 'fmt');
+ my $snapshot_file_node = find_blockdev_node($nodes, $current_path, 'file');
+ my $current_node = find_blockdev_node($nodes, $current_path, 'fmt');
+
+ my $opts = { 'job-id' => $job_id,
+ device => $deviceid,
+ 'base-node' => $snapshot_node->{'node-name'},
+ replaces => $current_node->{'node-name'}
+ };
+ mon_cmd($vmid, "block-commit", %$opts);
+ $jobs->{$job_id} = {};
+
+ qemu_drive_mirror_monitor ($vmid, undef, $jobs, 'complete', 0, 'commit');
+
+ eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $current_node->{'node-name'}) };
+
+ my $storage_name = PVE::Storage::parse_volume_id($volid);
+ my $scfg = $storecfg->{ids}->{$storage_name};
+ if ($scfg->{type} eq 'lvm') {
+ print"lvremove $current_path\n";
+ run_command(
+ ['/sbin/lvremove', '-f', $current_path],
+ errmsg => "lvremove $current_path",
+ );
+ } else {
+ unlink($current_path);
+ }
+
+ return;
+
+}
+
sub qemu_volume_snapshot_delete {
- my ($vmid, $storecfg, $volid, $snap) = @_;
+ my ($vmid, $storecfg, $drive, $snap) = @_;
+ my $volid = $drive->{file};
my $running = check_running($vmid);
my $attached_deviceid;
@@ -4984,13 +5233,51 @@ sub qemu_volume_snapshot_delete {
});
}
- if ($attached_deviceid && do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid)) {
- mon_cmd(
- $vmid,
- 'blockdev-snapshot-delete-internal-sync',
- device => $attached_deviceid,
- name => $snap,
- );
+ my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid) if $running;
+ if ($attached_deviceid && $do_snapshots_with_qemu) {
+
+ if ($do_snapshots_with_qemu == 2) {
+
+ my $path = PVE::Storage::path($storecfg, $volid);
+ my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
+
+ my $snappath = $snapshots->{$snap}->{file};
+ return if !-e $snappath; #already deleted ?
+
+ my $parentsnap = $snapshots->{$snap}->{parent};
+ my $childsnap = $snapshots->{$snap}->{child};
+
+ my $parentpath = $snapshots->{$parentsnap}->{file} if $parentsnap;
+ my $childpath = $snapshots->{$childsnap}->{file} if $childsnap;
+
+ #if first snapshot
+ if(!$parentsnap) {
+ print"delete first snapshot $childpath\n";
+ if($childpath eq $path) {
+ #if child is the current image (last snapshot), we need to do a live active commit
+ print"commit first snapshot $snappath to current $path\n";
+ blockdev_live_commit($storecfg, $vmid, $attached_deviceid, $drive, $childpath, $snappath);
+ print" rename $snappath to $path\n";
+ blockdev_current_rename($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $path);
+ } else {
+ print"commit first snapshot $snappath to $childpath path\n";
+ blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $childpath, $snappath);
+ print" rename $snappath to $childpath\n";
+ blockdev_snap_rename($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $childpath);
+ }
+ } else {
+ #intermediate snapshot, we just need to commit the snapshot
+ print"commit intermediate snapshot $snappath to $parentpath\n";
+ blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $parentpath, 'auto');
+ }
+ } else {
+ mon_cmd(
+ $vmid,
+ 'blockdev-snapshot-delete-internal-sync',
+ device => $attached_deviceid,
+ name => $snap,
+ );
+ }
} else {
PVE::Storage::volume_snapshot_delete(
$storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
@@ -8066,6 +8353,8 @@ sub do_snapshots_with_qemu {
return 1;
}
+ return 2 if $scfg->{snapext} || $scfg->{type} eq 'lvm' && $volid =~ m/\.(qcow2)/;
+
if ($volid =~ m/\.(qcow2|qed)$/){
return 1;
}
@@ -9169,6 +9458,38 @@ sub delete_ifaces_ipams_ips {
}
}
+sub find_blockdev_node {
+ my ($nodes, $path, $type) = @_;
+
+ my $found_nodeid = undef;
+ my $found_node = undef;
+ for my $nodeid (keys %$nodes) {
+ my $node = $nodes->{$nodeid};
+ if ($nodeid =~ m/^$type-(\S+)$/ && $node->{file} eq $path ) {
+ $found_node = $node;
+ last;
+ }
+ }
+ die "can't find nodeid for file $path\n" if !$found_node;
+ return $found_node;
+}
+
+sub find_parent_node {
+ my ($nodes, $backing_path) = @_;
+
+ my $found_nodeid = undef;
+ my $found_node = undef;
+ for my $nodeid (keys %$nodes) {
+ my $node = $nodes->{$nodeid};
+ if ($nodeid =~ m/^fmt-(\S+)$/ && $node->{backing_file} && $node->{backing_file} eq $backing_path) {
+ $found_node = $node;
+ last;
+ }
+ }
+ die "can't find nodeid for file $backing_path\n" if !$found_node;
+ return $found_node;
+}
+
sub find_fmt_nodename_drive {
my ($storecfg, $vmid, $drive, $nodes) = @_;
--
2.39.5
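The external-snapshot branches of qemu_volume_snapshot_delete above boil down to three cases. A minimal sketch of that decision logic, kept free of QEMU and storage calls: `plan_snapshot_delete` is a hypothetical helper, and the `$snapshots` layout (`file`, `parent`, `child` keys) is assumed to match what `PVE::Storage::volume_snapshot_info` returns in this series. The die on a missing child is an extra guard, not something the patch does.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the steps qemu_volume_snapshot_delete would
# take for an external snapshot, as strings, without touching QEMU or storage.
# $snapshots: { name => { file => path, parent => name|undef, child => name|undef } }
sub plan_snapshot_delete {
    my ($snapshots, $snap, $current_path) = @_;

    my $info = $snapshots->{$snap} or die "unknown snapshot '$snap'\n";
    my $snappath = $info->{file};
    my $parent = $info->{parent};
    my $child = $info->{child};
    my $parentpath = $parent ? $snapshots->{$parent}->{file} : undef;
    my $childpath = $child ? $snapshots->{$child}->{file} : undef;

    if (!$parent) {
        # first snapshot: merge the child into it, then give it the child's name
        die "first snapshot '$snap' has no child\n" if !defined($childpath);
        if ($childpath eq $current_path) {
            # the child is the active image, so an active (live) commit is needed
            return (
                "live-commit $childpath -> $snappath",
                "rename-current $snappath -> $current_path",
            );
        }
        return (
            "commit $childpath -> $snappath",
            "rename-snapshot $snappath -> $childpath",
        );
    }

    # intermediate snapshot: commit it down into its parent
    return ("commit $snappath -> $parentpath");
}

my $snaps = {
    snap1   => { file => '/img/snap1.qcow2', parent => undef, child => 'current' },
    current => { file => '/img/vm-disk.qcow2', parent => 'snap1', child => undef },
};
print "$_\n" for plan_snapshot_delete($snaps, 'snap1', '/img/vm-disk.qcow2');
```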
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
2024-12-16 9:12 ` [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
@ 2025-01-08 13:27 ` Fabian Grünbichler
2025-01-10 7:55 ` DERUMIER, Alexandre via pve-devel
[not found] ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
0 siblings, 2 replies; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 13:27 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
> This is needed for external snapshot live commit,
> when the top blocknode is not the fmt-node.
> (in our case, the throttle-group node is the topnode)
so this is needed to work around a limitation in block-commit? I think if we need this it should probably be submitted upstream for inclusion, or we should provide our own copy of block-commit with it in the meantime?
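For reference, with the patched QEMU the live commit in blockdev_live_commit would send a QMP request shaped like the one built below. This sketch only builds and encodes the request; the node names are illustrative (`fmt-snap1` is hypothetical), and the `replaces` argument exists only with the 0052 patch quoted below.

```perl
use strict;
use warnings;
use JSON::PP;

# Build a QMP 'block-commit' request for an active commit of the current
# overlay into its backing snapshot. 'replaces' is the argument added by the
# proposed patch: on completion, the named node (the format node below the
# throttle filter) is replaced instead of the device's root node.
sub build_live_commit_cmd {
    my ($deviceid, $base_node, $replaces_node) = @_;
    return {
        execute => 'block-commit',
        arguments => {
            'job-id' => "commit-$deviceid",
            device => $deviceid,
            'base-node' => $base_node,
            replaces => $replaces_node,
        },
    };
}

my $cmd = build_live_commit_cmd('drive-scsi0', 'fmt-snap1', 'fmt-drive-scsi0');
print JSON::PP->new->canonical->encode($cmd), "\n";
```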
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> ...052-block-commit-add-replaces-option.patch | 137 ++++++++++++++++++
> debian/patches/series | 1 +
> 2 files changed, 138 insertions(+)
> create mode 100644 debian/patches/pve/0052-block-commit-add-replaces-option.patch
>
> diff --git a/debian/patches/pve/0052-block-commit-add-replaces-option.patch b/debian/patches/pve/0052-block-commit-add-replaces-option.patch
> new file mode 100644
> index 0000000..2488b5b
> --- /dev/null
> +++ b/debian/patches/pve/0052-block-commit-add-replaces-option.patch
> @@ -0,0 +1,137 @@
> +From ae39fd3bb72db440cf380978af9bf5693c12ac6c Mon Sep 17 00:00:00 2001
> +From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> +Date: Wed, 11 Dec 2024 16:20:25 +0100
> +Subject: [PATCH] block-commit: add replaces option
> +
> +This uses the same code as drive-mirror for live commit, but the option
> +is currently not sent.
> +
> +Allow replacing a node other than the root node after the block-commit
> +(as we use throttle-group as root, and not the drive)
> +---
> + block/mirror.c | 4 ++--
> + block/replication.c | 2 +-
> + blockdev.c | 4 ++--
> + include/block/block_int-global-state.h | 4 +++-
> + qapi/block-core.json | 5 ++++-
> + qemu-img.c | 2 +-
> + 6 files changed, 13 insertions(+), 8 deletions(-)
> +
> +diff --git a/block/mirror.c b/block/mirror.c
> +index 2f12238..1a5e528 100644
> +--- a/block/mirror.c
> ++++ b/block/mirror.c
> +@@ -2086,7 +2086,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
> + int64_t speed, BlockdevOnError on_error,
> + const char *filter_node_name,
> + BlockCompletionFunc *cb, void *opaque,
> +- bool auto_complete, Error **errp)
> ++ bool auto_complete, const char *replaces, Error **errp)
> + {
> + bool base_read_only;
> + BlockJob *job;
> +@@ -2102,7 +2102,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
> + }
> +
> + job = mirror_start_job(
> +- job_id, bs, creation_flags, base, NULL, speed, 0, 0,
> ++ job_id, bs, creation_flags, base, replaces, speed, 0, 0,
> + MIRROR_LEAVE_BACKING_CHAIN, false,
> + on_error, on_error, true, cb, opaque,
> + &commit_active_job_driver, MIRROR_SYNC_MODE_FULL,
> +diff --git a/block/replication.c b/block/replication.c
> +index 0415a5e..debbe25 100644
> +--- a/block/replication.c
> ++++ b/block/replication.c
> +@@ -711,7 +711,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
> + s->commit_job = commit_active_start(
> + NULL, bs->file->bs, s->secondary_disk->bs,
> + JOB_INTERNAL, 0, BLOCKDEV_ON_ERROR_REPORT,
> +- NULL, replication_done, bs, true, errp);
> ++ NULL, replication_done, bs, true, NULL, errp);
> + bdrv_graph_rdunlock_main_loop();
> + break;
> + default:
> +diff --git a/blockdev.c b/blockdev.c
> +index cbe2243..349fb71 100644
> +--- a/blockdev.c
> ++++ b/blockdev.c
> +@@ -2435,7 +2435,7 @@ void qmp_block_commit(const char *job_id, const char *device,
> + const char *filter_node_name,
> + bool has_auto_finalize, bool auto_finalize,
> + bool has_auto_dismiss, bool auto_dismiss,
> +- Error **errp)
> ++ const char *replaces, Error **errp)
> + {
> + BlockDriverState *bs;
> + BlockDriverState *iter;
> +@@ -2596,7 +2596,7 @@ void qmp_block_commit(const char *job_id, const char *device,
> + job_id = bdrv_get_device_name(bs);
> + }
> + commit_active_start(job_id, top_bs, base_bs, job_flags, speed, on_error,
> +- filter_node_name, NULL, NULL, false, &local_err);
> ++ filter_node_name, NULL, NULL, false, replaces, &local_err);
> + } else {
> + BlockDriverState *overlay_bs = bdrv_find_overlay(bs, top_bs);
> + if (bdrv_op_is_blocked(overlay_bs, BLOCK_OP_TYPE_COMMIT_TARGET, errp)) {
> +diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
> +index f0c642b..194b580 100644
> +--- a/include/block/block_int-global-state.h
> ++++ b/include/block/block_int-global-state.h
> +@@ -115,6 +115,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
> + * @cb: Completion function for the job.
> + * @opaque: Opaque pointer value passed to @cb.
> + * @auto_complete: Auto complete the job.
> ++ * @replaces: Block graph node name to replace once the commit is done.
> + * @errp: Error object.
> + *
> + */
> +@@ -123,7 +124,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
> + int64_t speed, BlockdevOnError on_error,
> + const char *filter_node_name,
> + BlockCompletionFunc *cb, void *opaque,
> +- bool auto_complete, Error **errp);
> ++ bool auto_complete, const char *replaces,
> ++ Error **errp);
> + /*
> + * mirror_start:
> + * @job_id: The id of the newly-created job, or %NULL to use the
> +diff --git a/qapi/block-core.json b/qapi/block-core.json
> +index ff441d4..50564c7 100644
> +--- a/qapi/block-core.json
> ++++ b/qapi/block-core.json
> +@@ -2098,6 +2098,8 @@
> + # disappear from the query list without user intervention.
> + # Defaults to true. (Since 3.1)
> + #
> ++# @replaces: graph node name to be replaced by the base image node.
> ++#
> + # Features:
> + #
> + # @deprecated: Members @base and @top are deprecated. Use @base-node
> +@@ -2125,7 +2127,8 @@
> + '*speed': 'int',
> + '*on-error': 'BlockdevOnError',
> + '*filter-node-name': 'str',
> +- '*auto-finalize': 'bool', '*auto-dismiss': 'bool' },
> ++ '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
> ++ '*replaces': 'str' },
> + 'allow-preconfig': true }
> +
> + ##
> +diff --git a/qemu-img.c b/qemu-img.c
> +index a6c88e0..f6c59bc 100644
> +--- a/qemu-img.c
> ++++ b/qemu-img.c
> +@@ -1079,7 +1079,7 @@ static int img_commit(int argc, char **argv)
> +
> + commit_active_start("commit", bs, base_bs, JOB_DEFAULT, rate_limit,
> + BLOCKDEV_ON_ERROR_REPORT, NULL, common_block_job_cb,
> +- &cbi, false, &local_err);
> ++ &cbi, false, NULL, &local_err);
> + if (local_err) {
> + goto done;
> + }
> +--
> +2.39.5
> +
> diff --git a/debian/patches/series b/debian/patches/series
> index 93c97bf..e604a23 100644
> --- a/debian/patches/series
> +++ b/debian/patches/series
> @@ -92,3 +92,4 @@ pve/0048-PVE-backup-fixup-error-handling-for-fleecing.patch
> pve/0049-PVE-backup-factor-out-setting-up-snapshot-access-for.patch
> pve/0050-PVE-backup-save-device-name-in-device-info-structure.patch
> pve/0051-PVE-backup-include-device-name-in-error-when-setting.patch
> +pve/0052-block-commit-add-replaces-option.patch
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
@ 2025-01-08 14:17 ` Fabian Grünbichler
2025-01-10 13:50 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 14:17 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
> The blockdev chain is:
> -throttle-group-node (drive-(ide|scsi|virtio)x)
> - format-node (fmt-drive-x)
> - file-node (file-drive-x)
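To make the chain concrete: for a file-backed qcow2 scsi0 drive, generate_drive_blockdev nests the three node definitions, outermost first. The sketch below hand-builds such a structure with illustrative values (the path is made up, and cache/aio/read-only, which the real code derives from the drive and storage config, are omitted) and walks it from the throttle node down.

```perl
use strict;
use warnings;

# Illustrative -blockdev structure for drive scsi0 (example values only):
# throttle filter -> format node -> file node
my $blockdev = {
    driver => 'throttle',
    'node-name' => 'drive-scsi0',
    'throttle-group' => 'throttle-drive-scsi0',
    file => {
        driver => 'qcow2',
        'node-name' => 'fmt-drive-scsi0',
        file => {
            driver => 'file',
            'node-name' => 'file-drive-scsi0',
            filename => '/var/lib/vz/images/100/vm-100-disk-0.qcow2',
        },
    },
};

# Walk the chain from the top node down to the protocol node.
sub chain_node_names {
    my ($node) = @_;
    my @names;
    while (ref($node) eq 'HASH') {
        push @names, $node->{'node-name'};
        $node = $node->{file};
    }
    return @names;
}

print join(' -> ', chain_node_names($blockdev)), "\n";
# prints: drive-scsi0 -> fmt-drive-scsi0 -> file-drive-scsi0
```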
>
> fixme: implement iscsi:// path
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 351 +++++++++++++++++++++++++++++++---------------
> 1 file changed, 237 insertions(+), 114 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 8192599a..2832ed09 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -1464,7 +1464,8 @@ sub print_drivedevice_full {
> } else {
> $device .= ",bus=ahci$controller.$unit";
> }
> - $device .= ",drive=drive-$drive_id,id=$drive_id";
> + $device .= ",id=$drive_id";
> + $device .= ",drive=drive-$drive_id" if $device_type ne 'cd' || $drive->{file} ne 'none';
is this just because you remove the whole drive when ejecting? not sure whether that is really needed..
>
> if ($device_type eq 'hd') {
> if (my $model = $drive->{model}) {
> @@ -1490,6 +1491,13 @@ sub print_drivedevice_full {
> $device .= ",serial=$serial";
> }
>
> + my $writecache = $drive->{cache} && $drive->{cache} =~ /^(?:none|writeback|unsafe)$/ ? "on" : "off";
> + $device .= ",write-cache=$writecache" if $drive->{media} && $drive->{media} ne 'cdrom';
> +
> + my @qemu_drive_options = qw(heads secs cyls trans rerror werror);
> + foreach my $o (@qemu_drive_options) {
> + $device .= ",$o=$drive->{$o}" if defined($drive->{$o});
> + }
>
> return $device;
> }
> @@ -1539,145 +1547,256 @@ my sub drive_uses_cache_direct {
> return $cache_direct;
> }
>
> -sub print_drive_commandline_full {
> - my ($storecfg, $vmid, $drive, $live_restore_name, $io_uring) = @_;
> +sub print_drive_throttle_group {
> + my ($drive) = @_;
> + #command line can't use the structured json limits option,
> + #so limit params need the x- prefix, as it's an unstable api
this comment should be below the early return, or above the whole sub.
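A possible shape following that suggestion, with the early return first and the comment next to the code it explains. Trimmed sketch: `drive_is_cdrom` and `get_drive_id` are hypothetical stand-ins working on a plain hash, not the real qemu-server helpers; the option mapping is the one from the patch.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the qemu-server helpers used by the patch.
sub drive_is_cdrom { my ($drive) = @_; return ($drive->{media} // '') eq 'cdrom' }
sub get_drive_id { my ($drive) = @_; return "$drive->{interface}$drive->{index}" }

sub print_drive_throttle_group {
    my ($drive) = @_;

    return if drive_is_cdrom($drive) && $drive->{file} eq 'none';

    # The command line can't use the structured JSON 'limits' option, so the
    # limits are passed via the unstable x- prefixed properties.
    my $drive_id = get_drive_id($drive);
    my $throttle_group = "throttle-group,id=throttle-drive-$drive_id";

    for my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
        my ($dir, $qmpname) = @$type;
        if (my $v = $drive->{"mbps$dir"}) {
            $throttle_group .= ",x-bps$qmpname=" . int($v * 1024 * 1024);
        }
        if (my $v = $drive->{"mbps${dir}_max"}) {
            $throttle_group .= ",x-bps$qmpname-max=" . int($v * 1024 * 1024);
        }
        if (my $v = $drive->{"bps${dir}_max_length"}) {
            $throttle_group .= ",x-bps$qmpname-max-length=$v";
        }
        if (my $v = $drive->{"iops${dir}"}) {
            $throttle_group .= ",x-iops$qmpname=$v";
        }
        if (my $v = $drive->{"iops${dir}_max"}) {
            $throttle_group .= ",x-iops$qmpname-max=$v";
        }
        if (my $v = $drive->{"iops${dir}_max_length"}) {
            $throttle_group .= ",x-iops$qmpname-max-length=$v";
        }
    }

    return $throttle_group;
}
```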
> + return if drive_is_cdrom($drive) && $drive->{file} eq 'none';
is this needed if we keep empty cdrom drives around like before? I know throttling practically makes no sense in that case, but it might make the code simpler in general?
>
> - my $path;
> - my $volid = $drive->{file};
> my $drive_id = get_drive_id($drive);
>
> + my $throttle_group = "throttle-group,id=throttle-drive-$drive_id";
> + foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
> + my ($dir, $qmpname) = @$type;
> +
> + if (my $v = $drive->{"mbps$dir"}) {
> + $throttle_group .= ",x-bps$qmpname=".int($v*1024*1024);
> + }
> + if (my $v = $drive->{"mbps${dir}_max"}) {
> + $throttle_group .= ",x-bps$qmpname-max=".int($v*1024*1024);
> + }
> + if (my $v = $drive->{"bps${dir}_max_length"}) {
> + $throttle_group .= ",x-bps$qmpname-max-length=$v";
> + }
> + if (my $v = $drive->{"iops${dir}"}) {
> + $throttle_group .= ",x-iops$qmpname=$v";
> + }
> + if (my $v = $drive->{"iops${dir}_max"}) {
> + $throttle_group .= ",x-iops$qmpname-max=$v";
> + }
> + if (my $v = $drive->{"iops${dir}_max_length"}) {
> + $throttle_group .= ",x-iops$qmpname-max-length=$v";
> + }
> + }
> +
> + return $throttle_group;
> +}
> +
> +sub generate_file_blockdev {
> + my ($storecfg, $drive, $nodename) = @_;
> +
> + my $volid = $drive->{file};
> my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
> - my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
>
> - if (drive_is_cdrom($drive)) {
> - $path = get_iso_path($storecfg, $vmid, $volid);
> - die "$drive_id: cannot back cdrom drive with a live restore image\n" if $live_restore_name;
> + my $scfg = undef;
> + my $path = $volid;
I think this should only happen if the parse_volume_id above told us this is an absolute path and not a PVE-managed volume..
> + if($storeid && $storeid ne 'nbd') {
this is wrong.. I guess it's also somewhat wrong in the old qemu_drive_mirror code.. we should probably check using a more specific RE that the "volid" is an NBD URI, and not attempt to parse it as a regular volid in that case..
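One way to do that: recognize NBD URIs up front with dedicated regexes and only fall back to volid parsing otherwise. This is an assumption, not existing qemu-server code: `parse_nbd_uri` is hypothetical and only accepts the two `nbd:` forms that generate_file_blockdev already matches further down.

```perl
use strict;
use warnings;

# Hypothetical helper: returns NBD blockdev options for the two nbd: URI
# forms handled in generate_file_blockdev, or nothing for any other string
# (which would then go through PVE::Storage::parse_volume_id as usual).
sub parse_nbd_uri {
    my ($volid) = @_;

    # check the unix form first so 'unix' is never taken as a host name
    if ($volid =~ m/^nbd:unix:(\S+):exportname=(\S+)$/) {
        return { driver => 'nbd', server => { type => 'unix', path => $1 }, export => $2 };
    }
    if ($volid =~ m/^nbd:(\S+):(\d+):exportname=(\S+)$/) {
        return {
            driver => 'nbd',
            server => { type => 'inet', host => $1, port => $2 },
            export => $3,
        };
    }
    return;
}
```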
> + $scfg = PVE::Storage::storage_config($storecfg, $storeid);
> + $path = PVE::Storage::path($storecfg, $volid);
> + }
> +
> + my $blockdev = {};
> +
> + if ($path =~ m/^rbd:(\S+)$/) {
> +
> + $blockdev->{driver} = 'rbd';
> +
> + my @rbd_options = split(/:/, $1);
> + my $keyring = undef;
> + for my $option (@rbd_options) {
> + if ($option =~ m/^(\S+)=(\S+)$/) {
> + my $key = $1;
> + my $value = $2;
> + $blockdev->{'auth-client-required'} = [$value] if $key eq 'auth_supported';
> + $blockdev->{'conf'} = $value if $key eq 'conf';
> + $blockdev->{'user'} = $value if $key eq 'id';
> + $keyring = $value if $key eq 'keyring';
> + if ($key eq 'mon_host') {
> + my $server = [];
> + my @mons = split(';', $value);
> + for my $mon (@mons) {
> + my ($host, $port) = PVE::Tools::parse_host_and_port($mon);
> + $port = '3300' if !$port;
> + push @$server, { host => $host, port => $port };
> + }
> + $blockdev->{server} = $server;
> + }
> + } elsif ($option =~ m|^(\S+)/(\S+)$|){
> + $blockdev->{pool} = $1;
> + my $image = $2;
> +
> + if($image =~ m|^(\S+)/(\S+)$|) {
> + $blockdev->{namespace} = $1;
> + $blockdev->{image} = $2;
> + } else {
> + $blockdev->{image} = $image;
> + }
> + }
> + }
> +
> + if($keyring && $blockdev->{server}) {
> + #qemu devs have removed passing arbitrary values to the blockdev object, and haven't added
> + #keyring to the list of allowed keys. It needs to be defined in the storage's ceph.conf.
> + #https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02676.html
> + #another way could be to simply patch qemu to allow the key
I think we either want to allow the keys we need in Qemu (and upstream that), or we want to write the config out to a temporary config and clean that up after Qemu has read its contents..
> + my $ceph_conf = "/etc/pve/priv/ceph/${storeid}.conf";
this file is already taken for external Ceph clusters, we can't just re-use it for this purpose without a lot of side effects I think..
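A sketch of the temporary-config alternative: write a minimal `[global]` section with the keyring into a private temporary file instead of `/etc/pve/priv/ceph/<storeid>.conf`, and remove it once QEMU has started. `write_temp_ceph_conf` is hypothetical, and the assumption that librbd only reads the file while the image is opened would need verifying.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a minimal ceph config pointing at $keyring into
# a private temporary file (File::Temp creates it with mode 0600) and return
# its path. The caller must unlink it once QEMU has opened the image.
sub write_temp_ceph_conf {
    my ($keyring, $dir) = @_;
    my ($fh, $filename) = tempfile('pve-ceph-XXXXXX', DIR => $dir, SUFFIX => '.conf', UNLINK => 0);
    print {$fh} "[global]\nkeyring = $keyring\n" or die "write failed: $!\n";
    close($fh) or die "close failed: $!\n";
    return $filename;
}
```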
> + $blockdev->{conf} = $ceph_conf;
> + if (!-e $ceph_conf) {
> + my $content = "[global]\nkeyring = $keyring\n";
> + PVE::Tools::file_set_contents($ceph_conf, $content, 0400);
> + }
> + }
> + } elsif ($path =~ m/^nbd:(\S+):(\d+):exportname=(\S+)$/) {
> + my $server = { type => 'inet', host => $1, port => $2 };
> + $blockdev = { driver => 'nbd', server => $server, export => $3 };
> + } elsif ($path =~ m/^nbd:unix:(\S+):exportname=(\S+)$/) {
> + my $server = { type => 'unix', path => $1 };
> + $blockdev = { driver => 'nbd', server => $server, export => $2 };
> + } elsif ($path =~ m|^gluster(\+(tcp\|unix\|rdma))?://(.*)/(.*)/(images/(\S+)/(\S+))$|) {
> + my $protocol = $2 ? $2 : 'inet';
> + $protocol = 'inet' if $protocol eq 'tcp';
> + my $server = [{ type => $protocol, host => $3, port => '24007' }];
> + $blockdev = { driver => 'gluster', server => $server, volume => $4, path => $5 };
> + } elsif ($path =~ m/^\/dev/) {
> + my $driver = drive_is_cdrom($drive) ? 'host_cdrom' : 'host_device';
> + $blockdev = { driver => $driver, filename => $path };
> + } elsif ($path =~ m/^\//) {
> + $blockdev = { driver => 'file', filename => $path};
> } else {
> - if ($storeid) {
> - $path = PVE::Storage::path($storecfg, $volid);
> - } else {
> - $path = $volid;
> + die "unsupported path: $path\n";
> + #fixme
> + #'{"driver":"iscsi","portal":"iscsi.example.com:3260","target":"demo-target","lun":3,"transport":"tcp"}'
> + }
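The rbd branch above turns a librbd-style `rbd:` path into blockdev options. A reduced, standalone sketch of the same parsing follows; `parse_rbd_path` and the example path are hypothetical, ports are not parsed (every monitor gets the patch's fallback port 3300), and keyring handling is left out. One design difference: with the greedy `m|^(\S+)/(\S+)$|` used in the patch, the first capture for `pool/ns/image` is `pool/ns`, so the sketch splits on `/` instead to keep the namespace separate.

```perl
use strict;
use warnings;

# Hypothetical, simplified version of the rbd branch of generate_file_blockdev.
sub parse_rbd_path {
    my ($path) = @_;
    my ($spec) = $path =~ m/^rbd:(\S+)$/ or die "not an rbd path: $path\n";

    my $blockdev = { driver => 'rbd' };
    for my $option (split(/:/, $spec)) {
        if ($option =~ m/^([^=]+)=(\S+)$/) {
            my ($key, $value) = ($1, $2);
            $blockdev->{'auth-client-required'} = [$value] if $key eq 'auth_supported';
            $blockdev->{conf} = $value if $key eq 'conf';
            $blockdev->{user} = $value if $key eq 'id';
            if ($key eq 'mon_host') {
                # ports are not parsed here; every monitor gets the default 3300
                $blockdev->{server} = [ map { +{ host => $_, port => '3300' } } split(/;/, $value) ];
            }
        } elsif ($option =~ m|/|) {
            # split on '/' so 'pool/namespace/image' keeps the namespace separate
            my @parts = split(m|/|, $option);
            die "unexpected rbd image spec '$option'\n" if @parts < 2 || @parts > 3;
            $blockdev->{pool} = shift @parts;
            $blockdev->{namespace} = shift @parts if @parts == 2;
            $blockdev->{image} = $parts[0];
        }
    }
    return $blockdev;
}
```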
> +
> + my $cache_direct = drive_uses_cache_direct($drive, $scfg);
> + my $cache = {};
> + $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
> + $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq 'unsafe' ? JSON::true : JSON::false;
> + $blockdev->{cache} = $cache;
> +
> + ##aio
> + if($blockdev->{filename}) {
> + $drive->{aio} = 'threads' if drive_is_cdrom($drive);
> + my $aio = $drive->{aio};
> + if (!$aio) {
> + if (storage_allows_io_uring_default($scfg, $cache_direct)) {
> + # io_uring supports all cache modes
> + $aio = "io_uring";
> + } else {
> + # aio native works only with O_DIRECT
> + if($cache_direct) {
> + $aio = "native";
> + } else {
> + $aio = "threads";
> + }
> + }
> }
> + $blockdev->{aio} = $aio;
> }
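The aio default selection above, pulled out into a pure function for readability. A sketch only: `pick_aio` is hypothetical, `io_uring_allowed` stands in for the result of `storage_allows_io_uring_default($scfg, $cache_direct)`, and in the patch this only applies to nodes with a `filename` (the file and host_device drivers). Note that the new code no longer takes into account the `$io_uring` flag (`min_version($kvmver, 6, 0)`) that the removed print_drive_commandline_full checked.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the aio defaults in generate_file_blockdev.
sub pick_aio {
    my (%args) = @_;
    return 'threads' if $args{is_cdrom};               # cdroms always use threads
    return $args{configured} if $args{configured};     # explicit aio= on the drive wins
    return 'io_uring' if $args{io_uring_allowed};      # io_uring supports all cache modes
    return $args{cache_direct} ? 'native' : 'threads'; # native only works with O_DIRECT
}
```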
>
> - # For PVE-managed volumes, use the format from the storage layer and prevent overrides via the
> - # drive's 'format' option. For unmanaged volumes, fallback to 'raw' to avoid auto-detection by
> - # QEMU. For the special case 'none' (get_iso_path() returns an empty $path), there should be no
> - # format or QEMU won't start.
> - my $format;
> - if (drive_is_cdrom($drive) && !$path) {
> - # no format
> - } elsif ($storeid) {
> - $format = checked_volume_format($storecfg, $volid);
> + ##discard && detect-zeroes
> + my $discard = 'ignore';
> + if($drive->{discard}) {
> + $discard = $drive->{discard};
> + $discard = 'unmap' if $discard eq 'on';
> + }
> + $blockdev->{discard} = $discard if !drive_is_cdrom($drive);
>
> - if ($drive->{format} && $drive->{format} ne $format) {
> - die "drive '$drive->{interface}$drive->{index}' - volume '$volid'"
> - ." - 'format=$drive->{format}' option different from storage format '$format'\n";
> - }
> + my $detectzeroes;
nit: detect_zeroes
> + if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
> + $detectzeroes = 'off';
> + } elsif ($drive->{discard}) {
> + $detectzeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
> } else {
> - $format = $drive->{format} // 'raw';
> + # This used to be our default with discard not being specified:
> + $detectzeroes = 'on';
> }
> + $blockdev->{'detect-zeroes'} = $detectzeroes if !drive_is_cdrom($drive);
> + $blockdev->{'node-name'} = $nodename if $nodename;
this last line could be a lot higher up?
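For the discard/detect-zeroes part, a pure helper keeps the mapping in one place and makes it easy to set `node-name` when the hash is first created. A sketch with the hypothetical name `discard_options`; it reproduces the mapping in the patch, including that cdroms get neither option.

```perl
use strict;
use warnings;

# Hypothetical helper: discard and detect-zeroes blockdev options for a drive,
# following the mapping in generate_file_blockdev.
sub discard_options {
    my ($drive, $is_cdrom) = @_;
    return () if $is_cdrom;

    my $discard = 'ignore';
    if ($drive->{discard}) {
        $discard = $drive->{discard} eq 'on' ? 'unmap' : $drive->{discard};
    }

    my $detect_zeroes;
    if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
        $detect_zeroes = 'off';
    } elsif ($drive->{discard}) {
        $detect_zeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
    } else {
        $detect_zeroes = 'on'; # historical default when discard is not specified
    }

    return (discard => $discard, 'detect-zeroes' => $detect_zeroes);
}

# node-name can be set up front, together with the other fixed keys
my $blockdev = { driver => 'file', 'node-name' => 'file-drive-scsi0', discard_options({ discard => 'on' }, 0) };
```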
>
> - my $is_rbd = $path =~ m/^rbd:/;
> + return $blockdev;
> +}
>
> - my $opts = '';
> - my @qemu_drive_options = qw(heads secs cyls trans media cache rerror werror aio discard);
> - foreach my $o (@qemu_drive_options) {
> - $opts .= ",$o=$drive->{$o}" if defined($drive->{$o});
> - }
> +sub generate_format_blockdev {
> + my ($storecfg, $drive, $nodename, $file, $force_readonly) = @_;
>
> - # snapshot only accepts on|off
> - if (defined($drive->{snapshot})) {
> - my $v = $drive->{snapshot} ? 'on' : 'off';
> - $opts .= ",snapshot=$v";
> - }
> + my $volid = $drive->{file};
> + my $scfg = undef;
> + my $path = $volid;
path is not used at all, other than being conditionally overwritten below..
> + my $format = $drive->{format};
> + $format //= "raw";
the format handling here is very sensitive, and I think this broke it. see the big comment this patch removed ;)
short summary: for PVE-managed volumes we want the format from the storage layer (via checked_volume_format). if the drive has a format set that disagrees, that is a hard error. for absolute paths we use the format from the drive with a fallback to raw.
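The rule above, written as a small pure function. Sketch only: `resolve_drive_format` is hypothetical, `$storage_format` stands for what `checked_volume_format($storecfg, $volid)` returns for a PVE-managed volume (undef for an absolute path), and the error message mirrors the one in the removed code.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the format rule from the removed comment:
# - PVE-managed volume: use the storage layer's format; a differing
#   'format' option on the drive is a hard error.
# - unmanaged (absolute path): use the drive's format, falling back to raw
#   to avoid QEMU's format auto-detection.
sub resolve_drive_format {
    my ($drive, $storage_format) = @_;

    if (defined($storage_format)) {
        if ($drive->{format} && $drive->{format} ne $storage_format) {
            die "drive '$drive->{interface}$drive->{index}' - volume '$drive->{file}'"
                ." - 'format=$drive->{format}' option different from storage format '$storage_format'\n";
        }
        return $storage_format;
    }
    return $drive->{format} // 'raw';
}
```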
>
> - if (defined($drive->{ro})) { # ro maps to QEMUs `readonly`, which accepts `on` or `off` only
> - $opts .= ",readonly=" . ($drive->{ro} ? 'on' : 'off');
> - }
> + my $drive_id = get_drive_id($drive);
>
> - foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
> - my ($dir, $qmpname) = @$type;
> - if (my $v = $drive->{"mbps$dir"}) {
> - $opts .= ",throttling.bps$qmpname=".int($v*1024*1024);
> - }
> - if (my $v = $drive->{"mbps${dir}_max"}) {
> - $opts .= ",throttling.bps$qmpname-max=".int($v*1024*1024);
> - }
> - if (my $v = $drive->{"bps${dir}_max_length"}) {
> - $opts .= ",throttling.bps$qmpname-max-length=$v";
> - }
> - if (my $v = $drive->{"iops${dir}"}) {
> - $opts .= ",throttling.iops$qmpname=$v";
> - }
> - if (my $v = $drive->{"iops${dir}_max"}) {
> - $opts .= ",throttling.iops$qmpname-max=$v";
> - }
> - if (my $v = $drive->{"iops${dir}_max_length"}) {
> - $opts .= ",throttling.iops$qmpname-max-length=$v";
> - }
> + if ($drive->{zeroinit}) {
> + #fixme how to handle zeroinit ? insert special blockdev filter ?
> }
>
> - if ($live_restore_name) {
> - $format = "rbd" if $is_rbd;
> - die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
> - if !$format;
> - $opts .= ",format=alloc-track,file.driver=$format";
> - } elsif ($format) {
> - $opts .= ",format=$format";
> + my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
so I guess this should never be called with nbd-URI-volids?
nit: $volname is not used anywhere, so can be removed..
> +
> + if($storeid) {
> + $scfg = PVE::Storage::storage_config($storecfg, $storeid);
> + $format = checked_volume_format($storecfg, $volid);
this is missing the comparison against $drive->{format}
> + $path = PVE::Storage::path($storecfg, $volid);
this is not used anywhere..
> }
>
> + my $readonly = defined($drive->{ro}) || $force_readonly ? JSON::true : JSON::false;
> +
> + #libvirt defines the cache option on both format && file
> my $cache_direct = drive_uses_cache_direct($drive, $scfg);
> + my $cache = {};
> + $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
> + $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq 'unsafe' ? JSON::true : JSON::false;
so we have the same code in two places? should probably be a helper then to not have them go out of sync..
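A possible shared helper for the cache settings, so generate_file_blockdev and generate_format_blockdev cannot drift apart. The name is hypothetical; `$cache_direct` is the result of `drive_uses_cache_direct($drive, $scfg)` as in the patch, and `JSON::PP::true`/`false` are used to keep the sketch dependency-free where the patch uses `JSON::true`/`JSON::false`.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: the 'cache' object both blockdev generators build.
sub generate_blockdev_cache {
    my ($drive, $cache_direct) = @_;
    my $unsafe = $drive->{cache} && $drive->{cache} eq 'unsafe';
    return {
        direct => $cache_direct ? JSON::PP::true : JSON::PP::false,
        'no-flush' => $unsafe ? JSON::PP::true : JSON::PP::false,
    };
}
```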
>
> - $opts .= ",cache=none" if !$drive->{cache} && $cache_direct;
> + my $blockdev = { driver => $format, file => $file, cache => $cache, 'read-only' => $readonly };
> + $blockdev->{'node-name'} = $nodename if $nodename;
>
> - if (!$drive->{aio}) {
> - if ($io_uring && storage_allows_io_uring_default($scfg, $cache_direct)) {
> - # io_uring supports all cache modes
> - $opts .= ",aio=io_uring";
> - } else {
> - # aio native works only with O_DIRECT
> - if($cache_direct) {
> - $opts .= ",aio=native";
> - } else {
> - $opts .= ",aio=threads";
> - }
> - }
> - }
> + return $blockdev;
>
> - if (!drive_is_cdrom($drive)) {
> - my $detectzeroes;
> - if (defined($drive->{detect_zeroes}) && !$drive->{detect_zeroes}) {
> - $detectzeroes = 'off';
> - } elsif ($drive->{discard}) {
> - $detectzeroes = $drive->{discard} eq 'on' ? 'unmap' : 'on';
> - } else {
> - # This used to be our default with discard not being specified:
> - $detectzeroes = 'on';
> - }
> +}
>
> - # note: 'detect-zeroes' works per blockdev and we want it to persist
> - # after the alloc-track is removed, so put it on 'file' directly
> - my $dz_param = $live_restore_name ? "file.detect-zeroes" : "detect-zeroes";
> - $opts .= ",$dz_param=$detectzeroes" if $detectzeroes;
> - }
> +sub generate_drive_blockdev {
> + my ($storecfg, $vmid, $drive, $force_readonly, $live_restore_name) = @_;
>
> - if ($live_restore_name) {
> - $opts .= ",backing=$live_restore_name";
> - $opts .= ",auto-remove=on";
> + my $path;
> + my $volid = $drive->{file};
> + my $format = $drive->{format};
this is only used once below
> + my $drive_id = get_drive_id($drive);
> +
> + my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
> + my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
> +
> + my $blockdevs = [];
> +
> + if (drive_is_cdrom($drive)) {
> + die "$drive_id: cannot back cdrom drive with a live restore image\n" if $live_restore_name;
> +
> + $path = get_iso_path($storecfg, $vmid, $volid);
> + return if !$path;
> + $force_readonly = 1;
> }
>
> - # my $file_param = $live_restore_name ? "file.file.filename" : "file";
> - my $file_param = "file";
> + my $file_nodename = "file-drive-$drive_id";
> + my $blockdev_file = generate_file_blockdev($storecfg, $drive, $file_nodename);
> + my $fmt_nodename = "fmt-drive-$drive_id";
> + my $blockdev_format = generate_format_blockdev($storecfg, $drive, $fmt_nodename, $blockdev_file, $force_readonly);
> +
> + my $blockdev_live_restore = undef;
> if ($live_restore_name) {
> - # non-rbd drivers require the underlying file to be a separate block
> - # node, so add a second .file indirection
> - $file_param .= ".file" if !$is_rbd;
> - $file_param .= ".filename";
> + die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
> + if !$format;
for this check, but it is not actually set anywhere here.. so is something missing or can the check go?
> +
> + $blockdev_live_restore = { 'node-name' => "liverestore-drive-$drive_id",
> + backing => $live_restore_name,
> + 'auto-remove' => 'on', format => "alloc-track",
> + file => $blockdev_format };
> }
> - my $pathinfo = $path ? "$file_param=$path," : '';
>
> - return "${pathinfo}if=none,id=drive-$drive->{interface}$drive->{index}$opts";
> + #this is the topfilter entry point, use drive-$drive_id as nodename
> + my $blockdev_throttle = { driver => "throttle", 'node-name' => "drive-$drive_id", 'throttle-group' => "throttle-drive-$drive_id" };
> + #put liverestore filter between throttle && format filter
> + $blockdev_throttle->{file} = $live_restore_name ? $blockdev_live_restore : $blockdev_format;
> + return $blockdev_throttle,
> }
>
> sub print_pbs_blockdev {
> @@ -4091,13 +4210,13 @@ sub config_to_command {
> push @$devices, '-blockdev', $live_restore->{blockdev};
> }
>
> - my $drive_cmd = print_drive_commandline_full(
> - $storecfg, $vmid, $drive, $live_blockdev_name, min_version($kvmver, 6, 0));
> -
> - # extra protection for templates, but SATA and IDE don't support it..
> - $drive_cmd .= ',readonly=on' if drive_is_read_only($conf, $drive);
> + my $throttle_group = print_drive_throttle_group($drive);
> + push @$devices, '-object', $throttle_group if $throttle_group;
>
> - push @$devices, '-drive',$drive_cmd;
> +# # extra protection for templates, but SATA and IDE don't support it..
> + my $force_readonly = drive_is_read_only($conf, $drive);
> + my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive, $force_readonly, $live_blockdev_name);
> + push @$devices, '-blockdev', encode_json_ordered($blockdev) if $blockdev;
> push @$devices, '-device', print_drivedevice_full(
> $storecfg, $conf, $vmid, $drive, $bridges, $arch, $machine_type);
> });
> @@ -8986,4 +9105,8 @@ sub delete_ifaces_ipams_ips {
> }
> }
>
> +sub encode_json_ordered {
> + return JSON->new->canonical->allow_nonref->encode( $_[0] );
> +}
this is only used in a single place..
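Inlining it would just mean calling the encoder at that one call site. Below is a minimal sketch that uses the core JSON::PP module instead of JSON. It assumes both accept the same `canonical`/`allow_nonref` chain. The sketch shows what `canonical` buys here: a stable key order in the generated `-blockdev` argument. The blockdev hash is a hypothetical example.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical throttle filter node, as generate_drive_blockdev() might return it.
my $blockdev = {
    'node-name'      => 'drive-scsi0',
    driver           => 'throttle',
    'throttle-group' => 'throttle-drive-scsi0',
};

# canonical sorts hash keys, so the command line is identical on every run;
# used inline at the call site instead of a one-line encode_json_ordered() helper.
my $json = JSON::PP->new->canonical->allow_nonref->encode($blockdev);
print "$json\n";
```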
> +
> 1;
> --
> 2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
@ 2025-01-08 14:26 ` Fabian Grünbichler
2025-01-10 14:08 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 14:26 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> fixme/testme :
> PVE/VZDump/QemuServer.pm: eval { PVE::QemuServer::qemu_drivedel($vmid, "tpmstate0-backup"); };
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 64 +++++++++++++++++++++++++++++++++--------------
> 1 file changed, 45 insertions(+), 19 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 2832ed09..baf78ec0 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -1582,6 +1582,42 @@ sub print_drive_throttle_group {
> return $throttle_group;
> }
>
> +sub generate_throttle_group {
> + my ($drive) = @_;
> +
> + my $drive_id = get_drive_id($drive);
> +
> + my $throttle_group = { id => "throttle-drive-$drive_id" };
> + my $limits = {};
> +
> + foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
> + my ($dir, $qmpname) = @$type;
> +
> + if (my $v = $drive->{"mbps$dir"}) {
> + $limits->{"bps$qmpname"} = int($v*1024*1024);
> + }
> + if (my $v = $drive->{"mbps${dir}_max"}) {
> + $limits->{"bps$qmpname-max"} = int($v*1024*1024);
> + }
> + if (my $v = $drive->{"bps${dir}_max_length"}) {
> + $limits->{"bps$qmpname-max-length"} = int($v)
> + }
> + if (my $v = $drive->{"iops${dir}"}) {
> + $limits->{"iops$qmpname"} = int($v);
> + }
> + if (my $v = $drive->{"iops${dir}_max"}) {
> + $limits->{"iops$qmpname-max"} = int($v);
> + }
> + if (my $v = $drive->{"iops${dir}_max_length"}) {
> + $limits->{"iops$qmpname-max-length"} = int($v);
> + }
> + }
> +
> + $throttle_group->{limits} = $limits;
> +
> + return $throttle_group;
this and the corresponding print sub are exactly the same, so the print sub could call this and join the limits with the `x-` prefix added? how does this interact with the qemu_block_set_io_throttle helper used when updating the limits at runtime?
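Here is a sketch of that deduplication. It assumes the print sub only has to emit a `-object throttle-group` argument. It also assumes QEMU's legacy `x-` prefixed properties (`x-bps-total`, `x-iops-read-max`, ...) map one-to-one onto the keys of the QMP `limits` struct. `print_throttle_group_from_hash` is a hypothetical name; it takes the hash built by `generate_throttle_group()`.

```perl
use strict;
use warnings;

# Hypothetical: format the hash returned by generate_throttle_group()
# ({ id => ..., limits => { 'bps-total' => ..., ... } }) as a -object argument,
# instead of repeating the per-limit loop in the print sub.
sub print_throttle_group_from_hash {
    my ($throttle_group) = @_;
    my $limits = $throttle_group->{limits} // {};
    # on the command line, each limit is a separate x-prefixed property
    my @props = map { "x-$_=$limits->{$_}" } sort keys %$limits;
    return join(',', 'throttle-group', "id=$throttle_group->{id}", @props);
}

my $tg = {
    id     => 'throttle-drive-scsi0',
    limits => { 'bps-total' => 10485760, 'iops-read' => 100 },
};
my $arg = print_throttle_group_from_hash($tg);
print "$arg\n";
```

The runtime path (`qemu_block_set_io_throttle`) is not covered by this sketch.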
> +}
> +
> sub generate_file_blockdev {
> my ($storecfg, $drive, $nodename) = @_;
>
> @@ -4595,32 +4631,22 @@ sub qemu_iothread_del {
> }
>
> sub qemu_driveadd {
> - my ($storecfg, $vmid, $device) = @_;
> + my ($storecfg, $vmid, $drive) = @_;
>
> - my $kvmver = get_running_qemu_version($vmid);
> - my $io_uring = min_version($kvmver, 6, 0);
> - my $drive = print_drive_commandline_full($storecfg, $vmid, $device, undef, $io_uring);
> - $drive =~ s/\\/\\\\/g;
> - my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto \"$drive\"", 60);
> -
> - # If the command succeeds qemu prints: "OK"
> - return 1 if $ret =~ m/OK/s;
> + my $drive_id = get_drive_id($drive);
> + my $throttle_group = generate_throttle_group($drive);
do we always need a throttle group? or would we benefit from only adding it when limits are set, and skip that node when I/O is unlimited?
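The sketch below shows one way to skip the node, under stated assumptions. `wrap_in_throttle_if_needed` is a hypothetical helper, and `$limits` is the hash built by `generate_throttle_group()`.

There is a trade-off: the top node name then depends on whether limits are set. Setting a limit at runtime on a drive that previously had unlimited I/O would therefore have to insert a filter node into the live graph. `qemu_drivedel` would also need to know which node is on top.

```perl
use strict;
use warnings;

# Hypothetical: only wrap the format node in a throttle filter when at least
# one limit is configured; otherwise the format node itself is the top node.
sub wrap_in_throttle_if_needed {
    my ($drive_id, $fmt_blockdev, $limits) = @_;
    return $fmt_blockdev if !%$limits;
    return {
        driver           => 'throttle',
        'node-name'      => "drive-$drive_id",
        'throttle-group' => "throttle-drive-$drive_id",
        file             => $fmt_blockdev,
    };
}

my $fmt = { 'node-name' => 'fmt-drive-scsi0', driver => 'qcow2' };
my $unlimited = wrap_in_throttle_if_needed('scsi0', $fmt, {});
my $limited   = wrap_in_throttle_if_needed('scsi0', $fmt, { 'iops-total' => 100 });
print "$unlimited->{'node-name'} / $limited->{'node-name'}\n";
```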
> + mon_cmd($vmid, 'object-add', "qom-type" => "throttle-group", %$throttle_group);
>
> - die "adding drive failed: $ret\n";
> + my $blockdev = generate_drive_blockdev($storecfg, $vmid, $drive);
> + mon_cmd($vmid, 'blockdev-add', %$blockdev, timeout => 10 * 60);
> + return 1;
> }
>
> sub qemu_drivedel {
> my ($vmid, $deviceid) = @_;
>
> - my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_del drive-$deviceid", 10 * 60);
> - $ret =~ s/^\s+//;
> -
> - return 1 if $ret eq "";
> -
> - # NB: device not found errors mean the drive was auto-deleted and we ignore the error
> - return 1 if $ret =~ m/Device \'.*?\' not found/s;
> -
> - die "deleting drive $deviceid failed : $ret\n";
> + mon_cmd($vmid, 'blockdev-del', 'node-name' => "drive-$deviceid", timeout => 10 * 60);
> + mon_cmd($vmid, 'object-del', id => "throttle-drive-$deviceid");
> }
>
> sub qemu_deviceaddverify {
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
@ 2025-01-08 14:31 ` Fabian Grünbichler
2025-01-13 7:56 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 14:31 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> Look at qdev value, as cdrom drives can be empty
> without any inserted media
is this needed if we don't drive_del the cdrom drive when ejecting the medium?
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index baf78ec0..3b33fd7d 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4425,10 +4425,9 @@ sub vm_devices_list {
> }
>
> my $resblock = mon_cmd($vmid, 'query-block');
> - foreach my $block (@$resblock) {
> - if($block->{device} =~ m/^drive-(\S+)/){
> - $devices->{$1} = 1;
> - }
> + $resblock = { map { $_->{qdev} => $_ } $resblock->@* };
> + foreach my $blockid (keys %$resblock) {
> + $devices->{$blockid} = 1;
> }
>
> my $resmice = mon_cmd($vmid, 'query-mice');
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
@ 2025-01-08 14:34 ` Fabian Grünbichler
0 siblings, 0 replies; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 14:34 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 3b33fd7d..758c8240 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -5694,7 +5694,10 @@ sub vmconfig_update_disk {
> } else { # cdrom
>
> if ($drive->{file} eq 'none') {
> - mon_cmd($vmid, "eject", force => JSON::true, id => "$opt");
> + mon_cmd($vmid, "blockdev-open-tray", force => JSON::true, id => $opt);
> + mon_cmd($vmid, "blockdev-remove-medium", id => $opt);
> + qemu_drivedel($vmid, $opt);
the drivedel here
> +
> if (drive_is_cloudinit($old_drive)) {
> vmconfig_register_unused_drive($storecfg, $vmid, $conf, $old_drive);
> }
> @@ -5702,14 +5705,16 @@ sub vmconfig_update_disk {
> my $path = get_iso_path($storecfg, $vmid, $drive->{file});
>
> # force eject if locked
> - mon_cmd($vmid, "eject", force => JSON::true, id => "$opt");
> + mon_cmd($vmid, "blockdev-open-tray", force => JSON::true, id => $opt);
> + mon_cmd($vmid, "blockdev-remove-medium", id => $opt);
> + eval { qemu_drivedel($vmid, $opt) };
and here
>
> if ($path) {
> - mon_cmd($vmid, "blockdev-change-medium",
> - id => "$opt", filename => "$path");
> + qemu_driveadd($storecfg, $vmid, $drive);
and the driveadd here seem kind of weird.
Are they really needed (also see comments on other patches)?
> + mon_cmd($vmid, "blockdev-insert-medium", id => $opt, 'node-name' => "drive-$opt");
> + mon_cmd($vmid, "blockdev-close-tray", id => $opt);
> }
> }
> -
> return 1;
> }
> }
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
@ 2025-01-08 15:19 ` Fabian Grünbichler
2025-01-13 8:27 ` DERUMIER, Alexandre via pve-devel
[not found] ` <0d0d4c4d73110cf0e692cae0ee65bf7f9a6ce93a.camel@groupe-cyllene.com>
0 siblings, 2 replies; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-08 15:19 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuMigrate.pm | 2 +-
> PVE/QemuServer.pm | 106 +++++++++++++++++++++++++++++++++++----------
> 2 files changed, 83 insertions(+), 25 deletions(-)
>
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index ed5ede30..88627ce4 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -1134,7 +1134,7 @@ sub phase2 {
> my $bitmap = $target->{bitmap};
>
> $self->log('info', "$drive: start migration to $nbd_uri");
> - PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
> + PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $source_drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
> }
>
> if (PVE::QemuServer::QMPHelpers::runs_at_least_qemu_version($vmid, 8, 2)) {
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 6bebb906..3d7c41ee 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -8184,59 +8184,85 @@ sub qemu_img_convert {
> }
>
> sub qemu_drive_mirror {
> - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $completion, $qga, $bwlimit, $src_bitmap) = @_;
> + my ($vmid, $driveid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $completion, $qga, $bwlimit, $src_bitmap) = @_;
the $driveid is contained in $drive (in the form of index and interface). this would still be a breaking change since $drive before was the $driveid, and now it's the parsed drive ;)
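The series already has `get_drive_id()`. Assume it simply concatenates interface and index, as the removed `id=drive-$drive->{interface}$drive->{index}` code did. In that case the ID can always be derived from the parsed drive, and the extra `$driveid` parameter is redundant. Hypothetical stand-in:

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_drive_id(): "<interface><index>", as in the
# removed "id=drive-$drive->{interface}$drive->{index}" command line option.
sub drive_id_from_drive {
    my ($drive) = @_;
    return "$drive->{interface}$drive->{index}";
}

my $drive = { interface => 'scsi', index => 0, file => 'local:100/vm-100-disk-0.qcow2' };
my $deviceid = 'drive-' . drive_id_from_drive($drive);
print "$deviceid\n";    # prints "drive-scsi0"
```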
>
> $jobs = {} if !$jobs;
> + my $deviceid = "drive-$driveid";
> + my $dst_format;
> + my $dst_path = $dst_volid;
> + my $jobid = "mirror-$deviceid";
> + $jobs->{$jobid} = {};
>
> - my $qemu_target;
> - my $format;
> - $jobs->{"drive-$drive"} = {};
> + my $storecfg = PVE::Storage::config();
>
> if ($dst_volid =~ /^nbd:/) {
> - $qemu_target = $dst_volid;
> - $format = "nbd";
> + $dst_format = "nbd";
> } else {
> - my $storecfg = PVE::Storage::config();
> -
> - $format = checked_volume_format($storecfg, $dst_volid);
> -
> - my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
> -
> - $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
> + $dst_format = checked_volume_format($storecfg, $dst_volid);
> + $dst_path = PVE::Storage::path($storecfg, $dst_volid);
> + }
> +
> + # copy original drive config (aio,cache,discard,...)
> + my $dst_drive = dclone($drive);
> + $dst_drive->{format} = $dst_format;
> + $dst_drive->{file} = $dst_path;
> + $dst_drive->{zeroinit} = 1 if $is_zero_initialized;
> + #improve: if target storage don't support aio uring,change it to default native
> + #and remove clone_disk_check_io_uring()
> +
> + #add new block device
> + my $nodes = get_blockdev_nodes($vmid);
> +
> + my $target_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
> + my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
> + my $target_file_blockdev = generate_file_blockdev($storecfg, $dst_drive, $target_file_nodename);
> + my $target_nodename = undef;
> +
> + if ($dst_format eq 'nbd') {
> + #nbd file don't have fmt
> + $target_nodename = $target_file_nodename;
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_file_blockdev);
> + } else {
> + $target_nodename = $target_fmt_nodename;
> + my $target_fmt_blockdev = generate_format_blockdev($storecfg, $dst_drive, $target_fmt_nodename, $target_file_blockdev);
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
> }
>
> + #we replace the original src_fmt node in the blockdev graph
> + my $src_fmt_nodename = find_fmt_nodename_drive($storecfg, $vmid, $drive, $nodes);
> my $opts = {
> + 'job-id' => $jobid,
> timeout => 10,
> - device => "drive-$drive",
> - mode => "existing",
> + device => $deviceid,
> + replaces => $src_fmt_nodename,
> sync => "full",
> - target => $qemu_target,
> + target => $target_nodename,
> 'auto-dismiss' => JSON::false,
> };
> - $opts->{format} = $format if $format;
>
> if (defined($src_bitmap)) {
> $opts->{sync} = 'incremental';
> - $opts->{bitmap} = $src_bitmap;
> + $opts->{bitmap} = $src_bitmap; ##FIXME: how to handle bitmap ? special proxmox patch ?
> print "drive mirror re-using dirty bitmap '$src_bitmap'\n";
> }
>
> if (defined($bwlimit)) {
> $opts->{speed} = $bwlimit * 1024;
> - print "drive mirror is starting for drive-$drive with bandwidth limit: ${bwlimit} KB/s\n";
> + print "drive mirror is starting for $deviceid with bandwidth limit: ${bwlimit} KB/s\n";
> } else {
> - print "drive mirror is starting for drive-$drive\n";
> + print "drive mirror is starting for $deviceid\n";
> }
>
> # if a job already runs for this device we get an error, catch it for cleanup
> - eval { mon_cmd($vmid, "drive-mirror", %$opts); };
> + eval { mon_cmd($vmid, "blockdev-mirror", %$opts); };
> +
> if (my $err = $@) {
> eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
> + #FIXME: delete blockdev after job cancel
wouldn't we also need to keep track of the device IDs and pass those to the monitor invocation below? if the block job fails or gets canceled, we also need cleanup there..
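Here is a sketch of what that tracking could look like. It assumes the job hash gets a hypothetical `target_nodes` list when the target blockdevs are added. `mon_cmd` is replaced by a recorder so the sketch runs without a VM.

Each deletion is wrapped in `eval`. If the file node was defined inline in the format node's `blockdev-add`, it probably disappears together with the format node. In that case the second `blockdev-del` fails and only produces a warning.

```perl
use strict;
use warnings;

# Recorder standing in for PVE::QemuServer::Monitor::mon_cmd (no QMP involved).
my @calls;
sub mon_cmd {
    my ($vmid, $cmd, %args) = @_;
    push @calls, [$cmd, \%args];
    return;
}

# Hypothetical cleanup for the error path (and for the mirror monitor on
# failure/cancel): remove every target node recorded for each job.
sub cleanup_mirror_targets {
    my ($vmid, $jobs) = @_;
    for my $jobid (sort keys %$jobs) {
        # format node first, then its file node
        for my $node (@{ $jobs->{$jobid}->{target_nodes} // [] }) {
            # best effort: the node may already be gone
            eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $node) };
            warn "could not remove block node '$node': $@" if $@;
        }
    }
    return;
}

my $jobs = {
    'mirror-drive-scsi0' => { target_nodes => ['fmt-drive-scsi0-target', 'file-drive-scsi0-target'] },
};
cleanup_mirror_targets(100, $jobs);
```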
> warn "$@\n" if $@;
> die "mirroring error: $err\n";
> }
> -
> - qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga);
> + qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $completion, $qga, 'mirror');
> }
>
> # $completion can be either
> @@ -8595,7 +8621,7 @@ sub clone_disk {
>
> my $sparseinit = PVE::Storage::volume_has_feature($storecfg, 'sparseinit', $newvolid);
> if ($use_drive_mirror) {
> - qemu_drive_mirror($vmid, $src_drivename, $newvolid, $newvmid, $sparseinit, $jobs,
> + qemu_drive_mirror($vmid, $src_drivename, $drive, $newvolid, $newvmid, $sparseinit, $jobs,
> $completion, $qga, $bwlimit);
> } else {
> if ($dst_drivename eq 'efidisk0') {
> @@ -9130,6 +9156,38 @@ sub delete_ifaces_ipams_ips {
> }
> }
>
> +sub find_fmt_nodename_drive {
> + my ($storecfg, $vmid, $drive, $nodes) = @_;
> +
> + my $volid = $drive->{file};
> + my $format = checked_volume_format($storecfg, $volid);
$format is not used?
> + my $path = PVE::Storage::path($storecfg, $volid);
is this guaranteed to be stable? also across versions? and including external storage plugins?
> +
> + my $node = find_blockdev_node($nodes, $path, 'fmt');
that one is only added in a later patch. But I don't think lookups by path are a good idea; we should probably have a deterministic node naming concept instead? e.g., encode the drive + snapshot name?
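Here is a sketch of such a naming concept, under several assumptions. The `<type>-drive-<id>[-<snap>]` scheme and `blockdev_node_name` are hypothetical. Snapshot names are assumed to contain only characters that are valid in node names.

QEMU rejects node names longer than 31 characters, so long names get a hash suffix. Because the name is a pure function of drive and snapshot, lookups need neither the volume path nor a counter.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical naming scheme: "<type>-drive-<id>" for the current image,
# "<type>-drive-<id>-<snap>" for a snapshot image.
sub blockdev_node_name {
    my ($type, $drive_id, $snap) = @_;
    my $name = "$type-drive-$drive_id";
    $name .= "-$snap" if defined $snap;
    return $name if length($name) <= 31;    # QEMU node-name length limit
    # keep a readable prefix and make it unique with part of a SHA-1 digest
    return substr($name, 0, 20) . '-' . substr(sha1_hex($name), 0, 10);
}

my $current = blockdev_node_name('fmt', 'scsi0');
my $snap    = blockdev_node_name('file', 'scsi0', 'snap1');
print "$current $snap\n";    # prints "fmt-drive-scsi0 file-drive-scsi0-snap1"
```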
> + return $node->{'node-name'};
> +}
> +
> +sub get_blockdev_nextid {
> + my ($nodename, $nodes) = @_;
> + my $version = 0;
> + for my $nodeid (keys %$nodes) {
> + if ($nodeid =~ m/^$nodename-(\d+)$/) {
> + my $current_version = $1;
> + $version = $current_version if $current_version >= $version;
> + }
> + }
> + $version++;
> + return "$nodename-$version";
since we shouldn't ever have more than one job for a drive running (right?), couldn't we just have a deterministic name for this? that would also simplify cleanup, including cleanup of a failed cleanup ;)
> +}
> +
> +sub get_blockdev_nodes {
> + my ($vmid) = @_;
> +
> + my $nodes = PVE::QemuServer::Monitor::mon_cmd($vmid, "query-named-block-nodes");
> + $nodes = { map { $_->{'node-name'} => $_ } $nodes->@* };
> + return $nodes;
> +}
> +
> sub encode_json_ordered {
> return JSON->new->canonical->allow_nonref->encode( $_[0] );
> }
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default.
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
@ 2025-01-09 9:51 ` Fabian Grünbichler
2025-01-13 8:38 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 9:51 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
>
>
> This was a limitation of drive-mirror, blockdev mirror is able
> to reopen image with a different aio
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 41 ++++++++++-------------------------------
> 1 file changed, 10 insertions(+), 31 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 3d7c41ee..dc12b38f 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -8207,8 +8207,16 @@ sub qemu_drive_mirror {
> $dst_drive->{format} = $dst_format;
> $dst_drive->{file} = $dst_path;
> $dst_drive->{zeroinit} = 1 if $is_zero_initialized;
> - #improve: if target storage don't support aio uring,change it to default native
> - #and remove clone_disk_check_io_uring()
> +
> + #change aio if io_uring is not supported on target
> + if ($dst_drive->{aio} && $dst_drive->{aio} eq 'io_uring') {
> + my ($dst_storeid) = PVE::Storage::parse_volume_id($dst_drive->{file});
> + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
> + my $cache_direct = drive_uses_cache_direct($dst_drive, $dst_scfg);
> + if(!storage_allows_io_uring_default($dst_scfg, $cache_direct)) {
> + $dst_drive->{aio} = $cache_direct ? 'native' : 'threads';
> + }
> + }
couldn't/shouldn't we just handle this in generate_file_blockdev?
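Folding the fallback into `generate_file_blockdev()` could look roughly like the sketch below. `effective_aio` is hypothetical, and `$allows_io_uring` stands in for the result of `storage_allows_io_uring_default($scfg, $cache_direct)`. The decision itself is the one from the hunk above, so start, hotplug and mirror targets would all get the same behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper for generate_file_blockdev(): keep an explicit aio
# setting unless it is io_uring on a storage that does not allow it.
sub effective_aio {
    my ($aio, $allows_io_uring, $cache_direct) = @_;
    return $aio if !defined($aio) || $aio ne 'io_uring';
    return 'io_uring' if $allows_io_uring;
    return $cache_direct ? 'native' : 'threads';
}

for my $case (['io_uring', 1, 1], ['io_uring', 0, 1], ['io_uring', 0, 0], ['threads', 0, 1]) {
    my ($aio, $allows, $direct) = @$case;
    printf "%s (allows=%d, direct=%d) -> %s\n", $aio, $allows, $direct, effective_aio(@$case);
}
```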
>
> #add new block device
> my $nodes = get_blockdev_nodes($vmid);
> @@ -8514,33 +8522,6 @@ sub qemu_drive_mirror_switch_to_active_mode {
> }
> }
>
> -# Check for bug #4525: drive-mirror will open the target drive with the same aio setting as the
> -# source, but some storages have problems with io_uring, sometimes even leading to crashes.
> -my sub clone_disk_check_io_uring {
> - my ($src_drive, $storecfg, $src_storeid, $dst_storeid, $use_drive_mirror) = @_;
> -
> - return if !$use_drive_mirror;
> -
> - # Don't complain when not changing storage.
> - # Assume if it works for the source, it'll work for the target too.
> - return if $src_storeid eq $dst_storeid;
> -
> - my $src_scfg = PVE::Storage::storage_config($storecfg, $src_storeid);
> - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
> -
> - my $cache_direct = drive_uses_cache_direct($src_drive);
> -
> - my $src_uses_io_uring;
> - if ($src_drive->{aio}) {
> - $src_uses_io_uring = $src_drive->{aio} eq 'io_uring';
> - } else {
> - $src_uses_io_uring = storage_allows_io_uring_default($src_scfg, $cache_direct);
> - }
> -
> - die "target storage is known to cause issues with aio=io_uring (used by current drive)\n"
> - if $src_uses_io_uring && !storage_allows_io_uring_default($dst_scfg, $cache_direct);
> -}
> -
> sub clone_disk {
> my ($storecfg, $source, $dest, $full, $newvollist, $jobs, $completion, $qga, $bwlimit) = @_;
>
> @@ -8598,8 +8579,6 @@ sub clone_disk {
> $dst_format = 'raw';
> $size = PVE::QemuServer::Drive::TPMSTATE_DISK_SIZE;
> } else {
> - clone_disk_check_io_uring($drive, $storecfg, $src_storeid, $storeid, $use_drive_mirror);
> -
> $size = PVE::Storage::volume_size_info($storecfg, $drive->{file}, 10);
> }
> $newvolid = PVE::Storage::vdisk_alloc(
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
@ 2025-01-09 11:57 ` Fabian Grünbichler
2025-01-09 13:19 ` Fabio Fantoni via pve-devel
` (2 more replies)
0 siblings, 3 replies; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 11:57 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
it would be great if there were a summary of the design choices and a high-level summary of what happens to the files and the block node graph here. From the code below, it's a bit hard to judge whether it would be possible to eliminate the dynamically named block nodes, for example ;)
a few more comments documenting the behaviour and ideally also some tests (mocking the QMP interactions?) would be nice
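Here is one possible shape for such tests, as a sketch. The code under test gets the monitor as a callback, which is a hypothetical refactoring. `drive_del` mirrors the two commands of the new `qemu_drivedel`. With this setup, a test can assert the exact QMP command sequence.

With the existing module layout, mocking `mon_cmd` via Test::MockModule would presumably achieve the same without changing signatures.

```perl
use strict;
use warnings;

# Hypothetical variant of qemu_drivedel() taking the monitor as a callback.
sub drive_del {
    my ($mon, $vmid, $deviceid) = @_;
    $mon->($vmid, 'blockdev-del', 'node-name' => "drive-$deviceid");
    $mon->($vmid, 'object-del', id => "throttle-drive-$deviceid");
    return;
}

# Mock monitor: records every command with its arguments.
my @log;
my $mock_mon = sub {
    my ($vmid, $cmd, %args) = @_;
    push @log, { vmid => $vmid, cmd => $cmd, args => \%args };
    return {};
};

drive_del($mock_mon, 100, 'scsi1');
print join(', ', map { $_->{cmd} } @log), "\n";    # prints "blockdev-del, object-del"
```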
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuConfig.pm | 4 +-
> PVE/QemuServer.pm | 345 ++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 335 insertions(+), 14 deletions(-)
>
> diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
> index ffdf9f03..c17edb46 100644
> --- a/PVE/QemuConfig.pm
> +++ b/PVE/QemuConfig.pm
> @@ -375,7 +375,7 @@ sub __snapshot_create_vol_snapshot {
>
> print "snapshotting '$device' ($drive->{file})\n";
>
> - PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $volid, $snapname);
> + PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $drive, $snapname);
> }
>
> sub __snapshot_delete_remove_drive {
> @@ -412,7 +412,7 @@ sub __snapshot_delete_vol_snapshot {
> my $storecfg = PVE::Storage::config();
> my $volid = $drive->{file};
>
> - PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $volid, $snapname);
> + PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $drive, $snapname);
>
> push @$unused, $volid;
> }
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 3a3feadf..f29a8449 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4959,20 +4959,269 @@ sub qemu_block_resize {
> }
>
> sub qemu_volume_snapshot {
> - my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
> + my ($vmid, $deviceid, $storecfg, $drive, $snap) = @_;
>
> + my $volid = $drive->{file};
> my $running = check_running($vmid);
> -
> - if ($running && do_snapshots_with_qemu($storecfg, $volid, $deviceid)) {
> - mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> + my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid) if $running;
> + if ($do_snapshots_with_qemu) {
> + if($do_snapshots_with_qemu == 2) {
this could do without the additional nesting:
if ($do_snapshots_with_qemu == 1) {
...
} elsif ($do_snapshots_with_qemu == 2) {
...
} else {
...
}
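Here is a runnable sketch of that flattened dispatch. The meaning of the return values is taken from the patch: 1 means an internal snapshot via QMP, 2 an external snapshot, and false a snapshot on the storage layer. `snapshot_method` itself is hypothetical and only returns which path would be taken.

The sketch also avoids `my $x = ... if $running;` from the patch. perlsyn documents `my` with a statement-modifier conditional as undefined behaviour.

```perl
use strict;
use warnings;

sub snapshot_method {
    my ($running, $qemu_mode) = @_;
    # compute the mode unconditionally instead of "my $mode = ... if $running;"
    my $mode = $running ? ($qemu_mode // 0) : 0;

    if ($mode == 1) {
        return 'internal';    # blockdev-snapshot-internal-sync
    } elsif ($mode == 2) {
        return 'external';    # rename current image + blockdev-snapshot
    } else {
        return 'storage';     # PVE::Storage::volume_snapshot
    }
}

my $method = snapshot_method(1, 2);
print "$method\n";    # prints "external"
```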
> + my $snap_path = PVE::Storage::path($storecfg, $volid, $snap);
> + my $path = PVE::Storage::path($storecfg, $volid);
> + blockdev_current_rename($storecfg, $vmid, $deviceid, $drive, $path, $snap_path, 1);
> + blockdev_external_snapshot($storecfg, $vmid, $deviceid, $drive, $snap);
what about error handling?
> + } else {
> + mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> + }
> } else {
> PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
this invocation here (continued below)
> }
> }
>
> +sub blockdev_external_snapshot {
> + my ($storecfg, $vmid, $deviceid, $drive, $snap) = @_;
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> + my $path = PVE::Storage::path($storecfg, $volid, $snap);
> + my $format_node = find_blockdev_node($nodes, $path, 'fmt');
> + my $format_nodename = $format_node->{'node-name'};
> +
> + #preallocate add a new current file
> + my $new_current_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
> + my $new_current_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
okay, so here we have a dynamic node name because the desired target name is still occupied. could we rename the old block node first?
> + PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
(continued from above) and this invocation here are the same?? wouldn't this already create the snapshot on the storage layer? and didn't we just hardlink + reopen + unlink to transform the previous current volume into the snap volume?
should this maybe have been vdisk_alloc and it just works by accident?
> + my $new_file_blockdev = generate_file_blockdev($storecfg, $drive, $new_current_file_nodename);
> + my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_current_fmt_nodename, $new_file_blockdev);
> +
> + $new_fmt_blockdev->{backing} = undef;
generate_format_blockdev doesn't set backing? maybe this should be converted into an assertion?
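If this becomes an assertion, note that the explicit `undef` probably has to stay. It is encoded as JSON `null`, and for qcow2 `"backing": null` tells QEMU not to open any backing file. Omitting the key instead makes QEMU follow the backing file recorded in the image. QEMU's `blockdev-snapshot` documentation requires the overlay to have no current backing file and suggests exactly this. Here is a sketch with a hypothetical helper name:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical: assert that the generator did not set a backing node, then
# keep the key with an undef value so it is sent as JSON null.
sub assert_and_disable_backing {
    my ($blockdev) = @_;
    die "generate_format_blockdev unexpectedly set backing for $blockdev->{'node-name'}\n"
        if defined($blockdev->{backing});
    $blockdev->{backing} = undef;    # "backing": null => do not open a backing chain
    return $blockdev;
}

my $bd = assert_and_disable_backing({ 'node-name' => 'fmt-drive-scsi0-new', driver => 'qcow2' });
my $json = JSON::PP->new->canonical->encode($bd);
print "$json\n";
```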
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$new_fmt_blockdev);
> + mon_cmd($vmid, 'blockdev-snapshot', node => $format_nodename, overlay => $new_current_fmt_nodename);
> +}
> +
> +sub blockdev_snap_rename {
> + my ($storecfg, $vmid, $deviceid, $drive, $src_path, $target_path) = @_;
I think this whole thing needs more error handling and thought about how to recover from various points failing.. there's also quite some overlap with blockdev_current_rename, I wonder whether it would be possible to simplify the code further by merging the two? but see below, I think we can even get away with dropping this altogether if we switch from block-commit to block-stream..
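For the error handling, one generic option is sketched below; it is not existing PVE code, and `run_with_rollback` is hypothetical. The idea is to register an undo action after each step that succeeded, then run those actions in reverse order if a later step dies.

Some steps here have no clean undo, for example the `unlink` after a successful reopen. Those steps would have to come last.

```perl
use strict;
use warnings;

# Each step is [ do_sub, optional undo_sub ]; undo actions of completed steps
# run in reverse order if a later step dies, then the error is rethrown.
sub run_with_rollback {
    my (@steps) = @_;
    my @undo;
    eval {
        for my $step (@steps) {
            my ($do, $undo) = @$step;
            $do->();
            unshift @undo, $undo if $undo;
        }
    };
    my $err = $@;
    return if !$err;
    for my $undo (@undo) {
        eval { $undo->() };
        warn "rollback step failed: $@" if $@;
    }
    die $err;
}

my @log;
eval {
    run_with_rollback(
        [ sub { push @log, 'link' },         sub { push @log, 'unlink' } ],
        [ sub { push @log, 'blockdev-add' }, sub { push @log, 'blockdev-del' } ],
        [ sub { die "blockdev-reopen failed\n" } ],
    );
};
my $error = $@;
print "error: $error";
print "@log\n";    # prints "link blockdev-add blockdev-del unlink"
```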
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> +
> + #copy the original drive param and change target file
> + my $target_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
> + my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
> +
> + my $src_fmt_node = find_blockdev_node($nodes, $src_path, 'fmt');
> + my $src_fmt_nodename = $src_fmt_node->{'node-name'};
> + my $src_file_node = find_blockdev_node($nodes, $src_path, 'file');
> + my $src_file_nodename = $src_file_node->{'node-name'};
> +
> + #untaint
> + if ($src_path =~ m/^(\S+)$/) {
> + $src_path = $1;
> + }
> + if ($target_path =~ m/^(\S+)$/) {
> + $target_path = $1;
> + }
shouldn't that have happened in the storage plugin?
> +
> + #create a hardlink
> + link($src_path, $target_path);
should this maybe be done by the storage plugin?
> +
> + #add new format blockdev
> + my $read_only = 1;
> + my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
> + $target_file_blockdev->{filename} = $target_path;
> + my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_fmt_nodename, $target_file_blockdev, $read_only);
> +
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
> +
> + #reopen the parent node with different backing file
> + my $parent_fmt_node = find_parent_node($nodes, $src_path);
> + my $parent_fmt_nodename = $parent_fmt_node->{'node-name'};
> + my $parent_path = $parent_fmt_node->{file};
> + my $parent_file_node = find_blockdev_node($nodes, $parent_path, 'file');
> + my $parent_file_nodename = $parent_file_node->{'node-name'};
> + my $filenode_exist = 1;
> + $read_only = $parent_fmt_node->{ro};
> + my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_fmt_nodename, $parent_file_nodename, $read_only);
> + $parent_fmt_blockdev->{backing} = $target_fmt_nodename;
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$parent_fmt_blockdev]);
> +
> + #change backing-file in qcow2 metadatas
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'change-backing-file', device => $deviceid, 'image-node-name' => $parent_fmt_nodename, 'backing-file' => $target_path);
> +
> + # fileblockdev seem to be autoremoved, if it have been created online, but not if they are created at start with command line
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_file_nodename) };
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_fmt_nodename) };
> +
> + #delete old $path link
> + unlink($src_path);
and this
> +
> + #rename underlay
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvrename $src_path to $target_path\n";
> + run_command(
> + ['/sbin/lvrename', $src_path, $target_path],
> + errmsg => "lvrename $src_path to $target_path error",
> + );
> + }
and this as well?
> +}
> +
> +sub blockdev_current_rename {
> + my ($storecfg, $vmid, $deviceid, $drive, $path, $target_path, $skip_underlay) = @_;
> + ## rename current running image
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> + my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
here we could already incorporate the snapshot name, since we know it?
> +
> + my $file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
> + $file_blockdev->{filename} = $target_path;
> +
> + my $format_node = find_blockdev_node($nodes, $path, 'fmt');
then we'd know this is always the "current" node, however we deterministically name it?
> + my $format_nodename = $format_node->{'node-name'};
> +
> + my $file_node = find_blockdev_node($nodes, $path, 'file');
same here
> + my $file_nodename = $file_node->{'node-name'};
> +
> + my $backingfile = $format_node->{image}->{'backing-filename'};
> + my $backing_node = $backingfile ? find_blockdev_node($nodes, $backingfile, 'fmt') : undef;
> +
> + #create a hardlink
> + link($path, $target_path);
this
> + #add new file blockdev
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$file_blockdev);
> +
> + #reopen the current fmt nodename with a new file nodename
> + my $reopen_blockdev = generate_format_blockdev($storecfg, $drive, $format_nodename, $target_file_nodename);
> + $reopen_blockdev->{backing} = $backing_node->{'node-name'};
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$reopen_blockdev]);
> +
> + # delete old file blockdev
> + # seem that the old file block is autoremoved after reopen if the file nodename is autogenerad with #block ?
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_nodename) };
> +
> + unlink($path);
> +
and this should be done by the storage layer I think? how does this interact with LVM? would we maybe want to mknod instead of hardlinking the device node? did you try whether a plain rename would also work (not sure - qemu already has an open FD to the file/blockdev, but I am not sure how LVM handles this ;))?
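For regular files, rename() only changes the directory entry. A descriptor that QEMU already holds keeps pointing at the same inode. A minimal Perl sketch showing this on a temporary directory with dummy file names. It does not answer the LVM side (device nodes, lvrename), which would need to be tested separately:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Open a file, rename it while the handle is still open, and check that the
# handle still reads the same data (same inode) through the old descriptor.
sub rename_keeps_open_fd {
    my $dir = tempdir(CLEANUP => 1);
    my ($old, $new) = ("$dir/current.qcow2", "$dir/snap-test.qcow2");  # dummy names

    open(my $wfh, '>', $old) or die "open $old: $!\n";
    print $wfh "payload\n";
    close($wfh) or die "close $old: $!\n";

    open(my $fh, '<', $old) or die "open $old: $!\n";
    rename($old, $new) or die "rename: $!\n";

    my $line = <$fh>;    # read through the descriptor opened before the rename
    close($fh);
    return ($line, -e $old ? 1 : 0, -e $new ? 1 : 0);
}

my ($line, $old_exists, $new_exists) = rename_keeps_open_fd();
print "read: $line";                  # read: payload
print "old exists: $old_exists\n";    # old exists: 0
print "new exists: $new_exists\n";    # new exists: 1
```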
> + #skip_underlay: lvm will be renamed later in Storage::volume_snaphot
> + return if $skip_underlay;
> +
> + #rename underlay
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvrename $path to $target_path\n";
> + run_command(
> + ['/sbin/lvrename', $path, $target_path],
> + errmsg => "lvrename $path to $target_path error",
> + );
> + }
> +}
> +
> +sub blockdev_commit {
see comments below for qemu_volume_snapshot_delete, I think this..
> + my ($storecfg, $vmid, $deviceid, $drive, $top_path, $base_path) = @_;
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> +
> + #untaint
> + if ($top_path =~ m/^(\S+)$/) {
> + $top_path = $1;
> + }
> +
> + print "block-commit top:$top_path to base:$base_path\n";
> + my $job_id = "commit-$deviceid";
> + my $jobs = {};
> +
> + my $base_node = find_blockdev_node($nodes, $base_path, 'fmt');
> + my $top_node = find_blockdev_node($nodes, $top_path, 'fmt');
> +
> + my $options = { 'job-id' => $job_id, device => $deviceid };
> + $options->{'top-node'} = $top_node->{'node-name'};
> + $options->{'base-node'} = $base_node->{'node-name'};
> +
> +
> + mon_cmd($vmid, 'block-commit', %$options);
> + $jobs->{$job_id} = {};
> +
> + qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'commit');
> +
> + #remove fmt-blockdev, file-blockdev && file
> + my $fmt_node = find_blockdev_node($nodes, $top_path, 'fmt');
> + my $fmt_nodename = $fmt_node->{'node-name'};
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $fmt_nodename) };
> +
> + my $file_node = find_blockdev_node($nodes, $top_path, 'file');
> + my $file_nodename = $file_node->{'node-name'};
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_nodename) };
> +
> +
> +
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvremove $top_path\n";
> + run_command(
> + ['/sbin/lvremove', '-f', $top_path],
> + errmsg => "lvremove $top_path",
> + );
> + } else {
> + unlink($top_path);
> + }
> +
> +}
> +
> +sub blockdev_live_commit {
and this can be replaced altogether with blockdev_stream..
> + my ($storecfg, $vmid, $deviceid, $drive, $current_path, $snapshot_path) = @_;
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> +
> + #untaint
> + if ($current_path =~ m/^(\S+)$/) {
> + $current_path = $1;
> + }
> +
> + print "live block-commit top:$current_path to base:$snapshot_path\n";
> + my $job_id = "commit-$deviceid";
> + my $jobs = {};
> +
> + my $snapshot_node = find_blockdev_node($nodes, $snapshot_path, 'fmt');
> + my $snapshot_file_node = find_blockdev_node($nodes, $current_path, 'file');
> + my $current_node = find_blockdev_node($nodes, $current_path, 'fmt');
> +
> + my $opts = { 'job-id' => $job_id,
> + device => $deviceid,
> + 'base-node' => $snapshot_node->{'node-name'},
> + replaces => $current_node->{'node-name'}
> + };
> + mon_cmd($vmid, "block-commit", %$opts);
> + $jobs->{$job_id} = {};
> +
> + qemu_drive_mirror_monitor ($vmid, undef, $jobs, 'complete', 0, 'commit');
> +
> + eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $current_node->{'node-name'}) };
> +
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvremove $current_path\n";
> + run_command(
> + ['/sbin/lvremove', '-f', $current_path],
> + errmsg => "lvremove $current_path",
> + );
> + } else {
> + unlink($current_path);
> + }
> +
> + return;
> +
> +}
> +
> sub qemu_volume_snapshot_delete {
> - my ($vmid, $storecfg, $volid, $snap) = @_;
> + my ($vmid, $storecfg, $drive, $snap) = @_;
>
> + my $volid = $drive->{file};
> my $running = check_running($vmid);
> my $attached_deviceid;
>
> @@ -4984,13 +5233,51 @@ sub qemu_volume_snapshot_delete {
> });
> }
>
> - if ($attached_deviceid && do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid)) {
> - mon_cmd(
> - $vmid,
> - 'blockdev-snapshot-delete-internal-sync',
> - device => $attached_deviceid,
> - name => $snap,
> - );
> + my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid) if $running;
my + post-if is forbidden (perlsyn documents 'my $x ... if ...' as undefined behaviour), but otherwise, the check for $attached_deviceid could move into the $running condition above.
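A minimal sketch of the alternative: declare the variable unconditionally and only make the assignment conditional, with the $attached_deviceid check folded into the running check. fake_do_snapshots_with_qemu is a hypothetical stand-in for the real helper:

```perl
use strict;
use warnings;

# Hypothetical stand-in for do_snapshots_with_qemu(); 2 means external snapshots.
sub fake_do_snapshots_with_qemu { return 2 }

sub snapshot_mode {
    my ($running, $attached_deviceid) = @_;

    # Declared unconditionally; only the assignment depends on the condition.
    # "my $x = ... if $running;" would have undefined behaviour (perlsyn).
    my $do_snapshots_with_qemu;
    if ($running && $attached_deviceid) {
        $do_snapshots_with_qemu = fake_do_snapshots_with_qemu();
    }
    return $do_snapshots_with_qemu;
}

for my $case ([0, 'drive-scsi0'], [1, undef], [1, 'drive-scsi0']) {
    my $mode = snapshot_mode(@$case);
    print defined($mode) ? "mode $mode\n" : "storage layer\n";
}
```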
> + if ($attached_deviceid && $do_snapshots_with_qemu) {
> +
> + if ($do_snapshots_with_qemu == 2) {
these ifs could be collapsed as well..
> +
> + my $path = PVE::Storage::path($storecfg, $volid);
> + my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> +
> + my $snappath = $snapshots->{$snap}->{file};
> + return if !-e $snappath; #already deleted ?
> +
> + my $parentsnap = $snapshots->{$snap}->{parent};
> + my $childsnap = $snapshots->{$snap}->{child};
> +
> + my $parentpath = $snapshots->{$parentsnap}->{file} if $parentsnap;
> + my $childpath = $snapshots->{$childsnap}->{file} if $childsnap;
> +
> + #if first snapshot
> + if(!$parentsnap) {
> + print"delete first snapshot $childpath\n";
> + if($childpath eq $path) {
> + #if child is the current (last snapshot), we need to a live active-commit
wouldn't it make more sense to use block-stream to merge the contents of the to-be-deleted snapshot into the current overlay? that way we wouldn't need to rename anything, AFAICT..
see https://www.qemu.org/docs/master/interop/live-block-operations.html#brief-overview-of-live-block-qmp-primitives
> + print"commit first snapshot $snappath to current $path\n";
> + blockdev_live_commit($storecfg, $vmid, $attached_deviceid, $drive, $childpath, $snappath);
> + print" rename $snappath to $path\n";
> + blockdev_current_rename($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $path);
> + } else {
> + print"commit first snapshot $snappath to $childpath path\n";
> + blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $childpath, $snappath);
> + print" rename $snappath to $childpath\n";
> + blockdev_snap_rename($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $childpath);
same here, instead of committing from the child into the to-be-deleted snapshot, and then renaming, why not just block-stream from the to-be-deleted snapshot into the child, and then discard the snapshot that is no longer needed?
> + }
> + } else {
> + #intermediate snapshot, we just need to commit the snapshot
> + print"commit intermediate snapshot $snappath to $parentpath\n";
> + blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $parentpath, 'auto');
commit is the wrong direction though?
if we have A -> B -> C, and B is deleted, the delta previously contained in B should be merged into C, not into A?
so IMHO a simple block-stream + removal of the to-be-deleted snapshot should be the right choice here as well?
that would effectively make all the paths identical AFAICT (stream from to-be-deleted snapshot to child, followed by deletion of the no longer used volume corresponding to the deleted/streamed snapshot) and no longer require any renaming..
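A minimal sketch of the stream-based deletion described above, assuming the snapshot-info layout used in this series ({parent}, {child}, {file}, plus a 'current' entry) and a hypothetical $nodename callback that maps snapshot names to block node names. It only builds the arguments for a QMP block-stream job ('job-id', 'device' for the top node, optional 'base-node') and names the node and file to drop afterwards; it does not talk to QEMU:

```perl
use strict;
use warnings;

sub plan_snapshot_delete_by_stream {
    my ($snapshots, $snap, $deviceid, $nodename) = @_;

    my $info = $snapshots->{$snap} or die "snapshot '$snap' not found\n";
    my $child = $info->{child} or die "snapshot '$snap' has no child\n";
    my $parent = $info->{parent};

    # Stream the data of $snap into its child, so the guest-visible content
    # of the child stays the same.
    my $stream = {
        'job-id' => "stream-$deviceid",
        device => $nodename->($child),
    };
    # With a parent, the child ends up backed by the parent (A -> C);
    # without one, everything below the child is copied into it.
    $stream->{'base-node'} = $nodename->($parent) if defined($parent);

    return {
        qmp_stream => $stream,
        remove_node => $nodename->($snap),   # e.g. blockdev-del afterwards
        remove_file => $info->{file},
    };
}

my $nodename = sub { "fmt-drive-scsi0-$_[0]" };   # hypothetical naming scheme
my $snapshots = {                                 # illustrative chain A -> B -> current
    A       => { file => '/images/100/snap-A-vm-100-disk-0.qcow2', child => 'B' },
    B       => { file => '/images/100/snap-B-vm-100-disk-0.qcow2', parent => 'A', child => 'current' },
    current => { file => '/images/100/vm-100-disk-0.qcow2', parent => 'B' },
};

my $plan_b = plan_snapshot_delete_by_stream($snapshots, 'B', 'drive-scsi0', $nodename);
my $plan_a = plan_snapshot_delete_by_stream($snapshots, 'A', 'drive-scsi0', $nodename);
for my $plan ($plan_b, $plan_a) {
    my $s = $plan->{qmp_stream};
    printf "block-stream device=%s base-node=%s, then remove %s\n",
        $s->{device}, $s->{'base-node'} // '(none)', $plan->{remove_node};
}
```

The first-snapshot and intermediate cases share one code path; only the presence of 'base-node' differs, and no renaming is involved.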
> + }
> + } else {
> + mon_cmd(
> + $vmid,
> + 'blockdev-snapshot-delete-internal-sync',
> + device => $attached_deviceid,
> + name => $snap,
> + );
> + }
> } else {
> PVE::Storage::volume_snapshot_delete(
> $storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
> @@ -8066,6 +8353,8 @@ sub do_snapshots_with_qemu {
> return 1;
> }
>
> + return 2 if $scfg->{snapext} || $scfg->{type} eq 'lvm' && $volid =~ m/\.(qcow2)/;
> +
> if ($volid =~ m/\.(qcow2|qed)$/){
> return 1;
> }
> @@ -9169,6 +9458,38 @@ sub delete_ifaces_ipams_ips {
> }
> }
>
> +sub find_blockdev_node {
like I mentioned in another patch comment, this is already used by earlier patches. but if at all possible, it would be good to avoid the need for this in the first place..
> + my ($nodes, $path, $type) = @_;
> +
> + my $found_nodeid = undef;
> + my $found_node = undef;
> + for my $nodeid (keys %$nodes) {
> + my $node = $nodes->{$nodeid};
> + if ($nodeid =~ m/^$type-(\S+)$/ && $node->{file} eq $path ) {
because $path encoding might change over time/versions..
> + $found_node = $node;
> + last;
> + }
> + }
> + die "can't found nodeid for file $path\n" if !$found_node;
> + return $found_node;
> +}
> +
> +sub find_parent_node {
> + my ($nodes, $backing_path) = @_;
> +
> + my $found_nodeid = undef;
> + my $found_node = undef;
> + for my $nodeid (keys %$nodes) {
> + my $node = $nodes->{$nodeid};
> + if ($nodeid =~ m/^fmt-(\S+)$/ && $node->{backing_file} && $node->{backing_file} eq $backing_path) {
same applies here, but if we switch to block-stream, the only call site for this goes away anyway..
> + $found_node = $node;
> + last;
> + }
> + }
> + die "can't found nodeid for file $backing_path\n" if !$found_node;
> + return $found_node;
> +}
> +
> sub find_fmt_nodename_drive {
> my ($storecfg, $vmid, $drive, $nodes) = @_;
>
> --
> 2.39.5
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
@ 2025-01-09 11:57 ` Fabian Grünbichler
2025-01-13 8:53 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 11:57 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
> We need to define node names for all backing chain images,
> to be able to live-rename them with blockdev-reopen.
>
> For linked clones, we don't need to define the base image(s) chain.
> They are added automatically with #block node names.
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 34 ++++++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index dc12b38f..3a3feadf 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -1618,6 +1618,38 @@ sub generate_throttle_group {
> return $throttle_group;
> }
>
> +sub generate_backing_blockdev {
> + my ($storecfg, $snapshots, $deviceid, $drive, $id) = @_;
> +
> + my $snapshot = $snapshots->{$id};
> + my $order = $snapshot->{order};
> + my $parentid = $snapshot->{parent};
> + my $snap_fmt_nodename = "fmt-$deviceid-$order";
> + my $snap_file_nodename = "file-$deviceid-$order";
would it make sense to use the snapshot name here instead of the order? that would allow a deterministic mapping even when snapshots are removed..
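A sketch of a deterministic naming helper keyed on the snapshot name rather than {order}. It assumes QEMU's node-name rules (start with a letter; only letters, digits, '-', '.', '_'; at most 31 characters) and falls back to a truncated SHA-1 when the readable name would be too long. snapshot_nodename is a hypothetical name:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Assumption: QEMU rejects node names of 32 characters or more.
my $MAX_NODENAME_LEN = 31;

# Same device + snapshot name always gives the same node name, independent
# of how many other snapshots exist or were removed.
sub snapshot_nodename {
    my ($type, $deviceid, $snapname) = @_;

    my $name = "$type-$deviceid-$snapname";
    return $name if length($name) <= $MAX_NODENAME_LEN;

    # Too long: keep the type prefix (starts with a letter) plus a hash.
    my $prefix = "$type-";
    my $hash = sha1_hex("$deviceid:$snapname");
    return $prefix . substr($hash, 0, $MAX_NODENAME_LEN - length($prefix));
}

print snapshot_nodename('fmt', 'drive-scsi0', 'before_upgrade'), "\n";
print snapshot_nodename('file', 'drive-scsi0', 'a_very_long_snapshot_name_for_testing'), "\n";
```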
> +
> + my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap_file_nodename);
> + $snap_file_blockdev->{filename} = $snapshot->{file};
> + my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_fmt_nodename, $snap_file_blockdev, 1);
> + $snap_fmt_blockdev->{backing} = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
> + return $snap_fmt_blockdev;
> +}
> +
> +sub generate_backing_chain_blockdev {
> + my ($storecfg, $deviceid, $drive) = @_;
> +
> + my $volid = $drive->{file};
> + my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid);
> + return if !$do_snapshots_with_qemu || $do_snapshots_with_qemu != 2;
> +
> + my $chain_blockdev = undef;
> + PVE::Storage::activate_volumes($storecfg, [$volid]);
> + #should we use qemu config to list snapshots ?
from a data consistency PoV, trusting the qcow2 metadata is probably safer.. but we could check that the storage and the config agree, and error out otherwise?
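A sketch of such a cross-check, assuming $chain_info has the volume_snapshot_info layout from this series and that the caller passes the snapshot names from the VM config that are expected to contain this disk. A disk added after a snapshot was taken is not part of that snapshot, so a real implementation has to filter for that first:

```perl
use strict;
use warnings;

sub check_snapshot_consistency {
    my ($chain_info, $config_snapnames) = @_;

    # Snapshot names present in the qcow2 chain, ignoring the active image.
    my %in_chain = map { $_ => 1 } grep { $_ ne 'current' } keys %$chain_info;
    my %in_config = map { $_ => 1 } @$config_snapnames;

    my @only_chain = sort { $a cmp $b } grep { !$in_config{$_} } keys %in_chain;
    my @only_config = sort { $a cmp $b } grep { !$in_chain{$_} } keys %in_config;

    my @errors;
    push @errors, "only in qcow2 chain: @only_chain" if @only_chain;
    push @errors, "only in VM config: @only_config" if @only_config;
    die "snapshot mismatch: " . join('; ', @errors) . "\n" if @errors;

    return 1;
}

my $chain = { snap1 => {}, snap2 => {}, current => {} };
print "consistent\n" if check_snapshot_consistency($chain, ['snap1', 'snap2']);
eval { check_snapshot_consistency($chain, ['snap1', 'snap3']) };
print "error: $@" if $@;
```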
> + my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> + my $parentid = $snapshots->{'current'}->{parent};
> + $chain_blockdev = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
> + return $chain_blockdev;
> +}
> +
> sub generate_file_blockdev {
> my ($storecfg, $drive, $nodename) = @_;
>
> @@ -1816,6 +1848,8 @@ sub generate_drive_blockdev {
> my $blockdev_file = generate_file_blockdev($storecfg, $drive, $file_nodename);
> my $fmt_nodename = "fmt-drive-$drive_id";
> my $blockdev_format = generate_format_blockdev($storecfg, $drive, $fmt_nodename, $blockdev_file, $force_readonly);
> + my $backing_chain = generate_backing_chain_blockdev($storecfg, "drive-$drive_id", $drive);
> + $blockdev_format->{backing} = $backing_chain if $backing_chain;
>
> my $blockdev_live_restore = undef;
> if ($live_restore_name) {
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support Alexandre Derumier via pve-devel
@ 2025-01-09 12:36 ` Fabian Grünbichler
2025-01-10 9:10 ` DERUMIER, Alexandre via pve-devel
[not found] ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
0 siblings, 2 replies; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 12:36 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> src/PVE/Storage/DirPlugin.pm | 1 +
> src/PVE/Storage/Plugin.pm | 207 +++++++++++++++++++++++++++++------
> 2 files changed, 176 insertions(+), 32 deletions(-)
>
> diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm
> index fb23e0a..1cd7ac3 100644
> --- a/src/PVE/Storage/DirPlugin.pm
> +++ b/src/PVE/Storage/DirPlugin.pm
> @@ -81,6 +81,7 @@ sub options {
> is_mountpoint => { optional => 1 },
> bwlimit => { optional => 1 },
> preallocation => { optional => 1 },
> + snapext => { optional => 1 },
> };
> }
>
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index fececa1..aeba8d3 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -214,6 +214,11 @@ my $defaultData = {
> maximum => 65535,
> optional => 1,
> },
> + 'snapext' => {
> + type => 'boolean',
> + description => 'enable external snapshot.',
> + optional => 1,
> + },
> },
> };
>
> @@ -710,11 +715,15 @@ sub filesystem_path {
> # Note: qcow2/qed has internal snapshot, so path is always
> # the same (with or without snapshot => same file).
> die "can't snapshot this image format\n"
> - if defined($snapname) && $format !~ m/^(qcow2|qed)$/;
> + if defined($snapname) && !$scfg->{snapext} && $format !~ m/^(qcow2|qed)$/;
I am not sure if we want to allow snapshots for non-qcow2 files just because snapext is enabled? I know it's technically possible to have a raw base image and then a qcow2 backing chain on top, but this quickly becomes confusing (how is the volume named then? which format does it have in which context)..
>
> my $dir = $class->get_subdir($scfg, $vtype);
>
> - $dir .= "/$vmid" if $vtype eq 'images';
> + if ($scfg->{snapext} && $snapname) {
> + $name = $class->get_snap_volname($volname, $snapname);
> + } else {
> + $dir .= "/$vmid" if $vtype eq 'images';
> + }
>
> my $path = "$dir/$name";
>
> @@ -953,6 +962,31 @@ sub free_image {
> # TODO taken from PVE/QemuServer/Drive.pm, avoiding duplication would be nice
> my @checked_qemu_img_formats = qw(raw cow qcow qcow2 qed vmdk cloop);
>
> +sub qemu_img_info {
> + my ($filename, $file_format, $timeout, $follow_backing_files) = @_;
> +
> + my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
> + push $cmd->@*, '-f', $file_format if $file_format;
> + push $cmd->@*, '--backing-chain' if $follow_backing_files;
> +
> + my $json = '';
> + my $err_output = '';
> + eval {
> + run_command($cmd,
> + timeout => $timeout,
> + outfunc => sub { $json .= shift },
> + errfunc => sub { $err_output .= shift . "\n"},
> + );
> + };
> + warn $@ if $@;
> + if ($err_output) {
> + # if qemu did not output anything to stdout we die with stderr as an error
> + die $err_output if !$json;
> + # otherwise we warn about it and try to parse the json
> + warn $err_output;
> + }
> + return $json;
> +}
> # set $untrusted if the file in question might be malicious since it isn't
> # created by our stack
> # this makes certain checks fatal, and adds extra checks for known problems like
> @@ -1016,25 +1050,9 @@ sub file_size_info {
> warn "file_size_info: '$filename': falling back to 'raw' from unknown format '$file_format'\n";
> $file_format = 'raw';
> }
> - my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
> - push $cmd->@*, '-f', $file_format if $file_format;
>
> - my $json = '';
> - my $err_output = '';
> - eval {
> - run_command($cmd,
> - timeout => $timeout,
> - outfunc => sub { $json .= shift },
> - errfunc => sub { $err_output .= shift . "\n"},
> - );
> - };
> - warn $@ if $@;
> - if ($err_output) {
> - # if qemu did not output anything to stdout we die with stderr as an error
> - die $err_output if !$json;
> - # otherwise we warn about it and try to parse the json
> - warn $err_output;
> - }
> + my $json = qemu_img_info($filename, $file_format, $timeout);
> +
> if (!$json) {
> die "failed to query file information with qemu-img\n" if $untrusted;
> # skip decoding if there was no output, e.g. if there was a timeout.
> @@ -1162,11 +1180,28 @@ sub volume_snapshot {
>
> die "can't snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
>
> - my $path = $class->filesystem_path($scfg, $volname);
> + if($scfg->{snapext}) {
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> + my $path = $class->path($scfg, $volname, $storeid);
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + my $format = ($class->parse_volname($volname))[6];
> + #rename current volume to snap volume
> + rename($path, $snappath) if -e $path && !-e $snappath;
I think this should die if the snappath already exists, and the one (IMHO wrong) call in qemu-server should switch to vdisk_alloc/alloc_image.. this is rather dangerous otherwise!
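A minimal sketch of the stricter behaviour, with hypothetical names. It assumes the caller already serializes snapshot operations on the volume, since the -e check followed by rename() is not race-free on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

sub rename_current_to_snapshot {
    my ($path, $snappath) = @_;

    # Refuse instead of silently skipping: an existing snapshot file means
    # the chain is not in the state we expect.
    die "snapshot file '$snappath' already exists\n" if -e $snappath;
    die "current image '$path' does not exist\n" if !-e $path;

    rename($path, $snappath) or die "rename '$path' -> '$snappath' failed: $!\n";
    return;
}

sub touch {
    my ($p) = @_;
    open(my $fh, '>', $p) or die "open $p: $!\n";
    close($fh) or die "close $p: $!\n";
}

my $dir = tempdir(CLEANUP => 1);
my $path = "$dir/vm-100-disk-0.qcow2";
my $snappath = "$dir/snap-s1-vm-100-disk-0.qcow2";

touch($path);
rename_current_to_snapshot($path, $snappath);    # first snapshot 's1' succeeds

touch($path);                                    # stands in for the new overlay
eval { rename_current_to_snapshot($path, $snappath) };    # reusing 's1' must fail
print "second attempt: $@";
```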
> + my $cmd = ['/usr/bin/qemu-img', 'create', '-b', $snappath,
> + '-F', $format, '-f', 'qcow2', $path];
> +
> + my $options = "extended_l2=on,cluster_size=128k,";
> + $options .= preallocation_cmd_option($scfg, 'qcow2');
> + push @$cmd, '-o', $options;
> + run_command($cmd);
>
> - run_command($cmd);
> + } else {
> +
> + my $path = $class->filesystem_path($scfg, $volname);
> + my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
> @@ -1177,6 +1212,21 @@ sub volume_snapshot {
> sub volume_rollback_is_possible {
> my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
>
> + if ($scfg->{snapext}) {
> + #technically, we could manage multibranch, we it need lot more work for snapshot delete
> + #we need to implemente block-stream from deleted snapshot to all others child branchs
see my comments in qemu-server - I think we actually want block-stream anyway, since it has the semantics we want..
> + #when online, we need to do a transaction for multiple disk when delete the last snapshot
> + #and need to merge in current running file
> +
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $parentsnap = $snapshots->{current}->{parent};
> +
> + return 1 if !-e $snappath || $snapshots->{$parentsnap}->{file} eq $snappath;
why do we return 1 here if the snapshot doesn't exist? if we only allow rollback to the most recent snapshot for now, then we could just query the current path and see if it is backed by our snapshot?
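A sketch of that simpler check, assuming the snapshot-info layout from this series. A snapshot that does not exist, or is not the direct parent of 'current', is rejected instead of accepted:

```perl
use strict;
use warnings;

sub volume_rollback_is_possible_sketch {
    my ($snapshots, $snap, $volname) = @_;

    # Only the snapshot that directly backs the current image qualifies.
    my $parent = $snapshots->{current}->{parent};
    return 1 if defined($parent) && $parent eq $snap;

    die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
}

my $snapshots = {
    s1      => { child => 's2' },
    s2      => { parent => 's1', child => 'current' },
    current => { parent => 's2' },
};
print volume_rollback_is_possible_sketch($snapshots, 's2', 'vm-100-disk-0.qcow2'), "\n";
eval { volume_rollback_is_possible_sketch($snapshots, 's1', 'vm-100-disk-0.qcow2') };
print $@;
```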
> +
> + die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> + }
> +
> return 1;
> }
>
> @@ -1187,9 +1237,15 @@ sub volume_snapshot_rollback {
>
> my $path = $class->filesystem_path($scfg, $volname);
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> -
> - run_command($cmd);
> + if ($scfg->{snapext}) {
> + #simply delete the current snapshot and recreate it
> + my $path = $class->filesystem_path($scfg, $volname);
> + unlink($path);
> + $class->volume_snapshot($scfg, $storeid, $volname, $snap);
> + } else {
> + my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
> @@ -1201,13 +1257,52 @@ sub volume_snapshot_delete {
>
> return 1 if $running;
>
> + my $cmd = "";
> my $path = $class->filesystem_path($scfg, $volname);
>
> - $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> + if ($scfg->{snapext}) {
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $snappath = $snapshots->{$snap}->{file};
> + return if !-e $snappath; #already deleted ?
shouldn't this be an error?
> +
> + my $parentsnap = $snapshots->{$snap}->{parent};
> + my $childsnap = $snapshots->{$snap}->{child};
> +
> + my $parentpath = $snapshots->{$parentsnap}->{file} if $parentsnap;
> + my $childpath = $snapshots->{$childsnap}->{file} if $childsnap;
> +
> +
> + #if first snapshot, we merge child, and rename the snapshot to child
> + if(!$parentsnap) {
> + #we use commit here, as it's faster than rebase
> + #https://lists.gnu.org/archive/html/qemu-discuss/2019-08/msg00041.html
> + print"commit $childpath\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
> + run_command($cmd);
> + print"delete $childpath\n";
> +
> + unlink($childpath);
this unlink can be skipped?
> + print"rename $snappath to $childpath\n";
> + rename($snappath, $childpath);
since this will overwrite $childpath anyway.. this also reduces the chance of something going wrong:
- if the commit fails halfway through, nothing bad should have happened, other than some data is now stored in two snapshots and takes up extra space
- if the rename fails, then all of the data of $snap is stored twice, but the backing chain is still valid
notably, there is no longer a gap where $childpath doesn't exist, which would break the backing chain!
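POSIX rename() atomically replaces an existing target, which is why the unlink before it is unnecessary and why dropping it closes the gap. A minimal Perl demonstration with dummy files standing in for the qcow2 images (file names are hypothetical):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

sub write_file {
    my ($path, $content) = @_;
    open(my $fh, '>', $path) or die "open $path: $!\n";
    print $fh $content;
    close($fh) or die "close $path: $!\n";
}

sub read_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "open $path: $!\n";
    local $/;    # slurp mode
    my $content = <$fh>;
    close($fh);
    return $content;
}

my $dir = tempdir(CLEANUP => 1);
my ($snappath, $childpath) = ("$dir/snap-s1.qcow2", "$dir/child.qcow2");
write_file($snappath, "snapshot with committed delta\n");
write_file($childpath, "old child\n");

# No unlink($childpath) first: rename() replaces the target atomically,
# so $childpath never disappears.
rename($snappath, $childpath) or die "rename failed: $!\n";

my $result = read_file($childpath);
print $result;    # snapshot with committed delta
```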
> + } else {
> + print"commit $snappath\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
leftover from previous version? not used/overwritten below ;)
> + #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
> + die "missing parentsnap snapshot to rebase child $childpath\n" if !$parentpath;
> + print "link $childsnap to $parentsnap\n";
> + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath];
does this work? I would read the qemu-img manpage to say that '-u' is for when you've moved/converted the backing file, and want to update the reference in its overlay, and that it doesn't copy any data.. but we need to copy the data from $snap to $childpath (we just want to delete the snapshot, we don't want to drop all its changes from the history, that would corrupt the contents of the image).
note the description of the "safe" variant:
" This is the default mode and performs a real rebase operation. The new backing file may differ from the old one and qemu-img rebase will take care of keeping the
guest-visible content of FILENAME unchanged."
IMHO this is the behaviour we need here?
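Per the qemu-img manual, rebase without -u is the safe mode that keeps the guest-visible content of the overlay unchanged, while -u only rewrites the backing file reference. A sketch of building both variants as run_command-style argument lists (rebase_cmd and the paths are hypothetical):

```perl
use strict;
use warnings;

sub rebase_cmd {
    my ($childpath, $parentpath, $unsafe) = @_;

    my $cmd = ['/usr/bin/qemu-img', 'rebase'];
    # -u (unsafe): only rewrite the backing reference, no data is copied.
    # Without it (default, safe mode) qemu-img copies what is needed so the
    # guest-visible content of $childpath stays unchanged.
    push @$cmd, '-u' if $unsafe;
    push @$cmd, '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath;
    return $cmd;
}

my $cmd = rebase_cmd(
    '/var/lib/vz/images/100/vm-100-disk-0.qcow2',            # child overlay
    '/var/lib/vz/images/100/snap-s1-vm-100-disk-0.qcow2',    # parent of the deleted snapshot
    0,                                                       # safe mode
);
print join(' ', @$cmd), "\n";
```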
> + run_command($cmd);
> + #delete the snapshot
> + unlink($snappath);
> + }
> +
> + } else {
> + $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
>
> - run_command($cmd);
> + $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
> @@ -1246,8 +1341,8 @@ sub volume_has_feature {
> current => { qcow2 => 1, raw => 1, vmdk => 1 },
> },
> rename => {
> - current => {qcow2 => 1, raw => 1, vmdk => 1},
> - },
> + current => { qcow2 => 1, raw => 1, vmdk => 1},
> + }
nit: unrelated change?
> };
>
> if ($feature eq 'clone') {
> @@ -1481,7 +1576,37 @@ sub status {
> sub volume_snapshot_info {
> my ($class, $scfg, $storeid, $volname) = @_;
>
> - die "volume_snapshot_info is not implemented for $class";
should this be guarded with $snapext being enabled?
> + my $path = $class->filesystem_path($scfg, $volname);
> +
> + my $backing_chain = 1;
> + my $json = qemu_img_info($path, undef, 10, $backing_chain);
> + die "failed to query file information with qemu-img\n" if !$json;
> + my $snapshots = eval { decode_json($json) };
> +
> + my $info = {};
> + my $order = 0;
> + for my $snap (@$snapshots) {
> +
> + my $snapfile = $snap->{filename};
> + my $snapname = parse_snapname($snapfile);
> + $snapname = 'current' if !$snapname;
> + my $snapvolname = $class->get_snap_volname($volname, $snapname);
> +
> + $info->{$snapname}->{order} = $order;
> + $info->{$snapname}->{file}= $snapfile;
> + $info->{$snapname}->{volname} = $snapvolname;
> + $info->{$snapname}->{volid} = "$storeid:$snapvolname";
> + $info->{$snapname}->{ext} = 1;
> +
> + my $parentfile = $snap->{'backing-filename'};
> + if ($parentfile) {
> + my $parentname = parse_snapname($parentfile);
> + $info->{$snapname}->{parent} = $parentname;
> + $info->{$parentname}->{child} = $snapname;
> + }
> + $order++;
> + }
> + return $info;
> }
>
> sub activate_storage {
> @@ -1867,4 +1992,22 @@ sub config_aware_base_mkdir {
> }
> }
>
> +sub get_snap_volname {
> + my ($class, $volname, $snapname) = @_;
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
> + $name = !$snapname || $snapname eq 'current' ? $volname : "$vmid/snap-$snapname-$name";
> + return $name;
> +}
> +
> +sub parse_snapname {
> + my ($name) = @_;
> +
> + my $basename = basename($name);
> + if ($basename =~ m/^snap-(.*)-vm(.*)$/) {
> + return $1;
> + }
> + return undef;
> +}
> +
> 1;
> --
> 2.39.5
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
2025-01-09 11:57 ` Fabian Grünbichler
@ 2025-01-09 13:19 ` Fabio Fantoni via pve-devel
2025-01-20 13:44 ` DERUMIER, Alexandre via pve-devel
[not found] ` <3307ec388a763510ec78f97ed9f0de00c87d54b5.camel@groupe-cyllene.com>
2025-01-13 10:08 ` DERUMIER, Alexandre via pve-devel
[not found] ` <0ae72889042e006d9202e837aac7ecf2b413e1b4.camel@groupe-cyllene.com>
2 siblings, 2 replies; 68+ messages in thread
From: Fabio Fantoni via pve-devel @ 2025-01-09 13:19 UTC (permalink / raw)
To: Proxmox VE development discussion, Fabian Grünbichler; +Cc: Fabio Fantoni
From: Fabio Fantoni <fabio.fantoni@m2r.biz>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>, "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Thu, 9 Jan 2025 14:19:38 +0100
Message-ID: <483058af-44d9-441c-98df-fd7150184ebe@m2r.biz>
On 09/01/2025 12:57, Fabian Grünbichler wrote:
>> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
> it would be great if there'd be a summary of the design choices and a high level summary of what happens to the files and block-node-graph here. it's a bit hard to judge from the code below whether it would be possible to eliminate the dynamically named block nodes, for example ;)
>
> a few more comments documenting the behaviour and ideally also some tests (mocking the QMP interactions?) would be nice
@Alexandre Derumier: Thanks for adding external snapshot support. I have
not looked at the implementation in detail because I do not have enough
time, but I think external snapshot support would be useful.
I used them outside of Proxmox years ago, on Debian servers with VMs
managed by libvirt. I managed external snapshots completely manually
from the CLI with multiple commands, because they were not implemented in
virt-manager. They saved a lot of time (compared to backup/restore) in
some high-risk operations on VMs with large, pre-allocated raw disks on
HDDs.
I used them sparingly and kept them only for the minimum time necessary
for delicate maintenance operations. If something unforeseen happened, I
returned to the state before the snapshot, deleted the external snapshot
and created another one to try again; if everything went fine, I
committed the snapshot at the end and went back to using only the
pre-allocated raw image. Under heavy disk I/O, as in the operations I was
doing, the performance drop with external qcow2 snapshots compared to
plain pre-allocated raw disks was huge, if I remember correctly (which is
why I kept them for as short a time as possible).
If it hasn't already been planned, I think it would be useful to warn
users (at least in the documentation) so that they do not underestimate
the possible performance impact (especially if they use pre-allocated raw
disks on HDDs for better performance and minimal fragmentation), and do
not use them, or keep them for a long time, without a real need. Another
important thing to tell users about is the increase in space usage (again
mainly for those used to pre-allocated disks, where space usage normally
does not grow).
From a quick look, I don't see a way to use them on raw disks (file-based)
in this implementation, or am I wrong? If so, why? I think the main use
case would be exactly that, where you don't have snapshot support by
default.
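At the qemu-img level, a qcow2 overlay can sit on top of a raw image (Fabian notes the same for the snapext path in the storage patch; the open questions there are volume naming and which format is reported). A sketch of the underlying invocation as a run_command-style argument list; overlay_on_raw_cmd and the file names are hypothetical:

```perl
use strict;
use warnings;

# Build the command creating a qcow2 overlay on top of an existing raw image.
# '-F raw' records the backing format explicitly in the overlay header.
sub overlay_on_raw_cmd {
    my ($raw_base, $overlay) = @_;
    return ['/usr/bin/qemu-img', 'create', '-f', 'qcow2', '-b', $raw_base, '-F', 'raw', $overlay];
}

my $cmd = overlay_on_raw_cmd(
    '/var/lib/vz/images/100/vm-100-disk-0.raw',
    '/var/lib/vz/images/100/vm-100-disk-0-overlay.qcow2',
);
print join(' ', @$cmd), "\n";
```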
* Re: [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
@ 2025-01-09 13:55 ` Fabian Grünbichler
2025-01-10 10:16 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 13:55 UTC (permalink / raw)
To: Proxmox VE development discussion
one downside with this part in particular - we have to always allocate full-size LVs (+qcow2 overhead), even if most of them will end up storing just a single snapshot delta which might be a tiny part of that full-size.. hopefully if discard is working across the whole stack this doesn't actually explode space usage on the storage side, but it makes everything a bit hard to track.. OTOH, while we could in theory extend/reduce the LVs and qcow2 images on them when modifying the backing chain, the additional complexity is probably not worth it at the moment..
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> src/PVE/Storage/LVMPlugin.pm | 231 ++++++++++++++++++++++++++++++++---
> 1 file changed, 213 insertions(+), 18 deletions(-)
>
> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
> index 88fd612..1257cd3 100644
> --- a/src/PVE/Storage/LVMPlugin.pm
> +++ b/src/PVE/Storage/LVMPlugin.pm
> @@ -4,6 +4,7 @@ use strict;
> use warnings;
>
> use IO::File;
> +use POSIX qw/ceil/;
>
> use PVE::Tools qw(run_command trim);
> use PVE::Storage::Plugin;
> @@ -216,6 +217,7 @@ sub type {
> sub plugindata {
> return {
> content => [ {images => 1, rootdir => 1}, { images => 1 }],
> + format => [ { raw => 1, qcow2 => 1 } , 'raw' ],
I wonder if we want to guard the snapshotting-related parts below with an additional "snapext" option here as well? or even the usage of qcow2 altogether?
> };
> }
>
> @@ -291,7 +293,10 @@ sub parse_volname {
> PVE::Storage::Plugin::parse_lvm_name($volname);
>
> if ($volname =~ m/^(vm-(\d+)-\S+)$/) {
> - return ('images', $1, $2, undef, undef, undef, 'raw');
> + my $name = $1;
> + my $vmid = $2;
> + my $format = $volname =~ m/\.qcow2$/ ? 'qcow2' : 'raw';
> + return ('images', $name, $vmid, undef, undef, undef, $format);
> }
>
> die "unable to parse lvm volume name '$volname'\n";
> @@ -300,11 +305,13 @@ sub parse_volname {
> sub filesystem_path {
> my ($class, $scfg, $volname, $snapname) = @_;
>
> - die "lvm snapshot is not implemented"if defined($snapname);
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
>
> - my ($vtype, $name, $vmid) = $class->parse_volname($volname);
> + die "snapshot is working with qcow2 format only" if defined($snapname) && $format ne 'qcow2';
>
> my $vg = $scfg->{vgname};
> + $name = $class->get_snap_volname($volname, $snapname) if $snapname;
>
> my $path = "/dev/$vg/$name";
>
> @@ -332,7 +339,9 @@ sub find_free_diskname {
>
> my $disk_list = [ keys %{$lvs->{$vg}} ];
>
> - return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, undef, $scfg);
> + $add_fmt_suffix = $fmt eq 'qcow2' ? 1 : undef;
> +
> + return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, $fmt, $scfg, $add_fmt_suffix);
> }
>
> sub lvcreate {
> @@ -363,7 +372,15 @@ sub lvrename {
> sub alloc_image {
> my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
>
> - die "unsupported format '$fmt'" if $fmt ne 'raw';
> + die "unsupported format '$fmt'" if $fmt !~ m/(raw|qcow2)/;
> +
> + $name = $class->alloc_new_image($storeid, $scfg, $vmid, $fmt, $name, $size);
> + $class->format_qcow2($storeid, $scfg, $name, $size) if $fmt eq 'qcow2';
> + return $name;
> +}
> +
> +sub alloc_new_image {
> + my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
>
> die "illegal name '$name' - should be 'vm-$vmid-*'\n"
> if $name && $name !~ m/^vm-$vmid-/;
> @@ -376,16 +393,45 @@ sub alloc_image {
>
> my $free = int($vgs->{$vg}->{free});
>
> +
> + #add extra space for qcow2 metadatas
> + #without sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size
> + #with sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size / 16
> + #4MB overhead for 1TB with extented l2 clustersize=128k
> +
> + my $qcow2_overhead = ceil($size/1024/1024/1024) * 4096;
there's "qemu-img measure", which seems like it would do exactly what we want ;)
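A minimal sketch of how `qemu-img measure` could replace the hand-computed overhead, assuming the LV should be sized to the JSON field `fully-allocated` (data plus all metadata, matching the full-size LVs this patch allocates). The helper names `qcow2_measure_cmd` and `lv_bytes_from_measure` are hypothetical, the 128k cluster size is taken from the patch comment, and the helpers work in bytes (the patch's `alloc_new_image` works in KiB, so a caller would multiply by 1024). Actually running the command, e.g. capturing stdout through `run_command`'s `outfunc`, is left out so the parsing can be checked on its own.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: build the qemu-img measure invocation for a new,
# empty qcow2 image of $size_bytes. extended_l2=on mirrors the patch;
# cluster_size=128k is an assumption taken from the patch comment.
sub qcow2_measure_cmd {
    my ($size_bytes) = @_;
    return [
        '/usr/bin/qemu-img', 'measure',
        '--output=json',
        '-O', 'qcow2',
        '-o', 'extended_l2=on,cluster_size=128k',
        '--size', $size_bytes,
    ];
}

# Hypothetical helper: pick the LV size out of measure's JSON output.
# "fully-allocated" covers data plus metadata once every cluster is
# written, which is what a full-size LV has to hold.
sub lv_bytes_from_measure {
    my ($json_text) = @_;
    my $info = JSON::PP::decode_json($json_text);
    my $bytes = $info->{'fully-allocated'};
    die "unexpected qemu-img measure output\n" if !defined($bytes);
    return $bytes;
}
```

Whether the reported value matches what the LV needs with `extended_l2=on` is something the tests against real `qemu-img` output would have to confirm.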
> +
> + my $lvmsize = $size;
> + $lvmsize += $qcow2_overhead if $fmt eq 'qcow2';
> +
> die "not enough free space ($free < $size)\n" if $free < $size;
>
> - $name = $class->find_free_diskname($storeid, $scfg, $vmid)
> + $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt)
> if !$name;
>
> - lvcreate($vg, $name, $size, ["pve-vm-$vmid"]);
> -
> + my $tags = ["pve-vm-$vmid"];
> + push @$tags, "\@pve-$name" if $fmt eq 'qcow2';
that's a creative way to avoid the need to discover and activate snapshots one by one below, but it might warrant a comment ;)
> + lvcreate($vg, $name, $lvmsize, $tags);
> return $name;
> }
>
> +sub format_qcow2 {
> + my ($class, $storeid, $scfg, $name, $size, $backing_file) = @_;
> +
> + # activate volume
> + $class->activate_volume($storeid, $scfg, $name, undef, {});
> + my $path = $class->path($scfg, $name, $storeid);
> + # create the qcow2 fs
> + my $cmd = ['/usr/bin/qemu-img', 'create'];
> + push @$cmd, '-b', $backing_file, '-F', 'qcow2' if $backing_file;
> + push @$cmd, '-f', 'qcow2', $path;
> + push @$cmd, "${size}K" if $size;
> + my $options = "extended_l2=on,";
> + $options .= PVE::Storage::Plugin::preallocation_cmd_option($scfg, 'qcow2');
> + push @$cmd, '-o', $options;
> + run_command($cmd);
> +}
> +
> sub free_image {
> my ($class, $storeid, $scfg, $volname, $isBase) = @_;
>
> @@ -536,6 +582,12 @@ sub activate_volume {
>
> my $lvm_activate_mode = 'ey';
>
> + #activate volume && all snapshots volumes by tag
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + $path = "\@pve-$name" if $format eq 'qcow2';
> +
> my $cmd = ['/sbin/lvchange', "-a$lvm_activate_mode", $path];
> run_command($cmd, errmsg => "can't activate LV '$path'");
> $cmd = ['/sbin/lvchange', '--refresh', $path];
> @@ -548,6 +600,10 @@ sub deactivate_volume {
> my $path = $class->path($scfg, $volname, $storeid, $snapname);
> return if ! -b $path;
>
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> + $path = "\@pve-$name" if $format eq 'qcow2';
> +
> my $cmd = ['/sbin/lvchange', '-aln', $path];
> run_command($cmd, errmsg => "can't deactivate LV '$path'");
> }
> @@ -555,15 +611,27 @@ sub deactivate_volume {
> sub volume_resize {
> my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
>
> - $size = ($size/1024/1024) . "M";
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + my $lvmsize = $size / 1024;
> + my $qcow2_overhead = ceil($size/1024/1024/1024/1024) * 4096;
see above
> + $lvmsize += $qcow2_overhead if $format eq 'qcow2';
> + $lvmsize = "${lvmsize}k";
>
> my $path = $class->path($scfg, $volname);
> - my $cmd = ['/sbin/lvextend', '-L', $size, $path];
> + my $cmd = ['/sbin/lvextend', '-L', $lvmsize, $path];
>
> $class->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
> run_command($cmd, errmsg => "error resizing volume '$path'");
> });
>
> + if(!$running && $format eq 'qcow2') {
> + my $prealloc_opt = PVE::Storage::Plugin::preallocation_cmd_option($scfg, $format);
> + my $cmd = ['/usr/bin/qemu-img', 'resize', "--$prealloc_opt", '-f', $format, $path , $size];
> + run_command($cmd, timeout => 10);
> + }
> +
> return 1;
> }
>
> @@ -585,30 +653,149 @@ sub volume_size_info {
> sub volume_snapshot {
> my ($class, $scfg, $storeid, $volname, $snap) = @_;
>
> - die "lvm snapshot is not implemented";
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + die "can't snapshot this image format\n" if $format ne 'qcow2';
> +
> + $class->activate_volume($storeid, $scfg, $volname, undef, {});
> +
> + my $snap_volname = $class->get_snap_volname($volname, $snap);
> + my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> + my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
> +
> + #rename current lvm volume to snap volume
> + my $vg = $scfg->{vgname};
> + print"rename $volname to $snap_volname\n";
> + eval { lvrename($vg, $volname, $snap_volname) } ;
missing error handling..
> +
> +
> + #allocate a new lvm volume
> + $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024);
missing error handling
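One possible shape for the missing error handling, sketched with each step passed in as a code ref so the control flow stands on its own. The `$ops` callbacks are hypothetical stand-ins for `lvrename`, `alloc_new_image`, `format_qcow2` and `free_image` from the patch; the idea is that once the current volume has been renamed to the snapshot name, any later failure should free the half-created new volume and rename the snapshot back, so the VM is not left without its disk.

```perl
use strict;
use warnings;

# Hypothetical sketch of volume_snapshot's steps with undo on failure.
#   $ops->{rename}->($from, $to)  stand-in for lvrename()
#   $ops->{alloc}->($name)        stand-in for alloc_new_image()
#   $ops->{format}->($name)       stand-in for format_qcow2() on top of the snapshot
#   $ops->{free}->($name)         stand-in for free_image()
sub snapshot_with_undo {
    my ($ops, $volname, $snap_volname) = @_;

    # Step 1: nothing to undo yet, so a failed rename simply propagates.
    $ops->{rename}->($volname, $snap_volname);

    # Steps 2 and 3: on failure, undo what was done so far.
    my $allocated = 0;
    eval {
        $ops->{alloc}->($volname);
        $allocated = 1;
        $ops->{format}->($volname);
    };
    if (my $err = $@) {
        if ($allocated) {
            eval { $ops->{free}->($volname) };
            warn "cleanup of '$volname' failed: $@" if $@;
        }
        eval { $ops->{rename}->($snap_volname, $volname) };
        warn "could not rename '$snap_volname' back to '$volname': $@" if $@;
        die "snapshot of '$volname' failed: $err";
    }
    return 1;
}
```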
> + eval {
> + $class->format_qcow2($storeid, $scfg, $volname, undef, $snap_path);
> + };
> +
> + if ($@) {
> + eval { $class->free_image($storeid, $scfg, $volname, 0) };
> + warn $@ if $@;
> + }
> +}
> +
> +sub volume_rollback_is_possible {
> + my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
> +
> + my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $parent_snap = $snapshots->{current}->{parent};
> +
> + return 1 if !-e $snap_path || $snapshots->{$parent_snap}->{file} eq $snap_path;
the first condition here seems wrong, see storage patch #1
> + die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> +
> + return 1;
> }
>
> +
> sub volume_snapshot_rollback {
> my ($class, $scfg, $storeid, $volname, $snap) = @_;
>
> - die "lvm snapshot rollback is not implemented";
> + die "can't rollback snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
above we only have qcow2, which IMHO makes more sense..
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + $class->activate_volume($storeid, $scfg, $volname, undef, {});
> + my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
> + my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> + #simply delete the current snapshot and recreate it
> + $class->free_image($storeid, $scfg, $volname, 0);
> + $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024);
> + $class->format_qcow2($storeid, $scfg, $volname, undef, $snap_path);
missing error handling..
> +
> + return undef;
> }
>
> sub volume_snapshot_delete {
> - my ($class, $scfg, $storeid, $volname, $snap) = @_;
> + my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
> +
> + die "can't delete snapshot for this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
same as above
> +
> + return 1 if $running;
> +
> + my $cmd = "";
> + my $path = $class->filesystem_path($scfg, $volname);
> +
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $snap_path = $snapshots->{$snap}->{file};
> + my $snap_volname = $snapshots->{$snap}->{volname};
> + return if !-e $snap_path; #already deleted ?
should maybe be a die?
> +
> + my $parent_snap = $snapshots->{$snap}->{parent};
> + my $child_snap = $snapshots->{$snap}->{child};
> +
> + my $parent_path = $snapshots->{$parent_snap}->{file} if $parent_snap;
> + my $child_path = $snapshots->{$child_snap}->{file} if $child_snap;
> + my $child_volname = $snapshots->{$child_snap}->{volname} if $child_snap;
> +
> +
> + #if first snapshot, we merge child, and rename the snapshot to child
> + if(!$parent_snap) {
> + #we use commit here, as it's faster than rebase
> + #https://lists.gnu.org/archive/html/qemu-discuss/2019-08/msg00041.html
> + print"commit $child_path\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $child_path];
> + run_command($cmd);
> + print"delete $child_volname\n";
> + $class->free_image($storeid, $scfg, $child_volname, 0);
> +
> + print"rename $snap_volname to $child_volname\n";
> + my $vg = $scfg->{vgname};
> + lvrename($vg, $snap_volname, $child_volname);
missing error handling..
> + } else {
> + print"commit $snap_path\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $snap_path];
leftover?
> + #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
> + die "missing parentsnap snapshot to rebase child $child_path\n" if !$parent_path;
> + print "link $child_snap to $parent_snap\n";
> + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parent_path, '-F', 'qcow2', '-f', 'qcow2', $child_path];
> + run_command($cmd);
same as for patch #1, I am not sure the -u here is correct..
> + #delete the snapshot
> + $class->free_image($storeid, $scfg, $snap_volname, 0);
> + }
>
> - die "lvm snapshot delete is not implemented";
> }
>
> sub volume_has_feature {
> my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
>
> my $features = {
> - copy => { base => 1, current => 1},
> - rename => {current => 1},
> + copy => {
> + base => { qcow2 => 1, raw => 1},
> + current => { qcow2 => 1, raw => 1},
> + snap => { qcow2 => 1 },
> + },
> + 'rename' => {
> + current => { qcow2 => 1, raw => 1},
> + },
> + snapshot => {
> + current => { qcow2 => 1 },
> + snap => { qcow2 => 1 },
> + },
> + template => {
> + current => { qcow2 => 1, raw => 1},
> + },
> +# don't allow to clone as we can't activate the base on multiple host at the same time
> +# clone => {
> +# base => { qcow2 => 1, raw => 1},
> +# },
I think activating the base would actually be okay, we just must never write to it? ;)
> };
>
> - my ($vtype, $name, $vmid, $basename, $basevmid, $isBase) =
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> $class->parse_volname($volname);
>
> my $key = undef;
> @@ -617,7 +804,7 @@ sub volume_has_feature {
> }else{
> $key = $isBase ? 'base' : 'current';
> }
> - return 1 if $features->{$feature}->{$key};
> + return 1 if defined($features->{$feature}->{$key}->{$format});
>
> return undef;
> }
> @@ -738,4 +925,12 @@ sub rename_volume {
> return "${storeid}:${target_volname}";
> }
>
> +sub get_snap_volname {
> + my ($class, $volname, $snapname) = @_;
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
> + $name = !$snapname || $snapname eq 'current' ? $volname : "snap-$snapname-$name";
> + return $name;
> +}
> +
> 1;
> --
> 2.39.5
* Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
2025-01-08 13:27 ` Fabian Grünbichler
@ 2025-01-10 7:55 ` DERUMIER, Alexandre via pve-devel
[not found] ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
1 sibling, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 7:55 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 13639 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
Date: Fri, 10 Jan 2025 07:55:31 +0000
Message-ID: <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
-------- Original Message --------
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>, Fiona Ebner <f.ebner@proxmox.com>
Subject: Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
Date: 08/01/2025 14:27:02
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on
> 16.12.2024 10:12 CET:
> This is needed for external snapshot live commit,
> when the top blocknode is not the fmt-node.
> (in our case, the throttle-group node is the topnode)
>>so this is needed to workaround a limitation in block-commit? I think
>>if we need this it should probably be submitted upstream for
>>inclusion, or we provide our own copy of block-commit with it in the
>>meantime?
Yes, it could be submitted upstream (after a bit of review; I'm
not too good at C ;)).
It's more a missing option in the QMP syntax, as it already uses the
blockdev-mirror code in the background.
(Red Hat didn't use the throttle-group feature until recently, so I think
they never saw this problem with block-commit, as their top root
node was the disk itself, not the throttle group.)
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
2025-01-09 12:36 ` Fabian Grünbichler
@ 2025-01-10 9:10 ` DERUMIER, Alexandre via pve-devel
[not found] ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
1 sibling, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 9:10 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 24019 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
Date: Fri, 10 Jan 2025 09:10:54 +0000
Message-ID: <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
> @@ -710,11 +715,15 @@ sub filesystem_path {
> # Note: qcow2/qed has internal snapshot, so path is always
> # the same (with or without snapshot => same file).
> die "can't snapshot this image format\n"
> - if defined($snapname) && $format !~ m/^(qcow2|qed)$/;
>>I am not sure if we want to allow snapshots for non-qcow2 files just
>>because snapext is enabled? I know it's technically possible to have
>>a raw base image and then a qcow2 backing chain on top, but this
>>quickly becomes confusing (how is the volume named then? which format
>>does it have in which context)..
In the V2 I was allowing it, but in this V3 series I only handle
external snapshots with qcow2 files (with the snapshot file renaming,
it would be too complex to manage, and confusing for users indeed...).
I think I forgot to clean this up in the V3; the check should simply be:
die "can't snapshot this image format\n" if defined($snapname) && $format !~ m/^(qcow2|qed)$/;
>
> die "can't snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
>
> - my $path = $class->filesystem_path($scfg, $volname);
> + if($scfg->{snapext}) {
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> + my $path = $class->path($scfg, $volname, $storeid);
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + my $format = ($class->parse_volname($volname))[6];
> + #rename current volume to snap volume
> + rename($path, $snappath) if -e $path && !-e $snappath;
>>I think this should die if the snappath already exists, and the one
>>(IMHO wrong) call in qemu-server should switch to
>>vdisk_alloc/alloc_image.. this is rather dangerous otherwise!
Right!
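A minimal sketch of the guarded rename for the file-based path, assuming the caller already holds the storage/VM lock: the `-e` check followed by `rename` is not race-free on its own, since `rename` silently replaces an existing target. `move_current_to_snapshot` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: move the current image aside as the snapshot file,
# refusing to clobber an existing snapshot file and surfacing rename errors.
# Assumes the caller holds the relevant lock (check-then-rename is not atomic).
sub move_current_to_snapshot {
    my ($path, $snappath) = @_;
    die "snapshot file '$snappath' already exists\n" if -e $snappath;
    die "current image '$path' does not exist\n" if !-e $path;
    rename($path, $snappath)
        or die "unable to rename '$path' to '$snappath': $!\n";
    return 1;
}
```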
> + if ($scfg->{snapext}) {
> + #technically, we could manage multibranch, we it need lot more work for snapshot delete
> + #we need to implemente block-stream from deleted snapshot to all others child branchs
>>see my comments in qemu-server - I think we actually want block-
>>stream anyway, since it has the semantics we want..
I don't agree; we don't always want it, because with block-stream you
need to copy the parent into the child.
For example: you have a 1TB image, you take a snapshot, write 5MB into
the snapshot, then delete the snapshot: you'll need to read/copy 1TB of
data from the parent into the snapshot file.
I haven't read your qemu-server comment yet ;)
> + #when online, we need to do a transaction for multiple disk when delete the last snapshot
> + #and need to merge in current running file
> +
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $parentsnap = $snapshots->{current}->{parent};
> +
> + return 1 if !-e $snappath || $snapshots->{$parentsnap}->{file} eq $snappath;
>>why do we return 1 here if the snapshot doesn't exist? if we only
>>allow rollback to the most recent snapshot for now, then we could
>>just query the current path and see if it is backed by our snapshot?
I think I forgot to remove this from the V2. But the idea is indeed to
check whether the snapshot backs the current image (with
$snapshots->{current}->{parent}).
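A sketch of the check as described, assuming `$snapshots` has the shape used by `volume_snapshot_info()` elsewhere in this series (entries keyed by snapshot name, with `current` holding the name of its `parent`). `assert_rollback_possible` is a hypothetical name; unlike the quoted hunk, a missing snapshot is treated as an error instead of returning 1.

```perl
use strict;
use warnings;

# Hypothetical helper: rollback to $snap is only allowed if $snap is the
# direct parent (backing image) of the current image.
# Assumed shape: { current => { parent => 'snap2' }, snap2 => { parent => 'snap1' }, ... }
sub assert_rollback_possible {
    my ($snapshots, $volname, $snap) = @_;
    die "snapshot '$snap' does not exist on '$volname'\n"
        if !exists $snapshots->{$snap};
    my $parent = $snapshots->{current}->{parent};
    die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n"
        if !defined($parent) || $parent ne $snap;
    return 1;
}
```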
> +
> + die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> + }
> +
> return 1;
> }
>
> @@ -1201,13 +1257,52 @@ sub volume_snapshot_delete {
>
> return 1 if $running;
>
> + my $cmd = "";
> my $path = $class->filesystem_path($scfg, $volname);
>
> - $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> + if ($scfg->{snapext}) {
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $snappath = $snapshots->{$snap}->{file};
> + return if !-e $snappath; #already deleted ?
>>shouldn't this be an error?
This one was for retrying in case of an error when we have multiple
disks (for example, on the first snapshot-delete API call, the first
disk removes the snapshot, but a bug occurs and the second disk doesn't
remove it).
The user might want to remove the VM snapshot lock and fix it manually
by calling snapshot delete again.
I'm not sure how to handle this correctly?
> + print"commit $childpath\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
> + run_command($cmd);
> + print"delete $childpath\n";
> +
> + unlink($childpath);
>>this unlink can be skipped?
> + print"rename $snappath to $childpath\n";
> + rename($snappath, $childpath);
>>since this will overwrite $childpath anyway.. this also reduces the
>>chance of something going wrong:
>>
>>- if the commit fails halfway through, nothing bad should have
>>happened, other than some data is now stored in two snapshots and
>>takes up extra space
>>- if the rename fails, then all of the data of $snap is stored twice,
>>but the backing chain is still valid
>>
>>notable, there is no longer a gap where $childpath doesn't exist,
>>which would break the backing chain!
Yes, you are right, it's better to have it atomic indeed.
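A minimal sketch of the suggested ordering for the file-based first-snapshot case, after `qemu-img commit $childpath` has merged the child into the snapshot file. It relies on `rename(2)` atomically replacing an existing target on the same POSIX filesystem, so no separate `unlink` is needed and `$childpath` never disappears from the chain. `replace_child_with_snapshot` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: after the child has been committed into $snappath,
# move the snapshot file over the child. No unlink($childpath) beforehand:
# rename() replaces the existing target atomically (same filesystem).
sub replace_child_with_snapshot {
    my ($snappath, $childpath) = @_;
    rename($snappath, $childpath)
        or die "unable to rename '$snappath' to '$childpath': $!\n";
    return 1;
}
```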
> + } else {
> + print"commit $snappath\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
>>leftover from previous version? not used/overwritten below ;)
No, this is really to commit the snapshot into its parent.
> + #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
> + die "missing parentsnap snapshot to rebase child $childpath\n" if !$parentpath;
> + print "link $childsnap to $parentsnap\n";
> + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath];
>>does this work? I would read the qemu-img manpage to say that '-u' is
>>for when you've moved/converted the backing file, and want to update
>>the reference in its overlay, and that it doesn't copy any data.. but
>>we need to copy the data from $snap to $childpath (we just want to
>>delete the snapshot, we don't want to drop all its changes from the
>>history, that would corrupt the contents of the image).
>>note the description of the "safe" variant:
>>
>>" This is the default mode and performs a real
>>rebase operation. The new backing file may differ from the old one
>>and qemu-img rebase will take care of keeping the
>> guest-visible content of FILENAME unchanged."
>>
>>IMHO this is the behaviour we need here?
This is only to change the backing-chain reference in the qcow2 snapshot.
(This is the only way to do it: qemu-img amend could do this in the
past, but that was removed in
2020 https://patchwork.kernel.org/project/qemu-devel/patch/20200403175859.863248-5-eblake@redhat.com/,
so rebase is the right way to do it.)
The merge is done by the preceding qemu-img commit. (qemu-img commit
can't automatically change the backing chain of the upper snapshot,
because it has no idea that an upper snapshot could exist.)
This is for this use case:
A<----B<----C.
You commit B into A, then you need to change the backing file of C to A
(instead of B):
A<----C
(When done live, QMP block-commit is able to change the backing chain
of the upper snapshot automatically, because QEMU knows the whole chain.)
This is how libvirt does it too:
https://kashyapc.fedorapeople.org/virt/lc-2012/snapshots-handout.html
see "Deleting snapshots (and 'offline commit')"
Method (1): base <- sn1 <- sn3 (by copying sn2 into sn1)
Method (2): base <- sn1 <- sn3 (by copying sn2 into sn3)
(This is commit vs. stream.)
I think we should look at the used space of the parent vs. the child
to choose the correct direction/method.
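The A <- B <- C case above, sketched as the two `qemu-img` invocations in order (`delete_intermediate_cmds` is a hypothetical helper; the arguments mirror the quoted hunk). Note that in the hunk as quoted, `$cmd` is reassigned to the rebase command before the only `run_command($cmd)` call, so as written the commit would not run; both commands need to be executed, commit first, for `rebase -u` to be safe.

```perl
use strict;
use warnings;

# Hypothetical helper for deleting intermediate snapshot B in A <- B <- C (offline):
#   1. "qemu-img commit B" writes B's data into its backing file A
#   2. "qemu-img rebase -u ..." only rewrites C's backing-file reference to A;
#      no data is copied, which is fine because A now holds everything B had.
# Returns the commands in the order they must be run.
sub delete_intermediate_cmds {
    my ($snap_path, $parent_path, $child_path) = @_;
    die "missing parent snapshot to rebase child $child_path\n" if !$parent_path;
    return (
        ['/usr/bin/qemu-img', 'commit', $snap_path],
        ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parent_path,
            '-F', 'qcow2', '-f', 'qcow2', $child_path],
    );
}
```

A caller could then run `run_command($_) for delete_intermediate_cmds($snappath, $parentpath, $childpath);` before removing `$snappath`.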
> + run_command($cmd);
> + #delete the snapshot
> + unlink($snappath);
> + }
> +
> + } else {
> + $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
>
> - run_command($cmd);
> + $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
> @@ -1246,8 +1341,8 @@ sub volume_has_feature {
> current => { qcow2 => 1, raw => 1, vmdk => 1 },
> },
> rename => {
> - current => {qcow2 => 1, raw => 1, vmdk => 1},
> - },
> + current => { qcow2 => 1, raw => 1, vmdk => 1},
> + }
>>nit: unrelated change?
yep
> };
>
> if ($feature eq 'clone') {
> @@ -1481,7 +1576,37 @@ sub status {
> sub volume_snapshot_info {
> my ($class, $scfg, $storeid, $volname) = @_;
>
> - die "volume_snapshot_info is not implemented for $class";
>>should this be guarded with $snapext being enabled?
yes indeed
* Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
[not found] ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
@ 2025-01-10 9:15 ` Fiona Ebner
2025-01-10 9:32 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1e45e756801843dd46eb6ce2958d30885ad73bc2.camel@groupe-cyllene.com>
0 siblings, 2 replies; 68+ messages in thread
From: Fiona Ebner @ 2025-01-10 9:15 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel, f.gruenbichler
On 10.01.25 at 08:55, DERUMIER, Alexandre wrote:
> -------- Original Message --------
> From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
> Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>, Fiona Ebner <f.ebner@proxmox.com>
> Subject: Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
> Date: 08/01/2025 14:27:02
>
>
>> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on
>> 16.12.2024 10:12 CET:
>
>> This is needed for external snapshot live commit,
>> when the top blocknode is not the fmt-node.
>> (in our case, the throttle-group node is the topnode)
>
>>> so this is needed to workaround a limitation in block-commit? I think
>>> if we need this it should probably be submitted upstream for
>>> inclusion, or we provide our own copy of block-commit with it in the
>>> meantime?
> Yes, it could be submitted upstream (after a bit of review; I'm
> not too good at C ;)).
>
> It's more a missing option in the QMP syntax, as it already uses the
> blockdev-mirror code in the background.
>
> (Red Hat didn't use the throttle-group feature until recently, so I think
> they never saw this problem with block-commit, as their top root
> node was the disk itself, not the throttle group.)
Maybe it could even be a bug then? In many situations, the filter nodes
on top (like throttle groups) are ignored/skipped to get to the actually
interesting block node for certain block operations. Are there any
situations where you wouldn't want to do that in the block-commit case?
There is a dedicated bdrv_skip_filters() function, e.g. used in
stream_prepare(). Would be good to hear what upstream thinks.
* Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
2025-01-10 9:15 ` Fiona Ebner
@ 2025-01-10 9:32 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1e45e756801843dd46eb6ce2958d30885ad73bc2.camel@groupe-cyllene.com>
1 sibling, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 9:32 UTC (permalink / raw)
To: pve-devel, f.ebner, f.gruenbichler; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 13466 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
Date: Fri, 10 Jan 2025 09:32:15 +0000
Message-ID: <1e45e756801843dd46eb6ce2958d30885ad73bc2.camel@groupe-cyllene.com>
>>Maybe it could even be a bug then?
Yes, it's a bug. I just think that libvirt currently only implements
block-commit with the disk blockdev as the top node.
Throttle groups are not currently implemented in libvirt (but I have seen
some commits adding support recently); it still uses the old throttling
method.
>>In many situations, the filter nodes
>>on top (like throttle groups) are ignored/skipped to get to the
>>actually interesting block node for certain block operations.
Yes, and this option exists in QMP blockdev-mirror (and block-commit
reuses the blockdev-mirror code behind the scenes).
>>Are there any situations where you wouldn't want to do that in the
>>block-commit case?
Hmm, I think it should always be reattached to the disk (the format block
node, or the file block node if no format node exists). I really don't know
how to code this; I just reused the blockdev-mirror approach.
Feel free to clean up this patch and submit it to the QEMU devs; you are a
better C developer than me ^_^
* Re: [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
2025-01-09 13:55 ` Fabian Grünbichler
@ 2025-01-10 10:16 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 10:16 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
[-- Attachment #1: Type: message/rfc822, Size: 21329 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
Date: Fri, 10 Jan 2025 10:16:44 +0000
Message-ID: <86852ee45321ae5fad3ab9ae0c6cc23bed203de8.camel@groupe-cyllene.com>
>>one downside with this part in particular - we have to always
>>allocate full-size LVs (+qcow2 overhead), even if most of them will
>>end up storing just a single snapshot delta which might be a tiny
>>part of that full-size.. hopefully if discard is working across the
>>whole stack this doesn't actually explode space usage on the storage
>>side, but it makes everything a bit hard to track.. OTOH, while we
>>could in theory extend/reduce the LVs and qcow2 images on them when
>>modifying the backing chain, the additional complexity is probably
>>not worth it at the moment..
See this RFC with dynamic extension (no shrink, no discard):
https://lore.proxmox.com/pve-devel/mailman.475.1725007456.302.pve-devel@lists.proxmox.com/t/
(I think the tricky part, as Dominic noted in his quick review, is to
handle the cluster lock for resizing correctly, and to handle timeouts/retries with a
queue through a dedicated daemon.)
But technically, this is how oVirt manages it (and it works in
production; I have customers who have been using it for several years).
>
> sub plugindata {
> return {
> content => [ {images => 1, rootdir => 1}, { images => 1 }],
> + format => [ { raw => 1, qcow2 => 1 } , 'raw' ],
>>I wonder if we want to guard the snapshotting-related parts below
>>with an additional "snapext" option here as well?
I really don't know; it's not possible to do snapshots with .raw
anyway.
On the GUI side, it could be used to enable/display the format field,
for example, if snapext is defined in the storage.
>>or even the usage of qcow2 altogether?
I think we should keep the possibility to choose .raw vs .qcow2 on the same
storage, because a user may really need maximum performance for a
specific VM without needing snapshots.
>
> +
> + #add extra space for qcow2 metadatas
> + #without sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size
> + #with sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size / 16
> + #4MB overhead for 1TB with extented l2 clustersize=128k
> +
> + my $qcow2_overhead = ceil($size/1024/1024/1024) * 4096;
>>there's "qemu-img measure", which seems like it would do exactly what
>>we want ;)
"Calculate the file size required for a new image. This information can
be used to size logical volumes or SAN LUNs appropriately for the image
that will be placed in them."
Indeed, lol. I knew the command, but I thought it was for measuring
the content of an existing file. I'll run tests to see if I get the same
results (and whether sub-allocated clusters are handled correctly).
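A minimal sketch of how the LV size could come from `qemu-img measure` instead of the hand-computed overhead. The helper names `qcow2_measure_cmd` and `lv_size_kib_from_measure` are hypothetical; it assumes sizes are passed in KiB and that the LV must hold the fully written image, so it reads the `fully-allocated` field of the JSON output rather than `required` (which only covers the freshly created, still empty image). Cluster size and extended L2 are left to the optional `-o` string.

```perl
use strict;
use warnings;
use JSON::PP ();
use POSIX ();

# Hypothetical helper: argv for "qemu-img measure" describing a new, empty
# qcow2 image of $size_kib KiB. $create_opts (for example
# "extended_l2=on,cluster_size=128k") is passed through unchanged as -o.
sub qcow2_measure_cmd {
    my ($size_kib, $create_opts) = @_;
    my @cmd = ('/usr/bin/qemu-img', 'measure', '--output=json',
        '-O', 'qcow2', '--size', "${size_kib}K");
    push @cmd, '-o', $create_opts if defined($create_opts);
    return \@cmd;
}

# Hypothetical helper: turn the JSON printed by the command above into an LV
# size in KiB. "fully-allocated" is the file size once every guest sector has
# been written, which is what a thick LV has to be able to hold.
sub lv_size_kib_from_measure {
    my ($json_text) = @_;
    my $info = JSON::PP::decode_json($json_text);
    my $bytes = $info->{'fully-allocated'}
        // die "no 'fully-allocated' field in qemu-img measure output\n";
    return POSIX::ceil($bytes / 1024);    # round up to whole KiB
}
```

In the plugin, the argv would be run through the existing command wrapper and its captured stdout handed to the parser; comparing the result with the patch's current "4MB per 1TB" estimate would be a quick way to do the planned tests.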
> +
> + my $lvmsize = $size;
> + $lvmsize += $qcow2_overhead if $fmt eq 'qcow2';
> +
> die "not enough free space ($free < $size)\n" if $free < $size;
>
> - $name = $class->find_free_diskname($storeid, $scfg, $vmid)
> + $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt)
> if !$name;
>
> - lvcreate($vg, $name, $size, ["pve-vm-$vmid"]);
> -
> + my $tags = ["pve-vm-$vmid"];
> + push @$tags, "\@pve-$name" if $fmt eq 'qcow2';
>>that's a creative way to avoid the need to discover and activate
>>snapshots one by one below, but it might warrant a comment ;)
Ah, sorry (but yes, the idea was to activate/deactivate the whole
chain in one call).
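A minimal sketch of the tag idea, in case it helps for the requested comment: every LV of one disk's chain carries a common tag, and a single `lvchange` that selects `@tag` (de)activates the whole chain. The helper names and the `pve-<volname>` tag format are assumptions for illustration, as is relying on LVM's `@tag` selection on the command line (which, as far as I know, is accepted wherever LV names are); the real plugin would pass these argv arrays to its existing command wrappers.

```perl
use strict;
use warnings;

# Hypothetical tag shared by every LV in one disk's snapshot chain.
sub chain_tag {
    my ($name) = @_;
    return "pve-$name";
}

# lvcreate argv that attaches the chain tag (size in KiB, as PVE passes it).
sub lvcreate_cmd {
    my ($vg, $lvname, $size_kib, $chain_name) = @_;
    return ['/sbin/lvcreate', '--size', "${size_kib}k", '--name', $lvname,
        '--addtag', chain_tag($chain_name), $vg];
}

# One lvchange call selecting "@tag" (de)activates all tagged LVs at once.
sub chain_activation_cmd {
    my ($chain_name, $activate) = @_;
    return ['/sbin/lvchange', $activate ? '-ay' : '-an', '@' . chain_tag($chain_name)];
}
```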
> >>
> +
> + #rename current lvm volume to snap volume
> + my $vg = $scfg->{vgname};
> + print"rename $volname to $snap_volname\n";
> + eval { lvrename($vg, $volname, $snap_volname) } ;
>> missing error handling..
> +
> +
> + #allocate a new lvm volume
> + $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024);
>>missing error handling
Ah, sorry, it should be included in the following eval.
> + eval {
> + $class->format_qcow2($storeid, $scfg, $volname, undef, $snap_path);
> + };
> +
> + if ($@) {
> + eval { $class->free_image($storeid, $scfg, $volname, 0) };
> + warn $@ if $@;
> + }
> +}
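Regarding the missing error handling on the rename and the allocation: a minimal sketch of the ordering with rollback, assuming the storage operations are passed in as callbacks. `rename`, `alloc`, `format` and `free` are hypothetical stand-ins for lvrename, alloc_new_image, format_qcow2 and free_image. If the rename fails nothing has changed yet; if a later step fails, the new image is freed and the rename is undone, and the original error is rethrown instead of being swallowed.

```perl
use strict;
use warnings;

# Sketch: take an external snapshot of $volname by renaming it to
# $snap_volname and creating a fresh qcow2 on top. $ops holds hypothetical
# callbacks standing in for lvrename/alloc_new_image/format_qcow2/free_image.
sub snapshot_with_rollback {
    my ($ops, $volname, $snap_volname) = @_;

    # If this fails nothing has changed yet, so let the error propagate.
    $ops->{rename}->($volname, $snap_volname);

    my $allocated = 0;
    eval {
        $ops->{alloc}->($volname);
        $allocated = 1;
        # The new image uses the renamed volume as its backing file.
        $ops->{format}->($volname, $snap_volname);
    };
    if (my $err = $@) {
        # Undo in reverse order; warn about (but do not mask) cleanup errors.
        if ($allocated) {
            eval { $ops->{free}->($volname) };
            warn "cleanup: freeing '$volname' failed: $@" if $@;
        }
        eval { $ops->{rename}->($snap_volname, $volname) };
        warn "cleanup: renaming back to '$volname' failed: $@" if $@;
        die $err;
    }
    return;
}
```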
> +
> +sub volume_rollback_is_possible {
> + my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
> +
> + my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $parent_snap = $snapshots->{current}->{parent};
> +
> + return 1 if !-e $snap_path || $snapshots->{$parent_snap}->{file} eq $snap_path;
>>the first condition here seems wrong, see storage patch #1
yes
> + die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> +
> + return 1;
> }
>
> +
> sub volume_snapshot_rollback {
> my ($class, $scfg, $storeid, $volname, $snap) = @_;
>
> - die "lvm snapshot rollback is not implemented";
> + die "can't rollback snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
>>above we only have qcow2, which IMHO makes more sense..
We could remove .qed everywhere; it has been deprecated since 2017 and we
have never exposed it in the GUI.
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $snap_path = $snapshots->{$snap}->{file};
> + my $snap_volname = $snapshots->{$snap}->{volname};
> + return if !-e $snap_path; #already deleted ?
>>should maybe be a die?
Same as my comment on patch #1: this was for retrying snapshot deletion
with multiple disks.
> + } else {
> + print"commit $snap_path\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $snap_path];
>> leftover?
Still not a leftover ;) See my reply to patch #1.
> + #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
> + die "missing parentsnap snapshot to rebase child $child_path\n" if !$parent_path;
> + print "link $child_snap to $parent_snap\n";
> + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parent_path, '-F', 'qcow2', '-f', 'qcow2', $child_path];
> + run_command($cmd);
>>same as for patch #1, I am not sure the -u here is correct..
This is correct; see my reply to patch #1.
>
> +# don't allow to clone as we can't activate the base on multiple host at the same time
> +# clone => {
> +# base => { qcow2 => 1, raw => 1},
> +# },
>>I think activating the base would actually be okay, we just must
>>never write to it? ;)
Ah, this is a good remark. I thought we couldn't activate an LV on
multiple nodes at the same time. I'll look into this; it would add the
possibility of linked clones. (I need to check the external snapshot code
with backing chains first.)
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
[not found] ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
@ 2025-01-10 11:02 ` Fabian Grünbichler
2025-01-10 11:51 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
0 siblings, 2 replies; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-10 11:02 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel
> DERUMIER, Alexandre <alexandre.derumier@groupe-cyllene.com> wrote on 10.01.2025 10:10 CET:
> > + if ($scfg->{snapext}) {
> > + #technically, we could manage multibranch, we it need lot more work for snapshot delete
> > + #we need to implemente block-stream from deleted snapshot to all others child branchs
>
> >>see my comments in qemu-server - I think we actually want block-
> >>stream anyway, since it has the semantics we want..
>
> I don't agree; we don't always want it, because with block-stream you
> need to copy the parent into the child.
>
> For example, if you have a 1TB image, take a snapshot, write 5MB to
> the snapshot, and then delete the snapshot, you'll need to read/copy 1TB of data
> from the parent to the snapshot file.
> I haven't read your qemu-server comment yet ;)
yes, for the "first" snapshot that is true (since that one is basically the baseline data, which will often be huge compared to the snapshot delta). but streaming (rebasing) saves us the rename, which makes the error handling a lot easier/less risky. maybe we could special case the first snapshot as a performance optimization? ;)
> > @@ -1201,13 +1257,52 @@ sub volume_snapshot_delete {
> >
> > return 1 if $running;
> >
> > + my $cmd = "";
> > my $path = $class->filesystem_path($scfg, $volname);
> >
> > - $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> > + if ($scfg->{snapext}) {
> >
> > - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> > + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> > + my $snappath = $snapshots->{$snap}->{file};
> > + return if !-e $snappath; #already deleted ?
>
> >>shouldn't this be an error?
>
> This one was for retrying in case of an error when we have
> multiple disks (for example, on the first snapshot delete API call, the
> first disk's snapshot is removed, but a bug occurs and the second disk's
> snapshot is not removed).
>
> The user could then remove the vm-snapshot lock and fix it manually
> by calling snapshot delete again.
>
> I'm not sure how to handle this correctly?
I think the force parameter for snapshot deletion covers this already, and it should be fine for this to die..
>
> > + print"commit $childpath\n";
> > + $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
> > + run_command($cmd);
> > + print"delete $childpath\n";
> > +
> > + unlink($childpath);
>
> this unlink can be skipped?
>
> > + print"rename $snappath to $childpath\n";
> > + rename($snappath, $childpath);
>
> >>since this will overwrite $childpath anyway.. this also reduces the
> >>chance of something going wrong:
> >>
> >>- if the commit fails halfway through, nothing bad should have
> >>happened, other than some data is now stored in two snapshots and
> >>takes up extra space
> >>- if the rename fails, then all of the data of $snap is stored twice,
> >>but the backing chain is still valid
> >>
> >>notably, there is no longer a gap where $childpath doesn't exist,
> >>which would break the backing chain!
>
> yes you are right, better to have it atomic indeed
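A small sketch of why dropping the unlink matters, assuming a file-based storage where both paths live on the same filesystem: POSIX `rename(2)` replaces an existing target atomically, so the overlay above always finds its backing file. This does not carry over to LVM, where renaming an LV onto an existing name is not an atomic replace. `replace_child_with_snapshot` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: move the merged snapshot file over the child it
# replaces. On the same filesystem rename(2) swaps the target atomically, so
# readers see either the old child or the merged file, never a missing path.
sub replace_child_with_snapshot {
    my ($snap_path, $child_path) = @_;
    rename($snap_path, $child_path)
        or die "rename '$snap_path' -> '$child_path' failed: $!\n";
    return;
}
```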
>
>
> > + } else {
> > + print"commit $snappath\n";
> > + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
>
> >>leftover from previous version? not used/overwritten below ;)
>
> no, this is really to commit the snapshot to its parent
but it is not executed..
>
> > + #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
> > + die "missing parentsnap snapshot to rebase child $childpath\n" if !$parentpath;
> > + print "link $childsnap to $parentsnap\n";
> > + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath];
>
> >>does this work? I would read the qemu-img manpage to say that '-u' is
> >>for when you've moved/converted the backing file, and want to update
> >>the reference in its overlay, and that it doesn't copy any data.. but
> >>we need to copy the data from $snap to $childpath (we just want to
> >>delete the snapshot, we don't want to drop all its changes from the
> >>history, that would corrupt the contents of the image).
> >>note the description of the "safe" variant:
> >>
> >>" This is the default mode and performs a real
> >>rebase operation. The new backing file may differ from the old one
> >>and qemu-img rebase will take care of keeping the
> >> guest-visible content of FILENAME unchanged."
> >>
> >>IMHO this is the behaviour we need here?
>
> This is only to change the backing chain reference in the qcow2 snapshot.
> (This is the only way to do it; there used to be a way to do it with
> qemu-img amend, but changing the backing file that way was dropped, see
> 2020 https://patchwork.kernel.org/project/qemu-devel/patch/20200403175859.863248-5-eblake@redhat.com/,
> so rebase is the right way to do it.)
>
> The merge is done by the preceding qemu-img commit. (qemu-img commit
> can't automatically change the backing chain of the upper
> snapshot, because it has no idea that an upper snapshot could
> exist.)
see above and below ;)
> This is for this use case:
>
> A<----B<----C.
>
> You commit B into A; then you need to change the backing file of C to A
> (instead of B)
>
> A<----C
but this is the wrong semantics.. the writes/delta in B need to go to C (they happened after A), not to A!
> (When done live, QEMU's QMP block-commit is able to automatically change
> the backing chain of the upper snapshot, because QEMU
> knows the whole chain.)
I think it's wrong there as well, see my comments on those patches ;)
> This is how libvirt is doing too
> https://kashyapc.fedorapeople.org/virt/lc-2012/snapshots-handout.html
> see "Deleting snapshots (and 'offline commit')"
> Method (1): base <- sn1 <- sn3 (by copying sn2 into sn1)
> Method (2): base <- sn1 <- sn3 (by copying sn2 into sn3)
> (This is commit vs stream)
but they use the "wrong" (v1) naming scheme where the name of the snapshot and the content don't line up..
> I think we should look at the used space of parent vs. child
> to choose the correct direction/method.
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
2025-01-10 11:02 ` Fabian Grünbichler
@ 2025-01-10 11:51 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
1 sibling, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 11:51 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
Date: Fri, 10 Jan 2025 11:51:35 +0000
Message-ID: <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
>>yes, for the "first" snapshot that is true (since that one is
>>basically the baseline data, which will often be huge compared to the
>>snapshot delta). but streaming (rebasing) saves us the rename, which
>>makes the error handling a lot easier/less risky. maybe we could
>>special case the first snapshot as a performance optimization? ;)
Ah, that's a good point indeed. Yes, I think it's a good idea: commit
to the "first" snapshot, and rebase for the others. I'll look into
implementing this.
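A sketch of the plan agreed on here (commit into the first snapshot, rebase for the others), assuming every image in the chain is qcow2 and the chain is given as an array ordered from the base to the active image. It only plans the steps; `plan_snapshot_delete` and the step format are hypothetical. For the first snapshot it follows the "commit C into B, remove C, rename B to C" pattern: commit the child into the base and rename the base over the child, so the overlay above keeps a valid backing file name. For any other snapshot it uses a safe-mode rebase (no `-u`), which copies data so the child's guest-visible content stays the same.

```perl
use strict;
use warnings;

# Sketch: plan the offline deletion of snapshot $snap from $chain, an array of
# { name => ..., file => ... } ordered from the base image to the active image.
# Returns a list of steps; executing them is left to the caller.
sub plan_snapshot_delete {
    my ($chain, $snap) = @_;

    my ($idx) = grep { $chain->[$_]{name} eq $snap } 0 .. $#$chain;
    die "snapshot '$snap' not found\n" if !defined($idx);
    die "cannot delete the active image\n" if $idx == $#$chain;

    my $file  = $chain->[$idx]{file};
    my $child = $chain->[$idx + 1]{file};

    if ($idx == 0) {
        # First snapshot (usually the big baseline): commit the child's delta
        # down into it, then rename it over the child's path. The overlay
        # above the child keeps a valid backing file name, so no rebase.
        return [
            { cmd => ['/usr/bin/qemu-img', 'commit', '-f', 'qcow2', $child] },
            { rename => [$file, $child] },
        ];
    }

    # Any other snapshot: a safe-mode rebase (no -u) of the child onto our
    # parent copies the clusters that differ, keeping the child's content;
    # afterwards the snapshot file is unreferenced and can be removed.
    my $parent = $chain->[$idx - 1]{file};
    return [
        { cmd => ['/usr/bin/qemu-img', 'rebase', '-f', 'qcow2',
            '-b', $parent, '-F', 'qcow2', $child] },
        { unlink => $file },
    ];
}
```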
>
>
> This one was for retrying in case of an error when we have
> multiple disks (for example, on the first snapshot delete API call, the
> first disk's snapshot is removed, but a bug occurs and the second disk's
> snapshot is not removed).
>
> The user could then remove the vm-snapshot lock and fix it
> manually by calling snapshot delete again.
>
> I'm not sure how to handle this correctly?
>>I think the force parameter for snapshot deletion covers this
>>already, and it should be fine for this to die..
Ah, OK, I was not aware of this parameter! Thanks.
>
>
> > + } else {
> > + print"commit $snappath\n";
> > + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
>
> > > leftover from previous version? not used/overwritten below ;)
>
> no, this is really to commit the snapshot to its parent
>>but it is not executed..
Ah, OK! Sorry! I think I dropped some code during the rebase before
sending the patches, because I had tested it many times!
> This is for this use case:
>
> A<----B<----C.
>
> You commit B into A; then you need to change the backing file of C to A
> (instead of B)
>
> A<----C
>>but this is the wrong semantics.. the writes/delta in B need to go to
>>C (they happened after A), not to A!
I think they can go to A (commit) or C (stream).
Here is an example:
current (1TB)
- take snapshot A
(A (1TB) <------ new current 500MB (backing file A))
- take snapshot B
(A (1TB) <------ B 500MB (backing file A) <------ new current 10MB
(backing file B))
Then you want to delete B,
so you stream it into current (copying 500MB to current in this
example).
Then you want to delete snapshot A.
You don't want to stream A into current, because A is the big initial image.
So instead, you need to commit current into A (with the extra 500MB).
So, if you have a lot of snapshots to delete, you end up copying the
same data to the upper snapshot each time for nothing, because in the
end we are going to commit to the initial "first" snapshot/image.
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
[not found] ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
@ 2025-01-10 12:20 ` Fabian Grünbichler
2025-01-10 13:14 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-10 12:20 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel
> DERUMIER, Alexandre <alexandre.derumier@groupe-cyllene.com> wrote on 10.01.2025 12:51 CET:
> > > + } else {
> > > + print"commit $snappath\n";
> > > + $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];
> >
> > > > leftover from previous version? not used/overwritten below ;)
> >
> > no, this is really to commit the snapshot to its parent
>
> >>but it is not executed..
>
> Ah, OK! Sorry! I think I dropped some code during the rebase before
> sending the patches, because I had tested it many times!
>
>
>
> > This is for this use case:
> >
> > A<----B<----C.
> >
> > You commit B into A; then you need to change the backing file of C to A
> > (instead of B)
> >
> > A<----C
>
> >>but this is the wrong semantics.. the writes/delta in B need to go to
> >>C (they happened after A), not to A!
>
> I think they can go to A (commit) or C (stream).
>
> Here is an example:
>
> current (1TB)
> - take snapshot A
>
> (A (1TB) <------ new current 500MB (backing file A))
>
> - take snapshot B
>
> (A (1TB) <------ B 500MB (backing file A) <------ new current 10MB
> (backing file B))
>
>
> Then you want to delete B,
>
>
> so you stream it into current (copying 500MB to current in this
> example).
>
> Then you want to delete snapshot A.
> You don't want to stream A into current, because A is the big initial image.
> So instead, you need to commit current into A (with the extra 500MB).
>
>
> So, if you have a lot of snapshots to delete, you end up copying the
> same data to the upper snapshot each time for nothing, because in the
> end we are going to commit to the initial "first" snapshot/image.
but you don't know up front that you want to collapse all the snapshots. for each single removal, you have to merge the delta towards the overlay, not the base, else the base contents no longer match its name.
think about it this way:
you take a snapshot B at time X. this snapshot must never contain a modification that happened after X. that means you cannot ever commit a newer snapshot into B, unless you are removing and renaming B.
if you start with a chain A -> B -> C -> D (with A being the first snapshot/base, and D being the current active overlay) and you want to remove B, you can either
- stream B into C, remove B
- commit C into B, remove C, rename B to C
in both cases you will end up with a chain A -> C' -> D where C' is the combination of the old B and C.
the downside of the streaming variant is that if B's delta is bigger than C's, you have more I/O. the upside is that there is no in-between state where the backing chain is broken and error handling can go very wrong.
what you are doing right now is:
chain A->B->C->D as before. remove B by committing B into A and then rebasing C on top of A. that means you end up with:
A'->C->D where A' is A+B. but now this snapshot A contains writes that happened after the original A was taken. this is wrong.
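A toy model that makes the semantics above checkable, assuming each layer is just a map from block number to data and an image's view is its own layer on top of all layers below it. It is not qemu-img and all names are hypothetical; it only shows that streaming B into C keeps every remaining name's view intact, while committing B into A (what v3 currently does) leaves the head intact but changes what snapshot A shows.

```perl
use strict;
use warnings;

# Toy model: a layer maps block number => data and only holds the blocks
# written while it was the active overlay.
sub view {
    my ($chain, $idx) = @_;
    my %v;
    %v = (%v, %{ $chain->[$_] }) for 0 .. $idx;    # upper layers win
    return \%v;
}

sub same_view {
    my ($x, $y) = @_;
    return 0 if scalar(keys %$x) != scalar(keys %$y);
    for my $k (keys %$x) {
        return 0 if !exists($y->{$k}) || $x->{$k} ne $y->{$k};
    }
    return 1;
}

# Correct: stream layer $idx into its child; the child's own writes win.
sub delete_by_stream {
    my ($chain, $idx) = @_;
    my @new = @$chain;
    splice(@new, $idx, 2, +{ %{ $chain->[$idx] }, %{ $chain->[$idx + 1] } });
    return \@new;
}

# Wrong: commit layer $idx into its parent, then point the child there.
sub delete_by_commit_into_parent {
    my ($chain, $idx) = @_;
    my @new = @$chain;
    splice(@new, $idx - 1, 2, +{ %{ $chain->[$idx - 1] }, %{ $chain->[$idx] } });
    return \@new;
}
```

With A = {0=>a0, 1=>a1}, B = {1=>b1}, C = {2=>c2}, D = {0=>d0}, deleting B by streaming leaves A showing a1 at block 1; committing B into A makes "A" show b1, a write from after A was taken.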
* Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
2025-01-10 12:20 ` Fabian Grünbichler
@ 2025-01-10 13:14 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 13:14 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
Date: Fri, 10 Jan 2025 13:14:48 +0000
Message-ID: <32c2f6a7978e2c37bb5b51f44de2261dc12446e8.camel@groupe-cyllene.com>
>>but you don't know up front that you want to collapse all the
>>snapshots. for each single removal, you have to merge the delta
>>towards the overlay, not the base, else the base contents no
>>longer match its name.
>>
>>think about it this way:
>>
>>you take a snapshot B at time X. this snapshot must never contain a
>>modification that happened after X. that means you cannot ever commit
>>a newer snapshot into B, unless you are removing and renaming B.
>>if you start with a chain A -> B -> C -> D (with A being the first
>>snapshot/base, and D being the current active overlay) and you want to
>>remove B, you can either
>>- stream B into C, remove B
>>- commit C into B, remove C, rename B to C
>>
>>in both cases you will end up with a chain A -> C' -> D where C' is
>>the combination of the old B and C.
>>
>>the downside of the streaming variant is that if B's delta is bigger
>>than C's, you have more I/O. the upside is that there is no in-between
>>state where the backing chain is broken and error handling can go
>>very wrong.
>>
>>what you are doing right now is:
>>
>>chain A->B->C->D as before. remove B by committing B into A and then
>>rebasing C on top of A. that means you end up with:
>>A'->C->D where A' is A+B. but now this snapshot A contains writes
>>that happened after the original A was taken. this is wrong.
Ah yes, you are right.
I was just thinking about it and came to the same conclusion.
For example:
A 1TB (12:00) ----> B 500MB (13:00) ----> C 10MB (14:00) ---> current
(now)
If I delete B by merging it into A, A will no longer show the 12:00 view,
but the 13:00 one.
So we indeed need to merge it into C.
(Sorry, I've been using ZFS/Ceph for too long, where merges never occur and
blocks are only referenced and destroyed in the background.)
OK, I'll rework this with a stream implementation. (I need to do it anyway for
multi-branch, but later, please.)
* Re: [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
2025-01-08 14:17 ` Fabian Grünbichler
@ 2025-01-10 13:50 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 13:50 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax
Date: Fri, 10 Jan 2025 13:50:33 +0000
Message-ID: <b4cfffe2a2bfe601affef4f5aab63f6beb72cb97.camel@groupe-cyllene.com>
> - $device .= ",drive=drive-$drive_id,id=$drive_id";
> + $device .= ",id=$drive_id";
> + $device .= ",drive=drive-$drive_id" if $device_type ne 'cd' || $drive->{file} ne 'none';
>>is this just because you remove the whole drive when ejecting? not
>>sure whether that is really needed..
With blockdev, when there is no drive (no disc inserted in the CD-ROM device),
there is really no blockdev defined.
So we don't pass a drive/CD-ROM medium to the CD-ROM device.
>
> -sub print_drive_commandline_full {
> - my ($storecfg, $vmid, $drive, $live_restore_name, $io_uring) = @_;
> +sub print_drive_throttle_group {
> + my ($drive) = @_;
> + #command line can't use the structured json limits option,
> + #so limit params need to use with x- as it's unstable api
>>this comment should be below the early return, or above the whole
>>sub.
ok
> + return if drive_is_cdrom($drive) && $drive->{file} eq 'none';
>>is this needed if we keep empty cdrom drives around like before? I
>>know throttling practically makes no sense in that case, but it might
>>make the code in general more simple?
Yes, this is to keep it like before, but I can put it behind a
throttle group, no problem.
>
> +sub generate_file_blockdev {
> + my ($storecfg, $drive, $nodename) = @_;
> +
> + my $volid = $drive->{file};
> my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
> - my $scfg = $storeid ? PVE::Storage::storage_config($storecfg, $storeid) : undef;
>
> - if (drive_is_cdrom($drive)) {
> - $path = get_iso_path($storecfg, $vmid, $volid);
> - die "$drive_id: cannot back cdrom drive with a live restore image\n" if $live_restore_name;
> + my $scfg = undef;
> + my $path = $volid;
>>I think this should only happen if the parse_volume_id above told us
>>this is an absolute path and not a PVE-managed volume..
> + if($storeid && $storeid ne 'nbd') {
>>this is wrong.. I guess it's also somewhat wrong in the old
>>qemu_drive_mirror code.. we should probably check using a more
>>specific RE that the "volid" is an NBD URI, and not attempt to parse
>>it as a regular volid in that case..
OK. I'm already parsing the NBD URI later; I'll adapt the code.
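A sketch of the "more specific RE", assuming the volids to recognise are the legacy `nbd:HOST:PORT[:exportname=NAME]` and `nbd:unix:SOCKET[:exportname=NAME]` forms (host possibly a bracketed IPv6 address) plus the `nbd://`, `nbd+tcp://` and `nbd+unix://` URI forms. `is_nbd_uri` is a hypothetical helper; the point is that a regular volid on a storage that happens to be named `nbd` no longer matches.

```perl
use strict;
use warnings;

# Hypothetical helper: does $volid look like an NBD target rather than a
# "storeid:volname" volume ID or an absolute path?
sub is_nbd_uri {
    my ($volid) = @_;
    # URI syntax: nbd://host[:port]/export, nbd+tcp://..., nbd+unix:///export?socket=...
    return 1 if $volid =~ m{^nbd(?:\+(?:tcp|unix))?://};
    # Legacy unix socket syntax: nbd:unix:/path/to/socket[:exportname=NAME]
    return 1 if $volid =~ m{^nbd:unix:/};
    # Legacy TCP syntax: nbd:HOST:PORT[:exportname=NAME], HOST may be [IPv6]
    return 1 if $volid =~ m{^nbd:(?:\[[0-9a-fA-F:.]+\]|[^:\s\[\]]+):\d+(?::|$)};
    return 0;
}
```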
> + my $format = $drive->{format};
> + $format //= "raw";
>>the format handling here is very sensitive, and I think this broke
>>it. see the big comment this patch removed ;)
>>
>>short summary: for PVE-managed volumes we want the format from the
>>storage layer (via checked_volume_format). if the drive has a format
>>set that disagrees, that is a hard error. for absolute paths we use
>>the format from the drive with a fallback to raw.
Yes, I saw the commits during my rebase before sending the patches.
I'll fix that.
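The rule summarised above, written down as a sketch. `resolve_drive_format` is hypothetical; `$storage_format` stands in for what `checked_volume_format` returns for a PVE-managed volume and is passed in so the sketch runs standalone.

```perl
use strict;
use warnings;

# Hypothetical helper following the rule above. $is_managed: the volid parsed
# as a PVE volume; $storage_format: the storage layer's answer for it.
sub resolve_drive_format {
    my ($drive, $is_managed, $storage_format) = @_;
    my $drive_format = $drive->{format};

    if ($is_managed) {
        die "no format reported by storage layer\n" if !defined($storage_format);
        # A conflicting explicit format in the drive config is a hard error.
        die "drive format '$drive_format' does not match volume format '$storage_format'\n"
            if defined($drive_format) && $drive_format ne $storage_format;
        return $storage_format;
    }

    # Absolute path: use the configured format, defaulting to raw.
    return $drive_format // 'raw';
}
```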
>
> - if ($live_restore_name) {
> - $format = "rbd" if $is_rbd;
> - die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
> - if !$format;
> - $opts .= ",format=alloc-track,file.driver=$format";
> - } elsif ($format) {
> - $opts .= ",format=$format";
> + my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
>>so I guess this should never be called with nbd-URI-volids?
Not until we want to live-restore to an NBD URI, no ^_^
> + my $readonly = defined($drive->{ro}) || $force_readonly ? JSON::true : JSON::false;
> +
> + #libvirt define cache option on both format && file
> my $cache_direct = drive_uses_cache_direct($drive, $scfg);
> + my $cache = {};
> + $cache->{direct} = $cache_direct ? JSON::true : JSON::false;
> + $cache->{'no-flush'} = $drive->{cache} && $drive->{cache} eq 'unsafe' ? JSON::true : JSON::false;
>>so we have the same code in two places? should probably be a helper
>>then to not have them go out of sync..
Ah, yes, I forgot to write the helper. Libvirt defines it on both the file
and format blockdev; I'm not sure exactly why.
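A possible shape for that helper, so the file and format nodes cannot drift apart. `generate_blockdev_cache_opts` is a hypothetical name and `$cache_direct` is assumed to be the result of `drive_uses_cache_direct`; the sketch uses `JSON::PP` booleans so it runs with core Perl, whereas the patch uses `JSON::true`/`JSON::false`. The keys are the `direct` and `no-flush` members of QMP's `BlockdevCacheOptions`.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper shared by generate_file_blockdev and
# generate_format_blockdev, so both nodes get identical cache settings.
sub generate_blockdev_cache_opts {
    my ($drive, $cache_direct) = @_;
    my $unsafe = defined($drive->{cache}) && $drive->{cache} eq 'unsafe';
    return {
        direct     => $cache_direct ? JSON::PP::true : JSON::PP::false,
        'no-flush' => $unsafe ? JSON::PP::true : JSON::PP::false,
    };
}
```

Both generators would then set `cache => generate_blockdev_cache_opts($drive, $cache_direct)` instead of building the hash inline.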
>
> - # my $file_param = $live_restore_name ? "file.file.filename" : "file";
> - my $file_param = "file";
> + my $file_nodename = "file-drive-$drive_id";
> + my $blockdev_file = generate_file_blockdev($storecfg, $drive, $file_nodename);
> + my $fmt_nodename = "fmt-drive-$drive_id";
> + my $blockdev_format = generate_format_blockdev($storecfg, $drive, $fmt_nodename, $blockdev_file, $force_readonly);
> +
> + my $blockdev_live_restore = undef;
> if ($live_restore_name) {
> - # non-rbd drivers require the underlying file to be a separate block
> - # node, so add a second .file indirection
> - $file_param .= ".file" if !$is_rbd;
> - $file_param .= ".filename";
> + die "$drive_id: Proxmox Backup Server backed drive cannot auto-detect the format\n"
> + if !$format;
>>for this check, but it is not actually set anywhere here.. so is
>>something missing or can the check go?
It can be removed; this is older code that I forgot to remove.
(I haven't tested backup/restore yet, as backup is not working.)
>
* Re: [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
2025-01-08 14:26 ` Fabian Grünbichler
@ 2025-01-10 14:08 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 14:08 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
Date: Fri, 10 Jan 2025 14:08:20 +0000
Message-ID: <a26394024a9767a4d602c07e362e670808b17fbb.camel@groupe-cyllene.com>
-------- Original Message --------
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel
Date: 08/01/2025 15:26:37
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on
> 16.12.2024 10:12 CET:
> fixme/testme :
> PVE/VZDump/QemuServer.pm: eval { PVE::QemuServer::qemu_drivedel($vmid, "tpmstate0-backup"); };
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 64 +++++++++++++++++++++++++++++++++--------------
> 1 file changed, 45 insertions(+), 19 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 2832ed09..baf78ec0 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -1582,6 +1582,42 @@ sub print_drive_throttle_group {
> return $throttle_group;
> }
>
> +sub generate_throttle_group {
> + my ($drive) = @_;
> +
> + my $drive_id = get_drive_id($drive);
> +
> + my $throttle_group = { id => "throttle-drive-$drive_id" };
> + my $limits = {};
> +
> + foreach my $type (['', '-total'], [_rd => '-read'], [_wr => '-write']) {
> + my ($dir, $qmpname) = @$type;
> +
> + if (my $v = $drive->{"mbps$dir"}) {
> + $limits->{"bps$qmpname"} = int($v*1024*1024);
> + }
> + if (my $v = $drive->{"mbps${dir}_max"}) {
> + $limits->{"bps$qmpname-max"} = int($v*1024*1024);
> + }
> + if (my $v = $drive->{"bps${dir}_max_length"}) {
> + $limits->{"bps$qmpname-max-length"} = int($v)
> + }
> + if (my $v = $drive->{"iops${dir}"}) {
> + $limits->{"iops$qmpname"} = int($v);
> + }
> + if (my $v = $drive->{"iops${dir}_max"}) {
> + $limits->{"iops$qmpname-max"} = int($v);
> + }
> + if (my $v = $drive->{"iops${dir}_max_length"}) {
> + $limits->{"iops$qmpname-max-length"} = int($v);
> + }
> + }
> +
> + $throttle_group->{limits} = $limits;
> +
> + return $throttle_group;
>>this and the corresponding print sub are exactly the same, so the
>>print sub could call this and join the limits with the `x-` prefix
>>added?
Yes, we could merge them.
Currently, the command line can't define complex QOM objects (this
should be available soon; the QEMU devs are working on it). That's why it
uses a different syntax with the x- prefix.
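A sketch of the merge: the command-line variant only renders the limits hash that `generate_throttle_group` already builds, adding the `x-` prefix of the legacy `-object throttle-group` properties. `print_throttle_group_object` is hypothetical and the sketch assumes the `{ id => ..., limits => {...} }` shape from the patch; keys are sorted only to make the output deterministic.

```perl
use strict;
use warnings;

# Hypothetical: render the hash built by generate_throttle_group
# ({ id => ..., limits => { 'bps-total' => ..., ... } }) as an -object
# argument, using the legacy x- prefixed throttle-group properties.
sub print_throttle_group_object {
    my ($group) = @_;
    my $limits = $group->{limits} // {};
    my @props = map { "x-$_=$limits->{$_}" } sort keys %$limits;
    return join(',', 'throttle-group', "id=$group->{id}", @props);
}
```

Once QOM objects with structured properties can be given on the command line, only this rendering step would need to change.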
>>how does this interact with the qemu_block_set_io_throttle helper
>>used when updating the limits at runtime?
It still works with block_set_io_throttle, where you specify the
device (the throttling values are applied to the top node attached to the
device).
> +}
> +
> sub generate_file_blockdev {
> my ($storecfg, $drive, $nodename) = @_;
>
> @@ -4595,32 +4631,22 @@ sub qemu_iothread_del {
> }
>
> sub qemu_driveadd {
> - my ($storecfg, $vmid, $device) = @_;
> + my ($storecfg, $vmid, $drive) = @_;
>
> - my $kvmver = get_running_qemu_version($vmid);
> - my $io_uring = min_version($kvmver, 6, 0);
> - my $drive = print_drive_commandline_full($storecfg, $vmid, $device, undef, $io_uring);
> - $drive =~ s/\\/\\\\/g;
> - my $ret = PVE::QemuServer::Monitor::hmp_cmd($vmid, "drive_add auto \"$drive\"", 60);
> -
> - # If the command succeeds qemu prints: "OK"
> - return 1 if $ret =~ m/OK/s;
> + my $drive_id = get_drive_id($drive);
> + my $throttle_group = generate_throttle_group($drive);
>>do we always need a throttle group? or would we benefit from only
>>adding it when limits are set, and skip that node when I/O is
>>unlimited?
Doing without it adds a lot of complexity, because it's not always
possible to insert a new blockdev (here the throttle group) between the
device and the drive blockdev when that blockdev is already the top
node attached to the device.
The other benefit is a stable name for the top block node
(drive node names can change when you switch). That means fewer lookups
for some QMP actions, like mirror/commit, where you need to know the
top node's name.
There is no performance impact from having a throttle group without limits.
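To illustrate the stable top node, here is a sketch of a throttle filter wrapping whichever format node is currently active. The node name `drive-<id>` and the helper name are assumptions; `driver => 'throttle'`, `throttle-group` and `file` are the options of QEMU's throttle block filter.

```perl
use strict;
use warnings;

# Hypothetical helper: put a throttle filter node on top of a format node.
# Its node name depends only on the drive id, so it stays the same when the
# fmt/file nodes below it are replaced (mirror, snapshot, rename).
sub throttle_blockdev {
    my ($drive_id, $fmt_blockdev) = @_;
    return {
        driver           => 'throttle',
        'node-name'      => "drive-$drive_id",          # assumed naming scheme
        'throttle-group' => "throttle-drive-$drive_id",
        file             => $fmt_blockdev,              # child: current fmt node
    };
}

my $before = throttle_blockdev('scsi0', { driver => 'qcow2', 'node-name' => 'fmt-drive-scsi0-1' });
my $after  = throttle_blockdev('scsi0', { driver => 'qcow2', 'node-name' => 'fmt-drive-scsi0-2' });
print "$before->{'node-name'} $after->{'node-name'}\n";   # prints: drive-scsi0 drive-scsi0
```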
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
2025-01-08 14:31 ` Fabian Grünbichler
@ 2025-01-13 7:56 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-13 7:56 UTC
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
Date: Mon, 13 Jan 2025 07:56:03 +0000
Message-ID: <26cce4a3ba90a375c30290ec6d692a8f7a6fd979.camel@groupe-cyllene.com>
-------- Original Message --------
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query
Date: 08/01/2025 15:31:36
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
> Look at qdev value, as cdrom drives can be empty
> without any inserted media
>>is this needed if we don't drive_del the cdrom drive when ejecting
>>the medium?
To me, the original code is buggy, because vm_devices_list should list
devices (the media device reader), not drives/blockdevs (the media).
Without this, we can't list an empty device without media.
(We don't drive_del the cdrom drive (device); they are IDE devices
and can't be removed online.)
>
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
> PVE/QemuServer.pm | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index baf78ec0..3b33fd7d 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4425,10 +4425,9 @@ sub vm_devices_list {
> }
>
> my $resblock = mon_cmd($vmid, 'query-block');
> - foreach my $block (@$resblock) {
> - if($block->{device} =~ m/^drive-(\S+)/){
> - $devices->{$1} = 1;
> - }
> + $resblock = { map { $_->{qdev} => $_ } $resblock->@* };
> + foreach my $blockid (keys %$resblock) {
> + $devices->{$blockid} = 1;
> }
>
> my $resmice = mon_cmd($vmid, 'query-mice');
> --
> 2.39.5
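A small sketch of why keying on `qdev` matters, using made-up query-block entries (the values are illustrative assumptions, not captured output): with -blockdev the legacy `device` field is assumed empty, and an empty CD-ROM drive has no `inserted` member, so the old `drive-` regex finds nothing while the qdev ids still list every device.

```perl
use strict;
use warnings;

# Made-up query-block result: with -blockdev the BlockBackends are anonymous,
# so "device" is empty; the empty CD-ROM has no "inserted" member.
my $resblock = [
    { device => '', qdev => 'ide2' },
    { device => '', qdev => 'scsi0', inserted => { 'node-name' => 'drive-scsi0' } },
];

# Old lookup: only legacy "drive-<id>" names match, so nothing is found.
my %old;
for my $block (@$resblock) {
    $old{$1} = 1 if $block->{device} =~ m/^drive-(\S+)/;
}

# New lookup (as in the patch): key by the qdev id.
my %new = map { ($_->{qdev} => 1) } @$resblock;

print scalar(keys %old), " vs ", join(',', sort keys %new), "\n";   # prints: 0 vs ide2,scsi0
```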
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-08 15:19 ` Fabian Grünbichler
@ 2025-01-13 8:27 ` DERUMIER, Alexandre via pve-devel
[not found] ` <0d0d4c4d73110cf0e692cae0ee65bf7f9a6ce93a.camel@groupe-cyllene.com>
1 sibling, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-13 8:27 UTC
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
Date: Mon, 13 Jan 2025 08:27:19 +0000
Message-ID: <0d0d4c4d73110cf0e692cae0ee65bf7f9a6ce93a.camel@groupe-cyllene.com>
> + my $path = PVE::Storage::path($storecfg, $volid);
>>is this guaranteed to be stable? also across versions? and including
>>external storage plugins?
It can't be different from the value we used for command line
generation. But I don't think I should use $path directly (it works
for block/file, but I think it won't work with ceph, gluster, ...).
I need to reuse the code used to generate the blockdev command line.
Another, maybe better, way is to parse the tree from the top node
(the throttle-group), whose name is fixed, and look for the fmt|file
chain attached to this node.
(I just need to check the live-renaming case: there we have 2
file nodes, and the newer file node is not attached to the tree before
the switch.)
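A rough sketch of that top-down walk: start at the fixed throttle node, step to its `file` child (the current format node), then follow `backing` links. The graph is a hand-written, simplified node-name => info hash; how it would be assembled from QMP output, and all names and paths, are assumptions.

```perl
use strict;
use warnings;

# Simplified, hand-written block graph (node-name => info). All names and
# paths are hypothetical; building this from QMP output is not shown.
my $graph = {
    'drive-scsi0'        => { driver => 'throttle', children => { file => 'fmt-drive-scsi0-2' } },
    'fmt-drive-scsi0-2'  => { driver => 'qcow2',
                              children => { file => 'file-drive-scsi0-2', backing => 'fmt-drive-scsi0-1' } },
    'file-drive-scsi0-2' => { driver => 'file', filename => '/images/vm-100-disk-0.qcow2' },
    'fmt-drive-scsi0-1'  => { driver => 'qcow2', children => { file => 'file-drive-scsi0-1' } },
    'file-drive-scsi0-1' => { driver => 'file', filename => '/images/snap1-vm-100-disk-0.qcow2' },
};

# Walk from the fixed top node down the backing chain and return
# [fmt node name, image filename] pairs, newest first.
sub walk_chain {
    my ($graph, $top) = @_;
    my $node = $graph->{$top}{children}{file};   # throttle filter -> current fmt node
    my @chain;
    while (defined $node) {
        my $fmt = $graph->{$node} or die "unknown node '$node'\n";
        my $file = $graph->{ $fmt->{children}{file} } or die "'$node' has no file child\n";
        push @chain, [ $node, $file->{filename} ];
        $node = $fmt->{children}{backing};        # undef at the base image
    }
    return \@chain;
}

print "$_->[0] $_->[1]\n" for @{ walk_chain($graph, 'drive-scsi0') };
```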
> +
> + my $node = find_blockdev_node($nodes, $path, 'fmt');
>>that one is only added in a later patch.. but I don't think lookups
>>by path are a good idea, we should probably have a deterministic node
>>naming concept instead? e.g., encode the drive + snapshot name?
I would really like to have something deterministic, but:
- node names are 31 characters max (snapshot names can be longer)
- we can't rename a node (but we rename files for snapshots over
time)
As Fiona said, we could use random names and do one lookup each time to
list them.
(I really need our own names, because blockdev-reopen, used for live
renaming of files, doesn't work with the autogenerated block# names.)
> + return $node->{'node-name'};
> +}
> +
> +sub get_blockdev_nextid {
> + my ($nodename, $nodes) = @_;
> + my $version = 0;
> + for my $nodeid (keys %$nodes) {
> + if ($nodeid =~ m/^$nodename-(\d+)$/) {
> + my $current_version = $1;
> + $version = $current_version if $current_version >= $version;
> + }
> + }
> + $version++;
> + return "$nodename-$version";
>>since we shouldn't ever have more than one job for a drive running
>>(right?), couldn't we just have a deterministic name for this? that
>>would also simplify cleanup, including cleanup of a failed cleanup ;)
Same problem as above:
- you need 2 file nodes at the same time for live renaming
- you can have 2 fmt nodes at the same time for blockdev-mirror.
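For reference, a variant of get_blockdev_nextid from the patch with two small additions that are only suggestions: the prefix is regex-quoted with `\Q...\E`, and names that would exceed the 31-character node-name limit mentioned above are rejected up front instead of failing later in QMP.

```perl
use strict;
use warnings;

# Variant of get_blockdev_nextid(): returns "<prefix>-<max suffix + 1>".
sub get_blockdev_nextid {
    my ($nodename, $nodes) = @_;
    my $version = 0;
    for my $nodeid (keys %$nodes) {
        # \Q...\E: treat the prefix literally, even if it contains regex metacharacters
        if ($nodeid =~ m/^\Q$nodename\E-(\d+)$/) {
            $version = $1 if $1 > $version;
        }
    }
    my $id = "$nodename-" . ($version + 1);
    # QEMU node names are limited to 31 characters
    die "node name '$id' is longer than 31 characters\n" if length($id) > 31;
    return $id;
}

my $nodes = { 'fmt-drive-scsi0-1' => {}, 'fmt-drive-scsi0-3' => {}, 'file-drive-scsi0-7' => {} };
print get_blockdev_nextid('fmt-drive-scsi0', $nodes), "\n";   # prints: fmt-drive-scsi0-4
```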
* Re: [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default.
2025-01-09 9:51 ` Fabian Grünbichler
@ 2025-01-13 8:38 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-13 8:38 UTC
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default.
Date: Mon, 13 Jan 2025 08:38:01 +0000
Message-ID: <f90768d3c3025e6501f34cdfdb58f276b7885b32.camel@groupe-cyllene.com>
> + #change aio if io_uring is not supported on target
> + if ($dst_drive->{aio} && $dst_drive->{aio} eq 'io_uring') {
> + my ($dst_storeid) = PVE::Storage::parse_volume_id($dst_drive->{file});
> + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
> + my $cache_direct = drive_uses_cache_direct($dst_drive, $dst_scfg);
> + if(!storage_allows_io_uring_default($dst_scfg, $cache_direct)) {
> + $dst_drive->{aio} = $cache_direct ? 'native' : 'threads';
> + }
> + }
>>couldn't/shouldn't we just handle this in generate_file_blockdev?
Yes, better to reuse the existing code to avoid divergence. I'll do it.
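A sketch of how the fallback could be folded into a small helper that generate_file_blockdev might call. The helper name and its boolean inputs (whether the storage allows io_uring by default, whether cache=direct is used) are assumptions standing in for the results of storage_allows_io_uring_default and drive_uses_cache_direct.

```perl
use strict;
use warnings;

# Hypothetical helper: same decision as the hunk above, but in one place.
# Keep the configured aio unless it is io_uring on storage that does not
# allow it; then fall back to native (cache=direct) or threads.
sub effective_aio {
    my ($aio, $io_uring_allowed, $cache_direct) = @_;
    return $aio if !defined($aio) || $aio ne 'io_uring' || $io_uring_allowed;
    return $cache_direct ? 'native' : 'threads';
}

print effective_aio('io_uring', 0, 1), "\n";   # prints: native
```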
* Re: [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support
2025-01-09 11:57 ` Fabian Grünbichler
@ 2025-01-13 8:53 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-13 8:53 UTC
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support
Date: Mon, 13 Jan 2025 08:53:03 +0000
Message-ID: <e23f530d158c3a79cc02121d3f011d4acda63a76.camel@groupe-cyllene.com>
>
> +sub generate_backing_blockdev {
> + my ($storecfg, $snapshots, $deviceid, $drive, $id) = @_;
> +
> + my $snapshot = $snapshots->{$id};
> + my $order = $snapshot->{order};
> + my $parentid = $snapshot->{parent};
> + my $snap_fmt_nodename = "fmt-$deviceid-$order";
> + my $snap_file_nodename = "file-$deviceid-$order";
>>would it make sense to use the snapshot name here instead of the
>>order? that would allow a deterministic mapping even when snapshots
>>are removed..
Node names are limited to 31 characters :(
If we could encode the full path, I think it would be
easier. I don't know if QEMU could be patched to allow more characters
for node names.
With a unique id per file/path (not per drive), I think it could work
with mirror/replace/...
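One way to get a per-path id that fits the limit would be to hash the path. This is only an illustration of the idea, not something the series implements, and it assumes the path stays stable for the lifetime of the node (which is exactly what is being debated in this thread). It uses the core Digest::SHA module and QEMU's node-name rules (start with a letter; letters, digits, `-`, `.` and `_` only).

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical scheme: "<prefix>-<truncated SHA-1 of the path>", always
# at most 31 characters, deterministic for a given path.
sub path_nodename {
    my ($prefix, $path) = @_;               # prefix e.g. 'fmt' or 'file' (assumed)
    my $room = 31 - length("$prefix-");
    die "prefix '$prefix' too long\n" if $room < 16;
    return "$prefix-" . substr(sha1_hex($path), 0, $room);
}

print path_nodename('fmt', '/images/vm-100-disk-0.qcow2'), "\n";
```

Truncating the hash trades collision resistance for length; 27 hex characters still leave collisions very unlikely for the handful of images per drive.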
> +
> + my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap_file_nodename);
> + $snap_file_blockdev->{filename} = $snapshot->{file};
> + my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_fmt_nodename, $snap_file_blockdev, 1);
> + $snap_fmt_blockdev->{backing} = generate_backing_blockdev($storecfg, $snapshots, $deviceid, $drive, $parentid) if $parentid;
> + return $snap_fmt_blockdev;
> +}
> +
> +sub generate_backing_chain_blockdev {
> + my ($storecfg, $deviceid, $drive) = @_;
> +
> + my $volid = $drive->{file};
> + my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid);
> + return if !$do_snapshots_with_qemu || $do_snapshots_with_qemu != 2;
> +
> + my $chain_blockdev = undef;
> + PVE::Storage::activate_volumes($storecfg, [$volid]);
> + #should we use qemu config to list snapshots ?
>>from a data consistency PoV, trusting the qcow2 metadata is probably
>>safer..
(I asked about this because we need to activate the volumes for this, and
currently we activate them after the config generation.)
With this code, the volumes are activated when we generate the command
line, but if the user has a wrong config, we need to handle deactivation
too.
>>but we could check that the storage and the config agree, and error
>>out otherwise?
Yes, I was thinking about this. We can list volumes from the storage
without needing to activate them, then check that all volumes from the VM
config are present.
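The proposed consistency check boils down to a set difference. A sketch with hypothetical volume ids, assuming both lists are already available (from the VM config and from a storage listing that does not need activation):

```perl
use strict;
use warnings;

# Return the volumes referenced by the VM config that the storage does not report.
sub missing_volumes {
    my ($config_volids, $storage_volids) = @_;
    my %on_storage = map { ($_ => 1) } @$storage_volids;
    return [ grep { !$on_storage{$_} } @$config_volids ];
}

# Hypothetical volume ids
my $config  = ['local:100/vm-100-disk-0.qcow2', 'local:100/vm-100-disk-1.qcow2'];
my $storage = ['local:100/vm-100-disk-0.qcow2'];

my $missing = missing_volumes($config, $storage);
print "missing on storage: @$missing\n" if @$missing;
```

In the real code path, a non-empty result would be the point to error out before generating the command line.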
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
[not found] ` <0d0d4c4d73110cf0e692cae0ee65bf7f9a6ce93a.camel@groupe-cyllene.com>
@ 2025-01-13 9:52 ` Fabian Grünbichler
2025-01-13 9:55 ` Fabian Grünbichler
` (2 more replies)
0 siblings, 3 replies; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-13 9:52 UTC
To: DERUMIER, Alexandre, pve-devel
> DERUMIER, Alexandre <alexandre.derumier@groupe-cyllene.com> wrote on 13.01.2025 09:27 CET:
>
>
> > + my $path = PVE::Storage::path($storecfg, $volid);
>
> >>is this guaranteed to be stable? also across versions? and including
> >>external storage plugins?
>
> It can't be different from the value we used for command line
> generation. But I don't think I should use $path directly (it works
> for block/file, but I think it won't work with ceph, gluster, ...).
> I need to reuse the code used to generate the blockdev command line.
>
> Another, maybe better, way is to parse the tree from the top node
> (the throttle-group), whose name is fixed, and look for the fmt|file
> chain attached to this node.
>
> (I just need to check the live-renaming case: there we have 2
> file nodes, and the newer file node is not attached to the tree before
> the switch.)
>
>
>
> > +
> > + my $node = find_blockdev_node($nodes, $path, 'fmt');
>
> >>that one is only added in a later patch.. but I don't think lookups
> >>by path are a good idea, we should probably have a deterministic node
> >>naming concept instead? e.g., encode the drive + snapshot name?
>
> I would really like to have something deterministic, but:
>
> - node names are 31 characters max (snapshot names can be longer)
> - we can't rename a node (but we rename files for snapshots over
> time)
>
>
> As Fiona said, we could use random names and do one lookup each time to
> list them.
>
> (I really need our own names, because blockdev-reopen, used for live
> renaming of files, doesn't work with the autogenerated block# names.)
something like this was what I was afraid of ;) this basically means we need to have some way to lookup the nodes based on the structure of the graph, which probably also means verifying that the structure matches the expected one (e.g., if we have X snapshots, we expect N nodes, if we currently have operation A going on, there should be an extra node, etc.pp. - and then we can "know" that the seventh node from the bottom must be snapshot 'foobar' ;)). relying on $path being stable definitely won't work.
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-13 9:52 ` Fabian Grünbichler
@ 2025-01-13 9:55 ` Fabian Grünbichler
2025-01-13 10:47 ` DERUMIER, Alexandre via pve-devel
[not found] ` <c1559499319052d6cf10900efd5376c12389a60f.camel@groupe-cyllene.com>
2 siblings, 0 replies; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-13 9:55 UTC
To: DERUMIER, Alexandre, pve-devel
> Fabian Grünbichler <f.gruenbichler@proxmox.com> wrote on 13.01.2025 10:52 CET:
>
>
> > DERUMIER, Alexandre <alexandre.derumier@groupe-cyllene.com> wrote on 13.01.2025 09:27 CET:
> >
> >
> > > + my $path = PVE::Storage::path($storecfg, $volid);
> >
> > >>is this guaranteed to be stable? also across versions? and including
> > >>external storage plugins?
> >
> > It can't be different from the value we used for command line
> > generation. But I don't think I should use $path directly (it works
> > for block/file, but I think it won't work with ceph, gluster, ...).
> > I need to reuse the code used to generate the blockdev command line.
> >
> > Another, maybe better, way is to parse the tree from the top node
> > (the throttle-group), whose name is fixed, and look for the fmt|file
> > chain attached to this node.
> >
> > (I just need to check the live-renaming case: there we have 2
> > file nodes, and the newer file node is not attached to the tree before
> > the switch.)
> >
> >
> >
> > > +
> > > + my $node = find_blockdev_node($nodes, $path, 'fmt');
> >
> > >>that one is only added in a later patch.. but I don't think lookups
> > >>by path are a good idea, we should probably have a deterministic node
> > >>naming concept instead? e.g., encode the drive + snapshot name?
> >
> > I would really like to have something deterministic, but:
> >
> > - node names are 31 characters max (snapshot names can be longer)
> > - we can't rename a node (but we rename files for snapshots over
> > time)
> >
> >
> > As Fiona said, we could use random names and do one lookup each time to
> > list them.
> >
> > (I really need our own names, because blockdev-reopen, used for live
> > renaming of files, doesn't work with the autogenerated block# names.)
>
> something like this was what I was afraid of ;) this basically means we need to have some way to lookup the nodes based on the structure of the graph, which probably also means verifying that the structure matches the expected one (e.g., if we have X snapshots, we expect N nodes, if we currently have operation A going on, there should be an extra node, etc.pp. - and then we can "know" that the seventh node from the bottom must be snapshot 'foobar' ;)). relying on $path being stable definitely won't work.
something more to add to this - if it is impossible to map back using the structure alone, we might need to somehow keep track ourselves for the full lifecycle of the VM? e.g., find a way to attach the "volid+snap" information to a block node as metadata, or to add such a mapping inside the VM or alongside it? OTOH, that approach would then break if a user does a manual QMP block operation (but those are error prone already anyway)
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
2025-01-09 11:57 ` Fabian Grünbichler
2025-01-09 13:19 ` Fabio Fantoni via pve-devel
@ 2025-01-13 10:08 ` DERUMIER, Alexandre via pve-devel
[not found] ` <0ae72889042e006d9202e837aac7ecf2b413e1b4.camel@groupe-cyllene.com>
2 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-13 10:08 UTC
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 13 Jan 2025 10:08:27 +0000
Message-ID: <0ae72889042e006d9202e837aac7ecf2b413e1b4.camel@groupe-cyllene.com>
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
>>it would be great if there'd be a summary of the design choices and a
>>high level summary of what happens to the files and block-node-graph
>>here. it's a bit hard to judge from the code below whether it would
>>be possible to eliminate the dynamically named block nodes, for
>>example ;)
Yes, sorry, I'll add more info about the QEMU limitations and why I'm doing
it like this.
>>a few more comments documenting the behaviour and ideally also some
>>tests (mocking the QMP interactions?) would be nice
Yes, I'll add tests; I need to find a good way to mock it.
> +
> + #preallocate add a new current file
> + my $new_current_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
> + my $new_current_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
>>okay, so here we have a dynamic node name because the desired target
>>name is still occupied. could we rename the old block node first?
we can't rename a node, that's the problem.
> + PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
>>(continued from above) and this invocation here are the same??
The invocation is the same, but it doesn't do the same thing if it's an
external snapshot.
>> wouldn't this already create the snapshot on the storage layer?
Yes, it creates the qcow2 file (plus the LVM volume on LVM storage) with preallocation.
>>and didn't we just hardlink + reopen + unlink to transform the
>>previous current volume into the snap volume?
yes.
Here, we create the new current volume, add it to the graph
with blockdev-add, then finally switch to it with blockdev-snapshot.
The ugly trick in pve-storage is in plugin.pm:
#rename current volume to snap volume
rename($path, $snappath) if -e $path && !-e $snappath;
or in the LVM plugin:
eval { lvrename($vg, $volname, $snap_volname) } ;
(and you have already commented on them ;)
This is because I'm reusing volume_snapshot (so I didn't have to add a new method
in pve-storage) to allocate the snapshot file, but indeed, we have
already done the rename online.
>>should this maybe have been vdisk_alloc and it just works by
>>accident?
It doesn't work with vdisk_alloc, because the volume needs to be
created without a size, but with the backing file parameter instead.
(If I remember correctly, qemu-img looks at the backing file size + metadata
and sets the correct total size for the new volume.)
Maybe a better way would be to reuse vdisk_alloc and add the backing file
as a parameter?
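For reference, the underlying qemu-img invocation when a backing file is given: with no size argument, qemu-img takes the virtual size from the backing image, which is why a plain size-based allocation does not fit. The helper below only builds the argument list; its name, the paths and the qcow2 backing format are hypothetical, and whether vdisk_alloc should grow such a parameter is the open question above.

```perl
use strict;
use warnings;

# Hypothetical helper: argv for a qcow2 overlay on top of an existing image.
# -b names the backing file, -F its format; the size is omitted on purpose.
sub qcow2_overlay_cmd {
    my ($backing_path, $new_path) = @_;
    return ['qemu-img', 'create', '-f', 'qcow2',
            '-b', $backing_path, '-F', 'qcow2', $new_path];
}

my $cmd = qcow2_overlay_cmd('/images/snap1.qcow2', '/images/current.qcow2');
print join(' ', @$cmd), "\n";
# prints: qemu-img create -f qcow2 -b /images/snap1.qcow2 -F qcow2 /images/current.qcow2
```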
> + my $new_file_blockdev = generate_file_blockdev($storecfg, $drive, $new_current_file_nodename);
> + my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_current_fmt_nodename, $new_file_blockdev);
> +
> + $new_fmt_blockdev->{backing} = undef;
>>generate_format_blockdev doesn't set backing?
Yes, it does add the backing.
>>maybe this should be converted into an assertion?
But there is a limitation of the QMP blockdev-add + blockdev-snapshot
combination: the backing attribute needs to be undef in blockdev-add, or
blockdev-snapshot will fail, because it tries to set the
backing file itself when doing the switch.
From my tests, it was related to this:
https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01404.html
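Roughly what the two QMP commands look like on the wire, as a sketch with hypothetical node names and paths. The explicit `backing => undef` is encoded as JSON `null`, which tells blockdev-add not to open a backing chain itself, leaving blockdev-snapshot to attach the old node as backing of the overlay.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical node names and paths, for illustration only.
my $add = {
    execute   => 'blockdev-add',
    arguments => {
        driver      => 'qcow2',
        'node-name' => 'fmt-drive-scsi0-2',
        file        => {
            driver      => 'file',
            'node-name' => 'file-drive-scsi0-2',
            filename    => '/images/current.qcow2',
        },
        backing     => undef,    # encoded as null: do not open a backing chain
    },
};
my $snap = {
    execute   => 'blockdev-snapshot',
    # node: current active format node; overlay: the node added above
    arguments => { node => 'fmt-drive-scsi0-1', overlay => 'fmt-drive-scsi0-2' },
};

my $json = JSON::PP->new->canonical;   # sorted keys for stable output
print $json->encode($_), "\n" for $add, $snap;
```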
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$new_fmt_blockdev);
> + mon_cmd($vmid, 'blockdev-snapshot', node => $format_nodename, overlay => $new_current_fmt_nodename);
> +}
> +
> +sub blockdev_snap_rename {
> + my ($storecfg, $vmid, $deviceid, $drive, $src_path, $target_path) = @_;
>>I think this whole thing needs more error handling and thought about
>>how to recover from various points failing..
Yes, that's the problem with renaming: it's not atomic.
Also, if we need to recover (rollback), how do we handle multiple disks?
>>there's also quite some overlap with blockdev_current_rename, I
>>wonder whether it would be possible to simplify the code further by
>>merging the two? but see below, I think we can even get away with
>>dropping this altogether if we switch from block-commit to block-stream..
Yes, I separated them because I was not sure about the different
workflows, and it was simpler to fix one method without breaking
the other.
I'll look into implementing block-stream (and keep commit to the initial image
for the last snapshot delete).
> + #untaint
> + if ($src_path =~ m/^(\S+)$/) {
> + $src_path = $1;
> + }
> + if ($target_path =~ m/^(\S+)$/) {
> + $target_path = $1;
> + }
>>shouldn't that have happened in the storage plugin?
>>
> +
> + #create a hardlink
> + link($src_path, $target_path);
>>should this maybe be done by the storage plugin?
This was to avoid introducing a new method, but yes, it could indeed be
better.
PVE::Storage::link?
>
> + #delete old $path link
> + unlink($src_path);
and this
PVE::Storage::unlink?
(We can't use free_image here, because we really want to remove the link
and not the volume.)
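A sketch of the link-then-unlink step as a standalone helper, roughly what a PVE::Storage::link/unlink pair could wrap for file-based storage (the helper name is hypothetical). The inode, and therefore QEMU's open file descriptor, is unaffected; the sketch also refuses to overwrite an existing target, one of the failure cases raised above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: give an image a new name, then drop the old one.
# Not atomic: if unlink fails, both names point to the same inode.
sub relink_path {
    my ($src, $dst) = @_;
    die "target '$dst' already exists\n" if -e $dst;
    link($src, $dst) or die "link '$src' -> '$dst' failed: $!\n";
    unlink($src) or die "unlink '$src' failed: $!\n";
    return;
}

my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/current.qcow2") or die "open failed: $!\n";
print {$fh} "data";
close($fh);

relink_path("$dir/current.qcow2", "$dir/snap1.qcow2");
print((-e "$dir/snap1.qcow2" && !-e "$dir/current.qcow2") ? "renamed\n" : "failed\n");
```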
> +
> + #rename underlay
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvrename $src_path to $target_path\n";
> + run_command(
> + ['/sbin/lvrename', $src_path, $target_path],
> + errmsg => "lvrename $src_path to $target_path error",
> + );
> + }
>>and this as well?
I didn't reuse lvrename from the LVM plugin, because it uses vgname/lvname
and not the path, but I can look into extending it.
> +}
> +
> +sub blockdev_current_rename {
> + my ($storecfg, $vmid, $deviceid, $drive, $path, $target_path, $skip_underlay) = @_;
> + ## rename current running image
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> + my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
>>here we could already incorporate the snapshot name, since we know
>>it?
31-character limit.
> +
> + my $file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
> + $file_blockdev->{filename} = $target_path;
> +
> + my $format_node = find_blockdev_node($nodes, $path, 'fmt');
>>then we'd know this is always the "current" node, however we
>>deterministically name it?
Not once you do a block-mirror: the current fmt node will then be
replaced with another "current2" fmt node.
>>and this should be done by the storage layer I think? how does this
>>interact with LVM?
From my tests, a hardlink works.
>>would we maybe want to mknod instead of hardlinking the
>>device node?
No need: /dev/<vgname>/<lv> is not a device node; it's already a symlink
to the device node.
for example:
lrwxrwxrwx 1 root root 7 Dec 10 00:11 vm-10001-disk-0 -> ../dm-9
#ln vm-10001-disk-0 testrename
lrwxrwxrwx 1 root root 7 Dec 10 00:11 vm-10001-disk-0 -> ../dm-9
lrwxrwxrwx 1 root root 7 Dec 10 00:11 testrename -> ../dm-9
>>did you try whether a plain rename would also work (not sure - qemu
>>already has an open FD to the file/blockdev, but I am not sure how
>>LVM handles this ;))?
From my tests, lvrename simply renames the LVM volume
internally, then renames the link, and since we have already created the link,
it simply renames it without problems.
#lvrename vm-10001-disk-0 vm-10001-disk-snap1
lrwxrwxrwx 1 root root 7 Dec 10 00:11 vm-10001-disk-snap1 -> ../dm-9
lrwxrwxrwx 1 root root 7 Dec 10 00:11 testrename -> ../dm-9
#lvrename vm-10001-disk-snap1 testrename
lrwxrwxrwx 1 root root 7 Dec 10 00:11 testrename -> ../dm-9
>
> +
> +sub blockdev_commit {
>>see comments below for qemu_volume_snapshot_delete, I think this..
>>and this can be replaced altogether with blockdev_stream..
>>wouldn't it make more sense to use block-stream to merge the contents
>>of the to-be-deleted snapshot into the current overlay? that way we
>>wouldn't need to rename anything, AFAICT..
>>same here, instead of commiting from the child into the to-be-deleted
>>snapshot, and then renaming, why not just block-stream from the to-
>>be-deleted snapshot into the child, and then discard the snapshot
>>that is no longer needed?
>>commit is the wrong direction though?
>>
>>if we have A -> B -> C, and B is deleted, the delta previously
>>contained in B should be merged into C, not into A?
>>
>>so IMHO a simple block-stream + removal of the to-be-deleted snapshot
>>should be the right choice here as well?
>>
>>that would effectively make all the paths identical AFAICT (stream
>>from to-be-deleted snapshot to child, followed by deletion of the no
>>longer used volume corresponding to the deleted/streamed snapshot)
>>and no longer require any renaming..
Yes, got it now. I'll implement block-stream,
but keep commit for the last snapshot delete.
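To make the direction concrete: for a chain A -> B -> C (A oldest, C active), deleting B means streaming B's data into C with A as the base, after which C's backing is A and B's volume can be removed. A sketch that derives the QMP block-stream arguments (`device`, `base-node`, `job-id`) from a list of format node names; the node names and the job-id scheme are hypothetical.

```perl
use strict;
use warnings;

# $chain: fmt node names from the oldest image to the active one.
# Returns block-stream arguments that merge $delete into its direct child.
sub block_stream_args {
    my ($chain, $delete) = @_;
    my ($idx) = grep { $chain->[$_] eq $delete } 0 .. $#$chain;
    die "node '$delete' is not in the chain\n" if !defined($idx);
    die "'$delete' is the active image, not a snapshot\n" if $idx == $#$chain;

    my $args = {
        'job-id' => "stream-$delete",
        device   => $chain->[$idx + 1],     # child overlay receives the data
    };
    # stop at the deleted node's parent; without base-node the whole chain is streamed
    $args->{'base-node'} = $chain->[$idx - 1] if $idx > 0;
    return $args;
}

my $args = block_stream_args(['A', 'B', 'C'], 'B');
print join(', ', map { "$_=$args->{$_}" } sort keys %$args), "\n";
# prints: base-node=A, device=C, job-id=stream-B
```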
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-13 9:52 ` Fabian Grünbichler
2025-01-13 9:55 ` Fabian Grünbichler
@ 2025-01-13 10:47 ` DERUMIER, Alexandre via pve-devel
2025-01-13 13:42 ` Fiona Ebner
[not found] ` <c1559499319052d6cf10900efd5376c12389a60f.camel@groupe-cyllene.com>
2 siblings, 1 reply; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-13 10:47 UTC
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
Date: Mon, 13 Jan 2025 10:47:58 +0000
Message-ID: <c1559499319052d6cf10900efd5376c12389a60f.camel@groupe-cyllene.com>
>>something like this was what I was afraid of ;) this basically means
>>we need to have some way to lookup the nodes based on the structure
>>of the graph, which probably also means verifying that the structure
>>matches the expected one (e.g., if we have X snapshots, we expect N
>>nodes, if we currently have operation A going on, there should be an
>>extra node, etc.pp. - and then we can "know" that the seventh node
>>from the bottom must be snapshot 'foobar' ;)).
I think it's not much of a problem to follow the graph from top to
bottom (as everything attached has a parent-child relationship),
and:
- for snapshots, we have the snapshot name in the filename,
so we can know whether it's a specific snapshot or the live image.
- for the temporary nodes (when we do blockdev-add, before a mirror or
switch), we define the node name, so we don't need to parse the graph
here.
>>relying on $path being stable definitely won't work.
I really don't see why the path couldn't be stable.
Over time, if something changes in QEMU (for example, rbd://....
with a new param),
it will only apply to the new QEMU process (after a restart or live
migration), so the path in the block node will be updated too. (Live
migration does not keep the old block node info; the values used are the
QEMU command line arguments.)
And the file path is the only attribute of a node that you can't
update.
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
[not found] ` <0ae72889042e006d9202e837aac7ecf2b413e1b4.camel@groupe-cyllene.com>
@ 2025-01-13 13:27 ` Fabian Grünbichler
2025-01-13 18:07 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-13 13:27 UTC
To: DERUMIER, Alexandre, pve-devel
> DERUMIER, Alexandre <alexandre.derumier@groupe-cyllene.com> wrote on 13.01.2025 11:08 CET:
>
>
> > Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on
> > 16.12.2024 10:12 CET:
>
> >>it would be great if there'd be a summary of the design choices and a
> >>high level summary of what happens to the files and block-node-graph
> >>here. it's a bit hard to judge from the code below whether it would
> >>be possible to eliminate the dynamically named block nodes, for
> >>example ;)
>
> yes, sorry, I'll add more info about the QEMU limitations and why I'm doing
> it like this.
>
> >>a few more comments documenting the behaviour and ideally also some
> >>tests (mocking the QMP interactions?) would be nice
> yes, I'll add tests; I need to find a good way to mock it.
>
>
> > +
> > + #preallocate add a new current file
> > + my $new_current_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
> > + my $new_current_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
>
> >>okay, so here we have a dynamic node name because the desired target
> >>name is still occupied. could we rename the old block node first?
>
> we can't rename a node, that's the problem.
>
>
> > + PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
>
> >>(continued from above) and this invocation here are the same??
> The invocation is the same, but it doesn't do the same thing if it's an
> external snapshot.
>
> >> wouldn't this already create the snapshot on the storage layer?
> yes, it creates the (LVM volume) + qcow2 file with preallocation
>
> >>and didn't we just hardlink + reopen + unlink to transform the
> >>previous current volume into the snap volume?
> yes.
> here, we are creating the new current volume, adding it to the graph
> with blockdev-add, and then finally switching to it with blockdev-snapshot.
>
> The ugly trick in pve-storage is in plugin.pm:
> #rename current volume to snap volume
> rename($path, $snappath) if -e $path && !-e $snappath;
> or in the LVM plugin:
> eval { lvrename($vg, $volname, $snap_volname) } ;
>
>
> (and you have already commented on them ;)
>
>
> because I'm reusing volume_snapshot (so I didn't have to add a new method
> in pve-storage) to allocate the snapshot file, but indeed, we have
> already done the rename online.
>
>
> >>should this maybe have been vdisk_alloc and it just works by accident?
> It doesn't work with vdisk_alloc, because the volume needs to be
> created without a size, but with the backing file param instead.
> (If I remember correctly, qemu-img looks at the backing file size + metadata
> and sets the correct total size for the new volume.)
I am not sure I follow.. we create a snapshot, but then we pretend it isn't a file with backing file when passing it to qemu? this seems wrong.. IMHO we should just allocate (+format) here, and then let qemu do the backing link up instead of this confusing (and error-prone, as it masks problems that should be a hard error!) call..
> Maybe a better way would be to reuse vdisk_alloc and add the backing file
> as a param?
that seems.. wrong as well? the file can never be bigger just because it has a backing file right? why can't we just allocate and format the regular volume?
> > + my $new_file_blockdev = generate_file_blockdev($storecfg, $drive, $new_current_file_nodename);
> > + my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_current_fmt_nodename, $new_file_blockdev);
> > +
> > + $new_fmt_blockdev->{backing} = undef;
>
> >>generate_format_blockdev doesn't set backing?
> yes, it's adding backing
>
> >>maybe this should be converted into an assertion?
>
> but there is a limitation of the QMP blockdev-add + blockdev-snapshot
> combination: the backing attribute needs to be undef in blockdev-add, or
> blockdev-snapshot will fail because it tries to set the backing file
> itself when doing the switch.
>
> From my tests, it was related to this:
> https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01404.html
yeah, what I mean is:
generate_format_blockdev doesn't (and should never) set backing. so setting it to undef has no effect. but we might want to assert that it *is* undef, so that if we ever mistakenly change generate_format_blockdev to set backing, it will be caught instead of silently being papered over..
> > + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$new_fmt_blockdev);
> > + mon_cmd($vmid, 'blockdev-snapshot', node => $format_nodename, overlay => $new_current_fmt_nodename);
> > +}
> > +
> > +sub blockdev_snap_rename {
> > + my ($storecfg, $vmid, $deviceid, $drive, $src_path, $target_path) = @_;
>
> >>I think this whole thing needs more error handling and thought about
> >>how to recover from various points failing..
> yes, that's the problem with renaming, it's not atomic.
>
> Also, if we need to recover (rollback), how do we handle multiple disks?
in general, a rollback with multiple disks that fails half-way through can only be recovered using another rollback, and that only works if the half-rollback is idempotent? another reason to avoid the need for renames wherever possible ;)
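A sketch of what "idempotent" could mean for the rename step, in Python for illustration, generalizing the `rename($path, $snappath) if -e $path && !-e $snappath;` guard from plugin.pm quoted above: re-running a rename that already happened is a no-op, and a multi-disk batch undoes its completed renames if a later one fails. The function names are hypothetical, and the existence checks are not atomic with the rename, so this assumes nothing else touches the files concurrently.

```python
import os


def idempotent_rename(src, dst):
    """Rename src -> dst so that re-running it after a crash is harmless."""
    if os.path.exists(dst) and not os.path.exists(src):
        return  # already renamed by an earlier, interrupted run
    if not os.path.exists(src):
        raise FileNotFoundError(src)
    if os.path.exists(dst):
        # both names exist: the state is ambiguous, refuse instead of guessing
        raise FileExistsError(dst)
    os.rename(src, dst)


def rename_all(pairs):
    """Rename several (src, dst) pairs, e.g. one per disk.

    On failure, the renames already done are undone in reverse order; since
    each step is idempotent, a failed rollback can simply be retried.
    """
    done = []
    try:
        for src, dst in pairs:
            idempotent_rename(src, dst)
            done.append((src, dst))
    except OSError:
        for src, dst in reversed(done):
            idempotent_rename(dst, src)
        raise
```

Unlike the Perl one-liner, an unexpected state (both names present, or neither) is reported as an error instead of being silently skipped.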
> >>there's also quite some overlap with blockdev_current_rename, I
> >>wonder whether it would be possible to simplify the code further by
> >>merging the two? but see below, I think we can even get away with
> >>dropping this altogether if we switch from block-commit to
> >>block-stream..
> Yes, I separated them because I was not sure about the different
> workflows, and it was simpler to fix one method without breaking
> the other.
>
> I'll look into implementing block-stream (and keep commit to the initial
> image for deleting the last snapshot).
>
>
> > + #untaint
> > + if ($src_path =~ m/^(\S+)$/) {
> > + $src_path = $1;
> > + }
> > + if ($target_path =~ m/^(\S+)$/) {
> > + $target_path = $1;
> > + }
>
> >>shouldn't that have happened in the storage plugin?
> >>
> > +
> > + #create a hardlink
> > + link($src_path, $target_path);
>
> >>should this maybe be done by the storage plugin?
>
> This was to avoid introducing a new sub, but yes, that would indeed be
> better.
>
> PVE::Storage::link?
the issue is that these are all storage-specific things done in qemu-server, and they should be done by the storage plugin, else a third-party plugin can never support external storages.. so we'd need to find the right level of abstraction to tell the storage plugin what to do when, and then the storage plugin can decide how it exposes both snapshot names with the same underlying snapshot (for a while)..
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
[not found] ` <c1559499319052d6cf10900efd5376c12389a60f.camel@groupe-cyllene.com>
@ 2025-01-13 13:31 ` Fabian Grünbichler
2025-01-20 13:37 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-13 13:31 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel
> DERUMIER, Alexandre <alexandre.derumier@groupe-cyllene.com> wrote on 13.01.2025 11:47 CET:
>
>
> >>something like this was what I was afraid of ;) this basically means
> >>we need to have some way to lookup the nodes based on the structure
> >>of the graph, which probably also means verifying that the structure
> >>matches the expected one (e.g., if we have X snapshots, we expect N
> >>nodes, if we currently have operation A going on, there should be an
> >>extra node, etc.pp. - and then we can "know" that the seventh node
> >>from the bottom must be snapshot 'foobar' ;)).
>
> I don't think it's much of a problem to follow the graph from top to
> bottom (everything attached has a parent-child relationship),
> and
>
> - for snapshots, we have the snapshot name in the filename,
> so we can know whether it's a specific snapshot or the live image.
>
>
> For the temporary nodes (when we do blockdev-add before a mirror or
> switch), we define the node name ourselves, so we don't need to parse the graph
> here.
>
>
>
> >>relying on $path being stable definitely won't work.
>
> I really don't see why the path couldn't be stable?
>
> Over time, if something changes in QEMU (for example, rbd://....
> with a new param),
> it will only apply to the new QEMU process (after a restart or live
> migration), so the path in the block node will be updated too (live
> migration does not keep the old block-node info; the values used are the
> QEMU command-line arguments),
>
>
> and the file path is the only attribute in a node that you can't
> update.
the path referenced in the running VM is stable. the path you are looking for in the graph is not. e.g., the path might be something some storage software returns. or udev. or .. and that can change with any software upgrade or not be 100% deterministic in the first place.
let's say you start the VM today and the path returned by the RBD storage plugin is /dev/rbd/XYZ, so that is how the blockdev is opened/the path is recorded. in two weeks, ceph gets updated and now the udev rule or the storage plugin code changes to return the more deterministic /dev/rbd/POOL/XYZ. now the paths don't match anymore. (this is just an example where such a thing happened in practice already ;)).
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-13 10:47 ` DERUMIER, Alexandre via pve-devel
@ 2025-01-13 13:42 ` Fiona Ebner
2025-01-14 10:03 ` DERUMIER, Alexandre via pve-devel
[not found] ` <fa38efbd95b57ba57a5628d6acfcda9d5875fa82.camel@groupe-cyllene.com>
0 siblings, 2 replies; 68+ messages in thread
From: Fiona Ebner @ 2025-01-13 13:42 UTC (permalink / raw)
To: Proxmox VE development discussion, f.gruenbichler
On 13.01.25 at 11:47, DERUMIER, Alexandre via pve-devel wrote:
>>> something like this was what I was afraid of 😉 this basically means
>>> we need to have some way to lookup the nodes based on the structure
>>> of the graph, which probably also means verifying that the structure
>>> matches the expected one (e.g., if we have X snapshots, we expect N
>>> nodes, if we currently have operation A going on, there should be an
>>> extra node, etc.pp. - and then we can "know" that the seventh node
>>> from the bottom must be snapshot 'foobar' ;)).
>
> I don't think it's much of a problem to follow the graph from top to
> bottom (everything attached has a parent-child relationship),
> and
>
> - for snapshots, we have the snapshot name in the filename,
> so we can know whether it's a specific snapshot or the live image.
>
>
> For the temporary nodes (when we do blockdev-add before a mirror or
> switch), we define the node name ourselves, so we don't need to parse the graph
> here.
I do think following the graph structure would be a viable approach too.
>>> relying on $path being stable definitely won't work.
>
> I really don't see why the path couldn't be stable?
>
> Over time, if something changes in QEMU (for example, rbd://....
> with a new param),
> it will only apply to the new QEMU process (after a restart or live
> migration), so the path in the block node will be updated too (live
> migration does not keep the old block-node info; the values used are the
> QEMU command-line arguments),
>
Upgrading libpve-storage-perl or an external storage plugin while the VM
is running could lead to a different result for path() and thus
breakage, right?
If we do need lookup, an idea to get around the character limit is using
a hash of the information to generate the node name, e.g.
hash("fmt-$volid@$snapname"), hash("file-$volid@$snapname") or whatever
is actually needed as unique information. Even if we only use lowercase
letters, we have 26 base chars, so 26^31 possible values.
So hashes with up to
>>> math.log2(26**31)
145.71363126237387
bits can still fit, which should be more than enough. Even with an
enormous number of 2^50 block nodes (realistically, the max values we
expect to encounter are more like 2^10), the collision probability
(using a simple approximation for the birthday problem) would only be
>>> d=2**145
>>> n=2**50
>>> 1 - math.exp(-(n*n)/(2*d))
1.4210854715202004e-14
>
> and the file path is the only attribute in a node that you can't
> update.
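The numbers above can be reproduced with a few lines of Python. The sketch uses the same birthday-problem approximation as the mail, just via `math.expm1`, which avoids cancellation for very small probabilities; it assumes, as in the mail, a 31-character limit for node names.

```python
import math


def name_space_bits(alphabet_size: int, length: int) -> float:
    """Bits of information that fit into `length` characters of an alphabet."""
    return length * math.log2(alphabet_size)


def collision_probability(n: int, d: int) -> float:
    """Birthday-problem approximation 1 - exp(-n^2 / (2d)) for n random
    values drawn uniformly from d possibilities."""
    return -math.expm1(-(n * n) / (2 * d))


if __name__ == "__main__":
    print(name_space_bits(26, 31))               # lowercase letters only
    print(name_space_bits(32, 30))               # 30 base32 characters
    print(collision_probability(2**50, 2**145))  # about 1.42e-14
```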
* Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
[not found] ` <1e45e756801843dd46eb6ce2958d30885ad73bc2.camel@groupe-cyllene.com>
@ 2025-01-13 14:28 ` Fiona Ebner
2025-01-14 10:10 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: Fiona Ebner @ 2025-01-13 14:28 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel, f.gruenbichler
On 10.01.25 at 10:32, DERUMIER, Alexandre wrote:
>>> Maybe it could even be a bug then?
>
> Yes, it's a bug. I just think that libvirt currently only implements
> block-commit with the disk blockdev as the top node.
>
> Throttle groups are not currently implemented in libvirt (but I have seen
> some commits adding support recently); they still use the old throttle
> method.
>
>>> In many situations, the filter nodes
>>> on top (like throttle groups) are ignored/skipped to get to the
>>> actually interesting block node for certain block operations.
>
> yes, and this option exists in the QMP blockdev-mirror command (and
> block-commit reuses the blockdev-mirror code behind the scenes).
>
>
>>> Are there any situations where you wouldn't want to do that in the
>>> block-commit case?
> mmm, I think it should always be reattached to the disk (the format block
> node, or the file block node if no format node exists). I really don't know
> how to code this; I have just reused the blockdev-mirror approach.
>
>
> Feel free to clean up this patch and submit it to the QEMU devs, you are a
> better C developer than me ^_^
I can try to look into it, but could you give some more details on how
exactly the issue manifests? What parameters are you using for
block-commit, and what does the graph look like at the time of the operation?
What error do you get without your patch or what exactly does not work
in the block graph?
My first try did not result in an error:
> #!/bin/bash
> rm -f /tmp/backing.qcow2
> rm -f /tmp/top.qcow2
> qemu-img create /tmp/backing.qcow2 -f qcow2 64M
> qemu-img create /tmp/top.qcow2 -f qcow2 64M
> qemu-system-x86_64 --qmp stdio \
> --nodefaults \
> --object throttle-group,id=thrgr0 \
> --blockdev qcow2,node-name=backing0,file.driver=file,file.filename=/tmp/backing.qcow2 \
> --blockdev throttle,node-name=drive-scsi0,throttle-group=thrgr0,file.driver=qcow2,file.node-name=node0,file.file.driver=file,file.file.filename=/tmp/top.qcow2,file.backing=backing0 \
> --device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.0,addr=0x2' \
> --device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "query-block"}
> {"execute": "block-commit", "arguments": { "device": "drive-scsi0", "top-node": "node0", "base-node": "backing0", "job-id": "commit0" } }
> {"execute": "query-block"}
> {"execute": "quit"}
> EOF
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
2025-01-13 13:27 ` Fabian Grünbichler
@ 2025-01-13 18:07 ` DERUMIER, Alexandre via pve-devel
2025-01-13 18:58 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-13 18:07 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 13 Jan 2025 18:07:43 +0000
Message-ID: <09c1a7f90fe54d4423d7b3e81ffc618aac999ef2.camel@groupe-cyllene.com>
>
>
> > > should this maybe have been vdisk_alloc and it just works by accident?
> It doesn't work with vdisk_alloc, because the volume needs to be
> created without a size, but with the backing file param instead.
> (If I remember correctly, qemu-img looks at the backing file size + metadata
> and sets the correct total size for the new volume.)
>>I am not sure I follow.. we create a snapshot, but then we pretend it
>>isn't a file with backing file when passing it to qemu? this seems
>>wrong.. IMHO we should just allocate (+format) here, and then let
>>qemu do the backing link up instead of this confusing (and error-
>>prone, as it masks problems that should be a hard error!) call..
>>
>>
>>
>>
> >>Maybe a better way could be to reuse vdisk_alloc, and add backing
> >>file
> >>as param ?
>>
>>that seems.. wrong as well? the file can never be bigger just because
>>it has a backing file right? why can't we just allocate and format
>>the regular volume?
I need to redo the tests, as I don't remember exactly.
From memory (I wrote it two months ago, sorry ^_^), it was maybe:
- related to the metadata preallocation size + LVM size,
or (but I need to verify)
- the "blockdev-snapshot" QMP command later only changes the backing file in
memory in the block graph, but not inside the file itself (so after a
restart of the process, the chain is broken).
> > + my $new_file_blockdev = generate_file_blockdev($storecfg, $drive, $new_current_file_nodename);
> > + my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_current_fmt_nodename, $new_file_blockdev);
> > +
> > + $new_fmt_blockdev->{backing} = undef;
>
> > > generate_format_blockdev doesn't set backing?
> yes, it's adding backing
>
> > > maybe this should be converted into an assertion?
>
> but there is a limitation of the QMP blockdev-add + blockdev-snapshot
> combination: the backing attribute needs to be undef in blockdev-add, or
> blockdev-snapshot will fail because it tries to set the backing file
> itself when doing the switch.
>
> From my tests, it was related to this
>>yeah, what I mean is:
>>
>>generate_format_blockdev doesn't (and should never) set backing. so
>>setting it to undef has no effect. but we might want to assert that
>>it *is* undef, so that if we ever mistakenly change
>>generate_format_blockdev to set backing, it will be caught instead of
>>silently being papered over..
You need it if you want to have explicit block node-names for the snapshots.
If backing is not defined, the backing chain is auto-generated with
random #block node-names (so rename/block reopen can't work).
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
2025-01-13 18:07 ` DERUMIER, Alexandre via pve-devel
@ 2025-01-13 18:58 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-13 18:58 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 13 Jan 2025 18:58:36 +0000
Message-ID: <4fcbe829f838bdce29f9a1492290c700cdcd8944.camel@groupe-cyllene.com>
>
> > > should this maybe have been vdisk_alloc and it just works by accident?
> It doesn't work with vdisk_alloc, because the volume needs to be
> created without a size, but with the backing file param instead.
> (If I remember correctly, qemu-img looks at the backing file size + metadata
> and sets the correct total size for the new volume.)
>>I am not sure I follow.. we create a snapshot, but then we pretend it
>>isn't a file with backing file when passing it to qemu? this seems
>>wrong.. IMHO we should just allocate (+format) here, and then let
>>qemu do the backing link up instead of this confusing (and error-
>>prone, as it masks problems that should be a hard error!) call..
>>
>>
>>
>>
> >>Maybe a better way could be to reuse vdisk_alloc, and add backing
> >>file
> >>as param ?
>>
>>that seems.. wrong as well? the file can never be bigger just because
>>it has a backing file right? why can't we just allocate and format
>>the regular volume?
>>I need to redo the tests, as I don't remember exactly.
>>
>>From memory (I wrote it two months ago, sorry ^_^), it was maybe:
>> - related to the metadata preallocation size + LVM size,
>> or (but I need to verify)
>>
>> - the "blockdev-snapshot" QMP command later only changes the backing file in
>>memory in the block graph, but not inside the file itself (so after a
>>restart of the process, the chain is broken).
OK, I have redone the tests and verified: it's the second case.
The problem is that blockdev-snapshot doesn't actually rewrite the
backing file inside the qcow2 file.
It's still explained at the same link:
https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01404.html
So the only way is to set it when creating the qcow2 file.
And that's why we need to pass backing=undef when doing blockdev-add,
because otherwise QEMU will try to open the backing file, which is
already opened/locked.
This doesn't seem to be fixed in the latest QEMU version, and libvirt still does
it this way too.
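A sketch of the two-step sequence described above, in Python for illustration; it only builds the commands and runs nothing. The overlay records its backing file at creation time (`qemu-img create -b ... -F ...`; with no size argument, qemu-img uses the backing image's size). QMP `blockdev-add` then opens it with `"backing": null` (what Perl's `undef` serializes to), so QEMU does not try to open the already-opened backing image, and `blockdev-snapshot` makes it the active layer. Function names, paths and node names are hypothetical; a real implementation would reuse qemu-server's own file/format blockdev generators.

```python
import json


def overlay_create_argv(overlay_path, backing_path, backing_fmt="qcow2"):
    """qemu-img call that writes the backing file reference into the new
    qcow2 header. No size is given, so qemu-img takes it from the backing image."""
    return ["qemu-img", "create", "-f", "qcow2",
            "-b", backing_path, "-F", backing_fmt, overlay_path]


def snapshot_qmp(current_fmt_node, new_fmt_node, new_file_node, overlay_path):
    """Return the blockdev-add and blockdev-snapshot commands as JSON lines."""
    add = {
        "execute": "blockdev-add",
        "arguments": {
            "driver": "qcow2",
            "node-name": new_fmt_node,
            "file": {
                "driver": "file",
                "node-name": new_file_node,
                "filename": overlay_path,
            },
            # null (Perl undef): do not open a backing image here.
            # Omitting the key would make QEMU open the backing file named
            # in the qcow2 header, which is already open (and locked).
            "backing": None,
        },
    }
    switch = {
        "execute": "blockdev-snapshot",
        # "node" is the current active format node, "overlay" the new one.
        "arguments": {"node": current_fmt_node, "overlay": new_fmt_node},
    }
    return [json.dumps(add), json.dumps(switch)]


if __name__ == "__main__":
    print(" ".join(overlay_create_argv("/tmp/current.qcow2", "/tmp/snap1.qcow2")))
    for line in snapshot_qmp("fmt-old", "fmt-new", "file-new", "/tmp/current.qcow2"):
        print(line)
```

Note that omitting `backing` and sending `null` are not equivalent in QMP: an absent key means "open whatever backing file the image header names", while `null` disables opening a backing file for that node.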
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-13 13:42 ` Fiona Ebner
@ 2025-01-14 10:03 ` DERUMIER, Alexandre via pve-devel
[not found] ` <fa38efbd95b57ba57a5628d6acfcda9d5875fa82.camel@groupe-cyllene.com>
1 sibling, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-14 10:03 UTC (permalink / raw)
To: pve-devel, f.ebner, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
Date: Tue, 14 Jan 2025 10:03:57 +0000
Message-ID: <fa38efbd95b57ba57a5628d6acfcda9d5875fa82.camel@groupe-cyllene.com>
>
>>Upgrading libpve-storage-perl or an external storage plugin while the
>>VM is running could lead to a different result for path() and thus
>>breakage, right?
mmm, yes, you are right.
>>If we do need lookup, an idea to get around the character limit is
>>using a hash of the information to generate the node name, e.g.
>>hash("fmt-$volid@$snapname"), hash("file-$volid@$snapname") or
>>whatever
yes, I think it should work.
>>is actually needed as unique information. Even if we only use
>>lowercase letters, we have 26 base chars, so 26^31 possible values.
yes, I was thinking about a hash too, but I was not sure how to convert it
to valid characters (valid chars: alphanumerics, ‘-’, ‘.’ and ‘_’).
>>So hashes with up to
>>
> > > math.log2(26**31)
>>145.71363126237387
>>
>>bits can still fit, which should be more than enough. Even with an
>>enormous number of 2^50 block nodes (realistically, the max values we
>>expect to encounter are more like 2^10), the collision probability
>>(using a simple approximation for the birthday problem) would only be
>>
> > > d=2**145
> > > n=2**50
> > > 1 - math.exp(-(n*n)/(2*d))
>>1.4210854715202004e-14
yes, that should be enough.
A simple MD5 is 128 bits,
SHA-1 is 160 bits (it's a 150-bit space with the extra '-', '.', '_' characters).
Do you know a good hash algorithm?
* Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
2025-01-13 14:28 ` Fiona Ebner
@ 2025-01-14 10:10 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-14 10:10 UTC (permalink / raw)
To: pve-devel, f.ebner, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch
Date: Tue, 14 Jan 2025 10:10:45 +0000
Message-ID: <77a156658951ac1998a296f547fd516f42a8db0c.camel@groupe-cyllene.com>
>
> Feel free to clean up this patch and submit it to the QEMU devs, you are a
> better C developer than me ^_^
>>I can try to look into it, but could you give some more details on how
>>exactly the issue manifests? What parameters are you using for
>>block-commit, and what does the graph look like at the time of the
>>operation?
>>What error do you get without your patch, or what exactly does not
>>work in the block graph?
I'll try to redo the test today, but if I remember correctly, the block-commit
works, but the new fmt node (your top node node0) is not attached to
the throttle group.
Try to dump the graph before/after.
>>My first try did not result in an error:
> #!/bin/bash
> rm -f /tmp/backing.qcow2
> rm -f /tmp/top.qcow2
> qemu-img create /tmp/backing.qcow2 -f qcow2 64M
> qemu-img create /tmp/top.qcow2 -f qcow2 64M
> qemu-system-x86_64 --qmp stdio \
> --nodefaults \
> --object throttle-group,id=thrgr0 \
> --blockdev qcow2,node-name=backing0,file.driver=file,file.filename=/tmp/backing.qcow2 \
> --blockdev throttle,node-name=drive-scsi0,throttle-group=thrgr0,file.driver=qcow2,file.node-name=node0,file.file.driver=file,file.file.filename=/tmp/top.qcow2,file.backing=backing0 \
> --device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.0,addr=0x2' \
> --device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "query-block"}
> {"execute": "block-commit", "arguments": { "device": "drive-scsi0", "top-node": "node0", "base-node": "backing0", "job-id": "commit0" } }
> {"execute": "query-block"}
> {"execute": "quit"}
> EOF
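One way to "dump the graph before/after" and compare it programmatically: a Python sketch, assuming the output of QEMU's debug command `x-debug-query-block-graph` (an unstable interface) is captured before and after the commit, that reports which node sits below the throttle node's `file` edge. Only `block-driver` entries are considered, since the `name` of block-backend entries is not a node name. The function names and sample graphs are hypothetical and trimmed to the relevant nodes. Since `node0` is the active layer in the script above, the commit job presumably also has to be completed with `block-job-complete` once it reports ready, so a capture taken right after `block-commit` may still show the old graph.

```python
def filter_child(graph, filter_node, edge="file"):
    """Return the node-name attached below `filter_node` via `edge`, or None.

    `graph` is the parsed return value of x-debug-query-block-graph.
    """
    drivers = {n["id"]: n["name"] for n in graph["nodes"]
               if n["type"] == "block-driver"}
    parent_ids = [i for i, name in drivers.items() if name == filter_node]
    if not parent_ids:
        raise KeyError(filter_node)
    for e in graph["edges"]:
        if e["parent"] == parent_ids[0] and e["name"] == edge:
            return drivers.get(e["child"])
    return None


def describe_change(before, after, filter_node="drive-scsi0"):
    """Summarize whether the filter node's child changed between captures."""
    old = filter_child(before, filter_node)
    new = filter_child(after, filter_node)
    if old == new:
        return f"{filter_node} still points to {old}"
    return f"{filter_node}: {old} -> {new}"


# Hypothetical captures, trimmed to the relevant nodes.
BEFORE = {
    "nodes": [{"id": 1, "type": "block-driver", "name": "drive-scsi0"},
              {"id": 2, "type": "block-driver", "name": "node0"},
              {"id": 3, "type": "block-driver", "name": "backing0"}],
    "edges": [{"parent": 1, "child": 2, "name": "file"},
              {"parent": 2, "child": 3, "name": "backing"}],
}
AFTER = {
    "nodes": [{"id": 1, "type": "block-driver", "name": "drive-scsi0"},
              {"id": 3, "type": "block-driver", "name": "backing0"}],
    "edges": [{"parent": 1, "child": 3, "name": "file"}],
}

print(describe_change(BEFORE, AFTER))  # drive-scsi0: node0 -> backing0
```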
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
[not found] ` <fa38efbd95b57ba57a5628d6acfcda9d5875fa82.camel@groupe-cyllene.com>
@ 2025-01-15 9:39 ` Fiona Ebner
2025-01-15 9:51 ` Fabian Grünbichler
0 siblings, 1 reply; 68+ messages in thread
From: Fiona Ebner @ 2025-01-15 9:39 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel, f.gruenbichler
On 14.01.25 at 11:03, DERUMIER, Alexandre wrote:
>>> If we do need lookup, an idea to get around the character limit is
>>> using
>>> a hash of the information to generate the node name, e.g.
>>> hash("fmt-$volid@$snapname"), hash("file-$volid@$snapname") or
>>> whatever
>
> yes, I think it should work.
>
>>> is actually needed as unique information. Even if we only use
>>> lowercase
>>> letters, we have 26 base chars, so 26^31 possible values.
>
> yes, I was thinking about a hash too, but I was not sure how to convert it
> to valid characters (valid chars: alphanumerics, ‘-’, ‘.’ and ‘_’).
>
>
>
>>> So hashes with up to
>>>
>>>> math.log2(26**31)
>>> 145.71363126237387
>>>
>>> bits can still fit, which should be more than enough. Even with an
>>> enormous number of 2^50 block nodes (realistically, the max values we
>>> expect to encounter are more like 2^10), the collision probability
>>> (using a simple approximation for the birthday problem) would only be
>>>
>>>> d=2**145
>>>> n=2**50
>>>> 1 - math.exp(-(n*n)/(2*d))
>>> 1.4210854715202004e-14
>
> yes, that should be enough.
>
> A simple MD5 is 128 bits,
> SHA-1 is 160 bits (it's a 150-bit space with the extra '-', '.', '_' characters).
>
> Do you know a good hash algorithm?
I'm not too well-read in cryptography, but AFAIK, you can shorten the
result of sha256 to get a good hash algorithm with fewer bits. We could
also have the node-name start with a "h" to make sure it doesn't start
with a number and then use base32 for the remaining 30 characters. I.e.
we could take the first 150 bits (32^30 = 2^150) from the sha256 hash
and convert that to base32.
@Shannon @Fabian please correct me if I'm wrong.
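The suggestion above can be sketched in a few lines of Python using only the standard `hashlib` and `base64` modules. The key format (`fmt-$volid@$snapname`) follows the mail; the function name and the example volume ID are hypothetical. Lowercasing the base32 output is only cosmetic, since the base32 alphabet has no case distinction.

```python
import base64
import hashlib


def hashed_node_name(key: str) -> str:
    """Derive a 31-character node name from `key`.

    'h' guarantees the name starts with a letter; the remaining 30 chars are
    the first 150 bits of SHA-256(key) in base32. Base32 emits one character
    per 5 bits, most significant first, so 30 characters cover exactly the
    first 150 bits of the digest (the '=' padding only appears at the end).
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()  # 256 bits
    b32 = base64.b32encode(digest).decode("ascii")          # A-Z and 2-7
    return "h" + b32[:30].lower()


if __name__ == "__main__":
    # Hypothetical volume ID and snapshot name.
    print(hashed_node_name("fmt-local:100/vm-100-disk-0.qcow2@snap1"))
```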
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-15 9:39 ` Fiona Ebner
@ 2025-01-15 9:51 ` Fabian Grünbichler
2025-01-15 10:06 ` Fiona Ebner
2025-01-15 10:15 ` DERUMIER, Alexandre via pve-devel
0 siblings, 2 replies; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-15 9:51 UTC (permalink / raw)
To: Fiona Ebner, DERUMIER, Alexandre, pve-devel
> Fiona Ebner <f.ebner@proxmox.com> wrote on 15.01.2025 10:39 CET:
>
>
> On 14.01.25 at 11:03, DERUMIER, Alexandre wrote:
> >>> If we do need lookup, an idea to get around the character limit is
> >>> using
> >>> a hash of the information to generate the node name, e.g.
> >>> hash("fmt-$volid@$snapname"), hash("file-$volid@$snapname") or
> >>> whatever
> >
> > yes, I think it should work.
> >
> >>> is actually needed as unique information. Even if we only use
> >>> lowercase
> >>> letters, we have 26 base chars, so 26^31 possible values.
> >
> > yes, I was thinking about a hash too, but I was not sure how to convert it
> > to valid characters (valid chars: alphanumerics, ‘-’, ‘.’ and ‘_’).
> >
> >
> >
> >>> So hashes with up to
> >>>
> >>>> math.log2(26**31)
> >>> 145.71363126237387
> >>>
> >>> bits can still fit, which should be more than enough. Even with an
> >>> enormous number of 2^50 block nodes (realistically, the max values we
> >>> expect to encounter are more like 2^10), the collision probability
> >>> (using a simple approximation for the birthday problem) would only be
> >>>
> >>>> d=2**145
> >>>> n=2**50
> >>>> 1 - math.exp(-(n*n)/(2*d))
> >>> 1.4210854715202004e-14
> >
> > yes, that should be enough.
> >
> > A simple MD5 is 128 bits,
> > SHA-1 is 160 bits (it's a 150-bit space with the extra '-', '.', '_' characters).
> >
> > Do you know a good hash algorithm?
>
> I'm not too well-read in cryptography, but AFAIK, you can shorten the
> result of sha256 to get a good hash algorithm with fewer bits. We could
> also have the node-name start with a "h" to make sure it doesn't start
> with a number and then use base32 for the remaining 30 characters. I.e.
> we could take the first 150 bits (32^30 = 2^150) from the sha256 hash
> and convert that to base32.
>
> @Shannon @Fabian please correct me if I'm wrong.
IMHO this isn't really a cryptographic use case, so I'd not worry too much about any of that.
basically what we have is the following situation:
- we have some input data (volid+snapname)
- we have a key derived from the input data (block node name)
- we have a value (block node)
- we need to be able to map back the block node (name) to the input data
sometimes we need to allocate a second block node temporarily for a given input (right?), and we can't rename block nodes, so there might be more than one key (block node name) for the same input data. to map back from a block node name to the volid+snapname, we can hash the input data and then use that (shortened) hash as the middle part of the block node name (with a counter as the last part and some static/drive-related prefix). the only thing we need to ensure is that the hash is good enough to avoid accidental collisions (given the nature of the input data, I don't think we have to worry about non-accidental collisions either unless we choose a very basic checksum, but even if that were possible, an attacker could only mess with data of a VM where they can already add/remove images anyway..), and that we never re-use a block node name for something that doesn't match its input data (I have to admit I lost track a bit of whether that invariant can hold?).
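The sha256/base32 proposal quoted above ('h' prefix, first 150 bits of sha256, 30 base32 characters) can be sketched as follows. This is a minimal illustration only, assuming the 31-character node-name limit and the "fmt-$volid@$snapname" input format discussed in the thread; the function name `hashed_node_name` and the example volid are hypothetical.

```python
import hashlib

# RFC 4648 base32 alphabet, lowercased: 'a'-'z' plus '2'-'7' (32 symbols, 5 bits each)
B32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"


def hashed_node_name(volid: str, snapname: str, kind: str = "fmt") -> str:
    """Hypothetical helper: derive a 31-character node name from volid+snapname.

    sha256 the input, keep the first 150 bits, encode them as 30 base32
    characters and prefix an 'h' so the name never starts with a digit.
    """
    data = f"{kind}-{volid}@{snapname}".encode("utf-8")
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    value = digest >> (256 - 150)  # first (most significant) 150 bits of the digest
    chars = []
    for _ in range(30):  # 30 symbols * 5 bits = 150 bits
        chars.append(B32_ALPHABET[value & 0x1F])
        value >>= 5
    return "h" + "".join(reversed(chars))  # most significant symbol first


name = hashed_node_name("local:100/vm-100-disk-0.qcow2", "snap1")
print(name, len(name))  # 31 characters, only letters and digits
```

Since the hash input includes the node kind, the fmt and file nodes for the same volid+snapname get different names.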
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-15 9:51 ` Fabian Grünbichler
@ 2025-01-15 10:06 ` Fiona Ebner
2025-01-15 10:15 ` Fabian Grünbichler
2025-01-16 14:56 ` DERUMIER, Alexandre via pve-devel
2025-01-15 10:15 ` DERUMIER, Alexandre via pve-devel
1 sibling, 2 replies; 68+ messages in thread
From: Fiona Ebner @ 2025-01-15 10:06 UTC (permalink / raw)
To: Fabian Grünbichler, DERUMIER, Alexandre, pve-devel
On 15.01.25 at 10:51, Fabian Grünbichler wrote:
>
>> Fiona Ebner <f.ebner@proxmox.com> wrote on 15.01.2025 10:39 CET:
>>
>>
>> On 14.01.25 at 11:03, DERUMIER, Alexandre wrote:
>>>>> If we do need lookup, an idea to get around the character limit is
>>>>> using
>>>>> a hash of the information to generate the node name, e.g.
>>>>> hash("fmt-$volid@$snapname"), hash("file-$volid@$snapname") or
>>>>> whatever
>>>
>>> yes, I think it should work
>>>
>>>>> is actually needed as unique information. Even if we only use
>>>>> lowercase
>>>>> letters, we have 26 base chars, so 26^31 possible values.
>>>
>>> yes, I was thinking about a hash too, but I was not sure how to convert it
>>> to the allowed characters (valid chars: alphanumerics, ‘-’, ‘.’ and ‘_’).
>>>
>>>
>>>
>>>>> So hashes with up to
>>>>>
>>>>>> math.log2(26**31)
>>>>> 145.71363126237387
>>>>>
>>>>> bits can still fit, which should be more than enough. Even with an
>>>>> enormous number of 2^50 block nodes (realistically, the max values we
>>>>> expect to encounter are more like 2^10), the collision probability
>>>>> (using a simple approximation for the birthday problem) would only be
>>>>>
>>>>>> d=2**145
>>>>>> n=2**50
>>>>>> 1 - math.exp(-(n*n)/(2*d))
>>>>> 1.4210854715202004e-14
>>>
>>> yes, should be enough
>>>
>>> a simple md5 is 128 bits,
>>> sha1 is 160 bits (we have a ~150-bit space with the extra '-', '.', '_' characters)
>>>
>>> Do you know a good hash algorithm?
>>
>> I'm not too well-read in cryptography, but AFAIK, you can shorten the
>> result of sha256 to get a good hash algorithm with fewer bits. We could
>> also have the node-name start with a "h" to make sure it doesn't start
>> with a number and then use base32 for the remaining 30 characters. I.e.
>> we could take the first 150 bits (32^30 = 2^150) from the sha256 hash
>> and convert that to base32.
>>
>> @Shannon @Fabian please correct me if I'm wrong.
>
> IMHO this isn't really a cryptographic use case, so I'd not worry too much about any of that.
Yes, we don't need much to get enough collision-resistance. Just wanted
to make sure and check it explicitly.
>
> basically what we have is the following situation:
>
> - we have some input data (volid+snapname)
> - we have a key derived from the input data (block node name)
> - we have a value (block node)
> - we need to be able to map back the block node (name) to the input data
Oh, we need to map back too? But that can be done via filename in the
block node, or not?
> sometimes we need to allocate a second block node temporarily for a given input (right?), and we can't rename block nodes, so there might be more than one key (block node name) for the same input data. to map back from a block node name to the volid+snapname, we can hash the input data and then use that (shortened) hash as the middle part of the block node name (with a counter as the last part and some static/drive-related prefix). the only thing we need to ensure is that the hash is good enough to avoid accidental collisions (given the nature of the input data, I don't think we have to worry about non-accidental collisions either unless we choose a very basic checksum, but even if that were possible, an attacker could only mess with data of a VM where they can already add/remove images anyway..), and that we never re-use a block node name for something that doesn't match its input data (I have to admit I lost track a bit of whether that invariant can hold?).
Okay, sure. If we need other prefixes/suffixes, we can shorten the hash
part more. Even with only 15 characters for the hash, we have an
extremely low probability of collision with about a million nodes:

    >>> math.log2(32**15)
    75.0
    >>> d=2**75
    >>> n=2**20
    >>> 1 - math.exp(-(n*n)/(2*d))
    1.4551915228366852e-11
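The estimates in this thread can be reproduced with a small helper. This sketch uses the same birthday-problem approximation as the REPL sessions, only switching to `math.expm1` so that the result stays accurate for very small exponents; the function name is illustrative.

```python
import math


def collision_probability(hash_bits: float, n_nodes: float) -> float:
    """Birthday-problem approximation: p ~= 1 - exp(-n^2 / (2d)) with d = 2^bits."""
    exponent = n_nodes * n_nodes / (2.0 * 2.0 ** hash_bits)
    # -expm1(-x) equals 1 - exp(-x) but avoids cancellation for tiny x
    return -math.expm1(-exponent)


# Bits available in the name formats discussed in the thread
print(math.log2(26 ** 31))  # 31 lowercase letters
print(math.log2(32 ** 15))  # 15 base32 characters

print(collision_probability(145, 2 ** 50))  # ~1.42e-14
print(collision_probability(75, 2 ** 20))   # ~1.46e-11
```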
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-15 10:06 ` Fiona Ebner
@ 2025-01-15 10:15 ` Fabian Grünbichler
2025-01-15 10:46 ` Fiona Ebner
2025-01-15 13:01 ` DERUMIER, Alexandre via pve-devel
2025-01-16 14:56 ` DERUMIER, Alexandre via pve-devel
1 sibling, 2 replies; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-15 10:15 UTC (permalink / raw)
To: Fiona Ebner, DERUMIER, Alexandre, pve-devel
> Fiona Ebner <f.ebner@proxmox.com> wrote on 15.01.2025 11:06 CET:
>
>
> On 15.01.25 at 10:51, Fabian Grünbichler wrote:
> >
> > basically what we have is the following situation:
> >
> > - we have some input data (volid+snapname)
> > - we have a key derived from the input data (block node name)
> > - we have a value (block node)
> > - we need to be able to map back the block node (name) to the input data
>
> Oh, we need to map back too? But that can be done via filename in the
> block node, or not?
but that filename is the result of PVE::Storage::path which is not stable, so we can't compare that?
for snapshot operations, we need to find out "which block node is the one for the snapshot volume". we can't rely on the filename in the block graph for that, because how we map from volid+snapname to that filename might have changed on our end since that block node was set up. so we need to find a way to map using (parts of) the block node name, which means
- having a naming scheme that allows to map back from node name to volid+snapname (i.e., the hashing scheme we are discussing ;))
- never re-using a block node for something other than what is encoded in its name (not sure if that's possible?)
while an operation is ongoing, we can have $prefix-$hash-1 and $prefix-$hash-2 at the same time, and if we then end up with having just $prefix-$hash-2 after the operation that doesn't matter since we can reliably map that back via the $hash to volid+snapname.
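The "$prefix-$hash-1 / $prefix-$hash-2" idea can be sketched as a tiny allocator. This is a hypothetical illustration: it assumes the hash part is the first 15 base32 characters (75 bits) of sha256 over "volid@snapname", and the helper names and example volid are made up.

```python
import base64
import hashlib


def short_hash(volid: str, snapname: str, length: int = 15) -> str:
    """Hypothetical shortened hash of volid+snapname; 15 base32 chars carry 75 bits."""
    digest = hashlib.sha256(f"{volid}@{snapname}".encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii").lower()[:length]


def allocate_node_name(prefix: str, volid: str, snapname: str, existing: set) -> str:
    """Return "<prefix>-<hash>-<counter>" using the lowest counter not yet taken.

    Block nodes cannot be renamed, so a second node for the same input
    (e.g. while an operation is ongoing) simply gets the next counter.
    """
    h = short_hash(volid, snapname)
    counter = 1
    while f"{prefix}-{h}-{counter}" in existing:
        counter += 1
    return f"{prefix}-{h}-{counter}"


existing = set()
first = allocate_node_name("f", "local:100/vm-100-disk-0.qcow2", "snap1", existing)
existing.add(first)
second = allocate_node_name("f", "local:100/vm-100-disk-0.qcow2", "snap1", existing)
print(first, second)  # same hash part, counters 1 and 2
```

Whichever of the two survives the operation, the shared hash part still identifies the volid+snapname it belongs to.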
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-15 9:51 ` Fabian Grünbichler
2025-01-15 10:06 ` Fiona Ebner
@ 2025-01-15 10:15 ` DERUMIER, Alexandre via pve-devel
1 sibling, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-15 10:15 UTC (permalink / raw)
To: pve-devel, f.ebner, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
Date: Wed, 15 Jan 2025 10:15:36 +0000
Message-ID: <5c6f5939f4e1f6edd843bbdb712419510f9b96a0.camel@groupe-cyllene.com>
>>IMHO this isn't really a cryptographic use case, so I'd not worry too
>>much about any of that.
>>basically what we have is the following situation:
>>- we have some input data (volid+snapname)
>>- we have a key derived from the input data (block node name)
>>- we have a value (block node)
>>- we need to be able to map back the block node (name) to the input
>>data
>>sometimes we need to allocate a second block node temporarily for a
>>given input data (right?),
yes, with a unique volume path in the file-node
note that a "block node" is really a pair: fmt-node (handling
qcow2, raw, ...) -> file-node (with the path to the file/block device/...).
For the snapshot rename (current->snap1, for example), I currently only
switch the file node (with blockdev-reopen), so the fmt-node does not
change.
The current behaviour is something like:

1) fmt-node-current ---> file-node-current

   take snap1

2) a) rename:
      fmt-node-current ---> file-node-snap1
      (and delete file-node-current)

   b) create a new "current" fmt node with snap1 as backing:
      fmt-node-current2 ---> file-node-current
        |
        +---> fmt-node-current ---> file-node-snap1
I'm not sure whether I can swap the whole fmt+file pair for another
fmt+file pair for the renaming part (blockdev-mirror does this, for
example).
I'll try it to be sure (doing the blockdev-reopen at the fmt level).
If it's not possible:
for the file node, the hash would work 100%, as the same volid+snapname
shouldn't be in 2 nodes at the same time,
but for the fmt node, we would need a lookup in the graph to find the
parent of the file node (the file node being found via the hash).
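The graph lookup mentioned above (file node found via the hash, fmt node found as its parent) can be sketched on a simplified graph. The graph shape used here, node name mapped to {child role: child node name}, is an assumption for illustration only; how it would be gathered from QEMU is left open, and the throttle filter on top is assumed as well.

```python
from typing import Dict, Optional

# Hypothetical, simplified block graph: node name -> {child role: child node name}
Graph = Dict[str, Dict[str, str]]


def find_parent(graph: Graph, child: str, role: str = "file") -> Optional[str]:
    """Return the node referencing `child` under `role` (e.g. the fmt node of a file node)."""
    parents = [name for name, children in graph.items() if children.get(role) == child]
    if len(parents) > 1:
        raise ValueError(f"{child} has several '{role}' parents: {parents}")
    return parents[0] if parents else None


# State b) of the "current behaviour" diagram, with a throttle filter on top (assumed)
graph: Graph = {
    "drive-scsi0": {"file": "fmt-node-current2"},
    "fmt-node-current2": {"file": "file-node-current", "backing": "fmt-node-current"},
    "fmt-node-current": {"file": "file-node-snap1"},
    "file-node-current": {},
    "file-node-snap1": {},
}

# the file node is found via the hash, its fmt node via the graph
print(find_parent(graph, "file-node-snap1"))  # fmt-node-current
```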
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-15 10:15 ` Fabian Grünbichler
@ 2025-01-15 10:46 ` Fiona Ebner
2025-01-15 10:50 ` Fabian Grünbichler
2025-01-15 13:01 ` DERUMIER, Alexandre via pve-devel
1 sibling, 1 reply; 68+ messages in thread
From: Fiona Ebner @ 2025-01-15 10:46 UTC (permalink / raw)
To: Fabian Grünbichler, DERUMIER, Alexandre, pve-devel
On 15.01.25 at 11:15, Fabian Grünbichler wrote:
>
>> Fiona Ebner <f.ebner@proxmox.com> wrote on 15.01.2025 11:06 CET:
>>
>>
>> On 15.01.25 at 10:51, Fabian Grünbichler wrote:
>>>
>>> basically what we have is the following situation:
>>>
>>> - we have some input data (volid+snapname)
>>> - we have a key derived from the input data (block node name)
>>> - we have a value (block node)
>>> - we need to be able to map back the block node (name) to the input data
>>
>> Oh, we need to map back too? But that can be done via filename in the
>> block node, or not?
>
> but that filename is the result of PVE::Storage::path which is not stable, so we can't compare that?
>
> for snapshot operations, we need to find out "which block node is the one for the snapshot volume". we can't rely on the filename in the block graph for that, because how we map from volid+snapname to that filename might have changed on our end since that block node was set up. so we need to find a way to map using (parts of) the block node name, which means
> - having a naming scheme that allows to map back from node name to volid+snapname (i.e., the hashing scheme we are discussing ;))
> - never re-using a block node for something other than what is encoded in its name (not sure if that's possible?)
>
> while an operation is ongoing, we can have $prefix-$hash-1 and $prefix-$hash-2 at the same time, and if we then end up with having just $prefix-$hash-2 after the operation that doesn't matter since we can reliably map that back via the $hash to volid+snapname.
How would you map back from the hash? Wouldn't that require computing
the hashes for all known volid+snapname and comparing which one matches?
Or do you mean having a lookup-table, i.e. Perl hash keeping track of
the hash => volid+snapname mappings?
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-15 10:46 ` Fiona Ebner
@ 2025-01-15 10:50 ` Fabian Grünbichler
2025-01-15 11:01 ` Fiona Ebner
0 siblings, 1 reply; 68+ messages in thread
From: Fabian Grünbichler @ 2025-01-15 10:50 UTC (permalink / raw)
To: Fiona Ebner, DERUMIER, Alexandre, pve-devel
> Fiona Ebner <f.ebner@proxmox.com> wrote on 15.01.2025 11:46 CET:
>
>
> On 15.01.25 at 11:15, Fabian Grünbichler wrote:
> >
> >> Fiona Ebner <f.ebner@proxmox.com> wrote on 15.01.2025 11:06 CET:
> >>
> >>
> >> On 15.01.25 at 10:51, Fabian Grünbichler wrote:
> >>>
> >>> basically what we have is the following situation:
> >>>
> >>> - we have some input data (volid+snapname)
> >>> - we have a key derived from the input data (block node name)
> >>> - we have a value (block node)
> >>> - we need to be able to map back the block node (name) to the input data
> >>
> >> Oh, we need to map back too? But that can be done via filename in the
> >> block node, or not?
> >
> > but that filename is the result of PVE::Storage::path which is not stable, so we can't compare that?
> >
> > for snapshot operations, we need to find out "which block node is the one for the snapshot volume". we can't rely on the filename in the block graph for that, because how we map from volid+snapname to that filename might have changed on our end since that block node was set up. so we need to find a way to map using (parts of) the block node name, which means
> > - having a naming scheme that allows to map back from node name to volid+snapname (i.e., the hashing scheme we are discussing ;))
> > - never re-using a block node for something other than what is encoded in its name (not sure if that's possible?)
> >
> > while an operation is ongoing, we can have $prefix-$hash-1 and $prefix-$hash-2 at the same time, and if we then end up with having just $prefix-$hash-2 after the operation that doesn't matter since we can reliably map that back via the $hash to volid+snapname.
>
> How would you map back from the hash? Wouldn't that require computing
> the hashes for all known volid+snapname and comparing which one matches?
> Or do you mean having a lookup-table, i.e. Perl hash keeping track of
> the hash => volid+snapname mappings?
if you are looking for the block node currently referencing volid A in snapshot X, you calculate the hash for them, and then look at the list of block nodes which should only contain one block node named using that hash. "mapping back" is a bit of a misnomer I guess ;) we just want to know which node corresponds to a *known* input pair, not get the input pair from the node alone.
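The lookup described here (hash the *known* volid+snapname, then scan the node list for the single matching name) could look roughly like this. It is a sketch assuming a hypothetical "<prefix>-<hash>-<counter>" naming with a 15-character base32 sha256 hash; names and the example volid are illustrative.

```python
import base64
import hashlib
from typing import Iterable


def short_hash(volid: str, snapname: str, length: int = 15) -> str:
    """Hypothetical shortened hash: first 15 base32 chars (75 bits) of sha256."""
    digest = hashlib.sha256(f"{volid}@{snapname}".encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii").lower()[:length]


def find_block_node(node_names: Iterable[str], prefix: str, volid: str, snapname: str) -> str:
    """Find the single node named "<prefix>-<hash>-<counter>" for a known input pair."""
    wanted = f"{prefix}-{short_hash(volid, snapname)}-"
    matches = [name for name in node_names if name.startswith(wanted)]
    if len(matches) != 1:
        raise LookupError(f"expected one node for {volid}@{snapname}, got {matches}")
    return matches[0]


disk = "local:100/vm-100-disk-0.qcow2"
nodes = [
    f"f-{short_hash(disk, 'snap1')}-2",    # e.g. left over after an operation swapped nodes
    f"f-{short_hash(disk, 'current')}-1",
    "drive-scsi0",
]
print(find_block_node(nodes, "f", disk, "snap1"))
```

No reverse mapping from hash to input is needed: the counter may differ, but the hash part is recomputed from the input pair we are looking for.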
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-15 10:50 ` Fabian Grünbichler
@ 2025-01-15 11:01 ` Fiona Ebner
0 siblings, 0 replies; 68+ messages in thread
From: Fiona Ebner @ 2025-01-15 11:01 UTC (permalink / raw)
To: Fabian Grünbichler, DERUMIER, Alexandre, pve-devel
On 15.01.25 at 11:50, Fabian Grünbichler wrote:
>
>> Fiona Ebner <f.ebner@proxmox.com> wrote on 15.01.2025 11:46 CET:
>>
>>
>> On 15.01.25 at 11:15, Fabian Grünbichler wrote:
>>>
>>>> Fiona Ebner <f.ebner@proxmox.com> wrote on 15.01.2025 11:06 CET:
>>>>
>>>>
>>>> On 15.01.25 at 10:51, Fabian Grünbichler wrote:
>>>>>
>>>>> basically what we have is the following situation:
>>>>>
>>>>> - we have some input data (volid+snapname)
>>>>> - we have a key derived from the input data (block node name)
>>>>> - we have a value (block node)
>>>>> - we need to be able to map back the block node (name) to the input data
>>>>
>>>> Oh, we need to map back too? But that can be done via filename in the
>>>> block node, or not?
>>>
>>> but that filename is the result of PVE::Storage::path which is not stable, so we can't compare that?
>>>
>>> for snapshot operations, we need to find out "which block node is the one for the snapshot volume". we can't rely on the filename in the block graph for that, because how we map from volid+snapname to that filename might have changed on our end since that block node was set up. so we need to find a way to map using (parts of) the block node name, which means
>>> - having a naming scheme that allows to map back from node name to volid+snapname (i.e., the hashing scheme we are discussing ;))
>>> - never re-using a block node for something other than what is encoded in its name (not sure if that's possible?)
>>>
>>> while an operation is ongoing, we can have $prefix-$hash-1 and $prefix-$hash-2 at the same time, and if we then end up with having just $prefix-$hash-2 after the operation that doesn't matter since we can reliably map that back via the $hash to volid+snapname.
>>
>> How would you map back from the hash? Wouldn't that require computing
>> the hashes for all known volid+snapname and comparing which one matches?
>> Or do you mean having a lookup-table, i.e. Perl hash keeping track of
>> the hash => volid+snapname mappings?
>
> if you are looking for the block node currently referencing volid A in snapshot X, you calculate the hash for them, and then look at the list of block nodes which should only contain one block node named using that hash. "mapping back" is a bit of a misnomer I guess ;) we just want to know which node corresponds to a *known* input pair, not get the input pair from the node alone.
Yes, I was confused by "mapping back". Thanks for the clarification!
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-15 10:15 ` Fabian Grünbichler
2025-01-15 10:46 ` Fiona Ebner
@ 2025-01-15 13:01 ` DERUMIER, Alexandre via pve-devel
1 sibling, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-15 13:01 UTC (permalink / raw)
To: pve-devel, f.ebner, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
Date: Wed, 15 Jan 2025 13:01:49 +0000
Message-ID: <b26a17ede44789e84c5fc28b05b61dbc404c0fcd.camel@groupe-cyllene.com>
> Fiona Ebner <f.ebner@proxmox.com> wrote on 15.01.2025 11:06 CET:
>
>
> On 15.01.25 at 10:51, Fabian Grünbichler wrote:
> >
> > basically what we have is the following situation:
> >
> > - we have some input data (volid+snapname)
> > - we have a key derived from the input data (block node name)
> > - we have a value (block node)
> > - we need to be able to map back the block node (name) to the
> > input data
>
> Oh, we need to map back too? But that can be done via filename in the
> block node, or not?
>>but that filename is the result of PVE::Storage::path which is not
>>stable, so we can't compare that?
>>for snapshot operations, we need to find out "which block node is the
>>one for the snapshot volume". we can't rely on the filename in the
>>block graph for that, because how we map from volid+snapname to that
>>filename might have changed on our end since that block node was set
>>up.
The "filename" attribute never changes for an existing file node
(it's impossible to update this attribute),
but we can replace/reopen the file node with another file node as
the child of the fmt node.
And to answer my previous mail: I have done tests for live rename, and
I can reopen the throttle filter with a new fmt+file node pair with
the new filename.
So naming the nodes (fmt|file)-hash(volid+snapname) should be enough.
Workflow for snapshot create:

1) throttle-filter (drive-scsi0) ---> fmt-hash(local:vm-disk-0) ---> file-hash(local:vm-disk-0) ---> filename=/path/to/vm-disk-0.qcow2

   take snap1

2) a) create a hardlink && add a new fmt|file pair
      throttle-filter (drive-scsi0) ---> fmt-hash(local:vm-disk-0) ---> file-hash(local:vm-disk-0) ---> filename=/path/to/vm-disk-0.qcow2
      fmt-hash(local:vm-disk-0-snap1) ---> file-hash(local:vm-disk-0-snap1) ---> filename=/path/to/vm-disk-0-snap1.qcow2

   b) swap the fmt node with blockdev-reopen && delete the old fmt node
      throttle-filter (drive-scsi0) ---> fmt-hash(local:vm-disk-0-snap1) ---> file-hash(local:vm-disk-0-snap1) ---> filename=/path/to/vm-disk-0-snap1.qcow2

   c) create a new current fmt node with the snap1 fmt node as backing, with blockdev-snapshot
      throttle-filter (drive-scsi0) ---> fmt-hash(local:vm-disk-0) ---> file-hash(local:vm-disk-0) ---> filename=/path/to/vm-disk-0.qcow2
                                           |
                                           +--> fmt-hash(local:vm-disk-0-snap1) ---> file-hash(local:vm-disk-0-snap1) ---> filename=/path/to/vm-disk-0-snap1.qcow2
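To make the naming invariant of this workflow concrete, here is a toy model of steps 2a to 2c that only tracks the node graph; no QMP command is sent and no file is touched. `node_hash`, `FmtNode` and `take_snapshot` are hypothetical names, and the 16-hex-character hash is an arbitrary choice for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def node_hash(volid: str) -> str:
    # Hypothetical shortened hash; any stable hash of the input works for this model
    return hashlib.sha256(volid.encode("utf-8")).hexdigest()[:16]


@dataclass
class FmtNode:
    name: str                           # fmt-<hash>
    file_node: str                      # file-<hash>, the fmt node's "file" child
    filename: str                       # path stored in that file node (never changes)
    backing: Optional["FmtNode"] = None


def make_pair(volid: str, path: str) -> FmtNode:
    """A fmt node plus its file node, both named after hash(volid)."""
    h = node_hash(volid)
    return FmtNode(f"fmt-{h}", f"file-{h}", path)


def take_snapshot(throttle: dict, volid: str, snap: str, path: str, snap_path: str) -> None:
    """Toy model of steps 2a-2c: only the node graph changes."""
    # 2a) (after hardlinking path to snap_path) add a new fmt|file pair for the snapshot
    snap_pair = make_pair(f"{volid}-{snap}", snap_path)
    # 2b) swap the throttle filter's child; the old pair is no longer referenced
    throttle["child"] = snap_pair
    # 2c) new "current" fmt node (same name as the dropped one), backed by the snapshot
    current = make_pair(volid, path)
    current.backing = snap_pair
    throttle["child"] = current


throttle = {"name": "drive-scsi0",
            "child": make_pair("local:vm-disk-0", "/path/to/vm-disk-0.qcow2")}
take_snapshot(throttle, "local:vm-disk-0", "snap1",
              "/path/to/vm-disk-0.qcow2", "/path/to/vm-disk-0-snap1.qcow2")
top = throttle["child"]
print(top.name, "->", top.backing.name)
```

The new current node reuses the name fmt-hash(local:vm-disk-0) only after the old node with that name is gone, so each name keeps matching its input data.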
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-15 10:06 ` Fiona Ebner
2025-01-15 10:15 ` Fabian Grünbichler
@ 2025-01-16 14:56 ` DERUMIER, Alexandre via pve-devel
1 sibling, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-16 14:56 UTC (permalink / raw)
To: pve-devel, f.ebner, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
Date: Thu, 16 Jan 2025 14:56:46 +0000
Message-ID: <1ea9753697e70a61f53f5cae919a6237ed60a0d0.camel@groupe-cyllene.com>
>>Yes, we don't need much to get enough collision-resistance. Just
>>wanted
>>to make sure and check it explicitly.
I have done some tests with sha1 and a base62 encoding ('0'..'9', 'A'..'Z',
'a'..'z').
The node name is required to start with an alphabetic character, so we need a prefix:
encode_base62(sha1("$volid-$snapid")) : 5zU4nVxN7gIUWMaskKc4y6EawWu
(27 characters)
so we have space for a prefix, for example:
f-5zU4nVxN7gIUWMaskKc4y6EawWu for the fmt-node
e-5zU4nVxN7gIUWMaskKc4y6EawWu for the file-node
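use Math::BigInt;

sub encode_base62 {
    my ($input) = @_;
    my @chars = ('0'..'9', 'A'..'Z', 'a'..'z');
    my $base = 62;
    # a 160-bit sha1 digest does not fit into a native Perl number (it would
    # silently fall back to a lossy double), so use an arbitrary-precision integer
    my $value = Math::BigInt->new(0);
    foreach my $byte (unpack('C*', $input)) {
        $value->bmul(256)->badd($byte);
    }
    my $result = '';
    while ($value->is_pos()) {
        # in list context, bdiv returns quotient and remainder
        my ($quo, $rem) = $value->copy()->bdiv($base);
        $result = $chars[$rem->numify()] . $result;
        $value = $quo;
    }
    return $result || '0';
}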
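For comparison, the same encoding in Python, where integers are arbitrary precision, so the 160-bit digest needs no special handling. This is a sketch only: the 'f-'/'e-' prefixes follow the proposal above, while `node_names` and the example volid are hypothetical.

```python
import hashlib

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def encode_base62(data: bytes) -> str:
    """Big-endian base62 with the digit order '0'-'9', 'A'-'Z', 'a'-'z'."""
    value = int.from_bytes(data, "big")  # Python ints never lose precision
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, rem = divmod(value, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))  # most significant digit first


def node_names(volid: str, snapid: str) -> tuple:
    """Hypothetical helper: fmt/file node names with 'f-' / 'e-' prefixes."""
    h = encode_base62(hashlib.sha1(f"{volid}-{snapid}".encode("utf-8")).digest())
    return f"f-{h}", f"e-{h}"


fmt_node, file_node = node_names("local:100/vm-100-disk-0.qcow2", "snap1")
print(fmt_node, file_node)
```

A 160-bit value needs at most 27 base62 digits (62^27 > 2^160), so the prefixed names stay at 29 characters or fewer.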
* Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
2025-01-13 13:31 ` Fabian Grünbichler
@ 2025-01-20 13:37 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-20 13:37 UTC (permalink / raw)
To: pve-devel; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror
Date: Mon, 20 Jan 2025 13:37:20 +0000
Message-ID: <8ae9ce4b14bd90df388e03b6c9b7e2d598f647b5.camel@groupe-cyllene.com>
>>the path referenced in the running VM is stable. the path you are
>>looking for in the graph is not. e.g., the path might be something
>>some storage software returns. or udev. or .. and that can change
>>with any software upgrade or not be 100% deterministic in the first
>>place.
>>
>>let's say you start the VM today and the path returned by the RBD
>>storage plugin is /dev/rbd/XYZ, so that is how the blockdev is
>>opened/the path is recorded. in two weeks, ceph gets updated and now
>>the udev rule or the storage plugin code changes to return the more
>>deterministic /dev/rbd/POOL/XYZ. now the paths don't match anymore.
>>(this is just an example where such a thing happened in practice
>>already ;)).
Yes, got it, thanks!
I have begun working on v4 with all your comments. I think it
should work with a hash of the volid, now that I have fixed live
renaming (that was really the main blocker).
I'll try to send v4 before FOSDEM.
Thanks for the review and your time!
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
2025-01-09 13:19 ` Fabio Fantoni via pve-devel
@ 2025-01-20 13:44 ` DERUMIER, Alexandre via pve-devel
[not found] ` <3307ec388a763510ec78f97ed9f0de00c87d54b5.camel@groupe-cyllene.com>
1 sibling, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-20 13:44 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 20 Jan 2025 13:44:16 +0000
Message-ID: <3307ec388a763510ec78f97ed9f0de00c87d54b5.camel@groupe-cyllene.com>
Hi Fabio !
>>
>>In this implementation I don't see the possibility of using them on
>>raw
>>disks (on files) from a fast look, or am I wrong? If so, why? I think
>>the main use would be in cases like that where you don't have
>>snapshot
>>support by default
Ah, we discussed this with Fabian. v1/v2 had raw support,
but I removed it in v3 because it simplifies the code a lot,
and in my tests I no longer see much difference between qcow2
and raw (maybe 10% max).
Note that you can fully (100%) preallocate the base qcow2 image if you want
(or only the metadata, which is the default).
I'm going to do more extensive benchmarks, but qcow2 has improved a lot
in recent years (with sub-allocation clusters), so it should not be too
far from a CoW filesystem like ZFS or Btrfs (sure, there is still some
overhead vs. a simple raw image).
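As an illustration of the preallocation choices mentioned above, a small helper that only builds (and does not run) a `qemu-img create` command line. `preallocation` (off, metadata, falloc, full) and `extended_l2` (which enables qcow2 subclusters) are existing qcow2 creation options; the helper itself is hypothetical, and its `metadata` default mirrors the default mentioned in this mail rather than qemu-img's own default (`off`).

```python
import shlex
from typing import List


def qemu_img_create_cmd(path: str, size: str, preallocation: str = "metadata",
                        extended_l2: bool = False) -> List[str]:
    """Hypothetical helper: build a `qemu-img create` argv for a qcow2 base image."""
    allowed = {"off", "metadata", "falloc", "full"}
    if preallocation not in allowed:
        raise ValueError(f"unsupported preallocation mode: {preallocation}")
    opts = [f"preallocation={preallocation}"]
    if extended_l2:
        # extended L2 entries enable subcluster ("sub-allocation cluster") allocation
        opts.append("extended_l2=on")
    return ["qemu-img", "create", "-f", "qcow2", "-o", ",".join(opts), path, size]


cmd = qemu_img_create_cmd("/path/to/base.qcow2", "32G", "full")
print(shlex.join(cmd))
```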
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
[not found] ` <3307ec388a763510ec78f97ed9f0de00c87d54b5.camel@groupe-cyllene.com>
@ 2025-01-20 14:29 ` Fabio Fantoni via pve-devel
[not found] ` <6bdfe757-ae04-42e1-b197-c9ddb873e353@m2r.biz>
1 sibling, 0 replies; 68+ messages in thread
From: Fabio Fantoni via pve-devel @ 2025-01-20 14:29 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel, f.gruenbichler; +Cc: Fabio Fantoni
From: Fabio Fantoni <fabio.fantoni@m2r.biz>
To: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>, "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 20 Jan 2025 15:29:39 +0100
Message-ID: <6bdfe757-ae04-42e1-b197-c9ddb873e353@m2r.biz>
On 20/01/2025 14:44, DERUMIER, Alexandre wrote:
> Hi Fabio !
>
>>> In this implementation I don't see the possibility of using them on
>>> raw
>>> disks (on files) from a fast look, or am I wrong? If so, why? I think
>>> the main use would be in cases like that where you don't have
>>> snapshot
>>> support by default
> Ah, we discussed this with Fabian. v1/v2 had raw support,
> but I removed it in v3 because it simplifies the code a lot,
> and in my tests I no longer see much difference between qcow2
> and raw (maybe 10% max).
>
> Note that you can fully (100%) preallocate the base qcow2 image if you want
> (or only the metadata, which is the default).
>
>
> I'm going to do more extensive benchmarks, but qcow2 has improved a lot
> in recent years (with sub-allocation clusters), so it should not be too
> far from a CoW filesystem like ZFS or Btrfs (sure, there is still some
> overhead vs. a simple raw image).
>
>
>
Thanks for your reply. I don't remember exactly when my tests (of external
snapshots) date back to, it's been several years now, and I had done
them on the default versions of Debian 10, which I used, and still use on most
servers, with libvirt. We had all the servers with enterprise HDDs in
RAID1 or RAID10, and the difference in performance between preallocated
raw and qcow2, even when qcow2 was only used for external snapshots, was clearly
visible (even without benchmarks). For a few years I haven't had enough time
to keep myself well informed about the virtualization side and do
significant testing; I've recently started investing a lot of time in
learning and testing again with Proxmox, even though I didn't have many
other things to manage.
In recent years we have used only, or almost only, SSD and NVMe disks (on new or
upgraded servers), and with those it is not necessary to preallocate;
they are basically not preallocated, and the latest Proxmox ones use
lvm-thin. However, we are moving some servers (and will move
more) to Proxmox while keeping the existing storage on preallocated raw files,
and it would be useful, for example, to have snapshot support in
these cases too. Unfortunately, since Proxmox is not based on libvirt, I cannot even do
it manually from the command line (in the fairly fast and safe way that I had
tested and documented), and I do not know whether it would be
possible to force it with some workarounds and extra operations even though it is
not implemented in Proxmox.
I use qcow2 by default on my PCs (with libvirt, though) for some test VMs
where I use snapshots quite a bit, but internal ones, taken from
virt-manager. On SSD/NVMe, and usually with a single active VM, I don't
notice any particular performance problems, though in truth I've never
done any performance tests on my workstations.
For this implementation, however, I think it is important to measure and
document the performance, to give users enough information to make good
choices (and it seems you want to do that).
Out of curiosity, besides the obvious cases where external snapshots
would be useful, i.e. raw files and LVM (not LVM-thin), in what other
cases would they be useful, given that the others already have snapshot
support (qcow2 with internal snapshots, ZFS, Btrfs, LVM-thin, etc.)?
Not having it on raw files would, I think, take away a lot of its
usefulness, but I could be wrong if there are uses I can't imagine.
Maybe it makes more things possible with replication and migration?
(I haven't looked into it enough.)
* Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
[not found] ` <6bdfe757-ae04-42e1-b197-c9ddb873e353@m2r.biz>
@ 2025-01-20 14:41 ` DERUMIER, Alexandre via pve-devel
0 siblings, 0 replies; 68+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-20 14:41 UTC (permalink / raw)
To: pve-devel, fabio.fantoni, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "fabio.fantoni@m2r.biz" <fabio.fantoni@m2r.biz>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 20 Jan 2025 14:41:12 +0000
Message-ID: <33ac589275d2b53e3bdbd07ecd132b8cf5d8270f.camel@groupe-cyllene.com>
>>Out of curiosity, besides the obvious cases where external snapshots
>>would be useful, i.e. raw files and LVM (not LVM-thin), in what other
>>cases would they be useful, given that the others already have snapshot
>>support (qcow2 with internal snapshots, ZFS, Btrfs, LVM-thin, etc.)?
>>Not having it on raw files would, I think, take away a lot of its
>>usefulness, but I could be wrong if there are uses I can't imagine.
>>Maybe it makes more things possible with replication and migration?
>>(I haven't looked into it enough.)
I think they could be used for remote replication with snapshot
export/import when the storage doesn't support it (LVM-thin, for example).
I'm also planning to add pseudo-thin provisioning on shared LVM block
storage (an LV smaller than the qcow2 virtual size, grown dynamically
by a special Proxmox daemon).
Note that raw support can still be added later if users really want it.
Let's keep it simple for a first version, then add more features over
time :)
Also, the real target is network storage (qcow2 on NFS/CIFS, LVM +
qcow2 on SAN block devices); that's why I think network latency also
hides the performance difference between qcow2 and raw.
For local storage with super-fast NVMe, it doesn't make much sense to
use qcow2. You can use ZFS, LVM-thin, etc., where internal snapshots
work great :)
Thread overview: 68+ messages
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
2024-12-16 9:12 ` [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
2025-01-08 13:27 ` Fabian Grünbichler
2025-01-10 7:55 ` DERUMIER, Alexandre via pve-devel
[not found] ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
2025-01-10 9:15 ` Fiona Ebner
2025-01-10 9:32 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1e45e756801843dd46eb6ce2958d30885ad73bc2.camel@groupe-cyllene.com>
2025-01-13 14:28 ` Fiona Ebner
2025-01-14 10:10 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
2025-01-08 14:17 ` Fabian Grünbichler
2025-01-10 13:50 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-01-09 12:36 ` Fabian Grünbichler
2025-01-10 9:10 ` DERUMIER, Alexandre via pve-devel
[not found] ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
2025-01-10 11:02 ` Fabian Grünbichler
2025-01-10 11:51 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
2025-01-10 12:20 ` Fabian Grünbichler
2025-01-10 13:14 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
2025-01-09 13:55 ` Fabian Grünbichler
2025-01-10 10:16 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
2025-01-08 14:26 ` Fabian Grünbichler
2025-01-10 14:08 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
2025-01-08 14:31 ` Fabian Grünbichler
2025-01-13 7:56 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
2025-01-08 14:34 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
2025-01-08 15:19 ` Fabian Grünbichler
2025-01-13 8:27 ` DERUMIER, Alexandre via pve-devel
[not found] ` <0d0d4c4d73110cf0e692cae0ee65bf7f9a6ce93a.camel@groupe-cyllene.com>
2025-01-13 9:52 ` Fabian Grünbichler
2025-01-13 9:55 ` Fabian Grünbichler
2025-01-13 10:47 ` DERUMIER, Alexandre via pve-devel
2025-01-13 13:42 ` Fiona Ebner
2025-01-14 10:03 ` DERUMIER, Alexandre via pve-devel
[not found] ` <fa38efbd95b57ba57a5628d6acfcda9d5875fa82.camel@groupe-cyllene.com>
2025-01-15 9:39 ` Fiona Ebner
2025-01-15 9:51 ` Fabian Grünbichler
2025-01-15 10:06 ` Fiona Ebner
2025-01-15 10:15 ` Fabian Grünbichler
2025-01-15 10:46 ` Fiona Ebner
2025-01-15 10:50 ` Fabian Grünbichler
2025-01-15 11:01 ` Fiona Ebner
2025-01-15 13:01 ` DERUMIER, Alexandre via pve-devel
2025-01-16 14:56 ` DERUMIER, Alexandre via pve-devel
2025-01-15 10:15 ` DERUMIER, Alexandre via pve-devel
[not found] ` <c1559499319052d6cf10900efd5376c12389a60f.camel@groupe-cyllene.com>
2025-01-13 13:31 ` Fabian Grünbichler
2025-01-20 13:37 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
2025-01-09 9:51 ` Fabian Grünbichler
2025-01-13 8:38 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
2025-01-09 11:57 ` Fabian Grünbichler
2025-01-13 8:53 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-01-09 11:57 ` Fabian Grünbichler
2025-01-09 13:19 ` Fabio Fantoni via pve-devel
2025-01-20 13:44 ` DERUMIER, Alexandre via pve-devel
[not found] ` <3307ec388a763510ec78f97ed9f0de00c87d54b5.camel@groupe-cyllene.com>
2025-01-20 14:29 ` Fabio Fantoni via pve-devel
[not found] ` <6bdfe757-ae04-42e1-b197-c9ddb873e353@m2r.biz>
2025-01-20 14:41 ` DERUMIER, Alexandre via pve-devel
2025-01-13 10:08 ` DERUMIER, Alexandre via pve-devel
[not found] ` <0ae72889042e006d9202e837aac7ecf2b413e1b4.camel@groupe-cyllene.com>
2025-01-13 13:27 ` Fabian Grünbichler
2025-01-13 18:07 ` DERUMIER, Alexandre via pve-devel
2025-01-13 18:58 ` DERUMIER, Alexandre via pve-devel