From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH v4 pve-storage 1/5] qcow2: add external snapshot support
Date: Tue, 1 Apr 2025 15:50:37 +0200 (CEST)
Message-ID: <1614620193.3974.1743515437162@webmail.proxmox.com>
In-Reply-To: <mailman.943.1741688960.293.pve-devel@lists.proxmox.com>
> On 11.03.2025 11:28 CET, Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote:
some sort of description here would be great ;)
> ---
> src/PVE/Storage.pm | 4 +-
> src/PVE/Storage/DirPlugin.pm | 1 +
> src/PVE/Storage/Plugin.pm | 232 +++++++++++++++++++++++++++++------
> 3 files changed, 196 insertions(+), 41 deletions(-)
>
> diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
> index 3b4f041..79e5c3a 100755
> --- a/src/PVE/Storage.pm
> +++ b/src/PVE/Storage.pm
> @@ -1002,7 +1002,7 @@ sub unmap_volume {
> }
>
> sub vdisk_alloc {
> - my ($cfg, $storeid, $vmid, $fmt, $name, $size) = @_;
> + my ($cfg, $storeid, $vmid, $fmt, $name, $size, $backing) = @_;
>
> die "no storage ID specified\n" if !$storeid;
>
> @@ -1025,7 +1025,7 @@ sub vdisk_alloc {
> # lock shared storage
> return $plugin->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
> my $old_umask = umask(umask|0037);
> - my $volname = eval { $plugin->alloc_image($storeid, $scfg, $vmid, $fmt, $name, $size) };
> + my $volname = eval { $plugin->alloc_image($storeid, $scfg, $vmid, $fmt, $name, $size, $backing) };
> my $err = $@;
> umask $old_umask;
> die $err if $err;
> diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm
> index fb23e0a..1cd7ac3 100644
> --- a/src/PVE/Storage/DirPlugin.pm
> +++ b/src/PVE/Storage/DirPlugin.pm
> @@ -81,6 +81,7 @@ sub options {
> is_mountpoint => { optional => 1 },
> bwlimit => { optional => 1 },
> preallocation => { optional => 1 },
> + snapext => { optional => 1 },
> };
> }
>
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index 65cf43f..d7f485f 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -216,6 +216,11 @@ my $defaultData = {
> maximum => 65535,
> optional => 1,
> },
> + 'snapext' => {
> + type => 'boolean',
> + description => 'enable external snapshot.',
> + optional => 1,
> + },
> },
> };
>
> @@ -716,7 +721,11 @@ sub filesystem_path {
>
> my $dir = $class->get_subdir($scfg, $vtype);
>
> - $dir .= "/$vmid" if $vtype eq 'images';
> + if ($scfg->{snapext} && $snapname) {
> + $name = $class->get_snap_volname($volname, $snapname);
> + } else {
> + $dir .= "/$vmid" if $vtype eq 'images';
> + }
this is a bit weird, as it mixes volnames (with the `$vmid/` prefix) and names (without). get_snap_volname is only called twice in this patch, and $volname is already parsed at this point, so could we maybe let it take and return just the $name part, without the dir?
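a rough, untested sketch of what I have in mind. get_snap_name and build_path are hypothetical names that do not exist in the patch, the directory is just an example value, and the naming scheme is the one from this patch:

```perl
use strict;
use warnings;

# Hypothetical helper: maps the bare image name (no "$vmid/" prefix) to the
# file name used for a given snapshot. Naming scheme taken from this patch.
sub get_snap_name {
    my ($name, $snapname) = @_;
    return $name if !defined($snapname) || $snapname eq 'current';
    return "snap-$snapname-$name";
}

# Hypothetical caller: the directory handling stays identical for both cases,
# only the last path component changes.
sub build_path {
    my ($dir, $vmid, $name, $snapname) = @_;
    return join('/', $dir, $vmid, get_snap_name($name, $snapname));
}

print build_path('/var/lib/vz/images', 100, 'vm-100-disk-0.qcow2', 'snap1'), "\n";
print build_path('/var/lib/vz/images', 100, 'vm-100-disk-0.qcow2'), "\n";
```

that way filesystem_path could always append "/$vmid" and never has to deal with a name that already contains the directory.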
>
> my $path = "$dir/$name";
>
> @@ -873,7 +882,7 @@ sub clone_image {
> }
>
> sub alloc_image {
> - my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
> + my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size, $backing) = @_;
this extends the storage API, so it should actually do that.. and probably $backing should not be an arbitrary path, but something that is resolved locally?
>
> my $imagedir = $class->get_subdir($scfg, 'images');
> $imagedir .= "/$vmid";
> @@ -901,17 +910,11 @@ sub alloc_image {
> umask $old_umask;
> die $err if $err;
> } else {
> - my $cmd = ['/usr/bin/qemu-img', 'create'];
> -
> - my $prealloc_opt = preallocation_cmd_option($scfg, $fmt);
> - push @$cmd, '-o', $prealloc_opt if defined($prealloc_opt);
>
> - push @$cmd, '-f', $fmt, $path, "${size}K";
> -
> - eval { run_command($cmd, errmsg => "unable to create image"); };
> + eval { qemu_img_create($scfg, $fmt, $size, $path, $backing) };
> if ($@) {
> unlink $path;
> - rmdir $imagedir;
> + rmdir $imagedir if !$backing;
don't think this is needed, rmdir will fail if the dir isn't empty anyway..
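a small standalone demonstration of that behaviour (temporary directory and file name are made up for the example):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a scratch directory containing a single file.
my $dir = tempdir(CLEANUP => 1);
my $file = "$dir/snap-foo-vm-100-disk-0.qcow2";
open(my $fh, '>', $file) or die "cannot create $file: $!\n";
close($fh);

# rmdir refuses to remove a non-empty directory and returns false,
# so an unconditional rmdir cannot remove other images by accident.
my $removed = rmdir($dir);
print "rmdir on non-empty dir: ", ($removed ? "removed" : "failed ($!)"), "\n";
```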
> die "$@";
> }
> }
> @@ -955,6 +958,50 @@ sub free_image {
> # TODO taken from PVE/QemuServer/Drive.pm, avoiding duplication would be nice
> my @checked_qemu_img_formats = qw(raw cow qcow qcow2 qed vmdk cloop);
>
> +sub qemu_img_create {
> + my ($scfg, $fmt, $size, $path, $backing) = @_;
> +
> + my $cmd = ['/usr/bin/qemu-img', 'create'];
> +
> + my $options = [];
> +
> + if($backing) {
> + push @$cmd, '-b', $backing, '-F', 'qcow2';
> + push @$options, 'extended_l2=on','cluster_size=128k';
> + };
> + push @$options, preallocation_cmd_option($scfg, $fmt);
> + push @$cmd, '-o', join(',', @$options) if @$options > 0;
> + push @$cmd, '-f', $fmt, $path;
> + push @$cmd, "${size}K" if !$backing;
is this because it will automatically take the size from the backing image?
> +
> + run_command($cmd, errmsg => "unable to create image");
> +}
> +
> +sub qemu_img_info {
> + my ($filename, $file_format, $timeout, $follow_backing_files) = @_;
> +
> + my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
> + push $cmd->@*, '-f', $file_format if $file_format;
> + push $cmd->@*, '--backing-chain' if $follow_backing_files;
> +
> + my $json = '';
> + my $err_output = '';
> + eval {
> + run_command($cmd,
> + timeout => $timeout,
> + outfunc => sub { $json .= shift },
> + errfunc => sub { $err_output .= shift . "\n"},
> + );
> + };
> + warn $@ if $@;
> + if ($err_output) {
> + # if qemu did not output anything to stdout we die with stderr as an error
> + die $err_output if !$json;
> + # otherwise we warn about it and try to parse the json
> + warn $err_output;
> + }
> + return $json;
> +}
> # set $untrusted if the file in question might be malicious since it isn't
> # created by our stack
> # this makes certain checks fatal, and adds extra checks for known problems like
> @@ -1018,25 +1065,9 @@ sub file_size_info {
> warn "file_size_info: '$filename': falling back to 'raw' from unknown format '$file_format'\n";
> $file_format = 'raw';
> }
> - my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
> - push $cmd->@*, '-f', $file_format if $file_format;
>
> - my $json = '';
> - my $err_output = '';
> - eval {
> - run_command($cmd,
> - timeout => $timeout,
> - outfunc => sub { $json .= shift },
> - errfunc => sub { $err_output .= shift . "\n"},
> - );
> - };
> - warn $@ if $@;
> - if ($err_output) {
> - # if qemu did not output anything to stdout we die with stderr as an error
> - die $err_output if !$json;
> - # otherwise we warn about it and try to parse the json
> - warn $err_output;
> - }
> + my $json = qemu_img_info($filename, $file_format, $timeout);
> +
> if (!$json) {
> die "failed to query file information with qemu-img\n" if $untrusted;
> # skip decoding if there was no output, e.g. if there was a timeout.
> @@ -1162,11 +1193,29 @@ sub volume_snapshot {
>
> die "can't snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
>
> - my $path = $class->filesystem_path($scfg, $volname);
> + if($scfg->{snapext}) {
> +
> + my $path = $class->path($scfg, $volname, $storeid);
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + #rename current volume to snap volume
> + die "snapshot volume $snappath already exist\n" if -e $snappath;
> + rename($path, $snappath) if -e $path;
this is still looking weird.. I don't think it makes sense interface wise to allow snapshotting a volume that doesn't even exist..
> +
> + my ($vtype, $name, $vmid, undef, undef, $isBase, $format) =
> + $class->parse_volname($volname);
> +
> + $class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $name, undef, $snappath);
> + if ($@) {
> + eval { $class->free_image($storeid, $scfg, $volname, 0) };
> + warn $@ if $@;
missing cleanup - this should undo the rename from above
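also note that the alloc_image call in the hunk above is not wrapped in an eval, so the `if ($@)` branch is never reached when it dies. a sketch of the flow I would expect, using plain files instead of the real plugin methods (snapshot_with_rollback and the $create callback are hypothetical stand-ins, not existing API):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the snapshot step: rename the current image to the
# snapshot name, run $create (which would allocate the new overlay), and move
# the image back if that fails.
sub snapshot_with_rollback {
    my ($path, $snappath, $create) = @_;

    die "volume '$path' does not exist\n" if !-e $path;
    die "snapshot volume '$snappath' already exists\n" if -e $snappath;
    rename($path, $snappath) or die "rename '$path' to '$snappath' failed - $!\n";

    eval { $create->($path, $snappath) };
    if (my $err = $@) {
        unlink($path); # remove a partially created overlay, if any
        rename($snappath, $path) or warn "could not restore '$path' - $!\n";
        die $err;
    }
    return;
}

my $dir = tempdir(CLEANUP => 1);
my ($path, $snappath) = ("$dir/vm-100-disk-0.qcow2", "$dir/snap-s1-vm-100-disk-0.qcow2");
open(my $fh, '>', $path) or die "cannot create $path: $!\n";
close($fh);

# Simulate a failing allocation and check that the rename got undone.
eval { snapshot_with_rollback($path, $snappath, sub { die "alloc failed\n" }) };
print "error: $@" if $@;
print "original restored: ", (-e $path && !-e $snappath ? "yes" : "no"), "\n";
```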
> + }
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> + } else {
>
> - run_command($cmd);
> + my $path = $class->filesystem_path($scfg, $volname);
> + my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
> @@ -1177,6 +1226,21 @@ sub volume_snapshot {
> sub volume_rollback_is_possible {
> my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
>
> + if ($scfg->{snapext}) {
> + #technically, we could manage multibranch, we it need lot more work for snapshot delete
> + #we need to implemente block-stream from deleted snapshot to all others child branchs
> + #when online, we need to do a transaction for multiple disk when delete the last snapshot
> + #and need to merge in current running file
> +
> + my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $parentsnap = $snapshots->{current}->{parent};
wouldn't it be enough to check that this equals $snap?
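i.e., something along these lines, assuming the hash layout that volume_snapshot_info returns in this patch (is_latest_snapshot is a hypothetical helper, the sample data is made up):

```perl
use strict;
use warnings;

# Hypothetical check: a rollback target is valid if it is the direct parent
# of the 'current' entry in the snapshot info hash.
sub is_latest_snapshot {
    my ($snapshots, $snap) = @_;
    my $parent = $snapshots->{current}->{parent};
    return defined($parent) && $parent eq $snap ? 1 : 0;
}

my $snapshots = {
    current => { parent => 'snap2' },
    snap2 => { parent => 'snap1', child => 'current' },
    snap1 => { child => 'snap2' },
};
print "snap2: ", is_latest_snapshot($snapshots, 'snap2'), "\n";
print "snap1: ", is_latest_snapshot($snapshots, 'snap1'), "\n";
```

this avoids comparing file paths and the extra $class->path call.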
> +
> + return 1 if $snapshots->{$parentsnap}->{file} eq $snappath;
> +
> + die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> + }
> +
> return 1;
> }
>
> @@ -1187,9 +1251,15 @@ sub volume_snapshot_rollback {
>
> my $path = $class->filesystem_path($scfg, $volname);
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> -
> - run_command($cmd);
> + if ($scfg->{snapext}) {
> + #simply delete the current snapshot and recreate it
> + my $path = $class->filesystem_path($scfg, $volname);
> + unlink($path);
> + $class->volume_snapshot($scfg, $storeid, $volname, $snap);
instead of volume_snapshot, this could simply call alloc_image with the backing file? then volume_snapshot could always rename and always cleanup properly..
> + } else {
> + my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
> @@ -1201,13 +1271,49 @@ sub volume_snapshot_delete {
>
> return 1 if $running;
>
> + my $cmd = "";
> my $path = $class->filesystem_path($scfg, $volname);
>
> - $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> + if ($scfg->{snapext}) {
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> + my $snappath = $snapshots->{$snap}->{file};
> + die "volume $snappath is missing" if !-e $snappath;
>
> - my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> + my $parentsnap = $snapshots->{$snap}->{parent};
> + my $childsnap = $snapshots->{$snap}->{child};
>
> - run_command($cmd);
> + my $parentpath = $snapshots->{$parentsnap}->{file} if $parentsnap;
> + my $childpath = $snapshots->{$childsnap}->{file} if $childsnap;
my $foo = .. if ...;
is forbidden in our code ;) (perlsyn documents the behaviour of `my` combined with a statement modifier as undefined). but I think we always need to have a childsnap anyway, right?
so we could simply check for that, and then switch around the two branches below so that one of them can do
    if (my $parentsnap = ...) {
        ...
    } else {
        ...
    }
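spelled out a bit more as an untested sketch. plan_snapshot_delete is a hypothetical helper that only returns the steps instead of running them, and it assumes the hash layout from volume_snapshot_info in this patch. the `-d` for commit is the flag mentioned further down:

```perl
use strict;
use warnings;

# Hypothetical planner: returns the steps needed to delete $snap as a list of
# array refs, without executing anything.
sub plan_snapshot_delete {
    my ($snapshots, $snap) = @_;

    my $snappath = $snapshots->{$snap}->{file};
    # every snapshot must have a child (another snapshot or 'current')
    my $childsnap = $snapshots->{$snap}->{child} // die "snapshot '$snap' has no child\n";
    my $childpath = $snapshots->{$childsnap}->{file};

    if (my $parentsnap = $snapshots->{$snap}->{parent}) {
        # middle of the chain: rebase the child onto the parent, drop the snapshot
        my $parentpath = $snapshots->{$parentsnap}->{file};
        return (
            ['/usr/bin/qemu-img', 'rebase', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath],
            ['unlink', $snappath],
        );
    } else {
        # first snapshot: commit the child into it, then rename it over the child
        return (
            ['/usr/bin/qemu-img', 'commit', '-d', $childpath],
            ['rename', $snappath, $childpath],
        );
    }
}

my $snapshots = {
    snap1 => { file => '/images/100/snap-snap1-vm-100-disk-0.qcow2', child => 'snap2' },
    snap2 => { file => '/images/100/snap-snap2-vm-100-disk-0.qcow2', parent => 'snap1', child => 'current' },
    current => { file => '/images/100/vm-100-disk-0.qcow2', parent => 'snap2' },
};
for my $snap (qw(snap1 snap2)) {
    print "$snap: @$_\n" for plan_snapshot_delete($snapshots, $snap);
}
```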
> +
> + #if first snapshot,as it should be bigger, we merge child, and rename the snapshot to child
> + if(!$parentsnap) {
> + print"commit $childpath\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
we could provide `-d` here to skip emptying $childpath, since we rename over it below anyway..
> + eval { run_command($cmd) };
> + if ($@) {
> + die "error commiting $childpath to $parentpath; $@\n";
this is wrong, there is no $parentpath.. we are committing into $snappath
> + }
> + print"rename $snappath to $childpath\n";
> + rename($snappath, $childpath);
what if this fails?
> + } else {
> + #we rebase the child image on the parent as new backing image
should we extend this to make it clear what this means? it means copying any parts of $snap that are not in $parent and not yet overwritten by $child into $child, right?
so how expensive this is depends on:
- how many changes are between $parent and $snap (increases cost)
- how many of those are overwritten by changes between $snap and $child (decreases cost)
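as a toy model of that cost (not how qemu-img works internally, just the set arithmetic, with made-up cluster numbers): treat each image as the set of clusters it has allocated itself, then the rebase has to copy whatever is in $snap but not in $child.

```perl
use strict;
use warnings;

# Toy model: an image is represented by the list of cluster indexes it has
# allocated itself. Returns the clusters a rebase would have to copy into the
# child, i.e. those present in the snapshot but not overwritten by the child.
sub clusters_to_copy {
    my ($snap_clusters, $child_clusters) = @_;
    my %in_child = map { $_ => 1 } @$child_clusters;
    return grep { !$in_child{$_} } @$snap_clusters;
}

my @snap = (1, 2, 3, 4);   # written between $parent and $snap
my @child = (3, 4, 5);     # written between $snap and $child
my @copy = clusters_to_copy(\@snap, \@child);
print "clusters to copy into child: @copy\n";
```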
> + die "missing parentsnap snapshot to rebase child $childpath\n" if !$parentpath;
how can this happen? if there is a parentsnap there must be a parentpath as well?
> + $cmd = ['/usr/bin/qemu-img', 'rebase', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath];
> + eval { run_command($cmd) };
> + if ($@) {
> + die "error rebase $childpath from $parentpath; $@\n";
> + }
> + #delete the snapshot
> + unlink($snappath);
> + }
> +
> + } else {
> + $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> +
> + $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> + run_command($cmd);
> + }
>
> return undef;
> }
> @@ -1246,7 +1352,7 @@ sub volume_has_feature {
> current => { qcow2 => 1, raw => 1, vmdk => 1 },
> },
> rename => {
> - current => {qcow2 => 1, raw => 1, vmdk => 1},
> + current => { qcow2 => 1, raw => 1, vmdk => 1},
> },
> };
>
> @@ -1481,7 +1587,37 @@ sub status {
> sub volume_snapshot_info {
> my ($class, $scfg, $storeid, $volname) = @_;
>
> - die "volume_snapshot_info is not implemented for $class";
> + my $path = $class->filesystem_path($scfg, $volname);
> +
> + my $backing_chain = 1;
> + my $json = qemu_img_info($path, undef, 10, $backing_chain);
> + die "failed to query file information with qemu-img\n" if !$json;
> + my $snapshots = eval { decode_json($json) };
missing error handling for JSON decoding.. the eval swallows the error, and $snapshots is simply undef afterwards
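something like the following would do. decode_qemu_img_info is a hypothetical helper name, and JSON::PP is only used here to keep the example standalone:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: decode the output of 'qemu-img info --backing-chain
# --output=json' and fail loudly instead of returning undef.
sub decode_qemu_img_info {
    my ($json) = @_;
    my $info = eval { decode_json($json) };
    die "could not parse qemu-img info output - $@" if $@;
    # with --backing-chain, the output is a list with one entry per image
    die "unexpected qemu-img info output, expected a list\n" if ref($info) ne 'ARRAY';
    return $info;
}

my $chain = decode_qemu_img_info('[{"filename":"vm-100-disk-0.qcow2"}]');
print "entries in chain: ", scalar(@$chain), "\n";

eval { decode_qemu_img_info('not json') };
print "decode error caught\n" if $@;
```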
> +
> + my $info = {};
> + my $order = 0;
> + for my $snap (@$snapshots) {
> +
> + my $snapfile = $snap->{filename};
> + my $snapname = parse_snapname($snapfile);
> + $snapname = 'current' if !$snapname;
> + my $snapvolname = $class->get_snap_volname($volname, $snapname);
> +
> + $info->{$snapname}->{order} = $order;
> + $info->{$snapname}->{file}= $snapfile;
> + $info->{$snapname}->{volname} = $snapvolname;
> + $info->{$snapname}->{volid} = "$storeid:$snapvolname";
> + $info->{$snapname}->{ext} = 1;
> +
> + my $parentfile = $snap->{'backing-filename'};
> + if ($parentfile) {
> + my $parentname = parse_snapname($parentfile);
> + $info->{$snapname}->{parent} = $parentname;
> + $info->{$parentname}->{child} = $snapname;
> + }
> + $order++;
> + }
> + return $info;
> }
>
> sub activate_storage {
> @@ -1867,4 +2003,22 @@ sub config_aware_base_mkdir {
> }
> }
>
> +sub get_snap_volname {
> + my ($class, $volname, $snapname) = @_;
> +
> + my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
> + $name = !$snapname || $snapname eq 'current' ? $volname : "$vmid/snap-$snapname-$name";
other way round would be better to group by volume first IMHO ($vmid/snap-$name-$snapname), as this is similar to how we often encode snapshots on the storage level (volume@snap). we also need a delimiter between snapshot and volume name that is not allowed in either. that is hard for volname, since basically everything but '/' goes, but snapshot names have a restricted character set (configid, which means alphanumeric, hyphen and underscore), so we could use something like '.' as delimiter? or we switch to directories and do $vmid/snap/$snap/$name?
> + return $name;
> +}
> +
> +sub parse_snapname {
> + my ($name) = @_;
> +
> + my $basename = basename($name);
> + if ($basename =~ m/^snap-(.*)-vm(.*)$/) {
this is not strict enough, see above
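for illustration, a sketch of the directory variant from above with a strict parser. snap_volname, parse_snap_path and the exact regex are assumptions on my side, not existing helpers, and the configid rules are only approximated:

```perl
use strict;
use warnings;

# Assumption: snapshot names follow the configid rules (letters, digits,
# hyphen, underscore), so '/' can never appear in them.
my $SNAPNAME_RE = qr/[A-Za-z][A-Za-z0-9_\-]*/;

# Hypothetical layout: $vmid/snap/$snapname/$name
sub snap_volname {
    my ($vmid, $name, $snapname) = @_;
    return "$vmid/$name" if !defined($snapname) || $snapname eq 'current';
    die "invalid snapshot name '$snapname'\n" if $snapname !~ m/^$SNAPNAME_RE\z/;
    return "$vmid/snap/$snapname/$name";
}

# Strict counterpart: only matches paths produced by snap_volname, since '/'
# is a delimiter that can appear in neither the snapshot nor the image name.
sub parse_snap_path {
    my ($path) = @_;
    if ($path =~ m!/(\d+)/snap/($SNAPNAME_RE)/([^/]+)\z!) {
        return { vmid => $1, snapname => $2, name => $3 };
    }
    return undef;
}

my $volname = snap_volname(100, 'vm-100-disk-0.qcow2', 'before-upgrade');
my $parsed = parse_snap_path("/var/lib/vz/images/$volname");
print "$volname -> snapshot '$parsed->{snapname}' of '$parsed->{name}'\n";
```

with the current `snap-(.*)-vm(.*)` pattern, a snapshot name containing "-vm" is split at the wrong position.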
> + return $1;
> + }
> + return undef;
> +}
> +
> 1;
> --
> 2.39.5