public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support
Date: Thu, 9 Jan 2025 13:36:38 +0100 (CET)	[thread overview]
Message-ID: <249669964.942.1736426198150@webmail.proxmox.com> (raw)
In-Reply-To: <mailman.215.1734340398.332.pve-devel@lists.proxmox.com>


> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
>  src/PVE/Storage/DirPlugin.pm |   1 +
>  src/PVE/Storage/Plugin.pm    | 207 +++++++++++++++++++++++++++++------
>  2 files changed, 176 insertions(+), 32 deletions(-)
> 
> diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm
> index fb23e0a..1cd7ac3 100644
> --- a/src/PVE/Storage/DirPlugin.pm
> +++ b/src/PVE/Storage/DirPlugin.pm
> @@ -81,6 +81,7 @@ sub options {
>  	is_mountpoint => { optional => 1 },
>  	bwlimit => { optional => 1 },
>  	preallocation => { optional => 1 },
> +	snapext => { optional => 1 },
>     };
>  }
>  
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index fececa1..aeba8d3 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -214,6 +214,11 @@ my $defaultData = {
>  	    maximum => 65535,
>  	    optional => 1,
>  	},
> +        'snapext' => {
> +	    type => 'boolean',
> +	    description => 'enable external snapshot.',
> +	    optional => 1,
> +        },
>      },
>  };
>  
> @@ -710,11 +715,15 @@ sub filesystem_path {
>      # Note: qcow2/qed has internal snapshot, so path is always
>      # the same (with or without snapshot => same file).
>      die "can't snapshot this image format\n"
> -	if defined($snapname) && $format !~ m/^(qcow2|qed)$/;
> +	if defined($snapname) && !$scfg->{snapext} && $format !~ m/^(qcow2|qed)$/;

I am not sure if we want to allow snapshots for non-qcow2 files just because snapext is enabled? I know it's technically possible to have a raw base image and then a qcow2 backing chain on top, but this quickly becomes confusing (how is the volume named then? which format does it have in which context)..

>  
>      my $dir = $class->get_subdir($scfg, $vtype);
>  
> -    $dir .= "/$vmid" if $vtype eq 'images';
> +    if ($scfg->{snapext} && $snapname) {
> +	$name = $class->get_snap_volname($volname, $snapname);
> +    } else {
> +	$dir .= "/$vmid" if $vtype eq 'images';
> +    }
>  
>      my $path = "$dir/$name";
>  
> @@ -953,6 +962,31 @@ sub free_image {
>  # TODO taken from PVE/QemuServer/Drive.pm, avoiding duplication would be nice
>  my @checked_qemu_img_formats = qw(raw cow qcow qcow2 qed vmdk cloop);
>  
> +sub qemu_img_info {
> +    my ($filename, $file_format, $timeout, $follow_backing_files) = @_;
> +
> +    my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
> +    push $cmd->@*, '-f', $file_format if $file_format;
> +    push $cmd->@*, '--backing-chain' if $follow_backing_files;
> +
> +    my $json = '';
> +    my $err_output = '';
> +    eval {
> +        run_command($cmd,
> +            timeout => $timeout,
> +            outfunc => sub { $json .= shift },
> +            errfunc => sub { $err_output .= shift . "\n"},
> +        );
> +    };
> +    warn $@ if $@;
> +    if ($err_output) {
> +        # if qemu did not output anything to stdout we die with stderr as an error
> +        die $err_output if !$json;
> +        # otherwise we warn about it and try to parse the json
> +        warn $err_output;
> +    }
> +    return $json;
> +}
>  # set $untrusted if the file in question might be malicious since it isn't
>  # created by our stack
>  # this makes certain checks fatal, and adds extra checks for known problems like
> @@ -1016,25 +1050,9 @@ sub file_size_info {
>  	warn "file_size_info: '$filename': falling back to 'raw' from unknown format '$file_format'\n";
>  	$file_format = 'raw';
>      }
> -    my $cmd = ['/usr/bin/qemu-img', 'info', '--output=json', $filename];
> -    push $cmd->@*, '-f', $file_format if $file_format;
>  
> -    my $json = '';
> -    my $err_output = '';
> -    eval {
> -	run_command($cmd,
> -	    timeout => $timeout,
> -	    outfunc => sub { $json .= shift },
> -	    errfunc => sub { $err_output .= shift . "\n"},
> -	);
> -    };
> -    warn $@ if $@;
> -    if ($err_output) {
> -	# if qemu did not output anything to stdout we die with stderr as an error
> -	die $err_output if !$json;
> -	# otherwise we warn about it and try to parse the json
> -	warn $err_output;
> -    }
> +    my $json = qemu_img_info($filename, $file_format, $timeout);
> +
>      if (!$json) {
>  	die "failed to query file information with qemu-img\n" if $untrusted;
>  	# skip decoding if there was no output, e.g. if there was a timeout.
> @@ -1162,11 +1180,28 @@ sub volume_snapshot {
>  
>      die "can't snapshot this image format\n" if $volname !~ m/\.(qcow2|qed)$/;
>  
> -    my $path = $class->filesystem_path($scfg, $volname);
> +    if($scfg->{snapext}) {
>  
> -    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> +	my $path = $class->path($scfg, $volname, $storeid);
> +	my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> +	my $format = ($class->parse_volname($volname))[6];
> +	#rename current volume to snap volume
> +	rename($path, $snappath) if -e $path && !-e $snappath;

I think this should die if the snappath already exists, and the one (IMHO wrong) call in qemu-server should switch to vdisk_alloc/alloc_image.. this is rather dangerous otherwise!

> +	my $cmd = ['/usr/bin/qemu-img', 'create', '-b', $snappath,
> +		   '-F', $format, '-f', 'qcow2', $path];
> +
> +	my $options = "extended_l2=on,cluster_size=128k,";
> +	$options .= preallocation_cmd_option($scfg, 'qcow2');
> +	push @$cmd, '-o', $options;
> +	run_command($cmd);
>  
> -    run_command($cmd);
> +    } else {
> +
> +	my $path = $class->filesystem_path($scfg, $volname);
> +	my $cmd = ['/usr/bin/qemu-img', 'snapshot','-c', $snap, $path];
> +	run_command($cmd);
> +    }
>  
>      return undef;
>  }
> @@ -1177,6 +1212,21 @@ sub volume_snapshot {
>  sub volume_rollback_is_possible {
>      my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
>  
> +    if ($scfg->{snapext}) {
> +	#technically, we could manage multibranch, we it need lot more work for snapshot delete
> +	#we need to implemente block-stream from deleted snapshot to all others child branchs

see my comments in qemu-server - I think we actually want block-stream anyway, since it has the semantics we want..

> +	#when online, we need to do a transaction for multiple disk when delete the last snapshot
> +	#and need to merge in current running file
> +
> +	my $snappath = $class->path($scfg, $volname, $storeid, $snap);
> +	my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> +	my $parentsnap = $snapshots->{current}->{parent};
> +
> +	return 1 if !-e $snappath || $snapshots->{$parentsnap}->{file} eq $snappath;

why do we return 1 here if the snapshot doesn't exist? if we only allow rollback to the most recent snapshot for now, then we could just query the current path and see if it is backed by our snapshot?

> +
> +	die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> +    }
> +
>      return 1;
>  }
>  
> @@ -1187,9 +1237,15 @@ sub volume_snapshot_rollback {
>  
>      my $path = $class->filesystem_path($scfg, $volname);
>  
> -    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> -
> -    run_command($cmd);
> +    if ($scfg->{snapext}) {
> +	#simply delete the current snapshot and recreate it
> +	my $path = $class->filesystem_path($scfg, $volname);
> +	unlink($path);
> +	$class->volume_snapshot($scfg, $storeid, $volname, $snap);
> +    } else {
> +	my $cmd = ['/usr/bin/qemu-img', 'snapshot','-a', $snap, $path];
> +	run_command($cmd);
> +    }
>  
>      return undef;
>  }
> @@ -1201,13 +1257,52 @@ sub volume_snapshot_delete {
>  
>      return 1 if $running;
>  
> +    my $cmd = "";
>      my $path = $class->filesystem_path($scfg, $volname);
>  
> -    $class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
> +    if ($scfg->{snapext}) {
>  
> -    my $cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> +	my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> +	my $snappath = $snapshots->{$snap}->{file};
> +	return if !-e $snappath;  #already deleted ?

shouldn't this be an error?

> +
> +	my $parentsnap = $snapshots->{$snap}->{parent};
> +	my $childsnap = $snapshots->{$snap}->{child};
> +
> +	my $parentpath = $snapshots->{$parentsnap}->{file} if $parentsnap;
> +	my $childpath = $snapshots->{$childsnap}->{file} if $childsnap;
> +
> +
> +	#if first snapshot, we merge child, and rename the snapshot to child
> +	if(!$parentsnap) {
> +	    #we use commit here, as it's faster than rebase
> +	    #https://lists.gnu.org/archive/html/qemu-discuss/2019-08/msg00041.html
> +	    print"commit $childpath\n";
> +	    $cmd = ['/usr/bin/qemu-img', 'commit', $childpath];
> +	    run_command($cmd);
> +	    print"delete $childpath\n";
> +
> +	    unlink($childpath);

this unlink can be skipped?

> +	    print"rename $snappath to $childpath\n";
> +	    rename($snappath, $childpath);

since this will overwrite $childpath anyway.. this also reduces the chance of something going wrong:

- if the commit fails halfway through, nothing bad should have happened, other than some data is now stored in two snapshots and takes up extra space
- if the rename fails, then all of the data of $snap is stored twice, but the backing chain is still valid

notable, there is no longer a gap where $childpath doesn't exist, which would break the backing chain!

> +	} else {
> +	    print"commit $snappath\n";
> +	    $cmd = ['/usr/bin/qemu-img', 'commit', $snappath];

leftover from previous version? not used/overwritten below ;)

> +	    #if we delete an intermediate snapshot, we need to link upper snapshot to base snapshot
> +	    die "missing parentsnap snapshot to rebase child $childpath\n" if !$parentpath;
> +	    print "link $childsnap to $parentsnap\n";
> +	    $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b', $parentpath, '-F', 'qcow2', '-f', 'qcow2', $childpath];

does this work? I would read the qemu-img manpage to say that '-u' is for when you've moved/converted the backing file, and want to update the reference in its overlay, and that it doesn't copy any data.. but we need to copy the data from $snap to $childpath (we just want to delete the snapshot, we don't want to drop all its changes from the history, that would corrupt the contents of the image).

note the description of the "safe" variant:

"                     This  is  the  default mode and performs a real rebase operation. The new backing file may differ from the old one and qemu-img rebase will take care of keeping the
                     guest-visible content of FILENAME unchanged."

IMHO this is the behaviour we need here?

> +	    run_command($cmd);
> +	    #delete the snapshot
> +	    unlink($snappath);
> +	}
> +
> +    } else {
> +	$class->deactivate_volume($storeid, $scfg, $volname, $snap, {});
>  
> -    run_command($cmd);
> +	$cmd = ['/usr/bin/qemu-img', 'snapshot','-d', $snap, $path];
> +	run_command($cmd);
> +    }
>  
>      return undef;
>  }
> @@ -1246,8 +1341,8 @@ sub volume_has_feature {
>  	    current => { qcow2 => 1, raw => 1, vmdk => 1 },
>  	},
>  	rename => {
> -	    current => {qcow2 => 1, raw => 1, vmdk => 1},
> -	},
> +	    current => { qcow2 => 1, raw => 1, vmdk => 1},
> +	}

nit: unrelated change?

>      };
>  
>      if ($feature eq 'clone') {
> @@ -1481,7 +1576,37 @@ sub status {
>  sub volume_snapshot_info {
>      my ($class, $scfg, $storeid, $volname) = @_;
>  
> -    die "volume_snapshot_info is not implemented for $class";

should this be guarded with $snapext being enabled?

> +    my $path = $class->filesystem_path($scfg, $volname);
> +
> +    my $backing_chain = 1;
> +    my $json = qemu_img_info($path, undef, 10, $backing_chain);
> +    die "failed to query file information with qemu-img\n" if !$json;
> +    my $snapshots = eval { decode_json($json) };
> +
> +    my $info = {};
> +    my $order = 0;
> +    for my $snap (@$snapshots) {
> +
> +	my $snapfile = $snap->{filename};
> +	my $snapname = parse_snapname($snapfile);
> +	$snapname = 'current' if !$snapname;
> +	my $snapvolname = $class->get_snap_volname($volname, $snapname);
> +
> +	$info->{$snapname}->{order} = $order;
> +	$info->{$snapname}->{file}= $snapfile;
> +	$info->{$snapname}->{volname} = $snapvolname;
> +	$info->{$snapname}->{volid} = "$storeid:$snapvolname";
> +	$info->{$snapname}->{ext} = 1;
> +
> +	my $parentfile = $snap->{'backing-filename'};
> +	if ($parentfile) {
> +	    my $parentname = parse_snapname($parentfile);
> +	    $info->{$snapname}->{parent} = $parentname;
> +	    $info->{$parentname}->{child} = $snapname;
> +	}
> +	$order++;
> +    }
> +    return $info;
>  }
>  
>  sub activate_storage {
> @@ -1867,4 +1992,22 @@ sub config_aware_base_mkdir {
>      }
>  }
>  
> +sub get_snap_volname {
> +    my ($class, $volname, $snapname) = @_;
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
> +    $name = !$snapname || $snapname eq 'current' ? $volname : "$vmid/snap-$snapname-$name";
> +    return $name;
> +}
> +
> +sub parse_snapname {
> +    my ($name) = @_;
> +
> +    my $basename = basename($name);
> +    if ($basename =~ m/^snap-(.*)-vm(.*)$/) {
> +	return $1;
> +    }
> +    return undef;
> +}
> +
>  1;
> -- 
> 2.39.5


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


  reply	other threads:[~2025-01-09 12:36 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
2024-12-16  9:12 ` [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
2025-01-08 13:27   ` Fabian Grünbichler
2025-01-10  7:55     ` DERUMIER, Alexandre via pve-devel
     [not found]     ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
2025-01-10  9:15       ` Fiona Ebner
2025-01-10  9:32         ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
2025-01-08 14:17   ` Fabian Grünbichler
2025-01-10 13:50     ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-01-09 12:36   ` Fabian Grünbichler [this message]
2025-01-10  9:10     ` DERUMIER, Alexandre via pve-devel
     [not found]     ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
2025-01-10 11:02       ` Fabian Grünbichler
2025-01-10 11:51         ` DERUMIER, Alexandre via pve-devel
     [not found]         ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
2025-01-10 12:20           ` Fabian Grünbichler
2025-01-10 13:14             ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests Alexandre Derumier via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
2025-01-09 13:55   ` Fabian Grünbichler
2025-01-10 10:16     ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
2025-01-08 14:26   ` Fabian Grünbichler
2025-01-10 14:08     ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
2025-01-08 14:31   ` Fabian Grünbichler
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
2025-01-08 14:34   ` Fabian Grünbichler
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
2025-01-08 15:19   ` Fabian Grünbichler
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
2025-01-09  9:51   ` Fabian Grünbichler
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
2025-01-09 11:57   ` Fabian Grünbichler
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-01-09 11:57   ` Fabian Grünbichler
2025-01-09 13:19     ` Fabio Fantoni via pve-devel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=249669964.942.1736426198150@webmail.proxmox.com \
    --to=f.gruenbichler@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal