From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH v4 pve-storage 2/5] lvmplugin: add qcow2 snapshot
Date: Tue, 1 Apr 2025 15:50:33 +0200 (CEST)
Message-ID: <1503093360.3972.1743515433288@webmail.proxmox.com>
In-Reply-To: <mailman.953.1741688968.293.pve-devel@lists.proxmox.com>


> On 11.03.2025 11:28 CET, Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote:

same here - please add a description of why things are implemented the way they are.. otherwise we have to dig through multiple threads of review discussion in two years when we want to find out ;)

> Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> ---
>  src/PVE/Storage/LVMPlugin.pm | 228 ++++++++++++++++++++++++++++++++---
>  1 file changed, 210 insertions(+), 18 deletions(-)
> 
> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
> index 38f7fa1..19dbd7e 100644
> --- a/src/PVE/Storage/LVMPlugin.pm
> +++ b/src/PVE/Storage/LVMPlugin.pm
> @@ -4,6 +4,7 @@ use strict;
>  use warnings;
>  
>  use IO::File;
> +use POSIX qw/ceil/;
>  
>  use PVE::Tools qw(run_command trim);
>  use PVE::Storage::Plugin;
> @@ -218,6 +219,7 @@ sub type {
>  sub plugindata {
>      return {
>  	content => [ {images => 1, rootdir => 1}, { images => 1 }],
> +	format => [ { raw => 1, qcow2 => 1 } , 'raw' ],
>      };
>  }
>  
> @@ -293,7 +295,10 @@ sub parse_volname {
>      PVE::Storage::Plugin::parse_lvm_name($volname);
>  
>      if ($volname =~ m/^(vm-(\d+)-\S+)$/) {
> -	return ('images', $1, $2, undef, undef, undef, 'raw');
> +	my $name = $1;
> +	my $vmid = $2;
> +	my $format = $volname =~ m/\.qcow2$/ ? 'qcow2' : 'raw';

this is really tricky, and I am afraid there are still pitfalls/bugs here unless we add checks, wherever a $format is requested, that it matches the format encoded in the volume name..
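
something along these lines might make such a mismatch explicit. this is only a rough sketch: both helper names are made up and are not part of this patch or of pve-storage, and the suffix rule is simply the one from the hunk above.

```perl
use strict;
use warnings;

# hypothetical helper: derive the format from the name, same rule as in the patch
sub format_from_volname {
    my ($volname) = @_;
    return $volname =~ m/\.qcow2$/ ? 'qcow2' : 'raw';
}

# hypothetical helper: die if the caller asked for a different format
# than the one the volume name implies
sub assert_volname_format {
    my ($volname, $requested) = @_;
    my $actual = format_from_volname($volname);
    die "format '$requested' does not match volume name '$volname' (implies '$actual')\n"
        if $requested ne $actual;
    return $actual;
}

print assert_volname_format('vm-100-disk-0.qcow2', 'qcow2'), "\n";
```

callers like alloc_image could then run the check once a name has been passed in, instead of trusting the two values to agree.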

> +	return ('images', $name, $vmid, undef, undef, undef, $format);
>      }
>  
>      die "unable to parse lvm volume name '$volname'\n";
> @@ -302,11 +307,13 @@ sub parse_volname {
>  sub filesystem_path {
>      my ($class, $scfg, $volname, $snapname) = @_;
>  
> -    die "lvm snapshot is not implemented"if defined($snapname);
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
>  
> -    my ($vtype, $name, $vmid) = $class->parse_volname($volname);
> +    die "snapshot is working with qcow2 format only" if defined($snapname) && $format ne 'qcow2';
>  
>      my $vg = $scfg->{vgname};
> +    $name = $class->get_snap_volname($volname, $snapname) if $snapname;
>  
>      my $path = "/dev/$vg/$name";
>  
> @@ -334,7 +341,9 @@ sub find_free_diskname {
>  
>      my $disk_list = [ keys %{$lvs->{$vg}} ];
>  
> -    return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, undef, $scfg);
> +    $add_fmt_suffix = $fmt eq 'qcow2' ? 1 : undef;
> +
> +    return PVE::Storage::Plugin::get_next_vm_diskname($disk_list, $storeid, $vmid, $fmt, $scfg, $add_fmt_suffix);
>  }
>  
>  sub lvcreate {
> @@ -363,9 +372,9 @@ sub lvrename {
>  }
>  
>  sub alloc_image {
> -    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;
> +    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size, $backing) = @_;

same as for the dir-based one - passing $backing as an arbitrary path is not a good idea.. this should either be a snapshot $volname, or just the $snapname itself?

>  
> -    die "unsupported format '$fmt'" if $fmt ne 'raw';
> +    die "unsupported format '$fmt'" if $fmt !~ m/(raw|qcow2)/;
>  
>      die "illegal name '$name' - should be 'vm-$vmid-*'\n"
>  	if  $name && $name !~ m/^vm-$vmid-/;
> @@ -378,12 +387,36 @@ sub alloc_image {
>  
>      my $free = int($vgs->{$vg}->{free});
>  
> +
> +    #add extra space for qcow2 metadatas
> +    #without sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size
> +    #with sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size / 16

are these formulas valid for all disk sizes, or just for 1TB?

> +                                   #4MB overhead for 1TB with extented l2 clustersize=128k

so this means 1TB x 8 / 128K / 16 = 1GB / 256 = 4MB

if the formula is generic, that means 1 GB of storage == 4KB of overhead, or 1MB of storage == 4 bytes of overhead?
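
spelling the arithmetic out, assuming the formula from the comment is meant generically and all units are binary (1TB = 2^40 bytes, 128k = 2^17 bytes):

```latex
\[
  \text{l2\_size} = \frac{\text{disk\_size} \times 8}{\text{cluster\_size} \times 16}
\]
\[
  \text{1 TiB:}\quad \frac{2^{40} \times 2^{3}}{2^{17} \times 2^{4}} = 2^{22}\ \text{bytes} = 4\ \text{MiB}
\]
\[
  \text{1 GiB:}\quad \frac{2^{30} \times 2^{3}}{2^{17} \times 2^{4}} = 2^{12}\ \text{bytes} = 4\ \text{KiB}
\]
\[
  \text{1 MiB:}\quad \frac{2^{20} \times 2^{3}}{2^{17} \times 2^{4}} = 2^{2}\ \text{bytes} = 4\ \text{bytes}
\]
```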

> +
> +    #can't use qemu-img measure, because it's not possible to define options like clustersize && extended_l2
> +    #verification has been done with : qemu-img create -f qcow2 -o extended_l2=on,cluster_size=128k test.img 1G
> +
> +    my $qcow2_overhead = ceil($size/1024/1024/1024) * 4096;

above you say 4MB for 1TB, but here you go down to KB and then multiply by 4K? why not go down to MB and multiply by 4?

> +
> +    my $lvmsize = $size;
> +    $lvmsize += $qcow2_overhead if $fmt eq 'qcow2';
> +
>      die "not enough free space ($free < $size)\n" if $free < $size;
>  
> -    $name = $class->find_free_diskname($storeid, $scfg, $vmid)
> +    $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt)
>  	if !$name;
>  
> -    lvcreate($vg, $name, $size, ["pve-vm-$vmid"]);
> +    my $tags = ["pve-vm-$vmid"];
> +    #tags all snapshots volumes with the main volume tag for easier activation of the whole group
> +    push @$tags, "\@pve-$name" if $fmt eq 'qcow2';
> +    lvcreate($vg, $name, $lvmsize, $tags);
> +
> +    if ($fmt eq 'qcow2') {
> +	#format the lvm volume with qcow2 format
> +	$class->activate_volume($storeid, $scfg, $name, undef, {});

the last two parameters are not needed..

> +	my $path = $class->path($scfg, $name, $storeid);
> +	PVE::Storage::Plugin::qemu_img_create($scfg, $fmt, $size, $path, $backing);
> +    }
>  
>      return $name;
>  }
> @@ -538,6 +571,12 @@ sub activate_volume {
>  
>      my $lvm_activate_mode = 'ey';
>  
> +    #activate volume && all snapshots volumes by tag
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    $path = "\@pve-$name" if $format eq 'qcow2';
> +
>      my $cmd = ['/sbin/lvchange', "-a$lvm_activate_mode", $path];
>      run_command($cmd, errmsg => "can't activate LV '$path'");
>      $cmd = ['/sbin/lvchange', '--refresh', $path];
> @@ -550,6 +589,10 @@ sub deactivate_volume {
>      my $path = $class->path($scfg, $volname, $storeid, $snapname);
>      return if ! -b $path;
>  
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
> +    $path = "\@pve-$name" if $format eq 'qcow2';
> +
>      my $cmd = ['/sbin/lvchange', '-aln', $path];
>      run_command($cmd, errmsg => "can't deactivate LV '$path'");
>  }
> @@ -557,15 +600,27 @@ sub deactivate_volume {
>  sub volume_resize {
>      my ($class, $scfg, $storeid, $volname, $size, $running) = @_;
>  
> -    $size = ($size/1024/1024) . "M";
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +	$class->parse_volname($volname);
> +
> +    my $lvmsize = $size / 1024;

I don't really get this, see comments above for alloc_image

> +    my $qcow2_overhead = ceil($size/1024/1024/1024/1024) * 4096;
> +    $lvmsize += $qcow2_overhead if $format eq 'qcow2';

we definitely don't want to have this twice..
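
a single helper could cover both call sites. rough sketch only: the helper name is made up, and I am assuming the intent of both hunks is "4MiB of metadata per started TiB", with alloc_image passing $size in KiB and volume_resize passing bytes.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# hypothetical helper: qcow2 metadata overhead in KiB for a volume of
# $size_kib KiB, reserving 4 MiB (4096 KiB) per started TiB
sub qcow2_overhead_kib {
    my ($size_kib) = @_;
    my $kib_per_tib = 1024 * 1024 * 1024;
    return ceil($size_kib / $kib_per_tib) * 4096;
}

# alloc_image style call: size already in KiB (here: 1 TiB)
print qcow2_overhead_kib(1024 * 1024 * 1024), "\n";

# volume_resize style call: size in bytes, converted to KiB first (here: 10 GiB)
my $size_bytes = 10 * 1024 * 1024 * 1024;
print qcow2_overhead_kib($size_bytes / 1024), "\n";
```

keeping the unit in the helper name would also make it obvious which conversion each caller has to do.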

> +    $lvmsize = "${lvmsize}k";
>  
>      my $path = $class->path($scfg, $volname);
> -    my $cmd = ['/sbin/lvextend', '-L', $size, $path];
> +    my $cmd = ['/sbin/lvextend', '-L', $lvmsize, $path];
>  
>      $class->cluster_lock_storage($storeid, $scfg->{shared}, undef, sub {
>  	run_command($cmd, errmsg => "error resizing volume '$path'");
>      });
>  
> +    if(!$running && $format eq 'qcow2') {
> +	my $prealloc_opt = PVE::Storage::Plugin::preallocation_cmd_option($scfg, $format);
> +	my $cmd = ['/usr/bin/qemu-img', 'resize', "--$prealloc_opt", '-f', $format, $path , $size];
> +	run_command($cmd, timeout => 10);
> +    }
> +
>      return 1;
>  }
>  
> @@ -587,30 +642,159 @@ sub volume_size_info {
>  sub volume_snapshot {
>      my ($class, $scfg, $storeid, $volname, $snap) = @_;
>  
> -    die "lvm snapshot is not implemented";
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +        $class->parse_volname($volname);
> +
> +    die "can't snapshot this image format\n" if $format ne 'qcow2';
> +
> +    $class->activate_volume($storeid, $scfg, $volname, undef, {});

last two not needed

> +
> +    my $snap_volname = $class->get_snap_volname($volname, $snap);
> +    my $snap_path = $class->path($scfg, $volname, $storeid, $snap);

see above..

> +
> +    my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
> +
> +    #rename current lvm volume to snap volume
> +    my $vg = $scfg->{vgname};
> +    print"rename $volname to $snap_volname\n";
> +    eval { lvrename($vg, $volname, $snap_volname); };
> +    if ($@) {
> +	die "can't rename lvm volume from $volname to $snap_volname: $@ \n";
> +    }
> +
> +    eval { $class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024, $snap_path); };
> +    if ($@) {
> +        eval { $class->free_image($storeid, $scfg, $volname, 0) };

missing error handling, this needs to rename back? also, this might return a code-reference that needs to be executed..

> +        warn $@ if $@;
> +    }
>  }
>  
> +sub volume_rollback_is_possible {
> +    my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
> +
> +    my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> +    $class->activate_volume($storeid, $scfg, $volname, undef, {});
> +    my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> +    my $parent_snap = $snapshots->{current}->{parent};

wouldn't it be enough to check that this equals $snap?

> +
> +    return 1 if $snapshots->{$parent_snap}->{file} eq $snap_path;
> +    die "can't rollback, '$snap' is not most recent snapshot on '$volname'\n";
> +
> +    return 1;
> +}
> +
> +
>  sub volume_snapshot_rollback {
>      my ($class, $scfg, $storeid, $volname, $snap) = @_;
>  
> -    die "lvm snapshot rollback is not implemented";
> +    die "can't rollback snapshot for this image format\n" if $volname !~ m/\.(qcow2)$/;

this should go below parse_volname and use the format it returns..

> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
> +        $class->parse_volname($volname);
> +
> +    $class->activate_volume($storeid, $scfg, $volname, undef, {});

two unneeded parameters

> +    my $size = $class->volume_size_info($scfg, $storeid, $volname, 5);
> +    my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> +    #simply delete the current snapshot and recreate it
> +    eval { $class->free_image($storeid, $scfg, $volname, 0) };

might return a code reference that needs to be executed..
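
roughly what I mean, as a self-contained sketch: MockPlugin only stands in for the real plugin class, free_image_sync is a made-up name, and whether the returned worker should run inline or be handed to a worker task is a separate question.

```perl
use strict;
use warnings;

# stand-in for a storage plugin whose free_image defers the actual cleanup
package MockPlugin;

our @cleaned;

sub free_image {
    my ($class, $storeid, $scfg, $volname, $isBase) = @_;
    # return a code reference instead of cleaning up right away
    return sub { push @cleaned, $volname; return 1; };
}

package main;

# hypothetical helper: free a volume and run a returned cleanup worker, if any
sub free_image_sync {
    my ($class, $storeid, $scfg, $volname) = @_;
    my $cleanup = $class->free_image($storeid, $scfg, $volname, 0);
    $cleanup->() if ref($cleanup) eq 'CODE';
    return;
}

free_image_sync('MockPlugin', 'lvm-store', {}, 'vm-100-disk-0.qcow2');
print "cleaned: @MockPlugin::cleaned\n";
```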

> +    if ($@) {
> +	die "can't delete old volume $volname: $@\n";
> +    }
> +
> +    eval { $class->alloc_image($storeid, $scfg, $vmid, 'qcow2', $volname, $size/1024, $snap_path) };
> +    if ($@) {
> +	die "can't allocate new volume $volname: $@\n";
> +    }
> +
> +    return undef;
>  }
>  
>  sub volume_snapshot_delete {
> -    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +    my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
> +
> +   die "can't delete snapshot for this image format\n" if $volname !~ m/\.(qcow2)$/;

this should parse the volname and use the returned format!

> +
> +   return 1 if $running;
> +
> +   my $cmd = "";
> +   my $path = $class->filesystem_path($scfg, $volname);
> +
> +   my $snapshots = $class->volume_snapshot_info($scfg, $storeid, $volname);
> +   my $snap_path = $snapshots->{$snap}->{file};
> +   my $snap_volname = $snapshots->{$snap}->{volname};
> +   die "volume $snap_path is missing" if !-e $snap_path;
>  
> -    die "lvm snapshot delete is not implemented";
> +   my $parent_snap = $snapshots->{$snap}->{parent};
> +   my $child_snap = $snapshots->{$snap}->{child};
> +
> +   my $parent_path = $snapshots->{$parent_snap}->{file} if $parent_snap;
> +   my $child_path = $snapshots->{$child_snap}->{file} if $child_snap;
> +   my $child_volname = $snapshots->{$child_snap}->{volname} if $child_snap;

same as in the Plugin.pm patch: declaring a variable with `my` combined with a statement modifier (`my $x = ... if ...`) is not allowed code style!
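
one way to write these without the conditional declaration, as a sketch. the $snapshots hash below is made-up sample data, shaped the way this patch appears to use the result of volume_snapshot_info.

```perl
use strict;
use warnings;

# made-up sample data: one snapshot 'snap1' below the current volume
my $snapshots = {
    snap1 => { file => '/dev/vg/snap-snap1-vm-100-disk-0.qcow2', child => 'current' },
    current => { file => '/dev/vg/vm-100-disk-0.qcow2', parent => 'snap1' },
};

my $snap = 'snap1';
my $parent_snap = $snapshots->{$snap}->{parent};
my $child_snap = $snapshots->{$snap}->{child};

# declare unconditionally and pick the value with a ternary,
# so the variable is always initialised (undef when there is no parent/child)
my $parent_path = $parent_snap ? $snapshots->{$parent_snap}->{file} : undef;
my $child_path = $child_snap ? $snapshots->{$child_snap}->{file} : undef;

print "parent: ", $parent_path // '(none)', "\n";
print "child: ", $child_path // '(none)', "\n";
```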

> +
> +   #if first snapshot,as it should be bigger,  we merge child, and rename the snapshot to child
> +   if(!$parent_snap) {
> +	print"commit $child_path\n";
> +	$cmd = ['/usr/bin/qemu-img', 'commit', $child_path];

could use `-d`, since we don't use $child_path afterwards anyway

> +	eval {	run_command($cmd) };
> +	if ($@) {
> +	    die "error commiting $child_path to $parent_path: $@\n";
> +	}
> +	print"delete $child_volname\n";
> +	eval { $class->free_image($storeid, $scfg, $child_volname, 0) };

might return a code reference that needs to be executed..

> +	if ($@) {
> +	    die "error delete old snapshot volume $child_volname: $@\n";
> +	}
> +	print"rename $snap_volname to $child_volname\n";
> +	my $vg = $scfg->{vgname};
> +	eval { lvrename($vg, $snap_volname, $child_volname) };
> +	if ($@) {
> +	    die "error renaming snapshot: $@\n";
> +	}
> +    } else {
> +	#we rebase the child image on the parent as new backing image
> +	die "missing parentsnap snapshot to rebase child $child_path\n" if !$parent_path;

how would this happen?

> +	print "link $child_snap to $parent_snap\n";
> +	$cmd = ['/usr/bin/qemu-img', 'rebase', '-b', $parent_path, '-F', 'qcow2', '-f', 'qcow2', $child_path];
> +	eval { run_command($cmd) };
> +	if ($@) {
> +	    die "error rebase $child_path with $parent_path; $@\n";
> +	}
> +	#delete the snapshot
> +	eval { $class->free_image($storeid, $scfg, $snap_volname, 0); };

might return a code reference that needs to be executed..

> +	if ($@) {
> +	    die "error delete old snapshot volume $snap_volname: $@\n";
> +	}
> +    }
>  }
>  
>  sub volume_has_feature {
>      my ($class, $scfg, $feature, $storeid, $volname, $snapname, $running) = @_;
>  
>      my $features = {
> -	copy => { base => 1, current => 1},
> -	rename => {current => 1},
> +        copy => {
> +            base => { qcow2 => 1, raw => 1},
> +            current => { qcow2 => 1, raw => 1},
> +            snap => { qcow2 => 1 },

did you actually test this? AFAICT this would still fall back to internal qcow2 snapshots?

> +        },
> +        'rename' => {
> +            current => { qcow2 => 1, raw => 1},

how does this interact with snapshots?

> +        },
> +        snapshot => {
> +            current => { qcow2 => 1 },
> +            snap => { qcow2 => 1 },
> +        },
> +        template => {
> +            current => { qcow2 => 1, raw => 1},

see below..

> +        },
> +	clone => {
> +	    base => { qcow2 => 1, raw => 1 },

how can we do linked clones of raw volumes? how can we do linked clones of qcow2 volumes if we don't allow creating base volumes?

> +	},
>      };
>  
> -    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase) =
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) =
>  	$class->parse_volname($volname);
>  
>      my $key = undef;
> @@ -619,7 +803,7 @@ sub volume_has_feature {
>      }else{
>  	$key =  $isBase ? 'base' : 'current';
>      }
> -    return 1 if $features->{$feature}->{$key};
> +    return 1 if defined($features->{$feature}->{$key}->{$format});

why the defined?

>  
>      return undef;
>  }
> @@ -740,4 +924,12 @@ sub rename_volume {
>      return "${storeid}:${target_volname}";
>  }
>  
> +sub get_snap_volname {
> +    my ($class, $volname, $snapname) = @_;
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $format) = $class->parse_volname($volname);
> +    $name = !$snapname || $snapname eq 'current' ? $volname : "snap-$snapname-$name";

see above..

> +    return $name;
> +}
> +
>  1;
> -- 
> 2.39.5


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
