From: "DERUMIER, Alexandre via pve-devel" <pve-devel@lists.proxmox.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>,
"f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot
Date: Fri, 10 Jan 2025 10:16:44 +0000 [thread overview]
Message-ID: <mailman.199.1736504237.441.pve-devel@lists.proxmox.com> (raw)
In-Reply-To: <1155144118.1000.1736430932629@webmail.proxmox.com>
>>one downside with this part in particular - we have to always
>>allocate full-size LVs (+qcow2 overhead), even if most of them will
>>end up storing just a single snapshot delta which might be a tiny
>>part of that full-size.. hopefully if discard is working across the
>>whole stack this doesn't actually explode space usage on the storage
>>side, but it makes everything a bit hard to track.. OTOH, while we
>>could in theory extend/reduce the LVs and qcow2 images on them when
>>modifying the backing chain, the additional complexity is probably
>>not worth it at the moment..
See this RFC with dynamic extend (no shrink / no discard yet):
https://lore.proxmox.com/pve-devel/mailman.475.1725007456.302.pve-devel@lists.proxmox.com/t/
(I think the tricky part (as Dominic noted in his quick review) is to
handle the resize cluster lock correctly, and to handle timeout/retry with
a queue through a dedicated daemon.)
But technically, this is how oVirt manages it (and it works in production;
I have had customers using it for several years).
>
> sub plugindata {
> return {
> content => [ {images => 1, rootdir => 1}, { images => 1 }],
> + format => [ { raw => 1, qcow2 => 1 } , 'raw' ],
>>I wonder if we want to guard the snapshotting-related parts below
>>with an additional "snapext" option here as well?
I really don't know; it's not possible to do snapshots with .raw anyway.
On the GUI side, it could allow enabling/displaying the format field, for
example when snapext is defined on the storage.
>>or even the usage >>of qcow2 altogether?
I think we should keep the possibility to choose .raw vs .qcow2 on the
same storage, because a user may really need maximum performance for a
specific VM without needing snapshots.
>
> +
> + #add extra space for qcow2 metadatas
> + #without sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size
> + #with sub-allocated clusters : For 1TB storage : l2_size = disk_size × 8 / cluster_size / 16
> + #4MB overhead for 1TB with extented l2 clustersize=128k
> +
> + my $qcow2_overhead = ceil($size/1024/1024/1024) * 4096;
>>there's "qemu-img measure", which seems like it would do exactly what
>>we want ;)
"Calculate the file size required for a new image. This information can
be used to size logical volumes or SAN LUNs appropriately for the image
that will be placed in them."
Indeed, lol. I knew the command, but I thought it was for measuring the
content of an existing file. I'll run tests to see if I get the same
results (and whether sub-allocated clusters are handled correctly).
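For reference, the arithmetic in the comment above checks out for the
extended-L2 case; something like this (the exact `qemu-img measure` option
syntax below is from memory, please double-check):

```shell
# L2 metadata for a fully-allocated 1 TiB image with 128k clusters and
# sub-allocated (extended) L2 entries: l2_size = disk_size * 8 / cluster_size / 16
disk_size=$((1024 * 1024 * 1024 * 1024))  # 1 TiB in bytes
cluster_size=$((128 * 1024))              # 128 KiB
l2_bytes=$((disk_size * 8 / cluster_size / 16))
echo "$l2_bytes"   # 4194304 bytes = 4 MiB, matching the "4MB overhead for 1TB" comment

# qemu-img measure should report the full requirement (L2 + refcounts + header), e.g.:
#   qemu-img measure --size 1T -O qcow2 -o cluster_size=128k,extended_l2=on
```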
> +
> + my $lvmsize = $size;
> + $lvmsize += $qcow2_overhead if $fmt eq 'qcow2';
> +
> die "not enough free space ($free < $size)\n" if $free < $size;
>
> - $name = $class->find_free_diskname($storeid, $scfg, $vmid)
> + $name = $class->find_free_diskname($storeid, $scfg, $vmid, $fmt)
> if !$name;
>
> - lvcreate($vg, $name, $size, ["pve-vm-$vmid"]);
> -
> + my $tags = ["pve-vm-$vmid"];
> + push @$tags, "\@pve-$name" if $fmt eq 'qcow2';
>>that's a creative way to avoid the need to discover and activate
>>snapshots one by one below, but it might warrant a comment ;)
Ah sorry (but yes, the idea was to activate/deactivate the whole chain in
one call).
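If it helps review, my understanding of the tag trick in shell terms
(VG and volume names invented for illustration):

```shell
vg=vg0
top="pve-vm-9000-disk-0.qcow2"   # hypothetical tag, named after the top volume
# every LV in the backing chain is created with --addtag "$top", so a single
# lvchange call (de)activates the whole chain; '@' selects LVs by tag:
activate="lvchange -ay @$top"
deactivate="lvchange -an @$top"
echo "$activate"
```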
> >>
> +
> + #rename current lvm volume to snap volume
> + my $vg = $scfg->{vgname};
> + print"rename $volname to $snap_volname\n";
> + eval { lvrename($vg, $volname, $snap_volname) } ;
>> missing error handling..
> +
> +
> + #allocate a new lvm volume
> + $class->alloc_new_image($storeid, $scfg, $vmid, 'qcow2',
> $volname, $size/1024);
>>missing error handling
Ah, sorry, it should be included in the following eval.
> + eval {
> + $class->format_qcow2($storeid, $scfg, $volname, undef,
> $snap_path);
> + };
> +
> + if ($@) {
> + eval { $class->free_image($storeid, $scfg, $volname, 0) };
> + warn $@ if $@;
> + }
> +}
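As far as I can follow the code, the snapshot sequence boils down to this
(names invented; the LVM calls need root, so they are only sketched in
comments):

```shell
vg=vg0
vol="vm-9000-disk-0.qcow2"   # hypothetical current volume name
snap="snap-mysnap-$vol"      # hypothetical snapshot volume name
# 1) the current LV is renamed to the snapshot name:
#      lvrename "$vg" "$vol" "$snap"
# 2) a fresh LV is allocated under the original name (full size + qcow2 overhead)
# 3) the new LV is formatted as qcow2 backed by the (now read-only) snapshot:
fmt_cmd="qemu-img create -f qcow2 -F qcow2 -b /dev/$vg/$snap /dev/$vg/$vol"
echo "$fmt_cmd"
```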
> +
> +sub volume_rollback_is_possible {
> + my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
> +
> + my $snap_path = $class->path($scfg, $volname, $storeid, $snap);
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid,
> $volname);
> + my $parent_snap = $snapshots->{current}->{parent};
> +
> + return 1 if !-e $snap_path || $snapshots->{$parent_snap}->{file}
> eq $snap_path;
>>the first condition here seems wrong, see storage patch #1
yes
> + die "can't rollback, '$snap' is not most recent snapshot on
> '$volname'\n";
> +
> + return 1;
> }
>
> +
> sub volume_snapshot_rollback {
> my ($class, $scfg, $storeid, $volname, $snap) = @_;
>
> - die "lvm snapshot rollback is not implemented";
> + die "can't rollback snapshot this image format\n" if $volname !~
> m/\.(qcow2|qed)$/;
>>above we only have qcow2, which IMHO makes more sense..
We could remove .qed everywhere; it has been deprecated since 2017 and we
have never exposed it in the GUI.
> +
> + my $snapshots = $class->volume_snapshot_info($scfg, $storeid,
> $volname);
> + my $snap_path = $snapshots->{$snap}->{file};
> + my $snap_volname = $snapshots->{$snap}->{volname};
> + return if !-e $snap_path; #already deleted ?
>>should maybe be a die?
Same as the patch #1 comment: this was for snapdel retry with multiple
disks.
> + } else {
> + print"commit $snap_path\n";
> + $cmd = ['/usr/bin/qemu-img', 'commit', $snap_path];
>> leftover?
still no ;) see my patch#1 reply
> + #if we delete an intermediate snapshot, we need to link
> upper snapshot to base snapshot
> + die "missing parentsnap snapshot to rebase child
> $child_path\n" if !$parent_path;
> + print "link $child_snap to $parent_snap\n";
> + $cmd = ['/usr/bin/qemu-img', 'rebase', '-u', '-b',
> $parent_path, '-F', 'qcow2', '-f', 'qcow2', $child_path];
> + run_command($cmd);
>>same as for patch #1, I am not sure the -u here is correct..
This is correct, see my patch#1 reply
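For context, my understanding of what `-u` does here (paths invented):

```shell
parent=/dev/vg0/snap-a-vm-9000-disk-0.qcow2   # hypothetical remaining parent
child=/dev/vg0/snap-c-vm-9000-disk-0.qcow2    # hypothetical child to relink
# 'rebase -u' (unsafe mode) only rewrites the backing-file pointer in the
# child's qcow2 header and copies no data; it relies on the caller having
# already reconciled the data (here: the intermediate snapshot's delta was
# committed before the rebase). Without -u, qemu-img would read and merge
# data so the guest-visible content stays identical relative to the new
# backing file.
cmd="qemu-img rebase -u -b $parent -F qcow2 -f qcow2 $child"
echo "$cmd"
```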
>
> +# don't allow to clone as we can't activate the base on multiple
> host at the same time
> +# clone => {
> +# base => { qcow2 => 1, raw => 1},
> +# },
>>I think activating the base would actually be okay, we just must
>>never write to it? ;)
Ah, this is a good remark. I thought we couldn't activate an LV on
multiple nodes at the same time. I'll look into this; it adds the
possibility of linked clones. (I need to check the external snapshot code
with backing chains first.)