From: "DERUMIER, Alexandre via pve-devel" <pve-devel@lists.proxmox.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>,
"f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 13 Jan 2025 10:08:27 +0000 [thread overview]
Message-ID: <mailman.242.1736762947.441.pve-devel@lists.proxmox.com> (raw)
In-Reply-To: <570013533.830.1736423850641@webmail.proxmox.com>
[-- Attachment #1: Type: message/rfc822, Size: 24364 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 13 Jan 2025 10:08:27 +0000
Message-ID: <0ae72889042e006d9202e837aac7ecf2b413e1b4.camel@groupe-cyllene.com>
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am
> 16.12.2024 10:12 CET geschrieben:
>>it would be great if there'd be a summary of the design choices and a
>>high level summary of what happens to the files and block-node-graph
>>here. it's a bit hard to judge from the code below whether it would
>>be possible to eliminate the dynamically named block nodes, for
>>example ;)
yes, sorry, I'll add more infos with qemu limitations and why I'm doing
it like this.
>>a few more comments documenting the behaviour and ideally also some
>>tests (mocking the QMP interactions?) would be nice
yes, I'll add tests, need to find a good way to mock it.
> +
> + #preallocate add a new current file
> + my $new_current_fmt_nodename = get_blockdev_nextid("fmt-
> $deviceid", $nodes);
> + my $new_current_file_nodename = get_blockdev_nextid("file-
> $deviceid", $nodes);
>>okay, so here we have a dynamic node name because the desired target
>>name is still occupied. could we rename the old block node first?
we can't rename a node, that's the problem.
> + PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
>>(continued from above) and this invocation here are the same??
The invocation is the same, but they it's not doing the same if it's an
external snasphot.
>> wouldn't this already create the snapshot on the storage layer?
yes, it's create the (lvm volume) + qcow2 file with preallocation
>>and didn't we just hardlink + reopen + unlink to transform the
>>previous current volume into the snap volume?
yes.
here, we are creating the new current volume, adding it to the graph
with blockdev-add, then finally switch to it with blockdev-snapshot
The ugly trick in pve-storage is in plugin.pm
#rename current volume to snap volume
rename($path, $snappath) if -e $path && !-e $snappath;
or in lvm plugin.
eval { lvrename($vg, $volname, $snap_volname) } ;
(and you have already made comment about them ;)
because I'm re-using volume_snapshot (I didn't have to add a new method
in pve-storage) to allocate the snasphot file, but indeed, we have
already to the rename online.
>>should this maybe have been vdisk_alloc and it just works by
accident?
It's not works fine with vdisk_alloc, because the volume need to be
created without the size specified but with backing file param instead.
(if I remember, qemu-img is looking at the backing file size+metadas
and set the correct total size for the new volume)
Maybe a better way could be to reuse vdisk_alloc, and add backing file
as param ?
> + my $new_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $new_current_file_nodename);
> + my $new_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $new_current_fmt_nodename, $new_file_blockdev);
> +
> + $new_fmt_blockdev->{backing} = undef;
>>generate_format_blockdev doesn't set backing?
yes, it's adding backing
>>maybe this should be >>converted into an assertion?
but they are a limitation of the qmp blockdev-ad ++blockdev-snapshot
where the backing attribute need undef in the blockdev-add or the
blockdev-snapshot will fail because it's trying itself to set the
backing file when doing the switch.
From my test, it was related to this
https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01404.html
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add',
> %$new_fmt_blockdev);
> + mon_cmd($vmid, 'blockdev-snapshot', node => $format_nodename,
> overlay => $new_current_fmt_nodename);
> +}
> +
> +sub blockdev_snap_rename {
> + my ($storecfg, $vmid, $deviceid, $drive, $src_path,
> $target_path) = @_;
>>I think this whole thing needs more error handling and thought about
>>how to recover from various points failing..
yes, that's the problem with renaming, it's not atomic.
Also, if we need to recover (rollback), how to manage multiple disk ?
>>there's also quite some overlap with blockdev_current_rename, I
>>wonder whether it would be possible to simplify the code further by
>merging the two? but see below, I think we can even get away with
>>dropping this altogether if we switch from block-commit to block-
>>stream..
Yes, I have seperated them because I was not sure of the different
workflow, and It was more simplier to fix one method without breaking
the other.
I'll look to implement block-stream. (and keep commit to initial image
for the last snapshot delete)
> + #untaint
> + if ($src_path =~ m/^(\S+)$/) {
> + $src_path = $1;
> + }
> + if ($target_path =~ m/^(\S+)$/) {
> + $target_path = $1;
> + }
>>shouldn't that have happened in the storage plugin?
>>
> +
> + #create a hardlink
> + link($src_path, $target_path);
>>should this maybe be done by the storage plugin?
This was to avoid to introduce a sub method, but yes, it could be
better indeed.
PVE::Storage::link ?
>
> + #delete old $path link
> + unlink($src_path);
and this
PVE::Storage::unlink ?
(can't use free_image here, because we really want to remove the link
and not the volume )
> +
> + #rename underlay
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvrename $src_path to $target_path\n";
> + run_command(
> + ['/sbin/lvrename', $src_path, $target_path],
> + errmsg => "lvrename $src_path to $target_path error",
> + );
> + }
>>and this as well?
I didn't reuse lvrename in lvmplugin, because it's using vgname/lvname
and not the path, but I can look to extend it)
> +}
> +
> +sub blockdev_current_rename {
> + my ($storecfg, $vmid, $deviceid, $drive, $path, $target_path,
> $skip_underlay) = @_;
> + ## rename current running image
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> + my $target_file_nodename = get_blockdev_nextid("file-$deviceid",
> $nodes);
>>here we could already incorporate the snapshot name, since we know
it?
31char limits.
> +
> + my $file_blockdev = generate_file_blockdev($storecfg, $drive,
> $target_file_nodename);
> + $file_blockdev->{filename} = $target_path;
> +
> + my $format_node = find_blockdev_node($nodes, $path, 'fmt');
>>then we'd know this is always the "current" node, however we
>>deterministically name it?
until you are doing a block-mirror, the current fmt node will be
replaced with another current2 fmt node.
>>and this should be done by the storage layer I think? how does this
>>interact with LVM?
from my test, an hardlink is working
>> would we maybe want to mknod instead of hardlinking the
device node?
because /dev/<vgname>/<lv> is not a device node, it's already a link
to the device node
for example:
lrwxrwxrwx 1 root root 7 Dec 10 00:11 vm-10001-disk-0 -> ../dm-9
#ln vm-10001-disk-0 testrename
lrwxrwxrwx 1 root root 7 Dec 10 00:11 vm-10001-disk-0 -> ../dm-9
lrwxrwxrwx 1 root root 7 Dec 10 00:11 testrename -> ../dm-9
>>did you try whether a plain rename would also work (not sure - qemu
>>already has an open FD to the file/blockdev, but I am not sure how
>>LVM handles this ;))?
from my test, the lvrename, it simply the rename the lvm volume
internaly, then rename link.. and as we have already create the link,
it's simply rename it without problem.
#lvrename vm-10001-disk-0 vm-10001-disk-snap1
lrwxrwxrwx 1 root root 7 Dec 10 00:11 vm-10001-disk-snap1 -> ../dm-
9
lrwxrwxrwx 1 root root 7 Dec 10 00:11 testrename -> ../dm-9
#lvrename vm-10001-disk-snap1 testrename
lrwxrwxrwx 1 root root 7 Dec 10 00:11 testrename -> ../dm-9
>
> +
> +sub blockdev_commit {
>>see comments below for qemu_volume_snapshot_delete, I think this..
>>and this can be replaced altogether with blockdev_stream..
>>wouldn't it make more sense to use block-stream to merge the contents
>>of the to-be-deleted snapshot into the current overlay? that way we
>>wouldn't need to rename anything, AFAICT..
>>same here, instead of commiting from the child into the to-be-deleted
>>snapshot, and then renaming, why not just block-stream from the to-
>>be-deleted snapshot into the child, and then discard the snapshot
>>that is no longer needed?
>>commit is the wrong direction though?
>>
>>if we have A -> B -> C, and B is deleted, the delta previously
co>>ntained in B should be merged into C, not into A?
>>
>>so IMHO a simple block-stream + removal of the to-be-deleted snapshot
>>should be the right choice here as well?
>>
>>that would effectively make all the paths identical AFAICT (stream
>>from to-be-deleted snapshot to child, followed by deletion of the no
>>longer used volume corresponding to the deleted/streamed snapshot)
>>and no longer require any renaming..
Yes, got it now. I'll implement block-stream.
But keep commit for last snapshot delete.
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
next prev parent reply other threads:[~2025-01-13 10:09 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
2024-12-16 9:12 ` [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
2025-01-08 13:27 ` Fabian Grünbichler
2025-01-10 7:55 ` DERUMIER, Alexandre via pve-devel
[not found] ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
2025-01-10 9:15 ` Fiona Ebner
2025-01-10 9:32 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1e45e756801843dd46eb6ce2958d30885ad73bc2.camel@groupe-cyllene.com>
2025-01-13 14:28 ` Fiona Ebner
2025-01-14 10:10 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
2025-01-08 14:17 ` Fabian Grünbichler
2025-01-10 13:50 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-01-09 12:36 ` Fabian Grünbichler
2025-01-10 9:10 ` DERUMIER, Alexandre via pve-devel
[not found] ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
2025-01-10 11:02 ` Fabian Grünbichler
2025-01-10 11:51 ` DERUMIER, Alexandre via pve-devel
[not found] ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
2025-01-10 12:20 ` Fabian Grünbichler
2025-01-10 13:14 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
2025-01-09 13:55 ` Fabian Grünbichler
2025-01-10 10:16 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
2025-01-08 14:26 ` Fabian Grünbichler
2025-01-10 14:08 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
2025-01-08 14:31 ` Fabian Grünbichler
2025-01-13 7:56 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
2025-01-08 14:34 ` Fabian Grünbichler
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
2025-01-08 15:19 ` Fabian Grünbichler
2025-01-13 8:27 ` DERUMIER, Alexandre via pve-devel
[not found] ` <0d0d4c4d73110cf0e692cae0ee65bf7f9a6ce93a.camel@groupe-cyllene.com>
2025-01-13 9:52 ` Fabian Grünbichler
2025-01-13 9:55 ` Fabian Grünbichler
2025-01-13 10:47 ` DERUMIER, Alexandre via pve-devel
2025-01-13 13:42 ` Fiona Ebner
2025-01-14 10:03 ` DERUMIER, Alexandre via pve-devel
[not found] ` <fa38efbd95b57ba57a5628d6acfcda9d5875fa82.camel@groupe-cyllene.com>
2025-01-15 9:39 ` Fiona Ebner
2025-01-15 9:51 ` Fabian Grünbichler
2025-01-15 10:06 ` Fiona Ebner
2025-01-15 10:15 ` Fabian Grünbichler
2025-01-15 10:46 ` Fiona Ebner
2025-01-15 10:50 ` Fabian Grünbichler
2025-01-15 11:01 ` Fiona Ebner
2025-01-15 13:01 ` DERUMIER, Alexandre via pve-devel
2025-01-16 14:56 ` DERUMIER, Alexandre via pve-devel
2025-01-15 10:15 ` DERUMIER, Alexandre via pve-devel
[not found] ` <c1559499319052d6cf10900efd5376c12389a60f.camel@groupe-cyllene.com>
2025-01-13 13:31 ` Fabian Grünbichler
2025-01-20 13:37 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
2025-01-09 9:51 ` Fabian Grünbichler
2025-01-13 8:38 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
2025-01-09 11:57 ` Fabian Grünbichler
2025-01-13 8:53 ` DERUMIER, Alexandre via pve-devel
2024-12-16 9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-01-09 11:57 ` Fabian Grünbichler
2025-01-09 13:19 ` Fabio Fantoni via pve-devel
2025-01-20 13:44 ` DERUMIER, Alexandre via pve-devel
[not found] ` <3307ec388a763510ec78f97ed9f0de00c87d54b5.camel@groupe-cyllene.com>
2025-01-20 14:29 ` Fabio Fantoni via pve-devel
[not found] ` <6bdfe757-ae04-42e1-b197-c9ddb873e353@m2r.biz>
2025-01-20 14:41 ` DERUMIER, Alexandre via pve-devel
2025-01-13 10:08 ` DERUMIER, Alexandre via pve-devel [this message]
[not found] ` <0ae72889042e006d9202e837aac7ecf2b413e1b4.camel@groupe-cyllene.com>
2025-01-13 13:27 ` Fabian Grünbichler
2025-01-13 18:07 ` DERUMIER, Alexandre via pve-devel
2025-01-13 18:58 ` DERUMIER, Alexandre via pve-devel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mailman.242.1736762947.441.pve-devel@lists.proxmox.com \
--to=pve-devel@lists.proxmox.com \
--cc=alexandre.derumier@groupe-cyllene.com \
--cc=f.gruenbichler@proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox