public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "DERUMIER, Alexandre via pve-devel" <pve-devel@lists.proxmox.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>,
	"f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 13 Jan 2025 10:08:27 +0000	[thread overview]
Message-ID: <mailman.242.1736762947.441.pve-devel@lists.proxmox.com> (raw)
In-Reply-To: <570013533.830.1736423850641@webmail.proxmox.com>

[-- Attachment #1: Type: message/rfc822, Size: 24364 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Date: Mon, 13 Jan 2025 10:08:27 +0000
Message-ID: <0ae72889042e006d9202e837aac7ecf2b413e1b4.camel@groupe-cyllene.com>


> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am
> 16.12.2024 10:12 CET geschrieben:

>>it would be great if there'd be a summary of the design choices and a
>>high level summary of what happens to the files and block-node-graph
>>here. it's a bit hard to judge from the code below whether it would
>>be possible to eliminate the dynamically named block nodes, for
>>example ;)

yes, sorry, I'll add more infos with qemu limitations and why I'm doing
it like this.

>>a few more comments documenting the behaviour and ideally also some
>>tests (mocking the QMP interactions?) would be nice
yes, I'll add tests, need to find a good way to mock it.


> +
> +    #preallocate add a new current file
> +    my $new_current_fmt_nodename = get_blockdev_nextid("fmt-
> $deviceid", $nodes);
> +    my $new_current_file_nodename = get_blockdev_nextid("file-
> $deviceid", $nodes);

>>okay, so here we have a dynamic node name because the desired target
>>name is still occupied. could we rename the old block node first?

we can't rename a node, that's the problem.


> +    PVE::Storage::volume_snapshot($storecfg, $volid, $snap);

>>(continued from above) and this invocation here are the same??
The invocation is the same, but they it's not doing the same if it's an
external snasphot.

>> wouldn't this already create the snapshot on the storage layer? 
yes, it's create the (lvm volume) +  qcow2 file with preallocation

>>and didn't we just hardlink + reopen + unlink to transform the
>>previous current volume into the snap volume?
yes.
here, we are creating the new current volume,  adding it to the graph
with blockdev-add, then finally switch to it with blockdev-snapshot

The ugly trick in pve-storage is in plugin.pm
#rename current volume to snap volume
rename($path, $snappath) if -e $path && !-e $snappath;
or in lvm plugin.
eval { lvrename($vg, $volname, $snap_volname) } ;


(and you have already made comment about them ;)


because I'm re-using volume_snapshot (I didn't have to add a new method
in pve-storage) to allocate the snasphot file, but indeed, we have
already to the rename online.


>>should this maybe have been vdisk_alloc and it just works by
accident?
It's not works fine with vdisk_alloc, because the volume need to be
created without the size specified but with backing file param instead.
(if I remember, qemu-img is looking at the backing file size+metadas
and set the correct total size for the new volume)


Maybe a better way could be to reuse vdisk_alloc, and add backing file
as param ?



> +    my $new_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $new_current_file_nodename);
> +    my $new_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $new_current_fmt_nodename, $new_file_blockdev);
> +
> +    $new_fmt_blockdev->{backing} = undef;

>>generate_format_blockdev doesn't set backing? 
yes, it's adding backing

>>maybe this should be >>converted into an assertion?

but they are a limitation of the qmp blockdev-ad ++blockdev-snapshot
where the backing attribute need undef in the blockdev-add or the
blockdev-snapshot will fail because it's trying itself to set the
backing file when doing the switch.

From my test, it was related to this
https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01404.html



> +    PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add',
> %$new_fmt_blockdev);
> +    mon_cmd($vmid, 'blockdev-snapshot', node => $format_nodename,
> overlay => $new_current_fmt_nodename);
> +}
> +
> +sub blockdev_snap_rename {
> +    my ($storecfg, $vmid, $deviceid, $drive, $src_path,
> $target_path) = @_;

>>I think this whole thing needs more error handling and thought about
>>how to recover from various points failing.. 
yes, that's the problem with renaming, it's not atomic.

Also, if we need to recover (rollback), how to manage multiple disk ?



>>there's also quite some overlap with blockdev_current_rename, I
>>wonder whether it would be possible to simplify the code further by
>merging the two? but see below, I think we can even get away with
>>dropping this altogether if we switch from block-commit to block-
>>stream..
Yes, I have seperated them because I was not sure of the different
workflow, and It was more simplier to fix one method without breaking
the other.

I'll look to implement block-stream.  (and keep commit to initial image
for the last snapshot delete)


> +    #untaint
> +    if ($src_path =~ m/^(\S+)$/) {
> + $src_path = $1;
> +    }
> +    if ($target_path =~ m/^(\S+)$/) {
> + $target_path = $1;
> +    }

>>shouldn't that have happened in the storage plugin?
>>
> +
> +    #create a hardlink
> +    link($src_path, $target_path);

>>should this maybe be done by the storage plugin?

This was to avoid to introduce a sub method, but yes, it could be
better indeed.

PVE::Storage::link  ?

> 
> +    #delete old $path link
> +    unlink($src_path);

and this

PVE::Storage::unlink  ?
(can't use free_image here, because we really want to remove the link
and not the volume )


> +
> +    #rename underlay
> +    my $storage_name = PVE::Storage::parse_volume_id($volid);
> +    my $scfg = $storecfg->{ids}->{$storage_name};
> +    if ($scfg->{type} eq 'lvm') {
> + print"lvrename $src_path to $target_path\n";
> + run_command(
> +     ['/sbin/lvrename', $src_path, $target_path],
> +     errmsg => "lvrename $src_path to $target_path error",
> + );
> +    }

>>and this as well?
I didn't reuse lvrename in lvmplugin, because it's using vgname/lvname
and not the path, but I can look to extend it)


> +}
> +
> +sub blockdev_current_rename {
> +    my ($storecfg, $vmid, $deviceid, $drive, $path, $target_path,
> $skip_underlay) = @_;
> +    ## rename current running image
> +
> +    my $nodes = get_blockdev_nodes($vmid);
> +    my $volid = $drive->{file};
> +    my $target_file_nodename = get_blockdev_nextid("file-$deviceid",
> $nodes);

>>here we could already incorporate the snapshot name, since we know
it?

31char limits.


> +
> +    my $file_blockdev = generate_file_blockdev($storecfg, $drive,
> $target_file_nodename);
> +    $file_blockdev->{filename} = $target_path;
> +
> +    my $format_node = find_blockdev_node($nodes, $path, 'fmt');

>>then we'd know this is always the "current" node, however we
>>deterministically name it?

until you are doing a block-mirror, the current fmt node will be
replaced with another current2 fmt node.



>>and this should be done by the storage layer I think? how does this
>>interact with LVM?
from my test, an hardlink is working

>> would we maybe want to mknod instead of hardlinking the
device node? 

because /dev/<vgname>/<lv>  is not a device node, it's already a link
to the device node

for example:
lrwxrwxrwx  1 root root    7 Dec 10 00:11 vm-10001-disk-0 -> ../dm-9

#ln vm-10001-disk-0 testrename

lrwxrwxrwx  1 root root    7 Dec 10 00:11 vm-10001-disk-0 -> ../dm-9
lrwxrwxrwx  1 root root    7 Dec 10 00:11 testrename -> ../dm-9


>>did you try whether a plain rename would also work (not sure - qemu
>>already has an open FD to the file/blockdev, but I am not sure how
>>LVM handles this ;))?

from my test, the lvrename, it simply the rename the lvm volume
internaly, then rename link.. and as we have already create the link,
it's simply rename it without problem.

#lvrename vm-10001-disk-0  vm-10001-disk-snap1

lrwxrwxrwx  1 root root    7 Dec 10 00:11 vm-10001-disk-snap1 -> ../dm-
9
lrwxrwxrwx  1 root root    7 Dec 10 00:11 testrename -> ../dm-9

#lvrename vm-10001-disk-snap1 testrename

lrwxrwxrwx  1 root root    7 Dec 10 00:11 testrename -> ../dm-9


> 
> +
> +sub blockdev_commit {

>>see comments below for qemu_volume_snapshot_delete, I think this..

>>and this can be replaced altogether with blockdev_stream..



>>wouldn't it make more sense to use block-stream to merge the contents
>>of the to-be-deleted snapshot into the current overlay? that way we
>>wouldn't need to rename anything, AFAICT..


>>same here, instead of commiting from the child into the to-be-deleted
>>snapshot, and then renaming, why not just block-stream from the to-
>>be-deleted snapshot into the child, and then discard the snapshot
>>that is no longer needed?



>>commit is the wrong direction though?
>>
>>if we have A -> B -> C, and B is deleted, the delta previously
co>>ntained in B should be merged into C, not into A?
>>
>>so IMHO a simple block-stream + removal of the to-be-deleted snapshot
>>should be the right choice here as well?
>>
>>that would effectively make all the paths identical AFAICT (stream
>>from to-be-deleted snapshot to child, followed by deletion of the no
>>longer used volume corresponding to the deleted/streamed snapshot)
>>and no longer require any renaming..

Yes, got it now. I'll implement block-stream.
But keep commit for last snapshot delete.





[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

  parent reply	other threads:[~2025-01-13 10:09 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
2024-12-16  9:12 ` [pve-devel] [PATCH v1 pve-qemu 1/1] add block-commit-replaces option patch Alexandre Derumier via pve-devel
2025-01-08 13:27   ` Fabian Grünbichler
2025-01-10  7:55     ` DERUMIER, Alexandre via pve-devel
     [not found]     ` <34a164520eba035d1db5f70761b0f4aa59fecfa5.camel@groupe-cyllene.com>
2025-01-10  9:15       ` Fiona Ebner
2025-01-10  9:32         ` DERUMIER, Alexandre via pve-devel
     [not found]         ` <1e45e756801843dd46eb6ce2958d30885ad73bc2.camel@groupe-cyllene.com>
2025-01-13 14:28           ` Fiona Ebner
2025-01-14 10:10             ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 01/11] blockdev: cmdline: convert drive to blockdev syntax Alexandre Derumier via pve-devel
2025-01-08 14:17   ` Fabian Grünbichler
2025-01-10 13:50     ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 pve-storage 1/3] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-01-09 12:36   ` Fabian Grünbichler
2025-01-10  9:10     ` DERUMIER, Alexandre via pve-devel
     [not found]     ` <f25028d41a9588e82889b3ef869a14f33cbd216e.camel@groupe-cyllene.com>
2025-01-10 11:02       ` Fabian Grünbichler
2025-01-10 11:51         ` DERUMIER, Alexandre via pve-devel
     [not found]         ` <1caecaa23e5d57030a9e31f2f0e67648f1930d6a.camel@groupe-cyllene.com>
2025-01-10 12:20           ` Fabian Grünbichler
2025-01-10 13:14             ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 02/11] blockdev: fix cfg2cmd tests Alexandre Derumier via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 pve-storage 2/3] lvmplugin: add qcow2 snapshot Alexandre Derumier via pve-devel
2025-01-09 13:55   ` Fabian Grünbichler
2025-01-10 10:16     ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 03/11] blockdev : convert qemu_driveadd && qemu_drivedel Alexandre Derumier via pve-devel
2025-01-08 14:26   ` Fabian Grünbichler
2025-01-10 14:08     ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 pve-storage 3/3] storage: vdisk_free: remove external snapshots Alexandre Derumier via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 04/11] blockdev: vm_devices_list : fix block-query Alexandre Derumier via pve-devel
2025-01-08 14:31   ` Fabian Grünbichler
2025-01-13  7:56     ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 05/11] blockdev: convert cdrom media eject/insert Alexandre Derumier via pve-devel
2025-01-08 14:34   ` Fabian Grünbichler
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 06/11] blockdev: block_resize: convert to blockdev Alexandre Derumier via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 07/11] blockdev: nbd_export: block-export-add : use drive-$id for nodename Alexandre Derumier via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror Alexandre Derumier via pve-devel
2025-01-08 15:19   ` Fabian Grünbichler
2025-01-13  8:27     ` DERUMIER, Alexandre via pve-devel
     [not found]     ` <0d0d4c4d73110cf0e692cae0ee65bf7f9a6ce93a.camel@groupe-cyllene.com>
2025-01-13  9:52       ` Fabian Grünbichler
2025-01-13  9:55         ` Fabian Grünbichler
2025-01-13 10:47         ` DERUMIER, Alexandre via pve-devel
2025-01-13 13:42           ` Fiona Ebner
2025-01-14 10:03             ` DERUMIER, Alexandre via pve-devel
     [not found]             ` <fa38efbd95b57ba57a5628d6acfcda9d5875fa82.camel@groupe-cyllene.com>
2025-01-15  9:39               ` Fiona Ebner
2025-01-15  9:51                 ` Fabian Grünbichler
2025-01-15 10:06                   ` Fiona Ebner
2025-01-15 10:15                     ` Fabian Grünbichler
2025-01-15 10:46                       ` Fiona Ebner
2025-01-15 10:50                         ` Fabian Grünbichler
2025-01-15 11:01                           ` Fiona Ebner
2025-01-15 13:01                       ` DERUMIER, Alexandre via pve-devel
2025-01-16 14:56                     ` DERUMIER, Alexandre via pve-devel
2025-01-15 10:15                   ` DERUMIER, Alexandre via pve-devel
     [not found]         ` <c1559499319052d6cf10900efd5376c12389a60f.camel@groupe-cyllene.com>
2025-01-13 13:31           ` Fabian Grünbichler
2025-01-20 13:37             ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 09/11] blockdev: mirror: change aio on target if io_uring is not default Alexandre Derumier via pve-devel
2025-01-09  9:51   ` Fabian Grünbichler
2025-01-13  8:38     ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 10/11] blockdev: add backing_chain support Alexandre Derumier via pve-devel
2025-01-09 11:57   ` Fabian Grünbichler
2025-01-13  8:53     ` DERUMIER, Alexandre via pve-devel
2024-12-16  9:12 ` [pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support Alexandre Derumier via pve-devel
2025-01-09 11:57   ` Fabian Grünbichler
2025-01-09 13:19     ` Fabio Fantoni via pve-devel
2025-01-20 13:44       ` DERUMIER, Alexandre via pve-devel
     [not found]       ` <3307ec388a763510ec78f97ed9f0de00c87d54b5.camel@groupe-cyllene.com>
2025-01-20 14:29         ` Fabio Fantoni via pve-devel
     [not found]         ` <6bdfe757-ae04-42e1-b197-c9ddb873e353@m2r.biz>
2025-01-20 14:41           ` DERUMIER, Alexandre via pve-devel
2025-01-13 10:08     ` DERUMIER, Alexandre via pve-devel [this message]
     [not found]     ` <0ae72889042e006d9202e837aac7ecf2b413e1b4.camel@groupe-cyllene.com>
2025-01-13 13:27       ` Fabian Grünbichler
2025-01-13 18:07         ` DERUMIER, Alexandre via pve-devel
2025-01-13 18:58           ` DERUMIER, Alexandre via pve-devel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mailman.242.1736762947.441.pve-devel@lists.proxmox.com \
    --to=pve-devel@lists.proxmox.com \
    --cc=alexandre.derumier@groupe-cyllene.com \
    --cc=f.gruenbichler@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal