* [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
@ 2024-12-16 9:12 Alexandre Derumier via pve-devel
2025-01-09 14:13 ` Fabian Grünbichler
0 siblings, 1 reply; 11+ messages in thread
From: Alexandre Derumier via pve-devel @ 2024-12-16 9:12 UTC (permalink / raw)
To: pve-devel; +Cc: Alexandre Derumier
From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: Mon, 16 Dec 2024 10:12:14 +0100
Message-ID: <20241216091229.3142660-1-alexandre.derumier@groupe-cyllene.com>
This patch series implements qcow2 external snapshot support for files && lvm volumes.
The current internal qcow2 snapshots have bad write performance because no metadata can be preallocated.
This is particularly visible on a shared filesystem like ocfs2 or gfs2.
There are also freeze/lock bugs that users have reported for years when deleting snapshots on nfs
(disk access seems to be frozen for the whole duration of the delete).
This also opens the door to remote snapshot export/import for storage replication.
Changelog v3:
storage:
- snapshot files now carry the name of the snapshot, and the "current" snapshot keeps the vm volname
- allow only the qcow2 format for the base image, for simplification
- merge the snapshot code into lvmplugin (the qcow2 format auto-enables snapshots)
- the code is a lot simpler now
qemu-server:
- conversion of -drive to the modern -blockdev
This is needed for blockdev-reopen, where we need to create && switch to snapshot files with the same cache, aio, ... options.
This is also needed to live-rename snapshot files (to keep the volname for the current snapshot).
I have implemented && tested:
- disk create, delete, resize, convert, drive_mirror, hotplug, unplug, nbd mirror, cdrom insert/eject
- block protocols: file, block_device, rbd, nbd, glusterfs
Note that it's currently incomplete:
- the proxmox backup/restore code needs to be converted to blockdev, help is needed
- iscsi:// paths are not yet implemented (I'll look at this for the v4)
- efi is still in drive format (not blocking, but I'll look at it for the v4)
Live migration from -drive to -blockdev seems to work without breaking, and seems to be
transparent for the guest, so I think we could try to target pve9?
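To illustrate the conversion (a rough sketch only; the exact node names
and option set generated by this series may differ):

-drive file=/var/lib/vz/images/100/vm-100-disk-0.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring

roughly becomes

-object throttle-group,id=throttle-drive-scsi0
-blockdev '{"driver":"file","filename":"/var/lib/vz/images/100/vm-100-disk-0.qcow2","node-name":"file-scsi0","cache":{"direct":true},"aio":"io_uring"}'
-blockdev '{"driver":"qcow2","node-name":"fmt-scsi0","file":"file-scsi0"}'
-blockdev '{"driver":"throttle","node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0","file":"fmt-scsi0"}'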
storage.cfg example:
dir: local2
path /var/lib/vz
content snippets,vztmpl,backup,images,iso,rootdir
snapext 1
lvm: test
vgname test
content images
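On a snapext dir storage, taking a snapshot "snap1" of vm-100-disk-0.qcow2
would leave a chain like this on disk (illustrative names, following the
naming scheme above):

vm-100-disk-0.qcow2 (current, writable) ---> vm-100-disk-0-snap1.qcow2 (read-only backing file)

e.g. the current overlay corresponds to what

qemu-img create -f qcow2 -b vm-100-disk-0-snap1.qcow2 -F qcow2 vm-100-disk-0.qcow2

would create by hand.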
pve-storage:
Alexandre Derumier (3):
qcow2: add external snapshot support
lvmplugin: add qcow2 snapshot
storage: vdisk_free: remove external snapshots
src/PVE/Storage.pm | 18 ++-
src/PVE/Storage/DirPlugin.pm | 1 +
src/PVE/Storage/LVMPlugin.pm | 231 ++++++++++++++++++++++++++---
src/PVE/Storage/Plugin.pm | 207 ++++++++++++++++++++++----
src/test/run_test_zfspoolplugin.pl | 18 +++
5 files changed, 424 insertions(+), 51 deletions(-)
Alexandre Derumier (11):
blockdev: cmdline: convert drive to blockdev syntax
blockdev: fix cfg2cmd tests
blockdev : convert qemu_driveadd && qemu_drivedel
blockdev: vm_devices_list : fix block-query
blockdev: convert cdrom media eject/insert
blockdev: block_resize: convert to blockdev
blockdev: nbd_export: block-export-add : use drive-$id for nodename
blockdev: convert drive_mirror to blockdev_mirror
blockdev: mirror: change aio on target if io_uring is not default.
blockdev: add backing_chain support
qcow2: add external snapshot support
PVE/QemuConfig.pm | 4 +-
PVE/QemuMigrate.pm | 2 +-
PVE/QemuServer.pm | 946 ++++++++++++++----
test/cfg2cmd/bootorder-empty.conf.cmd | 12 +-
test/cfg2cmd/bootorder-legacy.conf.cmd | 12 +-
test/cfg2cmd/bootorder.conf.cmd | 12 +-
...putype-icelake-client-deprecation.conf.cmd | 6 +-
test/cfg2cmd/ide.conf.cmd | 23 +-
test/cfg2cmd/pinned-version-pxe-pve.conf.cmd | 6 +-
test/cfg2cmd/pinned-version-pxe.conf.cmd | 6 +-
test/cfg2cmd/pinned-version.conf.cmd | 6 +-
test/cfg2cmd/q35-ide.conf.cmd | 23 +-
.../q35-linux-hostpci-template.conf.cmd | 3 +-
test/cfg2cmd/seabios_serial.conf.cmd | 6 +-
...imple-balloon-free-page-reporting.conf.cmd | 6 +-
test/cfg2cmd/simple-btrfs.conf.cmd | 6 +-
test/cfg2cmd/simple-virtio-blk.conf.cmd | 6 +-
test/cfg2cmd/simple1-template.conf.cmd | 11 +-
test/cfg2cmd/simple1.conf.cmd | 6 +-
19 files changed, 830 insertions(+), 272 deletions(-)
pve-qemu:
add block-commit-replaces option patch
...052-block-commit-add-replaces-option.patch | 137 ++++++++++++++++++
debian/patches/series | 1 +
2 files changed, 138 insertions(+)
create mode 100644 debian/patches/pve/0052-block-commit-add-replaces-option.patch
--
2.39.5
* Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
2024-12-16 9:12 [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support Alexandre Derumier via pve-devel
@ 2025-01-09 14:13 ` Fabian Grünbichler
2025-01-10 7:44 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 11+ messages in thread
From: Fabian Grünbichler @ 2025-01-09 14:13 UTC (permalink / raw)
To: Proxmox VE development discussion
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on 16.12.2024 10:12 CET:
> This patch series implements qcow2 external snapshot support for files && lvm volumes
>
> The current internal qcow2 snapshots have bad write performance because no metadata can be preallocated.
>
> This is particularly visible on a shared filesystem like ocfs2 or gfs2.
>
> There are also freeze/lock bugs that users have reported for years when deleting snapshots on nfs
> (disk access seems to be frozen for the whole duration of the delete).
>
> This also opens the door to remote snapshot export/import for storage replication.
a few high level remarks:
- I am not sure whether we want to/can switch over to blockdev on the fly (i.e., without some sort of opt-in phase to iron out kinks). what about upgrades with running VMs? I guess some sort of flag and per-VM switching would be a better plan..
- I am also not sure whether we want to mix and match internal and external snapshots, we probably should require a storage to use it or not at creation time and fix it, to allow us to keep the handling code from exploding complexity-wise..
- if you see a way to name the block graph nodes in a deterministic fashion (i.e., have a 1:1 mapping between snapshot and block graph node name) that would be wonderful, else we'd have to find another way to improve the lookup there (see the sketch below)..
- the relevant commits and code would really benefit from a summary of the design/semantics, it's not easy to reconstruct this from the code alone
- some tests would be great as well, both to verify that the code actually behaves like we expect, and also to catch regressions when it is touched again in the future
- some performance numbers would also be great :)
-- internal vs external snapshots using the same underlying dir/fs
-- qcow2 vs qcow2 with snapshots vs raw on LVM using the same underlying disk
- did you verify the space consumption on the SAN side with the LVM plugin? would be interesting to know how it holds up in practice :)
I haven't done any real world testing yet since I expect that the delta for the next version won't be small either ;)
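(For reference, the lookup in question is a QMP round-trip along these
lines; the node name and path are illustrative:

{ "execute": "query-named-block-nodes" }
-> [ { "node-name": "fmt-scsi0", "file": "/var/lib/vz/images/100/vm-100-disk-0-snap1.qcow2", ... }, ... ]

i.e. mapping a snapshot to its block graph node currently means scanning
this list and matching on the file name.)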
* Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
2025-01-09 14:13 ` Fabian Grünbichler
@ 2025-01-10 7:44 ` DERUMIER, Alexandre via pve-devel
2025-01-10 9:55 ` Fiona Ebner
0 siblings, 1 reply; 11+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 7:44 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: Fri, 10 Jan 2025 07:44:34 +0000
Message-ID: <9dee14737f1c78c247486e00a20fa95c420cb366.camel@groupe-cyllene.com>
-------- Original message --------
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: 09/01/2025 15:13:14
> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on
> 16.12.2024 10:12 CET:
>
> This patch series implements qcow2 external snapshot support for files
> && lvm volumes
>
> The current internal qcow2 snapshots have bad write performance
> because no metadata can be preallocated.
>
> This is particularly visible on a shared filesystem like ocfs2 or
> gfs2.
>
> There are also freeze/lock bugs that users have reported for years
> when deleting snapshots on nfs
> (disk access seems to be frozen for the whole duration of the delete)
>
> This also opens the door to remote snapshot export/import for storage
> replication.
>>
>>a few high level remarks:
>>- I am not sure whether we want to/can switch over to blockdev on the
>>fly (i.e., without some sort of opt-in phase to iron out kinks). what
>>about upgrades with running VMs? I guess some sort of flag and per-VM
>>switching would be a better plan..
I have tested live migration, and it works for the small tests I have
done (I was surprised myself). I'll try to run longer tests to be
100% sure that there is no data corruption.
on the guest side, it's transparent. on the qemu side, the devices and
pci plumbing are still the same, and qemu already uses blockdev behind
the scenes.
If needed, we could make a switch based on the qemu version, or a manual
option.
>>- I am also not sure whether we want to mix and match internal and
>>external snapshots, we probably should require a storage to use it or
>>not at creation time and fix it, to allow us to keep the handling
>>code from exploding complexity-wise..
for qcow2, I have a "snapext" option on the storage, so you can't mix
internal/external at the same time.
(We just need to restrict the snapext option to storage creation
time.)
For lvm, only external snapshots are possible (if the format is qcow2).
>>- if you see a way to name the block graph nodes in a deterministic
>>fashion (i.e., have a 1:1 mapping between snapshot and block graph
>>node name) that would be wonderful, else we'd have to find another
>>way to improve the lookup there..
A 1:1 mapping with snapshots is not possible (I have tried a lot),
because:
- the snapshot name can be too long (a blockdev name is 31 characters
max, and a hash based on the filename is difficult)
- with external snapshot file renaming, this doesn't work (snap -->
current). We can't rename a blockdev, so the mapping will drift.
So, I don't think that it's possible to avoid the lookup. (I really have
no idea how to manage it.)
I'm not sure it's really a problem? It's just an extra qmp call, and
it's super fast.
>>- the relevant commits and code would really benefit from a summary
>>of the design/semantics, it's not easy to reconstruct this from the
>>code alone
ok will do
>>- some tests would be great as well, both to verify that the code
>>actually behaves like we expect, and also to catch regressions when
>>it is touched again in the future
yes, I was planning to add tests; as we don't have many tests of the
qemu command line currently, this is a good time to add them.
>>- some performance numbers would also be great :)
>>-- internal vs external snapshots using the same underlying dir/fs
>>-- qcow2 vs qcow2 with snapshots vs raw on LVM using the same
>>underlying disk
I'll prepare different hardware SANs to compare. (Note that it's not
only about performance, but also about disk lock/freeze on snapshot
rollback, for example with internal snapshots.)
I'll also try to re-benchmark ocfs2 && gfs2 with external snapshots (as
metadata preallocation is really important with them, and it's not
possible with internal snapshots).
>>- did you verify the space consumption on the SAN side with the LVM
>>plugin? would be interesting to know how it holds up in practice :)
Well, currently, it's reserving a full lvm volume for each snapshot.
If your storage supports thin provisioning, it's not too much of a
problem; you can create a lun bigger than your physical storage. I have
tried with a netapp san, and it works.
But for low-cost sans where thin provisioning doesn't exist, I'm
planning to add dynamic extension of small lvm volumes.
I sent some test patches in september 2024: the qcow2 size can be
bigger than the lvm size (like qcow2=100G && lvm=1G), and then it's
possible for qemu to send an event on a usage threshold and extend
(through a custom daemon) the lvm volume.
But I was planning to add this in a separate series after the external
snapshot code is committed.
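(A sketch of the threshold mechanism; the node name and value are
illustrative:

{ "execute": "block-set-write-threshold", "arguments": { "node-name": "file-scsi0", "write-threshold": 858993459200 } }

qemu then emits a BLOCK_WRITE_THRESHOLD event once writes cross that
offset, and the daemon can lvextend the volume && re-arm the threshold.)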
>>I haven't done any real world testing yet since I expect that the
>>delta for the next version won't be small either ;)
Yes, the biggest part is done. (I hope ^_^)
I'll try to finish the blockdev conversion of the missing parts (iscsi
paths, uefi, ...).
The main blocking point for me is the proxmox backup code; I really
don't know how it works.
* Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
2025-01-10 7:44 ` DERUMIER, Alexandre via pve-devel
@ 2025-01-10 9:55 ` Fiona Ebner
2025-01-10 12:30 ` DERUMIER, Alexandre via pve-devel
[not found] ` <8f309dfe189379acf72db07398a37a98e8fc3550.camel@groupe-cyllene.com>
0 siblings, 2 replies; 11+ messages in thread
From: Fiona Ebner @ 2025-01-10 9:55 UTC (permalink / raw)
To: Proxmox VE development discussion, f.gruenbichler
On 10.01.25 at 08:44, DERUMIER, Alexandre via pve-devel wrote:
>>>
>>> a few high level remarks:
>>> - I am not sure whether we want to/can switch over to blockdev on the
>>> fly (i.e., without some sort of opt-in phase to iron out kinks). what
>>> about upgrades with running VMs? I guess some sort of flag and per-VM
>>> switching would be a better plan..
>
> I have tested live migration, and it works for the small tests I have
> done (I was surprised myself). I'll try to run longer tests to be
> 100% sure that there is no data corruption.
>
> on the guest side, it's transparent. on the qemu side, the devices and
> pci plumbing are still the same, and qemu already uses blockdev behind
> the scenes.
>
> If needed, we could make a switch based on the qemu version, or a manual
> option.
Yes, we need to be very careful that all the defaults/behavior would be
the same to not break live-migration. A known difference is format
autodetection, which happens with "-drive file=" but not with
"-blockdev", but that is not relevant as we explicitly set the format.
Dumping the QObject JSON configs of the drives might be a good heuristic
to check that the properties really are the same before and after the
switch.
Switching based on QEMU version would need to use the creation QEMU
version from the meta config property. Using the machine or running
binary version would mean we would automatically switch for non-Windows
OSes, which are not version-pinned by default, so that doesn't help if
there were breakage. I'd really hope it is compatible, because for a
per-VM flag, for backwards-compat reasons (e.g. rolling back to a
snapshot with VM state) it would need to start out as being off by
default.
We wouldn't even need to switch to using '-blockdev' right now (still
a good thing to do long-term, but if it is opt-in, we can't rely on
all VMs having it, which is bad), you could also set the node-name for
the '-drive'. I.e. switching to '-blockdev' can be done separately from
switching to 'blockdev-*' QMP operations.
>>> - if you see a way to name the block graph nodes in a deterministic
>>> fashion (i.e., have a 1:1 mapping between snapshot and block graph
>>> node name) that would be wonderful, else we'd have to find another
>>> way to improve the lookup there..
>
> A 1:1 mapping with snapshots is not possible (I have tried a lot),
> because:
> - the snapshot name can be too long (a blockdev name is 31 characters max,
> and a hash based on the filename is difficult)
> - with external snapshot file renaming, this doesn't work (snap -->
> current). We can't rename a blockdev, so the mapping will drift.
>
> So, I don't think that it's possible to avoid the lookup. (I really have
> no idea how to manage it.)
> I'm not sure it's really a problem? It's just an extra qmp call, and
> it's super fast.
Are we sure the node-name for the drive is always stable? I.e. is the
block node that the guest sees inserted in the drive, always the one
named by the 'node-name' that was initially set when attaching the drive
via '-blockdev' or QMP 'blockdev-add'? After all kinds of block
operations? Even if there are partially completed/failed block
operations? After live migration from a not-yet-updated node? Otherwise,
I'd prefer always querying the node-name before doing a QMP 'blockdev-*'
command to make sure it's actually the node that the guest sees as well,
like we currently do for 'block-export-add'. And we wouldn't even need
to set the node-names ourselves at all if always querying first. Seems a
bit more future-proof as well.
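(An illustrative QMP exchange; the inserted node-name shown here is an
assumption:

{ "execute": "query-block" }
-> [ { "device": "drive-scsi0", "inserted": { "node-name": "fmt-scsi0", ... } }, ... ]

the queried node-name would then be passed to the 'blockdev-*' command
instead of a hard-coded one.)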
* Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
2025-01-10 9:55 ` Fiona Ebner
@ 2025-01-10 12:30 ` DERUMIER, Alexandre via pve-devel
[not found] ` <8f309dfe189379acf72db07398a37a98e8fc3550.camel@groupe-cyllene.com>
1 sibling, 0 replies; 11+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-10 12:30 UTC (permalink / raw)
To: pve-devel, f.ebner, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: Fri, 10 Jan 2025 12:30:22 +0000
Message-ID: <8f309dfe189379acf72db07398a37a98e8fc3550.camel@groupe-cyllene.com>
-------- Original message --------
From: Fiona Ebner <f.ebner@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, f.gruenbichler@proxmox.com <f.gruenbichler@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: 10/01/2025 10:55:14
On 10.01.25 at 08:44, DERUMIER, Alexandre via pve-devel wrote:
> > >
> > > a few high level remarks:
> > > - I am not sure whether we want to/can switch over to blockdev on
> > > the
> > > fly (i.e., without some sort of opt-in phase to iron out kinks).
> > > what
> > > about upgrades with running VMs? I guess some sort of flag and
> > > per-VM
> > > switching would be a better plan..
>
> I have tested live migration, and it's works for the small tests I
> have
> done. (I was surprised myself). I'll try to do more longer test to be
> 100% sure that they are not corruption of datas.
>
> on the guest side, it's transparent. on qemu side, the devices and
> pci
> plumbing is still the same, and qemu already use blockdev behind.
>
> If needed, we could make a switch based on qemu version, or or manual
> option.
>> Yes, we need to be very careful that all the defaults/behavior would be
>> the same to not break live-migration. A known difference is format
>> autodetection, which happens with "-drive file=" but not with
>> "-blockdev", but that is not relevant as we explicitly set the format.
>> Dumping the QObject JSON configs of the drives might be a good heuristic
>> to check that the properties really are the same before and after the
>> switch.
I had looked manually at the qdev info, and dumped the blockdevs
generated by the -drive command to compare; I didn't see a difference
(only the node names and the additional throttle group node).
>> Switching based on QEMU version would need to use the creation QEMU
>> version from the meta config property. Using the machine or running
>> binary version would mean we would automatically switch for non-Windows
>> OSes, which are not version-pinned by default, so that doesn't help if
>> there were breakage.
That's why I was thinking of implementing this for pve9 (based on the
qemu version).
>>I'd really hope it is compatible, because for a per-VM flag,
>>for backwards-compat reasons (e.g. rolling back to a snapshot with
>>VMstate) it would need to start out as being off by default.
I think that a vmstate is not a problem, because it is only the guest
memory, right? And the devices are not changing.
>>We wouldn't even need to switch to using '-blockdev' right now (still
>>good thing to do long-term wise, but if it is opt-in, we can't rely
>>on
>>all VMs having it, which is bad),
>>you could also set the node-name for the '-drive'.
Are you sure about this? I haven't seen any documentation about
adding the node-name to -drive. (And we need it for hotplug hmp
drive_add too :/ )
I'm not even sure this can define a specific name for the file nodename,
which is needed for the snapshot renaming with blockdev-reopen.
>> I.e. switching to '-blockdev' can be done separately from
>> switching to 'blockdev-*' QMP operations.
I really don't know if you can implement qmp blockdev-* with the -drive
syntax (where it could be possible to define a nodename).
I know that the qmp blockdev-* commands accept "device" (for -drive) or
"node-name" (for -blockdev).
(BTW, switching to -blockdev already breaks qmp proxmox backup ^_^,
possibly because of the throttle-group top node, I don't remember
exactly.)
I'll take time to retest live migration with different OSes, restoring
snapshots with state, and see if I get crashes or silent data
corruption.
> > > - if you see a way to name the block graph nodes in a
> > > deterministic
> > > fashion (i.e., have a 1:1 mapping between snapshot and block
> > > graph
> > > node name) that would be wonderful, else we'd have to find
> > > another
> > > way to improve the lookup there..
>
> A 1:1 mapping with snapshots is not possible (I have tried a lot),
> because:
> - the snapshot name can be too long (a blockdev name is 31 characters
> max, and a hash based on the filename is difficult)
> - with external snapshot file renaming, this doesn't work (snap -->
> current). We can't rename a blockdev, so the mapping will drift.
>
> So, I don't think that it's possible to avoid the lookup. (I really
> have no idea how to manage it.)
> I'm not sure it's really a problem? It's just an extra qmp call, and
> it's super fast.
>>Are we sure the node-name for the drive is always stable? I.e. is the
>>block node that the guest sees inserted in the drive, always the one
>>named by the 'node-name' that was initially set when attaching the
>>drive
>>via '-blockdev' or QMP 'blockdev-add'? After all kinds of block
>>operations? Even if there are partially completed/failed block
>>operations? After live migration from a not-yet-updated node?
>>Otherwise,
No, the only stable nodename for me (in my implementation) is the top
throttle-group node, as it never changes during mirroring, snapshot
renaming, ...
The drive node (format-node or file-node) can change (2 file-nodes
during live file renaming with blockdev-reopen for example, 2
format-nodes switching after a mirror, ...)
>>I'd prefer always querying the node-name before doing a QMP
>>'blockdev-*'
>>command to make sure it's actually the node that the guest sees as
>>well,
>>like we currently do for 'block-export-add'.
That's the way I have done it
>> And we wouldn't even need
>> to set the node-names ourselves at all if always querying first.
>> Seems a bit more future-proof as well.
blockdev-reopen doesn't work with autogenerated nodenames (#block<id>)
(not sure if it's a bug or not).
That's why I'm currently naming all of them (including the backing chain
snapshots too).
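(For reference, the reopen call in question, with explicitly set node
names; names are illustrative:

{ "execute": "blockdev-reopen", "arguments": { "options": [ { "driver": "qcow2", "node-name": "fmt-scsi0", "file": "file-node2" } ] } }

with autogenerated names, the same call is rejected.)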
* Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
[not found] ` <8f309dfe189379acf72db07398a37a98e8fc3550.camel@groupe-cyllene.com>
@ 2025-01-13 10:06 ` Fiona Ebner
2025-01-13 10:54 ` Fiona Ebner
0 siblings, 1 reply; 11+ messages in thread
From: Fiona Ebner @ 2025-01-13 10:06 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel, f.gruenbichler
On 10.01.25 at 13:30, DERUMIER, Alexandre wrote:
>>> Yes, we need to be very careful that all the defaults/behavior would be
>>> the same to not break live-migration. A known difference is format
>>> autodetection, which happens with "-drive file=" but not with
>>> "-blockdev", but that is not relevant as we explicitly set the format.
>>> Dumping the QObject JSON configs of the drives might be a good heuristic
>>> to check that the properties really are the same before and after the
>>> switch.
>
> I had looked manually at the qdev info, and dumped the blockdevs
> generated by the -drive command to compare; I didn't see a difference
> (only the node names and the additional throttle group node)
>
That's a good sign :)
>>> Switching based on QEMU version would need to use the creation QEMU
>>> version from the meta config property. Using the machine or running
>>> binary version would mean we would automatically switch for non-Windows
>>> OSes, which are not version-pinned by default, so that doesn't help if
>>> there were breakage.
>
> That's why I was thinking of implementing this for pve9 (based on the
> qemu version).
We either don't need to base it on QEMU version or we don't need to wait
for PVE 9; doing both doesn't give us any real advantage. If live
migration really always works, theoretically we wouldn't even need to
wait for PVE 9, but it can still make sense so that the change comes at
a very defined time.
>>> I'd really hope it is compatible, because for a per-VM flag,
>>> for backwards-compat reasons (e.g. rolling back to a snapshot with
>>> VMstate) it would need to start out as being off by default.
>
> I think that a vmstate is not a problem, because it is only the guest
> memory, right? And the devices are not changing.
No, VM state also includes the device state. But live migration and
rollback to snapshot with VM state are essentially the same.
>>> We wouldn't even need to switch to using '-blockdev' right now (still
>>> good thing to do long-term wise, but if it is opt-in, we can't rely
>>> on
>>> all VMs having it, which is bad),
>>> you could also set the node-name for the '-drive'.
>
> Are you sure about this? I haven't seen any documentation about
> adding the node-name to -drive. (And we need it for hotplug hmp
> drive_add too :/ )
>
> I'm not even sure this can define a specific name for the file nodename,
> which is needed for the snapshot renaming with blockdev-reopen.
Yes, I've worked with -drive + node-name while debugging/developing a
few times already. From man kvm: "-drive accepts all options that are
accepted by -blockdev."
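For example, something along these lines (an untested sketch):

-drive file=/var/lib/vz/images/100/vm-100-disk-0.qcow2,format=qcow2,if=none,id=drive-scsi0,node-name=fmt-scsi0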
>>> I.e. switching to '-blockdev' can be done separately from
>>> switching to 'blockdev-*' QMP operations.
>
>
> I really don't know if you can implement qmp blockdev-* with the -drive
> syntax (where it could be possible to define a nodename).
>
> I know that the qmp blockdev-* commands accept "device" (for -drive) or
> "node-name" (for -blockdev).
Yes, you can. All drives have an auto-generated node-name already.
block-export-add already does it right now.
> (BTW, switching to -blockdev already breaks qmp proxmox backup ^_^,
> possibly because of the throttle-group top node, I don't remember
> exactly.)
Well, might be another reason not to hard-code node names, but always
query them before usage.
> I'll take time to retest live migration with different OSes, restoring
> snapshots with state, and see if I get crashes or silent data
> corruption.
Great!
>>> Are we sure the node-name for the drive is always stable? I.e. is the
>>> block node that the guest sees inserted in the drive, always the one
>>> named by the 'node-name' that was initially set when attaching the
>>> drive
>>> via '-blockdev' or QMP 'blockdev-add'? After all kinds of block
>>> operations? Even if there are partially completed/failed block
>>> operations? After live migration from a not-yet-updated node?
>>> Otherwise,
>
> No, the only stable nodename for me (in my implementation) is the top
> throttle-group node, as it never changes during mirroring, snapshot
> renaming, ...
>
> The drive node (format-node or file-node) can change (2 file-nodes
> during live file renaming with blockdev-reopen for example, 2
> format-nodes switching after a mirror, ...)
>
It does sound a bit brittle then.
>>> I'd prefer always querying the node-name before doing a QMP
>>> 'blockdev-*'
>>> command to make sure it's actually the node that the guest sees as
>>> well,
>>> like we currently do for 'block-export-add'.
> That's the way I have done it
No, you set a fixed name on the commandline and rely on that when using
the QMP commands later. I mean doing what block-export-add currently
does, i.e. use "query-block" to see which node name is currently
inserted for the drive and then use that node-name.
Another reason for avoiding fixed node-names would be that users or
third parties could modify the block graph of the VM if they need it,
e.g. a third-party backup solution might have the need to do this.
>>> And we wouldn't even need
>>> to set the node-names ourselves at all if always querying first.
>>> Seems a bit more future-proof as well.
>
> blockdev-reopen doesn't work with autogenerated nodenames (#block<id>)
> (not sure if it's a bug or not).
> That's why I'm currently naming all of them (including the backing
> chain snapshots too)
Hmm, sounds like it might be a bug, I can look into it. If it is really
required to make it work, we can still set fixed node-names on the
commandline, but also query them before usage to be sure we have the
correct, i.e. currently inserted, node.
* Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
2025-01-13 10:06 ` Fiona Ebner
@ 2025-01-13 10:54 ` Fiona Ebner
2025-01-13 10:57 ` DERUMIER, Alexandre via pve-devel
0 siblings, 1 reply; 11+ messages in thread
From: Fiona Ebner @ 2025-01-13 10:54 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel, f.gruenbichler
On 13.01.25 at 11:06, Fiona Ebner wrote:
> On 10.01.25 at 13:30, DERUMIER, Alexandre wrote:
>> blockdev-reopen doesn't work with autogenerated nodenames (#block<id>)
>> (not sure if it's a bug or not).
>> That's why I'm currently naming all of them (including the backing
>> chain snapshots too)
>
> Hmm, sounds like it might be a bug, I can look into it. If it is really
> required to make it work, we can still set fixed node-names on the
> commandline, but also query them before usage to be sure we have the
> correct, i.e. currently inserted, node.
AFAICT, this is because the node name is not set in the original options
for the block driver state and then it wrongly detects an attempt to
change the node name (even if specifying the correct auto-generated one
during reopen). However, it is rather ugly to try and use a -drive
together with blockdev-reopen in any case; blockdev-reopen is really
written with -blockdev in mind, and -blockdev necessarily requires
setting node-name up front.
I don't think it's even worth fixing that bug. We should use
blockdev-reopen only after switching to -blockdev, it's much nicer like
that :)
I'd still be in favor of querying the node-name of drives with
query-block before doing QMP operations though. That way, we can
warn/error if it doesn't match what we expect, for example, to catch
unexpected situations.
* Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
2025-01-13 10:54 ` Fiona Ebner
@ 2025-01-13 10:57 ` DERUMIER, Alexandre via pve-devel
2025-01-13 11:54 ` Fiona Ebner
0 siblings, 1 reply; 11+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-13 10:57 UTC (permalink / raw)
To: pve-devel, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: Mon, 13 Jan 2025 10:57:45 +0000
Message-ID: <bc57ab17fa9e1870ac554ec418e8a0b57c2a81b8.camel@groupe-cyllene.com>
> Hmm, sounds like it might be a bug, I can look into it. If it is really
> required to make it work, we can still set fixed node-names on the
> commandline, but also query them before usage to be sure we have the
> correct, i.e. currently inserted, node.

>> AFAICT, this is because the node name is not set in the original
>> options for the block driver state and then it wrongly detects an
>> attempt to change the node name (even if specifying the correct
>> auto-generated one during reopen). However, it is rather ugly to try
>> and use a -drive together with blockdev-reopen in any case;
>> blockdev-reopen is really written with -blockdev in mind, and
>> -blockdev necessarily requires setting node-name up front.
you are too fast ^_^
>> I don't think it's even worth fixing that bug. We should use
>> blockdev-reopen only after switching to -blockdev, it's much nicer
>> like that :)
>>
>> I'd still be in favor of querying the node-name of drives with
>> query-block before doing QMP operations though. That way, we can
>> warn/error if it doesn't match what we expect, for example, to catch
>> unexpected situations.
ok, so the question is: how to query the node-name of the different
disks (+ snapshot chain).
Fabian is against the use of the path.
So, I'm a little bit out of ideas ^_^
* Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
2025-01-13 10:57 ` DERUMIER, Alexandre via pve-devel
@ 2025-01-13 11:54 ` Fiona Ebner
2025-01-13 11:58 ` DERUMIER, Alexandre via pve-devel
[not found] ` <2cbef7d2a33ed5ea6fab15b97f611fc4bf207c0f.camel@groupe-cyllene.com>
0 siblings, 2 replies; 11+ messages in thread
From: Fiona Ebner @ 2025-01-13 11:54 UTC (permalink / raw)
To: Proxmox VE development discussion, f.gruenbichler
On 13.01.25 at 11:57, DERUMIER, Alexandre via pve-devel wrote:
> ok, so the question is: how to query the node-name of the different
> disks (+ snapshot chain).
>
> Fabian is against the use of the path.
>
> So, I'm a little bit out of ideas ^_^
>
For almost all QMP commands, we only need to care about the node that's
inserted for the drive. And for your use-case, checking that the top
node of the chain matches what we expect is already a good first step.
The lookup itself is a different question; I'll also answer the other
mail.
* Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
2025-01-13 11:54 ` Fiona Ebner
@ 2025-01-13 11:58 ` DERUMIER, Alexandre via pve-devel
[not found] ` <2cbef7d2a33ed5ea6fab15b97f611fc4bf207c0f.camel@groupe-cyllene.com>
1 sibling, 0 replies; 11+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-01-13 11:58 UTC (permalink / raw)
To: pve-devel, f.ebner, f.gruenbichler; +Cc: DERUMIER, Alexandre
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: Mon, 13 Jan 2025 11:58:04 +0000
Message-ID: <2cbef7d2a33ed5ea6fab15b97f611fc4bf207c0f.camel@groupe-cyllene.com>
>> For almost all QMP commands, we only need to care about the node
>> that's inserted for the drive.
(Yes, that's the throttle group in my implementation, and it has a fixed
name; I'm reusing the "drive-(ide|scsi|virtio)X" naming.)
>>And for your use-case, checking that the top
>>node of the chain matches what we expect is already a good first
>>step.
>>The lookup itself is a different question, I'll also answer to the
>>other
>>mail.
Maybe this could help to understand the problem.
Here is a small summary of the 2 workflows, snapshot && block mirror,
where we switch between nodes:
snapshot
--------

a) renaming current
-------------------
1)
device--->throttle-group--->fmt-node1--->file-node1---> vm-100-disk-0.qcow2

2) create a new file node with the new file name
device--->throttle-group--->fmt-node1--->file-node1---> vm-100-disk-0.qcow2
                                         file-node2---> vm-100-disk-0-snap1.qcow2

3) switch the file node with blockdev-reopen
device--->throttle-group--->fmt-node1--->file-node2---> vm-100-disk-0-snap1.qcow2
                                         file-node1---> vm-100-disk-0.qcow2

4) delete the old file node
device--->throttle-group--->fmt-node1--->file-node2---> vm-100-disk-0-snap1.qcow2

b) create the new current
-------------------------
1) add a new fmt node
device--->throttle-group--->fmt-node1--->file-node2---> vm-100-disk-0-snap1.qcow2
          fmt-node3--->file-node3---> vm-100-disk-0.qcow2

2) blockdev-snapshot -> set fmt-node3 active with fmt-node1 as backing
device--->throttle-group--->fmt-node3--->file-node3---> vm-100-disk-0.qcow2
                                 |
                                 '--> fmt-node1--->file-node2---> vm-100-disk-0-snap1.qcow2

mirror
------
1)
device--->throttle-group--->fmt-node1--->file-node1---> vm-100-disk-0.qcow2

2) add a new target fmt + file node
device--->throttle-group--->fmt-node1--->file-node1---> vm-100-disk-0.qcow2
          fmt-node2--->file-node2---> vm-100-disk-1.qcow2

3) blockdev-mirror (mirror + switch the fmt node on complete)
device--->throttle-group--->fmt-node2--->file-node2---> vm-100-disk-1.qcow2
          fmt-node1--->file-node1---> vm-100-disk-0.qcow2

4) delete the old fmt + file node
device--->throttle-group--->fmt-node2--->file-node2---> vm-100-disk-1.qcow2
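(For reference, the switch in step b.2 is the QMP blockdev-snapshot
command, roughly:

{ "execute": "blockdev-snapshot", "arguments": { "node": "fmt-node1", "overlay": "fmt-node3" } }

which installs fmt-node3 on top, with fmt-node1 as its backing node.)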
* Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
[not found] ` <2cbef7d2a33ed5ea6fab15b97f611fc4bf207c0f.camel@groupe-cyllene.com>
@ 2025-01-13 13:42 ` Fiona Ebner
0 siblings, 0 replies; 11+ messages in thread
From: Fiona Ebner @ 2025-01-13 13:42 UTC (permalink / raw)
To: DERUMIER, Alexandre, pve-devel, f.gruenbichler
On 13.01.25 at 12:58, DERUMIER, Alexandre wrote:
>
>
>>> For almost all QMP commands, we only need to care about the node
>>> that's inserted for the drive.
> (Yes, that's the throttle group in my implementation, and it has a fixed
> name; I'm reusing the "drive-(ide|scsi|virtio)X" naming.)
>
This assumption might not always hold, see my earlier replies. To avoid
relying on the assumption, we can use query-block and warn if the name
doesn't match the fixed one we expect. This usually shouldn't happen,
but who knows what the future brings and what else touches the block
graph. Querying is cheap and will immediately give us a good idea of
what goes wrong and when. Such a situation might even just be a valid
edge case or third-party use case. I don't want to break those for
no reason.
Let me illustrate what I mean with an example: say that a user requests
a storage move for drive scsi0. Then we can:
1. use query-block to get the node name of what's inserted in drive
scsi0 right now
2. warn if the node-name doesn't match the expected drive-scsi0 name
3. run blockdev-mirror with the node name we queried, because that is
the node name with the data that the guest also sees right now
We could also error out instead of warn in step 2, but that might break
some third-party use cases or edge cases we are not aware of right now.
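(As QMP, the flow would look roughly like this; names are illustrative:

1. { "execute": "query-block" }
   -> [ { "device": "drive-scsi0", "inserted": { "node-name": "drive-scsi0", ... } }, ... ]
2. compare the queried node-name against the expected "drive-scsi0" and warn on mismatch
3. { "execute": "blockdev-mirror", "arguments": { "job-id": "mirror-scsi0", "device": "drive-scsi0", "target": "target-fmt-node", "sync": "full" } }
)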
The problem of how to name the nodes in the backing chain is independent
of this. I'll answer the other mail where you discussed this with
Fabian.