From: "DERUMIER, Alexandre via pve-devel" <pve-devel@lists.proxmox.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>,
"f.ebner@proxmox.com" <f.ebner@proxmox.com>,
"f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: Fri, 10 Jan 2025 12:30:22 +0000 [thread overview]
Message-ID: <mailman.215.1736512256.441.pve-devel@lists.proxmox.com> (raw)
In-Reply-To: <ba801ed2-5a65-4ab1-81e3-122a2d522006@proxmox.com>
[-- Attachment #1: Type: message/rfc822, Size: 21680 bytes --]
From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: Fri, 10 Jan 2025 12:30:22 +0000
Message-ID: <8f309dfe189379acf72db07398a37a98e8fc3550.camel@groupe-cyllene.com>
-------- Message initial --------
De: Fiona Ebner <f.ebner@proxmox.com>
À: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
f.gruenbichler@proxmox.com <f.gruenbichler@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Objet: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-
qemu] add external qcow2 snapshot support
Date: 10/01/2025 10:55:14
Am 10.01.25 um 08:44 schrieb DERUMIER, Alexandre via pve-devel:
> -------- Message initial --------
> De: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> À: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
> Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> Objet: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-
> qemu] add external qcow2 snapshot support
> Date: 09/01/2025 15:13:14
>
> > Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat
> > am
> > 16.12.2024 10:12 CET geschrieben:
>
> > This patch series implement qcow2 external snapshot support for
> > files
> > && lvm volumes
> >
> > The current internal qcow2 snapshots have bad write performance
> > because no metadatas can be preallocated.
> >
> > This is particulary visible on a shared filesystem like ocfs2 or
> > gfs2.
> >
> > Also other bugs are freeze/lock reported by users since years on
> > snapshots delete on nfs
> > (The disk access seem to be frozen during all the delete duration)
> >
> > This also open doors for remote snapshot export-import for storage
> > replication.
> > >
> > > a few high level remarks:
> > > - I am not sure whether we want to/can switch over to blockdev on
> > > the
> > > fly (i.e., without some sort of opt-in phase to iron out kinks).
> > > what
> > > about upgrades with running VMs? I guess some sort of flag and
> > > per-VM
> > > switching would be a better plan..
>
> I have tested live migration, and it's works for the small tests I
> have
> done. (I was surprised myself). I'll try to do more longer test to be
> 100% sure that they are not corruption of datas.
>
> on the guest side, it's transparent. on qemu side, the devices and
> pci
> plumbing is still the same, and qemu already use blockdev behind.
>
> If needed, we could make a switch based on qemu version, or or manual
> option.
>>Yes, we need to be very careful that all the defaults/behavior would
be
>>the same to not break live-migration. A known difference is format
>>autodetection, which happens with "-drive file=" but not with
>>"-blockdev", but not relevant as we explicitly set the format.
>>Dumping
>>the QObject JSON configs of the drives might be a good heuristic to
>>check that the properties really are the same before and after the
>>switch.
I had looked manually at qdev info, and dumped the blockdevs generated
by -drive command to compare, I didn't see difference (only the node
names and the additional throttle group node)
>>Switching based on QEMU version would need to be the creation QEMU
>>from
>>the meta config property. Using machine or running binary version
>>would
>>mean we would automatically switch for non-Windows OSes which are not
>>version pinned by default, so that doesn't help if there would be
>>breakage.
That's why I was thinking to implement this for pve9. (based on qemu
version)
>>I'd really hope it is compatible, because for a per-VM flag,
>>for backwards-compat reasons (e.g. rolling back to a snapshot with
>>VMstate) it would need to start out as being off by default.
I think that a vmstate is not a problem, because this is only the guest
memory right ? and devices are not changing.
>>We wouldn't even need to switch to using '-blockdev' right now (still
>>good thing to do long-term wise, but if it is opt-in, we can't rely
>>on
>>all VMs having it, which is bad),
>>you could also set the node-name for the '-drive'.
Are you sure about this ? I don't have seen any documentation about
adding the node-name to drive. (and we need it for hotplug hmp
drive_add too :/ )
not even sure this could define specific name to the file nodename,
which is needed for the snapshot renaming with blockdev-reopen.
>>I.e. switching to '-blockdev' can be done separately to
>>switching to 'blockdev-*' QMP operations.
I really don't known if you can implement qmp blockdev-*, with --drive
syntax (where it could be possible to define nodename).
I known that qmp blockdev-* command accept "device" (for --drive) or
"node-name" (for --blockdev)
(BTW,switching to -blockdev is already breaking qmp proxmox backup ^_^
possibly because of the throttle-group top node, I don't remember
exactly).
I'll take time to retest live migration with differents os, restore
snapshot with state, and see if I have crash or silent data
corruptions.
> > > - if you see a way to name the block graph nodes in a
> > > deterministic
> > > fashion (i.e., have a 1:1 mapping between snapshot and block
> > > graph
> > > node name) that would be wonderful, else we'd have to find
> > > another
> > > way to improve the lookup there..
>
> 1:1 mapping with snapshot is not possible (I have tried it a lot),
> because:
> - snapshot name can be too long (blockdev name is 31 characters
> max,
> hash based on filename is difficult)
> - with external snapshot file renaming, this don't work (snap-->
> current). We can't rename a blockdev, so the mapping will drift.
>
> So, I don't think that it's possible to avoid lookup. (I really
> don't
> have idea how to manage it).
> I'm not sure it's really a problem ? it's just an extra qmp call,
> but
> it's super fast.
>>Are we sure the node-name for the drive is always stable? I.e. is the
>>block node that the guest sees inserted in the drive, always the one
>>named by the 'node-name' that was initially set when attaching the
>>drive
>>via '-blockdev' or QMP 'blockdev-add'? After all kinds of block
>>operations? Even if there are partially completed/failed block
>>operations? After live migration from a not-yet-updated node?
>>Otherwise,
No, the only stable nodename for me (in my implementation) is the top
throttle-group node. as it never change during mirroring, snaphot
rename,...
The drive node (format-node or file-node), can change (2 file-node for
live file renaming with blockdev-reopen for example , 2 format-node
switching after a mirror, ...)
>>I'd prefer always querying the node-name before doing a QMP
>>'blockdev-*'
>>command to make sure it's actually the node that the guest sees as
>>well,
>>like we currently do for 'block-export-add'.
That's the way I have done it
>>And we wouldn't even >>need
>>to set the node-names ourselves at all if always querying first.
>>Seems a
>>bit more future-proof as well.
blockdev-reopen don't work with autogenerated nodenames (block#<id>)
(not sure if it's a bug or not).
That's why I'm currently naming all of them (including backing chain
snapshots too)
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
prev parent reply other threads:[~2025-01-10 12:31 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-16 9:12 Alexandre Derumier via pve-devel
2025-01-09 14:13 ` Fabian Grünbichler
2025-01-10 7:44 ` DERUMIER, Alexandre via pve-devel
2025-01-10 9:55 ` Fiona Ebner
2025-01-10 12:30 ` DERUMIER, Alexandre via pve-devel [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mailman.215.1736512256.441.pve-devel@lists.proxmox.com \
--to=pve-devel@lists.proxmox.com \
--cc=alexandre.derumier@groupe-cyllene.com \
--cc=f.ebner@proxmox.com \
--cc=f.gruenbichler@proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox