public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "DERUMIER, Alexandre via pve-devel" <pve-devel@lists.proxmox.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>,
	"f.ebner@proxmox.com" <f.ebner@proxmox.com>,
	"f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: Fri, 10 Jan 2025 12:30:22 +0000	[thread overview]
Message-ID: <mailman.215.1736512256.441.pve-devel@lists.proxmox.com> (raw)
In-Reply-To: <ba801ed2-5a65-4ab1-81e3-122a2d522006@proxmox.com>

[-- Attachment #1: Type: message/rfc822, Size: 21680 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>, "f.ebner@proxmox.com" <f.ebner@proxmox.com>, "f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: Fri, 10 Jan 2025 12:30:22 +0000
Message-ID: <8f309dfe189379acf72db07398a37a98e8fc3550.camel@groupe-cyllene.com>

-------- Message initial --------
De: Fiona Ebner <f.ebner@proxmox.com>
À: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
f.gruenbichler@proxmox.com <f.gruenbichler@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Objet: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-
qemu] add external qcow2 snapshot support
Date: 10/01/2025 10:55:14

Am 10.01.25 um 08:44 schrieb DERUMIER, Alexandre via pve-devel:
> -------- Message initial --------
> De: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> À: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
> Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> Objet: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-
> qemu] add external qcow2 snapshot support
> Date: 09/01/2025 15:13:14
> 
> > Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat
> > am
> > 16.12.2024 10:12 CET geschrieben:
> 
> > This patch series implement qcow2 external snapshot support for
> > files
> > && lvm volumes
> > 
> > The current internal qcow2 snapshots have bad write performance
> > because no metadatas can be preallocated.
> > 
> > This is particulary visible on a shared filesystem like ocfs2 or
> > gfs2.
> > 
> > Also other bugs are freeze/lock reported by users since years on
> > snapshots delete on nfs
> > (The disk access seem to be frozen during all the delete duration)
> > 
> > This also open doors for remote snapshot export-import for storage
> > replication.
> > > 
> > > a few high level remarks:
> > > - I am not sure whether we want to/can switch over to blockdev on
> > > the
> > > fly (i.e., without some sort of opt-in phase to iron out kinks).
> > > what
> > > about upgrades with running VMs? I guess some sort of flag and
> > > per-VM
> > > switching would be a better plan..
> 
> I have tested live migration, and it's works for the small tests I
> have
> done. (I was surprised myself). I'll try to do more longer test to be
> 100% sure that they are not corruption of datas.
> 
>  on the guest side, it's transparent. on qemu side, the devices and
> pci
> plumbing is still the same, and qemu already use blockdev behind.
> 
> If needed, we could make a switch based on qemu version, or or manual
> option.

>>Yes, we need to be very careful that all the defaults/behavior would
be
>>the same to not break live-migration. A known difference is format
>>autodetection, which happens with "-drive file=" but not with
>>"-blockdev", but not relevant as we explicitly set the format.
>>Dumping
>>the QObject JSON configs of the drives might be a good heuristic to
>>check that the properties really are the same before and after the
>>switch.
I had looked manually at qdev info, and dumped the blockdevs generated
by -drive command to compare, I didn't see difference (only the node
names and the additional throttle group node)

>>Switching based on QEMU version would need to be the creation QEMU
>>from
>>the meta config property. Using machine or running binary version
>>would
>>mean we would automatically switch for non-Windows OSes which are not
>>version pinned by default, so that doesn't help if there would be
>>breakage.

That's why I was thinking to implement this for pve9. (based on qemu
version)

>>I'd really hope it is compatible, because for a per-VM flag,
>>for backwards-compat reasons (e.g. rolling back to a snapshot with
>>VMstate) it would need to start out as being off by default.

I think that a vmstate is not a problem, because this is only the guest
memory right ? and devices are not changing.


>>We wouldn't even need to switch to using '-blockdev' right now (still
>>good thing to do long-term wise, but if it is opt-in, we can't rely
>>on
>>all VMs having it, which is bad), 
>>you could also set the node-name for the '-drive'. 

Are you sure about this ? I don't have seen any documentation about
adding the node-name to drive.   (and we need it for hotplug hmp
drive_add too :/ )

not even sure this could define specific name to the file nodename,
which is needed for the snapshot renaming with blockdev-reopen.


>>I.e. switching to '-blockdev' can be done separately to
>>switching to 'blockdev-*' QMP operations.


I really don't known if you can implement qmp blockdev-*, with --drive
syntax (where it could be possible to define nodename).

I known that qmp blockdev-* command accept "device" (for --drive) or
"node-name" (for --blockdev)


(BTW,switching to -blockdev is already breaking qmp proxmox backup ^_^
possibly because of the throttle-group top node, I don't remember
exactly).



I'll take time to retest  live migration with differents os, restore 
snapshot with state, and see if I have crash or silent data
corruptions.


> > > - if you see a way to name the block graph nodes in a
> > > deterministic
> > > fashion (i.e., have a 1:1 mapping between snapshot and block
> > > graph
> > > node name) that would be wonderful, else we'd have to find
> > > another
> > > way to improve the lookup there..
> 
> 1:1 mapping with snapshot is not possible (I have tried it a lot),
> because:
>   - snapshot name can be too long (blockdev name is 31 characters
> max,
> hash based on filename is difficult)
>   - with external snapshot file renaming, this don't work  (snap-->
> current). We can't rename a blockdev, so the mapping will drift.
> 
>   So, I don't think that it's possible to avoid lookup. (I really
> don't
> have idea how to manage it).  
> I'm not sure it's really a problem ?  it's just an extra qmp call,
> but
> it's super fast.

>>Are we sure the node-name for the drive is always stable? I.e. is the
>>block node that the guest sees inserted in the drive, always the one
>>named by the 'node-name' that was initially set when attaching the
>>drive
>>via '-blockdev' or QMP 'blockdev-add'? After all kinds of block
>>operations? Even if there are partially completed/failed block
>>operations? After live migration from a not-yet-updated node?
>>Otherwise,

No, the only stable nodename for me (in my implementation) is the top
throttle-group node. as it never change during mirroring, snaphot
rename,...

The drive node (format-node or file-node), can change (2 file-node for
live file renaming with blockdev-reopen for example , 2 format-node
switching after a mirror, ...)


>>I'd prefer always querying the node-name before doing a QMP
>>'blockdev-*'
>>command to make sure it's actually the node that the guest sees as
>>well,
>>like we currently do for 'block-export-add'. 
That's the way I have done it

>>And we wouldn't even >>need
>>to set the node-names ourselves at all if always querying first.
>>Seems a
>>bit more future-proof as well.

blockdev-reopen don't work with autogenerated nodenames (block#<id>)
(not sure if it's a bug or not).
That's why I'm currently naming all of them (including backing chain
snapshots too)








[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

      reply	other threads:[~2025-01-10 12:31 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-16  9:12 Alexandre Derumier via pve-devel
2025-01-09 14:13 ` Fabian Grünbichler
2025-01-10  7:44   ` DERUMIER, Alexandre via pve-devel
2025-01-10  9:55     ` Fiona Ebner
2025-01-10 12:30       ` DERUMIER, Alexandre via pve-devel [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mailman.215.1736512256.441.pve-devel@lists.proxmox.com \
    --to=pve-devel@lists.proxmox.com \
    --cc=alexandre.derumier@groupe-cyllene.com \
    --cc=f.ebner@proxmox.com \
    --cc=f.gruenbichler@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal