From: "DERUMIER, Alexandre via pve-devel" <pve-devel@lists.proxmox.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>,
	"f.ebner@proxmox.com" <f.ebner@proxmox.com>,
	"f.gruenbichler@proxmox.com" <f.gruenbichler@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: Fri, 10 Jan 2025 12:30:22 +0000
Message-ID: <mailman.215.1736512256.441.pve-devel@lists.proxmox.com>
In-Reply-To: <ba801ed2-5a65-4ab1-81e3-122a2d522006@proxmox.com>

-------- Original message --------
From: Fiona Ebner <f.ebner@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	f.gruenbichler@proxmox.com <f.gruenbichler@proxmox.com>
Cc: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
Date: 10/01/2025 10:55:14

On 10.01.25 at 08:44, DERUMIER, Alexandre via pve-devel wrote:
> -------- Original message --------
> From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
> Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
> Subject: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support
> Date: 09/01/2025 15:13:14
> 
> > Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com>
> > wrote on 16.12.2024 10:12 CET:
> 
> > This patch series implements qcow2 external snapshot support for
> > files && LVM volumes.
> > 
> > The current internal qcow2 snapshots have bad write performance
> > because no metadata can be preallocated.
> > 
> > This is particularly visible on a shared filesystem like OCFS2 or
> > GFS2.
> > 
> > Other long-standing bugs reported by users are freezes/locks when
> > deleting snapshots on NFS (disk access seems to be frozen for the
> > whole duration of the delete).
> > 
> > This also opens the door to remote snapshot export/import for
> > storage replication.
> > > 
> > > a few high level remarks:
> > > - I am not sure whether we want to/can switch over to blockdev on
> > > the fly (i.e., without some sort of opt-in phase to iron out
> > > kinks). what about upgrades with running VMs? I guess some sort
> > > of flag and per-VM switching would be a better plan..
> 
> I have tested live migration, and it works for the small tests I
> have done (I was surprised myself). I'll try to run longer tests to
> be 100% sure that there is no data corruption.
> 
> On the guest side, it's transparent. On the QEMU side, the devices
> and PCI plumbing are still the same, and QEMU already uses blockdev
> behind the scenes.
> 
> If needed, we could make a switch based on the QEMU version, or a
> manual option.

>>Yes, we need to be very careful that all the defaults/behavior would
>>be the same to not break live-migration. A known difference is
>>format autodetection, which happens with "-drive file=" but not with
>>"-blockdev", but not relevant as we explicitly set the format.
>>Dumping the QObject JSON configs of the drives might be a good
>>heuristic to check that the properties really are the same before
>>and after the switch.
I looked manually at the qdev info and dumped the blockdevs generated
by the -drive command to compare; I didn't see any difference (only
the node names and the additional throttle-group node).
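
For reference, a QMP call along these lines can be used to dump the
block graph on both sides and diff it (reply abbreviated here, values
purely illustrative):

   -> { "execute": "query-named-block-nodes" }
   <- { "return": [ { "node-name": "#block142", "drv": "qcow2",
                      "file": "/var/lib/vz/images/100/vm-100-disk-0.qcow2",
                      "ro": false, ... },
                    ... ] }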

>>Switching based on QEMU version would need to be the creation QEMU
>>from the meta config property. Using machine or running binary
>>version would mean we would automatically switch for non-Windows
>>OSes which are not version pinned by default, so that doesn't help
>>if there would be breakage.

That's why I was thinking of implementing this for pve9 (based on the
QEMU version).
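
(I mean keying off the QEMU version recorded at creation time in the
VM config's meta property, i.e. a line roughly like the following;
version and timestamp are made up here:)

   meta: creation-qemu=9.0.2,ctime=1734340000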

>>I'd really hope it is compatible, because for a per-VM flag,
>>for backwards-compat reasons (e.g. rolling back to a snapshot with
>>VMstate) it would need to start out as being off by default.

I think the vmstate is not a problem, because it only contains the
guest memory, right? And the devices are not changing.


>>We wouldn't even need to switch to using '-blockdev' right now
>>(still a good thing to do long-term wise, but if it is opt-in, we
>>can't rely on all VMs having it, which is bad), you could also set
>>the node-name for the '-drive'.

Are you sure about this? I haven't seen any documentation about
adding a node-name to -drive (and we need it for hotplug via HMP
drive_add too :/).

I'm also not sure whether it can assign a specific name to the file
node, which is needed for the snapshot renaming with blockdev-reopen.
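
Just to make explicit what I mean: with -blockdev every node in the
chain gets its own name, e.g. (node names invented for the example):

   -blockdev driver=file,filename=/path/vm-100-disk-0.qcow2,node-name=file-drive-scsi0
   -blockdev driver=qcow2,file=file-drive-scsi0,node-name=fmt-drive-scsi0

whereas with -drive, as far as I can tell, a node-name option would at
best name the top (format) node, not the file node underneath:

   -drive file=/path/vm-100-disk-0.qcow2,format=qcow2,if=none,id=drive-scsi0,node-name=fmt-drive-scsi0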


>>I.e. switching to '-blockdev' can be done separately to
>>switching to 'blockdev-*' QMP operations.


I really don't know if you can implement the QMP blockdev-* commands
with the -drive syntax (where it may be possible to define a
node-name).

I know that the QMP blockdev-* commands accept either "device" (for
-drive) or "node-name" (for -blockdev).


(BTW, switching to -blockdev already breaks the QMP Proxmox backup
^_^, possibly because of the throttle-group top node, I don't
remember exactly.)



I'll take the time to retest live migration with different OSes and
snapshot restore with state, and see whether I get crashes or silent
data corruption.


> > > - if you see a way to name the block graph nodes in a
> > > deterministic fashion (i.e., have a 1:1 mapping between snapshot
> > > and block graph node name) that would be wonderful, else we'd
> > > have to find another way to improve the lookup there..
> 
> A 1:1 mapping with the snapshot is not possible (I have tried it a
> lot), because:
>   - the snapshot name can be too long (a blockdev node name is 31
> characters max, and a hash based on the filename is difficult)
>   - with external snapshot file renaming this doesn't work (snap -->
> current); we can't rename a blockdev, so the mapping would drift.
> 
> So I don't think it's possible to avoid the lookup (I really have no
> idea how to manage it). I'm not sure it's really a problem? It's
> just an extra QMP call, and it's super fast.

>>Are we sure the node-name for the drive is always stable? I.e. is
>>the block node that the guest sees inserted in the drive always the
>>one named by the 'node-name' that was initially set when attaching
>>the drive via '-blockdev' or QMP 'blockdev-add'? After all kinds of
>>block operations? Even if there are partially completed/failed block
>>operations? After live migration from a not-yet-updated node?
>>Otherwise,

No, the only stable node name for me (in my implementation) is the
top throttle-group node, as it never changes during mirroring,
snapshot renaming, ...

The nodes below it (format node or file node) can change (e.g. two
file nodes during live file renaming with blockdev-reopen, or
switching between two format nodes after a mirror, ...).
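
To illustrate the layout (node names invented here): the guest device
always points at the throttle filter node, and everything below it
can be swapped out:

   -object throttle-group,id=throttlegroup-drive-scsi0
   -blockdev driver=file,filename=/path/current.qcow2,node-name=file-abc123
   -blockdev driver=qcow2,file=file-abc123,node-name=fmt-abc123
   -blockdev driver=throttle,throttle-group=throttlegroup-drive-scsi0,file=fmt-abc123,node-name=drive-scsi0
   -device virtio-scsi-pci,id=scsihw0
   -device scsi-hd,bus=scsihw0.0,drive=drive-scsi0

After a mirror or a live file rename, only the file-*/fmt-* nodes
below drive-scsi0 are replaced.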


>>I'd prefer always querying the node-name before doing a QMP
>>'blockdev-*' command to make sure it's actually the node that the
>>guest sees as well, like we currently do for 'block-export-add'.
That's the way I have done it
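
I.e. roughly (reply abbreviated; which exact node comes back depends
on the setup):

   -> { "execute": "query-block" }
   <- { "return": [ { "device": "drive-scsi0",
                      "inserted": { "node-name": "fmt-abc123", ... } },
                    ... ] }

and then pass that node-name to the blockdev-* command.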

>>And we wouldn't even need to set the node-names ourselves at all if
>>always querying first. Seems a bit more future-proof as well.

blockdev-reopen doesn't work with auto-generated node names
(#block<id>) (not sure whether that's a bug or not). That's why I'm
currently naming all of them (including the backing-chain snapshot
nodes too).
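
Roughly what I mean by naming the whole chain (one file node and one
format node per image; names invented here), so that blockdev-reopen
can later retarget the overlay's file/backing references:

   -blockdev driver=file,filename=/path/snap1.qcow2,node-name=file-snap1
   -blockdev driver=qcow2,file=file-snap1,node-name=fmt-snap1
   -blockdev driver=file,filename=/path/current.qcow2,node-name=file-current
   -blockdev driver=qcow2,file=file-current,backing=fmt-snap1,node-name=fmt-current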








