public inbox for pve-devel@lists.proxmox.com
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH SERIES storage/qemu-server/-manager] RFC : add lvmqcow2 storage support
Date: Thu, 5 Sep 2024 09:51:52 +0200 (CEST)
Message-ID: <123271268.23934.1725522712149@webmail.proxmox.com>
In-Reply-To: <mailman.412.1724670071.302.pve-devel@lists.proxmox.com>


> On 26.08.2024 13:00 CEST, Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote:
> This patch series adds support for a new lvmqcow2 storage format.
> 
> Currently, we can't do snapshots && thin provisioning on shared block devices because
> LVM thin can't share its metadata volume. I have a lot of on-prem VMware customers
> for whom this is really blocking the Proxmox migration (and they are looking at oVirt/Oracle
> virtualisation, where it works fine).
> 
> It's possible to format a block device with the qcow2 format directly, without a filesystem.
> This has been used by Red Hat RHEV/oVirt for almost 10 years in their VDSM daemon.
> 
> For thin provisioning, or to handle the extra size of snapshots, we need to be able to resize
> the LVM volume dynamically.
> The volume is increased in chunks of 1GB by default (can be changed).
> QEMU implements events to send an alert when the write usage reaches a threshold.
> (The threshold is 50% of the last chunk, so when the VM has 500MB free.)
> 
> The resize is async (around 2s), so the user needs to choose a correct chunk size && threshold
> if the storage is really fast (NVMe for example, where you can write more than 500MB in 2s).
> 
> If the resize is not fast enough, the VM will pause with an io-error.
> pvestatd looks for this error, tries to extend again if needed, and resumes the VM.

I agree with Dominik about the downsides of this approach.

We had a brief chat this morning and came up with a possible alternative that would still allow snapshots (even if thin-provisioning would be out of scope):

- allocate the volume with the full size and put a fully pre-allocated qcow2 file on it
- no need to monitor regular guest I/O, it's guaranteed that the qcow2 file can be fully written
- when creating a snapshot
-- check the actual usage of the qcow2 file
-- extend the underlying volume so that the total size is current usage + size exposed to the guest
-- create the actual (qcow2-internal) snapshot
- still no need to monitor guest I/O, the underlying volume should be big enough to overwrite all data

this would give us effectively the same semantics as thick-provisioned zvols, which also always reserve enough space at snapshot creation time to allow a full overwrite of the whole zvol. if the underlying volume cannot be extended by the required space, snapshot creation would fail.
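
to make the sizing rule concrete, here is a rough sketch of the check that would run before creating a snapshot. the function names, the 4 MiB extent size (the LVM default) and the idea of passing in the free space of the volume group are assumptions for illustration, not existing PVE code. it also ignores any additional qcow2 metadata needed for newly written clusters, which a real implementation would have to account for.

```python
EXTENT = 4 * 1024**2  # assumed LVM physical extent size (4 MiB is the LVM default)


def required_volume_size(current_usage: int, virtual_size: int,
                         extent: int = EXTENT) -> int:
    """Size (bytes) the volume needs before snapshotting.

    Rule from the proposal: current qcow2 usage + size exposed to the guest,
    rounded up to a whole number of extents.
    """
    needed = current_usage + virtual_size
    return -(-needed // extent) * extent  # integer ceiling division


def plan_snapshot(current_usage: int, virtual_size: int,
                  volume_size: int, vg_free: int) -> int:
    """Return how many bytes the volume must grow by, or raise if that
    much space is not available (snapshot creation would then fail)."""
    target = required_volume_size(current_usage, virtual_size)
    grow = max(0, target - volume_size)
    if grow > vg_free:
        raise RuntimeError(
            f"cannot create snapshot: need {grow} more bytes, "
            f"only {vg_free} free"
        )
    return grow
```

for example, a 32 GiB disk with 10 GiB currently used in the qcow2 file would have to be extended by 10 GiB (to 42 GiB in total) before the snapshot is taken.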

some open questions:
- do we actually get enough information about space usage out of the qcow2 file (I think so, but haven't checked in detail)
- is there a way to compact/shrink either when removing snapshots, or as (potentially expensive) standalone action (worst case, compact by copying the whole disk?)
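
regarding the first question, one candidate (unverified, as said above) is the `image-end-offset` field that `qemu-img check --output=json <volume>` reports, i.e. the highest offset in use inside the image. a minimal sketch of extracting it, assuming that field is a suitable measure of "current usage"; the helper name and the sample values are made up:

```python
import json


def used_bytes(check_json: str) -> int:
    """Extract the highest in-use offset from `qemu-img check --output=json`
    output. Treating this as the qcow2 'current usage' is an assumption."""
    data = json.loads(check_json)
    try:
        return int(data["image-end-offset"])
    except KeyError:
        raise ValueError("no image-end-offset in qemu-img check output") from None


# hypothetical sample output, trimmed to a few fields
sample = """
{
    "image-end-offset": 10737418240,
    "total-clusters": 524288,
    "allocated-clusters": 163840,
    "check-errors": 0
}
"""
print(used_bytes(sample))
```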

another, less involved approach would be to over-allocate the volume to provide a fixed, limited amount of slack for snapshots (e.g., "allocate 50% extra space for snapshots" when creating a guest volume) - but that has all the usual downsides of thin-provisioning (the guest is lied to about the disk size, and can run into weird error states when space runs out) and is less flexible.
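
for comparison, the fixed-slack variant boils down to a one-time calculation at volume creation. again just a sketch: the function name, the `slack_percent` parameter and the 4 MiB extent size are illustrative assumptions.

```python
EXTENT = 4 * 1024**2  # assumed LVM physical extent size (4 MiB is the LVM default)


def volume_size_with_slack(virtual_size: int, slack_percent: int = 50,
                           extent: int = EXTENT) -> int:
    """Size (bytes) to allocate for a guest disk plus a fixed snapshot slack,
    rounded up to whole extents."""
    if slack_percent < 0:
        raise ValueError("slack_percent must not be negative")
    total = virtual_size * (100 + slack_percent)
    total = -(-total // 100)              # ceiling division by 100
    return -(-total // extent) * extent   # round up to whole extents
```

with the 50% example from above, a 32 GiB guest disk would get a 48 GiB volume, and once snapshots have consumed the extra 16 GiB, further writes fail.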

what do you think about the above approaches?




