public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [RFC pve-storage/qemu-server 00/10] introduce thin provisioned drives to thick LVM storage
@ 2025-10-17 11:25 Tiago Sousa via pve-devel
  2025-11-12 15:20 ` DERUMIER, Alexandre via pve-devel
       [not found] ` <1682e2358c7051877ca1dafaa2adb9a207886eef.camel@groupe-cyllene.com>
  0 siblings, 2 replies; 3+ messages in thread
From: Tiago Sousa via pve-devel @ 2025-10-17 11:25 UTC (permalink / raw)
  To: pve-devel; +Cc: Tiago Sousa

[-- Attachment #1: Type: message/rfc822, Size: 7679 bytes --]

From: Tiago Sousa <joao.sousa@eurotux.com>
To: pve-devel@lists.proxmox.com
Subject: [RFC pve-storage/qemu-server 00/10] introduce thin provisioned drives to thick LVM storage
Date: Fri, 17 Oct 2025 12:25:25 +0100
Message-ID: <20251017112539.26471-1-joao.sousa@eurotux.com>

As discussed with Alexandre Derumier, I’m sharing a prototype daemon
that monitors an extend queue and performs live VM disk resizes to
enable thin provisioning on LVM storage.

The idea is to create the disk's underlying volume smaller than its
qcow2 virtual size, with the initial size currently hardcoded to
2 GiB. This applies when a VM disk has a snapshot. A write threshold
is set on the file blockdev node so that an event only triggers when
the actual on-disk usage reaches it.
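
For reference, QEMU exposes this mechanism through the
block-set-write-threshold QMP command (and the BLOCK_WRITE_THRESHOLD
event). Below is a minimal sketch of how the threshold could be armed
from qemu-server; the node name and threshold value are made up, and
the actual call site in the series may differ:

    # Sketch only: arm a write threshold on the file node of a drive.
    # mon_cmd() is qemu-server's existing QMP helper; node name and
    # threshold are illustrative values.
    use PVE::QemuServer::Monitor qw(mon_cmd);

    my ($vmid, $node_name, $threshold) = (100, 'drive-scsi0-file', 1.7 * 1024**3);

    mon_cmd($vmid, 'block-set-write-threshold',
        'node-name' => $node_name,
        'write-threshold' => int($threshold),
    );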

To handle this, a notion of underlay is introduced. For LVM, the
underlay controls the underlying LV of the disk. For example, a
200 GiB qcow2 disk starts with a 2 GiB LV and grows incrementally as
needed. Qcow2 preallocation must be fully off, which has performance
implications.

The block write threshold is calculated from two storage config
variables, chunksize and chunk-percentage, using:
underlay_size - chunksize * chunk-percentage.
For example, if chunk-percentage = 0.3, the event fires when 30% of
chunksize remains.
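
As a concrete illustration with assumed values (chunksize = 1 GiB,
chunk-percentage = 0.3, underlay currently at 2 GiB):

    # Threshold calculation, values assumed for illustration only.
    my $chunksize        = 1 * 1024**3;   # 1 GiB
    my $chunk_percentage = 0.3;
    my $underlay_size    = 2 * 1024**3;   # current LV size

    # Event fires once writes get within 0.3 GiB of the end of the LV.
    my $threshold = $underlay_size - $chunksize * $chunk_percentage;  # ~1.7 GiB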

When qmeventd receives an event, it appends an entry to
/etc/pve/extend-queue in the form vmid:blockdev_nodename. In a
cluster, all nodes write to the same queue.
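
For clarity, the queue file would contain one entry per pending
extend, for example (VMIDs and node names here are made up):

    101:drive-scsi0-file
    102:drive-virtio0-file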

pvestord (PVE Storage Daemon) runs a 1-second cycle checking the
queue. If a request is found and the VM is local, the entry is dequeued
to avoid conflicts. It then queries the QMP socket for the VM’s
blockstats, identifies the disk, and extends the LV.

Flow: qemu-vm -> qmeventd -> /etc/pve/extend-queue <- pvestord
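
To make the cycle concrete, here is a heavily simplified sketch of the
loop (vm_is_local(), dequeue_entry() and extend_underlay() are
placeholders, not the actual functions from the patches):

    # Simplified sketch of the pvestord cycle; placeholder helpers only.
    use PVE::Tools qw(file_get_contents);
    use PVE::QemuServer::Monitor qw(mon_cmd);

    my $queue_file = '/etc/pve/extend-queue';

    while (1) {
        my @entries = split /\n/, eval { file_get_contents($queue_file) } // '';
        for my $entry (@entries) {
            my ($vmid, $node_name) = split /:/, $entry, 2;
            next if !vm_is_local($vmid);              # placeholder check

            # dequeue first so other nodes don't pick up the same entry
            dequeue_entry($queue_file, $entry);       # placeholder, needs cfs lock

            # find the disk behind the node name and its current usage
            my $stats = mon_cmd($vmid, 'query-blockstats');

            # grow the underlying LV by one chunk and re-arm the threshold
            extend_underlay($vmid, $node_name, $stats);   # placeholder
        }
        sleep 1;
    }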

So far, testing has been done manually.

Some problems and questions:
- Thin provisioning is currently hardcoded for drives that have a parent
  snapshot.
- The thin variable that is introduced in the drive config needs
  review before wider implementation.
- Consider making thin optional via snapshot prompt checkbox.
- Could eventually offer the option for all qcow2 disks on LVM.
- Re-evaluate blockdev name generation: sha256 vs reversible
  encoding (like base64) to identify drives and allow offline extends.
- Since all extend requests share the same queue, drives on different
  LVM storages must wait for their turn, even though actions on
  separate storages could run concurrently.
- qmeventd writes to the queue aren’t cluster-safe. I couldn’t find any
  primitives in the C code to lock the file via pmxcfs (like
  cfs_lock_file in Perl). Is there any function that handles this?

pve-storage:

Tiago Sousa (4):
  pvestord: setup new pvestord daemon
  storage: add extend queue handling
  lvmplugin: add thin volume support for LVM external snapshots
  plugin: lvmplugin: add underlay functions

 src/Makefile                  |   1 +
 src/PVE/Makefile              |   1 +
 src/PVE/Service/Makefile      |  10 ++
 src/PVE/Service/pvestord.pm   | 193 ++++++++++++++++++++++++++++++++++
 src/PVE/Storage.pm            | 100 +++++++++++++++++-
 src/PVE/Storage/Common.pm     |   4 +-
 src/PVE/Storage/LVMPlugin.pm  |  84 ++++++++++++---
 src/PVE/Storage/Plugin.pm     |  29 ++++-
 src/bin/Makefile              |   3 +
 src/bin/pvestord              |  24 +++++
 src/services/Makefile         |  14 +++
 src/services/pvestord.service |  15 +++
 12 files changed, 462 insertions(+), 16 deletions(-)
 create mode 100644 src/PVE/Service/Makefile
 create mode 100644 src/PVE/Service/pvestord.pm
 create mode 100755 src/bin/pvestord
 create mode 100644 src/services/Makefile
 create mode 100644 src/services/pvestord.service

qemu-server:

Tiago Sousa (4):
  qmeventd: add block write threshold event handling
  blockdev: add set write threshold
  blockdev: add query-blockstats qmp command
  blockdev: add underlay resize

 src/PVE/QemuServer.pm          | 22 ++++++++++
 src/PVE/QemuServer/Blockdev.pm | 80 ++++++++++++++++++++++++++++++++++
 src/PVE/QemuServer/Drive.pm    |  7 +++
 src/qmeventd/qmeventd.c        | 21 ++++++++-
 4 files changed, 129 insertions(+), 1 deletion(-)

pve-manager:

Tiago Sousa (1):
  services: add pvestord service

 PVE/API2/Services.pm | 1 +
 1 file changed, 1 insertion(+)

pve-cluster:

Tiago Sousa (1):
  observe extend queue

 src/PVE/Cluster.pm | 1 +
 1 file changed, 1 insertion(+)

--
2.47.3



[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [pve-devel] [RFC pve-storage/qemu-server 00/10] introduce thin provisioned drives to thick LVM storage
  2025-10-17 11:25 [pve-devel] [RFC pve-storage/qemu-server 00/10] introduce thin provisioned drives to thick LVM storage Tiago Sousa via pve-devel
@ 2025-11-12 15:20 ` DERUMIER, Alexandre via pve-devel
       [not found] ` <1682e2358c7051877ca1dafaa2adb9a207886eef.camel@groupe-cyllene.com>
  1 sibling, 0 replies; 3+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-11-12 15:20 UTC (permalink / raw)
  To: pve-devel; +Cc: DERUMIER, Alexandre

[-- Attachment #1: Type: message/rfc822, Size: 14173 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [RFC pve-storage/qemu-server 00/10] introduce thin provisioned drives to thick LVM storage
Date: Wed, 12 Nov 2025 15:20:16 +0000
Message-ID: <1682e2358c7051877ca1dafaa2adb9a207886eef.camel@groupe-cyllene.com>

Hi Tiago,
Sorry, I was super busy these last weeks.
I haven't read your code yet, but here are some first responses to
your questions:


>>Some problems and questions:
>>- Thin provisioning is currently hardcoded for drives that have a
>>parent
>>  snapshot.
>>- The thin variable that is introduced in the drive config needs
>>  review before wider implementation.
>>- Consider making thin optional via snapshot prompt checkbox.
>>- Could eventually offer the option for all qcow2 disks on LVM.


The thin option should be defined at the storage level, as the drive
config is generic for any storage.
There is already a "thin" option in the zfs storage plugin, for example.

If a user really needs some VMs with and some without thin
provisioning, they can create 2 different storages.
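
Something like this in storage.cfg, for example (the "thin" property
for the LVM plugin is hypothetical here, it doesn't exist yet):

    lvm: vmdata
        vgname vmdata
        content images

    lvm: vmdata-thin
        vgname vmdata
        content images
        thin 1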


>>- Re-evaluate blockdev name generation: sha256 vs reversible
>>  encoding (like base64) to identify drives and allow offline
>>extends.
Not possible, because the blockdev name is limited in length
(15 chars max).

I haven't read your code yet, but why do you need that?

For offline extend, we can manually compare the qcow2 size vs the LVM
size. This could be done at VM start, for example.

>>- Since all extend requests share the same queue, drives on different
>>  LVM storages must wait for their turn, even though actions on
>>  separate storages could run concurrently.

Use 1 queue per storage?


>>- qmeventd writes to the queue aren’t cluster-safe. I couldn’t find
>>any
>>  primitives in the C code to lock the file via pmxcfs (like
>>  cfs_lock_file in Perl). Is there any function that handles this?

I really don't know.





[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [pve-devel] [RFC pve-storage/qemu-server 00/10] introduce thin provisioned drives to thick LVM storage
       [not found] ` <1682e2358c7051877ca1dafaa2adb9a207886eef.camel@groupe-cyllene.com>
@ 2025-11-13 14:09   ` DERUMIER, Alexandre via pve-devel
  0 siblings, 0 replies; 3+ messages in thread
From: DERUMIER, Alexandre via pve-devel @ 2025-11-13 14:09 UTC (permalink / raw)
  To: pve-devel; +Cc: DERUMIER, Alexandre

[-- Attachment #1: Type: message/rfc822, Size: 13957 bytes --]

From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [RFC pve-storage/qemu-server 00/10] introduce thin provisioned drives to thick LVM storage
Date: Thu, 13 Nov 2025 14:09:48 +0000
Message-ID: <f85630ffdc70fecc7765be6e6cd3aba3f8fffcd4.camel@groupe-cyllene.com>

Hi,

>>- qmeventd writes to the queue aren’t cluster-safe. I couldn’t find
>>any
>>  primitives in the C code to lock the file via pmxcfs (like
>>  cfs_lock_file in Perl). Is there any function that handles this?

Thinking about that, qmeventd shouldn't write directly to /etc/pve,
because it'll hang if the machine doesn't have quorum (and we don't
want to lose notifications in case of a poweroff).

Maybe it would be better to write to a local queue file; then your
pvestord daemon could pull from the local queue and push to the
central queue with the cfs lock.

(and in parallel, pvestord is also dequeuing the top of the queue to
do the extend)
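
Roughly something like this on the pvestord side (sketch only; it
assumes the central queue file is registered/observed in pmxcfs so
that cfs_lock_file() works on it, and the local queue path is made up):

    # Sketch: merge the node-local queue into the central one under a
    # cfs lock, then truncate the local file.
    use PVE::Cluster;
    use PVE::Tools qw(file_get_contents file_set_contents);

    my $local_queue   = '/var/lib/pvestord/extend-queue.local';  # assumed path
    my $central_queue = '/etc/pve/extend-queue';

    PVE::Cluster::cfs_lock_file('extend-queue', undef, sub {
        my $local = eval { file_get_contents($local_queue) } // '';
        return if !length($local);
        my $central = eval { file_get_contents($central_queue) } // '';
        file_set_contents($central_queue, $central . $local);
        file_set_contents($local_queue, '');  # entries are now queued centrally
    });
    die $@ if $@;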


Also, about the central queue: what happens if a node is down (or
loses quorum while the VM is still running) and can't process the top
of the queue, because the VM to resize is on that node?
I haven't verified, but it would probably need some kind of TTL, or a
requeue at the bottom of the queue, so it doesn't block the other
nodes.






Another way could be to have only a local queue on each node, and a
simple cfs_lock on the storage ID to avoid parallel resizes
(so no real resize order across the cluster, but resizing is not that
slow anyway).

In case of a crash, if the VM is restarted by HA on another node, we
need to check at start whether a resize is needed by looking at the
qcow2 vs LVM size (I think we should do it by default at start in any
case).
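
For that start-time check, something along these lines could work
(sketch only; it relies on qemu-img check reporting the image end
offset, and the chunk settings are passed in by the caller):

    # Sketch: does the LV still have enough headroom beyond the qcow2
    # image end offset, or does it need to be extended by one chunk?
    use JSON;
    use PVE::Tools qw(run_command);

    sub lv_needs_extend {
        my ($lv_path, $chunksize, $chunk_percentage) = @_;

        my $json = '';
        run_command(['qemu-img', 'check', '--output=json', $lv_path],
            outfunc => sub { $json .= shift });
        my $end_offset = decode_json($json)->{'image-end-offset'};

        my $lv_size = '';
        run_command(['lvs', '--noheadings', '--units', 'b', '--nosuffix',
            '-o', 'lv_size', $lv_path],
            outfunc => sub { $lv_size .= shift });
        $lv_size =~ s/\s+//g;

        # same headroom rule as the runtime write threshold
        return ($lv_size - $end_offset) < ($chunksize * $chunk_percentage);
    }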




[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-11-13 14:25 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-17 11:25 [pve-devel] [RFC pve-storage/qemu-server 00/10] introduce thin provisioned drives to thick LVM storage Tiago Sousa via pve-devel
2025-11-12 15:20 ` DERUMIER, Alexandre via pve-devel
     [not found] ` <1682e2358c7051877ca1dafaa2adb9a207886eef.camel@groupe-cyllene.com>
2025-11-13 14:09   ` DERUMIER, Alexandre via pve-devel
