* [pve-devel] [RFC storage/qemu-server] Thin provisioning on LVM
@ 2025-07-25 10:00 Joao Sousa via pve-devel
2025-07-25 14:46 ` Tiago Sousa via pve-devel
0 siblings, 1 reply; 7+ messages in thread
From: Joao Sousa via pve-devel @ 2025-07-25 10:00 UTC (permalink / raw)
To: Proxmox VE development discussion; +Cc: Joao Sousa
[-- Attachment #1: Type: message/rfc822, Size: 6415 bytes --]
From: Joao Sousa <joao.sousa@eurotux.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: [RFC storage/qemu-server] Thin provisioning on LVM
Date: Fri, 25 Jul 2025 11:00:40 +0100
Message-ID: <1b1df67f-4c6a-4cf5-942f-b2ed752506a6@eurotux.com>
Hi,
Following up on the earlier discussion with Alexandre, this is a proposal
for an architecture that enables thin-provisioned LVs on LVM. The idea is
to implement a daemon that processes LV extend requests from a queue.
We considered two possible implementations for the queue and the daemon:
1. One queue and daemon per node. This approach increases complexity,
particularly for live migrations and node failures. If a node fails,
other nodes would need to "steal" pending requests from the failed
node’s queue. It also introduces challenges in preserving the execution
order of extend operations, since multiple daemons would compete for a
storage lock without a guaranteed order.
2. A centralized queue in /etc/pve and one daemon per node. Each daemon
would check the first entry in the queue and process the extend request
only if the target volume is local to that node. This approach is
simpler and easier to manage. However, we’d need to ensure proper
locking when writing to the queue. Is there a C-based alternative to
cfs_lock_file that we can use to coordinate writes to the queue from
qmeventd and pvestatd?
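For illustration only, one way to serialize queue writers from C is a
lock directory created with mkdir(), which either succeeds or fails with
EEXIST. The sketch below assumes that mkdir() is atomic on the filesystem
that holds the lock. The helper names, the lock path and the one second
timeout are hypothetical; this is not an existing pmxcfs or qmeventd API.

```c
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Take the lock by creating lock_dir, retrying every 10 ms until
 * timeout_ms has elapsed. Returns 0 on success, -1 on timeout or error. */
static int queue_lock(const char *lock_dir, int timeout_ms) {
    for (int waited = 0;; waited += 10) {
        if (mkdir(lock_dir, 0700) == 0) {
            return 0;
        }
        if (errno != EEXIST || waited >= timeout_ms) {
            return -1;
        }
        poll(NULL, 0, 10); /* no descriptors: this only sleeps for 10 ms */
    }
}

/* Release the lock by removing the lock directory. */
static int queue_unlock(const char *lock_dir) {
    return rmdir(lock_dir);
}

/* Append one entry (one line) to the queue file while holding the lock.
 * Returns 0 on success, -1 if the lock or the write failed. */
int queue_append_locked(const char *queue_path, const char *lock_dir, const char *entry) {
    if (queue_lock(lock_dir, 1000) != 0) {
        return -1;
    }
    int rc = -1;
    FILE *fp = fopen(queue_path, "a");
    if (fp != NULL) {
        rc = (fprintf(fp, "%s\n", entry) < 0) ? -1 : 0;
        if (fclose(fp) != 0) {
            rc = -1;
        }
    }
    queue_unlock(lock_dir);
    return rc;
}
```

A writer that crashes while holding the lock leaves the directory behind,
so a real implementation would also need some form of lock expiry.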
For both implementations, we need to configure a write threshold for
each VM's block devices. When this threshold is reached, it should
trigger an event that qmeventd catches. As a fallback, if a VM is locked
due to an I/O error, pvestatd should also submit an extend request. This
one should be prioritized by placing it at the front of the queue.
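The two submission paths could be sketched with a small in-memory ring
buffer: regular threshold events are appended at the back, while the
I/O-error fallback is inserted at the front. The struct, the function
names and the fixed capacity are hypothetical and only illustrate the
ordering; the on-disk queue format is not defined in this proposal.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 64

struct extend_queue {
    const char *items[QUEUE_CAP];
    size_t head; /* index of the first (next to be processed) entry */
    size_t len;  /* number of queued entries */
};

/* Normal path (threshold event seen by qmeventd): append at the back. */
bool queue_push_back(struct extend_queue *q, const char *volume) {
    if (q->len == QUEUE_CAP) {
        return false;
    }
    q->items[(q->head + q->len) % QUEUE_CAP] = volume;
    q->len++;
    return true;
}

/* Fallback path (VM already paused on an I/O error): jump the queue. */
bool queue_push_front(struct extend_queue *q, const char *volume) {
    if (q->len == QUEUE_CAP) {
        return false;
    }
    q->head = (q->head + QUEUE_CAP - 1) % QUEUE_CAP;
    q->items[q->head] = volume;
    q->len++;
    return true;
}

/* The daemon takes the first entry, or NULL if the queue is empty. */
const char *queue_pop_front(struct extend_queue *q) {
    if (q->len == 0) {
        return NULL;
    }
    const char *volume = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->len--;
    return volume;
}
```

With two regular requests queued, a later push_front() entry is the one
returned by the next queue_pop_front() call.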
The write threshold must be applied to the top node of the block device
chain (drive-$drive_id) during the QemuServer::Blockdev::attach function
when the VM starts. It should also be updated each time the volume is
extended, so the daemon must reset it accordingly.
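As a rough sketch of how the threshold could follow the LV size, the
function below uses the same formula as compute_write_threshold in patch
1/2 of this thread: LV size minus the unused part of the last extend
chunk. The function signature and the integer percentage parameter are
assumptions made for this example.

```c
#include <stdint.h>

/* Returns the byte offset at which the threshold event should fire.
 * alert_percent is the share of the last chunk that may be used before
 * the event fires (50 means: fire once half of the chunk is written). */
uint64_t compute_write_threshold(uint64_t lv_size, uint64_t chunk_size, unsigned alert_percent) {
    if (alert_percent > 100) {
        alert_percent = 100;
    }
    uint64_t headroom = chunk_size * (100 - alert_percent) / 100;
    if (headroom >= lv_size) {
        /* QEMU treats a threshold of 0 as "disabled", so callers need
         * to handle an LV smaller than the headroom separately. */
        return 0;
    }
    return lv_size - headroom;
}
```

For a 10 GiB LV with 1 GiB chunks and 50 percent, this yields 9.5 GiB.
After the LV is extended to 11 GiB, the daemon would recompute the value
(10.5 GiB) and set it again.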
Here’s a simplified flow of the architecture:
qemu -> qmeventd -> extend_queue <- storage_monitor_daemon
     -> pvestatd
I’m currently implementing the write threshold in the attach function
but running into issues with debugging. Are there any recommended
methods or tools for debugging qemu-server functions? I’m not seeing any
relevant logs in syslog.
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [RFC storage/qemu-server] Thin provisioning on LVM
2025-07-25 10:00 [pve-devel] [RFC storage/qemu-server] Thin provisioning on LVM Joao Sousa via pve-devel
@ 2025-07-25 14:46 ` Tiago Sousa via pve-devel
2025-08-02 16:42 ` [pve-devel] [PATCH qemu-server 0/2] set write threshold logic Tiago Sousa via pve-devel
[not found] ` <20250802164240.21751-1-joao.sousa@eurotux.com>
0 siblings, 2 replies; 7+ messages in thread
From: Tiago Sousa via pve-devel @ 2025-07-25 14:46 UTC (permalink / raw)
To: pve-devel; +Cc: Tiago Sousa
[-- Attachment #1: Type: message/rfc822, Size: 4590 bytes --]
From: Tiago Sousa <joao.sousa@eurotux.com>
To: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [RFC storage/qemu-server] Thin provisioning on LVM
Date: Fri, 25 Jul 2025 15:46:41 +0100
Message-ID: <6045867d-4fd6-4e29-b462-e37bfaba583c@eurotux.com>
Small correction: the write threshold should be applied to the file
node, since that is the node whose wr_highest_offset reflects the
writes to the underlying volume.
* [pve-devel] [PATCH qemu-server 0/2] set write threshold logic
2025-07-25 14:46 ` Tiago Sousa via pve-devel
@ 2025-08-02 16:42 ` Tiago Sousa via pve-devel
2025-08-04 9:14 ` Fiona Ebner
[not found] ` <20250802164240.21751-1-joao.sousa@eurotux.com>
1 sibling, 1 reply; 7+ messages in thread
From: Tiago Sousa via pve-devel @ 2025-08-02 16:42 UTC (permalink / raw)
To: pve-devel; +Cc: Tiago Sousa
[-- Attachment #1: Type: message/rfc822, Size: 5300 bytes --]
From: Tiago Sousa <joao.sousa@eurotux.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH qemu-server 0/2] set write threshold logic
Date: Sat, 2 Aug 2025 17:42:38 +0100
Message-ID: <20250802164240.21751-1-joao.sousa@eurotux.com>
Hello,
I'm looking for some assistance in understanding why the block_write_threshold
is not being set as expected.
When the VM starts and I run the query-named-block-nodes command, the
write_threshold value always appears as the default 0. However, if I set it
manually through the QMP socket after startup, the threshold is applied
correctly.
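For reference, the manual step corresponds to a QMP request of the
following shape. The helper below only formats the JSON text; its name
and the node name used in the usage note are placeholders, and it
assumes the node name needs no JSON escaping.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Writes the block-set-write-threshold request into buf.
 * Returns the length of the request, or -1 if buf is too small. */
int format_set_write_threshold(char *buf, size_t len, const char *node_name, uint64_t threshold) {
    int n = snprintf(
        buf, len,
        "{\"execute\": \"block-set-write-threshold\", "
        "\"arguments\": {\"node-name\": \"%s\", \"write-threshold\": %" PRIu64 "}}",
        node_name, threshold
    );
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Calling it with a node name such as "file-node0" and the threshold in
bytes gives the text to send over the QMP socket once the capabilities
negotiation is done.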
Is the attach function not the appropriate place to set this value? If not,
could you clarify the correct point in the VM startup sequence to set it? I
had assumed it would be applied during blockdev_add, but that doesn't seem
to be the case.
Any guidance would be appreciated.
Tiago Sousa (2):
blockdev: add set write threshold
qmeventd: add block write threshold event handling
src/PVE/QemuServer/Blockdev.pm | 52 ++++++++++++++++++++++++++++++++++
src/qmeventd/qmeventd.c | 19 +++++++++++++
2 files changed, 71 insertions(+)
--
2.39.5
* [pve-devel] [PATCH qemu-server 1/2] blockdev: add set write threshold
[not found] ` <20250802164240.21751-1-joao.sousa@eurotux.com>
@ 2025-08-02 16:42 ` Tiago Sousa via pve-devel
2025-08-02 16:42 ` [pve-devel] [PATCH qemu-server 2/2] qmeventd: add block write threshold event handling Tiago Sousa via pve-devel
1 sibling, 0 replies; 7+ messages in thread
From: Tiago Sousa via pve-devel @ 2025-08-02 16:42 UTC (permalink / raw)
To: pve-devel; +Cc: Tiago Sousa
[-- Attachment #1: Type: message/rfc822, Size: 6815 bytes --]
From: Tiago Sousa <joao.sousa@eurotux.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH qemu-server 1/2] blockdev: add set write threshold
Date: Sat, 2 Aug 2025 17:42:39 +0100
Message-ID: <20250802164240.21751-2-joao.sousa@eurotux.com>
Signed-off-by: Tiago Sousa <joao.sousa@eurotux.com>
---
src/PVE/QemuServer/Blockdev.pm | 52 ++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/src/PVE/QemuServer/Blockdev.pm b/src/PVE/QemuServer/Blockdev.pm
index 0cc4a0f6..a5d7a3b9 100644
--- a/src/PVE/QemuServer/Blockdev.pm
+++ b/src/PVE/QemuServer/Blockdev.pm
@@ -567,9 +567,61 @@ sub attach {
         die $err;
     }
+    set_write_threshold($storecfg, $vmid, $drive, $options);
+
     return $blockdev->{'node-name'};
 }
+sub block_set_write_threshold {
+    my ($vmid, $nodename, $threshold) = @_;
+
+    print "set threshold $nodename $threshold\n";
+
+    PVE::QemuServer::mon_cmd(
+        $vmid,
+        "block-set-write-threshold",
+        'node-name' => $nodename,
+        'write-threshold' => int($threshold),
+    );
+}
+
+sub compute_write_threshold {
+    my ($storecfg, $volid) = @_;
+    my $lv_size = PVE::Storage::volume_size_info($storecfg, $volid, 5);
+
+    # FIX: change these vars to config inputs
+    my $chunksize = 1024 * 1024 * 1024; # 1 GiB
+    my $alert_chunk_percentage = 0.5; # alert once this fraction of the last chunk is used
+
+    my $write_threshold = $lv_size - $chunksize * (1 - $alert_chunk_percentage);
+
+    return $write_threshold;
+}
+
+sub set_write_threshold {
+    my ($storecfg, $vmid, $drive, $options) = @_;
+
+    my $volid = $drive->{'file'};
+    my ($storeid) = PVE::Storage::parse_volume_id($volid);
+    my $support_qemu_snapshots = PVE::Storage::volume_qemu_snapshot_method($storecfg, $volid);
+    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
+
+    # setting the write threshold is only supported for LVM storage using
+    # qcow2 + external snapshots
+    return if $scfg->{type} ne 'lvm' || $support_qemu_snapshots ne 'mixed';
+
+    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
+    my $parentid = $snapshots->{'current'}->{parent};
+    # for now only set write_threshold for volumes that have snapshots
+    if ($parentid) {
+        my $drive_id = PVE::QemuServer::Drive::get_drive_id($drive);
+        my $nodename = get_node_name('file', $drive_id, $drive->{file}, $options);
+        my $write_threshold = compute_write_threshold($storecfg, $volid);
+
+        block_set_write_threshold($vmid, $nodename, $write_threshold);
+    }
+}
+
=pod
=head3 detach
--
2.39.5
* [pve-devel] [PATCH qemu-server 2/2] qmeventd: add block write threshold event handling
[not found] ` <20250802164240.21751-1-joao.sousa@eurotux.com>
2025-08-02 16:42 ` [pve-devel] [PATCH qemu-server 1/2] blockdev: add set write threshold Tiago Sousa via pve-devel
@ 2025-08-02 16:42 ` Tiago Sousa via pve-devel
1 sibling, 0 replies; 7+ messages in thread
From: Tiago Sousa via pve-devel @ 2025-08-02 16:42 UTC (permalink / raw)
To: pve-devel; +Cc: Tiago Sousa
[-- Attachment #1: Type: message/rfc822, Size: 5104 bytes --]
From: Tiago Sousa <joao.sousa@eurotux.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH qemu-server 2/2] qmeventd: add block write threshold event handling
Date: Sat, 2 Aug 2025 17:42:40 +0100
Message-ID: <20250802164240.21751-3-joao.sousa@eurotux.com>
Signed-off-by: Tiago Sousa <joao.sousa@eurotux.com>
---
src/qmeventd/qmeventd.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/src/qmeventd/qmeventd.c b/src/qmeventd/qmeventd.c
index 1d9eb74a..616b6521 100644
--- a/src/qmeventd/qmeventd.c
+++ b/src/qmeventd/qmeventd.c
@@ -209,6 +209,25 @@ void handle_qmp_event(struct Client *client, struct json_object *obj) {
         // check if a backup is running and kill QEMU process if not
         terminate_check(client);
+    } else if (!strcmp(json_object_get_string(event), "BLOCK_WRITE_THRESHOLD")) {
+        struct json_object *data;
+        struct json_object *nodename;
+        if (json_object_object_get_ex(obj, "data", &data) &&
+            json_object_object_get_ex(data, "node-name", &nodename)) {
+
+            // needs concurrency control
+            char extend_queue_path[] = "/etc/pve/extend_queue";
+            FILE *p_extend_queue = fopen(extend_queue_path, "a");
+            if (p_extend_queue == NULL) {
+                VERBOSE_PRINT(
+                    "%s: Couldn't open extend queue file %s\n", client->qemu.vmid, extend_queue_path
+                );
+            } else {
+                const char *nodename_string = json_object_get_string(nodename);
+                fprintf(p_extend_queue, "%s\n", nodename_string);
+                fclose(p_extend_queue);
+            }
+        }
     }
 }
--
2.39.5
* Re: [pve-devel] [PATCH qemu-server 0/2] set write threshold logic
2025-08-02 16:42 ` [pve-devel] [PATCH qemu-server 0/2] set write threshold logic Tiago Sousa via pve-devel
@ 2025-08-04 9:14 ` Fiona Ebner
2025-08-11 20:34 ` Joao Sousa via pve-devel
0 siblings, 1 reply; 7+ messages in thread
From: Fiona Ebner @ 2025-08-04 9:14 UTC (permalink / raw)
To: Proxmox VE development discussion
Am 02.08.25 um 6:42 PM schrieb Tiago Sousa via pve-devel:
>
> Hello,
>
> I'm looking for some assistance in understanding why the block_write_threshold
> is not being set as expected.
>
> When the VM starts and I run the query-named-block-nodes command, the
> write_threshold value always appears as the default 0. However, if I set it
> manually through the QMP socket after startup, the threshold is applied
> correctly.
>
> Is the attach function not the appropriate place to set this value? If not,
> could you clarify the correct point in the VM startup sequence to set it? I
> had assumed it would be applied during blockdev_add, but that doesn't seem
> to be the case.
>
> Any guidance would be appreciated.
Hi,
blockdev_add()/Blockdev::attach() are only called for hot-plugged disks,
not disks already present at start-up time. You either need to set the
threshold as part of the blockdev options in the generate_*_blockdev()
functions (if that is possible, would be preferred) or issue the QMP
commands for the initially present disks right after VM start (in
vm_start_nolock(), we already do something similar for ballooning).
Best Regards,
Fiona
* Re: [pve-devel] [PATCH qemu-server 0/2] set write threshold logic
2025-08-04 9:14 ` Fiona Ebner
@ 2025-08-11 20:34 ` Joao Sousa via pve-devel
0 siblings, 0 replies; 7+ messages in thread
From: Joao Sousa via pve-devel @ 2025-08-11 20:34 UTC (permalink / raw)
To: Fiona Ebner, Proxmox VE development discussion; +Cc: Joao Sousa
[-- Attachment #1: Type: message/rfc822, Size: 4986 bytes --]
From: Joao Sousa <joao.sousa@eurotux.com>
To: Fiona Ebner <f.ebner@proxmox.com>, Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server 0/2] set write threshold logic
Date: Mon, 11 Aug 2025 21:34:34 +0100
Message-ID: <65e2f286-320a-435c-9846-24aad68f9db7@eurotux.com>
On 8/4/25 10:14 AM, Fiona Ebner wrote:
> blockdev_add()/Blockdev::attach() are only called for hot-plugged disks,
> not disks already present at start-up time. You either need to set the
> threshold as part of the blockdev options in the generate_*_blockdev()
> functions (if that is possible, would be preferred) or issue the QMP
> commands for the initially present disks right after VM start (in
> vm_start_nolock(), we already do something similar for ballooning).
AFAIK the write threshold cannot be set when the blockdev is created, at
least for now. As you suggested, I now loop over every volume in
vm_start_nolock(), and the threshold is set correctly.
Thanks!
Best regards,
Tiago