From: Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com>
To: pve-devel@lists.proxmox.com
Cc: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
Subject: [pve-devel] [PATCH qemu-server 10/10] RFC: add multiple iothreads support
Date: Wed,  2 Jul 2025 16:49:00 +0200
Message-ID: <mailman.816.1751467786.395.pve-devel@lists.proxmox.com>
In-Reply-To: <20250702144900.3963405-1-alexandre.derumier@groupe-cyllene.com>


From: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH qemu-server 10/10] RFC: add multiple iothreads support
Date: Wed,  2 Jul 2025 16:49:00 +0200
Message-ID: <20250702144900.3963405-11-alexandre.derumier@groupe-cyllene.com>

This adds support for multiple iothreads per disk.

Iothreads are defined globally at VM start and are shared across disks
that have iothread enabled.

Iothreads are separate threads from the VM vCPUs and can optionally be
pinned to host CPU cores (not yet implemented).

The number of iothreads can't exceed the number of host cores
(otherwise performance will decrease because of context switches).

Not implemented: an option to specify a specific iothread mapping per disk.

Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
---
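
Note for reviewers: a minimal standalone sketch of the sharing model
described above, not the patch code itself. The helper names
build_iothread_objects() and build_vq_mapping() are hypothetical and only
mirror the two loops in the diff. It uses the core JSON::PP module instead
of the to_json() helper used in qemu-server, and the device hash is reduced
to a few illustrative keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use JSON::PP;

# Canonical output sorts keys, similar to the cfg2cmd expected output.
my $json = JSON::PP->new->canonical(1);

# Hypothetical helper: one '-object' definition per global iothread,
# named iothread0 .. iothread(N-1).
sub build_iothread_objects {
    my ($count) = @_;
    my @objects;
    for my $i (0 .. $count - 1) {
        push @objects, { 'qom-type' => 'iothread', id => "iothread$i" };
    }
    return \@objects;
}

# Hypothetical helper: every device with iothread enabled gets the full
# list of global iothreads as its 'iothread-vq-mapping'.
sub build_vq_mapping {
    my ($count) = @_;
    my @mapping;
    for my $i (0 .. $count - 1) {
        push @mapping, { iothread => "iothread$i" };
    }
    return \@mapping;
}

my $count = 4;    # as in "iothreads: 4" from the test config

print "-object '", $json->encode($_), "'\n" for @{ build_iothread_objects($count) };

# Reduced device definition, for illustration only.
my $device = {
    driver => 'virtio-blk-pci',
    id => 'virtio0',
    'iothread-vq-mapping' => build_vq_mapping($count),
};
print "-device '", $json->encode($device), "'\n";
```

Disks without iothread=1 get no mapping at all, which is what the scsi1 and
virtio1 entries in the new cfg2cmd test exercise.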
 src/PVE/QemuServer.pm               | 22 ++++++++++-
 src/PVE/QemuServer/DriveDevice.pm   | 60 ++++++++++++++++++++++++++++-
 src/test/cfg2cmd/iothreads.conf     |  7 ++++
 src/test/cfg2cmd/iothreads.conf.cmd | 45 ++++++++++++++++++++++
 4 files changed, 131 insertions(+), 3 deletions(-)
 create mode 100644 src/test/cfg2cmd/iothreads.conf
 create mode 100644 src/test/cfg2cmd/iothreads.conf.cmd

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index 66712672..1da278a3 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -73,7 +73,7 @@ use PVE::QemuServer::Drive qw(
     print_drive
     storage_allows_io_uring_default
 );
-use PVE::QemuServer::DriveDevice qw(get_drivedevice_controller get_drivedevice get_drivedevice_iothread scsihw_infos);
+use PVE::QemuServer::DriveDevice qw(get_drivedevice_controller get_drivedevice get_drivedevice_iothread parse_iothreads scsihw_infos);
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::QemuServer::MetaInfo;
@@ -313,6 +313,16 @@ my $confdesc = {
         maximum => 262144,
         default => 'cgroup v1: 1024, cgroup v2: 100',
     },
+    iothreads => {
+        optional => 1,
+        type => 'string',
+        verbose_description =>
+	    "Create multiple iothreads and share them between disks where iothread is enabled. "
+	    . "Iothreads can be optionally pinned to specific host cpu. (Ideally on different "
+	    . "host cores than the vm vcpus)",
+        description => "Multiple iothreads configuration.",
+        format => $PVE::QemuServer::DriveDevice::iothreads_fmt,
+    },
     memory => {
         optional => 1,
         type => 'string',
@@ -3452,6 +3462,16 @@ sub config_to_command {
         push @$devices, '-iscsi', "initiator-name=$initiator";
     }
 
+    if ($conf->{iothreads}) {
+	my $iothreads = parse_iothreads($conf->{iothreads});
+	die "MAX $allowed_vcpus iothreads allowed per VM on this node\n" if ($allowed_vcpus < $iothreads->{iothreads});
+
+	for (my $i = 0; $i < $iothreads->{iothreads}; $i++) {
+	    my $iothread = {'qom-type' => 'iothread', id => "iothread$i"};
+	    push @$cmd, '-object', to_json($iothread, { canonical => 1 });
+	}
+    }
+
     PVE::QemuConfig->foreach_volume(
         $conf,
         sub {
diff --git a/src/PVE/QemuServer/DriveDevice.pm b/src/PVE/QemuServer/DriveDevice.pm
index 7ed06144..0657b714 100644
--- a/src/PVE/QemuServer/DriveDevice.pm
+++ b/src/PVE/QemuServer/DriveDevice.pm
@@ -5,6 +5,7 @@ use warnings;
 
 use URI::Escape;
 
+use PVE::JSONSchema qw(parse_property_string);
 use PVE::QemuServer::Drive qw (drive_is_cdrom);
 use PVE::QemuServer::Helpers qw(kvm_user_version min_version);
 use PVE::QemuServer::Machine;
@@ -16,9 +17,44 @@ our @EXPORT_OK = qw(
     get_drivedevice
     get_drivedevice_controller
     get_drivedevice_iothread
+    parse_iothreads
     scsihw_infos
 );
 
+our $iothreads_fmt = {
+    iothreads => {
+        optional => 1,
+        type => 'integer',
+	default_key => 1,
+        description => "Number of iothreads.",
+        minimum => 1,
+        default => 0,
+    },
+    affinity => {
+        type => 'string',
+        format => 'pve-cpuset',
+        format_description => "id[-id];...",
+        description =>
+            "List of host cores used to execute iothreads, for example: 0,5,8-11",
+        optional => 1,
+    },
+};
+
+sub parse_iothreads {
+    my ($data) = @_;
+
+    my $res = eval { parse_property_string($iothreads_fmt, $data) };
+    warn $@ if $@;
+    return $res;
+}
+
+sub print_iothreads {
+    my ($iothreads) = @_;
+    return PVE::JSONSchema::print_property_string($iothreads, $iothreads_fmt);
+}
+
+PVE::JSONSchema::register_format("pve-qm-iothreads", $iothreads_fmt);
+
 sub scsihw_infos {
     my ($conf, $drive) = @_;
 
@@ -41,6 +77,23 @@ sub scsihw_infos {
     return ($maxdev, $controller, $controller_prefix);
 }
 
+my sub add_iothread_mapping {
+    my ($conf, $device, $iothread_id) = @_;
+
+    if($conf->{iothreads}) {
+	my $iothreads_mapping = [];
+	#if multiple iothreads are defined, we share them all by default
+	#improve me: add a manual mapping options for advanced users ?
+	my $iothreads = parse_iothreads($conf->{iothreads});
+	for (my $i = 0; $i < $iothreads->{iothreads}; $i++) {
+	    push @$iothreads_mapping, { iothread => "iothread$i" };
+	}
+	$device->{'iothread-vq-mapping'} = $iothreads_mapping;
+    } else {
+	$device->{iothread} = $iothread_id;
+    }
+}
+
 sub get_drivedevice {
     my ($storecfg, $conf, $vmid, $drive, $bridges, $arch, $machine_type) = @_;
 
@@ -59,7 +112,7 @@ sub get_drivedevice {
 	if (!min_version($machine_version, 10, 0) || $drive->{file} ne 'none') {
 	    $device->{drive} = "drive-$drive_id";
 	}
-	$device->{iothread} = "iothread-$drive_id" if $drive->{iothread};
+	add_iothread_mapping($conf, $device, "iothread-$drive_id") if $drive->{iothread};
     } elsif ($drive->{interface} eq 'scsi') {
 
 	my ($maxdev, $controller, $controller_prefix) = scsihw_infos($conf, $drive);
@@ -182,7 +235,7 @@ sub get_drivedevice_controller {
 	    && $conf->{scsihw} eq "virtio-scsi-single"
 	    && $drive->{iothread}
 	) {
-	    $device->{iothread} = "iothread-$controllerid";
+	    add_iothread_mapping($conf, $device, "iothread-$controllerid");
 	}
 
 	my $queues = '';
@@ -213,6 +266,9 @@ sub get_drivedevice_controller {
 sub get_drivedevice_iothread {
     my ($conf, $drive) = @_;
 
+    #we don't generate single iothread by disk if multiple iothreads are defined
+    return if $conf->{iothreads};
+
     my $drive_id = PVE::QemuServer::Drive::get_drive_id($drive);
 
     if ($drive->{interface} eq 'virtio') {
diff --git a/src/test/cfg2cmd/iothreads.conf b/src/test/cfg2cmd/iothreads.conf
new file mode 100644
index 00000000..f803499e
--- /dev/null
+++ b/src/test/cfg2cmd/iothreads.conf
@@ -0,0 +1,7 @@
+# TEST: global multiple iothreads with disk with or without iothread enabled
+iothreads: 4,affinity=2-6
+scsihw: virtio-scsi-single
+scsi0: local:8006/vm-8006-disk-0.raw,size=104858K,iothread=1
+scsi1: local:8006/vm-8006-disk-1.raw,size=104858K
+virtio0: local:8006/vm-8006-disk-3.raw,size=104858K,iothread=1
+virtio1: local:8006/vm-8006-disk-4.raw,size=104858K
diff --git a/src/test/cfg2cmd/iothreads.conf.cmd b/src/test/cfg2cmd/iothreads.conf.cmd
new file mode 100644
index 00000000..39bec041
--- /dev/null
+++ b/src/test/cfg2cmd/iothreads.conf.cmd
@@ -0,0 +1,45 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'vm8006,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smp '1,sockets=1,cores=1,maxcpus=1' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 512 \
+  -object '{"id":"iothread0","qom-type":"iothread"}' \
+  -object '{"id":"iothread1","qom-type":"iothread"}' \
+  -object '{"id":"iothread2","qom-type":"iothread"}' \
+  -object '{"id":"iothread3","qom-type":"iothread"}' \
+  -object '{"id":"throttle-drive-scsi0","limits":{},"qom-type":"throttle-group"}' \
+  -object '{"id":"throttle-drive-scsi1","limits":{},"qom-type":"throttle-group"}' \
+  -object '{"id":"throttle-drive-virtio0","limits":{},"qom-type":"throttle-group"}' \
+  -object '{"id":"throttle-drive-virtio1","limits":{},"qom-type":"throttle-group"}' \
+  -global 'PIIX4_PM.disable_s3=1' \
+  -global 'PIIX4_PM.disable_s4=1' \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -device '{"addr":"0x1","bus":"pci.3","driver":"virtio-scsi-pci","id":"virtioscsi0","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"},{"iothread":"iothread2"},{"iothread":"iothread3"}]}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-0.raw","node-name":"e3b2553803d55d43b9986a0aac3e9a7","read-only":false},"node-name":"f3b2553803d55d43b9986a0aac3e9a7","read-only":false},"node-name":"drive-scsi0","throttle-group":"throttle-drive-scsi0"}' \
+  -device '{"bus":"virtioscsi0.0","channel":0,"drive":"drive-scsi0","driver":"scsi-hd","id":"scsi0","lun":0,"scsi-id":0,"write-cache":"on"}' \
+  -device '{"addr":"0x2","bus":"pci.3","driver":"virtio-scsi-pci","id":"virtioscsi1"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-1.raw","node-name":"e08707d013893852b3d4d42301a4298","read-only":false},"node-name":"f08707d013893852b3d4d42301a4298","read-only":false},"node-name":"drive-scsi1","throttle-group":"throttle-drive-scsi1"}' \
+  -device '{"bus":"virtioscsi1.0","channel":0,"drive":"drive-scsi1","driver":"scsi-hd","id":"scsi1","lun":1,"scsi-id":0,"write-cache":"on"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-3.raw","node-name":"ee025d619f851351566850c50231c52","read-only":false},"node-name":"fe025d619f851351566850c50231c52","read-only":false},"node-name":"drive-virtio0","throttle-group":"throttle-drive-virtio0"}' \
+  -device '{"addr":"0xa","bus":"pci.0","drive":"drive-virtio0","driver":"virtio-blk-pci","id":"virtio0","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"},{"iothread":"iothread2"},{"iothread":"iothread3"}],"write-cache":"on"}' \
+  -blockdev '{"driver":"throttle","file":{"cache":{"direct":true,"no-flush":false},"driver":"raw","file":{"aio":"io_uring","cache":{"direct":true,"no-flush":false},"detect-zeroes":"on","discard":"ignore","driver":"file","filename":"/var/lib/vz/images/8006/vm-8006-disk-4.raw","node-name":"e3cbbe5a2921aa63c4a946b7317e87c","read-only":false},"node-name":"f3cbbe5a2921aa63c4a946b7317e87c","read-only":false},"node-name":"drive-virtio1","throttle-group":"throttle-drive-virtio1"}' \
+  -device '{"addr":"0xb","bus":"pci.0","drive":"drive-virtio1","driver":"virtio-blk-pci","id":"virtio1","write-cache":"on"}' \
+  -machine 'type=pc+pve0'
-- 
2.39.5



Thread overview: 10+ messages
     [not found] <20250702144900.3963405-1-alexandre.derumier@groupe-cyllene.com>
2025-07-02 14:48 ` [pve-devel] [PATCH qemu-server 01/10] introduce DriveDevice module Alexandre Derumier via pve-devel
2025-07-02 14:48 ` [pve-devel] [PATCH qemu-server 02/10] add print_drivedevice_controller && print_drivedevice_iothread Alexandre Derumier via pve-devel
2025-07-02 14:48 ` [pve-devel] [PATCH qemu-server 03/10] hotplug: drive controller : use print_drivedevice_iothread && print_drivedevice_controller Alexandre Derumier via pve-devel
2025-07-02 14:48 ` [pve-devel] [PATCH qemu-server 04/10] pci: add get_pci_addr Alexandre Derumier via pve-devel
2025-07-02 14:48 ` [pve-devel] [PATCH qemu-server 05/10] qmphelpers: add qmp_deviceadd && qmp_devicedel Alexandre Derumier via pve-devel
2025-07-02 14:48 ` [pve-devel] [PATCH qemu-server 06/10] convert drive device to json format Alexandre Derumier via pve-devel
2025-07-02 14:48 ` [pve-devel] [PATCH qemu-server 07/10] convert iothread to json Alexandre Derumier via pve-devel
2025-07-02 14:48 ` [pve-devel] [PATCH qemu-server 08/10] convert disk controller device to json format Alexandre Derumier via pve-devel
2025-07-02 14:48 ` [pve-devel] [PATCH qemu-server 09/10] tests: cfg2cmd: convert drive devices " Alexandre Derumier via pve-devel
2025-07-02 14:49 ` Alexandre Derumier via pve-devel [this message]
