public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [RFC qemu-server] fix #6608: expose viommu driver aw-bits option
@ 2025-08-27 15:03 Daniel Kral
  2025-08-29  9:46 ` Daniel Kral
  0 siblings, 1 reply; 2+ messages in thread
From: Daniel Kral @ 2025-08-27 15:03 UTC (permalink / raw)
  To: pve-devel

Since QEMU 9.2 [0], the default I/O address space bit width of the Intel
vIOMMU driver is 48 bits instead of 39 bits, which makes the aw-bits
check introduced in [1] trip for host CPUs with fewer than 48 bits of
physical address width:

vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39

For VFIO devices where a vIOMMU is in use, QEMU fetches the IOVA ranges
with the iommufd ioctl IOMMU_IOAS_IOVA_RANGES or the vfio_iommu_type1's
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE info, so setting 'phys-bits' on the
CPU doesn't change the behavior of the check.

Therefore, expose the 'aw-bits' option of the intel-iommu and
virtio-iommu QEMU drivers to allow users to set the value.

[0] https://lore.kernel.org/qemu-devel/20241212083757.605022-17-zhenzhong.duan@intel.com/
[1] https://lore.kernel.org/qemu-devel/20240605083043.317831-18-zhenzhong.duan@intel.com/

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
There were quite a few changes in QEMU upstream since 9.0 so that the
vIOMMU drivers make better use of Intel VT-d's dual-stage vIOMMU
translation, but I'm not entirely sure why the default value was changed
for legacy mode too, i.e. when scalable mode (x-scalable-mode) and first
level translation support (x-flts) are off. I haven't looked closely
into whether there are any strict requirements for this in the future
when 5-level paging is supported.

My CPU itself reports 39 bits physical address size according to
/proc/cpuinfo and setting aw-bits=39 made the check mentioned above
happy and the VM startable again. I haven't tested this yet with any CPU
that has 46 or 48 bit physical address width.
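
For reference, a minimal sketch of how the value mentioned above can be
read, assuming the x86 /proc/cpuinfo layout with a line of the form
"address sizes : N bits physical, M bits virtual". The function names
are hypothetical and not part of this patch:

```python
import re


def physical_address_bits(cpuinfo_text):
    """Return the physical address width in bits, or None if not found.

    Assumes the x86 /proc/cpuinfo format, e.g.
    "address sizes   : 39 bits physical, 48 bits virtual".
    """
    match = re.search(
        r"^address sizes\s*:\s*(\d+) bits physical",
        cpuinfo_text,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


def read_host_physical_address_bits(path="/proc/cpuinfo"):
    """Read the file at `path` and parse the first 'address sizes' line."""
    with open(path) as f:
        return physical_address_bits(f.read())
```

On a host like the one described above, read_host_physical_address_bits()
would be expected to return 39.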

 src/PVE/QemuServer.pm                         |  9 +++++--
 src/PVE/QemuServer/Machine.pm                 | 21 +++++++++++++---
 .../cfg2cmd/q35-viommu-intel-aw-bits.conf     |  1 +
 .../cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd | 25 +++++++++++++++++++
 4 files changed, 51 insertions(+), 5 deletions(-)
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index f263fedb..b0e05ce1 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -3896,11 +3896,16 @@ sub config_to_command {
     PVE::QemuServer::Machine::assert_valid_machine_property($machine_conf);
 
     if (my $viommu = $machine_conf->{viommu}) {
+        my $viommu_devstr = '';
+        $viommu_devstr .= ",aw-bits=$machine_conf->{'aw-bits'}" if $machine_conf->{'aw-bits'};
+
         if ($viommu eq 'intel') {
-            unshift @$devices, '-device', 'intel-iommu,intremap=on,caching-mode=on';
+            $viommu_devstr = "intel-iommu,intremap=on,caching-mode=on$viommu_devstr";
+            unshift @$devices, '-device', $viommu_devstr;
             push @$machineFlags, 'kernel-irqchip=split';
         } elsif ($viommu eq 'virtio') {
-            push @$devices, '-device', 'virtio-iommu-pci';
+            $viommu_devstr = "virtio-iommu-pci$viommu_devstr";
+            push @$devices, '-device', $viommu_devstr;
         }
     }
 
diff --git a/src/PVE/QemuServer/Machine.pm b/src/PVE/QemuServer/Machine.pm
index 9d17344a..3aceb485 100644
--- a/src/PVE/QemuServer/Machine.pm
+++ b/src/PVE/QemuServer/Machine.pm
@@ -58,6 +58,16 @@ my $machine_fmt = {
         enum => ['intel', 'virtio'],
         optional => 1,
     },
+    'aw-bits' => {
+        type => 'integer',
+        description => "Specifies the vIOMMU address space bit width.",
+        verbose_description => "Specifies the vIOMMU address space bit width.\n\n"
+            . "Intel vIOMMU supports a bit width of either 39 or 48 bits and"
+            . " VirtIO vIOMMU supports any bit width between 32 and 64 bits.",
+        minimum => 32,
+        maximum => 64,
+        optional => 1,
+    },
     'enable-s3' => {
         type => 'boolean',
         description =>
@@ -112,9 +122,14 @@ sub default_machine_for_arch {
 
 sub assert_valid_machine_property {
     my ($machine_conf) = @_;
-    my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
-    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel" && !$q35) {
-        die "to use Intel vIOMMU please set the machine type to q35\n";
+    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel") {
+        my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
+        die "to use Intel vIOMMU please set the machine type to q35\n" if !$q35;
+
+        die "Intel vIOMMU supports only 39 or 48 bits as address width\n"
+            if $machine_conf->{'aw-bits'}
+            && $machine_conf->{'aw-bits'} != 39
+            && $machine_conf->{'aw-bits'} != 48;
     }
 }
 
diff --git a/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
new file mode 100644
index 00000000..8d696ef3
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
@@ -0,0 +1 @@
+machine: q35,viommu=intel,aw-bits=39
diff --git a/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
new file mode 100644
index 00000000..030ccaa5
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
@@ -0,0 +1,25 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'vm8006,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smp '1,sockets=1,cores=1,maxcpus=1' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 512 \
+  -global 'ICH9-LPC.disable_s3=1' \
+  -global 'ICH9-LPC.disable_s4=1' \
+  -device 'intel-iommu,intremap=on,caching-mode=on,aw-bits=39' \
+  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
+  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
+  -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=q35+pve0,kernel-irqchip=split'
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



* Re: [pve-devel] [RFC qemu-server] fix #6608: expose viommu driver aw-bits option
  2025-08-27 15:03 [pve-devel] [RFC qemu-server] fix #6608: expose viommu driver aw-bits option Daniel Kral
@ 2025-08-29  9:46 ` Daniel Kral
  0 siblings, 0 replies; 2+ messages in thread
From: Daniel Kral @ 2025-08-29  9:46 UTC (permalink / raw)
  To: Proxmox VE development discussion; +Cc: pve-devel

On Wed Aug 27, 2025 at 5:03 PM CEST, Daniel Kral wrote:
> Since QEMU 9.2 [0], the default I/O address space bit width of the Intel
> vIOMMU driver is 48 bits instead of 39 bits, which makes the aw-bits
> check introduced in [1] trip for host CPUs with fewer than 48 bits of
> physical address width:
>
> vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39
>
> For VFIO devices where a vIOMMU is in use, QEMU fetches the IOVA ranges
> with the iommufd ioctl IOMMU_IOAS_IOVA_RANGES or the vfio_iommu_type1's
> VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE info, so setting 'phys-bits' on the
> CPU doesn't change the behavior of the check.
>
> Therefore, expose the 'aw-bits' option of the intel-iommu and
> virtio-iommu QEMU drivers to allow users to set the value.
>
> [0] https://lore.kernel.org/qemu-devel/20241212083757.605022-17-zhenzhong.duan@intel.com/
> [1] https://lore.kernel.org/qemu-devel/20240605083043.317831-18-zhenzhong.duan@intel.com/
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> There were quite a few changes in QEMU upstream since 9.0 so that the
> vIOMMU drivers make better use of Intel VT-d's dual-stage vIOMMU
> translation, but I'm not entirely sure why the default value was changed
> for legacy mode too, i.e. when scalable mode (x-scalable-mode) and first
> level translation support (x-flts) are off. I haven't looked closely
> into whether there are any strict requirements for this in the future
> when 5-level paging is supported.
>
> My CPU itself reports 39 bits physical address size according to
> /proc/cpuinfo and setting aw-bits=39 made the check mentioned above
> happy and the VM startable again. I haven't tested this yet with any CPU
> that has 46 or 48 bit physical address width.

A user reported [0] that both errors vanished (the one mentioned above
in the commit message and the vfio_container_dma_map(...) = -22 one)
after setting cpu.guest-phys-bits and intel-iommu.aw-bits to the same
value on systems where these differ.
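
As a rough sketch of that combination: the helper below builds QEMU
options where both widths are equal. The function name and the default
CPU model 'host' are assumptions for illustration only; the intel-iommu
device string is the one generated by the patch above.

```python
def matched_width_options(bits, cpu_model="host"):
    """Build QEMU options with equal guest-phys-bits and aw-bits.

    `cpu_model` is an assumed placeholder; the device string follows
    the one from the patch. Intel vIOMMU only accepts 39 or 48 bits.
    """
    if bits not in (39, 48):
        raise ValueError(
            "Intel vIOMMU supports only 39 or 48 bits as address width"
        )
    return [
        "-cpu", f"{cpu_model},guest-phys-bits={bits}",
        "-device", f"intel-iommu,intremap=on,caching-mode=on,aw-bits={bits}",
    ]
```

For example, matched_width_options(39) yields a -cpu and a -device
option that both use 39 bits.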

It seems that mostly Intel consumer-grade CPUs are the ones where these
mismatch or are below the default of 48 bits: the physical address width
seems to range anywhere between 39 and 48 bits on Intel CPUs. The two
AMD CPUs I checked (7900X and 9900X) both have 48 bits of physical
address width, but these are quite beefy enthusiast models.

A patch that would warn users about the mismatch was proposed in QEMU
upstream [1], but it was not applied and, as one can see in the replies,
it wasn't perfect.

I'll follow up on this patch with a possible check that compares the
CPU's physical bits (or phys-bits / guest-phys-bits) to the IOMMU's
address width, which can be found through the IOMMU's sysfs.
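
A minimal sketch of what such a check could look like, not the planned
implementation. It assumes the Intel IOMMU capability register is
exposed as a hex string in /sys/class/iommu/dmar*/intel-iommu/cap and
that the relevant host width is the VT-d MGAW field (bits 21:16, plus
one). Whether this is exactly the value QEMU compares against is an
assumption, and all function names are hypothetical:

```python
import glob


def mgaw_from_cap(cap):
    """Maximum guest address width: bits 21:16 of the VT-d CAP register, plus one."""
    return ((cap >> 16) & 0x3F) + 1


def host_iommu_widths(pattern="/sys/class/iommu/dmar*/intel-iommu/cap"):
    """Map each matching sysfs cap file (assumed path) to its decoded MGAW."""
    widths = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            # The file is assumed to hold the register as hex without "0x".
            widths[path] = mgaw_from_cap(int(f.read().strip(), 16))
    return widths


def aw_bits_problem(aw_bits, host_widths):
    """Return a message if aw_bits exceeds any host IOMMU width, else None."""
    too_small = [width for width in host_widths.values() if aw_bits > width]
    if not too_small:
        return None
    return f"aw-bits {aw_bits} > host aw-bits {min(too_small)}"
```

With an illustrative register value of 0x1c0000c40660462, the decoded
width is 39 bits, so aw-bits=48 would be flagged and aw-bits=39 would
pass.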

[0] https://forum.proxmox.com/threads/169586/post-795813
[1] https://lore.kernel.org/qemu-devel/20250130134346.1754143-9-clg@redhat.com/

