* [pve-devel] [RFC qemu-server] fix #6608: expose viommu driver aw-bits option
From: Daniel Kral @ 2025-08-27 15:03 UTC
  To: pve-devel

Since QEMU 9.2 [0], the default I/O address space bit width of the Intel
vIOMMU driver is 48 bits instead of 39 bits. This causes the aw-bits
check introduced in [1] to fail for host CPUs with a physical address
width of less than 48 bits:

vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39

For VFIO devices where a vIOMMU is in use, QEMU fetches the IOVA ranges
with the iommufd ioctl IOMMU_IOAS_IOVA_RANGES or the vfio_iommu_type1's
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE info, so setting 'phys-bits' on the
CPU does not change the outcome of the check.

Therefore, expose the 'aw-bits' option of the intel-iommu and
virtio-iommu QEMU drivers to allow users to set the value.

[0] https://lore.kernel.org/qemu-devel/20241212083757.605022-17-zhenzhong.duan@intel.com/
[1] https://lore.kernel.org/qemu-devel/20240605083043.317831-18-zhenzhong.duan@intel.com/

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
Since 9.0, QEMU upstream changed the vIOMMU drivers quite a bit to make
better use of Intel VT-d's dual-stage vIOMMU translation. I'm not
entirely sure why the default value was changed for legacy mode too,
i.e. when scalable mode (x-scalable-mode) and first-level translation
support (x-flts) are off. I haven't looked closely into whether there
will be any strict requirements for this once 5-level paging is
supported.

My CPU itself reports a physical address size of 39 bits according to
/proc/cpuinfo. Setting aw-bits=39 satisfied the check mentioned above
and made the VM start again. I haven't tested this yet with any CPU
that has a 46 or 48 bit physical address width.
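
For illustration only (not part of this patch), a minimal shell sketch
of how the value could be read and mapped to one of the two widths the
Intel vIOMMU accepts. The function names host_phys_bits and
pick_intel_aw_bits are hypothetical, and using the CPU's physical
address size as the host width is an assumption that merely matched my
machine; the host IOMMU's own width may differ.

```shell
# Print the physical address width (in bits) from cpuinfo-formatted text.
# Reads the file given as $1 and defaults to /proc/cpuinfo.
# Expects a line such as:
#   address sizes   : 39 bits physical, 48 bits virtual
host_phys_bits() {
    awk '/^address sizes/ { print $4; exit }' "${1:-/proc/cpuinfo}"
}

# Print the largest aw-bits value supported by intel-iommu (39 or 48)
# that does not exceed the host width given as $1.
# Returns non-zero if the host width is below 39 bits.
pick_intel_aw_bits() {
    if [ "$1" -ge 48 ]; then
        echo 48
    elif [ "$1" -ge 39 ]; then
        echo 39
    else
        echo "no supported aw-bits value for a $1 bit host width" >&2
        return 1
    fi
}

# Usage on a live host:
#   pick_intel_aw_bits "$(host_phys_bits)"
```

The result could then be used as the aw-bits value in the machine
property, e.g. "machine: q35,viommu=intel,aw-bits=39" as in the test
case added by this patch.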

 src/PVE/QemuServer.pm                         |  9 +++++--
 src/PVE/QemuServer/Machine.pm                 | 21 +++++++++++++---
 .../cfg2cmd/q35-viommu-intel-aw-bits.conf     |  1 +
 .../cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd | 25 +++++++++++++++++++
 4 files changed, 51 insertions(+), 5 deletions(-)
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index f263fedb..b0e05ce1 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -3896,11 +3896,16 @@ sub config_to_command {
     PVE::QemuServer::Machine::assert_valid_machine_property($machine_conf);
 
     if (my $viommu = $machine_conf->{viommu}) {
+        my $viommu_devstr = '';
+        $viommu_devstr .= ",aw-bits=$machine_conf->{'aw-bits'}" if $machine_conf->{'aw-bits'};
+
         if ($viommu eq 'intel') {
-            unshift @$devices, '-device', 'intel-iommu,intremap=on,caching-mode=on';
+            $viommu_devstr = "intel-iommu,intremap=on,caching-mode=on$viommu_devstr";
+            unshift @$devices, '-device', $viommu_devstr;
             push @$machineFlags, 'kernel-irqchip=split';
         } elsif ($viommu eq 'virtio') {
-            push @$devices, '-device', 'virtio-iommu-pci';
+            $viommu_devstr = "virtio-iommu-pci$viommu_devstr";
+            push @$devices, '-device', $viommu_devstr;
         }
     }
 
diff --git a/src/PVE/QemuServer/Machine.pm b/src/PVE/QemuServer/Machine.pm
index 9d17344a..3aceb485 100644
--- a/src/PVE/QemuServer/Machine.pm
+++ b/src/PVE/QemuServer/Machine.pm
@@ -58,6 +58,16 @@ my $machine_fmt = {
         enum => ['intel', 'virtio'],
         optional => 1,
     },
+    'aw-bits' => {
+        type => 'number',
+        description => "Specifies the vIOMMU address space bit width.",
+        verbose_description => "Specifies the vIOMMU address space bit width.\n\n"
+            . "Intel vIOMMU supports a bit width of either 39 or 48 bits and"
+            . " VirtIO vIOMMU supports any bit width between 32 and 64 bits.",
+        minimum => 32,
+        maximum => 64,
+        optional => 1,
+    },
     'enable-s3' => {
         type => 'boolean',
         description =>
@@ -112,9 +122,14 @@ sub default_machine_for_arch {
 
 sub assert_valid_machine_property {
     my ($machine_conf) = @_;
-    my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
-    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel" && !$q35) {
-        die "to use Intel vIOMMU please set the machine type to q35\n";
+    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel") {
+        my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
+        die "to use Intel vIOMMU please set the machine type to q35\n" if !$q35;
+
+        die "Intel vIOMMU supports only 39 or 48 bits as address width\n"
+            if $machine_conf->{'aw-bits'}
+            && $machine_conf->{'aw-bits'} != 39
+            && $machine_conf->{'aw-bits'} != 48;
     }
 }
 
diff --git a/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
new file mode 100644
index 00000000..8d696ef3
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
@@ -0,0 +1 @@
+machine: q35,viommu=intel,aw-bits=39
diff --git a/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
new file mode 100644
index 00000000..030ccaa5
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
@@ -0,0 +1,25 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'vm8006,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smp '1,sockets=1,cores=1,maxcpus=1' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 512 \
+  -global 'ICH9-LPC.disable_s3=1' \
+  -global 'ICH9-LPC.disable_s4=1' \
+  -device 'intel-iommu,intremap=on,caching-mode=on,aw-bits=39' \
+  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
+  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
+  -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=q35+pve0,kernel-irqchip=split'
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



* Re: [pve-devel] [RFC qemu-server] fix #6608: expose viommu driver aw-bits option
From: Daniel Kral @ 2025-08-29  9:46 UTC
  To: Proxmox VE development discussion; +Cc: pve-devel

On Wed Aug 27, 2025 at 5:03 PM CEST, Daniel Kral wrote:
> [...]
>
> My CPU itself reports 39 bits physical address size according to
> /proc/cpuinfo and setting aw-bits=39 made the check mentioned above
> happy and the VM startable again. I haven't tested this yet with any CPU
> that has 46 or 48 bit physical address width.

A user reported [0] that both errors (the one mentioned above in the
commit message and the vfio_container_dma_map(...) = -22 one) vanished
after setting cpu.guest-phys-bits and intel-iommu.aw-bits to the same
value on systems where these differ.

It seems that mostly Intel consumer-grade CPUs are the ones where these
values mismatch or are below the default of 48 bits. The physical
address width appears to range anywhere between 39 and 48 bits on Intel
CPUs. The two AMD CPUs I checked both had a 48 bit physical address
width, though these were quite beefy enthusiast 7900X / 9900X ones.

There was a patch for QEMU upstream [1], which was not applied, that
would have warned users about the mismatch, but it wasn't perfect, as
one can see in the replies.

I'll follow up on this patch with a possible check that compares the
CPU's physical bits (or the phys-bits / guest-phys-bits) to the IOMMU's
address width, which can be read from the IOMMU's sysfs entries.
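
As a rough idea of what such a check might look like (a sketch under
assumptions, not the planned implementation): on Intel hosts the VT-d
capability register is exposed as a hex value in
/sys/class/iommu/<iommu>/intel-iommu/cap, and its MGAW field (bits
21:16, stored as width minus one) gives the maximum guest address
width. Whether MGAW is exactly the width QEMU ends up comparing against
is an assumption here, and the function names iommu_mgaw_bits and
aw_bits_fits are hypothetical.

```shell
# Decode the Maximum Guest Address Width (MGAW) from a VT-d capability
# register value given as hex in $1 (with or without a leading 0x).
# MGAW occupies bits 21:16 and is encoded as (width - 1).
iommu_mgaw_bits() {
    cap=${1#0x}
    echo $(( ((0x$cap >> 16) & 0x3f) + 1 ))
}

# Succeed if the requested vIOMMU aw-bits ($1) does not exceed the
# host width ($2), mirroring the "aw-bits N > host aw-bits M" error.
aw_bits_fits() {
    [ "$1" -le "$2" ]
}

# Usage on a live Intel host (sysfs path is an assumption, see above):
#   for f in /sys/class/iommu/dmar*/intel-iommu/cap; do
#       echo "$f: $(iommu_mgaw_bits "$(cat "$f")") bits"
#   done
```

With a capability value whose MGAW field is 0x26, iommu_mgaw_bits
yields 39, so aw_bits_fits 48 39 fails while aw_bits_fits 39 39
succeeds.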

[0] https://forum.proxmox.com/threads/169586/post-795813
[1] https://lore.kernel.org/qemu-devel/20250130134346.1754143-9-clg@redhat.com/

