all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [RFC qemu-server] fix #6608: expose viommu driver aw-bits option
Date: Wed, 27 Aug 2025 17:03:26 +0200	[thread overview]
Message-ID: <20250827150419.275285-1-d.kral@proxmox.com> (raw)

Since QEMU 9.2 [0], the default I/O address space bit width was raised
from 39 bits to 48 bits for the Intel vIOMMU driver, which makes the
aw-bits check introduced in [1] to trip for host CPUs with less than 48
bits physical address width from QEMU 9.2 onwards:

vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39

For VFIO devices where a vIOMMU is in-use, QEMU fetches the IOVA ranges
with the iommufd ioctl IOMMU_IOAS_IOVA_RANGES or the vfio_iommu_type1's
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE info, so 'phys-bits' doesn't change
the behavior of the check.

Therefore, expose the 'aw-bits' option of the intel-iommu and
virtio-iommu QEMU drivers to allow users to set the value.

[0] https://lore.kernel.org/qemu-devel/20241212083757.605022-17-zhenzhong.duan@intel.com/
[1] https://lore.kernel.org/qemu-devel/20240605083043.317831-18-zhenzhong.duan@intel.com/

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
There were quite a few changes in the way in qemu upstream since 9.0 for
the vIOMMU drivers to utilize the Intel VT-d's dual-stage vIOMMU
translation better, but I'm not entirely sure why the default value was
changed for legacy mode too, i.e. when scalable mode (x-scalable-mode)
and first level translation support (x-flts) is off, as I haven't looked
into it too much whether there are any strict requirements for this in
the future when 5-level paging is supported.

My CPU itself reports 39 bits physical address size according to
/proc/cpuinfo and setting aw-bits=39 made the check mentioned above
happy and the VM startable again. I haven't tested this yet with any CPU
that has 46 or 48 bit physical address width.

 src/PVE/QemuServer.pm                         |  9 +++++--
 src/PVE/QemuServer/Machine.pm                 | 21 +++++++++++++---
 .../cfg2cmd/q35-viommu-intel-aw-bits.conf     |  1 +
 .../cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd | 25 +++++++++++++++++++
 4 files changed, 51 insertions(+), 5 deletions(-)
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index f263fedb..b0e05ce1 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -3896,11 +3896,16 @@ sub config_to_command {
     PVE::QemuServer::Machine::assert_valid_machine_property($machine_conf);
 
     if (my $viommu = $machine_conf->{viommu}) {
+        my $viommu_devstr = '';
+        $viommu_devstr .= ",aw-bits=$machine_conf->{'aw-bits'}" if $machine_conf->{'aw-bits'};
+
         if ($viommu eq 'intel') {
-            unshift @$devices, '-device', 'intel-iommu,intremap=on,caching-mode=on';
+            $viommu_devstr = "intel-iommu,intremap=on,caching-mode=on$viommu_devstr";
+            unshift @$devices, '-device', $viommu_devstr;
             push @$machineFlags, 'kernel-irqchip=split';
         } elsif ($viommu eq 'virtio') {
-            push @$devices, '-device', 'virtio-iommu-pci';
+            $viommu_devstr = "virtio-iommu-pci$viommu_devstr";
+            push @$devices, '-device', $viommu_devstr;
         }
     }
 
diff --git a/src/PVE/QemuServer/Machine.pm b/src/PVE/QemuServer/Machine.pm
index 9d17344a..3aceb485 100644
--- a/src/PVE/QemuServer/Machine.pm
+++ b/src/PVE/QemuServer/Machine.pm
@@ -58,6 +58,16 @@ my $machine_fmt = {
         enum => ['intel', 'virtio'],
         optional => 1,
     },
+    'aw-bits' => {
+        type => 'number',
+        description => "Specifies the vIOMMU address space bit width.",
+        verbose_description => "Specifies the vIOMMU address space bit width.\n\n"
+            . "Intel vIOMMU supports a bit width of either 39 or 48 bits and"
+            . " VirtIO vIOMMU supports any bit width between 32 and 64 bits.",
+        minimum => 32,
+        maximum => 64,
+        optional => 1,
+    },
     'enable-s3' => {
         type => 'boolean',
         description =>
@@ -112,9 +122,14 @@ sub default_machine_for_arch {
 
 sub assert_valid_machine_property {
     my ($machine_conf) = @_;
-    my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
-    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel" && !$q35) {
-        die "to use Intel vIOMMU please set the machine type to q35\n";
+    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel") {
+        my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
+        die "to use Intel vIOMMU please set the machine type to q35\n" if !$q35;
+
+        die "Intel vIOMMU supports only 39 or 48 bits as address width\n"
+            if $machine_conf->{'aw-bits'}
+            && $machine_conf->{'aw-bits'} != 39
+            && $machine_conf->{'aw-bits'} != 48;
     }
 }
 
diff --git a/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
new file mode 100644
index 00000000..8d696ef3
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
@@ -0,0 +1 @@
+machine: q35,viommu=intel,aw-bits=39
diff --git a/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
new file mode 100644
index 00000000..030ccaa5
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
@@ -0,0 +1,25 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'vm8006,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smp '1,sockets=1,cores=1,maxcpus=1' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 512 \
+  -global 'ICH9-LPC.disable_s3=1' \
+  -global 'ICH9-LPC.disable_s4=1' \
+  -device 'intel-iommu,intremap=on,caching-mode=on,aw-bits=39' \
+  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
+  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
+  -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=q35+pve0,kernel-irqchip=split'
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


             reply	other threads:[~2025-08-27 15:04 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-27 15:03 Daniel Kral [this message]
2025-08-29  9:46 ` Daniel Kral

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250827150419.275285-1-d.kral@proxmox.com \
    --to=d.kral@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal