* [pve-devel] [PATCH qemu-server v3] fix #6608: expose viommu driver aw-bits option
@ 2025-09-05 14:15 Daniel Kral
2025-09-08 9:35 ` Fiona Ebner
2025-09-08 10:07 ` [pve-devel] applied: " Fiona Ebner
0 siblings, 2 replies; 4+ messages in thread
From: Daniel Kral @ 2025-09-05 14:15 UTC
To: pve-devel
QEMU 9.2 [0] raised the default I/O address space bit width of the Intel
vIOMMU driver from 39 bits to 48 bits. From that version onwards, the
aw-bits check introduced in [1] trips on host CPUs with fewer than 48
bits of physical address width:

    vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39

For VFIO devices where a vIOMMU is in use, QEMU fetches the IOVA ranges
with the iommufd ioctl IOMMU_IOAS_IOVA_RANGES or the vfio_iommu_type1's
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE info, so setting 'phys-bits' doesn't
change the behavior of the check.

Therefore, expose the 'aw-bits' option of the intel-iommu and
virtio-iommu QEMU drivers to allow users to set the value.

[0] qemu ddd84fd0c1 ("intel_iommu: Set default aw_bits to 48 starting from QEMU 9.2")
[1] qemu 77f6efc0ab ("intel_iommu: Check compatibility with host IOMMU capabilities")

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes from v2:
- changed quotes from qemu-devel lore to upstream qemu commits
- added die when aw-bits is set without viommu
- added test case for aw-bits without viommu
src/PVE/QemuServer.pm | 9 +++++--
src/PVE/QemuServer/Machine.pm | 24 +++++++++++++++---
.../cfg2cmd/q35-no-viommu-with-aw-bits.conf | 3 +++
.../cfg2cmd/q35-viommu-intel-aw-bits.conf | 2 ++
.../cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd | 25 +++++++++++++++++++
.../cfg2cmd/q35-viommu-virtio-aw-bits.conf | 2 ++
.../q35-viommu-virtio-aw-bits.conf.cmd | 25 +++++++++++++++++++
7 files changed, 85 insertions(+), 5 deletions(-)
create mode 100644 src/test/cfg2cmd/q35-no-viommu-with-aw-bits.conf
create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
create mode 100644 src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf
create mode 100644 src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf.cmd
diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index 9597d316..04e988c7 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -3903,11 +3903,16 @@ sub config_to_command {
PVE::QemuServer::Machine::assert_valid_machine_property($machine_conf);
if (my $viommu = $machine_conf->{viommu}) {
+ my $viommu_devstr = '';
+ $viommu_devstr .= ",aw-bits=$machine_conf->{'aw-bits'}" if $machine_conf->{'aw-bits'};
+
if ($viommu eq 'intel') {
- unshift @$devices, '-device', 'intel-iommu,intremap=on,caching-mode=on';
+ $viommu_devstr = "intel-iommu,intremap=on,caching-mode=on$viommu_devstr";
+ unshift @$devices, '-device', $viommu_devstr;
push @$machineFlags, 'kernel-irqchip=split';
} elsif ($viommu eq 'virtio') {
- push @$devices, '-device', 'virtio-iommu-pci';
+ $viommu_devstr = "virtio-iommu-pci$viommu_devstr";
+ push @$devices, '-device', $viommu_devstr;
}
}
diff --git a/src/PVE/QemuServer/Machine.pm b/src/PVE/QemuServer/Machine.pm
index b61667e0..7e6ee6a4 100644
--- a/src/PVE/QemuServer/Machine.pm
+++ b/src/PVE/QemuServer/Machine.pm
@@ -58,6 +58,16 @@ my $machine_fmt = {
enum => ['intel', 'virtio'],
optional => 1,
},
+ 'aw-bits' => {
+ type => 'number',
+ description => "Specifies the vIOMMU address space bit width.",
+ verbose_description => "Specifies the vIOMMU address space bit width.\n\n"
+ . "Intel vIOMMU supports a bit width of either 39 or 48 bits and"
+ . " VirtIO vIOMMU supports any bit width between 32 and 64 bits.",
+ minimum => 32,
+ maximum => 64,
+ optional => 1,
+ },
'enable-s3' => {
type => 'boolean',
description =>
@@ -112,10 +122,18 @@ sub default_machine_for_arch {
sub assert_valid_machine_property {
my ($machine_conf) = @_;
- my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
- if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel" && !$q35) {
- die "to use Intel vIOMMU please set the machine type to q35\n";
+ if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel") {
+ my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
+ die "to use Intel vIOMMU please set the machine type to q35\n" if !$q35;
+
+ die "Intel vIOMMU supports only 39 or 48 bits as address width\n"
+ if $machine_conf->{'aw-bits'}
+ && $machine_conf->{'aw-bits'} != 39
+ && $machine_conf->{'aw-bits'} != 48;
}
+
+ die "cannot set aw-bits if no vIOMMU is configured\n"
+ if $machine_conf->{'aw-bits'} && !$machine_conf->{viommu};
}
sub machine_type_is_q35 {
diff --git a/src/test/cfg2cmd/q35-no-viommu-with-aw-bits.conf b/src/test/cfg2cmd/q35-no-viommu-with-aw-bits.conf
new file mode 100644
index 00000000..06db33ab
--- /dev/null
+++ b/src/test/cfg2cmd/q35-no-viommu-with-aw-bits.conf
@@ -0,0 +1,3 @@
+# TEST: Check that aw-bits cannot be set without viommu
+# EXPECT_ERROR: cannot set aw-bits if no vIOMMU is configured
+machine: q35,aw-bits=39
diff --git a/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
new file mode 100644
index 00000000..9e84e42e
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
@@ -0,0 +1,2 @@
+# TEST: Check if aw-bits are propagated correctly to intel-iommu device
+machine: q35,viommu=intel,aw-bits=39
diff --git a/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
new file mode 100644
index 00000000..030ccaa5
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
@@ -0,0 +1,25 @@
+/usr/bin/kvm \
+ -id 8006 \
+ -name 'vm8006,debug-threads=on' \
+ -no-shutdown \
+ -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+ -mon 'chardev=qmp,mode=control' \
+ -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000' \
+ -mon 'chardev=qmp-event,mode=control' \
+ -pidfile /var/run/qemu-server/8006.pid \
+ -daemonize \
+ -smp '1,sockets=1,cores=1,maxcpus=1' \
+ -nodefaults \
+ -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+ -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+ -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+ -m 512 \
+ -global 'ICH9-LPC.disable_s3=1' \
+ -global 'ICH9-LPC.disable_s4=1' \
+ -device 'intel-iommu,intremap=on,caching-mode=on,aw-bits=39' \
+ -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
+ -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
+ -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
+ -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+ -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+ -machine 'type=q35+pve0,kernel-irqchip=split'
diff --git a/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf b/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf
new file mode 100644
index 00000000..dd8ef1fd
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf
@@ -0,0 +1,2 @@
+# TEST: Check if aw-bits are propagated correctly to virtio-iommu-pci device
+machine: q35,viommu=virtio,aw-bits=39
diff --git a/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf.cmd b/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf.cmd
new file mode 100644
index 00000000..c3b12eee
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf.cmd
@@ -0,0 +1,25 @@
+/usr/bin/kvm \
+ -id 8006 \
+ -name 'vm8006,debug-threads=on' \
+ -no-shutdown \
+ -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+ -mon 'chardev=qmp,mode=control' \
+ -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000' \
+ -mon 'chardev=qmp-event,mode=control' \
+ -pidfile /var/run/qemu-server/8006.pid \
+ -daemonize \
+ -smp '1,sockets=1,cores=1,maxcpus=1' \
+ -nodefaults \
+ -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+ -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+ -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+ -m 512 \
+ -global 'ICH9-LPC.disable_s3=1' \
+ -global 'ICH9-LPC.disable_s4=1' \
+ -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
+ -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
+ -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
+ -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+ -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+ -device 'virtio-iommu-pci,aw-bits=39' \
+ -machine 'type=q35+pve0'
--
2.47.2
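
The rules this patch enforces can be restated outside of qemu-server as a small standalone Perl sketch. It is not the code from the patch: the helper names `check_aw_bits` and `viommu_device_string` are hypothetical, and it tests `defined` where the patch relies on truthiness. The accepted values (39 or 48 for Intel, 32 to 64 for VirtIO) and the resulting device strings are taken from the diff and the test files above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of qemu-server): validate an aw-bits value
# against the configured vIOMMU type, following the rules from the patch.
sub check_aw_bits {
    my ($viommu, $aw_bits) = @_;

    return if !defined($aw_bits);    # nothing to validate

    die "cannot set aw-bits if no vIOMMU is configured\n" if !$viommu;

    # Range taken from the 'minimum'/'maximum' of the schema entry.
    die "aw-bits must be between 32 and 64\n" if $aw_bits < 32 || $aw_bits > 64;

    die "Intel vIOMMU supports only 39 or 48 bits as address width\n"
        if $viommu eq 'intel' && $aw_bits != 39 && $aw_bits != 48;

    return;
}

# Hypothetical helper: build the value passed to QEMU's -device option.
sub viommu_device_string {
    my ($viommu, $aw_bits) = @_;

    check_aw_bits($viommu, $aw_bits);
    return if !$viommu;              # no vIOMMU, no device

    my $base = $viommu eq 'intel'
        ? 'intel-iommu,intremap=on,caching-mode=on'
        : 'virtio-iommu-pci';

    return defined($aw_bits) ? "$base,aw-bits=$aw_bits" : $base;
}

print viommu_device_string('intel', 39), "\n";
print viommu_device_string('virtio', 52), "\n";
```

The two calls at the end print the same kind of `-device` values that the expected `.conf.cmd` files contain. An invalid combination, such as `check_aw_bits('intel', 40)`, dies with the corresponding message from the patch.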
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH qemu-server v3] fix #6608: expose viommu driver aw-bits option
2025-09-05 14:15 [pve-devel] [PATCH qemu-server v3] fix #6608: expose viommu driver aw-bits option Daniel Kral
@ 2025-09-08 9:35 ` Fiona Ebner
2025-09-08 9:40 ` Daniel Kral
2025-09-08 10:07 ` [pve-devel] applied: " Fiona Ebner
1 sibling, 1 reply; 4+ messages in thread
From: Fiona Ebner @ 2025-09-08 9:35 UTC
To: Proxmox VE development discussion, Daniel Kral
Am 05.09.25 um 4:15 PM schrieb Daniel Kral:
> Since QEMU 9.2 [0], the default I/O address space bit width was raised
> from 39 bits to 48 bits for the Intel vIOMMU driver, which makes the
> aw-bits check introduced in [1] to trip for host CPUs with less than 48
> bits physical address width from QEMU 9.2 onwards:
>
> vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39
>
> For VFIO devices where a vIOMMU is in-use, QEMU fetches the IOVA ranges
> with the iommufd ioctl IOMMU_IOAS_IOVA_RANGES or the vfio_iommu_type1's
> VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE info, so 'phys-bits' doesn't change
> the behavior of the check.
>
> Therefore, expose the 'aw-bits' option of the intel-iommu and
> virtio-iommu QEMU drivers to allow users to set the value.
>
> [0] qemu ddd84fd0c1 ("intel_iommu: Set default aw_bits to 48 starting from QEMU 9.2")
> [1] qemu 77f6efc0ab ("intel_iommu: Check compatibility with host IOMMU capabilities")
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
I'll go ahead and apply this together with the follow-up below, if that addition is fine with you?
> commit 05eb8e6394ca83e53cbf6a01e1a8848ff3d4d3e8
> Author: Fiona Ebner <f.ebner@proxmox.com>
> Date: Mon Sep 8 10:37:24 2025 +0200
>
> cfg2cmd: inform users that setting guest-phys-bits might be necessary when setting aw-bits
>
> Until QEMU warns about this itself, inform the users here. Commit
> message below copied from [1].
>
> If a virtual machine is set up with an intel-iommu device, QEMU
> allocates and maps the (virtual) I/O address space (IOAS) for a VFIO
> passthrough device with iommufd.
>
> In case of a mismatch between the address widths of the host CPU and
> the IOMMU, the guest physical address space (GPAS) and memory-type range
> registers (MTRRs) are set up to the host CPU's address width, which
> causes the IOAS to be allocated and mapped outside of the IOMMU's maximum
> guest address width (MGAW) and causes the following error from QEMU
> (the error message is copied from the user forum [0]):
>
> kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)
>
> [0]: https://forum.proxmox.com/threads/169586/page-3#post-795717
> [1]: https://lore.proxmox.com/pve-devel/20250902112307.124706-5-d.kral@proxmox.com/
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> src/PVE/QemuServer.pm | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
> index c428e2d7..bf229610 100644
> --- a/src/PVE/QemuServer.pm
> +++ b/src/PVE/QemuServer.pm
> @@ -3944,7 +3944,13 @@ sub config_to_command {
>
> if (my $viommu = $machine_conf->{viommu}) {
> my $viommu_devstr = '';
> - $viommu_devstr .= ",aw-bits=$machine_conf->{'aw-bits'}" if $machine_conf->{'aw-bits'};
> + if ($machine_conf->{'aw-bits'}) {
> + $viommu_devstr .= ",aw-bits=$machine_conf->{'aw-bits'}";
> +
> + # TODO remove message once this gets properly checked/warned about in QEMU itself.
> + print "vIOMMU 'aw-bits' set to $machine_conf->{'aw-bits'}. Sometimes it is necessary to"
> + . " set the CPU's 'guest-phys-bits' to the same value.\n";
> + }
>
> if ($viommu eq 'intel') {
> $viommu_devstr = "intel-iommu,intremap=on,caching-mode=on$viommu_devstr";
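
To make the MGAW mentioned in the follow-up commit message more concrete, here is a hedged Perl sketch that decodes it from an Intel VT-d capability register value. It assumes the VT-d layout in which bits 21:16 of the capability register hold the maximum guest address width minus one. On Linux hosts that value is, as far as I know, exposed as a hex string in `/sys/class/iommu/dmar0/intel-iommu/cap`; the sketch does not read that file. The helper names and both register values are illustrative and not taken from this thread, and mapping the MGAW to an `aw-bits` value is my own assumption about how one might choose the setting.

```perl
use strict;
use warnings;
no warnings 'portable';    # capability register values exceed 32 bits

# Hypothetical helper: decode the maximum guest address width (MGAW) from a
# VT-d capability register value given as a hex string. Bits 21:16 hold the
# width minus one (assumption based on the VT-d register layout).
sub mgaw_from_cap {
    my ($cap_hex) = @_;
    $cap_hex =~ s/^0x//i;
    my $cap = hex($cap_hex);
    return (($cap >> 16) & 0x3f) + 1;
}

# Hypothetical helper: pick the largest width the Intel vIOMMU accepts
# (39 or 48) that does not exceed the host IOMMU's MGAW.
sub suggest_intel_aw_bits {
    my ($mgaw) = @_;
    return 48 if $mgaw >= 48;
    return 39 if $mgaw >= 39;
    return;                # neither supported width fits
}

# Illustrative register values, not taken from the thread.
for my $cap ('d2008c40660462', '8d2078c106f0466') {
    my $mgaw = mgaw_from_cap($cap);
    my $aw   = suggest_intel_aw_bits($mgaw) // 'none';
    print "cap=$cap mgaw=$mgaw aw-bits=$aw\n";
}
```

For the two sample values the decoded widths are 39 and 48 bits. A host reporting 39 would be the case where the QEMU 9.2 default of 48 trips the check and `aw-bits=39` becomes necessary.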
* [pve-devel] applied: [PATCH qemu-server v3] fix #6608: expose viommu driver aw-bits option
From: Fiona Ebner @ 2025-09-08 10:07 UTC
To: pve-devel, Daniel Kral
On Fri, 05 Sep 2025 16:15:06 +0200, Daniel Kral wrote:
> Since QEMU 9.2 [0], the default I/O address space bit width was raised
> from 39 bits to 48 bits for the Intel vIOMMU driver, which makes the
> aw-bits check introduced in [1] to trip for host CPUs with less than 48
> bits physical address width from QEMU 9.2 onwards:
>
> vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39
>
> [...]
Applied, together with my follow-up, thanks!
[1/1] fix #6608: expose viommu driver aw-bits option
commit: dc52c006ce0527181556aaa363f49082cd613b5c