public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH common/qemu-server v2 0/5] fix issues with viommu+vfio passthrough in #6608, #6378
@ 2025-09-02 11:21 Daniel Kral
  2025-09-02 11:21 ` [pve-devel] [PATCH common v2 1/1] procfs: cpuinfo: expose x86_phys_bits and x86_virt_bits values Daniel Kral
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Daniel Kral @ 2025-09-02 11:21 UTC (permalink / raw)
  To: pve-devel

This series exposes the aw-bits option of the intel-iommu and
virtio-iommu devices through the machine config property string, so that
users can set the address width of the vIOMMU explicitly. This allows
intel-iommu users to override the new default value of 48 bits on hosts
whose IOMMU has a maximum guest address width of less than 48 bits (e.g.
39, 41, or 46 bits for Intel consumer-grade CPUs). Otherwise, the VM
fails to start with the fatal error:

vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39

Additionally, qemu-server #2-4 add warnings about the above error
(qemu-server #4) and about another error that users run into
(qemu-server #3):

    kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)

See qemu-server #3 for more information on the latter.


pve-common.git:

Daniel Kral (1):
  procfs: cpuinfo: expose x86_phys_bits and x86_virt_bits values

 src/PVE/ProcFSTools.pm | 5 +++++
 1 file changed, 5 insertions(+)


qemu-server.git:

Daniel Kral (4):
  fix #6608: expose viommu driver aw-bits option
  cpu config: factor out gathering common cpu properties
  fix #6378 (continued): warn intel-iommu users about iommu and host aw
    bits mismatch
  machine: warn intel-iommu users about too large address width

 src/PVE/QemuServer.pm                         |  16 ++-
 src/PVE/QemuServer/CPUConfig.pm               | 100 ++++++++++--------
 src/PVE/QemuServer/Machine.pm                 |  50 ++++++++-
 .../cfg2cmd/q35-viommu-intel-aw-bits.conf     |   2 +
 .../cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd |  25 +++++
 .../q35-viommu-intel-exceeding-aw-bits.conf   |   4 +
 ...35-viommu-intel-exceeding-aw-bits.conf.cmd |  25 +++++
 .../cfg2cmd/q35-viommu-virtio-aw-bits.conf    |   2 +
 .../q35-viommu-virtio-aw-bits.conf.cmd        |  25 +++++
 src/test/run_config2command_tests.pl          |   8 ++
 10 files changed, 208 insertions(+), 49 deletions(-)
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf.cmd
 create mode 100644 src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf
 create mode 100644 src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf.cmd


Summary over all repositories:
  11 files changed, 213 insertions(+), 49 deletions(-)

-- 
Generated by git-murpp 0.8.0


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



* [pve-devel] [PATCH common v2 1/1] procfs: cpuinfo: expose x86_phys_bits and x86_virt_bits values
  2025-09-02 11:21 [pve-devel] [PATCH common/qemu-server v2 0/5] fix issues with viommu+vfio passthrough in #6608, #6378 Daniel Kral
@ 2025-09-02 11:21 ` Daniel Kral
  2025-09-05  9:10   ` Fiona Ebner
  2025-09-02 11:21 ` [pve-devel] [PATCH qemu-server v2 1/4] fix #6608: expose viommu driver aw-bits option Daniel Kral
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Daniel Kral @ 2025-09-02 11:21 UTC (permalink / raw)
  To: pve-devel

Parse the 'address sizes' line of /proc/cpuinfo and expose its values as
phys_bits and virt_bits. The format of the line is taken from the
kernel's implementation of /proc/cpuinfo in arch/x86/kernel/cpu/proc.c.
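
For illustration only, the parsing can be sketched as a stand-alone
script. The helper name parse_address_sizes and the sample text are
made up for this example and are not part of PVE::ProcFSTools; the
regex is slightly more lenient about whitespace than the one in the
patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PVE::ProcFSTools): extract the physical
# and virtual address widths from the text of /proc/cpuinfo.
sub parse_address_sizes {
    my ($cpuinfo_text) = @_;

    for my $line (split(/\n/, $cpuinfo_text)) {
        # the line is repeated for every logical CPU, the first match is enough
        if ($line =~ m/^address sizes\s*:\s*(\d+) bits physical, (\d+) bits virtual\s*$/i) {
            return { phys_bits => int($1), virt_bits => int($2) };
        }
    }

    # line not present, e.g. if the kernel does not print it
    return { phys_bits => 0, virt_bits => 0 };
}

# made-up sample input in the format printed by the kernel
my $sample = "model name\t: Example CPU\n"
    . "address sizes\t: 39 bits physical, 48 bits virtual\n";

my $sizes = parse_address_sizes($sample);
print "phys_bits=$sizes->{phys_bits} virt_bits=$sizes->{virt_bits}\n";
```

For the sample input this prints "phys_bits=39 virt_bits=48". A value
of 0 means that the width could not be determined.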

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 src/PVE/ProcFSTools.pm | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/PVE/ProcFSTools.pm b/src/PVE/ProcFSTools.pm
index 9bfac2c..41efd1f 100644
--- a/src/PVE/ProcFSTools.pm
+++ b/src/PVE/ProcFSTools.pm
@@ -32,6 +32,8 @@ sub read_cpuinfo {
         cpus => 1,
         sockets => 1,
         flags => '',
+        phys_bits => 0,
+        virt_bits => 0,
     };
 
     my $fh = IO::File->new($fn, "r");
@@ -54,6 +56,9 @@ sub read_cpuinfo {
             $idhash->{$1} = 1 if not defined($idhash->{$1});
         } elsif ($line =~ m/^cpu cores\s*:\s*(\d+)\s*$/i) {
             $idhash->{$cpuid} = $1 if defined($idhash->{$cpuid});
+        } elsif ($line =~ m/^address sizes\t: (\d+) bits physical, (\d+) bits virtual$/i) {
+            $res->{phys_bits} = $1 if !$res->{phys_bits};
+            $res->{virt_bits} = $2 if !$res->{virt_bits};
         }
     }
 
-- 
2.47.2




* [pve-devel] [PATCH qemu-server v2 1/4] fix #6608: expose viommu driver aw-bits option
  2025-09-02 11:21 [pve-devel] [PATCH common/qemu-server v2 0/5] fix issues with viommu+vfio passthrough in #6608, #6378 Daniel Kral
  2025-09-02 11:21 ` [pve-devel] [PATCH common v2 1/1] procfs: cpuinfo: expose x86_phys_bits and x86_virt_bits values Daniel Kral
@ 2025-09-02 11:21 ` Daniel Kral
  2025-09-05 10:07   ` Fiona Ebner
  2025-09-02 11:21 ` [pve-devel] [PATCH qemu-server v2 2/4] cpu config: factor out gathering common cpu properties Daniel Kral
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Daniel Kral @ 2025-09-02 11:21 UTC (permalink / raw)
  To: pve-devel

Since QEMU 9.2 [0], the default I/O address space bit width of the
Intel vIOMMU driver was raised from 39 bits to 48 bits. From QEMU 9.2
onwards, this makes the aw-bits check introduced in [1] trip for host
CPUs with a physical address width of less than 48 bits:

vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39

For VFIO devices where a vIOMMU is in use, QEMU fetches the IOVA ranges
with the iommufd ioctl IOMMU_IOAS_IOVA_RANGES or the vfio_iommu_type1's
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE info, so 'phys-bits' doesn't change
the behavior of the check.

Therefore, expose the 'aw-bits' option of the intel-iommu and
virtio-iommu QEMU drivers to allow users to set the value.
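
As a rough, self-contained sketch of the resulting behavior (the
function name viommu_device_string is hypothetical and only used here,
the actual change is in the diff below), the device string could be
assembled like this:

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper: build the QEMU '-device' value for a
# vIOMMU type and an optional address width.
sub viommu_device_string {
    my ($viommu, $aw_bits) = @_;

    my %base = (
        intel => 'intel-iommu,intremap=on,caching-mode=on',
        virtio => 'virtio-iommu-pci',
    );
    my $devstr = $base{$viommu} // die "unknown vIOMMU type '$viommu'\n";

    if ($aw_bits) {
        # the Intel vIOMMU accepts only two address widths
        die "Intel vIOMMU supports only 39 or 48 bits as address width\n"
            if $viommu eq 'intel' && $aw_bits != 39 && $aw_bits != 48;
        $devstr .= ",aw-bits=$aw_bits";
    }

    return $devstr;
}

print viommu_device_string('intel', 39), "\n";
print viommu_device_string('virtio', 39), "\n";
print viommu_device_string('intel'), "\n";
```

This prints "intel-iommu,intremap=on,caching-mode=on,aw-bits=39",
"virtio-iommu-pci,aw-bits=39" and, without an explicit value,
"intel-iommu,intremap=on,caching-mode=on", so that QEMU's own default
applies if aw-bits is not set.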

[0] https://lore.kernel.org/qemu-devel/20241212083757.605022-17-zhenzhong.duan@intel.com/
[1] https://lore.kernel.org/qemu-devel/20240605083043.317831-18-zhenzhong.duan@intel.com/

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes from v1:
  - add test names
  - add virtio-iommu test as suggested by @Fiona off-list

 src/PVE/QemuServer.pm                         |  9 +++++--
 src/PVE/QemuServer/Machine.pm                 | 21 +++++++++++++---
 .../cfg2cmd/q35-viommu-intel-aw-bits.conf     |  2 ++
 .../cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd | 25 +++++++++++++++++++
 .../cfg2cmd/q35-viommu-virtio-aw-bits.conf    |  2 ++
 .../q35-viommu-virtio-aw-bits.conf.cmd        | 25 +++++++++++++++++++
 6 files changed, 79 insertions(+), 5 deletions(-)
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
 create mode 100644 src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf
 create mode 100644 src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf.cmd

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index 9597d316..04e988c7 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -3903,11 +3903,16 @@ sub config_to_command {
     PVE::QemuServer::Machine::assert_valid_machine_property($machine_conf);
 
     if (my $viommu = $machine_conf->{viommu}) {
+        my $viommu_devstr = '';
+        $viommu_devstr .= ",aw-bits=$machine_conf->{'aw-bits'}" if $machine_conf->{'aw-bits'};
+
         if ($viommu eq 'intel') {
-            unshift @$devices, '-device', 'intel-iommu,intremap=on,caching-mode=on';
+            $viommu_devstr = "intel-iommu,intremap=on,caching-mode=on$viommu_devstr";
+            unshift @$devices, '-device', $viommu_devstr;
             push @$machineFlags, 'kernel-irqchip=split';
         } elsif ($viommu eq 'virtio') {
-            push @$devices, '-device', 'virtio-iommu-pci';
+            $viommu_devstr = "virtio-iommu-pci$viommu_devstr";
+            push @$devices, '-device', $viommu_devstr;
         }
     }
 
diff --git a/src/PVE/QemuServer/Machine.pm b/src/PVE/QemuServer/Machine.pm
index b61667e0..57d583c2 100644
--- a/src/PVE/QemuServer/Machine.pm
+++ b/src/PVE/QemuServer/Machine.pm
@@ -58,6 +58,16 @@ my $machine_fmt = {
         enum => ['intel', 'virtio'],
         optional => 1,
     },
+    'aw-bits' => {
+        type => 'number',
+        description => "Specifies the vIOMMU address space bit width.",
+        verbose_description => "Specifies the vIOMMU address space bit width.\n\n"
+            . "Intel vIOMMU supports a bit width of either 39 or 48 bits and"
+            . " VirtIO vIOMMU supports any bit width between 32 and 64 bits.",
+        minimum => 32,
+        maximum => 64,
+        optional => 1,
+    },
     'enable-s3' => {
         type => 'boolean',
         description =>
@@ -112,9 +122,14 @@ sub default_machine_for_arch {
 
 sub assert_valid_machine_property {
     my ($machine_conf) = @_;
-    my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
-    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel" && !$q35) {
-        die "to use Intel vIOMMU please set the machine type to q35\n";
+    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel") {
+        my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
+        die "to use Intel vIOMMU please set the machine type to q35\n" if !$q35;
+
+        die "Intel vIOMMU supports only 39 or 48 bits as address width\n"
+            if $machine_conf->{'aw-bits'}
+            && $machine_conf->{'aw-bits'} != 39
+            && $machine_conf->{'aw-bits'} != 48;
     }
 }
 
diff --git a/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
new file mode 100644
index 00000000..9e84e42e
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf
@@ -0,0 +1,2 @@
+# TEST: Check if aw-bits are propagated correctly to intel-iommu device
+machine: q35,viommu=intel,aw-bits=39
diff --git a/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
new file mode 100644
index 00000000..030ccaa5
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-aw-bits.conf.cmd
@@ -0,0 +1,25 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'vm8006,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smp '1,sockets=1,cores=1,maxcpus=1' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 512 \
+  -global 'ICH9-LPC.disable_s3=1' \
+  -global 'ICH9-LPC.disable_s4=1' \
+  -device 'intel-iommu,intremap=on,caching-mode=on,aw-bits=39' \
+  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
+  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
+  -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=q35+pve0,kernel-irqchip=split'
diff --git a/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf b/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf
new file mode 100644
index 00000000..dd8ef1fd
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf
@@ -0,0 +1,2 @@
+# TEST: Check if aw-bits are propagated correctly to virtio-iommu-pci device
+machine: q35,viommu=virtio,aw-bits=39
diff --git a/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf.cmd b/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf.cmd
new file mode 100644
index 00000000..c3b12eee
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-virtio-aw-bits.conf.cmd
@@ -0,0 +1,25 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'vm8006,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smp '1,sockets=1,cores=1,maxcpus=1' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 512 \
+  -global 'ICH9-LPC.disable_s3=1' \
+  -global 'ICH9-LPC.disable_s4=1' \
+  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
+  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
+  -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -device 'virtio-iommu-pci,aw-bits=39' \
+  -machine 'type=q35+pve0'
-- 
2.47.2




* [pve-devel] [PATCH qemu-server v2 2/4] cpu config: factor out gathering common cpu properties
  2025-09-02 11:21 [pve-devel] [PATCH common/qemu-server v2 0/5] fix issues with viommu+vfio passthrough in #6608, #6378 Daniel Kral
  2025-09-02 11:21 ` [pve-devel] [PATCH common v2 1/1] procfs: cpuinfo: expose x86_phys_bits and x86_virt_bits values Daniel Kral
  2025-09-02 11:21 ` [pve-devel] [PATCH qemu-server v2 1/4] fix #6608: expose viommu driver aw-bits option Daniel Kral
@ 2025-09-02 11:21 ` Daniel Kral
  2025-09-05 10:32   ` Fiona Ebner
  2025-09-02 11:22 ` [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch Daniel Kral
  2025-09-02 11:22 ` [pve-devel] [RFC qemu-server v2 4/4] machine: warn intel-iommu users about too large address width Daniel Kral
  4 siblings, 1 reply; 17+ messages in thread
From: Daniel Kral @ 2025-09-02 11:21 UTC (permalink / raw)
  To: pve-devel

The same logic is already present in print_cpu_device(...),
get_cpu_options(...), and get_cpu_bitness(...) and will also be used in
a new helper in the next patch, so factor it out in preparation.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 src/PVE/QemuServer/CPUConfig.pm | 54 +++++++++++----------------------
 1 file changed, 17 insertions(+), 37 deletions(-)

diff --git a/src/PVE/QemuServer/CPUConfig.pm b/src/PVE/QemuServer/CPUConfig.pm
index 786a99d8..f57275dd 100644
--- a/src/PVE/QemuServer/CPUConfig.pm
+++ b/src/PVE/QemuServer/CPUConfig.pm
@@ -492,30 +492,15 @@ sub print_cpu_device {
     die "Hotplug of non x86_64 CPU not yet supported" if $arch ne 'x86_64';
 
     my $kvm = $conf->{kvm} // is_native_arch($arch);
-    my $cpu = get_default_cpu_type('x86_64', $kvm);
-    if (my $cputype = $conf->{cpu}) {
-        my $cpuconf = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $cputype)
-            or die "Cannot parse cpu description: $cputype\n";
-        $cpu = $cpuconf->{cputype};
-
-        if (my $model = $builtin_models->{$cpu}) {
-            $cpu = $model->{'reported-model'};
-        } elsif (is_custom_model($cputype)) {
-            my $custom_cpu = get_custom_model($cpu);
-
-            $cpu = $custom_cpu->{'reported-model'} // $cpu_fmt->{'reported-model'}->{default};
-        }
-        if (my $replacement_type = $depreacated_cpu_map->{$cpu}) {
-            $cpu = $replacement_type;
-        }
-    }
+    my ($cputype) = get_cpu_properties($conf->{cpu}, 'x86_64', $kvm);
 
     my $cores = $conf->{cores} || 1;
 
     my $current_core = ($id - 1) % $cores;
     my $current_socket = int(($id - 1 - $current_core) / $cores);
 
-    return "$cpu-x86_64-cpu,id=cpu$id,socket-id=$current_socket,core-id=$current_core,thread-id=0";
+    return
+        "$cputype-x86_64-cpu,id=cpu$id,socket-id=$current_socket,core-id=$current_core,thread-id=0";
 }
 
 # Resolves multiple arrays of hashes representing CPU flags with metadata to a
@@ -597,9 +582,8 @@ sub parse_cpuflag_list {
     return $res;
 }
 
-# Calculate QEMU's '-cpu' argument from a given VM configuration
-sub get_cpu_options {
-    my ($conf, $arch, $kvm, $kvm_off, $machine_version, $winversion, $gpu_passthrough) = @_;
+sub get_cpu_properties {
+    my ($cpu_prop_str, $arch, $kvm, $kvm_off) = @_;
 
     my $cputype = get_default_cpu_type($arch, $kvm);
 
@@ -607,7 +591,7 @@ sub get_cpu_options {
     my $custom_cpu;
     my $builtin_cpu;
     my $hv_vendor_id;
-    if (my $cpu_prop_str = $conf->{cpu}) {
+    if ($cpu_prop_str) {
         $cpu = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $cpu_prop_str)
             or die "Cannot parse cpu description: $cpu_prop_str\n";
 
@@ -632,6 +616,16 @@ sub get_cpu_options {
         $hv_vendor_id = $cpu->{'hv-vendor-id'} if defined($cpu->{'hv-vendor-id'});
     }
 
+    return ($cputype, $cpu, $custom_cpu, $builtin_cpu, $kvm_off, $hv_vendor_id);
+}
+
+# Calculate QEMU's '-cpu' argument from a given VM configuration
+sub get_cpu_options {
+    my ($conf, $arch, $kvm, $kvm_off, $machine_version, $winversion, $gpu_passthrough) = @_;
+
+    (my $cputype, my $cpu, my $custom_cpu, my $builtin_cpu, $kvm_off, my $hv_vendor_id) =
+        get_cpu_properties($conf->{cpu}, $arch, $kvm, $kvm_off);
+
     my $pve_flags = get_pve_cpu_flags($conf, $kvm, $cputype, $arch, $machine_version);
 
     my $hv_flags =
@@ -842,21 +836,7 @@ sub get_cpu_bitness {
 
     $arch //= get_host_arch();
 
-    my $cputype = get_default_cpu_type($arch, 0);
-
-    if ($cpu_prop_str) {
-        my $cpu = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $cpu_prop_str)
-            or die "Cannot parse cpu description: $cpu_prop_str\n";
-
-        $cputype = $cpu->{cputype};
-
-        if (my $model = $builtin_models->{$cputype}) {
-            $cputype = $model->{'reported-model'};
-        } elsif (is_custom_model($cputype)) {
-            my $custom_cpu = get_custom_model($cputype);
-            $cputype = $custom_cpu->{'reported-model'} // $cpu_fmt->{'reported-model'}->{default};
-        }
-    }
+    my ($cputype) = get_cpu_properties($cpu_prop_str, $arch);
 
     return $cputypes_32bit->{$cputype} ? 32 : 64 if $arch eq 'x86_64';
     return 64 if $arch eq 'aarch64';
-- 
2.47.2




* [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch
  2025-09-02 11:21 [pve-devel] [PATCH common/qemu-server v2 0/5] fix issues with viommu+vfio passthrough in #6608, #6378 Daniel Kral
                   ` (2 preceding siblings ...)
  2025-09-02 11:21 ` [pve-devel] [PATCH qemu-server v2 2/4] cpu config: factor out gathering common cpu properties Daniel Kral
@ 2025-09-02 11:22 ` Daniel Kral
  2025-09-02 11:26   ` Daniel Kral
  2025-09-05 10:50   ` Fiona Ebner
  2025-09-02 11:22 ` [pve-devel] [RFC qemu-server v2 4/4] machine: warn intel-iommu users about too large address width Daniel Kral
  4 siblings, 2 replies; 17+ messages in thread
From: Daniel Kral @ 2025-09-02 11:22 UTC (permalink / raw)
  To: pve-devel

For certain host CPUs, such as Intel consumer-grade CPUs, there is a
frequent mismatch between the CPU's physical address width and the
IOMMU's address width.

If a virtual machine is set up with an intel-iommu device, QEMU allocates
and maps the (virtual) I/O address space (IOAS) for a VFIO passthrough
device with iommufd.

If the address widths of the host CPU and the host IOMMU differ, the
guest physical address space (GPAS) and memory-type range registers
(MTRRs) are set up with the host CPU's address width. This causes the
IOAS to be allocated and mapped outside of the IOMMU's maximum guest
address width (MGAW), which results in the following error from QEMU
(the error message is copied from the user forum [0]):

    kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)

This error is rather confusing and unhelpful to users, so warn them
about a CPU physical address width that exceeds the IOMMU address width.
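
A rough worked example of why the mapping fails, assuming that the
second and third arguments of the failing vfio_container_dma_map(...)
call are the IOVA and the size of the mapping, and that the IOMMU is
limited to 39 bits (the helper bit_length is made up for this
example, and a 64-bit Perl is assumed):

```perl
use strict;
use warnings;
no warnings 'portable'; # allow 64-bit hex literals without a warning

# Number of bits needed to represent an unsigned integer.
sub bit_length {
    my ($n) = @_;
    my $bits = 0;
    while ($n) {
        $bits++;
        $n >>= 1;
    }
    return $bits;
}

# Values copied from the error message; treating them as (iova, size)
# is an assumption of this sketch.
my ($iova, $size) = (0x380000000000, 0x10000);
my $last = $iova + $size - 1;

# assumed address width of the IOMMU and the highest address it can map
my $iommu_aw_bits = 39;
my $limit = (1 << $iommu_aw_bits) - 1;

printf("last IOVA 0x%x needs %d bits\n", $last, bit_length($last));
printf(
    "%d-bit limit is 0x%x: mapping %s\n",
    $iommu_aw_bits, $limit, $last > $limit ? 'does not fit' : 'fits',
);
```

Under these assumptions, the mapping ends at 0x38000000ffff, which needs
46 bits, while a 39-bit IOMMU can only address up to 0x7fffffffff.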

[0] https://forum.proxmox.com/threads/vm-wont-start-with-pci-passthrough-after-upgrade-to-9-0.169586/page-3#post-795717

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
I already talked about this with @Fiona off-list: this adds quite a lot
of code to qemu-server only for a warning, but the warning is more
readable than the above error, which is only issued once the VM is
already being started.

In particular, I don't like the logic duplication in
get_cpu_address_width(...), which tries to copy what
target/i386/{,host-,kvm/kvm-}cpu.c do to retrieve the {,guest_}phys_bits
value. I'd rather see this implemented in pve-qemu as in [0].

There are two qemu and edk2 discussion threads that might help in
deciding how to go with this patch [0] [1]. It could also be better to
implement this downstream in pve-qemu for now similar to [0], or of
course contribute to upstream with an actual fix.

[0] https://lore.kernel.org/qemu-devel/20250130115800.60b7cbe6.alex.williamson@redhat.com/
[1] https://edk2.groups.io/g/devel/topic/patch_v1/102359124

 src/PVE/QemuServer.pm                         |  7 ++-
 src/PVE/QemuServer/CPUConfig.pm               | 46 +++++++++++++++++--
 src/PVE/QemuServer/Machine.pm                 | 13 +++++-
 .../q35-viommu-intel-exceeding-aw-bits.conf   |  4 ++
 ...35-viommu-intel-exceeding-aw-bits.conf.cmd | 25 ++++++++++
 5 files changed, 88 insertions(+), 7 deletions(-)
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf.cmd

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index 04e988c7..6d31bf40 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -61,7 +61,7 @@ use PVE::QemuServer::Helpers
 use PVE::QemuServer::Cloudinit;
 use PVE::QemuServer::CGroup;
 use PVE::QemuServer::CPUConfig
-    qw(print_cpu_device get_cpu_options get_cpu_bitness is_native_arch get_amd_sev_object get_amd_sev_type);
+    qw(print_cpu_device get_cpu_options get_cpu_bitness get_cpu_address_width is_native_arch get_amd_sev_object get_amd_sev_type);
 use PVE::QemuServer::Drive qw(
     is_valid_drivename
     checked_volume_format
@@ -3901,6 +3901,11 @@ sub config_to_command {
     push @$machineFlags, "type=${machine_type_min}";
 
     PVE::QemuServer::Machine::assert_valid_machine_property($machine_conf);
+    PVE::QemuServer::Machine::check_valid_iommu_address_width(
+        $machine_conf,
+        $machine_version,
+        get_cpu_address_width($conf->{cpu}, $arch, $cpuinfo->{phys_bits}),
+    );
 
     if (my $viommu = $machine_conf->{viommu}) {
         my $viommu_devstr = '';
diff --git a/src/PVE/QemuServer/CPUConfig.pm b/src/PVE/QemuServer/CPUConfig.pm
index f57275dd..4671ead9 100644
--- a/src/PVE/QemuServer/CPUConfig.pm
+++ b/src/PVE/QemuServer/CPUConfig.pm
@@ -16,6 +16,7 @@ our @EXPORT_OK = qw(
     print_cpu_device
     get_cpu_options
     get_cpu_bitness
+    get_cpu_address_width
     is_native_arch
     get_amd_sev_object
     get_amd_sev_type
@@ -681,8 +682,21 @@ sub get_cpu_options {
         $pve_forced_flags,
     );
 
+    my $phys_bits_options = get_cpu_phys_bits_options($cpu, $custom_cpu);
+    for my $key (sort keys %$phys_bits_options) {
+        $cpu_str .= ",$key=$phys_bits_options->{$key}";
+    }
+
+    return ('-cpu', $cpu_str);
+}
+
+sub get_cpu_phys_bits_options {
+    my ($cpu, $custom_cpu) = @_;
+
+    my $phys_bits_options = {};
+
     for my $phys_bits_opt (qw(guest-phys-bits phys-bits)) {
-        my $phys_bits = '';
+        my ($key, $value) = ($phys_bits_opt, undef);
         foreach my $conf ($custom_cpu, $cpu) {
             next if !defined($conf);
             my $conf_val = $conf->{$phys_bits_opt};
@@ -690,15 +704,15 @@ sub get_cpu_options {
             if ($conf_val eq 'host') {
                 die "unexpected value 'host' for guest-phys-bits"
                     if $phys_bits_opt eq 'guest-phys-bits';
-                $phys_bits = ",host-phys-bits=true";
+                ($key, $value) = ('host-phys-bits', 'true');
             } else {
-                $phys_bits = ",${phys_bits_opt}=${conf_val}";
+                $value = $conf_val;
             }
         }
-        $cpu_str .= $phys_bits;
+        $phys_bits_options->{$key} = $value if $value;
     }
 
-    return ('-cpu', $cpu_str);
+    return $phys_bits_options;
 }
 
 # Some hardcoded flags required by certain configurations
@@ -844,6 +858,28 @@ sub get_cpu_bitness {
     die "unsupported architecture '$arch'\n";
 }
 
+sub get_cpu_address_width {
+    my ($cpu_prop_str, $arch, $host_phys_bits) = @_;
+
+    $arch //= get_host_arch();
+
+    my ($cputype, $cpu, $custom_cpu) = get_cpu_properties($cpu_prop_str, $arch);
+    my $phys_bits_options = get_cpu_phys_bits_options($cpu, $custom_cpu);
+    my ($phys_bits, $guest_phys_bits) = $phys_bits_options->@{qw(phys-bits guest-phys-bits)};
+
+    my $cpu_aw_bits = 0;
+    $cpu_aw_bits = $guest_phys_bits if $guest_phys_bits;
+    $cpu_aw_bits = $phys_bits if $phys_bits && $cpu_aw_bits > $phys_bits;
+    $cpu_aw_bits = $phys_bits if $phys_bits && !$cpu_aw_bits;
+    $cpu_aw_bits = $host_phys_bits if $host_phys_bits && !$cpu_aw_bits;
+    $cpu_aw_bits = 40 if !$cpu_aw_bits; # fallback to TCG_PHYS_ADDR_BITS
+
+    return int($cpu_aw_bits) if $arch eq 'x86_64';
+    return undef if $arch eq 'aarch64';
+
+    die "unsupported architecture '$arch'\n";
+}
+
 sub get_hw_capabilities {
     # Get reduced-phys-bits & cbitpos from host-hw-capabilities.json
     # TODO: Find better location than /run/qemu-server/
diff --git a/src/PVE/QemuServer/Machine.pm b/src/PVE/QemuServer/Machine.pm
index 57d583c2..c083a27b 100644
--- a/src/PVE/QemuServer/Machine.pm
+++ b/src/PVE/QemuServer/Machine.pm
@@ -3,7 +3,7 @@ package PVE::QemuServer::Machine;
 use strict;
 use warnings;
 
-use PVE::QemuServer::Helpers;
+use PVE::QemuServer::Helpers qw(min_version);
 use PVE::QemuServer::MetaInfo;
 use PVE::QemuServer::Monitor;
 use PVE::JSONSchema qw(get_standard_option parse_property_string print_property_string);
@@ -133,6 +133,17 @@ sub assert_valid_machine_property {
     }
 }
 
+sub check_valid_iommu_address_width {
+    my ($machine_conf, $machine_version, $cpu_aw_bits) = @_;
+    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq 'intel') {
+        my $iommu_aw_bits_default = min_version($machine_version, 9, 2) ? 48 : 39;
+        my $iommu_aw_bits = $machine_conf->{'aw-bits'} // $iommu_aw_bits_default;
+
+        warn "guest address width exceeds vIOMMU address width: $cpu_aw_bits > $iommu_aw_bits\n"
+            if $cpu_aw_bits && $iommu_aw_bits && $cpu_aw_bits > $iommu_aw_bits;
+    }
+}
+
 sub machine_type_is_q35 {
     my ($conf) = @_;
 
diff --git a/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf b/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf
new file mode 100644
index 00000000..d6cff715
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf
@@ -0,0 +1,4 @@
+# TEST: Check if exceeding guest-phys-bits > iommu aw-bits is correctly warned about
+# EXPECT_WARN: guest address width exceeds vIOMMU address width: 46 > 39
+cpu: host,guest-phys-bits=46
+machine: q35,viommu=intel,aw-bits=39
diff --git a/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf.cmd b/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf.cmd
new file mode 100644
index 00000000..0ec488ae
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf.cmd
@@ -0,0 +1,25 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'vm8006,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smp '1,sockets=1,cores=1,maxcpus=1' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu 'host,+kvm_pv_eoi,+kvm_pv_unhalt,guest-phys-bits=46' \
+  -m 512 \
+  -global 'ICH9-LPC.disable_s3=1' \
+  -global 'ICH9-LPC.disable_s4=1' \
+  -device 'intel-iommu,intremap=on,caching-mode=on,aw-bits=39' \
+  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
+  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
+  -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=q35+pve0,kernel-irqchip=split'
-- 
2.47.2




* [pve-devel] [RFC qemu-server v2 4/4] machine: warn intel-iommu users about too large address width
  2025-09-02 11:21 [pve-devel] [PATCH common/qemu-server v2 0/5] fix issues with viommu+vfio passthrough in #6608, #6378 Daniel Kral
                   ` (3 preceding siblings ...)
  2025-09-02 11:22 ` [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch Daniel Kral
@ 2025-09-02 11:22 ` Daniel Kral
  2025-09-05 10:55   ` Fiona Ebner
  4 siblings, 1 reply; 17+ messages in thread
From: Daniel Kral @ 2025-09-02 11:22 UTC (permalink / raw)
  To: pve-devel

Similarly to the warning about the guest address width exceeding the
vIOMMU address width, also warn users if the Intel vIOMMU address width
is larger than the maximum allowed value, i.e. the maximum guest address
width (MGAW) as reported by the hardware.
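
To illustrate how the MGAW is derived from the capability register, here
is a minimal sketch that works on a hexadecimal string instead of
reading sysfs. The helper name mgaw_from_cap and the sample values are
only illustrative and not taken from a specific machine.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the maximum guest address width (MGAW) from
# the hexadecimal string of the IOMMU capability register.
sub mgaw_from_cap {
    my ($cap_hex) = @_;

    $cap_hex =~ s/\s+$//; # strip a trailing newline
    return undef if length($cap_hex) < 6;

    # The last six hex digits cover bits 23:0, so the first two of them
    # are bits 23:16. Slicing the string avoids parsing a 64-bit value.
    my $byte = hex(substr($cap_hex, -6, 2));

    # bits 21:16 hold the MGAW, stored as the width minus one
    return ($byte & 0x3F) + 1;
}

# illustrative register values
print mgaw_from_cap("d2008c40660462\n"), "\n";
print mgaw_from_cap("2f0000"), "\n";
```

The first sample value yields 39 and the second one 48. With a 39-bit
result, an Intel vIOMMU configured with aw-bits=48 would trigger the
new warning.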

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
I only added this as it was relatively cheap to implement now, but the
user will be stopped from starting the VM anyway, as QEMU issues
something like:

kvm: -device vfio-pci,host=0000:00:02.0,id=hostpci0,bus=pci.0,addr=0x10:
    vfio 0000:00:02.0: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39

 src/PVE/QemuServer/Machine.pm        | 16 ++++++++++++++++
 src/test/run_config2command_tests.pl |  8 ++++++++
 2 files changed, 24 insertions(+)

diff --git a/src/PVE/QemuServer/Machine.pm b/src/PVE/QemuServer/Machine.pm
index c083a27b..18b4a1b5 100644
--- a/src/PVE/QemuServer/Machine.pm
+++ b/src/PVE/QemuServer/Machine.pm
@@ -133,12 +133,28 @@ sub assert_valid_machine_property {
     }
 }
 
+sub get_maximum_iommu_address_width {
+    my $max_iommu_aw_bits;
+
+    if (-d "/sys/class/iommu/dmar0") {
+        my $cap = PVE::Tools::file_read_firstline("/sys/class/iommu/dmar0/intel-iommu/cap");
+        # bits 21:16 contain the host's iommu maximum guest address width (MGAW) value
+        # hex(...) warns on 64-bit hex values, so use substr(...) to retrieve needed byte
+        $max_iommu_aw_bits = (hex("0x" . substr($cap, -6, 2)) & 0x3F) + 1;
+    }
+
+    return $max_iommu_aw_bits;
+}
+
 sub check_valid_iommu_address_width {
     my ($machine_conf, $machine_version, $cpu_aw_bits) = @_;
     if ($machine_conf->{viommu} && $machine_conf->{viommu} eq 'intel') {
         my $iommu_aw_bits_default = min_version($machine_version, 9, 2) ? 48 : 39;
         my $iommu_aw_bits = $machine_conf->{'aw-bits'} // $iommu_aw_bits_default;
+        my $max_iommu_aw_bits = get_maximum_iommu_address_width();
 
+        warn "Intel vIOMMU address width larger than maximum: $iommu_aw_bits > $max_iommu_aw_bits\n"
+            if $iommu_aw_bits && $max_iommu_aw_bits && $iommu_aw_bits > $max_iommu_aw_bits;
         warn "guest address width exceeds vIOMMU address width: $cpu_aw_bits > $iommu_aw_bits\n"
             if $cpu_aw_bits && $iommu_aw_bits && $cpu_aw_bits > $iommu_aw_bits;
     }
diff --git a/src/test/run_config2command_tests.pl b/src/test/run_config2command_tests.pl
index 0623b5c1..7594c52a 100755
--- a/src/test/run_config2command_tests.pl
+++ b/src/test/run_config2command_tests.pl
@@ -350,6 +350,14 @@ $qemu_server_config->mock(
     },
 );
 
+my $qemu_server_machine;
+$qemu_server_machine = Test::MockModule->new('PVE::QemuServer::Machine');
+$qemu_server_machine->mock(
+    get_maximum_iommu_address_width => sub {
+        return 48;
+    },
+);
+
 my $qemu_server_memory;
 $qemu_server_memory = Test::MockModule->new('PVE::QemuServer::Memory');
 $qemu_server_memory->mock(
-- 
2.47.2



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch
  2025-09-02 11:22 ` [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch Daniel Kral
@ 2025-09-02 11:26   ` Daniel Kral
  2025-09-05 10:50   ` Fiona Ebner
  1 sibling, 0 replies; 17+ messages in thread
From: Daniel Kral @ 2025-09-02 11:26 UTC (permalink / raw)
  To: Proxmox VE development discussion; +Cc: pve-devel

On Tue Sep 2, 2025 at 1:22 PM CEST, Daniel Kral wrote:
> For certain host CPUs, such as Intel consumer-grade CPUs, there is a
> frequent mismatch between the CPU's physical address width and the
> IOMMU's address width.
>
> If a virtual machine is setup with an intel-iommu device, qemu allocates
> and maps the (virtual) I/O address space (IOAS) for a VFIO passthrough
> device with iommufd.
>
> In case of a mismatch of the address width of the host CPU and IOMMU
> CPU, the guest physical address space (GPAS) and memory-type range

small error: it's just IOMMU, not "IOMMU CPU"

> registers (MTRRs) are setup to the host CPU's address width, which
> causes IOAS to be allocated and mapped outside of the IOMMU's maximum
> guest address width (MGAW) and causes the following error from qemu (the
> error message is copied from the user forum [0]):
>
>     kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)
>
> This error is rather confusing and unhelpful to users, so warn them
> about a CPU physical address width that exceeds the IOMMU address width.
>
> [0] https://forum.proxmox.com/threads/vm-wont-start-with-pci-passthrough-after-upgrade-to-9-0.169586/page-3#post-795717
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH common v2 1/1] procfs: cpuinfo: expose x86_phys_bits and x86_virt_bits values
  2025-09-02 11:21 ` [pve-devel] [PATCH common v2 1/1] procfs: cpuinfo: expose x86_phys_bits and x86_virt_bits values Daniel Kral
@ 2025-09-05  9:10   ` Fiona Ebner
  2025-09-05 11:47     ` Daniel Kral
  0 siblings, 1 reply; 17+ messages in thread
From: Fiona Ebner @ 2025-09-05  9:10 UTC (permalink / raw)
  To: Proxmox VE development discussion, Daniel Kral

Am 02.09.25 um 1:23 PM schrieb Daniel Kral:
> The address sizes line is taken from the kernel's implementation of
> /proc/cpuinfo in arch/x86/kernel/cpu/proc.c.
> 
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
>  src/PVE/ProcFSTools.pm | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/src/PVE/ProcFSTools.pm b/src/PVE/ProcFSTools.pm
> index 9bfac2c..41efd1f 100644
> --- a/src/PVE/ProcFSTools.pm
> +++ b/src/PVE/ProcFSTools.pm
> @@ -32,6 +32,8 @@ sub read_cpuinfo {
>          cpus => 1,
>          sockets => 1,
>          flags => '',
> +        phys_bits => 0,
> +        virt_bits => 0,
>      };
>  
>      my $fh = IO::File->new($fn, "r");
> @@ -54,6 +56,9 @@ sub read_cpuinfo {
>              $idhash->{$1} = 1 if not defined($idhash->{$1});
>          } elsif ($line =~ m/^cpu cores\s*:\s*(\d+)\s*$/i) {
>              $idhash->{$cpuid} = $1 if defined($idhash->{$cpuid});
> +        } elsif ($line =~ m/^address sizes\t: (\d+) bits physical, (\d+) bits virtual$/i) {

I'd prefer to match whitespace with \s* and also allow trailing
whitespace, as is done in the regex for CPU cores. This is for
future-proofing, because e.g. the file could get a new field with a
long name, for which additional tabs would be introduced.
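
A minimal sketch of such a relaxed pattern, assuming a hypothetical
helper parse_address_sizes(...) that is not part of the patch. It only
illustrates the whitespace handling; the exact regex in a next revision
may well look different:

```perl
use strict;
use warnings;

# Hypothetical helper (not in pve-common): extract the physical and
# virtual address widths from one line of /proc/cpuinfo.
sub parse_address_sizes {
    my ($line) = @_;

    # \s* around the colon tolerates any number of tabs or spaces, and the
    # trailing \s* tolerates whitespace (including a newline) at the end
    if ($line =~ m/^address sizes\s*:\s*(\d+)\s+bits physical,\s*(\d+)\s+bits virtual\s*$/i) {
        return ($1, $2);
    }

    return; # not an "address sizes" line
}

my ($phys_bits, $virt_bits) =
    parse_address_sizes("address sizes\t\t: 39 bits physical, 48 bits virtual  \n");
print "phys=$phys_bits virt=$virt_bits\n";
```

The helper returns an empty list for non-matching lines, so a caller
can keep the "first match wins" logic of the patch unchanged.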

> +            $res->{phys_bits} = $1 if !$res->{phys_bits};
> +            $res->{virt_bits} = $2 if !$res->{virt_bits};
>          }
>      }
>  



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH qemu-server v2 1/4] fix #6608: expose viommu driver aw-bits option
  2025-09-02 11:21 ` [pve-devel] [PATCH qemu-server v2 1/4] fix #6608: expose viommu driver aw-bits option Daniel Kral
@ 2025-09-05 10:07   ` Fiona Ebner
  2025-09-05 11:45     ` Daniel Kral
  0 siblings, 1 reply; 17+ messages in thread
From: Fiona Ebner @ 2025-09-05 10:07 UTC (permalink / raw)
  To: Proxmox VE development discussion, Daniel Kral

Am 02.09.25 um 1:23 PM schrieb Daniel Kral:
> Since QEMU 9.2 [0], the default I/O address space bit width was raised
> from 39 bits to 48 bits for the Intel vIOMMU driver, which makes the
> aw-bits check introduced in [1] to trip for host CPUs with less than 48

s/to trip/fail/

> bits physical address width from QEMU 9.2 onwards:
> 
> vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39
> 
> For VFIO devices where a vIOMMU is in-use, QEMU fetches the IOVA ranges
> with the iommufd ioctl IOMMU_IOAS_IOVA_RANGES or the vfio_iommu_type1's
> VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE info, so 'phys-bits' doesn't change
> the behavior of the check.
> 
> Therefore, expose the 'aw-bits' option of the intel-iommu and
> virtio-iommu QEMU drivers to allow users to set the value.
> 
> [0] https://lore.kernel.org/qemu-devel/20241212083757.605022-17-zhenzhong.duan@intel.com/
> [1] https://lore.kernel.org/qemu-devel/20240605083043.317831-18-zhenzhong.duan@intel.com/
> 

Nit: I'd prefer references to qemu commits rather than mails

> @@ -112,9 +122,14 @@ sub default_machine_for_arch {
>  
>  sub assert_valid_machine_property {
>      my ($machine_conf) = @_;
> -    my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
> -    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel" && !$q35) {
> -        die "to use Intel vIOMMU please set the machine type to q35\n";
> +    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel") {
> +        my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
> +        die "to use Intel vIOMMU please set the machine type to q35\n" if !$q35;
> +
> +        die "Intel vIOMMU supports only 39 or 48 bits as address width\n"
> +            if $machine_conf->{'aw-bits'}
> +            && $machine_conf->{'aw-bits'} != 39
> +            && $machine_conf->{'aw-bits'} != 48;
>      }

There should be an error (or at least warning) when aw-bits is set
without setting a viommu.
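
A rough sketch of what I mean, assuming a hypothetical standalone
helper name (in practice this would more likely be one more check inside
assert_valid_machine_property(...), and the message wording is only a
suggestion):

```perl
use strict;
use warnings;

# Hypothetical helper: reject an aw-bits value that has no vIOMMU to
# apply to. $machine_conf is the parsed machine property string.
sub assert_aw_bits_has_viommu {
    my ($machine_conf) = @_;

    die "aw-bits can only be set if a vIOMMU is configured\n"
        if defined($machine_conf->{'aw-bits'}) && !$machine_conf->{viommu};

    return 1;
}

# Passes: aw-bits together with a vIOMMU.
assert_aw_bits_has_viommu({ type => 'q35', viommu => 'intel', 'aw-bits' => 39 });

# Dies: aw-bits without any vIOMMU.
eval { assert_aw_bits_has_viommu({ type => 'q35', 'aw-bits' => 39 }) };
print "error: $@" if $@;
```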


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH qemu-server v2 2/4] cpu config: factor out gathering common cpu properties
  2025-09-02 11:21 ` [pve-devel] [PATCH qemu-server v2 2/4] cpu config: factor out gathering common cpu properties Daniel Kral
@ 2025-09-05 10:32   ` Fiona Ebner
  0 siblings, 0 replies; 17+ messages in thread
From: Fiona Ebner @ 2025-09-05 10:32 UTC (permalink / raw)
  To: Proxmox VE development discussion, Daniel Kral

Am 02.09.25 um 1:23 PM schrieb Daniel Kral:
> The same logic is already present in print_cpu_device(...),
> get_cpu_options(...), and get_cpu_bitness(...) and will also be used in
> a new helper the next patch, so factor it out in preparation.
> 
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>

Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>

with two comments below

> ---
>  src/PVE/QemuServer/CPUConfig.pm | 54 +++++++++++----------------------
>  1 file changed, 17 insertions(+), 37 deletions(-)
> 
> diff --git a/src/PVE/QemuServer/CPUConfig.pm b/src/PVE/QemuServer/CPUConfig.pm
> index 786a99d8..f57275dd 100644
> --- a/src/PVE/QemuServer/CPUConfig.pm
> +++ b/src/PVE/QemuServer/CPUConfig.pm
> @@ -492,30 +492,15 @@ sub print_cpu_device {
>      die "Hotplug of non x86_64 CPU not yet supported" if $arch ne 'x86_64';
>  
>      my $kvm = $conf->{kvm} // is_native_arch($arch);
> -    my $cpu = get_default_cpu_type('x86_64', $kvm);
> -    if (my $cputype = $conf->{cpu}) {
> -        my $cpuconf = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $cputype)
> -            or die "Cannot parse cpu description: $cputype\n";
> -        $cpu = $cpuconf->{cputype};
> -
> -        if (my $model = $builtin_models->{$cpu}) {
> -            $cpu = $model->{'reported-model'};
> -        } elsif (is_custom_model($cputype)) {
> -            my $custom_cpu = get_custom_model($cpu);
> -
> -            $cpu = $custom_cpu->{'reported-model'} // $cpu_fmt->{'reported-model'}->{default};
> -        }
> -        if (my $replacement_type = $depreacated_cpu_map->{$cpu}) {
> -            $cpu = $replacement_type;
> -        }
> -    }
> +    my ($cputype) = get_cpu_properties($conf->{cpu}, 'x86_64', $kvm);

Nit: even if it's the only possible value right now, I'd still use $arch
instead of hardcoding 'x86_64'.

>  
>      my $cores = $conf->{cores} || 1;
>  
>      my $current_core = ($id - 1) % $cores;
>      my $current_socket = int(($id - 1 - $current_core) / $cores);
>  
> -    return "$cpu-x86_64-cpu,id=cpu$id,socket-id=$current_socket,core-id=$current_core,thread-id=0";
> +    return
> +        "$cputype-x86_64-cpu,id=cpu$id,socket-id=$current_socket,core-id=$current_core,thread-id=0";
>  }
>  
>  # Resolves multiple arrays of hashes representing CPU flags with metadata to a
> @@ -597,9 +582,8 @@ sub parse_cpuflag_list {
>      return $res;
>  }
>  
> -# Calculate QEMU's '-cpu' argument from a given VM configuration
> -sub get_cpu_options {
> -    my ($conf, $arch, $kvm, $kvm_off, $machine_version, $winversion, $gpu_passthrough) = @_;
> +sub get_cpu_properties {
> +    my ($cpu_prop_str, $arch, $kvm, $kvm_off) = @_;

Alternatively, we could not pass $kvm_off and have the single caller
override its own $kvm_off only when the returned value is defined. Not
sure if that's cleaner, both seem slightly awkward.

>  
>      my $cputype = get_default_cpu_type($arch, $kvm);
>  
> @@ -607,7 +591,7 @@ sub get_cpu_options {
>      my $custom_cpu;
>      my $builtin_cpu;
>      my $hv_vendor_id;
> -    if (my $cpu_prop_str = $conf->{cpu}) {
> +    if ($cpu_prop_str) {
>          $cpu = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $cpu_prop_str)
>              or die "Cannot parse cpu description: $cpu_prop_str\n";
>  
> @@ -632,6 +616,16 @@ sub get_cpu_options {
>          $hv_vendor_id = $cpu->{'hv-vendor-id'} if defined($cpu->{'hv-vendor-id'});
>      }
>  
> +    return ($cputype, $cpu, $custom_cpu, $builtin_cpu, $kvm_off, $hv_vendor_id);
> +}
> +
> +# Calculate QEMU's '-cpu' argument from a given VM configuration
> +sub get_cpu_options {
> +    my ($conf, $arch, $kvm, $kvm_off, $machine_version, $winversion, $gpu_passthrough) = @_;
> +
> +    (my $cputype, my $cpu, my $custom_cpu, my $builtin_cpu, $kvm_off, my $hv_vendor_id) =
> +        get_cpu_properties($conf->{cpu}, $arch, $kvm, $kvm_off);
> +
>      my $pve_flags = get_pve_cpu_flags($conf, $kvm, $cputype, $arch, $machine_version);
>  
>      my $hv_flags =
> @@ -842,21 +836,7 @@ sub get_cpu_bitness {
>  
>      $arch //= get_host_arch();
>  
> -    my $cputype = get_default_cpu_type($arch, 0);
> -
> -    if ($cpu_prop_str) {
> -        my $cpu = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $cpu_prop_str)
> -            or die "Cannot parse cpu description: $cpu_prop_str\n";
> -
> -        $cputype = $cpu->{cputype};
> -
> -        if (my $model = $builtin_models->{$cputype}) {
> -            $cputype = $model->{'reported-model'};
> -        } elsif (is_custom_model($cputype)) {
> -            my $custom_cpu = get_custom_model($cputype);
> -            $cputype = $custom_cpu->{'reported-model'} // $cpu_fmt->{'reported-model'}->{default};
> -        }
> -    }
> +    my ($cputype) = get_cpu_properties($cpu_prop_str, $arch);
>  
>      return $cputypes_32bit->{$cputype} ? 32 : 64 if $arch eq 'x86_64';
>      return 64 if $arch eq 'aarch64';



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch
  2025-09-02 11:22 ` [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch Daniel Kral
  2025-09-02 11:26   ` Daniel Kral
@ 2025-09-05 10:50   ` Fiona Ebner
  2025-09-05 11:38     ` Daniel Kral
  1 sibling, 1 reply; 17+ messages in thread
From: Fiona Ebner @ 2025-09-05 10:50 UTC (permalink / raw)
  To: Proxmox VE development discussion, Daniel Kral

Am 02.09.25 um 1:23 PM schrieb Daniel Kral:
> For certain host CPUs, such as Intel consumer-grade CPUs, there is a
> frequent mismatch between the CPU's physical address width and the

What do you mean by "frequent"? You already qualified this with "For
certain host CPUs". Do you mean "the default IOMMU's address width"?

> IOMMU's address width.
> 
> If a virtual machine is setup with an intel-iommu device, qemu allocates
> and maps the (virtual) I/O address space (IOAS) for a VFIO passthrough
> device with iommufd.
> 
> In case of a mismatch of the address width of the host CPU and IOMMU
> CPU, the guest physical address space (GPAS) and memory-type range
> registers (MTRRs) are setup to the host CPU's address width, which
> causes IOAS to be allocated and mapped outside of the IOMMU's maximum
> guest address width (MGAW) and causes the following error from qemu (the
> error message is copied from the user forum [0]):
> 
>     kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)
> 
> This error is rather confusing and unhelpful to users, so warn them
> about a CPU physical address width that exceeds the IOMMU address width.
> 
> [0] https://forum.proxmox.com/threads/vm-wont-start-with-pci-passthrough-after-upgrade-to-9-0.169586/page-3#post-795717
> 

After this commit, the test added by qemu-server 1/4 fails on my system:
not ok 51 - 'q35-viommu-intel-aw-bits.conf' - Check if aw-bits are
propagated correctly to intel-iommu device
#   Failed test ''q35-viommu-intel-aw-bits.conf' - Check if aw-bits are
propagated correctly to intel-iommu device'
#   at ./run_config2command_tests.pl line 599.
# got unexpected warning 'guest address width exceeds vIOMMU address
width: 40 > 39'

You'd need to mock the relevant parts to avoid querying the real host.

> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> I already talked about this with @Fiona off-list, but the code this
> adds to qemu-server only for a warning is quite a lot, but is more
> readable than the above error that is only issued when the VM is already
> run.
> 
> Particularly, I don't like the logic duplication of
> get_cpu_address_width(...), which tries to copy what
> target/i386/{,host-,kvm/kvm-}cpu.c do to retrieve the {,guest_}phys_bits
> value, where I'd rather see this implemented in pve-qemu as in [0].
> 
> There are two qemu and edk2 discussion threads that might help in
> deciding how to go with this patch [0] [1]. It could also be better to
> implement this downstream in pve-qemu for now similar to [0], or of
> course contribute to upstream with an actual fix.
> 
> [0] https://lore.kernel.org/qemu-devel/20250130115800.60b7cbe6.alex.williamson@redhat.com/
> [1] https://edk2.groups.io/g/devel/topic/patch_v1/102359124

To avoid all the complexity and maintenance burden of staying
compatible with how QEMU calculates the value, can we simply notify/warn
users who set aw-bits that they might need to set guest-phys-bits to the
same value too?

> @@ -133,6 +133,17 @@ sub assert_valid_machine_property {
>      }
>  }
>  
> +sub check_valid_iommu_address_width {
> +    my ($machine_conf, $machine_version, $cpu_aw_bits) = @_;
> +    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq 'intel') {
> +        my $iommu_aw_bits_default = min_version($machine_version, 9, 2) ? 48 : 39;
> +        my $iommu_aw_bits = $machine_conf->{'aw-bits'} // $iommu_aw_bits_default;
> +
> +        warn "guest address width exceeds vIOMMU address width: $cpu_aw_bits > $iommu_aw_bits\n"
> +            if $cpu_aw_bits && $iommu_aw_bits && $cpu_aw_bits > $iommu_aw_bits;

Should mention that it can be fixed by setting the guest-phys-bits
accordingly.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [RFC qemu-server v2 4/4] machine: warn intel-iommu users about too large address width
  2025-09-02 11:22 ` [pve-devel] [RFC qemu-server v2 4/4] machine: warn intel-iommu users about too large address width Daniel Kral
@ 2025-09-05 10:55   ` Fiona Ebner
  0 siblings, 0 replies; 17+ messages in thread
From: Fiona Ebner @ 2025-09-05 10:55 UTC (permalink / raw)
  To: pve-devel, Daniel Kral

Am 02.09.25 um 1:23 PM schrieb Daniel Kral:
> Similarly to the guest address width exceeding the vIOMMU address
> width, also warn users about the Intel vIOMMU address width being larger
> than the maximum allowed value, the maximum guest address width, as
> reported by the hardware.
> 
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> I only added this as it was relatively cheap to implement now, but the
> user will be stopped by starting the VM anyway by issuing something
> like:
> 
> kvm: -device vfio-pci,host=0000:00:02.0,id=hostpci0,bus=pci.0,addr=0x10:
>     vfio 0000:00:02.0: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39

I feel like we don't gain much by having an additional warning here. The
error from QEMU is already rather clear IMHO. If it's still cheap after
deciding which way we go with patch qemu-server 3/4, then we can still
go for it, but the warning should mention what the user can do to fix
the issue.
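
To illustrate, a minimal sketch of decoding the MGAW field and building
a warning that also names a remedy. Both helper names are hypothetical,
the message wording is only a suggestion, and the cap value in the usage
line is just a sample input, not taken from this series:

```perl
use strict;
use warnings;

# Hypothetical helper: decode the MGAW field (bits 21:16) from the hex
# string found in /sys/class/iommu/dmar0/intel-iommu/cap.
sub mgaw_from_cap {
    my ($cap) = @_;
    return if !defined($cap) || $cap !~ m/^\s*(?:0x)?([0-9a-f]+)\s*$/i;

    # pad so that substr(...) also works for short values, then take only
    # the two hex digits covering bits 23:16 to avoid a 64-bit hex(...)
    my $digits = ('0' x 6) . $1;
    return (hex(substr($digits, -6, 2)) & 0x3F) + 1;
}

# Hypothetical helper: returns a warning text, or nothing if all is fine
# or one of the values is unknown.
sub iommu_aw_bits_warning {
    my ($iommu_aw_bits, $max_iommu_aw_bits) = @_;
    return if !$iommu_aw_bits || !$max_iommu_aw_bits;
    return if $iommu_aw_bits <= $max_iommu_aw_bits;

    return "Intel vIOMMU address width larger than maximum:"
        . " $iommu_aw_bits > $max_iommu_aw_bits,"
        . " consider setting aw-bits to a supported value of at most"
        . " $max_iommu_aw_bits\n";
}

my $max = mgaw_from_cap("d2008c40660462");
my $msg = iommu_aw_bits_warning(48, $max);
warn $msg if defined($msg);
```

Keeping the decoding separate from the sysfs read would also make it
easy to test without mocking the host.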


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch
  2025-09-05 10:50   ` Fiona Ebner
@ 2025-09-05 11:38     ` Daniel Kral
  2025-09-05 12:52       ` Fiona Ebner
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Kral @ 2025-09-05 11:38 UTC (permalink / raw)
  To: Fiona Ebner, Proxmox VE development discussion

On Fri Sep 5, 2025 at 12:50 PM CEST, Fiona Ebner wrote:
> Am 02.09.25 um 1:23 PM schrieb Daniel Kral:
>> For certain host CPUs, such as Intel consumer-grade CPUs, there is a
>> frequent mismatch between the CPU's physical address width and the
>
> What do you mean by "frequent"? You already conditionalized with "For
> certain host CPUs". Do you mean "the default IOMMU's address width"?

Right, it should only be "For certain host CPUs".

The 'frequent' refers to these mismatches seemingly happening mostly on
Intel consumer-grade CPUs, but I'll remove that bit as it's only
anecdotal evidence from a few user reports and some tests I have done on
some machines. I haven't seen any AMD CPU where this was the case (yet).

>
>> IOMMU's address width.
>> 
>> If a virtual machine is setup with an intel-iommu device, qemu allocates
>> and maps the (virtual) I/O address space (IOAS) for a VFIO passthrough
>> device with iommufd.
>> 
>> In case of a mismatch of the address width of the host CPU and IOMMU
>> CPU, the guest physical address space (GPAS) and memory-type range
>> registers (MTRRs) are setup to the host CPU's address width, which
>> causes IOAS to be allocated and mapped outside of the IOMMU's maximum
>> guest address width (MGAW) and causes the following error from qemu (the
>> error message is copied from the user forum [0]):
>> 
>>     kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)
>> 
>> This error is rather confusing and unhelpful to users, so warn them
>> about a CPU physical address width that exceeds the IOMMU address width.
>> 
>> [0] https://forum.proxmox.com/threads/vm-wont-start-with-pci-passthrough-after-upgrade-to-9-0.169586/page-3#post-795717
>> 
>
> After this commit, the test added by qemu-server 1/4 fails on my system:
> not ok 51 - 'q35-viommu-intel-aw-bits.conf' - Check if aw-bits are
> propagated correctly to intel-iommu device
> #   Failed test ''q35-viommu-intel-aw-bits.conf' - Check if aw-bits are
> propagated correctly to intel-iommu device'
> #   at ./run_config2command_tests.pl line 599.
> # got unexpected warning 'guest address width exceeds vIOMMU address
> width: 40 > 39'
>
> You'd need to mock the relevant parts to avoid querying the real host.

Sorry for missing that, will fix that!

>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> I already talked about this with @Fiona off-list, but the code this
>> adds to qemu-server only for a warning is quite a lot, but is more
>> readable than the above error that is only issued when the VM is already
>> run.
>> 
>> Particularly, I don't like the logic duplication of
>> get_cpu_address_width(...), which tries to copy what
>> target/i386/{,host-,kvm/kvm-}cpu.c do to retrieve the {,guest_}phys_bits
>> value, where I'd rather see this implemented in pve-qemu as in [0].
>> 
>> There are two qemu and edk2 discussion threads that might help in
>> deciding how to go with this patch [0] [1]. It could also be better to
>> implement this downstream in pve-qemu for now similar to [0], or of
>> course contribute to upstream with an actual fix.
>> 
>> [0] https://lore.kernel.org/qemu-devel/20250130115800.60b7cbe6.alex.williamson@redhat.com/
>> [1] https://edk2.groups.io/g/devel/topic/patch_v1/102359124
>
> To avoid all the complexity and maintainability burden to stay
> compatible with how QEMU calculates, can we simply notify/warn users who
> set aw-bits that they might need to set guest-phys-bits to the same
> value too?

Hm, the reason for this warning is to help people who get the above
vfio_container_dma_map(...) error, which was already happening before
the aw-bits default was raised from 39 to 48 bits with QEMU 9.2.

Now that the default value for aw-bits is 48 bits, people whose hosts
have less than 48 bits of physical address width will set aw-bits more
often, as their VM cannot start otherwise because of the fatal
aw-bits > host aw-bits error.

So we could go for that warning at all times, but that would leave out
users who don't have aw-bits set (e.g. machine version set to < 9.2) or
other cases that could come up in the future (e.g. when CPUs with
5-level paging are more common).

But I agree with you about the maintainability burden, so maybe we'll
just emit a warning whenever aw-bits is set, saying that guest-phys-bits
should also be set to the same value (guest-phys-bits = aw-bits)?
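
Roughly something like the following sketch, where the helper name, its
two plain arguments and the message text are all assumptions for the
sake of discussion (how the guest-phys-bits value is looked up from the
config is left out on purpose):

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hint whenever aw-bits is set and
# guest-phys-bits is either unset or set to a different value.
sub aw_bits_guest_phys_bits_hint {
    my ($aw_bits, $guest_phys_bits) = @_;

    return if !defined($aw_bits); # nothing to hint at without aw-bits
    return if defined($guest_phys_bits) && $guest_phys_bits == $aw_bits;

    return "aw-bits is set to $aw_bits, guest-phys-bits might need to be"
        . " set to $aw_bits as well\n";
}

my $hint = aw_bits_guest_phys_bits_hint(39, undef);
warn $hint if defined($hint);
```

This would not need any knowledge about the host or about how QEMU
derives the guest physical address width.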

>
>> @@ -133,6 +133,17 @@ sub assert_valid_machine_property {
>>      }
>>  }
>>  
>> +sub check_valid_iommu_address_width {
>> +    my ($machine_conf, $machine_version, $cpu_aw_bits) = @_;
>> +    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq 'intel') {
>> +        my $iommu_aw_bits_default = min_version($machine_version, 9, 2) ? 48 : 39;
>> +        my $iommu_aw_bits = $machine_conf->{'aw-bits'} // $iommu_aw_bits_default;
>> +
>> +        warn "guest address width exceeds vIOMMU address width: $cpu_aw_bits > $iommu_aw_bits\n"
>> +            if $cpu_aw_bits && $iommu_aw_bits && $cpu_aw_bits > $iommu_aw_bits;
>
> Should mention that it can be fixed by setting the guest-phys-bits
> accordingly.

ACK we'll do that!


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH qemu-server v2 1/4] fix #6608: expose viommu driver aw-bits option
  2025-09-05 10:07   ` Fiona Ebner
@ 2025-09-05 11:45     ` Daniel Kral
  2025-09-05 12:00       ` Fiona Ebner
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Kral @ 2025-09-05 11:45 UTC (permalink / raw)
  To: Fiona Ebner, Proxmox VE development discussion

On Fri Sep 5, 2025 at 12:07 PM CEST, Fiona Ebner wrote:
> Am 02.09.25 um 1:23 PM schrieb Daniel Kral:
>> Since QEMU 9.2 [0], the default I/O address space bit width was raised
>> from 39 bits to 48 bits for the Intel vIOMMU driver, which makes the
>> aw-bits check introduced in [1] to trip for host CPUs with less than 48
>
> s/to trip/fail/
>
>> bits physical address width from QEMU 9.2 onwards:
>> 
>> vfio 0000:XX:YY.Z: Failed to set vIOMMU: aw-bits 48 > host aw-bits 39
>> 
>> For VFIO devices where a vIOMMU is in-use, QEMU fetches the IOVA ranges
>> with the iommufd ioctl IOMMU_IOAS_IOVA_RANGES or the vfio_iommu_type1's
>> VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE info, so 'phys-bits' doesn't change
>> the behavior of the check.
>> 
>> Therefore, expose the 'aw-bits' option of the intel-iommu and
>> virtio-iommu QEMU drivers to allow users to set the value.
>> 
>> [0] https://lore.kernel.org/qemu-devel/20241212083757.605022-17-zhenzhong.duan@intel.com/
>> [1] https://lore.kernel.org/qemu-devel/20240605083043.317831-18-zhenzhong.duan@intel.com/
>> 
>
> Nit: I'd prefer references to qemu commits rather than mails

ACK, but I'm not sure in what format we reference external repos: a link
to the qemu-project GitLab, or just the repo + commit hash + summary?

So

[0] https://gitlab.com/qemu-project/qemu/-/commit/ddd84fd0c1
[1] https://gitlab.com/qemu-project/qemu/-/commit/77f6efc0ab

or

[0] qemu ddd84fd0c1 ("intel_iommu: Set default aw_bits to 48 starting from QEMU 9.2")
[1] qemu 77f6efc0ab ("intel_iommu: Check compatibility with host IOMMU capabilities")

>
>> @@ -112,9 +122,14 @@ sub default_machine_for_arch {
>>  
>>  sub assert_valid_machine_property {
>>      my ($machine_conf) = @_;
>> -    my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
>> -    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel" && !$q35) {
>> -        die "to use Intel vIOMMU please set the machine type to q35\n";
>> +    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq "intel") {
>> +        my $q35 = $machine_conf->{type} && ($machine_conf->{type} =~ m/q35/) ? 1 : 0;
>> +        die "to use Intel vIOMMU please set the machine type to q35\n" if !$q35;
>> +
>> +        die "Intel vIOMMU supports only 39 or 48 bits as address width\n"
>> +            if $machine_conf->{'aw-bits'}
>> +            && $machine_conf->{'aw-bits'} != 39
>> +            && $machine_conf->{'aw-bits'} != 48;
>>      }
>
> There should be an error (or at least warning) when aw-bits is set
> without setting a viommu.

ACK will add that


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH common v2 1/1] procfs: cpuinfo: expose x86_phys_bits and x86_virt_bits values
  2025-09-05  9:10   ` Fiona Ebner
@ 2025-09-05 11:47     ` Daniel Kral
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Kral @ 2025-09-05 11:47 UTC (permalink / raw)
  To: Fiona Ebner, Proxmox VE development discussion

On Fri Sep 5, 2025 at 11:10 AM CEST, Fiona Ebner wrote:
> I'd prefer to match whitespaces with \s* and also allow whitespaces at
> the end like is done in the regex for CPU cores. This is for
> future-proofing, because e.g. the file could get a new option with a
> long name for which additional tabs will be introduced.

ACK will do


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [PATCH qemu-server v2 1/4] fix #6608: expose viommu driver aw-bits option
  2025-09-05 11:45     ` Daniel Kral
@ 2025-09-05 12:00       ` Fiona Ebner
  0 siblings, 0 replies; 17+ messages in thread
From: Fiona Ebner @ 2025-09-05 12:00 UTC (permalink / raw)
  To: Daniel Kral, Proxmox VE development discussion

Am 05.09.25 um 1:45 PM schrieb Daniel Kral:
> On Fri Sep 5, 2025 at 12:07 PM CEST, Fiona Ebner wrote:
>> Am 02.09.25 um 1:23 PM schrieb Daniel Kral:
>>> Therefore, expose the 'aw-bits' option of the intel-iommu and
>>> virtio-iommu QEMU drivers to allow users to set the value.
>>>
>>> [0] https://lore.kernel.org/qemu-devel/20241212083757.605022-17-zhenzhong.duan@intel.com/
>>> [1] https://lore.kernel.org/qemu-devel/20240605083043.317831-18-zhenzhong.duan@intel.com/
>>>
>>
>> Nit: I'd prefer references to qemu commits rather than mails
> 
> ACK, but not sure in what format we reference external repos, to the
> qemu-project gitlab or just the repo + commit hash + summary?
> 
> So
> 
> [0] https://gitlab.com/qemu-project/qemu/-/commit/ddd84fd0c1
> [1] https://gitlab.com/qemu-project/qemu/-/commit/77f6efc0ab
> 
> or
> 
> [0] qemu ddd84fd0c1 ("intel_iommu: Set default aw_bits to 48 starting from QEMU 9.2")
> [1] qemu 77f6efc0ab ("intel_iommu: Check compatibility with host IOMMU capabilities")

I much prefer the latter, as one can also see the commit title and web
locations of repos are not stable long-term.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch
  2025-09-05 11:38     ` Daniel Kral
@ 2025-09-05 12:52       ` Fiona Ebner
  0 siblings, 0 replies; 17+ messages in thread
From: Fiona Ebner @ 2025-09-05 12:52 UTC (permalink / raw)
  To: Daniel Kral, Proxmox VE development discussion

Am 05.09.25 um 1:38 PM schrieb Daniel Kral:
> On Fri Sep 5, 2025 at 12:50 PM CEST, Fiona Ebner wrote:
>> On 02.09.25 at 1:23 PM, Daniel Kral wrote:
>>
>>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>>> ---
>>> I already talked about this with @Fiona off-list: the code this adds
>>> to qemu-server just for a warning is quite a lot, but the warning is
>>> more readable than the above error, which is only issued once the VM
>>> is already running.
>>>
>>> In particular, I don't like the logic duplication in
>>> get_cpu_address_width(...), which tries to copy what
>>> target/i386/{,host-,kvm/kvm-}cpu.c do to retrieve the {,guest_}phys_bits
>>> value; I'd rather see this implemented in pve-qemu as in [0].
>>>
>>> There are two qemu and edk2 discussion threads that might help in
>>> deciding how to go with this patch [0] [1]. It could also be better to
>>> implement this downstream in pve-qemu for now similar to [0], or of
>>> course contribute to upstream with an actual fix.
>>>
>>> [0] https://lore.kernel.org/qemu-devel/20250130115800.60b7cbe6.alex.williamson@redhat.com/
>>> [1] https://edk2.groups.io/g/devel/topic/patch_v1/102359124
>>
>> To avoid all the complexity and the maintenance burden of staying
>> compatible with how QEMU calculates this, can we simply notify/warn
>> users who set aw-bits that they might need to set guest-phys-bits to
>> the same value too?
> 
> Hm, this warning is meant for people who get the above
> vfio_container_dma_map(...) error, which was already happening before
> the aw-bits default was increased from 39 to 48 bits with qemu 9.2.
> 
> Now that the default value for aw-bits is 48 bits, the people that have
> less than 48 bits physical address width will set aw-bits more often, as
> their machine cannot start anyway because of the fatal aw-bits > host
> aw-bits error.
> 
> So we could go for that warning at all times, but that leaves out users
> who don't have aw-bits set (e.g. machine version set to < 9.2) or other
> cases that could come up in the future (e.g. when CPUs with 5-level
> paging are more common).
> 
> But I agree with you about the maintainability burden, so maybe we'll
> just do a warning whenever aw-bits is set, saying that guest-phys-bits
> should also be set to the same value (guest-phys-bits = aw-bits)?

Ah, I wasn't aware this issue could also happen without aw-bits set.

As discussed off-list:

The simple notice/warning when aw-bits is set (and vfio is used) would
still catch most newly affected people. Would be nice to have the
aw-bits feature available, so that users can work around the regression.
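
A minimal sketch of that simple check, written in Python purely for
illustration (qemu-server itself is Perl). The function name, the
parameter names and the message text are hypothetical and not taken
from the patch series; the sketch only encodes "aw-bits is set, vfio is
used, and guest-phys-bits is not set to the same value":

```python
from typing import Optional


def aw_bits_notice(
    aw_bits: Optional[int],
    guest_phys_bits: Optional[int],
    uses_vfio: bool,
) -> Optional[str]:
    """Return a notice string, or None when no notice is warranted.

    All names here are assumptions made for this sketch.
    """
    # Only the case discussed above: aw-bits set explicitly and VFIO in use.
    if aw_bits is None or not uses_vfio:
        return None
    # guest-phys-bits already matches aw-bits, so there is nothing to say.
    if guest_phys_bits == aw_bits:
        return None
    return (
        f"viommu aw-bits is set to {aw_bits}; with VFIO passthrough, "
        f"guest-phys-bits might need to be set to {aw_bits} as well"
    )


# Example: aw-bits=39 with a passed-through device and no guest-phys-bits.
print(aw_bits_notice(39, None, True))
```

This deliberately does not try to derive the host or guest physical
address width, which is what keeps it free of the logic duplication
discussed above.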

The other warning is best done in QEMU itself, and it seems there has
been no follow-up series for that yet [0]. We could also go ahead and
apply/backport the warning [1] ourselves without waiting for upstream.
It would still be good to briefly ask the author whether this is still
planned or whether it should/can be picked up.

[0]:
https://lore.kernel.org/qemu-devel/20250206131438.1505542-1-clg@redhat.com/
[1]:
https://lore.kernel.org/qemu-devel/20250130134346.1754143-9-clg@redhat.com/





end of thread, other threads:[~2025-09-05 12:52 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-02 11:21 [pve-devel] [PATCH common/qemu-server v2 0/5] fix issues with viommu+vfio passthrough in #6608, #6378 Daniel Kral
2025-09-02 11:21 ` [pve-devel] [PATCH common v2 1/1] procfs: cpuinfo: expose x86_phys_bits and x86_virt_bits values Daniel Kral
2025-09-05  9:10   ` Fiona Ebner
2025-09-05 11:47     ` Daniel Kral
2025-09-02 11:21 ` [pve-devel] [PATCH qemu-server v2 1/4] fix #6608: expose viommu driver aw-bits option Daniel Kral
2025-09-05 10:07   ` Fiona Ebner
2025-09-05 11:45     ` Daniel Kral
2025-09-05 12:00       ` Fiona Ebner
2025-09-02 11:21 ` [pve-devel] [PATCH qemu-server v2 2/4] cpu config: factor out gathering common cpu properties Daniel Kral
2025-09-05 10:32   ` Fiona Ebner
2025-09-02 11:22 ` [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch Daniel Kral
2025-09-02 11:26   ` Daniel Kral
2025-09-05 10:50   ` Fiona Ebner
2025-09-05 11:38     ` Daniel Kral
2025-09-05 12:52       ` Fiona Ebner
2025-09-02 11:22 ` [pve-devel] [RFC qemu-server v2 4/4] machine: warn intel-iommu users about too large address width Daniel Kral
2025-09-05 10:55   ` Fiona Ebner
