From: Daniel Kral
To: pve-devel@lists.proxmox.com
Date: Tue, 2 Sep 2025 13:22:00 +0200
Subject: [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch
Message-ID: <20250902112307.124706-5-d.kral@proxmox.com>
In-Reply-To: <20250902112307.124706-1-d.kral@proxmox.com>
References: <20250902112307.124706-1-d.kral@proxmox.com>

For certain host CPUs, such as Intel consumer-grade CPUs, the CPU's
physical address width frequently differs from the IOMMU's address
width.

If a virtual machine is set up with an intel-iommu device, qemu
allocates and maps the (virtual) I/O address space (IOAS) for a VFIO
passthrough device with iommufd. If the address widths of the host CPU
and the IOMMU differ, the guest physical address space (GPAS) and the
memory-type range registers (MTRRs) are set up according to the host
CPU's address width. This causes the IOAS to be allocated and mapped
outside of the IOMMU's maximum guest address width (MGAW) and makes
qemu fail with the following error (the error message is copied from
the user forum [0]):

kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)

This error is rather confusing and unhelpful to users, so warn them
about a CPU physical address width that exceeds the IOMMU address
width.

[0] https://forum.proxmox.com/threads/vm-wont-start-with-pci-passthrough-after-upgrade-to-9-0.169586/page-3#post-795717

Signed-off-by: Daniel Kral
---
I already talked about this with @Fiona off-list: this patch adds quite
a lot of code to qemu-server for just a warning, but the warning is more
readable than the above error, which is only issued once the VM is
already running.

In particular, I don't like the logic duplication in
get_cpu_address_width(...), which tries to copy what
target/i386/{,host-,kvm/kvm-}cpu.c do to retrieve the
{,guest_}phys_bits value; I'd rather see this implemented in pve-qemu
as in [0].

There are two qemu and edk2 discussion threads that might help in
deciding how to go with this patch [0] [1]. It could also be better to
implement this downstream in pve-qemu for now, similar to [0], or of
course to contribute an actual fix upstream.

[0] https://lore.kernel.org/qemu-devel/20250130115800.60b7cbe6.alex.williamson@redhat.com/
[1] https://edk2.groups.io/g/devel/topic/patch_v1/102359124

 src/PVE/QemuServer.pm                         |  7 ++-
 src/PVE/QemuServer/CPUConfig.pm               | 46 +++++++++++++++++--
 src/PVE/QemuServer/Machine.pm                 | 13 +++++-
 .../q35-viommu-intel-exceeding-aw-bits.conf   |  4 ++
 ...35-viommu-intel-exceeding-aw-bits.conf.cmd | 25 ++++++++++
 5 files changed, 88 insertions(+), 7 deletions(-)
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf
 create mode 100644 src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf.cmd

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index 04e988c7..6d31bf40 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -61,7 +61,7 @@ use PVE::QemuServer::Helpers
 use PVE::QemuServer::Cloudinit;
 use PVE::QemuServer::CGroup;
 use PVE::QemuServer::CPUConfig
-    qw(print_cpu_device get_cpu_options get_cpu_bitness is_native_arch get_amd_sev_object get_amd_sev_type);
+    qw(print_cpu_device get_cpu_options get_cpu_bitness get_cpu_address_width is_native_arch get_amd_sev_object get_amd_sev_type);
 use PVE::QemuServer::Drive qw(
     is_valid_drivename
     checked_volume_format
@@ -3901,6 +3901,11 @@ sub config_to_command {
     push @$machineFlags, "type=${machine_type_min}";
 
     PVE::QemuServer::Machine::assert_valid_machine_property($machine_conf);
+    PVE::QemuServer::Machine::check_valid_iommu_address_width(
+        $machine_conf,
+        $machine_version,
+        get_cpu_address_width($conf->{cpu}, $arch, $cpuinfo->{phys_bits}),
+    );
 
     if (my $viommu = $machine_conf->{viommu}) {
         my $viommu_devstr = '';
diff --git a/src/PVE/QemuServer/CPUConfig.pm b/src/PVE/QemuServer/CPUConfig.pm
index f57275dd..4671ead9 100644
--- a/src/PVE/QemuServer/CPUConfig.pm
+++ b/src/PVE/QemuServer/CPUConfig.pm
@@ -16,6 +16,7 @@ our @EXPORT_OK = qw(
     print_cpu_device
     get_cpu_options
     get_cpu_bitness
+    get_cpu_address_width
     is_native_arch
     get_amd_sev_object
     get_amd_sev_type
@@ -681,8 +682,21 @@ sub get_cpu_options {
         $pve_forced_flags,
     );
 
+    my $phys_bits_options = get_cpu_phys_bits_options($cpu, $custom_cpu);
+    for my $key (sort keys %$phys_bits_options) {
+        $cpu_str .= ",$key=$phys_bits_options->{$key}";
+    }
+
+    return ('-cpu', $cpu_str);
+}
+
+sub get_cpu_phys_bits_options {
+    my ($cpu, $custom_cpu) = @_;
+
+    my $phys_bits_options = {};
+
     for my $phys_bits_opt (qw(guest-phys-bits phys-bits)) {
-        my $phys_bits = '';
+        my ($key, $value) = ($phys_bits_opt, undef);
         foreach my $conf ($custom_cpu, $cpu) {
             next if !defined($conf);
             my $conf_val = $conf->{$phys_bits_opt};
@@ -690,15 +704,15 @@ sub get_cpu_options {
             if ($conf_val eq 'host') {
                 die "unexpected value 'host' for guest-phys-bits"
                     if $phys_bits_opt eq 'guest-phys-bits';
-                $phys_bits = ",host-phys-bits=true";
+                ($key, $value) = ('host-phys-bits', 'true');
             } else {
-                $phys_bits = ",${phys_bits_opt}=${conf_val}";
+                $value = $conf_val;
             }
         }
-        $cpu_str .= $phys_bits;
+        $phys_bits_options->{$key} = $value if $value;
     }
 
-    return ('-cpu', $cpu_str);
+    return $phys_bits_options;
 }
 
 # Some hardcoded flags required by certain configurations
@@ -844,6 +858,28 @@ sub get_cpu_bitness {
     die "unsupported architecture '$arch'\n";
 }
 
+sub get_cpu_address_width {
+    my ($cpu_prop_str, $arch, $host_phys_bits) = @_;
+
+    $arch //= get_host_arch();
+
+    my ($cputype, $cpu, $custom_cpu) = get_cpu_properties($cpu_prop_str, $arch);
+    my $phys_bits_options = get_cpu_phys_bits_options($cpu, $custom_cpu);
+    my ($phys_bits, $guest_phys_bits) = $phys_bits_options->@{qw(phys-bits guest-phys-bits)};
+
+    my $cpu_aw_bits = 0;
+    $cpu_aw_bits = $guest_phys_bits if $guest_phys_bits;
+    $cpu_aw_bits = $phys_bits if $phys_bits && $cpu_aw_bits > $phys_bits;
+    $cpu_aw_bits = $phys_bits if $phys_bits && !$cpu_aw_bits;
+    $cpu_aw_bits = $host_phys_bits if $host_phys_bits && !$cpu_aw_bits;
+    $cpu_aw_bits = 40 if !$cpu_aw_bits; # fallback to TCG_PHYS_ADDR_BITS
+
+    return int($cpu_aw_bits) if $arch eq 'x86_64';
+    return undef if $arch eq 'aarch64';
+
+    die "unsupported architecture '$arch'\n";
+}
+
 sub get_hw_capabilities {
     # Get reduced-phys-bits & cbitpos from host-hw-capabilities.json
     # TODO: Find better location than /run/qemu-server/
diff --git a/src/PVE/QemuServer/Machine.pm b/src/PVE/QemuServer/Machine.pm
index 57d583c2..c083a27b 100644
--- a/src/PVE/QemuServer/Machine.pm
+++ b/src/PVE/QemuServer/Machine.pm
@@ -3,7 +3,7 @@ package PVE::QemuServer::Machine;
 use strict;
 use warnings;
 
-use PVE::QemuServer::Helpers;
+use PVE::QemuServer::Helpers qw(min_version);
 use PVE::QemuServer::MetaInfo;
 use PVE::QemuServer::Monitor;
 use PVE::JSONSchema qw(get_standard_option parse_property_string print_property_string);
@@ -133,6 +133,17 @@ sub assert_valid_machine_property {
     }
 }
 
+sub check_valid_iommu_address_width {
+    my ($machine_conf, $machine_version, $cpu_aw_bits) = @_;
+    if ($machine_conf->{viommu} && $machine_conf->{viommu} eq 'intel') {
+        my $iommu_aw_bits_default = min_version($machine_version, 9, 2) ? 48 : 39;
+        my $iommu_aw_bits = $machine_conf->{'aw-bits'} // $iommu_aw_bits_default;
+
+        warn "guest address width exceeds vIOMMU address width: $cpu_aw_bits > $iommu_aw_bits\n"
+            if $cpu_aw_bits && $iommu_aw_bits && $cpu_aw_bits > $iommu_aw_bits;
+    }
+}
+
 sub machine_type_is_q35 {
     my ($conf) = @_;
 
diff --git a/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf b/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf
new file mode 100644
index 00000000..d6cff715
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf
@@ -0,0 +1,4 @@
+# TEST: Check if exceeding guest-phys-bits > iommu aw-bits is correctly warned about
+# EXPECT_WARN: guest address width exceeds vIOMMU address width: 46 > 39
+cpu: host,guest-phys-bits=46
+machine: q35,viommu=intel,aw-bits=39
diff --git a/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf.cmd b/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf.cmd
new file mode 100644
index 00000000..0ec488ae
--- /dev/null
+++ b/src/test/cfg2cmd/q35-viommu-intel-exceeding-aw-bits.conf.cmd
@@ -0,0 +1,25 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'vm8006,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smp '1,sockets=1,cores=1,maxcpus=1' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu 'host,+kvm_pv_eoi,+kvm_pv_unhalt,guest-phys-bits=46' \
+  -m 512 \
+  -global 'ICH9-LPC.disable_s3=1' \
+  -global 'ICH9-LPC.disable_s4=1' \
+  -device 'intel-iommu,intremap=on,caching-mode=on,aw-bits=39' \
+  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
+  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
+  -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=q35+pve0,kernel-irqchip=split'
-- 
2.47.2
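
For illustration only, and not part of the patch: a minimal standalone
Perl sketch of the address-width resolution and comparison that
get_cpu_address_width() and check_valid_iommu_address_width() perform.
The helper names resolve_cpu_aw_bits() and iommu_aw_warning() are
hypothetical and do not exist in qemu-server. The sketch assumes the CPU
options have already been parsed into a plain hash and only covers the
x86_64 case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in qemu-server): pick the effective guest
# address width. An explicit guest-phys-bits wins, an explicit phys-bits
# caps it (or is used if guest-phys-bits is unset), then the host's width
# is used, and finally the 40-bit fallback from the patch applies.
sub resolve_cpu_aw_bits {
    my ($opts, $host_phys_bits) = @_;

    my ($phys_bits, $guest_phys_bits) = @{$opts}{qw(phys-bits guest-phys-bits)};

    my $aw_bits = $guest_phys_bits || 0;
    $aw_bits = $phys_bits if $phys_bits && (!$aw_bits || $aw_bits > $phys_bits);
    $aw_bits = $host_phys_bits if !$aw_bits && $host_phys_bits;
    $aw_bits = 40 if !$aw_bits;

    return int($aw_bits);
}

# Hypothetical helper: return the warning text if the guest address width
# exceeds the vIOMMU address width, or undef if there is nothing to report.
sub iommu_aw_warning {
    my ($cpu_aw_bits, $iommu_aw_bits) = @_;

    return undef if !$cpu_aw_bits || !$iommu_aw_bits;
    return undef if $cpu_aw_bits <= $iommu_aw_bits;

    return "guest address width exceeds vIOMMU address width: "
        . "$cpu_aw_bits > $iommu_aw_bits";
}

# Example values only: CPU options, host phys bits, vIOMMU aw-bits.
my @cases = (
    [{ 'guest-phys-bits' => 46 }, 39, 39],
    [{ 'guest-phys-bits' => 46, 'phys-bits' => 39 }, 46, 39],
    [{}, 39, 48],
);

for my $case (@cases) {
    my ($opts, $host_phys_bits, $iommu_aw_bits) = @$case;
    my $cpu_aw_bits = resolve_cpu_aw_bits($opts, $host_phys_bits);
    my $warning = iommu_aw_warning($cpu_aw_bits, $iommu_aw_bits);
    print defined($warning) ? "warn: $warning\n" : "ok: $cpu_aw_bits <= $iommu_aw_bits\n";
}
```

The first case corresponds to the added cfg2cmd test
(guest-phys-bits=46, aw-bits=39); the other two are made-up inputs in
which the width is capped by phys-bits or taken from the host.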