From: "Daniel Kral" <d.kral@proxmox.com>
To: "Fiona Ebner" <f.ebner@proxmox.com>,
"Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [RFC qemu-server v2 3/4] fix #6378 (continued): warn intel-iommu users about iommu and host aw bits mismatch
Date: Fri, 05 Sep 2025 13:38:45 +0200
Message-ID: <DCKU5N4L1IXW.2NZABFDZFQAEE@proxmox.com>
In-Reply-To: <828b157d-48d4-4540-9d87-2bbfb2865045@proxmox.com>
On Fri Sep 5, 2025 at 12:50 PM CEST, Fiona Ebner wrote:
> Am 02.09.25 um 1:23 PM schrieb Daniel Kral:
>> For certain host CPUs, such as Intel consumer-grade CPUs, there is a
>> frequent mismatch between the CPU's physical address width and the
>
> What do you mean by "frequent"? You already conditionalized with "For
> certain host CPUs". Do you mean "the default IOMMU's address width"?
Right, it should only say "For certain host CPUs".
The "frequent" referred to these mismatches seemingly occurring mostly
on Intel consumer-grade CPUs, but I'll drop it, as that is only
anecdotal evidence from a few user reports and some tests I have done on
some machines. I haven't seen any AMD CPU where this was the case (yet).
>
>> IOMMU's address width.
>>
>> If a virtual machine is set up with an intel-iommu device, qemu allocates
>> and maps the (virtual) I/O address space (IOAS) for a VFIO passthrough
>> device with iommufd.
>>
>> In case of a mismatch of the address width of the host CPU and IOMMU
>> CPU, the guest physical address space (GPAS) and memory-type range
>> registers (MTRRs) are set up to the host CPU's address width, which
>> causes IOAS to be allocated and mapped outside of the IOMMU's maximum
>> guest address width (MGAW) and causes the following error from qemu (the
>> error message is copied from the user forum [0]):
>>
>> kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)
>>
>> This error is rather confusing and unhelpful to users, so warn them
>> about a CPU physical address width that exceeds the IOMMU address width.
>>
>> [0] https://forum.proxmox.com/threads/vm-wont-start-with-pci-passthrough-after-upgrade-to-9-0.169586/page-3#post-795717
>>
>
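To make the numbers in the error above concrete, here is a small
standalone sketch. It assumes that the second and third arguments of
vfio_container_dma_map are the I/O virtual address and the mapping size,
and it uses 39 bits purely as an example IOMMU width (the reporter's
actual hardware width is not stated here). bit_length() is a helper made
up for this illustration.

```perl
use strict;
use warnings;
no warnings 'portable'; # hex literals above 32 bits are fine on 64-bit perls

# Number of bits needed to represent a non-negative integer address.
sub bit_length {
    my ($value) = @_;
    my $bits = 0;
    while ($value > 0) {
        $bits++;
        $value >>= 1;
    }
    return $bits;
}

my $iova = 0x380000000000; # assumed: IOVA argument from the error message
my $size = 0x10000;        # assumed: size argument from the error message
my $last = $iova + $size - 1; # highest address touched by the mapping

# 39 is an assumed IOMMU width for illustration, 48 is the newer default.
for my $aw_bits (39, 48) {
    my $verdict = bit_length($last) <= $aw_bits ? 'fits' : 'does not fit';
    printf "mapping needs %d bits, %s in %d bits\n",
        bit_length($last), $verdict, $aw_bits;
}
```

The mapping ends at 0x38000000ffff, which needs 46 bits, so it cannot be
expressed with a 39-bit address width but would fit in 48 bits.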
> After this commit, the test added by qemu-server 1/4 fails on my system:
> not ok 51 - 'q35-viommu-intel-aw-bits.conf' - Check if aw-bits are
> propagated correctly to intel-iommu device
> # Failed test ''q35-viommu-intel-aw-bits.conf' - Check if aw-bits are
> propagated correctly to intel-iommu device'
> # at ./run_config2command_tests.pl line 599.
> # got unexpected warning 'guest address width exceeds vIOMMU address
> width: 40 > 39'
>
> You'd need to mock the relevant parts to avoid querying the real host.
Sorry for missing that, will fix that!
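A rough idea of what pinning the host value could look like, as a
self-contained sketch. Host::CPUInfo, Machine::address_width_warning and
with_mocked_phys_bits are all made-up names for illustration and not the
actual qemu-server or test-suite code, which may well prefer its existing
mocking helpers.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the code that reads the real host's CPU data.
package Host::CPUInfo {
    sub phys_bits {
        # Assumption: the real code would read host information here.
        die "must not query the real host in tests\n";
    }
}

# Hypothetical check under test, reduced to the comparison itself.
package Machine {
    sub address_width_warning {
        my ($iommu_aw_bits) = @_;
        my $cpu_aw_bits = Host::CPUInfo::phys_bits();
        return if $cpu_aw_bits <= $iommu_aw_bits;
        return "guest address width exceeds vIOMMU address width:"
            . " $cpu_aw_bits > $iommu_aw_bits";
    }
}

package main;

# Run $code with the host lookup replaced by a fixed value. The override
# is undone automatically when this sub returns.
sub with_mocked_phys_bits {
    my ($bits, $code) = @_;
    no warnings 'redefine';
    local *Host::CPUInfo::phys_bits = sub { return $bits };
    return $code->();
}

my $quiet = with_mocked_phys_bits(39, sub { Machine::address_width_warning(39) });
my $loud = with_mocked_phys_bits(40, sub { Machine::address_width_warning(39) });

print defined($quiet) ? "unexpected warning\n" : "no warning with a 39-bit mock\n";
print "$loud\n";
```

With the lookup pinned, the result no longer depends on the machine the
tests run on.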
>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> I already talked about this with @Fiona off-list, but the code this
>> adds to qemu-server only for a warning is quite a lot, but is more
>> readable than the above error that is only issued when the VM is already
>> run.
>>
>> Particularly, I don't like the logic duplication of
>> get_cpu_address_width(...), which tries to copy what
>> target/i386/{,host-,kvm/kvm-}cpu.c do to retrieve the {,guest_}phys_bits
>> value, where I'd rather see this implemented in pve-qemu as in [0].
>>
>> There are two qemu and edk2 discussion threads that might help in
>> deciding how to go with this patch [0] [1]. It could also be better to
>> implement this downstream in pve-qemu for now similar to [0], or of
>> course contribute to upstream with an actual fix.
>>
>> [0] https://lore.kernel.org/qemu-devel/20250130115800.60b7cbe6.alex.williamson@redhat.com/
>> [1] https://edk2.groups.io/g/devel/topic/patch_v1/102359124
>
> To avoid all the complexity and maintainability burden to stay
> compatible with how QEMU calculates, can we simply notify/warn users who
> set aw-bits that they might need to set guest-phys-bits to the same
> value too?
Hm, this warning is meant for people who hit the above
vfio_container_dma_map(...) error, which already occurred before the
aw-bits default was raised from 39 to 48 bits with QEMU 9.2.
Now that the default value for aw-bits is 48 bits, people whose host has
a physical address width below 48 bits will set aw-bits more often, as
their machine cannot start otherwise because of the fatal aw-bits > host
aw-bits error.
So we could always go with that warning, but that leaves out users who
don't have aw-bits set (e.g. machine version set to < 9.2) and other
cases that could come up in the future (e.g. when CPUs with 5-level
paging become more common).
But I agree with you about the maintainability burden, so maybe we'll
just warn whenever aw-bits is set that guest-phys-bits should also be
set, to the same value as aw-bits?
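A minimal sketch of that simpler variant, assuming both values are
already available as plain integers (or undef when unset). The function
name aw_bits_hint and the message wording are made up here, and how the
values are read from the VM config is deliberately left out.

```perl
use strict;
use warnings;

# Returns a hint string, or nothing if there is nothing to say.
# Takes "same value" literally: any other guest-phys-bits triggers the hint.
sub aw_bits_hint {
    my ($aw_bits, $guest_phys_bits) = @_;

    # aw-bits not set explicitly: this variant stays silent.
    return if !defined($aw_bits);
    # Both set to the same value: nothing to warn about.
    return if defined($guest_phys_bits) && $guest_phys_bits == $aw_bits;

    my $current = defined($guest_phys_bits) ? $guest_phys_bits : 'unset';
    return "aw-bits is set to $aw_bits, but guest-phys-bits is $current;"
        . " consider setting guest-phys-bits to $aw_bits as well";
}

for my $case ([39, undef], [39, 39], [undef, undef]) {
    my $hint = aw_bits_hint(@$case);
    print defined($hint) ? "$hint\n" : "no hint\n";
}
```

This needs no knowledge of the host CPU at all, which is the point of
the simplification, at the cost of staying silent when aw-bits is unset.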
>
>> @@ -133,6 +133,17 @@ sub assert_valid_machine_property {
>> }
>> }
>>
>> +sub check_valid_iommu_address_width {
>> + my ($machine_conf, $machine_version, $cpu_aw_bits) = @_;
>> + if ($machine_conf->{viommu} && $machine_conf->{viommu} eq 'intel') {
>> + my $iommu_aw_bits_default = min_version($machine_version, 9, 2) ? 48 : 39;
>> + my $iommu_aw_bits = $machine_conf->{'aw-bits'} // $iommu_aw_bits_default;
>> +
>> + warn "guest address width exceeds vIOMMU address width: $cpu_aw_bits > $iommu_aw_bits\n"
>> + if $cpu_aw_bits && $iommu_aw_bits && $cpu_aw_bits > $iommu_aw_bits;
>
> Should mention that it can be fixed by setting the guest-phys-bits
> accordingly.
ACK, we'll do that!
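Roughly what the amended message could look like, as a standalone
sketch. To keep it runnable outside of qemu-server, min_version(...) is
replaced by a plain boolean argument, the function returns the message
instead of calling warn, and the exact hint wording is only a guess.

```perl
use strict;
use warnings;

# Returns the warning text, or nothing if the widths are compatible.
# $machine_is_9_2_or_newer stands in for min_version($machine_version, 9, 2).
sub iommu_address_width_warning {
    my ($machine_conf, $machine_is_9_2_or_newer, $cpu_aw_bits) = @_;

    return if !$machine_conf->{viommu} || $machine_conf->{viommu} ne 'intel';

    my $iommu_aw_bits_default = $machine_is_9_2_or_newer ? 48 : 39;
    my $iommu_aw_bits = $machine_conf->{'aw-bits'} // $iommu_aw_bits_default;

    return if !$cpu_aw_bits || $cpu_aw_bits <= $iommu_aw_bits;

    return "guest address width exceeds vIOMMU address width:"
        . " $cpu_aw_bits > $iommu_aw_bits,"
        . " consider setting guest-phys-bits to $iommu_aw_bits\n";
}

my $message = iommu_address_width_warning({ viommu => 'intel', 'aw-bits' => 39 }, 1, 40);
print $message if defined($message);
```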