From: Fiona Ebner <f.ebner@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH qemu-server v2] fix #7711: pci: try to detect large memory region preconditions
Date: Fri, 3 Jul 2026 17:18:19 +0200 [thread overview]
Message-ID: <9fee4d26-467f-4db0-8322-1ca073200bb1@proxmox.com> (raw)
In-Reply-To: <20260702100320.1937512-1-d.csapak@proxmox.com>
Am 02.07.26 um 12:03 PM schrieb Dominik Csapak:
> diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
> index cdf66e89..cf7be99d 100644
> --- a/src/PVE/QemuServer.pm
> +++ b/src/PVE/QemuServer.pm
> @@ -5676,12 +5676,21 @@ sub vm_start_nolock {
> PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, $start_timeout);
>
> my $uuid;
> + my $need_min_phys_bits = 0;
> for my $id (sort keys %$pci_devices) {
> my $d = $pci_devices->{$id};
> my ($index) = ($id =~ m/^hostpci(\d+)$/);
>
> my $chosen_mdev;
> for my $dev ($d->{ids}->@*) {
> + my $bits =
Nit: too generic name inside such a long function. Maybe $phys_bits_for_dev?
> + eval { PVE::QemuServer::PCI::min_phys_bits_needed($conf, $dev->{id}) };
> + warn "could not determine needed MMIO size: $@\n" if $@;
I'd also log for which device for completeness
> +
> + if (defined($bits) && $bits > $need_min_phys_bits) {
> + $need_min_phys_bits = $bits;
> + }
> +
> my $info =
> eval { PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $index, $d) };
Style nit: that line actually goes to column 101 and make tidy doesn't
catch it. I'd suggest:
my $info = eval {
foo(...);
};
> if ($d->{mdev} || $d->{nvidia}) {
> @@ -5707,6 +5716,20 @@ sub vm_start_nolock {
> }
> }
> push @$cmd, '-uuid', $uuid if defined($uuid);
> +
> + if ($need_min_phys_bits > 0) {
> + my $size = render_bytes(2**$need_min_phys_bits);
> + my $warn_text =
> + "A PCI device with a large memory region (e.g. VRAM) was detected, but VM"
> + . " is not configured for a big enough address space for OVMF (needs $size). Consider"
> + . " enabling CPU type 'host' or setting 'phys-bits' (at least $need_min_phys_bits).";
> +
Style nit: does string lines are also too long.
> + if ($need_min_phys_bits > 40) {
> + $warn_text .= " You also need to set the 'pdpe1gb' flag.";
> + }
> +
> + log_warn($warn_text);
> + }
> };
> if (my $err = $@) {
> eval { PVE::Storage::deactivate_volumes($storecfg, $vollist); };
> diff --git a/src/PVE/QemuServer/PCI.pm b/src/PVE/QemuServer/PCI.pm
> index 0b67943c..4b5c9681 100644
> --- a/src/PVE/QemuServer/PCI.pm
> +++ b/src/PVE/QemuServer/PCI.pm
> @@ -10,6 +10,7 @@ use PVE::Mapping::PCI;
> use PVE::SysFSTools;
> use PVE::Tools;
>
> +use PVE::QemuServer::CPUConfig;
> use PVE::QemuServer::Helpers;
> use PVE::QemuServer::Machine;
> use PVE::QemuServer::PCI::Mdev;
> @@ -871,4 +872,95 @@ sub reserve_pci_usage {
> die $@ if $@;
> }
>
> +# Returns the size of biggest memory region for a PCI device in bytes
> +# This can be used to check if the config is correct for having an MMIO size that is large enough.
Style nit: POD would be nice for new functions going forward (same for
the fuction below)
> +sub get_biggest_memory_region {
> + my ($pci_id) = @_;
> +
> + $pci_id = PVE::SysFSTools::normalize_pci_id($pci_id);
> +
> + # read resource regions from sysfs
> + my $resource_file = "/sys/bus/pci/devices/$pci_id/resource";
> + my $regions = PVE::Tools::file_get_contents($resource_file);
Please use PVE::File for new usages
> +
> + # for each line parse start/end/flags.
> + my $biggest_size = 0;
> + for my $line (split('\n', $regions)) {
> + if ($line =~ m/^0x([a-f0-9]{16})\s0x([a-f0-9]{16})\s0x([a-f0-9]{16})$/i) {
> + # avoid warning when parsing long hex values with hex()
> + no warnings 'portable'; # Support for 64-bit ints required
> +
> + my $start = hex($1);
> + my $end = hex($2);
> + my $flags = hex($3);
> +
> + # find largest memory region with 'IORESOURCE_MEM' flag (see include/linux/ioport.h in kernel source)
Style nit: comment too long
> + if (($flags & 0x200) != 0) {
> + # $end is inclusive, so + 1 for the overall size
> + my $size = $end - $start + 1;
> + if ($size > $biggest_size) {
> + $biggest_size = $size;
> + }
> + }
> + }
> + }
> +
> + return $biggest_size;
> +}
> +
> +# returns the minimum phys-bits value that needs to be configured so that the MMIO size is enough.
> +# For PCI devices with memory regions > 16G, the vm either has to:
> +# * boot with seabios
> +# * use 'host' type cpu
> +# * use high enough 'phys-bits' value (or 'host') and (possibly) 'pdpe1gb'
> +#
> +# return 0 if vm config does have enough MMIO space configured already
> +sub min_phys_bits_needed {
> + my ($conf, $pci_id) = @_;
> +
> + return 0 if ($conf->{bios} // 'seabios') eq 'seabios';
> +
> + my $size = get_biggest_memory_region($pci_id);
> +
> + return 0 if $size <= 16 * 1024 * 1024 * 1024;
> +
> + # calculate the needed phys bits. The MMIO size needs to be larger than $size,
> + # so the bits needed must be bigger than log2($size)) + 3. So simply add a bit after truncating.
Nit: additional parenthesis in the comment
> + # edk2 limits the mmio space to an eigth of the overall space. (PhysMemAddressWidth - 3)
> + #
> + # see edk2 source code: OvmfPkg/Library/PlatformInitLib/MemDetect.c
> +
> + my $needed_phys_bits = int((log($size) / log(2))) + 4;
It might work out in practice (at least for exact powers of 2, it can be
wrong for others, see below), but I still feel a bit uneasy about having
a floating point calculation for this.
[I] root@pve9a1 ~# perl -e 'my $foo = shift; print int(log($foo)/log(2))
. "\n"' $(expr 1024 '*' 1024 '*' 1024 '*' 1024 '*' 256)
48
[I] root@pve9a1 ~# perl -e 'my $foo = shift; print int(log($foo)/log(2))
. "\n"' $(expr 1024 '*' 1024 '*' 1024 '*' 1024 '*' 256 '-' 1)
48
Apparently, sprintf is pretty good for this:
https://perlmonks.org/?node_id=11158298
> +
> + return $needed_phys_bits if !defined($conf->{cpu});
> +
> + my $cpu = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $conf->{cpu})
> + or die "Cannot parse cpu description: $conf->{cpu}\n";
> +
> + my $cpu_type = $cpu->{cputype} // '';
> +
> + return 0 if $cpu_type eq 'host';
> +
> + if (my $phys_bits = $cpu->{'phys-bits'}) {
> + return 0 if $phys_bits eq 'host';
> + # if it's not 'host' it must be a number between 8 and 64
> +
> + # Same as for 'check_phys_bits_above_40_compat', we'd need CPU model expansion, but this is
> + # not cheap to get. Check the pdpe1gb flag only for qemu64/kvm64 cpu types.
I'd mention that most other models do already have the flag.
> + if (
> + ($cpu_type eq 'qemu64' || $cpu_type eq 'kvm64')
> + && ($cpu->{flags} // '') !~ m/\+pdpe1gb/
> + ) {
> + # edk2 limits the phys bits to 40 in case of no 1gb pages
> + #
> + # see edk2 source code: OvmfPkg/Library/PlatformInitLib/MemDetect.c
> + $phys_bits = 40 if $phys_bits > 40;
> + }
> +
> + return 0 if $phys_bits >= $needed_phys_bits;
> + }
> +
> + return $needed_phys_bits;
> +}
> +
> 1;
prev parent reply other threads:[~2026-07-03 15:18 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-07-02 9:58 [PATCH qemu-server v2] fix #7711: pci: try to detect large memory region preconditions Dominik Csapak
2026-07-03 15:18 ` Fiona Ebner [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9fee4d26-467f-4db0-8322-1ca073200bb1@proxmox.com \
--to=f.ebner@proxmox.com \
--cc=d.csapak@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.