From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gate001.proxmox.com (gate001.proxmox.com [IPv6:2a0f:8001:1:32::40]) by lore.proxmox.com (Postfix) with ESMTPS id 956351FF142 for ; Fri, 03 Jul 2026 17:18:56 +0200 (CEST) Received: from gate001.proxmox.com (localhost.localdomain [127.0.0.1]) by gate001.proxmox.com (Proxmox) with ESMTP id 946E121375; Fri, 03 Jul 2026 17:18:55 +0200 (CEST) Message-ID: <9fee4d26-467f-4db0-8322-1ca073200bb1@proxmox.com> Date: Fri, 3 Jul 2026 17:18:19 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH qemu-server v2] fix #7711: pci: try to detect large memory region preconditions To: Dominik Csapak , pve-devel@lists.proxmox.com References: <20260702100320.1937512-1-d.csapak@proxmox.com> Content-Language: en-US From: Fiona Ebner In-Reply-To: <20260702100320.1937512-1-d.csapak@proxmox.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1783091891766 X-SPAM-LEVEL: Spam detection results: 1 AWL -0.994 Adjusted score from AWL reputation of From: address DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment (newer systems) KAM_MAILER 2 Automated Mailer Tag Left in Email SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: OZ33NHH3DEU5ZB6VSJ3KIKELHZA3RI5X X-Message-ID-Hash: OZ33NHH3DEU5ZB6VSJ3KIKELHZA3RI5X X-MailFrom: f.ebner@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Am 02.07.26 um 12:03 PM schrieb Dominik Csapak: > diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm > index cdf66e89..cf7be99d 100644 > --- a/src/PVE/QemuServer.pm > +++ b/src/PVE/QemuServer.pm > @@ -5676,12 +5676,21 @@ sub vm_start_nolock { > PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, $start_timeout); > > my $uuid; > + my $need_min_phys_bits = 0; > for my $id (sort keys %$pci_devices) { > my $d = $pci_devices->{$id}; > my ($index) = ($id =~ m/^hostpci(\d+)$/); > > my $chosen_mdev; > for my $dev ($d->{ids}->@*) { > + my $bits = Nit: too generic name inside such a long function. Maybe $phys_bits_for_dev? > + eval { PVE::QemuServer::PCI::min_phys_bits_needed($conf, $dev->{id}) }; > + warn "could not determine needed MMIO size: $@\n" if $@; I'd also log for which device for completeness > + > + if (defined($bits) && $bits > $need_min_phys_bits) { > + $need_min_phys_bits = $bits; > + } > + > my $info = > eval { PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $index, $d) }; Style nit: that line actually goes to column 101 and make tidy doesn't catch it. I'd suggest: my $info = eval { foo(...); }; > if ($d->{mdev} || $d->{nvidia}) { > @@ -5707,6 +5716,20 @@ sub vm_start_nolock { > } > } > push @$cmd, '-uuid', $uuid if defined($uuid); > + > + if ($need_min_phys_bits > 0) { > + my $size = render_bytes(2**$need_min_phys_bits); > + my $warn_text = > + "A PCI device with a large memory region (e.g. VRAM) was detected, but VM" > + . " is not configured for a big enough address space for OVMF (needs $size). Consider" > + . " enabling CPU type 'host' or setting 'phys-bits' (at least $need_min_phys_bits)."; > + Style nit: does string lines are also too long. > + if ($need_min_phys_bits > 40) { > + $warn_text .= " You also need to set the 'pdpe1gb' flag."; > + } > + > + log_warn($warn_text); > + } > }; > if (my $err = $@) { > eval { PVE::Storage::deactivate_volumes($storecfg, $vollist); }; > diff --git a/src/PVE/QemuServer/PCI.pm b/src/PVE/QemuServer/PCI.pm > index 0b67943c..4b5c9681 100644 > --- a/src/PVE/QemuServer/PCI.pm > +++ b/src/PVE/QemuServer/PCI.pm > @@ -10,6 +10,7 @@ use PVE::Mapping::PCI; > use PVE::SysFSTools; > use PVE::Tools; > > +use PVE::QemuServer::CPUConfig; > use PVE::QemuServer::Helpers; > use PVE::QemuServer::Machine; > use PVE::QemuServer::PCI::Mdev; > @@ -871,4 +872,95 @@ sub reserve_pci_usage { > die $@ if $@; > } > > +# Returns the size of biggest memory region for a PCI device in bytes > +# This can be used to check if the config is correct for having an MMIO size that is large enough. Style nit: POD would be nice for new functions going forward (same for the fuction below) > +sub get_biggest_memory_region { > + my ($pci_id) = @_; > + > + $pci_id = PVE::SysFSTools::normalize_pci_id($pci_id); > + > + # read resource regions from sysfs > + my $resource_file = "/sys/bus/pci/devices/$pci_id/resource"; > + my $regions = PVE::Tools::file_get_contents($resource_file); Please use PVE::File for new usages > + > + # for each line parse start/end/flags. > + my $biggest_size = 0; > + for my $line (split('\n', $regions)) { > + if ($line =~ m/^0x([a-f0-9]{16})\s0x([a-f0-9]{16})\s0x([a-f0-9]{16})$/i) { > + # avoid warning when parsing long hex values with hex() > + no warnings 'portable'; # Support for 64-bit ints required > + > + my $start = hex($1); > + my $end = hex($2); > + my $flags = hex($3); > + > + # find largest memory region with 'IORESOURCE_MEM' flag (see include/linux/ioport.h in kernel source) Style nit: comment too long > + if (($flags & 0x200) != 0) { > + # $end is inclusive, so + 1 for the overall size > + my $size = $end - $start + 1; > + if ($size > $biggest_size) { > + $biggest_size = $size; > + } > + } > + } > + } > + > + return $biggest_size; > +} > + > +# returns the minimum phys-bits value that needs to be configured so that the MMIO size is enough. > +# For PCI devices with memory regions > 16G, the vm either has to: > +# * boot with seabios > +# * use 'host' type cpu > +# * use high enough 'phys-bits' value (or 'host') and (possibly) 'pdpe1gb' > +# > +# return 0 if vm config does have enough MMIO space configured already > +sub min_phys_bits_needed { > + my ($conf, $pci_id) = @_; > + > + return 0 if ($conf->{bios} // 'seabios') eq 'seabios'; > + > + my $size = get_biggest_memory_region($pci_id); > + > + return 0 if $size <= 16 * 1024 * 1024 * 1024; > + > + # calculate the needed phys bits. The MMIO size needs to be larger than $size, > + # so the bits needed must be bigger than log2($size)) + 3. So simply add a bit after truncating. Nit: additional parenthesis in the comment > + # edk2 limits the mmio space to an eigth of the overall space. (PhysMemAddressWidth - 3) > + # > + # see edk2 source code: OvmfPkg/Library/PlatformInitLib/MemDetect.c > + > + my $needed_phys_bits = int((log($size) / log(2))) + 4; It might work out in practice (at least for exact powers of 2, it can be wrong for others, see below), but I still feel a bit uneasy about having a floating point calculation for this. [I] root@pve9a1 ~# perl -e 'my $foo = shift; print int(log($foo)/log(2)) . "\n"' $(expr 1024 '*' 1024 '*' 1024 '*' 1024 '*' 256) 48 [I] root@pve9a1 ~# perl -e 'my $foo = shift; print int(log($foo)/log(2)) . "\n"' $(expr 1024 '*' 1024 '*' 1024 '*' 1024 '*' 256 '-' 1) 48 Apparently, sprintf is pretty good for this: https://perlmonks.org/?node_id=11158298 > + > + return $needed_phys_bits if !defined($conf->{cpu}); > + > + my $cpu = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $conf->{cpu}) > + or die "Cannot parse cpu description: $conf->{cpu}\n"; > + > + my $cpu_type = $cpu->{cputype} // ''; > + > + return 0 if $cpu_type eq 'host'; > + > + if (my $phys_bits = $cpu->{'phys-bits'}) { > + return 0 if $phys_bits eq 'host'; > + # if it's not 'host' it must be a number between 8 and 64 > + > + # Same as for 'check_phys_bits_above_40_compat', we'd need CPU model expansion, but this is > + # not cheap to get. Check the pdpe1gb flag only for qemu64/kvm64 cpu types. I'd mention that most other models do already have the flag. > + if ( > + ($cpu_type eq 'qemu64' || $cpu_type eq 'kvm64') > + && ($cpu->{flags} // '') !~ m/\+pdpe1gb/ > + ) { > + # edk2 limits the phys bits to 40 in case of no 1gb pages > + # > + # see edk2 source code: OvmfPkg/Library/PlatformInitLib/MemDetect.c > + $phys_bits = 40 if $phys_bits > 40; > + } > + > + return 0 if $phys_bits >= $needed_phys_bits; > + } > + > + return $needed_phys_bits; > +} > + > 1;