From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominik Csapak
To: pve-devel@lists.proxmox.com
Subject: [PATCH qemu-server v2] fix #7711: pci: try to detect large memory region preconditions
Date: Thu, 2 Jul 2026 11:58:07 +0200
Message-ID: <20260702100320.1937512-1-d.csapak@proxmox.com>
X-Mailer: git-send-email 2.47.3
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: Proxmox VE development discussion

When passing through devices with a large memory region, for example
video memory, there needs to be enough address space for OVMF to
correctly map that region. By default, the address space is 32G, which
should work for cards up to 16G of video memory.

To get a bigger address space in OVMF, one needs to either:
* set the CPU type to host (the address space from the host will be used)
* set 'phys-bits' on the cpu (this sets the address space to that value)
  with possibly the cpu flag 'pdpe1gb', since without that, OVMF limits
  the mmio address space to 128G.

Try to detect the largest memory region from sysfs, and warn when the VM
config includes a situation where the conditions are not fulfilled.

This won't detect all circumstances fully, but should detect a large
chunk of them.

Includes some tests for 0G, 16G, 32G and 512G regions.

Signed-off-by: Dominik Csapak
---
changes from v1:
* include fix # in commit subject
* use different approach (calculate the bits from the size and compare
  that, vs calculating the size from the physbits of the config)
* improve the warning to include the necessary address space and only
  show the pdpe1gb hint when the necessary bits exceed 40
* include tests for the parsing
* fix the size to the intended 512G bar size
* only check for pdpe1gb for qemu64/kvm64 as we do elsewhere in the code
  too
* change the order where we do the check and use an eval to not block
  the vm when some spurious error in the sysfs happens
* use 'use v5.36' in test module
* actually include the test in the default test list

 src/PVE/QemuServer.pm                      |  23 +++
 src/PVE/QemuServer/PCI.pm                  |  92 ++++++++++++
 src/test/Makefile                          |   5 +-
 src/test/run_pci_memory_detection_tests.pl | 162 +++++++++++++++++++++
 4 files changed, 281 insertions(+), 1 deletion(-)
 create mode 100755 src/test/run_pci_memory_detection_tests.pl

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index cdf66e89..cf7be99d 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -5676,12 +5676,21 @@ sub vm_start_nolock {
         PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, $start_timeout);
 
         my $uuid;
+        my $need_min_phys_bits = 0;
         for my $id (sort keys %$pci_devices) {
             my $d = $pci_devices->{$id};
             my ($index) = ($id =~ m/^hostpci(\d+)$/);
 
             my $chosen_mdev;
             for my $dev ($d->{ids}->@*) {
+                my $bits =
+                    eval { PVE::QemuServer::PCI::min_phys_bits_needed($conf, $dev->{id}) };
+                warn "could not determine needed MMIO size: $@\n" if $@;
+
+                if (defined($bits) && $bits > $need_min_phys_bits) {
+                    $need_min_phys_bits = $bits;
+                }
+
                 my $info =
                     eval { PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $index, $d) };
                 if ($d->{mdev} || $d->{nvidia}) {
@@ -5707,6 +5716,20 @@ sub vm_start_nolock {
             }
         }
         push @$cmd, '-uuid', $uuid if defined($uuid);
+
+        if ($need_min_phys_bits > 0) {
+            my $size = render_bytes(2**$need_min_phys_bits);
+            my $warn_text =
+                "A PCI device with a large memory region (e.g. VRAM) was detected, but VM"
+                . " is not configured for a big enough address space for OVMF (needs $size). Consider"
+                . " enabling CPU type 'host' or setting 'phys-bits' (at least $need_min_phys_bits).";
+
+            if ($need_min_phys_bits > 40) {
+                $warn_text .= " You also need to set the 'pdpe1gb' flag.";
+            }
+
+            log_warn($warn_text);
+        }
     };
     if (my $err = $@) {
         eval { PVE::Storage::deactivate_volumes($storecfg, $vollist); };
diff --git a/src/PVE/QemuServer/PCI.pm b/src/PVE/QemuServer/PCI.pm
index 0b67943c..4b5c9681 100644
--- a/src/PVE/QemuServer/PCI.pm
+++ b/src/PVE/QemuServer/PCI.pm
@@ -10,6 +10,7 @@ use PVE::Mapping::PCI;
 use PVE::SysFSTools;
 use PVE::Tools;
 
+use PVE::QemuServer::CPUConfig;
 use PVE::QemuServer::Helpers;
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::PCI::Mdev;
@@ -871,4 +872,95 @@ sub reserve_pci_usage {
     die $@ if $@;
 }
 
+# Returns the size of biggest memory region for a PCI device in bytes
+# This can be used to check if the config is correct for having an MMIO size that is large enough.
+sub get_biggest_memory_region {
+    my ($pci_id) = @_;
+
+    $pci_id = PVE::SysFSTools::normalize_pci_id($pci_id);
+
+    # read resource regions from sysfs
+    my $resource_file = "/sys/bus/pci/devices/$pci_id/resource";
+    my $regions = PVE::Tools::file_get_contents($resource_file);
+
+    # for each line parse start/end/flags.
+    my $biggest_size = 0;
+    for my $line (split('\n', $regions)) {
+        if ($line =~ m/^0x([a-f0-9]{16})\s0x([a-f0-9]{16})\s0x([a-f0-9]{16})$/i) {
+            # avoid warning when parsing long hex values with hex()
+            no warnings 'portable'; # Support for 64-bit ints required
+
+            my $start = hex($1);
+            my $end = hex($2);
+            my $flags = hex($3);
+
+            # find largest memory region with 'IORESOURCE_MEM' flag (see include/linux/ioport.h in kernel source)
+            if (($flags & 0x200) != 0) {
+                # $end is inclusive, so + 1 for the overall size
+                my $size = $end - $start + 1;
+                if ($size > $biggest_size) {
+                    $biggest_size = $size;
+                }
+            }
+        }
+    }
+
+    return $biggest_size;
+}
+
+# returns the minimum phys-bits value that needs to be configured so that the MMIO size is enough.
+# For PCI devices with memory regions > 16G, the vm either has to:
+# * boot with seabios
+# * use 'host' type cpu
+# * use high enough 'phys-bits' value (or 'host') and (possibly) 'pdpe1gb'
+#
+# return 0 if vm config does have enough MMIO space configured already
+sub min_phys_bits_needed {
+    my ($conf, $pci_id) = @_;
+
+    return 0 if ($conf->{bios} // 'seabios') eq 'seabios';
+
+    my $size = get_biggest_memory_region($pci_id);
+
+    return 0 if $size <= 16 * 1024 * 1024 * 1024;
+
+    # calculate the needed phys bits. The MMIO size needs to be larger than $size,
+    # so the bits needed must be bigger than log2($size) + 3. So simply add a bit after truncating.
+    # edk2 limits the mmio space to an eighth of the overall space. (PhysMemAddressWidth - 3)
+    #
+    # see edk2 source code: OvmfPkg/Library/PlatformInitLib/MemDetect.c
+
+    my $needed_phys_bits = int((log($size) / log(2))) + 4;
+
+    return $needed_phys_bits if !defined($conf->{cpu});
+
+    my $cpu = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $conf->{cpu})
+        or die "Cannot parse cpu description: $conf->{cpu}\n";
+
+    my $cpu_type = $cpu->{cputype} // '';
+
+    return 0 if $cpu_type eq 'host';
+
+    if (my $phys_bits = $cpu->{'phys-bits'}) {
+        return 0 if $phys_bits eq 'host';
+        # if it's not 'host' it must be a number between 8 and 64
+
+        # Same as for 'check_phys_bits_above_40_compat', we'd need CPU model expansion, but this is
+        # not cheap to get. Check the pdpe1gb flag only for qemu64/kvm64 cpu types.
+        if (
+            ($cpu_type eq 'qemu64' || $cpu_type eq 'kvm64')
+            && ($cpu->{flags} // '') !~ m/\+pdpe1gb/
+        ) {
+            # edk2 limits the phys bits to 40 in case of no 1gb pages
+            #
+            # see edk2 source code: OvmfPkg/Library/PlatformInitLib/MemDetect.c
+            $phys_bits = 40 if $phys_bits > 40;
+        }
+
+        return 0 if $phys_bits >= $needed_phys_bits;
+    }
+
+    return $needed_phys_bits;
+}
+
 1;
diff --git a/src/test/Makefile b/src/test/Makefile
index cf589f41..25e0f2f6 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -1,6 +1,6 @@
 all: test
 
-test: test_snapshot test_cfg_to_cmd test_cfg_to_cmd_aarch64 test_pci_addr_conflicts test_pci_reservation test_qemu_img_convert test_migration test_restore_config test_parse_config
+test: test_snapshot test_cfg_to_cmd test_cfg_to_cmd_aarch64 test_pci_addr_conflicts test_pci_reservation test_qemu_img_convert test_migration test_restore_config test_parse_config test_pci_memory_detection
 
 test_snapshot: run_snapshot_tests.pl
 	./run_snapshot_tests.pl
@@ -18,6 +18,9 @@ test_qemu_img_convert: run_qemu_img_convert_tests.pl
 test_pci_addr_conflicts: run_pci_addr_checks.pl
 	./run_pci_addr_checks.pl
 
+test_pci_memory_detection: run_pci_memory_detection_tests.pl
+	./run_pci_memory_detection_tests.pl
+
 test_pci_reservation: run_pci_reservation_tests.pl
 	./run_pci_reservation_tests.pl
 
diff --git a/src/test/run_pci_memory_detection_tests.pl b/src/test/run_pci_memory_detection_tests.pl
new file mode 100755
index 00000000..b30547c4
--- /dev/null
+++ b/src/test/run_pci_memory_detection_tests.pl
@@ -0,0 +1,162 @@
+#!/usr/bin/perl
+
+use v5.36;
+
+use lib qw(..);
+
+use JSON;
+use Test::More;
+use Test::MockModule;
+
+use PVE::JSONSchema;
+use PVE::QemuServer::CPUConfig;
+use PVE::QemuServer::PCI;
+
+my $tools_module;
+$tools_module = Test::MockModule->new('PVE::Tools');
+$tools_module->mock(
+    'file_get_contents' => sub($path) {
+        if ($path =~ m/01:00.0/) {
+            # 0 B region
+            return < { address => "01:00.0", size => 0 },
+    "16G" => { address => "01:01.0", size => 16 * 1024 * 1024 * 1024 },
+    "32G" => { address => "02:00.0", size => 32 * 1024 * 1024 * 1024 },
+    "512G" => { address => "03:00.0", size => 512 * 1024 * 1024 * 1024 },
+};
+
+# test parser
+for my $test (sort keys $region_pci_map->%*) {
+    my $pci_id = $region_pci_map->{$test}->{address};
+    my $size = PVE::QemuServer::PCI::get_biggest_memory_region($pci_id);
+    my $expected = $region_pci_map->{$test}->{size};
+
+    is($size, $expected, "Size parsing - $test");
+}
+
+my $tests = [
+    {
+        name => "Empty Config",
+        conf => {},
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 0,
+            "512G" => 0,
+        },
+    },
+    {
+        name => "OVMF - no CPU configured",
+        conf => {
+            bios => 'ovmf',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 39,
+            "512G" => 43,
+        },
+    },
+    {
+        name => "OVMF - HOST CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'host',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 0,
+            "512G" => 0,
+        },
+    },
+    {
+        name => "OVMF - 38 phys-bits CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=38',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 39,
+            "512G" => 43,
+        },
+    },
+    {
+        name => "OVMF - 40 phys-bits CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=40',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 0,
+            "512G" => 43,
+        },
+    },
+    {
+        name => "OVMF - 43 phys-bits CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=43',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 0,
+            "512G" => 43,
+        },
+    },
+    {
+        name => "OVMF - 43 phys-bits + pdpe1gb CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=43,flags=+pdpe1gb',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 0,
+            "512G" => 0,
+        },
+    },
+];
+
+foreach my $test (@{$tests}) {
+    my $name = $test->{name};
+    my $expected = $test->{expected};
+    my $conf = $test->{conf};
+    for my $size (sort keys $region_pci_map->%*) {
+        my $pciid = $region_pci_map->{$size}->{address};
+        my $actual = PVE::QemuServer::PCI::min_phys_bits_needed($conf, $pciid);
+
+        is($actual, $expected->{$size}, "$name - $size");
+    }
+
+}
+
+done_testing();
-- 
2.47.3
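
Note on the arithmetic: the phys-bits calculation in min_phys_bits_needed
can be reproduced in isolation. The following is only a minimal sketch,
not part of the patch. The helper name needed_phys_bits is hypothetical,
and it uses an integer shift loop where the patch uses log(); the 16G
threshold and the offset of 4 (3 bits for the one-eighth MMIO limit in
edk2, plus one after truncating) are taken from the patch.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper for illustration only (not in the patch):
# takes the size of the largest memory region in bytes and returns the
# phys-bits value OVMF would need, or 0 if the default space suffices.
sub needed_phys_bits {
    my ($size) = @_;

    my $gib = 1024 * 1024 * 1024;

    # regions up to 16G fit into the default address space
    return 0 if $size <= 16 * $gib;

    # floor(log2($size)) via integer shifts (assumes a 64-bit perl)
    my $log2 = 0;
    my $rest = $size;
    while ($rest > 1) {
        $rest >>= 1;
        $log2++;
    }

    # MMIO space is an eighth of the address space (3 bits) and must be
    # larger than the region, so add one more bit after truncating
    return $log2 + 4;
}

for my $gib_size (16, 32, 512) {
    my $bits = needed_phys_bits($gib_size * 1024 * 1024 * 1024);
    printf("%4dG region -> phys-bits %d\n", $gib_size, $bits);
}
```

For the 32G and 512G regions this gives 39 and 43, the values the tests
in the patch expect for an OVMF guest without a CPU configured.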