public inbox for pve-devel@lists.proxmox.com
* [PATCH qemu-server v2] fix #7711: pci: try to detect large memory region preconditions
@ 2026-07-02  9:58 Dominik Csapak
From: Dominik Csapak @ 2026-07-02  9:58 UTC (permalink / raw)
  To: pve-devel

When passing through devices with a large memory region, for example
video memory, there needs to be enough address space for OVMF to
correctly map that region.

By default, the MMIO address space is 32G, which should work for cards
with up to 16G of video memory. To get a bigger address space in OVMF,
one needs to either:
* set the CPU type to 'host' (the address space of the host will be used)
* set 'phys-bits' on the CPU (this sets the address space to that many
  bits), possibly together with the CPU flag 'pdpe1gb', since without it
  OVMF limits the MMIO address space to 128G.

Try to detect the largest memory region from sysfs, and warn when the VM
config does not fulfill these conditions.

This won't catch every case, but should cover a large share of them.
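
For illustration only, a condensed standalone sketch of the sysfs parsing
described above. The helper name largest_mem_region and the sample resource
lines are made up for this example and are not part of the patch; the
function takes the file content as a string so it runs without a real
device. It assumes a Perl with 64-bit integer support.

```perl
use strict;
use warnings;

# IORESOURCE_MEM, see include/linux/ioport.h in the kernel source
use constant IORESOURCE_MEM => 0x200;

# Hypothetical helper (not in the patch): takes the text of a sysfs 'resource'
# file and returns the size in bytes of the largest memory region, 0 if none.
sub largest_mem_region {
    my ($content) = @_;

    my $biggest = 0;
    for my $line (split(/\n/, $content)) {
        # each line holds start, end (inclusive) and flags as 64-bit hex values
        my ($start, $end, $flags) =
            $line =~ m/^0x([0-9a-f]{16})\s0x([0-9a-f]{16})\s0x([0-9a-f]{16})$/i
            or next;

        no warnings 'portable'; # hex() on values that need more than 32 bits
        next if (hex($flags) & IORESOURCE_MEM) == 0; # skip non-memory regions

        my $size = hex($end) - hex($start) + 1; # $end is inclusive
        $biggest = $size if $size > $biggest;
    }
    return $biggest;
}

# Made-up sample: a 16 MiB memory BAR, a 32 GiB memory BAR and an I/O port range.
my $sample = <<'EOF';
0x00000000f6000000 0x00000000f6ffffff 0x0000000000040200
0x0000017000000000 0x00000177ffffffff 0x000000000014220c
0x000000000000e000 0x000000000000e07f 0x0000000000040101
EOF

printf "largest memory region: %d GiB\n", largest_mem_region($sample) / 2**30;
```

The I/O port range is skipped because its flags lack IORESOURCE_MEM, so only
the two memory BARs compete for the largest size.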

Includes some tests for 0G, 16G, 32G and 512G regions.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v1:
* include fix # in commit subject
* use different approach (calculate the bits from the size and compare
  that, vs calculating the size from the physbits of the config)
* improve the warning to include the necessary address space and
  only show the pdpe1gb hint when the necessary bits exceed 40
* include tests for the parsing
* fix the size to the intended 512G bar size
* only check for pdpe1gb for qemu64/kvm64 as we do elsewhere in the code
  too
* change the order where we do the check and use an eval to not block
  the vm when some spurious error in the sysfs happens
* use 'use v5.36' in test module
* actually include the test in the default test list
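
For reviewers, a worked example of the "bits from the size" calculation
mentioned above. This is only a sketch: the helper names floor_log2 and
needed_phys_bits are hypothetical, and it uses integer shifts where the patch
uses log($size) / log(2). The arithmetic is the same: truncate log2 of the
region size, then add 4 (1 so the MMIO window is larger than the region, 3
because edk2 uses an eighth of the address space for MMIO).

```perl
use strict;
use warnings;

# Hypothetical helper: floor(log2($n)) for a positive integer, using shifts only.
sub floor_log2 {
    my ($n) = @_;
    my $bits = 0;
    $bits++ while ($n >>= 1);
    return $bits;
}

# Hypothetical helper: phys-bits needed so that an eighth of the address space
# is larger than a memory region of $size bytes.
sub needed_phys_bits {
    my ($size) = @_;
    return floor_log2($size) + 4;
}

for my $gib (32, 64, 512) {
    my $bits = needed_phys_bits($gib << 30);
    # the pdpe1gb hint is only relevant once more than 40 bits are needed
    my $hint = $bits > 40 ? " (plus 'pdpe1gb')" : "";
    printf "%3d GiB region: phys-bits >= %d%s\n", $gib, $bits, $hint;
}
```

A 32G region yields 39 and a 512G region yields 43, matching the values the
tests in this patch expect.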

 src/PVE/QemuServer.pm                      |  23 +++
 src/PVE/QemuServer/PCI.pm                  |  92 ++++++++++++
 src/test/Makefile                          |   5 +-
 src/test/run_pci_memory_detection_tests.pl | 162 +++++++++++++++++++++
 4 files changed, 281 insertions(+), 1 deletion(-)
 create mode 100755 src/test/run_pci_memory_detection_tests.pl

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index cdf66e89..cf7be99d 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -5676,12 +5676,21 @@ sub vm_start_nolock {
         PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, $start_timeout);
 
         my $uuid;
+        my $need_min_phys_bits = 0;
         for my $id (sort keys %$pci_devices) {
             my $d = $pci_devices->{$id};
             my ($index) = ($id =~ m/^hostpci(\d+)$/);
 
             my $chosen_mdev;
             for my $dev ($d->{ids}->@*) {
+                my $bits =
+                    eval { PVE::QemuServer::PCI::min_phys_bits_needed($conf, $dev->{id}) };
+                warn "could not determine needed MMIO size: $@\n" if $@;
+
+                if (defined($bits) && $bits > $need_min_phys_bits) {
+                    $need_min_phys_bits = $bits;
+                }
+
                 my $info =
                     eval { PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $index, $d) };
                 if ($d->{mdev} || $d->{nvidia}) {
@@ -5707,6 +5716,20 @@ sub vm_start_nolock {
             }
         }
         push @$cmd, '-uuid', $uuid if defined($uuid);
+
+        if ($need_min_phys_bits > 0) {
+            my $size = render_bytes(2**$need_min_phys_bits);
+            my $warn_text =
+                "A PCI device with a large memory region (e.g. VRAM) was detected, but the VM"
+                . " is not configured for a big enough address space for OVMF (needs $size). Consider"
+                . " enabling CPU type 'host' or setting 'phys-bits' (at least $need_min_phys_bits).";
+
+            if ($need_min_phys_bits > 40) {
+                $warn_text .= " You also need to set the 'pdpe1gb' flag.";
+            }
+
+            log_warn($warn_text);
+        }
     };
     if (my $err = $@) {
         eval { PVE::Storage::deactivate_volumes($storecfg, $vollist); };
diff --git a/src/PVE/QemuServer/PCI.pm b/src/PVE/QemuServer/PCI.pm
index 0b67943c..4b5c9681 100644
--- a/src/PVE/QemuServer/PCI.pm
+++ b/src/PVE/QemuServer/PCI.pm
@@ -10,6 +10,7 @@ use PVE::Mapping::PCI;
 use PVE::SysFSTools;
 use PVE::Tools;
 
+use PVE::QemuServer::CPUConfig;
 use PVE::QemuServer::Helpers;
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::PCI::Mdev;
@@ -871,4 +872,95 @@ sub reserve_pci_usage {
     die $@ if $@;
 }
 
+# Returns the size in bytes of the biggest memory region of a PCI device.
+# This can be used to check whether the config provides an MMIO size that is large enough.
+sub get_biggest_memory_region {
+    my ($pci_id) = @_;
+
+    $pci_id = PVE::SysFSTools::normalize_pci_id($pci_id);
+
+    # read resource regions from sysfs
+    my $resource_file = "/sys/bus/pci/devices/$pci_id/resource";
+    my $regions = PVE::Tools::file_get_contents($resource_file);
+
+    # for each line parse start/end/flags.
+    my $biggest_size = 0;
+    for my $line (split('\n', $regions)) {
+        if ($line =~ m/^0x([a-f0-9]{16})\s0x([a-f0-9]{16})\s0x([a-f0-9]{16})$/i) {
+            # avoid warning when parsing long hex values with hex()
+            no warnings 'portable'; # Support for 64-bit ints required
+
+            my $start = hex($1);
+            my $end = hex($2);
+            my $flags = hex($3);
+
+            # find largest memory region with 'IORESOURCE_MEM' flag (see include/linux/ioport.h in kernel source)
+            if (($flags & 0x200) != 0) {
+                # $end is inclusive, so + 1 for the overall size
+                my $size = $end - $start + 1;
+                if ($size > $biggest_size) {
+                    $biggest_size = $size;
+                }
+            }
+        }
+    }
+
+    return $biggest_size;
+}
+
+# Returns the minimum phys-bits value that needs to be configured so that the MMIO size is enough.
+# For PCI devices with memory regions > 16G, the VM has to either:
+# * boot with seabios
+# * use the 'host' CPU type
+# * use a high enough 'phys-bits' value (or 'host') and (possibly) 'pdpe1gb'
+#
+# Returns 0 if the VM config already has enough MMIO space configured.
+sub min_phys_bits_needed {
+    my ($conf, $pci_id) = @_;
+
+    return 0 if ($conf->{bios} // 'seabios') eq 'seabios';
+
+    my $size = get_biggest_memory_region($pci_id);
+
+    return 0 if $size <= 16 * 1024 * 1024 * 1024;
+
+    # calculate the needed phys bits. edk2 limits the MMIO space to an eighth of the overall
+    # address space (PhysMemAddressWidth - 3), and the MMIO size needs to be larger than $size, so
+    # the bits needed must be bigger than log2($size) + 3. So truncate log2($size) and add 4.
+    #
+    # see edk2 source code:  OvmfPkg/Library/PlatformInitLib/MemDetect.c
+
+    my $needed_phys_bits = int((log($size) / log(2))) + 4;
+
+    return $needed_phys_bits if !defined($conf->{cpu});
+
+    my $cpu = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $conf->{cpu})
+        or die "Cannot parse cpu description: $conf->{cpu}\n";
+
+    my $cpu_type = $cpu->{cputype} // '';
+
+    return 0 if $cpu_type eq 'host';
+
+    if (my $phys_bits = $cpu->{'phys-bits'}) {
+        return 0 if $phys_bits eq 'host';
+        # if it's not 'host' it must be a number between 8 and 64
+
+        # Same as for 'check_phys_bits_above_40_compat', we'd need CPU model expansion, but this is
+        # not cheap to get. Check the pdpe1gb flag only for qemu64/kvm64 cpu types.
+        if (
+            ($cpu_type eq 'qemu64' || $cpu_type eq 'kvm64')
+            && ($cpu->{flags} // '') !~ m/\+pdpe1gb/
+        ) {
+            # edk2 limits the phys bits to 40 in case of no 1gb pages
+            #
+            # see edk2 source code:  OvmfPkg/Library/PlatformInitLib/MemDetect.c
+            $phys_bits = 40 if $phys_bits > 40;
+        }
+
+        return 0 if $phys_bits >= $needed_phys_bits;
+    }
+
+    return $needed_phys_bits;
+}
+
 1;
diff --git a/src/test/Makefile b/src/test/Makefile
index cf589f41..25e0f2f6 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -1,6 +1,6 @@
 all: test
 
-test: test_snapshot test_cfg_to_cmd test_cfg_to_cmd_aarch64 test_pci_addr_conflicts test_pci_reservation test_qemu_img_convert test_migration test_restore_config test_parse_config
+test: test_snapshot test_cfg_to_cmd test_cfg_to_cmd_aarch64 test_pci_addr_conflicts test_pci_reservation test_qemu_img_convert test_migration test_restore_config test_parse_config test_pci_memory_detection
 
 test_snapshot: run_snapshot_tests.pl
 	./run_snapshot_tests.pl
@@ -18,6 +18,9 @@ test_qemu_img_convert: run_qemu_img_convert_tests.pl
 test_pci_addr_conflicts: run_pci_addr_checks.pl
 	./run_pci_addr_checks.pl
 
+test_pci_memory_detection: run_pci_memory_detection_tests.pl
+	./run_pci_memory_detection_tests.pl
+
 test_pci_reservation: run_pci_reservation_tests.pl
 	./run_pci_reservation_tests.pl
 
diff --git a/src/test/run_pci_memory_detection_tests.pl b/src/test/run_pci_memory_detection_tests.pl
new file mode 100755
index 00000000..b30547c4
--- /dev/null
+++ b/src/test/run_pci_memory_detection_tests.pl
@@ -0,0 +1,162 @@
+#!/usr/bin/perl
+
+use v5.36;
+
+use lib qw(..);
+
+use JSON;
+use Test::More;
+use Test::MockModule;
+
+use PVE::JSONSchema;
+use PVE::QemuServer::CPUConfig;
+use PVE::QemuServer::PCI;
+
+my $tools_module;
+$tools_module = Test::MockModule->new('PVE::Tools');
+$tools_module->mock(
+    'file_get_contents' => sub($path) {
+        if ($path =~ m/01:00.0/) {
+            # 0 B region
+            return <<EOF;
+0x0000000000000000 0x0000000000000000 0x0000000000000000
+EOF
+        } elsif ($path =~ m/01:01.0/) {
+            # 16 G region
+            return <<EOF;
+0x0000017000000000 0x00000173ffffffff 0x000000000014220c
+EOF
+        } elsif ($path =~ m/02:00.0/) {
+            # 32 G region
+            return <<EOF;
+0x0000017000000000 0x00000177ffffffff 0x000000000014220c
+EOF
+        } elsif ($path =~ m/03:00.0/) {
+            # 512 G region
+            return <<EOF;
+0x0000000000000000 0x0000007FFFFFFFFF 0x0000000000000200
+EOF
+        }
+    },
+);
+
+my $region_pci_map = {
+    "0G" => { address => "01:00.0", size => 0 },
+    "16G" => { address => "01:01.0", size => 16 * 1024 * 1024 * 1024 },
+    "32G" => { address => "02:00.0", size => 32 * 1024 * 1024 * 1024 },
+    "512G" => { address => "03:00.0", size => 512 * 1024 * 1024 * 1024 },
+};
+
+# test parser
+for my $test (sort keys $region_pci_map->%*) {
+    my $pci_id = $region_pci_map->{$test}->{address};
+    my $size = PVE::QemuServer::PCI::get_biggest_memory_region($pci_id);
+    my $expected = $region_pci_map->{$test}->{size};
+
+    is($size, $expected, "Size parsing - $test");
+}
+
+my $tests = [
+    {
+        name => "Empty Config",
+        conf => {},
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 0,
+            "512G" => 0,
+        },
+    },
+    {
+        name => "OVMF - no CPU configured",
+        conf => {
+            bios => 'ovmf',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 39,
+            "512G" => 43,
+        },
+    },
+    {
+        name => "OVMF - HOST CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'host',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 0,
+            "512G" => 0,
+        },
+    },
+    {
+        name => "OVMF - 38 phys-bits CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=38',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 39,
+            "512G" => 43,
+        },
+    },
+    {
+        name => "OVMF - 40 phys-bits CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=40',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 0,
+            "512G" => 43,
+        },
+    },
+    {
+        name => "OVMF - 43 phys-bits CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=43',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 0,
+            "512G" => 43,
+        },
+    },
+    {
+        name => "OVMF - 43 phys-bits + pdpe1gb CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=43,flags=+pdpe1gb',
+        },
+        expected => {
+            "0G" => 0,
+            "16G" => 0,
+            "32G" => 0,
+            "512G" => 0,
+        },
+    },
+];
+
+foreach my $test (@{$tests}) {
+    my $name = $test->{name};
+    my $expected = $test->{expected};
+    my $conf = $test->{conf};
+    for my $size (sort keys $region_pci_map->%*) {
+        my $pciid = $region_pci_map->{$size}->{address};
+        my $actual = PVE::QemuServer::PCI::min_phys_bits_needed($conf, $pciid);
+
+        is($actual, $expected->{$size}, "$name - $size");
+    }
+
+}
+
+done_testing();
-- 
2.47.3




