public inbox for pve-devel@lists.proxmox.com
From: Dominik Csapak <d.csapak@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH qemu-server] pci: try to detect large memory region preconditions
Date: Mon, 19 Jan 2026 15:20:54 +0100
Message-ID: <20260119142319.2631206-1-d.csapak@proxmox.com>

When passing through devices with a large memory region, for example
video memory, there needs to be enough address space for OVMF to
correctly map that region.

By default, the address space is 32G, which should work for cards with up to
16G of video memory. To get a bigger address space in OVMF, one needs to
either:
* set the CPU type to 'host' (the address space from the host will be used), or
* set 'phys-bits' on the CPU (this sets the address space to that value),
  possibly together with the CPU flag 'pdpe1gb', since without it OVMF
  limits the MMIO address space to 128G.
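
The phys-bits/pdpe1gb interaction can be sketched with a small shell
calculation (illustrative only; it assumes edk2's rule of an MMIO window of
2^(phys_bits - 3), with phys_bits clamped to 40 when 'pdpe1gb' is absent):

```shell
# Illustration: compute the OVMF MMIO window for a given phys-bits value.
# Assumption: edk2 clamps to 40 bits without pdpe1gb and uses an eighth of
# the address space (2^(phys_bits - 3)) as the MMIO window.
phys_bits=43
pdpe1gb=0 # set to 1 when the '+pdpe1gb' CPU flag is configured
if [ "$pdpe1gb" -eq 0 ] && [ "$phys_bits" -gt 40 ]; then
    phys_bits=40 # no 1 GiB pages -> usable address width is capped
fi
mmio_gib=$(( (1 << (phys_bits - 3)) >> 30 ))
echo "MMIO window: ${mmio_gib} GiB"
```

So phys-bits=43 without 'pdpe1gb' still only yields a 128G window, which is
why the flag matters for very large BARs.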

Try to detect the largest memory region from sysfs, and warn when the VM
config does not satisfy these conditions.

This won't detect all circumstances fully, but should detect a large
chunk of them.

Includes some tests for 0G, 16G, 32G and 512G regions.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
Note: the warning message can probably be improved.

Also I'd like to move the section that explains this from the wiki[0]
to our reference docs. If nobody objects, I'll send a patch for that.

0: https://pve.proxmox.com/wiki/PCI_Passthrough#%22BAR0_is_0M%22_error_or_Windows_Code_12_Error
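
For reference, the sysfs 'resource' file format the detection parses (one
"start end flags" triple per line) can be scanned with a rough shell
equivalent of get_biggest_memory_region(); the sample lines below are taken
from the test fixtures in this patch, and bit 0x200 in the flags column
marks IORESOURCE_MEM:

```shell
# Illustration: find the largest memory region from lines in the format of
# /sys/bus/pci/devices/<id>/resource (start, end, flags as 64-bit hex).
largest=0
while read -r start end flags; do
    # keep only memory resources (IORESOURCE_MEM, flag bit 0x200)
    if (( (flags & 0x200) != 0 )); then
        size=$(( end - start + 1 ))
        (( size > largest )) && largest=$size
    fi
done <<'EOF'
0x0000017000000000 0x00000173ffffffff 0x000000000014220c
0x00000000f0000000 0x00000000f0ffffff 0x0000000000040200
EOF
echo "largest memory region: $(( largest >> 30 )) GiB"
```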

 src/PVE/QemuServer.pm                      |  10 ++
 src/PVE/QemuServer/PCI.pm                  |  82 +++++++++++
 src/test/Makefile                          |   3 +
 src/test/run_pci_memory_detection_tests.pl | 158 +++++++++++++++++++++
 4 files changed, 253 insertions(+)
 create mode 100755 src/test/run_pci_memory_detection_tests.pl

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index bad3527c..70748d93 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -5621,6 +5621,7 @@ sub vm_start_nolock {
         PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, $start_timeout);
 
         my $uuid;
+        my $warn_mmio_size = 0;
         for my $id (sort keys %$pci_devices) {
             my $d = $pci_devices->{$id};
             my ($index) = ($id =~ m/^hostpci(\d+)$/);
@@ -5629,6 +5630,8 @@ sub vm_start_nolock {
             for my $dev ($d->{ids}->@*) {
                 my $info =
                     eval { PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $index, $d) };
+                $warn_mmio_size = 1
+                    if !PVE::QemuServer::PCI::is_mmio_size_ok($conf, $dev->{id});
                 if ($d->{mdev} || $d->{nvidia}) {
                     warn $@ if $@;
                     $chosen_mdev = $info;
@@ -5652,6 +5655,13 @@ sub vm_start_nolock {
             }
         }
         push @$cmd, '-uuid', $uuid if defined($uuid);
+
+        if ($warn_mmio_size) {
+            log_warn("A PCI device with a large memory region (e.g. VRAM) was detected, but"
+                . " the VM is not configured with a large enough MMIO size for OVMF. Consider"
+                . " setting CPU type 'host', or 'phys-bits' and possibly the 'pdpe1gb' flag."
+            );
+        }
     };
     if (my $err = $@) {
         eval { PVE::Storage::deactivate_volumes($storecfg, $vollist); };
diff --git a/src/PVE/QemuServer/PCI.pm b/src/PVE/QemuServer/PCI.pm
index c9cf8de0..265b3a51 100644
--- a/src/PVE/QemuServer/PCI.pm
+++ b/src/PVE/QemuServer/PCI.pm
@@ -10,9 +10,13 @@ use PVE::Mapping::PCI;
 use PVE::SysFSTools;
 use PVE::Tools;
 
+use PVE::QemuServer::CPUConfig;
 use PVE::QemuServer::Helpers;
 use PVE::QemuServer::Machine;
 
+# avoid warning when parsing long hex values with hex()
+no warnings 'portable'; # Support for 64-bit ints required
+
 use base 'Exporter';
 
 our @EXPORT_OK = qw(
@@ -903,4 +907,82 @@ sub reserve_pci_usage {
     die $@ if $@;
 }
 
+# Returns the size of the biggest memory region of a PCI device in bytes.
+# This can be used to check whether the config provides a large enough MMIO size.
+sub get_biggest_memory_region {
+    my ($pciid) = @_;
+
+    $pciid = PVE::SysFSTools::normalize_pci_id($pciid);
+
+    # read resource regions from sysfs
+    my $resource_file = "/sys/bus/pci/devices/$pciid/resource";
+    my $regions = PVE::Tools::file_get_contents($resource_file);
+
+    # for each line parse start/end/flags.
+    my $size = 0;
+    for my $line (split('\n', $regions)) {
+        if ($line =~ m/^0x([a-f0-9]{16})\s0x([a-f0-9]{16})\s0x([a-f0-9]{16})$/) {
+            my $start = hex($1);
+            my $end = hex($2);
+            my $flags = hex($3);
+
+            # find largest memory region with 'IORESOURCE_MEM' flag (see include/linux/ioport.h in kernel source)
+            if (($flags & 0x200) != 0) {
+                my $cur_size = $end - $start + 1;
+                if ($cur_size > ($size // 0)) {
+                    $size = $cur_size;
+                }
+            }
+        }
+    }
+
+    return $size;
+}
+
+# returns 1 if the vm is configured so that the MMIO size is enough.
+# For PCI devices with memory regions >= 16G, the vm either has to:
+# * boot with seabios
+# * use 'host' type cpu
+# * use high enough 'phys-bits' value (or 'host') and (possibly) 'pdpe1gb'
+#
+# Returns 0 otherwise.
+sub is_mmio_size_ok {
+    my ($conf, $pciid) = @_;
+
+    my $size = get_biggest_memory_region($pciid);
+
+    return 1 if $size <= 16 * 1024 * 1024 * 1024;
+
+    return 1 if ($conf->{bios} // 'seabios') eq 'seabios';
+
+    return 0 if !defined($conf->{cpu});
+
+    my $cpu = PVE::JSONSchema::parse_property_string('pve-vm-cpu-conf', $conf->{cpu})
+        or die "Cannot parse cpu description: $conf->{cpu}\n";
+
+    return 1 if ($cpu->{cputype} // '') eq 'host';
+
+    if (my $phys_bits = $cpu->{'phys-bits'}) {
+        return 1 if $phys_bits eq 'host';
+        # if it's not 'host' it must be a number between 8 and 64
+
+        # edk2 limits the phys bits to 40 when 1 GiB pages are not available
+        # and limits the MMIO space to an eighth (2^(phys_bits - 3)) of the
+        # overall address space.
+        # See the edk2 source code: OvmfPkg/Library/PlatformInitLib/MemDetect.c
+
+        if (($cpu->{flags} // '') !~ m/\+pdpe1gb/) {
+            $phys_bits = 40 if $phys_bits > 40;
+        }
+
+        my $mmio_bits = $phys_bits - 3;
+
+        my $mmio_space = 2**$mmio_bits;
+
+        return 1 if $mmio_space > $size;
+    }
+
+    return 0;
+}
+
 1;
diff --git a/src/test/Makefile b/src/test/Makefile
index 2ef9073a..1a5aaa95 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -15,6 +15,9 @@ test_qemu_img_convert: run_qemu_img_convert_tests.pl
 test_pci_addr_conflicts: run_pci_addr_checks.pl
 	./run_pci_addr_checks.pl
 
+test_pci_memory_detection: run_pci_memory_detection_tests.pl
+	./run_pci_memory_detection_tests.pl
+
 test_pci_reservation: run_pci_reservation_tests.pl
 	./run_pci_reservation_tests.pl
 
diff --git a/src/test/run_pci_memory_detection_tests.pl b/src/test/run_pci_memory_detection_tests.pl
new file mode 100755
index 00000000..7b30ab7e
--- /dev/null
+++ b/src/test/run_pci_memory_detection_tests.pl
@@ -0,0 +1,158 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use lib qw(..);
+
+use JSON;
+use Test::More;
+use Test::MockModule;
+
+use PVE::JSONSchema;
+use PVE::QemuServer::CPUConfig;
+use PVE::QemuServer::PCI;
+
+my $tools_module;
+$tools_module = Test::MockModule->new('PVE::Tools');
+$tools_module->mock(
+    'file_get_contents' => sub {
+        my ($path) = @_;
+
+        if ($path =~ m/01:00.0/) {
+            # 0 B region
+            return <<EOF;
+0x0000000000000000 0x0000000000000000 0x0000000000000000
+EOF
+        } elsif ($path =~ m/01:01.0/) {
+            # 16 G region
+            return <<EOF;
+0x0000017000000000 0x00000173ffffffff 0x000000000014220c
+EOF
+        } elsif ($path =~ m/02:00.0/) {
+            # 32 G region
+            return <<EOF;
+0x0000017000000000 0x00000177ffffffff 0x000000000014220c
+EOF
+        } elsif ($path =~ m/03:00.0/) {
+            # 512G region
+            return <<EOF;
+0x0000000000000000 0x0000007fffffffff 0x0000000000000200
+EOF
+        }
+    },
+);
+
+my $region_pci_map = {
+    "0G" => "01:00.0",
+    "16G" => "01:01.0",
+    "32G" => "02:00.0",
+    "512G" => "03:00.0",
+};
+
+my $tests = [
+    {
+        name => "Empty Config",
+        conf => {},
+        expected => {
+            "0G" => 1,
+            "16G" => 1,
+            "32G" => 1,
+            "512G" => 1,
+        },
+    },
+    {
+        name => "OVMF - no CPU configured",
+        conf => {
+            bios => 'ovmf',
+        },
+        expected => {
+            "0G" => 1,
+            "16G" => 1,
+            "32G" => 0,
+            "512G" => 0,
+        },
+    },
+    {
+        name => "OVMF - HOST CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'host',
+        },
+        expected => {
+            "0G" => 1,
+            "16G" => 1,
+            "32G" => 1,
+            "512G" => 1,
+        },
+    },
+    {
+        name => "OVMF - 38 phys-bits CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=38',
+        },
+        expected => {
+            "0G" => 1,
+            "16G" => 1,
+            "32G" => 0,
+            "512G" => 0,
+        },
+    },
+    {
+        name => "OVMF - 40 phys-bits CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=40',
+        },
+        expected => {
+            "0G" => 1,
+            "16G" => 1,
+            "32G" => 1,
+            "512G" => 0,
+        },
+    },
+    {
+        name => "OVMF - 43 phys-bits CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=43',
+        },
+        expected => {
+            "0G" => 1,
+            "16G" => 1,
+            "32G" => 1,
+            "512G" => 0,
+        },
+    },
+    {
+        name => "OVMF - 43 phys-bits + pdpe1gb CPU configured",
+        conf => {
+            bios => 'ovmf',
+            cpu => 'qemu64,phys-bits=43,flags=+pdpe1gb',
+        },
+        expected => {
+            "0G" => 1,
+            "16G" => 1,
+            "32G" => 1,
+            "512G" => 1,
+        },
+    },
+];
+
+my $single_test_name = shift;
+
+foreach my $test (@{$tests}) {
+    my $name = $test->{name};
+    my $expected = $test->{expected};
+    my $conf = $test->{conf};
+    for my $size (sort keys $region_pci_map->%*) {
+        my $pciid = $region_pci_map->{$size};
+        my $actual = PVE::QemuServer::PCI::is_mmio_size_ok($conf, $pciid);
+
+        is_deeply($actual, $expected->{$size}, "$name - $size");
+    }
+
+}
+
+done_testing();
-- 
2.47.3


