public inbox for pve-devel@lists.proxmox.com
From: Dominik Csapak <d.csapak@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH qemu-server v3 12/13] fix #3574: enable multi pci device mapping from config
Date: Tue, 20 Sep 2022 14:50:27 +0200	[thread overview]
Message-ID: <20220920125041.3636561-22-d.csapak@proxmox.com> (raw)
In-Reply-To: <20220920125041.3636561-1-d.csapak@proxmox.com>

The hardware config now supports multiple devices as a semicolon-separated
list. With this, instead of only having one device per pci mapping, we
have a list from which we can choose on vm start. This way one can
dynamically start vms with a pool of (identical) pci devices without
having to manually assign the proper ids.
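The pool behaviour described above can be sketched roughly as follows (a
minimal Python sketch; `pick_free_device` and the `reserved` set are
illustrative stand-ins, not the actual Perl code in PVE/QemuServer/PCI.pm):

```python
def pick_free_device(mapping_path, reserved):
    """Return the first pci id from a semicolon-separated mapping path
    that is not already reserved, marking it as used."""
    for pci_id in mapping_path.split(';'):
        if pci_id not in reserved:
            reserved.add(pci_id)
            return pci_id
    # mirrors the patch's "could not find a free device" error
    raise RuntimeError("could not find a free device")

# with the first device of the pool already taken, the second is chosen
reserved = {"0000:01:00.0"}
chosen = pick_free_device("0000:01:00.0;0000:02:00.0", reserved)
```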

For that we have to change the internal representation of a parsed
device, such that the separately configured paths of the mapping end up
in separate lists (because multifunction devices are still interpreted
as single devices).
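The resulting structure can be illustrated like this (a hedged Python
sketch; `fake_lspci` is a hypothetical stand-in for PVE::SysFSTools::lspci,
and the nested-list shape mirrors the patch's `$res->{ids}`):

```python
# each configured path expands to a group of pci functions; groups are kept
# separate so a multifunction card is still treated as one logical device
fake_lspci = {
    "0000:01:00": [{"id": "0000:01:00.0"}, {"id": "0000:01:00.1"}],  # multifunction
    "0000:02:00.0": [{"id": "0000:02:00.0"}],                        # single function
}

def parse_ids(paths):
    # one inner list per configured path, mirroring $res->{ids}
    return [fake_lspci[p] for p in paths]

groups = parse_ids(["0000:01:00", "0000:02:00.0"])
# for mdev handling the groups are flattened into one candidate list
flat = [dev["id"] for group in groups for dev in group]
```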

For mdev devices we can now also have multiple devices, where we simply
try to create the appropriate type on each one until we either have one
created, or bail out.
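That try-each-until-success loop can be sketched as follows (Python
pseudocode; `prepare` stands in for PVE::QemuServer::PCI::prepare_pci_device,
and the names are illustrative, not the actual API):

```python
def create_mdev_on_any(devices, prepare):
    """Try prepare() on each candidate pci device; first success wins,
    failures are only warned about, and we die if none succeeds."""
    info = None
    for dev in devices:
        try:
            info = prepare(dev)  # may fail, e.g. no free mdev instances left
        except RuntimeError as err:
            print(f"warning: {dev}: {err}")
            continue
        break  # created one, we're done
    if info is None:
        raise RuntimeError("could not create mediated device")
    return info
```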

Since we now have to reserve the pci ids in print_hostpci_devices, we
have to add a 'reserve' parameter to config_to_command (and chain it
through to reserve_pci_usage) so that a 'qm showcmd' does not actually
reserve any pci id (this would break when used on running vms).
Additionally, this also prevents the migration tests from failing
(they use vm_commandline, which in turn uses config_to_command).
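Threading the flag with an on-by-default semantics, as the Perl
"$reserve //= 1" idiom does, looks roughly like this (Python sketch with
illustrative names; the real reservation bookkeeping is more involved):

```python
def reserve_pci_usage(ids, reservations, reserve=True):
    """Check/claim the given pci ids; only persist the claim if 'reserve'
    is true, mirroring the guarded write in the patch."""
    for pci_id in ids:
        if reserve:
            reservations[pci_id] = "in use"
    return list(ids)

def config_to_command(ids, reservations, reserve=True):
    # a 'qm showcmd'-style caller passes reserve=False so nothing is written
    return reserve_pci_usage(ids, reservations, reserve=reserve)
```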

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 PVE/QemuServer.pm     | 43 ++++++++++++++++++++++++++++-------
 PVE/QemuServer/PCI.pm | 53 +++++++++++++++++++++++++++++++++++--------
 2 files changed, 78 insertions(+), 18 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index d23cfc2..5833aba 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -3507,8 +3507,9 @@ my sub should_disable_smm {
 
 sub config_to_command {
     my ($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu,
-        $pbs_backing) = @_;
+        $pbs_backing, $reserve) = @_;
 
+    $reserve //= 1;
     my $cmd = [];
     my ($globalFlags, $machineFlags, $rtcFlags) = ([], [], []);
     my $devices = [];
@@ -3724,7 +3725,7 @@ sub config_to_command {
 
     # host pci device passthrough
     my ($kvm_off, $gpu_passthrough, $legacy_igd, $pci_devices) = PVE::QemuServer::PCI::print_hostpci_devices(
-	$vmid, $conf, $devices, $vga, $winversion, $q35, $bridges, $arch, $machine_type, $bootorder);
+	$vmid, $conf, $devices, $vga, $winversion, $q35, $bridges, $arch, $machine_type, $bootorder, $reserve);
 
     # usb devices
     my $usb_dev_features = {};
@@ -5623,13 +5624,30 @@ sub vm_start_nolock {
 	my $uuid;
 	for my $id (sort keys %$pci_devices) {
 	    my $d = $pci_devices->{$id}->{device};
-	    for my $dev ($d->{pciid}->@*) {
-		my $info = PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
 
-		# nvidia grid needs the uuid of the mdev as qemu parameter
-		if ($d->{mdev} && !defined($uuid) && $info->{vendor} eq '10de') {
-		    $uuid = PVE::QemuServer::PCI::generate_mdev_uuid($vmid, $id);
+	    # used pci devices for non-mdev
+	    if (!$d->{mdev}) {
+		for my $dev ($pci_devices->{$id}->{used}->@*) {
+		    PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id);
 		}
+		next;
+	    }
+
+	    # try each configured pci device for mdevs
+	    my $devs = [map { $_->{id} } map { @$_ } $d->{ids}->@*]; # flatten ids
+
+	    my $info;
+	    for my $dev (@$devs) {
+		$info = eval { PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev, $id, $d->{mdev}) };
+		warn $@ if $@;
+		last if $info; # if successful, we're done
+	    }
+
+	    die "could not create mediated device\n" if !defined($info);
+
+	    # nvidia grid needs the uuid of the mdev as qemu parameter
+	    if (!defined($uuid) && $info->{vendor} eq '10de') {
+		$uuid = PVE::QemuServer::PCI::generate_mdev_uuid($vmid, $id);
 	    }
 	}
 	push @$cmd, '-uuid', $uuid if defined($uuid);
@@ -5862,7 +5880,16 @@ sub vm_commandline {
 
     my $defaults = load_defaults();
 
-    my $cmd = config_to_command($storecfg, $vmid, $conf, $defaults, $forcemachine, $forcecpu);
+    my $cmd = config_to_command(
+	$storecfg,
+	$vmid,
+	$conf,
+	$defaults,
+	$forcemachine,
+	$forcecpu,
+	undef,
+	0,
+    );
 
     return PVE::Tools::cmd2string($cmd);
 }
diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
index 08244c1..1ad89ed 100644
--- a/PVE/QemuServer/PCI.pm
+++ b/PVE/QemuServer/PCI.pm
@@ -386,6 +386,7 @@ sub parse_hostpci {
 
     my $res = PVE::JSONSchema::parse_property_string($hostpci_fmt, $value);
 
+    my $idlist = [];
     my $mapping = 0;
     if ($res->{host} !~ m/:/) {
 	# we have no ordinary pci id, must be a mapping
@@ -398,17 +399,29 @@ sub parse_hostpci {
 	if (my $err = $@) {
 	    die "PCI device mapping invalid (hardware probably changed): $err\n";
 	}
-	$res->{host} = $device->{path};
+	$idlist = [split(/;/, $device->{path})];
+	# if we have a list of mapped devices, we want to choose the first available one
+	$res->{choose} = 1 if scalar(@$idlist > 1);
+    } else {
+	$idlist = [split(/;/, $res->{host})];
     }
 
-    my @idlist = split(/;/, $res->{host});
     delete $res->{host};
-    foreach my $id (@idlist) {
+    my $ignore_mdev = !$res->{choose} && scalar(@$idlist) > 1;
+
+    $res->{ids} = [];
+    foreach my $id (@$idlist) {
 	my $devs = PVE::SysFSTools::lspci($id);
-	die "cannot use mediated device with multifuntion device\n"
-	    if $mapping && $res->{mdev} && scalar(@$devs) > 1;
 	die "no PCI device found for '$id'\n" if !scalar(@$devs);
-	push @{$res->{pciid}}, @$devs;
+	$ignore_mdev = 1 if scalar(@$devs) > 1;
+	push @{$res->{ids}}, $devs;
+    }
+    # ignore mdev for multiple devices, except when from mapping
+    if ($res->{mdev} && $ignore_mdev) {
+	# FIXME in 8.0 we should also disallow that for 'normal' passthrough
+	die "cannot use mediated device with multifunction device\n" if $mapping;
+	warn "ignoring mediated device with multifunction device\n";
+	delete $res->{mdev};
     }
     return $res;
 }
@@ -437,11 +450,13 @@ my $print_pci_device = sub {
 };
 
 sub print_hostpci_devices {
-    my ($vmid, $conf, $devices, $vga, $winversion, $q35, $bridges, $arch, $machine_type, $bootorder) = @_;
+    my ($vmid, $conf, $devices, $vga, $winversion, $q35, $bridges, $arch, $machine_type, $bootorder, $reserve) = @_;
 
+    $reserve //= 1;
     my $kvm_off = 0;
     my $gpu_passthrough = 0;
     my $legacy_igd = 0;
+    my $used_pci_ids = {};
     my $parsed_devices = {};
 
     my $pciaddr;
@@ -473,7 +488,24 @@ sub print_hostpci_devices {
 	    $pciaddr = print_pci_addr($pci_name, $bridges, $arch, $machine_type);
 	}
 
-	my $pcidevices = $d->{pciid};
+	# choose devices
+	my $pcidevices = [];
+	if (!$d->{mdev}) {
+	    for my $devs ($d->{ids}->@*) {
+		my $ids = [map { $_->{id} } @$devs];
+
+		if ($d->{choose}) {
+		    next if grep { defined($used_pci_ids->{$_}) } @$ids; # already used
+		    eval { reserve_pci_usage($ids, $vmid, 10, undef, $reserve) };
+		    next if $@;
+		}
+
+		map { $used_pci_ids->{$_} = 1 } @$ids;
+		push @$pcidevices, @$devs;
+		last if $d->{choose};
+	    }
+	    die "could not find a free device\n" if scalar(@$pcidevices) < 1;
+	}
 	$parsed_devices->{$i}->{used} = $pcidevices;
 	my $multifunction = @$pcidevices > 1;
 
@@ -603,8 +635,9 @@ sub remove_pci_reservation {
 }
 
 sub reserve_pci_usage {
-    my ($requested_ids, $vmid, $timeout, $pid) = @_;
+    my ($requested_ids, $vmid, $timeout, $pid, $reserve) = @_;
 
+    $reserve //= 1;
     $requested_ids = [ $requested_ids ] if !ref($requested_ids);
     return if !scalar(@$requested_ids); # do nothing for empty list
 
@@ -637,7 +670,7 @@ sub reserve_pci_usage {
 		$reservation_list->{$id}->{time} = $ctime + $timeout + 5;
 	    }
 	}
-	$write_pci_reservation_unlocked->($reservation_list);
+	$write_pci_reservation_unlocked->($reservation_list) if $reserve;
     });
     die $@ if $@;
 }
-- 
2.30.2


Thread overview: 54+ messages
2022-09-20 12:50 [pve-devel] [PATCH many v3] add cluster-wide hardware device mapping Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH cluster v3 1/1] add nodes/hardware-map.conf Dominik Csapak
2022-11-08 18:03   ` [pve-devel] applied: " Thomas Lamprecht
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 10/13] PVE/API2/Qemu: migrate preconditions: use new check_local_resources info Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 11/13] PVE/QemuMigrate: check for mapped resources on migration Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 12/13] fix #3574: enable multi pci device mapping from config Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 13/13] add tests for mapped pci devices Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH access-control v3 1/1] PVE/AccessControl: add Hardware.* privileges and /hardware/ paths Dominik Csapak
2022-11-09 12:05   ` Fabian Grünbichler
2022-11-09 12:39     ` Dominik Csapak
2022-11-09 13:06       ` Fabian Grünbichler
2022-11-09 13:23         ` Dominik Csapak
2022-11-09 12:52     ` Thomas Lamprecht
2022-09-20 12:50 ` [pve-devel] [PATCH common v3 1/3] SysFSTools: make mdev cleanup independent of pciid Dominik Csapak
2022-11-09  8:38   ` Thomas Lamprecht
2022-09-20 12:50 ` [pve-devel] [PATCH common v3 2/3] add PVE/HardwareMap Dominik Csapak
2022-11-09  8:46   ` Thomas Lamprecht
2022-09-20 12:50 ` [pve-devel] [PATCH common v3 3/3] HardwareMap: add support for multiple pci device paths per mapping Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 01/13] cleanup pci devices in more situations Dominik Csapak
2022-11-09  8:00   ` [pve-devel] applied: " Thomas Lamprecht
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 02/13] PCI: make mediated device path independent of pci id Dominik Csapak
2022-11-09  8:08   ` [pve-devel] applied: " Thomas Lamprecht
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 03/13] PCI: refactor print_pci_device Dominik Csapak
2022-11-09  7:49   ` Thomas Lamprecht
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 04/13] PCI: reuse parsed info from print_hostpci_devices Dominik Csapak
2022-11-09  8:23   ` Thomas Lamprecht
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 05/13] PVE/QemuServer: allow mapped usb devices in config Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 06/13] PVE/QemuServer: allow mapped pci deviced " Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 07/13] PVE/API2/Qemu: add permission checks for mapped usb devices Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 08/13] PVE/API2/Qemu: add permission checks for mapped pci devices Dominik Csapak
2022-11-09 12:14   ` Fabian Grünbichler
2022-11-09 12:51     ` Dominik Csapak
2022-11-09 13:28       ` Fabian Grünbichler
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 09/13] PVE/QemuServer: extend 'check_local_resources' for mapped resources Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 10/13] PVE/API2/Qemu: migrate preconditions: use new check_local_resources info Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 11/13] PVE/QemuMigrate: check for mapped resources on migration Dominik Csapak
2022-09-20 12:50 ` Dominik Csapak [this message]
2022-09-20 12:50 ` [pve-devel] [PATCH qemu-server v3 13/13] add tests for mapped pci devices Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 01/13] PVE/API2/Hardware: add Mapping.pm Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 02/13] PVE/API2/Cluster: add Hardware mapping list api call Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 03/13] ui: form/USBSelector: make it more flexible with nodename Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 04/13] ui: form: add PCIMapSelector Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 05/13] ui: form: add USBMapSelector Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 06/13] ui: qemu/PCIEdit: rework panel to add a mapped configuration Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 07/13] ui: qemu/USBEdit: add 'mapped' device case Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 08/13] ui: form: add MultiPCISelector Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 09/13] ui: add window/PCIEdit: edit window for pci mappings Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 10/13] ui: add window/USBEdit: edit window for usb mappings Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 11/13] ui: add dc/HardwareView: a CRUD interface for hardware mapping Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 12/13] ui: window/Migrate: allow mapped devices Dominik Csapak
2022-09-20 12:50 ` [pve-devel] [PATCH manager v3 13/13] ui: improve permission handling for hardware Dominik Csapak
2022-09-20 16:12 ` [pve-devel] [PATCH many v3] add cluster-wide hardware device mapping DERUMIER, Alexandre
2022-09-23 16:13 ` DERUMIER, Alexandre
2022-11-08 18:03 ` Thomas Lamprecht
