public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server v6 2/6] enable cluster mapped PCI devices for guests
Date: Fri, 16 Jun 2023 09:49:58 +0200	[thread overview]
Message-ID: <1686901207.x1ncj3w9te.astroid@yuna.none> (raw)
In-Reply-To: <20230614084622.1446211-3-d.csapak@proxmox.com>

On June 14, 2023 10:46 am, Dominik Csapak wrote:
> this patch allows configuring pci devices that are mapped via cluster
> resource mapping when the user has 'Resource.Use' on the ACL path
> '/mapping/pci/{ID}' (in  addition to the usual required vm config
> privileges)
> 
> When given multiple mappings in the config, we use them as alternatives
> for the passthrough, and will select the first free one on startup.
> It is using our regular pci reservation mechanism for regular devices and
> we introduce a selection mechanism for mediated devices.
> 
> A few changes to the inner workings were required to make this work well:
> * parse_hostpci now returns a different structure where we have a list
>   of lists (first level is for the different alternatives and second
>   level is for the different devices that should be passed through
>   together)
> * factor out the 'parse_hostpci_devices' which parses each device from
>   the config and does some precondition checks
> * reserve_pci_usage now behaves slightly different when trying to
>   reserve an device with the same VMID that's already reserved for,
>   since for checking which alternative we can use, we already must
>   reserve one (this means that qm showcmd can actually reserve devices,
>   albeit only for up to 10 seconds)
> * configuring a mediated device on a multifunction device is not
>   supported anymore, and results in failure to start (previously, it
>   just chose the first device to do it). This is a breaking change
> * configuring a single pci device twice on different hostpci slots now
>   fails during commandline generation instead on qemu start, so we had
>   to adapt one test where this occurred (it could never have worked
>   anyway)
> 
> Add the permission checks also to clone/restore. This fails now for non
> root users for raw devices, this is a breaking change!
> 
> Fixes #3574: Improve SR-IOV usability
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v5:
> * move permission check for clone/restore to QemuServer.pm
>  PVE/API2/Qemu.pm                        |  54 +++++-
>  PVE/QemuServer.pm                       |  71 ++++---
>  PVE/QemuServer/PCI.pm                   | 243 ++++++++++++++++++++----
>  test/cfg2cmd/q35-linux-hostpci.conf     |   2 +-
>  test/cfg2cmd/q35-linux-hostpci.conf.cmd |   2 +-
>  test/run_config2command_tests.pl        |   1 +
>  6 files changed, 299 insertions(+), 74 deletions(-)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index 865edf7f..4c3c53d9 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -32,6 +32,7 @@ use PVE::QemuServer::Drive;
>  use PVE::QemuServer::ImportDisk;
>  use PVE::QemuServer::Monitor qw(mon_cmd);
>  use PVE::QemuServer::Machine;
> +use PVE::QemuServer::PCI;
>  use PVE::QemuServer::USB qw(parse_usb_device);
>  use PVE::QemuMigrate;
>  use PVE::RPCEnvironment;
> @@ -607,6 +608,26 @@ my $check_vm_create_usb_perm = sub {
>      return 1;
>  };
>  
> +my $check_vm_create_hostpci_perm = sub {
> +    my ($rpcenv, $authuser, $vmid, $pool, $param) = @_;
> +
> +    return 1 if $authuser eq 'root@pam';
> +
> +    foreach my $opt (keys %{$param}) {
> +	next if $opt !~ m/^hostpci\d+$/;
> +
> +	my $device = PVE::JSONSchema::parse_property_string('pve-qm-hostpci', $param->{$opt});
> +	if ($device->{host} !~ m/:/) {
> +	    $rpcenv->check_full($authuser, "/mapping/pci/$device->{host}", ['Mapping.Use']);
> +	    $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.HWType']);
> +	} else {
> +	    die "only root can set '$opt' config for non-mapped devices\n";
> +	}
> +    }
> +
> +    return 1;
> +};
> +
>  my $check_vm_modify_config_perm = sub {
>      my ($rpcenv, $authuser, $vmid, $pool, $key_list) = @_;
>  
> @@ -617,7 +638,7 @@ my $check_vm_modify_config_perm = sub {
>  	# else, as there the permission can be value dependend
>  	next if PVE::QemuServer::is_valid_drivename($opt);
>  	next if $opt eq 'cdrom';
> -	next if $opt =~ m/^(?:unused|serial|usb)\d+$/;
> +	next if $opt =~ m/^(?:unused|serial|usb|hostpci)\d+$/;
>  	next if $opt eq 'tags';
>  
>  
> @@ -646,7 +667,7 @@ my $check_vm_modify_config_perm = sub {
>  	    # also needs privileges on the storage, that will be checked later
>  	    $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Disk', 'VM.PowerMgmt' ]);
>  	} else {
> -	    # catches hostpci\d+, args, lock, etc.
> +	    # catches args, lock, etc.
>  	    # new options will be checked here
>  	    die "only root can set '$opt' config\n";
>  	}
> @@ -885,6 +906,8 @@ __PACKAGE__->register_method({
>  
>  	    &$check_vm_create_serial_perm($rpcenv, $authuser, $vmid, $pool, $param);
>  	    &$check_vm_create_usb_perm($rpcenv, $authuser, $vmid, $pool, $param);
> +	    &$check_vm_create_hostpci_perm($rpcenv, $authuser, $vmid, $pool, $param);
> +
>  	    PVE::QemuServer::check_bridge_access($rpcenv, $authuser, $param);
>  	    &$check_cpu_model_access($rpcenv, $authuser, $param);
>  
> @@ -1737,6 +1760,16 @@ my $update_vm_api  = sub {
>  		    }
>  		    PVE::QemuConfig->add_to_pending_delete($conf, $opt, $force);
>  		    PVE::QemuConfig->write_config($vmid, $conf);
> +		} elsif ($opt =~ m/^hostpci\d+$/) {
> +		    my $olddevice = PVE::JSONSchema::parse_property_string('pve-qm-hostpci', $val);
> +		    if ($olddevice->{host} !~ m/:/) {
> +			$rpcenv->check_full($authuser, "/mapping/pci/$olddevice->{host}", ['Mapping.Use']);
> +			$rpcenv->check_vm_perm($authuser, $vmid, undef, ['VM.Config.HWType']);
> +		    } elsif ($authuser ne 'root@pam') {
> +			die "only root can set '$opt' config for non-mapped devices\n";
> +		    }

this part

> +		    PVE::QemuConfig->add_to_pending_delete($conf, $opt, $force);
> +		    PVE::QemuConfig->write_config($vmid, $conf);
>  		} elsif ($opt eq 'tags') {
>  		    assert_tag_permissions($vmid, $val, '', $rpcenv, $authuser);
>  		    delete $conf->{$opt};
> @@ -1823,6 +1856,23 @@ my $update_vm_api  = sub {
>  		    } elsif ($authuser ne 'root@pam') {
>  			die "only root can modify '$opt' config for real devices\n";
>  		    }
> +
> +		    $conf->{pending}->{$opt} = $param->{$opt};
> +		} elsif ($opt =~ m/^hostpci\d+$/) {
> +		    my $olddevice;
> +		    if (defined($conf->{$opt})) {
> +			$olddevice = PVE::JSONSchema::parse_property_string('pve-qm-hostpci', $conf->{$opt});
> +		    }
> +		    my $newdevice = PVE::JSONSchema::parse_property_string('pve-qm-hostpci', $param->{$opt});
> +		    if ((!defined($olddevice) || $olddevice->{host} !~ m/:/) && $newdevice->{host} !~ m/:/) {
> +			if (defined($olddevice)) {
> +			    $rpcenv->check_full($authuser, "/mapping/pci/$olddevice->{host}", ['Mapping.Use']);
> +			}
> +			$rpcenv->check_full($authuser, "/mapping/pci/$newdevice->{host}", ['Mapping.Use']);
> +			$rpcenv->check_vm_perm($authuser, $vmid, undef, ['VM.Config.HWType']);

and this part could be the same, with the latter being two calls to the
same helper, which would be easier to read..

it checks:
  VM.Config.HWType
  access to the old value if there is one (root if raw, mapping if mapping)
  access to the new value (root if raw, mapping if mapping)

doing

0. (optional defined guard for existing value)
1. parse old
2. call check helper
3. parse new
4. call check helper

with the helper doing

if raw require root else check mapping && VM.Config.HWType

is way more readable and should have the same result, and only requires
one place to change should we change the format/checks in the future.

keep in mind that ACL results are cached, so doing the exact same check
twice in a row means the second one is almost free.

> +		    } elsif ($authuser ne 'root@pam') {
> +			die "only root can set '$opt' config for non-mapped devices\n";
> +		    }
>  		    $conf->{pending}->{$opt} = $param->{$opt};
>  		} elsif ($opt eq 'tags') {
>  		    assert_tag_permissions($vmid, $conf->{$opt}, $param->{$opt}, $rpcenv, $authuser);
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index ab5458c3..4fb8d1c2 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -3783,8 +3783,8 @@ sub config_to_command {
>      my $bootorder = device_bootorder($conf);
>  
>      # host pci device passthrough
> -    my ($kvm_off, $gpu_passthrough, $legacy_igd) = PVE::QemuServer::PCI::print_hostpci_devices(
> -	$vmid, $conf, $devices, $vga, $winversion, $q35, $bridges, $arch, $machine_type, $bootorder);
> +    my ($kvm_off, $gpu_passthrough, $legacy_igd, $pci_devices) = PVE::QemuServer::PCI::print_hostpci_devices(
> +	$vmid, $conf, $devices, $vga, $winversion, $bridges, $arch, $machine_type, $bootorder);
>  
>      # usb devices
>      my $usb_dev_features = {};
> @@ -4203,7 +4203,7 @@ sub config_to_command {
>  	push @$cmd, @$aa;
>      }
>  
> -    return wantarray ? ($cmd, $vollist, $spice_port) : $cmd;
> +    return wantarray ? ($cmd, $vollist, $spice_port, $pci_devices) : $cmd;
>  }
>  
>  sub check_rng_source {
> @@ -5762,7 +5762,7 @@ sub vm_start_nolock {
>  	print "Resuming suspended VM\n";
>      }
>  
> -    my ($cmd, $vollist, $spice_port) = config_to_command($storecfg, $vmid,
> +    my ($cmd, $vollist, $spice_port, $pci_devices) = config_to_command($storecfg, $vmid,
>  	$conf, $defaults, $forcemachine, $forcecpu, $params->{'pbs-backing'});
>  
>      my $migration_ip;
> @@ -5847,38 +5847,44 @@ sub vm_start_nolock {
>  
>      my $start_timeout = $params->{timeout} // config_aware_timeout($conf, $resume);
>  
> -    my $pci_devices = {}; # host pci devices
> -    for (my $i = 0; $i < $PVE::QemuServer::PCI::MAX_HOSTPCI_DEVICES; $i++)  {
> -	my $dev = $conf->{"hostpci$i"} or next;
> -	$pci_devices->{$i} = parse_hostpci($dev);
> +    my $pci_reserve_list = [];
> +    for my $device (values $pci_devices->%*) {
> +	next if $device->{mdev}; # we don't reserve for mdev devices
> +	push $pci_reserve_list->@*, map { $_->{id} } $device->{ids}->@*;
>      }
>  
> -    # do not reserve pciid for mediated devices, sysfs will error out for duplicate assignment
> -    my $real_pci_devices = [ grep { !(defined($_->{mdev}) && scalar($_->{pciid}->@*) == 1) } values $pci_devices->%* ];
> -
> -    # map to a flat list of pci ids
> -    my $pci_id_list = [ map { $_->{id} } map { $_->{pciid}->@* } $real_pci_devices->@* ];
> -
>      # reserve all PCI IDs before actually doing anything with them
> -    PVE::QemuServer::PCI::reserve_pci_usage($pci_id_list, $vmid, $start_timeout);
> +    PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, $start_timeout);
>  
>      eval {
>  	my $uuid;
>  	for my $id (sort keys %$pci_devices) {
>  	    my $d = $pci_devices->{$id};
> -	    for my $dev ($d->{pciid}->@*) {
> -		my $info = PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
> -
> -		# nvidia grid needs the qemu parameter '-uuid' set
> -		# use smbios uuid or mdev uuid as fallback for that
> -		if ($d->{mdev} && !defined($uuid) && $info->{vendor} eq '10de') {
> -		    if (defined($conf->{smbios1})) {
> -			my $smbios_conf = parse_smbios1($conf->{smbios1});
> -			$uuid = $smbios_conf->{uuid} if defined($smbios_conf->{uuid});
> -		    }
> -		    $uuid = PVE::QemuServer::PCI::generate_mdev_uuid($vmid, $id) if !defined($uuid);
> +	    my ($index) = ($id =~ m/^hostpci(\d+)$/);
> +
> +	    my $chosen_mdev;
> +	    for my $dev ($d->{ids}->@*) {
> +		my $info = eval { PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $index, $d->{mdev}) };
> +		if ($d->{mdev}) {
> +		    warn $@ if $@;
> +		    $chosen_mdev = $info;
> +		    last if $chosen_mdev; # if successful, we're done
> +		} else {
> +		    die $@ if $@;
>  		}
>  	    }
> +
> +	    next if !$d->{mdev};
> +	    die "could not create mediated device\n" if !defined($chosen_mdev);
> +
> +	    # nvidia grid needs the uuid of the mdev as qemu parameter
> +	    if (!defined($uuid) && $chosen_mdev->{vendor} =~ m/^(0x)?10de$/) {
> +		if (defined($conf->{smbios1})) {
> +		    my $smbios_conf = parse_smbios1($conf->{smbios1});
> +		    $uuid = $smbios_conf->{uuid} if defined($smbios_conf->{uuid});
> +		}
> +		$uuid = PVE::QemuServer::PCI::generate_mdev_uuid($vmid, $index) if !defined($uuid);
> +	    }
>  	}
>  	push @$cmd, '-uuid', $uuid if defined($uuid);
>      };
> @@ -5992,7 +5998,7 @@ sub vm_start_nolock {
>  
>      # re-reserve all PCI IDs now that we can know the actual VM PID
>      my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid);
> -    eval { PVE::QemuServer::PCI::reserve_pci_usage($pci_id_list, $vmid, undef, $pid) };
> +    eval { PVE::QemuServer::PCI::reserve_pci_usage($pci_reserve_list, $vmid, undef, $pid) };
>      warn $@ if $@;
>  
>      if (defined($res->{migrate})) {
> @@ -6187,9 +6193,7 @@ sub cleanup_pci_devices {
>  
>  	    # some nvidia vgpu driver versions want to clean the mdevs up themselves, and error
>  	    # out when we do it first. so wait for 10 seconds and then try it
> -	    my $pciid = $d->{pciid}->[0]->{id};
> -	    my $info = PVE::SysFSTools::pci_device_info("$pciid");
> -	    if ($info->{vendor} eq '10de') {
> +	    if ($d->{ids}->[0]->[0]->{vendor} =~ m/^(0x)?10de$/) {
>  		sleep 10;
>  	    }
>  
> @@ -6548,6 +6552,13 @@ sub check_mapping_access {
>  	   } elsif (!$device->{spice}) {
>  	       die "only root can set '$opt' config for real devices\n";
>  	   }
> +       } elsif ($opt =~ m/^hostpci\d+$/) {
> +	   my $device = PVE::JSONSchema::parse_property_string('pve-qm-hostpci', $conf->{$opt});
> +	   if ($device->{host} !~ m/:/) {
> +	       $rpcenv->check_full($user, "/mapping/pci/$device->{host}", ['Mapping.Use']);
> +	   } else {
> +	       die "only root can set '$opt' config for non-mapped devices\n";
> +	   }
>         }
>     }
>  };
> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
> index a18b9747..5a5cbcca 100644
> --- a/PVE/QemuServer/PCI.pm
> +++ b/PVE/QemuServer/PCI.pm
> @@ -4,6 +4,7 @@ use warnings;
>  use strict;
>  
>  use PVE::JSONSchema;
> +use PVE::Mapping::PCI;
>  use PVE::SysFSTools;
>  use PVE::Tools;
>  
> @@ -23,8 +24,8 @@ my $hostpci_fmt = {
>      host => {
>  	default_key => 1,
>  	type => 'string',
> -	pattern => qr/$PCIRE(;$PCIRE)*/,
> -	format_description => 'HOSTPCIID[;HOSTPCIID2...]',
> +	pattern => qr/(:?$PCIRE(;$PCIRE)*)|(:?$PVE::JSONSchema::CONFIGID_RE)/,
> +	format_description => 'HOSTPCIID[;HOSTPCIID2...] or configured mapping id',
>  	description => <<EODESCR,
>  Host PCI device pass through. The PCI ID of a host's PCI device or a list
>  of PCI virtual functions of the host. HOSTPCIID syntax is:
> @@ -32,6 +33,8 @@ of PCI virtual functions of the host. HOSTPCIID syntax is:
>  'bus:dev.func' (hexadecimal numbers)
>  
>  You can us the 'lspci' command to list existing PCI devices.
> +
> +Alternatively use the ID of a mapped pci device.
>  EODESCR
>      },
>      rombar => {
> @@ -376,6 +379,32 @@ sub print_pcie_root_port {
>      return $res;
>  }
>  
> +# returns the parsed pci config but parses the 'host' part into
> +# a list if lists into the 'id' property like this:
> +#
> +# {
> +#   mdev => 1,
> +#   rombar => ...
> +#   ...
> +#   ids => [
> +#       # this contains a list of alternative devices,
> +#       [
> +#           # which are itself lists of ids for one multifunction device
> +#           {
> +#               id => "0000:00:00.0",
> +#               vendor => "...",
> +#           },
> +#           {
> +#               id => "0000:00:00.1",
> +#               vendor => "...",
> +#           },
> +#       ],
> +#       [
> +#           ...
> +#       ],
> +#       ...
> +#   ],
> +# }
>  sub parse_hostpci {
>      my ($value) = @_;
>  
> @@ -383,31 +412,167 @@ sub parse_hostpci {
>  
>      my $res = PVE::JSONSchema::parse_property_string($hostpci_fmt, $value);
>  
> -    my @idlist = split(/;/, $res->{host});
> +    my $alternatives = [];
> +    if ($res->{host} !~ m/:/) {
> +	# we have no ordinary pci id, must be a mapping
> +	my $devices = PVE::Mapping::PCI::find_on_current_node($res->{host});
> +	die "PCI device mapping not found for '$res->{host}'\n" if !$devices || !scalar($devices->@*);
> +
> +	for my $device ($devices->@*) {
> +	    eval { PVE::Mapping::PCI::assert_valid($res->{host}, $device) };
> +	    die "PCI device mapping invalid (hardware probably changed): $@\n" if $@;
> +	    push $alternatives->@*, [split(/;/, $device->{path})];
> +	}
> +    } else {
> +	push $alternatives->@*, [split(/;/, $res->{host})];
> +    }
>      delete $res->{host};
> -    foreach my $id (@idlist) {
> -	my $devs = PVE::SysFSTools::lspci($id);
> -	die "no PCI device found for '$id'\n" if !scalar(@$devs);
> -	push @{$res->{pciid}}, @$devs;
> +
> +    $res->{ids} = [];
> +    for my $alternative ($alternatives->@*) {
> +	my $ids = [];
> +	foreach my $id ($alternative->@*) {
> +	    my $devs = PVE::SysFSTools::lspci($id);
> +	    die "no PCI device found for '$id'\n" if !scalar($devs->@*);
> +	    push $ids->@*, @$devs;
> +	}
> +	if (scalar($ids->@*) > 1) {
> +	    $res->{'has-multifunction'} = 1;
> +	    die "cannot use mediated device with multifunction device\n" if $res->{mdev};
> +	}
> +	push $res->{ids}->@*, $ids;
>      }
> +
>      return $res;
>  }
>  
> +# parses all hostpci devices from a config and does some sanity checks
> +# returns a hash like this:
> +# {
> +#     hostpci0 => {
> +#         # hash from parse_hostpci function
> +#     },
> +#     hostpci1 => { ... },
> +#     ...
> +# }
> +sub parse_hostpci_devices {
> +    my ($conf) = @_;
> +
> +    my $q35 = PVE::QemuServer::Machine::machine_type_is_q35($conf);
> +    my $legacy_igd = 0;
> +
> +    my $parsed_devices = {};
> +    for (my $i = 0; $i < $MAX_HOSTPCI_DEVICES; $i++)  {
> +	my $id = "hostpci$i";
> +	my $d = parse_hostpci($conf->{$id});
> +	next if !$d;
> +
> +	# check syntax
> +	die "q35 machine model is not enabled" if !$q35 && $d->{pcie};
> +
> +	if ($d->{'legacy-igd'}) {
> +	    die "only one device can be assigned in legacy-igd mode\n"
> +		if $legacy_igd;
> +	    $legacy_igd = 1;
> +
> +	    die "legacy IGD assignment requires VGA mode to be 'none'\n"
> +		if !defined($conf->{'vga'}) || $conf->{'vga'} ne 'none';
> +	    die "legacy IGD assignment requires rombar to be enabled\n"
> +		if defined($d->{rombar}) && !$d->{rombar};
> +	    die "legacy IGD assignment is not compatible with x-vga\n"
> +		if $d->{'x-vga'};
> +	    die "legacy IGD assignment is not compatible with mdev\n"
> +		if $d->{mdev};
> +	    die "legacy IGD assignment is not compatible with q35\n"
> +		if $q35;
> +	    die "legacy IGD assignment is not compatible with multifunction devices\n"
> +		if $d->{'has-multifunction'};
> +	    die "legacy IGD assignment is not compatible with alternate devices\n"
> +		if scalar($d->{ids}->@*) > 1;
> +	    # check first device for valid id
> +	    die "legacy IGD assignment only works for devices on host bus 00:02.0\n"
> +		if $d->{ids}->[0]->[0]->{id} !~ m/02\.0$/;
> +	}
> +
> +	$parsed_devices->{$id} = $d;
> +    }
> +
> +    return $parsed_devices;
> +}
> +
> +# takes the hash returned by parse_hostpci_devices and for all non mdev gpus,
> +# selects one of the given alternatives by trying to reserve it
> +#
> +# mdev devices must be chosen later when we actually allocate it, but we
> +# flatten the inner list since there can only be one device per alternative anyway
> +my sub choose_hostpci_devices {
> +    my ($devices, $vmid) = @_;
> +
> +    my $used = {};
> +
> +    my $add_used_device = sub {
> +	my ($devices) = @_;
> +	for my $used_device ($devices->@*) {
> +	    my $used_id = $used_device->{id};
> +	    die "device '$used_id' assigned more than once\n" if $used->{$used_id};
> +	    $used->{$used_id} = 1;
> +	}
> +    };
> +
> +    for (my $i = 0; $i < $MAX_HOSTPCI_DEVICES; $i++)  {
> +	my $device = $devices->{"hostpci$i"};
> +	next if !$device;
> +
> +	if ($device->{mdev}) {
> +	    $device->{ids} = [ map { $_->[0] } $device->{ids}->@* ];
> +	    next;
> +	}
> +
> +	if (scalar($device->{ids}->@* == 1)) {
> +	    # we only have one alternative, use that
> +	    $device->{ids} = $device->{ids}->[0];
> +	    $add_used_device->($device->{ids});
> +	    next;
> +	}
> +
> +	my $found = 0;
> +	for my $alternative ($device->{ids}->@*) {
> +	    my $ids = [map { $_->{id} } @$alternative];
> +
> +	    next if grep { defined($used->{$_}) } @$ids; # already used
> +	    eval { reserve_pci_usage($ids, $vmid, 10, undef) };
> +	    next if $@;
> +
> +	    # found one that is not used or reserved
> +	    $add_used_device->($alternative);
> +	    $device->{ids} = $alternative;
> +	    $found = 1;
> +	    last;
> +	}
> +	die "could not find a free device for 'hostpci$i'\n" if !$found;
> +    }
> +
> +    return $devices;
> +}
> +
>  sub print_hostpci_devices {
> -    my ($vmid, $conf, $devices, $vga, $winversion, $q35, $bridges, $arch, $machine_type, $bootorder) = @_;
> +    my ($vmid, $conf, $devices, $vga, $winversion, $bridges, $arch, $machine_type, $bootorder) = @_;
>  
>      my $kvm_off = 0;
>      my $gpu_passthrough = 0;
>      my $legacy_igd = 0;
>  
>      my $pciaddr;
> +    my $pci_devices = choose_hostpci_devices(parse_hostpci_devices($conf), $vmid);
> +
>      for (my $i = 0; $i < $MAX_HOSTPCI_DEVICES; $i++)  {
>  	my $id = "hostpci$i";
> -	my $d = parse_hostpci($conf->{$id});
> +	my $d = $pci_devices->{$id};
>  	next if !$d;
>  
> +	$legacy_igd = 1 if $d->{'legacy-igd'};
> +
>  	if (my $pcie = $d->{pcie}) {
> -	    die "q35 machine model is not enabled" if !$q35;
>  	    # win7 wants to have the pcie devices directly on the pcie bus
>  	    # instead of in the root port
>  	    if ($winversion == 7) {
> @@ -425,29 +590,8 @@ sub print_hostpci_devices {
>  	    $pciaddr = print_pci_addr($pci_name, $bridges, $arch, $machine_type);
>  	}
>  
> -	my $pcidevices = $d->{pciid};
> -	my $multifunction = @$pcidevices > 1;
> -
> -	if ($d->{'legacy-igd'}) {
> -	    die "only one device can be assigned in legacy-igd mode\n"
> -		if $legacy_igd;
> -	    $legacy_igd = 1;
> -
> -	    die "legacy IGD assignment requires VGA mode to be 'none'\n"
> -		if !defined($conf->{'vga'}) || $conf->{'vga'} ne 'none';
> -	    die "legacy IGD assignment requires rombar to be enabled\n"
> -		if defined($d->{rombar}) && !$d->{rombar};
> -	    die "legacy IGD assignment is not compatible with x-vga\n"
> -		if $d->{'x-vga'};
> -	    die "legacy IGD assignment is not compatible with mdev\n"
> -		if $d->{mdev};
> -	    die "legacy IGD assignment is not compatible with q35\n"
> -		if $q35;
> -	    die "legacy IGD assignment is not compatible with multifunction devices\n"
> -		if $multifunction;
> -	    die "legacy IGD assignment only works for devices on host bus 00:02.0\n"
> -		if $pcidevices->[0]->{id} !~ m/02\.0$/;
> -	}
> +	my $num_devices = scalar($d->{ids}->@*);
> +	my $multifunction = $num_devices > 1 && !$d->{mdev};
>  
>  	my $xvga = '';
>  	if ($d->{'x-vga'}) {
> @@ -458,15 +602,13 @@ sub print_hostpci_devices {
>  	}
>  
>  	my $sysfspath;
> -	if ($d->{mdev} && scalar(@$pcidevices) == 1) {
> +	if ($d->{mdev}) {
>  	    my $uuid = generate_mdev_uuid($vmid, $i);
>  	    $sysfspath = "/sys/bus/mdev/devices/$uuid";
> -	} elsif ($d->{mdev}) {
> -	    warn "ignoring mediated device '$id' with multifunction device\n";
>  	}
>  
> -	my $j = 0;
> -	foreach my $pcidevice (@$pcidevices) {
> +	for (my $j = 0; $j < $num_devices; $j++) {
> +	    my $pcidevice = $d->{ids}->[$j];
>  	    my $devicestr = "vfio-pci";
>  
>  	    if ($sysfspath) {
> @@ -489,12 +631,13 @@ sub print_hostpci_devices {
>  		}
>  	    }
>  
> +
>  	    push @$devices, '-device', $devicestr;
> -	    $j++;
> +	    last if $d->{mdev};
>  	}
>      }
>  
> -    return ($kvm_off, $gpu_passthrough, $legacy_igd);
> +    return ($kvm_off, $gpu_passthrough, $legacy_igd, $pci_devices);
>  }
>  
>  sub prepare_pci_device {
> @@ -596,6 +739,26 @@ sub reserve_pci_usage {
>  			warn "leftover PCI reservation found for $id, lets take it...\n";
>  		    }
>  		}
> +	    } elsif ($reservation) {
> +		# already reserved by the same vmid
> +		if (my $reserved_time = $reservation->{time}) {
> +		    if (defined($timeout)) {
> +			# use the longer timeout
> +			my $old_timeout = $reservation->{time} - 5 - $ctime;
> +			$timeout = $old_timeout if $old_timeout > $timeout;
> +		    }
> +		} elsif (my $reserved_pid = $reservation->{pid}) {
> +		    my $running_pid = PVE::QemuServer::Helpers::vm_running_locally($reservation->{vmid});
> +		    if (defined($running_pid) && $running_pid == $reservation->{pid}) {
> +			if (defined($pid)) {
> +			    die "PCI device '$id' already in use by running VMID '$reservation->{vmid}'\n";
> +			} elsif (defined($timeout)) {
> +			    # ignore timeout reservation for running vms, can happen with e.g.
> +			    # qm showcmd
> +			    return;
> +			}
> +		    }
> +		}
>  	    }
>  
>  	    $reservation_list->{$id} = { vmid => $vmid };
> diff --git a/test/cfg2cmd/q35-linux-hostpci.conf b/test/cfg2cmd/q35-linux-hostpci.conf
> index 749f9839..7290120a 100644
> --- a/test/cfg2cmd/q35-linux-hostpci.conf
> +++ b/test/cfg2cmd/q35-linux-hostpci.conf
> @@ -8,7 +8,7 @@ hostpci1: d0:13.0,pcie=1
>  hostpci2: 00:f4.0
>  hostpci3: d0:15.1,pcie=1
>  hostpci4: d0:17.0,pcie=1,rombar=0
> -hostpci7: d0:15.1,pcie=1
> +hostpci7: d0:15.2,pcie=1
>  machine: q35
>  memory: 512
>  net0: virtio=2E:01:68:F9:9C:87,bridge=vmbr0
> diff --git a/test/cfg2cmd/q35-linux-hostpci.conf.cmd b/test/cfg2cmd/q35-linux-hostpci.conf.cmd
> index 0774047d..0fedbd2c 100644
> --- a/test/cfg2cmd/q35-linux-hostpci.conf.cmd
> +++ b/test/cfg2cmd/q35-linux-hostpci.conf.cmd
> @@ -32,7 +32,7 @@
>    -device 'pcie-root-port,id=ich9-pcie-port-5,addr=10.0,x-speed=16,x-width=32,multifunction=on,bus=pcie.0,port=5,chassis=5' \
>    -device 'vfio-pci,host=0000:d0:17.0,id=hostpci4,bus=ich9-pcie-port-5,addr=0x0,rombar=0' \
>    -device 'pcie-root-port,id=ich9-pcie-port-8,addr=10.3,x-speed=16,x-width=32,multifunction=on,bus=pcie.0,port=8,chassis=8' \
> -  -device 'vfio-pci,host=0000:d0:15.1,id=hostpci7,bus=ich9-pcie-port-8,addr=0x0' \
> +  -device 'vfio-pci,host=0000:d0:15.2,id=hostpci7,bus=ich9-pcie-port-8,addr=0x0' \
>    -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
>    -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
>    -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
> diff --git a/test/run_config2command_tests.pl b/test/run_config2command_tests.pl
> index c4966902..dda44d99 100755
> --- a/test/run_config2command_tests.pl
> +++ b/test/run_config2command_tests.pl
> @@ -81,6 +81,7 @@ my $pci_devs = [
>      "0000:0f:f2.0",
>      "0000:d0:13.0",
>      "0000:d0:15.1",
> +    "0000:d0:15.2",
>      "0000:d0:17.0",
>      "0000:f0:42.0",
>      "0000:f0:43.0",
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> 




  reply	other threads:[~2023-06-16  7:50 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-14  8:46 [pve-devel] [PATCH qemu-server/manger/docs v6] cluster mapping Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH qemu-server v6 1/6] enable cluster mapped USB devices for guests Dominik Csapak
2023-06-16  7:50   ` Fabian Grünbichler
2023-06-14  8:46 ` [pve-devel] [PATCH qemu-server v6 2/6] enable cluster mapped PCI " Dominik Csapak
2023-06-16  7:49   ` Fabian Grünbichler [this message]
2023-06-14  8:46 ` [pve-devel] [PATCH qemu-server v6 3/6] check_local_resources: extend for mapped resources Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH qemu-server v6 4/6] api: migrate preconditions: use new check_local_resources info Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH qemu-server v6 5/6] migration: check for mapped resources Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH qemu-server v6 6/6] add test for mapped pci devices Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 01/15] pvesh: fix parameters for proxyto_callback Dominik Csapak
2023-06-16  9:27   ` [pve-devel] applied: " Wolfgang Bumiller
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 02/15] api: add resource map api endpoints for PCI and USB Dominik Csapak
2023-06-16  7:50   ` Fabian Grünbichler
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 03/15] ui: parser: add helper for lists of property strings Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 04/15] ui: form/USBSelector: make it more flexible with nodename Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 05/15] ui: form: add PCIMapSelector Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 06/15] ui: form: add USBMapSelector Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 07/15] ui: qemu/PCIEdit: rework panel to add a mapped configuration Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 08/15] ui: qemu/USBEdit: add 'mapped' device case Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 09/15] ui: form: add MultiPCISelector Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 10/15] ui: add edit window for pci mappings Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 11/15] ui: add edit window for usb mappings Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 12/15] ui: add ResourceMapTree Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 13/15] ui: allow configuring pci and usb mapping Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 14/15] ui: window/Migrate: allow mapped devices Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH manager v6 15/15] ui: improve permission handling for hardware Dominik Csapak
2023-06-14  8:46 ` [pve-devel] [PATCH docs v6 1/1] qemu: add documentation about cluster device mapping Dominik Csapak
2023-06-14 12:01 ` [pve-devel] [PATCH qemu-server/manger/docs v6] cluster mapping Markus Frank
2023-06-16  7:51 ` Fabian Grünbichler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1686901207.x1ncj3w9te.astroid@yuna.none \
    --to=f.gruenbichler@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal