public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
@ 2022-07-26  6:55 Dominik Csapak
  2022-07-26  6:55 ` [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types Dominik Csapak
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-07-26  6:55 UTC (permalink / raw)
  To: pve-devel

This series improves the handling of NVIDIA vGPUs by exposing the optional name
and automatically adding the uuid to the qemu process (required by the NVIDIA
driver). It also adds the name to the UI for selecting a mediated device and
makes the dropdown larger so users can see all the relevant info.

After a bit of testing with an RTX A5000 here, I'd like to tackle bug #3574
("Improve SR-IOV usability")[0], but I'd like to integrate that into my other
PCI passthrough series[1], "add cluster-wide hardware device mapping",

so maybe someone can look at that and give some feedback?
my idea there would be to allow multiple device mappings per node
(instead of one only) and the qemu code would select one automatically

0: https://bugzilla.proxmox.com/show_bug.cgi?id=3574
1: https://lists.proxmox.com/pipermail/pve-devel/2022-July/053565.html

pve-common:

Dominik Csapak (1):
  SysFSTools: get name from mediated device types

 src/PVE/SysFSTools.pm | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

qemu-server:

Dominik Csapak (1):
  automatically add 'uuid' parameter when passing through NVIDIA vGPU

 PVE/QemuServer.pm     | 8 +++++++-
 PVE/QemuServer/PCI.pm | 4 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)

pve-manager:

Dominik Csapak (1):
  ui: improve form/MDevSelector

 www/manager6/form/MDevSelector.js | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

-- 
2.30.2





^ permalink raw reply	[flat|nested] 14+ messages in thread

* [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types
  2022-07-26  6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
@ 2022-07-26  6:55 ` Dominik Csapak
  2022-08-12  7:25   ` Wolfgang Bumiller
  2022-07-26  6:55 ` [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU Dominik Csapak
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Dominik Csapak @ 2022-07-26  6:55 UTC (permalink / raw)
  To: pve-devel

Some vendors also provide a 'name' file here for the type which, in the case of
NVIDIA, is the official name of the vGPU type in their documentation, so extract
and return it too (if it exists).
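For example, on NVIDIA cards these are profile names such as 'GRID P40-2Q'
(illustrative; the exact strings come from the driver).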

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 src/PVE/SysFSTools.pm | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/PVE/SysFSTools.pm b/src/PVE/SysFSTools.pm
index ac48f2c..e897b22 100644
--- a/src/PVE/SysFSTools.pm
+++ b/src/PVE/SysFSTools.pm
@@ -172,11 +172,15 @@ sub get_mdev_types {
 	my $available = int(file_read_firstline("$type_path/available_instances"));
 	my $description = PVE::Tools::file_get_contents("$type_path/description");
 
-	push @$types, {
+	my $entry = {
 	    type => $type,
 	    description => $description,
 	    available => $available,
 	};
+
+	$entry->{name} = PVE::Tools::file_get_contents("$type_path/name") if -e "$type_path/name";
+
+	push @$types, $entry;
     });
 
     return $types;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 14+ messages in thread

* [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU
  2022-07-26  6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
  2022-07-26  6:55 ` [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types Dominik Csapak
@ 2022-07-26  6:55 ` Dominik Csapak
  2022-08-12  7:32   ` Wolfgang Bumiller
  2022-07-26  6:55 ` [pve-devel] [PATCH manager 1/1] ui: improve form/MDevSelector Dominik Csapak
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Dominik Csapak @ 2022-07-26  6:55 UTC (permalink / raw)
  To: pve-devel

When passing through an NVIDIA vGPU via mediated devices, their software needs
the QEMU process to have its 'uuid' parameter set to the UUID of the vGPU. Since
it's currently not possible to pass through multiple vGPUs to one VM (seems to
be an NVIDIA driver limitation at the moment), we don't have to worry about that
case.
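For example (illustrative), for the first hostpci device (index 0) of VMID 4711
this results in the QEMU argument '-uuid 00000000-0000-0000-0000-000000004711'.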

Sadly, because of where we do this, it does not show up in 'qm showcmd', as we
don't (want to) query the PCI devices in that case and thus have no way of
knowing whether it's an NVIDIA card or not. But since 'qm showcmd' is only
informational anyway, I'd say we can ignore that.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 PVE/QemuServer.pm     | 8 +++++++-
 PVE/QemuServer/PCI.pm | 4 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 7d9cf22..c4eb031 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5597,7 +5597,13 @@ sub vm_start_nolock {
 	for my $id (sort keys %$pci_devices) {
 	    my $d = $pci_devices->{$id};
 	    for my $dev ($d->{pciid}->@*) {
-		PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
+		my $info = PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
+
+		# nvidia grid needs the uuid of the mdev as qemu parameter
+		if ($d->{mdev} && $info->{vendor} eq '10de') {
+		    my $uuid = PVE::QemuServer::PCI::generate_mdev_uuid($vmid, $id);
+		    push @$cmd, '-uuid', $uuid;
+		}
 	    }
 	}
     };
diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
index 23fe508..3d0e70e 100644
--- a/PVE/QemuServer/PCI.pm
+++ b/PVE/QemuServer/PCI.pm
@@ -253,7 +253,7 @@ sub get_pci_addr_map {
     return $pci_addr_map;
 }
 
-my sub generate_mdev_uuid {
+sub generate_mdev_uuid {
     my ($vmid, $index) = @_;
     return sprintf("%08d-0000-0000-0000-%012d", $index, $vmid);
 }
@@ -514,6 +514,8 @@ sub prepare_pci_device {
 	die "can't reset PCI device '$pciid'\n"
 	    if $info->{has_fl_reset} && !PVE::SysFSTools::pci_dev_reset($info);
     }
+
+    return $info;
 }
 
 my $RUNDIR = '/run/qemu-server';
-- 
2.30.2





^ permalink raw reply	[flat|nested] 14+ messages in thread

* [pve-devel] [PATCH manager 1/1] ui: improve form/MDevSelector
  2022-07-26  6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
  2022-07-26  6:55 ` [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types Dominik Csapak
  2022-07-26  6:55 ` [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU Dominik Csapak
@ 2022-07-26  6:55 ` Dominik Csapak
  2022-08-02 16:21 ` [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA DERUMIER, Alexandre
  2022-08-09  7:59 ` DERUMIER, Alexandre
  4 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-07-26  6:55 UTC (permalink / raw)
  To: pve-devel

by
* showing the (optional) name in front of the type
* making the 'Available' column a bit narrower
* enabling 'cellWrap' for the description
* making the dropdown a bit wider (so all the information can fit)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 www/manager6/form/MDevSelector.js | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/www/manager6/form/MDevSelector.js b/www/manager6/form/MDevSelector.js
index 8ee73c0c..0a157813 100644
--- a/www/manager6/form/MDevSelector.js
+++ b/www/manager6/form/MDevSelector.js
@@ -16,21 +16,29 @@ Ext.define('PVE.form.MDevSelector', {
     valueField: 'type',
     displayField: 'type',
     listConfig: {
+	width: 550,
 	columns: [
 	    {
 		header: gettext('Type'),
 		dataIndex: 'type',
+		renderer: function(value, md, rec) {
+		    if (rec.data.name !== undefined) {
+			return `${rec.data.name} (${value})`;
+		    }
+		    return value;
+		},
 		flex: 1,
 	    },
 	    {
-		header: gettext('Available'),
+		header: gettext('Avail.'),
 		dataIndex: 'available',
-		width: 80,
+		width: 60,
 	    },
 	    {
 		header: gettext('Description'),
 		dataIndex: 'description',
 		flex: 1,
+		cellWrap: true,
 		renderer: function(value) {
 		    if (!value) {
 			return '';
-- 
2.30.2





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
  2022-07-26  6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
                   ` (2 preceding siblings ...)
  2022-07-26  6:55 ` [pve-devel] [PATCH manager 1/1] ui: improve form/MDevSelector Dominik Csapak
@ 2022-08-02 16:21 ` DERUMIER, Alexandre
  2022-08-09  7:59 ` DERUMIER, Alexandre
  4 siblings, 0 replies; 14+ messages in thread
From: DERUMIER, Alexandre @ 2022-08-02 16:21 UTC (permalink / raw)
  To: pve-devel, Moula BADJI




On 26/07/22 at 08:55, Dominik Csapak wrote:

> This series improves the handling of NVIDIA vGPUs by exposing the optional name
> and automatically adding the uuid to the qemu process (required by NVIDIA
> driver). Also adds the name to the UI for selecting a mediated devices as well
> as making the dropdown larger so users can see all the relevant info.
>
> After a bit of testing with an RTX A5000 here, i'd like to tackle the bug
> #3574 ("Improve SR-IOV usability")[0]
>
> but i'd like to integrate it in my other pci passthrough series[1]
> "add cluster-wide hardware device mapping"
>
> so maybe someone can look at that and give some feedback?
> my idea there would be to allow multiple device mappings per node
> (instead of one only) and the qemu code would select one automatically

Hi,

Maybe I could look at it next week.

ping @moula: is it still possible to get access to your Proxmox vGPU cluster?





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
  2022-07-26  6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
                   ` (3 preceding siblings ...)
  2022-08-02 16:21 ` [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA DERUMIER, Alexandre
@ 2022-08-09  7:59 ` DERUMIER, Alexandre
  2022-08-09  8:39   ` Dominik Csapak
  4 siblings, 1 reply; 14+ messages in thread
From: DERUMIER, Alexandre @ 2022-08-09  7:59 UTC (permalink / raw)
  To: pve-devel

On 26/07/22 at 08:55, Dominik Csapak wrote:
> so maybe someone can look at that and give some feedback?
> my idea there would be to allow multiple device mappings per node
> (instead of one only) and the qemu code would select one automatically
Hi Dominik,

do you want to create some kind of pool of PCI devices in your "add
cluster-wide hardware device mapping" patch series?

Maybe the hardwaremap could allow defining multiple PCI addresses on the same node?

Then, for mdev, check whether an mdev already exists on one of the devices.
If not, try to create the mdev on one device; if that fails (max number of mdevs reached), try to create the mdev on the other device, and so on.

If it's not an mdev, choose a PCI device from the pool that is not yet detached from the host.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
  2022-08-09  7:59 ` DERUMIER, Alexandre
@ 2022-08-09  8:39   ` Dominik Csapak
  2022-08-16 23:15     ` DERUMIER, Alexandre
  0 siblings, 1 reply; 14+ messages in thread
From: Dominik Csapak @ 2022-08-09  8:39 UTC (permalink / raw)
  To: pve-devel

On 8/9/22 09:59, DERUMIER, Alexandre wrote:
> Le 26/07/22 à 08:55, Dominik Csapak a écrit :
>> so maybe someone can look at that and give some feedback?
>> my idea there would be to allow multiple device mappings per node
>> (instead of one only) and the qemu code would select one automatically
> Hi Dominik,
> 
> do you want to create some kind of pool of pci devices in your ""add cluster-wide hardware device mapping" patches series ?
> 
> Maybe in hardwaremap, allow to define multiple pci address on same node ?
> 
> Then, for mdev, look if a mdev already exist in 1 of the device.
> If not, try to create the mdev if 1 device, if it's failing (max number of mdev reached), try to create mdev on the other device,...
> 
> if not mdev, choose a pci device in the pool not yet detached from host.
> 

Yes, I plan to do this in my next iteration of the mapping series
(basically what you describe).

My (rough) idea (see the sketch below):

have a list of PCI paths in the mapping (e.g. 01:00.0;01:00.4;...)
(that should be enough, I don't think grouping unrelated devices
(different vendor/product) makes much sense)

* non mdev:
   qemu-server checks the pci reservations (which we already have)
   and takes the first not yet reserved path

* mdev
   qemu-server iterates over the devices until it finds one
   with the given mdev type available

if none is found, error out

(relevant bug for this: https://bugzilla.proxmox.com/show_bug.cgi?id=3574)
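Roughly, the selection could look like this (pure pseudocode sketch, the
helper names are made up):

    for my $path (@mapped_paths) {      # e.g. qw(01:00.0 01:00.4)
        if (defined($mdev_type)) {
            # mdev case: first device that still has the wanted type available
            next if !mdev_type_available($path, $mdev_type);
        } else {
            # non-mdev case: first path without an existing reservation
            next if pci_path_reserved($path);
        }
        return $path;
    }
    die "no free PCI device found in mapping\n";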




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types
  2022-07-26  6:55 ` [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types Dominik Csapak
@ 2022-08-12  7:25   ` Wolfgang Bumiller
  0 siblings, 0 replies; 14+ messages in thread
From: Wolfgang Bumiller @ 2022-08-12  7:25 UTC (permalink / raw)
  To: Dominik Csapak; +Cc: pve-devel

On Tue, Jul 26, 2022 at 08:55:57AM +0200, Dominik Csapak wrote:
> Some vendors also provide a 'name' file here for the type, which, in case of
> NVIDIA, is the official name for the vGPU type in their documentation,
> so extract and return it too (if it exists).
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>  src/PVE/SysFSTools.pm | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/src/PVE/SysFSTools.pm b/src/PVE/SysFSTools.pm
> index ac48f2c..e897b22 100644
> --- a/src/PVE/SysFSTools.pm
> +++ b/src/PVE/SysFSTools.pm
> @@ -172,11 +172,15 @@ sub get_mdev_types {
>  	my $available = int(file_read_firstline("$type_path/available_instances"));
>  	my $description = PVE::Tools::file_get_contents("$type_path/description");
>  
> -	push @$types, {
> +	my $entry = {
>  	    type => $type,
>  	    description => $description,
>  	    available => $available,
>  	};
> +
> +	$entry->{name} = PVE::Tools::file_get_contents("$type_path/name") if -e "$type_path/name";

Since this is a sysfs file I'd expect this to end in a newline?
Otherwise this is fine, though I'm not a fan of `-e` checks in general.

You could use `file_read_firstline` which would `chomp` the newline and
return `undef` if opening the file fails (and you could then (maybe
optionally) check `$!` for `ENOENT`).
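I.e. something along these lines (untested sketch):

    my $name = file_read_firstline("$type_path/name");
    if (defined($name)) {
        $entry->{name} = $name; # trailing newline already chomp'ed away
    } elsif (!$!{ENOENT}) {
        # optional: complain about unexpected errors (hypothetical handling)
        warn "failed to read '$type_path/name' - $!\n";
    }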

> +
> +	push @$types, $entry;
>      });
>  
>      return $types;
> -- 
> 2.30.2




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU
  2022-07-26  6:55 ` [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU Dominik Csapak
@ 2022-08-12  7:32   ` Wolfgang Bumiller
  0 siblings, 0 replies; 14+ messages in thread
From: Wolfgang Bumiller @ 2022-08-12  7:32 UTC (permalink / raw)
  To: Dominik Csapak; +Cc: pve-devel

On Tue, Jul 26, 2022 at 08:55:58AM +0200, Dominik Csapak wrote:
> When passing through an NVIDIA vGPU via mediated devices, their
> software needs the qemu process to have the 'uuid' parameter set to the
> one of the vGPU. Since it's currently not possible to pass through multiple
> vGPUs to one VM (seems to be an NVIDIA driver limitation at the moment),
> we don't have to take care about that.
> 
> Sadly, the place we do this, it does not show up in 'qm showcmd' as we
> don't (want to) query the pci devices in that case, and then we don't
> have a way of knowing if it's an NVIDIA card or not. But since this
> is informational with QEMU anyway, i'd say we can ignore that.
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>  PVE/QemuServer.pm     | 8 +++++++-
>  PVE/QemuServer/PCI.pm | 4 +++-
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 7d9cf22..c4eb031 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -5597,7 +5597,13 @@ sub vm_start_nolock {
>  	for my $id (sort keys %$pci_devices) {
>  	    my $d = $pci_devices->{$id};
>  	    for my $dev ($d->{pciid}->@*) {
> -		PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
> +		my $info = PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
> +
> +		# nvidia grid needs the uuid of the mdev as qemu parameter
> +		if ($d->{mdev} && $info->{vendor} eq '10de') {
> +		    my $uuid = PVE::QemuServer::PCI::generate_mdev_uuid($vmid, $id);
> +		    push @$cmd, '-uuid', $uuid;

You mention you can only pass through one device, but I'd still prefer
the code to make sure we only pass a single `-uuid` parameter here,
since this is not at all clear when just reading the code.
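E.g. something like this would make it explicit (untested sketch, the guard
variable is made up):

    my $uuid_added = 0; # outside the loops, so at most one -uuid gets added
    for my $id (sort keys %$pci_devices) {
        my $d = $pci_devices->{$id};
        for my $dev ($d->{pciid}->@*) {
            my $info = PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
            if ($d->{mdev} && !$uuid_added && $info->{vendor} eq '10de') {
                push @$cmd, '-uuid', PVE::QemuServer::PCI::generate_mdev_uuid($vmid, $id);
                $uuid_added = 1;
            }
        }
    }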

Otherwise this seems fine.

> +		}
>  	    }
>  	}
>      };
> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
> index 23fe508..3d0e70e 100644
> --- a/PVE/QemuServer/PCI.pm
> +++ b/PVE/QemuServer/PCI.pm
> @@ -253,7 +253,7 @@ sub get_pci_addr_map {
>      return $pci_addr_map;
>  }
>  
> -my sub generate_mdev_uuid {
> +sub generate_mdev_uuid {
>      my ($vmid, $index) = @_;
>      return sprintf("%08d-0000-0000-0000-%012d", $index, $vmid);
>  }
> @@ -514,6 +514,8 @@ sub prepare_pci_device {
>  	die "can't reset PCI device '$pciid'\n"
>  	    if $info->{has_fl_reset} && !PVE::SysFSTools::pci_dev_reset($info);
>      }
> +
> +    return $info;
>  }
>  
>  my $RUNDIR = '/run/qemu-server';
> -- 
> 2.30.2




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
  2022-08-09  8:39   ` Dominik Csapak
@ 2022-08-16 23:15     ` DERUMIER, Alexandre
  2022-08-22 10:16       ` Dominik Csapak
  0 siblings, 1 reply; 14+ messages in thread
From: DERUMIER, Alexandre @ 2022-08-16 23:15 UTC (permalink / raw)
  To: pve-devel

On 9/08/22 at 10:39, Dominik Csapak wrote:
> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>> Le 26/07/22 à 08:55, Dominik Csapak a écrit :
>>> so maybe someone can look at that and give some feedback?
>>> my idea there would be to allow multiple device mappings per node
>>> (instead of one only) and the qemu code would select one automatically
>> Hi Dominik,
>>
>> do you want to create some kind of pool of pci devices in your ""add 
>> cluster-wide hardware device mapping" patches series ?
>>
>> Maybe in hardwaremap, allow to define multiple pci address on same node ?
>>
>> Then, for mdev, look if a mdev already exist in 1 of the device.
>> If not, try to create the mdev if 1 device, if it's failing (max 
>> number of mdev reached), try to create mdev on the other device,...
>>
>> if not mdev, choose a pci device in the pool not yet detached from host.
>>
> 
> yes i plan to do this in my next iteration of the mapping series
> (basically what you describe)
Hi, sorry to be late.


> my (rough) idea:
> 
> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
> (should be enough, i don't think grouping unrelated devices (different 
> vendor/product) makes much sense?)
yes, that's enough for me. we don't want to mix unrelated devices.

BTW, I'm finally able to do live migration with an NVIDIA mdev vGPU (this
needs the NVIDIA vfio driver compiled with an option to enable it, plus adding
"-device vfio-pci,x-enable-migration=on,..." to the QEMU command line).

So, maybe adding a "livemigrate" flag on the hardwaremap could be great :)

Could be useful for stateless USB devices, like a USB dongle, where we could
unplug the USB device, live-migrate, and replug it.



> 
> * non mdev:
>    qemu-server checks the pci reservations (which we already have)
>    and takes the first not yet reserved path
> 
> * mdev
>    qemu-server iterates over the devices until it finds one
>    with the given mdev type available
> 
> if none is found, error out

seems great :)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
  2022-08-16 23:15     ` DERUMIER, Alexandre
@ 2022-08-22 10:16       ` Dominik Csapak
  2022-08-22 13:39         ` DERUMIER, Alexandre
  0 siblings, 1 reply; 14+ messages in thread
From: Dominik Csapak @ 2022-08-22 10:16 UTC (permalink / raw)
  To: Proxmox VE development discussion, DERUMIER, Alexandre

On 8/17/22 01:15, DERUMIER, Alexandre wrote:
> Le 9/08/22 à 10:39, Dominik Csapak a écrit :
>> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>>> Le 26/07/22 à 08:55, Dominik Csapak a écrit :
>>>> so maybe someone can look at that and give some feedback?
>>>> my idea there would be to allow multiple device mappings per node
>>>> (instead of one only) and the qemu code would select one automatically
>>> Hi Dominik,
>>>
>>> do you want to create some kind of pool of pci devices in your ""add
>>> cluster-wide hardware device mapping" patches series ?
>>>
>>> Maybe in hardwaremap, allow to define multiple pci address on same node ?
>>>
>>> Then, for mdev, look if a mdev already exist in 1 of the device.
>>> If not, try to create the mdev if 1 device, if it's failing (max
>>> number of mdev reached), try to create mdev on the other device,...
>>>
>>> if not mdev, choose a pci device in the pool not yet detached from host.
>>>
>>
>> yes i plan to do this in my next iteration of the mapping series
>> (basically what you describe)
> Hi, sorry to be late.
> 
> 
>> my (rough) idea:
>>
>> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
>> (should be enough, i don't think grouping unrelated devices (different
>> vendor/product) makes much sense?)
> yes, that's enough for me. we don't want to mix unrelated devices.
> 
> BTW, I'm finally able to do live migration with nvidia mdev vgpu. (need
> to compile the nvidia vfio driver with an option to enable it + add
> "-device vfio-pci,x-enable-migration=on,..."

Nice (what flag do you need for the driver install? I did not find it).
I'll see if I can test that on a single card (I only have one here).

> 
> So, maybe adding a "livemigrate" flag on the hardwaremap could be great :)

It's probably better suited to the hostpci setting in the qemu config,
since that's the place where we need it.

> 
> Could be usefull for stateless usb device, like usb dongle,where we
> could unplug usb/livemigrate/replug usb.
> 
> 
> 
Also probably better suited to the usbX setting.

But those can be done after (some version of) this series is applied.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
  2022-08-22 10:16       ` Dominik Csapak
@ 2022-08-22 13:39         ` DERUMIER, Alexandre
  2022-08-22 14:07           ` Dominik Csapak
  0 siblings, 1 reply; 14+ messages in thread
From: DERUMIER, Alexandre @ 2022-08-22 13:39 UTC (permalink / raw)
  To: Dominik Csapak, Proxmox VE development discussion

On 22/08/22 at 12:16, Dominik Csapak wrote:
> On 8/17/22 01:15, DERUMIER, Alexandre wrote:
>> Le 9/08/22 à 10:39, Dominik Csapak a écrit :
>>> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>>>> Le 26/07/22 à 08:55, Dominik Csapak a écrit :
>>>>> so maybe someone can look at that and give some feedback?
>>>>> my idea there would be to allow multiple device mappings per node
>>>>> (instead of one only) and the qemu code would select one automatically
>>>> Hi Dominik,
>>>>
>>>> do you want to create some kind of pool of pci devices in your ""add
>>>> cluster-wide hardware device mapping" patches series ?
>>>>
>>>> Maybe in hardwaremap, allow to define multiple pci address on same 
>>>> node ?
>>>>
>>>> Then, for mdev, look if a mdev already exist in 1 of the device.
>>>> If not, try to create the mdev if 1 device, if it's failing (max
>>>> number of mdev reached), try to create mdev on the other device,...
>>>>
>>>> if not mdev, choose a pci device in the pool not yet detached from 
>>>> host.
>>>>
>>>
>>> yes i plan to do this in my next iteration of the mapping series
>>> (basically what you describe)
>> Hi, sorry to be late.
>>
>>
>>> my (rough) idea:
>>>
>>> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
>>> (should be enough, i don't think grouping unrelated devices (different
>>> vendor/product) makes much sense?)
>> yes, that's enough for me. we don't want to mix unrelated devices.
>>
>> BTW, I'm finally able to do live migration with nvidia mdev vgpu. (need
>> to compile the nvidia vfio driver with an option to enable it + add
>> "-device vfio-pci,x-enable-migration=on,..."
> 
> nice (what flag do you need on the driver install? i did not find it)
> i'll see if i can test that on a single card (only have one here)
> 


I have used the 460.73.01 driver (the latest 510 driver doesn't have the flag
and the code, I don't know why):
https://github.com/mbilker/vgpu_unlock-rs/issues/15


The flag is NV_KVM_MIGRATION_UAP=1.
As I didn't know how to pass the flag, I simply extracted the driver with
"NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5.run -x",
edited "kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild" to add
NV_KVM_MIGRATION_UAP=1,

and then ran ./nvidia-installer.







>>
>> So, maybe adding a "livemigrate" flag on the hardwaremap could be 
>> great :)
> 
> it's probably better suited for the hostpci setting in the qemu config,
> since that's the place we need it
> 
>>
>> Could be usefull for stateless usb device, like usb dongle,where we
>> could unplug usb/livemigrate/replug usb.
>>
>>
>>
> also probably better suited for the usbX setting
> 
> but those can be done after (some version of) this series
> is applied
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
  2022-08-22 13:39         ` DERUMIER, Alexandre
@ 2022-08-22 14:07           ` Dominik Csapak
  2022-08-23  7:50             ` Dominik Csapak
  0 siblings, 1 reply; 14+ messages in thread
From: Dominik Csapak @ 2022-08-22 14:07 UTC (permalink / raw)
  To: DERUMIER, Alexandre, Proxmox VE development discussion

On 8/22/22 15:39, DERUMIER, Alexandre wrote:
> Le 22/08/22 à 12:16, Dominik Csapak a écrit :
>> On 8/17/22 01:15, DERUMIER, Alexandre wrote:
>>> Le 9/08/22 à 10:39, Dominik Csapak a écrit :
>>>> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>>>>> Le 26/07/22 à 08:55, Dominik Csapak a écrit :
>>>>>> so maybe someone can look at that and give some feedback?
>>>>>> my idea there would be to allow multiple device mappings per node
>>>>>> (instead of one only) and the qemu code would select one automatically
>>>>> Hi Dominik,
>>>>>
>>>>> do you want to create some kind of pool of pci devices in your ""add
>>>>> cluster-wide hardware device mapping" patches series ?
>>>>>
>>>>> Maybe in hardwaremap, allow to define multiple pci address on same
>>>>> node ?
>>>>>
>>>>> Then, for mdev, look if a mdev already exist in 1 of the device.
>>>>> If not, try to create the mdev if 1 device, if it's failing (max
>>>>> number of mdev reached), try to create mdev on the other device,...
>>>>>
>>>>> if not mdev, choose a pci device in the pool not yet detached from
>>>>> host.
>>>>>
>>>>
>>>> yes i plan to do this in my next iteration of the mapping series
>>>> (basically what you describe)
>>> Hi, sorry to be late.
>>>
>>>
>>>> my (rough) idea:
>>>>
>>>> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
>>>> (should be enough, i don't think grouping unrelated devices (different
>>>> vendor/product) makes much sense?)
>>> yes, that's enough for me. we don't want to mix unrelated devices.
>>>
>>> BTW, I'm finally able to do live migration with nvidia mdev vgpu. (need
>>> to compile the nvidia vfio driver with an option to enable it + add
>>> "-device vfio-pci,x-enable-migration=on,..."
>>
>> nice (what flag do you need on the driver install? i did not find it)
>> i'll see if i can test that on a single card (only have one here)
>>
> 
> 
> I have use 460.73.01 driver.  (last 510 driver don't have the flag and
> code, don't known why)
> https://github.com/mbilker/vgpu_unlock-rs/issues/15
> 
> 
> the flag is NV_KVM_MIGRATION_UAP=1.
> As I didn't known to pass the flag,
> 
> I have simply decompress the driver
> "NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5.run -x"
> edit the "kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild" to add
> NV_KVM_MIGRATION_UAP=1
> 
> then ./nvidia-installer
> 

Thanks, I am using the 510.73.06 driver here (the official GRID driver) and
the dkms source has that flag, so I changed the .Kbuild in my /usr/src folder
and rebuilt it. I'll test it tomorrow.






^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
  2022-08-22 14:07           ` Dominik Csapak
@ 2022-08-23  7:50             ` Dominik Csapak
  0 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-08-23  7:50 UTC (permalink / raw)
  To: DERUMIER, Alexandre, Proxmox VE development discussion

On 8/22/22 16:07, Dominik Csapak wrote:
> On 8/22/22 15:39, DERUMIER, Alexandre wrote:
>> Le 22/08/22 à 12:16, Dominik Csapak a écrit :
>>> On 8/17/22 01:15, DERUMIER, Alexandre wrote:
>>>> Le 9/08/22 à 10:39, Dominik Csapak a écrit :
>>>>> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>>>>>> Le 26/07/22 à 08:55, Dominik Csapak a écrit :
>>>>>>> so maybe someone can look at that and give some feedback?
>>>>>>> my idea there would be to allow multiple device mappings per node
>>>>>>> (instead of one only) and the qemu code would select one automatically
>>>>>> Hi Dominik,
>>>>>>
>>>>>> do you want to create some kind of pool of pci devices in your ""add
>>>>>> cluster-wide hardware device mapping" patches series ?
>>>>>>
>>>>>> Maybe in hardwaremap, allow to define multiple pci address on same
>>>>>> node ?
>>>>>>
>>>>>> Then, for mdev, look if a mdev already exist in 1 of the device.
>>>>>> If not, try to create the mdev if 1 device, if it's failing (max
>>>>>> number of mdev reached), try to create mdev on the other device,...
>>>>>>
>>>>>> if not mdev, choose a pci device in the pool not yet detached from
>>>>>> host.
>>>>>>
>>>>>
>>>>> yes i plan to do this in my next iteration of the mapping series
>>>>> (basically what you describe)
>>>> Hi, sorry to be late.
>>>>
>>>>
>>>>> my (rough) idea:
>>>>>
>>>>> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
>>>>> (should be enough, i don't think grouping unrelated devices (different
>>>>> vendor/product) makes much sense?)
>>>> yes, that's enough for me. we don't want to mix unrelated devices.
>>>>
>>>> BTW, I'm finally able to do live migration with nvidia mdev vgpu. (need
>>>> to compile the nvidia vfio driver with an option to enable it + add
>>>> "-device vfio-pci,x-enable-migration=on,..."
>>>
>>> nice (what flag do you need on the driver install? i did not find it)
>>> i'll see if i can test that on a single card (only have one here)
>>>
>>
>>
>> I have use 460.73.01 driver.  (last 510 driver don't have the flag and
>> code, don't known why)
>> https://github.com/mbilker/vgpu_unlock-rs/issues/15
>>
>>
>> the flag is NV_KVM_MIGRATION_UAP=1.
>> As I didn't known to pass the flag,
>>
>> I have simply decompress the driver
>> "NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5.run -x"
>> edit the "kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild" to add
>> NV_KVM_MIGRATION_UAP=1
>>
>> then ./nvidia-installer
>>
> 
> thx, i am using the 510.73.06 driver here (official grid driver) and the
> dkms source has that flag, so i changed the .Kbuild in my /usr/src folder
> and rebuilt it. i'll test it tomorrow
> 
> 
> 

So I tested it here on a single machine with a single card, and it worked.
I started a second (manual QEMU) VM with '-incoming' and used the QEMU monitor
to initiate the migration. A VNC session to that VM with a running benchmark
continued without any noticeable interruption :)
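(Roughly: the target VM got an extra '-incoming tcp:<host>:<port>' argument,
and the migration was then started from the source VM's monitor with something
like 'migrate -d tcp:<host>:<port>'; host and port are placeholders, not the
exact values I used.)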

I guess though that since NVIDIA does not really advertise that feature for
their 'Linux KVM' drivers, it is rather experimental and unsupported there.
(In their documentation, only Citrix/VMware is supported for live migration of
vGPUs.)

So after my current cluster mapping patches, I'll add a 'migration' flag to
hostpci devices, but only on the CLI, since there does not seem to be a
supported way for any hardware right now (or is there any other vendor with
that feature?)





^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-08-23  7:50 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-26  6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
2022-07-26  6:55 ` [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types Dominik Csapak
2022-08-12  7:25   ` Wolfgang Bumiller
2022-07-26  6:55 ` [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU Dominik Csapak
2022-08-12  7:32   ` Wolfgang Bumiller
2022-07-26  6:55 ` [pve-devel] [PATCH manager 1/1] ui: improve form/MDevSelector Dominik Csapak
2022-08-02 16:21 ` [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA DERUMIER, Alexandre
2022-08-09  7:59 ` DERUMIER, Alexandre
2022-08-09  8:39   ` Dominik Csapak
2022-08-16 23:15     ` DERUMIER, Alexandre
2022-08-22 10:16       ` Dominik Csapak
2022-08-22 13:39         ` DERUMIER, Alexandre
2022-08-22 14:07           ` Dominik Csapak
2022-08-23  7:50             ` Dominik Csapak
