* [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
@ 2022-07-26 6:55 Dominik Csapak
2022-07-26 6:55 ` [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types Dominik Csapak
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-07-26 6:55 UTC (permalink / raw)
To: pve-devel
This series improves the handling of NVIDIA vGPUs by exposing the optional name
and automatically adding the uuid to the qemu process (required by the NVIDIA
driver). It also adds the name to the UI for selecting a mediated device and
makes the dropdown larger so users can see all the relevant info.
After a bit of testing with an RTX A5000 here, I'd like to tackle bug
#3574 ("Improve SR-IOV usability")[0],
but I'd like to integrate that into my other PCI passthrough series[1],
"add cluster-wide hardware device mapping",
so maybe someone can look at that and give some feedback?
My idea there would be to allow multiple device mappings per node
(instead of only one), with the qemu code selecting one automatically.
0: https://bugzilla.proxmox.com/show_bug.cgi?id=3574
1: https://lists.proxmox.com/pipermail/pve-devel/2022-July/053565.html
pve-common:
Dominik Csapak (1):
SysFSTools: get name from mediated device types
src/PVE/SysFSTools.pm | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
qemu-server:
Dominik Csapak (1):
automatically add 'uuid' parameter when passing through NVIDIA vGPU
PVE/QemuServer.pm | 8 +++++++-
PVE/QemuServer/PCI.pm | 4 +++-
2 files changed, 10 insertions(+), 2 deletions(-)
pve-manager:
Dominik Csapak (1):
ui: improve form/MDevSelector
www/manager6/form/MDevSelector.js | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
--
2.30.2
* [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types
2022-07-26 6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
@ 2022-07-26 6:55 ` Dominik Csapak
2022-08-12 7:25 ` Wolfgang Bumiller
2022-07-26 6:55 ` [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU Dominik Csapak
` (3 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Dominik Csapak @ 2022-07-26 6:55 UTC (permalink / raw)
To: pve-devel
Some vendors also provide a 'name' file here for the type which, in the case of
NVIDIA, is the official name for the vGPU type in their documentation,
so extract and return it too (if it exists).
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/PVE/SysFSTools.pm | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/PVE/SysFSTools.pm b/src/PVE/SysFSTools.pm
index ac48f2c..e897b22 100644
--- a/src/PVE/SysFSTools.pm
+++ b/src/PVE/SysFSTools.pm
@@ -172,11 +172,15 @@ sub get_mdev_types {
my $available = int(file_read_firstline("$type_path/available_instances"));
my $description = PVE::Tools::file_get_contents("$type_path/description");
- push @$types, {
+ my $entry = {
type => $type,
description => $description,
available => $available,
};
+
+ $entry->{name} = PVE::Tools::file_get_contents("$type_path/name") if -e "$type_path/name";
+
+ push @$types, $entry;
});
return $types;
--
2.30.2
* [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU
2022-07-26 6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
2022-07-26 6:55 ` [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types Dominik Csapak
@ 2022-07-26 6:55 ` Dominik Csapak
2022-08-12 7:32 ` Wolfgang Bumiller
2022-07-26 6:55 ` [pve-devel] [PATCH manager 1/1] ui: improve form/MDevSelector Dominik Csapak
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Dominik Csapak @ 2022-07-26 6:55 UTC (permalink / raw)
To: pve-devel
When passing through an NVIDIA vGPU via mediated devices, their
software needs the qemu process to have its 'uuid' parameter set to that
of the vGPU. Since it's currently not possible to pass through multiple
vGPUs to one VM (this seems to be an NVIDIA driver limitation at the moment),
we don't have to handle that case.
Sadly, because of the place where we do this, the parameter does not show up
in 'qm showcmd', as we don't (want to) query the PCI devices in that case and
thus have no way of knowing whether it's an NVIDIA card or not. But since the
parameter is only informational for QEMU anyway, I'd say we can ignore that.
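For illustration, the UUID is derived purely from the VM id and the index
passed in (assumed here to be the hostpci index), so the value QEMU receives is
predictable. A quick sketch with made-up values (VM id 12345, hostpci index 2),
using the same format string as generate_mdev_uuid() in PVE/QemuServer/PCI.pm:
    my $uuid = sprintf("%08d-0000-0000-0000-%012d", 2, 12345);
    # $uuid is '00000002-0000-0000-0000-000000012345' and ends up on the
    # QEMU command line as: -uuid 00000002-0000-0000-0000-000000012345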
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/QemuServer.pm | 8 +++++++-
PVE/QemuServer/PCI.pm | 4 +++-
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 7d9cf22..c4eb031 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5597,7 +5597,13 @@ sub vm_start_nolock {
for my $id (sort keys %$pci_devices) {
my $d = $pci_devices->{$id};
for my $dev ($d->{pciid}->@*) {
- PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
+ my $info = PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
+
+ # nvidia grid needs the uuid of the mdev as qemu parameter
+ if ($d->{mdev} && $info->{vendor} eq '10de') {
+ my $uuid = PVE::QemuServer::PCI::generate_mdev_uuid($vmid, $id);
+ push @$cmd, '-uuid', $uuid;
+ }
}
}
};
diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
index 23fe508..3d0e70e 100644
--- a/PVE/QemuServer/PCI.pm
+++ b/PVE/QemuServer/PCI.pm
@@ -253,7 +253,7 @@ sub get_pci_addr_map {
return $pci_addr_map;
}
-my sub generate_mdev_uuid {
+sub generate_mdev_uuid {
my ($vmid, $index) = @_;
return sprintf("%08d-0000-0000-0000-%012d", $index, $vmid);
}
@@ -514,6 +514,8 @@ sub prepare_pci_device {
die "can't reset PCI device '$pciid'\n"
if $info->{has_fl_reset} && !PVE::SysFSTools::pci_dev_reset($info);
}
+
+ return $info;
}
my $RUNDIR = '/run/qemu-server';
--
2.30.2
* [pve-devel] [PATCH manager 1/1] ui: improve form/MDevSelector
2022-07-26 6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
2022-07-26 6:55 ` [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types Dominik Csapak
2022-07-26 6:55 ` [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU Dominik Csapak
@ 2022-07-26 6:55 ` Dominik Csapak
2022-08-02 16:21 ` [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA DERUMIER, Alexandre
2022-08-09 7:59 ` DERUMIER, Alexandre
4 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-07-26 6:55 UTC (permalink / raw)
To: pve-devel
by
* showing the (optional) name in front of the type
* making the 'available' column a bit narrower
* enabling 'cellWrap' for the description
* making the dropdown a bit wider (so all the information can fit)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
www/manager6/form/MDevSelector.js | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/www/manager6/form/MDevSelector.js b/www/manager6/form/MDevSelector.js
index 8ee73c0c..0a157813 100644
--- a/www/manager6/form/MDevSelector.js
+++ b/www/manager6/form/MDevSelector.js
@@ -16,21 +16,29 @@ Ext.define('PVE.form.MDevSelector', {
valueField: 'type',
displayField: 'type',
listConfig: {
+ width: 550,
columns: [
{
header: gettext('Type'),
dataIndex: 'type',
+ renderer: function(value, md, rec) {
+ if (rec.data.name !== undefined) {
+ return `${rec.data.name} (${value})`;
+ }
+ return value;
+ },
flex: 1,
},
{
- header: gettext('Available'),
+ header: gettext('Avail.'),
dataIndex: 'available',
- width: 80,
+ width: 60,
},
{
header: gettext('Description'),
dataIndex: 'description',
flex: 1,
+ cellWrap: true,
renderer: function(value) {
if (!value) {
return '';
--
2.30.2
* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
2022-07-26 6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
` (2 preceding siblings ...)
2022-07-26 6:55 ` [pve-devel] [PATCH manager 1/1] ui: improve form/MDevSelector Dominik Csapak
@ 2022-08-02 16:21 ` DERUMIER, Alexandre
2022-08-09 7:59 ` DERUMIER, Alexandre
4 siblings, 0 replies; 14+ messages in thread
From: DERUMIER, Alexandre @ 2022-08-02 16:21 UTC (permalink / raw)
To: pve-devel, Moula BADJI
On 26/07/22 at 08:55, Dominik Csapak wrote:
> This series improves the handling of NVIDIA vGPUs by exposing the optional name
> and automatically adding the uuid to the qemu process (required by NVIDIA
> driver). Also adds the name to the UI for selecting a mediated devices as well
> as making the dropdown larger so users can see all the relevant info.
>
> After a bit of testing with an RTX A5000 here, i'd like to tackle the bug
> #3574 ("Improve SR-IOV usability")[0]
>
> but i'd like to integrate it in my other pci passthrough series[1]
> "add cluster-wide hardware device mapping"
>
> so maybe someone can look at that and give some feedback?
> my idea there would be to allow multiple device mappings per node
> (instead of one only) and the qemu code would select one automatically
Hi,
maybe I could look at it next week
ping @moula: is it still possible to have access to your Proxmox vGPU
cluster?
* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
2022-07-26 6:55 [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA Dominik Csapak
` (3 preceding siblings ...)
2022-08-02 16:21 ` [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA DERUMIER, Alexandre
@ 2022-08-09 7:59 ` DERUMIER, Alexandre
2022-08-09 8:39 ` Dominik Csapak
4 siblings, 1 reply; 14+ messages in thread
From: DERUMIER, Alexandre @ 2022-08-09 7:59 UTC (permalink / raw)
To: pve-devel
On 26/07/22 at 08:55, Dominik Csapak wrote:
> so maybe someone can look at that and give some feedback?
> my idea there would be to allow multiple device mappings per node
> (instead of one only) and the qemu code would select one automatically
Hi Dominik,
do you want to create some kind of pool of PCI devices in your "add cluster-wide hardware device mapping" patch series?
Maybe, in the hardware map, allow defining multiple PCI addresses on the same node?
Then, for mdev, check whether an mdev already exists on one of the devices.
If not, try to create the mdev on one device; if that fails (max number of mdevs reached), try to create the mdev on the other device, and so on.
If it's not an mdev, choose a PCI device from the pool that is not yet detached from the host.
* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
2022-08-09 7:59 ` DERUMIER, Alexandre
@ 2022-08-09 8:39 ` Dominik Csapak
2022-08-16 23:15 ` DERUMIER, Alexandre
0 siblings, 1 reply; 14+ messages in thread
From: Dominik Csapak @ 2022-08-09 8:39 UTC (permalink / raw)
To: pve-devel
On 8/9/22 09:59, DERUMIER, Alexandre wrote:
> On 26/07/22 at 08:55, Dominik Csapak wrote:
>> so maybe someone can look at that and give some feedback?
>> my idea there would be to allow multiple device mappings per node
>> (instead of one only) and the qemu code would select one automatically
> Hi Dominik,
>
> do you want to create some kind of pool of pci devices in your ""add cluster-wide hardware device mapping" patches series ?
>
> Maybe in hardwaremap, allow to define multiple pci address on same node ?
>
> Then, for mdev, look if a mdev already exist in 1 of the device.
> If not, try to create the mdev if 1 device, if it's failing (max number of mdev reached), try to create mdev on the other device,...
>
> if not mdev, choose a pci device in the pool not yet detached from host.
>
yes i plan to do this in my next iteration of the mapping series
(basically what you describe)
my (rough) idea:
have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
(should be enough, i don't think grouping unrelated devices (different vendor/product) makes much
sense?)
* non mdev:
qemu-server checks the pci reservations (which we already have)
and takes the first not yet reserved path
* mdev
qemu-server iterates over the devices until it finds one
with the given mdev type available
if none is found, error out
(relevant bug for this: https://bugzilla.proxmox.com/show_bug.cgi?id=3574)
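A very rough sketch of that selection logic, purely for illustration (the
helper name, the mapping layout and the reservation hash are assumptions, not
existing code; only PVE::SysFSTools::get_mdev_types() is real, assuming it
takes the device's sysfs PCI id):
    # illustrative sketch only - not part of any patch
    sub choose_pci_path {
        my ($paths, $mdev_type, $reserved) = @_;
        # $paths: PCI paths from the mapping, e.g. ['0000:01:00.0', '0000:01:00.4']
        # $mdev_type: requested mdev type, or undef for full passthrough
        # $reserved: hash of paths already reserved for other VMs
        for my $path (@$paths) {
            if (defined($mdev_type)) {
                # mdev: first device that still has an instance of the type available
                my $types = PVE::SysFSTools::get_mdev_types($path);
                return $path if grep { $_->{type} eq $mdev_type && $_->{available} > 0 } @$types;
            } else {
                # non-mdev: first path that is not yet reserved
                return $path if !$reserved->{$path};
            }
        }
        die "no free PCI device found for mapping\n";
    }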
* Re: [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types
2022-07-26 6:55 ` [pve-devel] [PATCH common 1/1] SysFSTools: get name from mediated device types Dominik Csapak
@ 2022-08-12 7:25 ` Wolfgang Bumiller
0 siblings, 0 replies; 14+ messages in thread
From: Wolfgang Bumiller @ 2022-08-12 7:25 UTC (permalink / raw)
To: Dominik Csapak; +Cc: pve-devel
On Tue, Jul 26, 2022 at 08:55:57AM +0200, Dominik Csapak wrote:
> Some vendors also provide a 'name' file here for the type, which, in case of
> NVIDIA, is the official name for the vGPU type in their documentation,
> so extract and return it too (if it exists).
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> src/PVE/SysFSTools.pm | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/PVE/SysFSTools.pm b/src/PVE/SysFSTools.pm
> index ac48f2c..e897b22 100644
> --- a/src/PVE/SysFSTools.pm
> +++ b/src/PVE/SysFSTools.pm
> @@ -172,11 +172,15 @@ sub get_mdev_types {
> my $available = int(file_read_firstline("$type_path/available_instances"));
> my $description = PVE::Tools::file_get_contents("$type_path/description");
>
> - push @$types, {
> + my $entry = {
> type => $type,
> description => $description,
> available => $available,
> };
> +
> + $entry->{name} = PVE::Tools::file_get_contents("$type_path/name") if -e "$type_path/name";
Since this is a sysfs file I'd expect this to end in a newline?
Otherwise this is fine, though I'm not a fan of `-e` checks in general.
You could use `file_read_firstline` which would `chomp` the newline and
return `undef` if opening the file fails (and you could then (maybe
optionally) check `$!` for `ENOENT`).
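For illustration, that suggestion could look roughly like this (just a sketch;
the optional ENOENT check mentioned above is left out here):
    # file_read_firstline() chomps the trailing newline and returns undef
    # if the file cannot be opened, so a missing 'name' file is simply skipped
    my $name = PVE::Tools::file_read_firstline("$type_path/name");
    $entry->{name} = $name if defined($name);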
> +
> + push @$types, $entry;
> });
>
> return $types;
> --
> 2.30.2
* Re: [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU
2022-07-26 6:55 ` [pve-devel] [PATCH qemu-server 1/1] automatically add 'uuid' parameter when passing through NVIDIA vGPU Dominik Csapak
@ 2022-08-12 7:32 ` Wolfgang Bumiller
0 siblings, 0 replies; 14+ messages in thread
From: Wolfgang Bumiller @ 2022-08-12 7:32 UTC (permalink / raw)
To: Dominik Csapak; +Cc: pve-devel
On Tue, Jul 26, 2022 at 08:55:58AM +0200, Dominik Csapak wrote:
> When passing through an NVIDIA vGPU via mediated devices, their
> software needs the qemu process to have the 'uuid' parameter set to the
> one of the vGPU. Since it's currently not possible to pass through multiple
> vGPUs to one VM (seems to be an NVIDIA driver limitation at the moment),
> we don't have to take care about that.
>
> Sadly, the place we do this, it does not show up in 'qm showcmd' as we
> don't (want to) query the pci devices in that case, and then we don't
> have a way of knowing if it's an NVIDIA card or not. But since this
> is informational with QEMU anyway, i'd say we can ignore that.
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> PVE/QemuServer.pm | 8 +++++++-
> PVE/QemuServer/PCI.pm | 4 +++-
> 2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 7d9cf22..c4eb031 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -5597,7 +5597,13 @@ sub vm_start_nolock {
> for my $id (sort keys %$pci_devices) {
> my $d = $pci_devices->{$id};
> for my $dev ($d->{pciid}->@*) {
> - PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
> + my $info = PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
> +
> + # nvidia grid needs the uuid of the mdev as qemu parameter
> + if ($d->{mdev} && $info->{vendor} eq '10de') {
> + my $uuid = PVE::QemuServer::PCI::generate_mdev_uuid($vmid, $id);
> + push @$cmd, '-uuid', $uuid;
You mention you can only pass through one device, but I'd still prefer
the code to make sure we only pass a single `-uuid` parameter here,
since this is not at all clear when just reading the code.
Otherwise this seems fine.
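For example, a minimal way to make that explicit could look like this (sketch
only, the guard variable is made up):
    my $uuid_added = 0; # so at most one '-uuid' ends up on the command line
    for my $id (sort keys %$pci_devices) {
        my $d = $pci_devices->{$id};
        for my $dev ($d->{pciid}->@*) {
            my $info = PVE::QemuServer::PCI::prepare_pci_device($vmid, $dev->{id}, $id, $d->{mdev});
            # nvidia grid needs the uuid of the mdev as qemu parameter
            if ($d->{mdev} && !$uuid_added && $info->{vendor} eq '10de') {
                push @$cmd, '-uuid', PVE::QemuServer::PCI::generate_mdev_uuid($vmid, $id);
                $uuid_added = 1;
            }
        }
    }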
> + }
> }
> }
> };
> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
> index 23fe508..3d0e70e 100644
> --- a/PVE/QemuServer/PCI.pm
> +++ b/PVE/QemuServer/PCI.pm
> @@ -253,7 +253,7 @@ sub get_pci_addr_map {
> return $pci_addr_map;
> }
>
> -my sub generate_mdev_uuid {
> +sub generate_mdev_uuid {
> my ($vmid, $index) = @_;
> return sprintf("%08d-0000-0000-0000-%012d", $index, $vmid);
> }
> @@ -514,6 +514,8 @@ sub prepare_pci_device {
> die "can't reset PCI device '$pciid'\n"
> if $info->{has_fl_reset} && !PVE::SysFSTools::pci_dev_reset($info);
> }
> +
> + return $info;
> }
>
> my $RUNDIR = '/run/qemu-server';
> --
> 2.30.2
* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
2022-08-09 8:39 ` Dominik Csapak
@ 2022-08-16 23:15 ` DERUMIER, Alexandre
2022-08-22 10:16 ` Dominik Csapak
0 siblings, 1 reply; 14+ messages in thread
From: DERUMIER, Alexandre @ 2022-08-16 23:15 UTC (permalink / raw)
To: pve-devel
On 9/08/22 at 10:39, Dominik Csapak wrote:
> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>> On 26/07/22 at 08:55, Dominik Csapak wrote:
>>> so maybe someone can look at that and give some feedback?
>>> my idea there would be to allow multiple device mappings per node
>>> (instead of one only) and the qemu code would select one automatically
>> Hi Dominik,
>>
>> do you want to create some kind of pool of pci devices in your ""add
>> cluster-wide hardware device mapping" patches series ?
>>
>> Maybe in hardwaremap, allow to define multiple pci address on same node ?
>>
>> Then, for mdev, look if a mdev already exist in 1 of the device.
>> If not, try to create the mdev if 1 device, if it's failing (max
>> number of mdev reached), try to create mdev on the other device,...
>>
>> if not mdev, choose a pci device in the pool not yet detached from host.
>>
>
> yes i plan to do this in my next iteration of the mapping series
> (basically what you describe)
Hi, sorry to be late.
> my (rough) idea:
>
> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
> (should be enough, i don't think grouping unrelated devices (different
> vendor/product) makes much sense?)
yes, that's enough for me. we don't want to mix unrelated devices.
BTW, I'm finally able to do live migration with nvidia mdev vgpu (I needed
to compile the nvidia vfio driver with an option to enable it and add
"-device vfio-pci,x-enable-migration=on,..." to the VM's command line).
So, maybe adding a "livemigrate" flag on the hardwaremap could be great :)
Could be useful for stateless USB devices, like USB dongles, where we
could unplug the USB device, live migrate, and replug the USB device.
>
> * non mdev:
> qemu-server checks the pci reservations (which we already have)
> and takes the first not yet reserved path
>
> * mdev
> qemu-server iterates over the devices until it finds one
> with the given mdev type available
>
> if none is found, error out
seems great :)
* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
2022-08-16 23:15 ` DERUMIER, Alexandre
@ 2022-08-22 10:16 ` Dominik Csapak
2022-08-22 13:39 ` DERUMIER, Alexandre
0 siblings, 1 reply; 14+ messages in thread
From: Dominik Csapak @ 2022-08-22 10:16 UTC (permalink / raw)
To: Proxmox VE development discussion, DERUMIER, Alexandre
On 8/17/22 01:15, DERUMIER, Alexandre wrote:
> On 9/08/22 at 10:39, Dominik Csapak wrote:
>> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>>> On 26/07/22 at 08:55, Dominik Csapak wrote:
>>>> so maybe someone can look at that and give some feedback?
>>>> my idea there would be to allow multiple device mappings per node
>>>> (instead of one only) and the qemu code would select one automatically
>>> Hi Dominik,
>>>
>>> do you want to create some kind of pool of pci devices in your ""add
>>> cluster-wide hardware device mapping" patches series ?
>>>
>>> Maybe in hardwaremap, allow to define multiple pci address on same node ?
>>>
>>> Then, for mdev, look if a mdev already exist in 1 of the device.
>>> If not, try to create the mdev if 1 device, if it's failing (max
>>> number of mdev reached), try to create mdev on the other device,...
>>>
>>> if not mdev, choose a pci device in the pool not yet detached from host.
>>>
>>
>> yes i plan to do this in my next iteration of the mapping series
>> (basically what you describe)
> Hi, sorry to be late.
>
>
>> my (rough) idea:
>>
>> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
>> (should be enough, i don't think grouping unrelated devices (different
>> vendor/product) makes much sense?)
> yes, that's enough for me. we don't want to mix unrelated devices.
>
> BTW, I'm finally able to do live migration with nvidia mdev vgpu. (need
> to compile the nvidia vfio driver with an option to enable it + add
> "-device vfio-pci,x-enable-migration=on,..."
nice (what flag do you need on the driver install? i did not find it)
i'll see if i can test that on a single card (only have one here)
>
> So, maybe adding a "livemigrate" flag on the hardwaremap could be great :)
it's probably better suited for the hostpci setting in the qemu config,
since that's the place we need it
>
> Could be usefull for stateless usb device, like usb dongle,where we
> could unplug usb/livemigrate/replug usb.
>
>
>
also probably better suited for the usbX setting
but those can be done after (some version of) this series
is applied
* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
2022-08-22 10:16 ` Dominik Csapak
@ 2022-08-22 13:39 ` DERUMIER, Alexandre
2022-08-22 14:07 ` Dominik Csapak
0 siblings, 1 reply; 14+ messages in thread
From: DERUMIER, Alexandre @ 2022-08-22 13:39 UTC (permalink / raw)
To: Dominik Csapak, Proxmox VE development discussion
On 22/08/22 at 12:16, Dominik Csapak wrote:
> On 8/17/22 01:15, DERUMIER, Alexandre wrote:
>> On 9/08/22 at 10:39, Dominik Csapak wrote:
>>> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>>>> On 26/07/22 at 08:55, Dominik Csapak wrote:
>>>>> so maybe someone can look at that and give some feedback?
>>>>> my idea there would be to allow multiple device mappings per node
>>>>> (instead of one only) and the qemu code would select one automatically
>>>> Hi Dominik,
>>>>
>>>> do you want to create some kind of pool of pci devices in your ""add
>>>> cluster-wide hardware device mapping" patches series ?
>>>>
>>>> Maybe in hardwaremap, allow to define multiple pci address on same
>>>> node ?
>>>>
>>>> Then, for mdev, look if a mdev already exist in 1 of the device.
>>>> If not, try to create the mdev if 1 device, if it's failing (max
>>>> number of mdev reached), try to create mdev on the other device,...
>>>>
>>>> if not mdev, choose a pci device in the pool not yet detached from
>>>> host.
>>>>
>>>
>>> yes i plan to do this in my next iteration of the mapping series
>>> (basically what you describe)
>> Hi, sorry to be late.
>>
>>
>>> my (rough) idea:
>>>
>>> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
>>> (should be enough, i don't think grouping unrelated devices (different
>>> vendor/product) makes much sense?)
>> yes, that's enough for me. we don't want to mix unrelated devices.
>>
>> BTW, I'm finally able to do live migration with nvidia mdev vgpu. (need
>> to compile the nvidia vfio driver with an option to enable it + add
>> "-device vfio-pci,x-enable-migration=on,..."
>
> nice (what flag do you need on the driver install? i did not find it)
> i'll see if i can test that on a single card (only have one here)
>
I have used the 460.73.01 driver (the latest 510 driver doesn't have the flag
and code, I don't know why):
https://github.com/mbilker/vgpu_unlock-rs/issues/15
The flag is NV_KVM_MIGRATION_UAP=1.
As I didn't know how to pass the flag,
I simply decompressed the driver with
"NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5.run -x",
edited "kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild" to add
NV_KVM_MIGRATION_UAP=1,
and then ran ./nvidia-installer.
>>
>> So, maybe adding a "livemigrate" flag on the hardwaremap could be
>> great :)
>
> it's probably better suited for the hostpci setting in the qemu config,
> since that's the place we need it
>
>>
>> Could be usefull for stateless usb device, like usb dongle,where we
>> could unplug usb/livemigrate/replug usb.
>>
>>
>>
> also probably better suited for the usbX setting
>
> but those can be done after (some version of) this series
> is applied
>
* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
2022-08-22 13:39 ` DERUMIER, Alexandre
@ 2022-08-22 14:07 ` Dominik Csapak
2022-08-23 7:50 ` Dominik Csapak
0 siblings, 1 reply; 14+ messages in thread
From: Dominik Csapak @ 2022-08-22 14:07 UTC (permalink / raw)
To: DERUMIER, Alexandre, Proxmox VE development discussion
On 8/22/22 15:39, DERUMIER, Alexandre wrote:
> On 22/08/22 at 12:16, Dominik Csapak wrote:
>> On 8/17/22 01:15, DERUMIER, Alexandre wrote:
>>> On 9/08/22 at 10:39, Dominik Csapak wrote:
>>>> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>>>>> On 26/07/22 at 08:55, Dominik Csapak wrote:
>>>>>> so maybe someone can look at that and give some feedback?
>>>>>> my idea there would be to allow multiple device mappings per node
>>>>>> (instead of one only) and the qemu code would select one automatically
>>>>> Hi Dominik,
>>>>>
>>>>> do you want to create some kind of pool of pci devices in your ""add
>>>>> cluster-wide hardware device mapping" patches series ?
>>>>>
>>>>> Maybe in hardwaremap, allow to define multiple pci address on same
>>>>> node ?
>>>>>
>>>>> Then, for mdev, look if a mdev already exist in 1 of the device.
>>>>> If not, try to create the mdev if 1 device, if it's failing (max
>>>>> number of mdev reached), try to create mdev on the other device,...
>>>>>
>>>>> if not mdev, choose a pci device in the pool not yet detached from
>>>>> host.
>>>>>
>>>>
>>>> yes i plan to do this in my next iteration of the mapping series
>>>> (basically what you describe)
>>> Hi, sorry to be late.
>>>
>>>
>>>> my (rough) idea:
>>>>
>>>> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
>>>> (should be enough, i don't think grouping unrelated devices (different
>>>> vendor/product) makes much sense?)
>>> yes, that's enough for me. we don't want to mix unrelated devices.
>>>
>>> BTW, I'm finally able to do live migration with nvidia mdev vgpu. (need
>>> to compile the nvidia vfio driver with an option to enable it + add
>>> "-device vfio-pci,x-enable-migration=on,..."
>>
>> nice (what flag do you need on the driver install? i did not find it)
>> i'll see if i can test that on a single card (only have one here)
>>
>
>
> I have use 460.73.01 driver. (last 510 driver don't have the flag and
> code, don't known why)
> https://github.com/mbilker/vgpu_unlock-rs/issues/15
>
>
> the flag is NV_KVM_MIGRATION_UAP=1.
> As I didn't known to pass the flag,
>
> I have simply decompress the driver
> "NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5.run -x"
> edit the "kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild" to add
> NV_KVM_MIGRATION_UAP=1
>
> then ./nvidia-installer
>
thx, i am using the 510.73.06 driver here (official grid driver) and the
dkms source has that flag, so i changed the .Kbuild in my /usr/src folder
and rebuilt it. i'll test it tomorrow
* Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
2022-08-22 14:07 ` Dominik Csapak
@ 2022-08-23 7:50 ` Dominik Csapak
0 siblings, 0 replies; 14+ messages in thread
From: Dominik Csapak @ 2022-08-23 7:50 UTC (permalink / raw)
To: DERUMIER, Alexandre, Proxmox VE development discussion
On 8/22/22 16:07, Dominik Csapak wrote:
> On 8/22/22 15:39, DERUMIER, Alexandre wrote:
>> On 22/08/22 at 12:16, Dominik Csapak wrote:
>>> On 8/17/22 01:15, DERUMIER, Alexandre wrote:
>>>> On 9/08/22 at 10:39, Dominik Csapak wrote:
>>>>> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>>>>>> On 26/07/22 at 08:55, Dominik Csapak wrote:
>>>>>>> so maybe someone can look at that and give some feedback?
>>>>>>> my idea there would be to allow multiple device mappings per node
>>>>>>> (instead of one only) and the qemu code would select one automatically
>>>>>> Hi Dominik,
>>>>>>
>>>>>> do you want to create some kind of pool of pci devices in your ""add
>>>>>> cluster-wide hardware device mapping" patches series ?
>>>>>>
>>>>>> Maybe in hardwaremap, allow to define multiple pci address on same
>>>>>> node ?
>>>>>>
>>>>>> Then, for mdev, look if a mdev already exist in 1 of the device.
>>>>>> If not, try to create the mdev if 1 device, if it's failing (max
>>>>>> number of mdev reached), try to create mdev on the other device,...
>>>>>>
>>>>>> if not mdev, choose a pci device in the pool not yet detached from
>>>>>> host.
>>>>>>
>>>>>
>>>>> yes i plan to do this in my next iteration of the mapping series
>>>>> (basically what you describe)
>>>> Hi, sorry to be late.
>>>>
>>>>
>>>>> my (rough) idea:
>>>>>
>>>>> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
>>>>> (should be enough, i don't think grouping unrelated devices (different
>>>>> vendor/product) makes much sense?)
>>>> yes, that's enough for me. we don't want to mix unrelated devices.
>>>>
>>>> BTW, I'm finally able to do live migration with nvidia mdev vgpu. (need
>>>> to compile the nvidia vfio driver with an option to enable it + add
>>>> "-device vfio-pci,x-enable-migration=on,..."
>>>
>>> nice (what flag do you need on the driver install? i did not find it)
>>> i'll see if i can test that on a single card (only have one here)
>>>
>>
>>
>> I have use 460.73.01 driver. (last 510 driver don't have the flag and
>> code, don't known why)
>> https://github.com/mbilker/vgpu_unlock-rs/issues/15
>>
>>
>> the flag is NV_KVM_MIGRATION_UAP=1.
>> As I didn't known to pass the flag,
>>
>> I have simply decompress the driver
>> "NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5.run -x"
>> edit the "kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild" to add
>> NV_KVM_MIGRATION_UAP=1
>>
>> then ./nvidia-installer
>>
>
> thx, i am using the 510.73.06 driver here (official grid driver) and the
> dkms source has that flag, so i changed the .Kbuild in my /usr/src folder
> and rebuilt it. i'll test it tomorrow
>
>
>
so i tested it here on a single machine and single card, and it worked.
i started a second (manual qemu) vm with '-incoming' and used the qemu monitor
to initiate the migration. a vnc session to that vm with a running benchmark ran
without any noticeable interruption :)
i guess though that since nvidia does not really advertise that feature for their
'linux kvm' drivers, it is rather experimental and unsupported there.
(In their documentation, only citrix/vmware is supported for live migration of vgpus.)
so i'll see that, after my current cluster mapping patches, i'll add a
'migration' flag to hostpci devices, but only for the cli, since there does not
seem to be a supported way for any hw right now (or is there any other vendor with
that feature?)