From: Dominik Csapak <d.csapak@proxmox.com>
To: "DERUMIER, Alexandre" <Alexandre.DERUMIER@groupe-cyllene.com>,
	Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH common/qemu-server/manager] improve vGPU (mdev) usage for NVIDIA
Date: Tue, 23 Aug 2022 09:50:51 +0200	[thread overview]
Message-ID: <d5f6883f-4516-4fb0-3178-6a62a2e25f02@proxmox.com> (raw)
In-Reply-To: <ae2802e2-6e93-2cdf-1063-19351b294eec@proxmox.com>

On 8/22/22 16:07, Dominik Csapak wrote:
> On 8/22/22 15:39, DERUMIER, Alexandre wrote:
>>> On 22/08/22 12:16, Dominik Csapak wrote:
>>> On 8/17/22 01:15, DERUMIER, Alexandre wrote:
>>>>> On 9/08/22 10:39, Dominik Csapak wrote:
>>>>> On 8/9/22 09:59, DERUMIER, Alexandre wrote:
>>>>>>> On 26/07/22 08:55, Dominik Csapak wrote:
>>>>>>> so maybe someone can look at that and give some feedback?
>>>>>>> my idea there would be to allow multiple device mappings per node
>>>>>>> (instead of only one) and have the qemu code select one automatically
>>>>>> Hi Dominik,
>>>>>>
>>>>>> do you want to create some kind of pool of pci devices in your "add
>>>>>> cluster-wide hardware device mapping" patch series?
>>>>>>
>>>>>> Maybe in hardwaremap, allow defining multiple pci addresses on the
>>>>>> same node?
>>>>>>
>>>>>> Then, for mdev, check whether an mdev already exists on one of the
>>>>>> devices. If not, try to create the mdev on one device; if that fails
>>>>>> (max number of mdevs reached), try to create it on the next device, ...
>>>>>>
>>>>>> If it's not an mdev, choose a pci device from the pool that has not
>>>>>> yet been detached from the host.
>>>>>>
>>>>>
>>>>> yes i plan to do this in my next iteration of the mapping series
>>>>> (basically what you describe)
>>>> Hi, sorry to be late.
>>>>
>>>>
>>>>> my (rough) idea:
>>>>>
>>>>> have a list of pci paths in mapping (e.g. 01:00.0;01:00.4;...)
>>>>> (should be enough, i don't think grouping unrelated devices (different
>>>>> vendor/product) makes much sense?)
>>>> yes, that's enough for me. we don't want to mix unrelated devices.
>>>>
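
for reference, a rough shell sketch of that fallback idea (the pci addresses
and the mdev type 'nvidia-63' are just placeholders i picked for the example;
the real selection would of course live in the mapping/qemu-server code):

  UUID=$(uuidgen)
  # try each mapped pci device in turn until one accepts the mdev
  for DEV in 0000:01:00.0 0000:01:00.4; do
      TYPE_DIR=/sys/bus/pci/devices/$DEV/mdev_supported_types/nvidia-63
      if echo "$UUID" > "$TYPE_DIR/create" 2>/dev/null; then
          echo "created mdev $UUID on $DEV"
          break
      fi
      # creation failed (e.g. max instances for that type reached), try the next one
  done
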
>>>> BTW, I'm finally able to do live migration with nvidia mdev vgpu. (you
>>>> need to compile the nvidia vfio driver with an option to enable it and
>>>> add "-device vfio-pci,x-enable-migration=on,..." to the qemu command line)
>>>
>>> nice (what flag do you need on the driver install? i did not find it)
>>> i'll see if i can test that on a single card (only have one here)
>>>
>>
>>
>> I have used the 460.73.01 driver. (the latest 510 driver doesn't have the
>> flag and the code, I don't know why)
>> https://github.com/mbilker/vgpu_unlock-rs/issues/15
>>
>>
>> the flag is NV_KVM_MIGRATION_UAP=1.
>> As I didn't know how to pass the flag,
>>
>> I simply decompressed the driver with
>> "NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5.run -x",
>> edited "kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild" to add
>> NV_KVM_MIGRATION_UAP=1,
>>
>> and then ran ./nvidia-installer
>>
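
so, condensed, the rebuild boils down to roughly this (untested as written
here; file name taken from above, extraction directory assumed to match it):

  ./NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5.run -x
  cd NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5
  # prepend the flag so the migration uapi gets compiled into the vfio module
  sed -i '1i NV_KVM_MIGRATION_UAP=1' kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.Kbuild
  ./nvidia-installer
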
> 
> thx, i am using the 510.73.06 driver here (official grid driver) and the
> dkms source has that flag, so i changed the .Kbuild in my /usr/src folder
> and rebuilt it. i'll test it tomorrow
> 
> 
> 

so i tested it here on a single machine and a single card, and it worked.
i started a second (manual qemu) vm with '-incoming' and used the qemu monitor
to initiate the migration. a vnc session to that vm with a running benchmark
ran without any noticeable interruption :)
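
for reference, roughly what that test looked like (single host; the port and
the mdev uuid are placeholders i picked for the example):

  # destination: identical vm started manually, waiting for the incoming migration
  qemu-system-x86_64 ... \
      -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID,x-enable-migration=on \
      -incoming tcp:0:4444

  # source: in the qemu monitor
  (qemu) migrate -d tcp:127.0.0.1:4444
  (qemu) info migrate    # poll until the status is 'completed'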

i guess though that since nvidia does not really advertise that feature for
their 'linux kvm' drivers, it is rather experimental and unsupported there.
(In their documentation, only citrix/vmware are supported for live migration
of vgpus)

so after my current cluster mapping patches, i'll see about adding a
'migration' flag to hostpci devices, but only for the cli, since there does
not seem to be a supported way for any hw right now (or is there another
vendor with that feature?)
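
just to illustrate what i mean (the property name is not final, and this is
not implemented anywhere yet), something along the lines of:

  # hypothetical cli-only flag on the hostpci entry:
  qm set <vmid> -hostpci0 <pci-address>,mdev=<type>,migration=1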




