public inbox for pve-user@lists.proxmox.com
* Re: [PVE-User] vGPU scheduling
       [not found] <mailman.190.1684933430.348.pve-user@lists.proxmox.com>
@ 2023-05-24 13:47 ` Dominik Csapak
  2023-05-25  7:32   ` DERUMIER, Alexandre
       [not found]   ` <mailman.193.1684936457.348.pve-user@lists.proxmox.com>
  0 siblings, 2 replies; 7+ messages in thread
From: Dominik Csapak @ 2023-05-24 13:47 UTC (permalink / raw)
  To: pve-user

On 5/24/23 15:03, Eneko Lacunza via pve-user wrote:
> Hi,

Hi,


> 
> We're looking to move a PoC in a customer to full-scale production.
> 
> Proxmox/Ceph cluster will be for VDI, and some VMs will use vGPU.
> 
> I'd like to know if vGPU status is being exposed right now (as of 7.4) for each node through API, as 
> it is done for RAM/CPU, and if not, about any plans to implement that so that a scheduler (in our 
> case that should be UDS Enterprise VDI manager) can choose a node with free vGPUs to deploy VDIs.

What exactly do you mean by vGPU status?

There is currently no API to see which PCI devices are in use by a VM
(that could be done per node, though not really for mediated devices).

There is the /nodes/NODENAME/hardware/pci API call, which shows which devices exist
and whether they have mdev (mediated device) capability (e.g. NVIDIA GRID vGPU).

For those cards there is also the API call

/nodes/NODENAME/hardware/pci/PCIID/mdev

which returns the list of mdev types and how many instances of each are still available.
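
For illustration, a minimal Python sketch of how an external scheduler could query those
two endpoints over the REST API. The host, node name and API token are placeholders, and
the exact response field names ("mdev", "type", "available") are assumptions to
double-check against the API viewer, so treat this as a starting point rather than a
verified implementation:

# Minimal sketch: list mdev-capable PCI devices on a node and how many
# instances of each mdev type are still free, via the PVE REST API.
# HOST, NODE and the token are placeholders; the response field names
# ("mdev", "type", "available") are assumptions to verify.
import requests

HOST = "https://pve1.example.com:8006"   # hypothetical node address
NODE = "pve1"                            # hypothetical node name
HEADERS = {"Authorization": "PVEAPIToken=root@pam!sched=00000000-0000-0000-0000-000000000000"}

def free_vgpu_types(node):
    # list all PCI devices on the node
    pci = requests.get(f"{HOST}/api2/json/nodes/{node}/hardware/pci",
                       headers=HEADERS, verify=False).json()["data"]
    result = {}
    for dev in pci:
        if not dev.get("mdev"):          # keep only mediated-device capable cards
            continue
        types = requests.get(
            f"{HOST}/api2/json/nodes/{node}/hardware/pci/{dev['id']}/mdev",
            headers=HEADERS, verify=False).json()["data"]
        # remember how many instances of each mdev type are still available
        result[dev["id"]] = {t["type"]: t["available"] for t in types}
    return result

print(free_vgpu_types(NODE))

A scheduler could then pick a node whose result still shows a non-zero "available"
count for the mdev type it wants to assign.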


Does that help?

If you have more specific requirements (or I misunderstood you), please
open a bug/feature request at https://bugzilla.proxmox.com

> 
> Thanks

Kind Regards
Dominik




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PVE-User] vGPU scheduling
  2023-05-24 13:47 ` [PVE-User] vGPU scheduling Dominik Csapak
@ 2023-05-25  7:32   ` DERUMIER, Alexandre
  2023-05-25  7:43     ` Dominik Csapak
       [not found]   ` <mailman.193.1684936457.348.pve-user@lists.proxmox.com>
  1 sibling, 1 reply; 7+ messages in thread
From: DERUMIER, Alexandre @ 2023-05-25  7:32 UTC (permalink / raw)
  To: pve-user

Hi Dominik,

Any news about your "add cluster-wide hardware device mapping" patches?

Do you think they'll be merged for Proxmox 8?

I think they could help for this use case.

Le mercredi 24 mai 2023 à 15:47 +0200, Dominik Csapak a écrit :
> On 5/24/23 15:03, Eneko Lacunza via pve-user wrote:
> > Hi,
> 
> Hi,
> 
> 
> > 
> > We're looking to move a PoC in a customer to full-scale production.
> > 
> > Proxmox/Ceph cluster will be for VDI, and some VMs will use vGPU.
> > 
> > I'd like to know if vGPU status is being exposed right now (as of
> > 7.4) for each node through API, as 
> > it is done for RAM/CPU, and if not, about any plans to implement
> > that so that a scheduler (in our 
> > case that should be UDS Enterprise VDI manager) can choose a node
> > with free vGPUs to deploy VDIs.
> 
> what exactly do you mean with vGPU status?
> 
> there currently is no api to see which pci devices are in use of a vm
> (though that could be done per node, not really for mediated devices
> though)
> 
> there is the /nodes/NODENAME/hardware/pci api call which shows what
> devices exist
> and if they have mdev (mediated device) capability (e.g. NVIDIA GRID
> vGPU)
> 
> for those cards there also exists the api call
> 
> /nodes/NODENAME/hardware/pci/PCIID/mdev
> 
> which gives a list of mdev types and how many are available of them
> 
> 
> does that help?
> 
> if you have more specific requirements (or i misunderstood you),
> please
> open a bug/feature request on
> https://bugzilla.proxmox.com
> 
> > 
> > Thanks
> 
> Kind Regards
> Dominik
> 
> 
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PVE-User] vGPU scheduling
  2023-05-25  7:32   ` DERUMIER, Alexandre
@ 2023-05-25  7:43     ` Dominik Csapak
  2023-05-25  9:03       ` Thomas Lamprecht
                         ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dominik Csapak @ 2023-05-25  7:43 UTC (permalink / raw)
  To: pve-user

On 5/25/23 09:32, DERUMIER, Alexandre wrote:
> Hi Dominik,
> 
> any news about your patches "add cluster-wide hardware device mapping"

I'm currently working on a new version of this; the first part was my recent series
for the section config/API array support.

I think I can send the new version of the backend this week.

> ?
> 
> Do you think it'll be merged for proxmox 8 ?

I don't know, but this also depends on my colleagues' capacity to review ;)

> 
> I think it could help for this usecase.
> 

Yes, I think so too, but it was not directly connected to the request, so I did not mention it.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PVE-User] vGPU scheduling
       [not found]         ` <mailman.252.1685000179.348.pve-user@lists.proxmox.com>
@ 2023-05-25  7:53           ` Dominik Csapak
  0 siblings, 0 replies; 7+ messages in thread
From: Dominik Csapak @ 2023-05-25  7:53 UTC (permalink / raw)
  To: pve-user

On 5/25/23 09:36, Eneko Lacunza via pve-user wrote:
> Hi Dominik,
> 
> 
> El 25/5/23 a las 9:24, Dominik Csapak escribió:
>>
>>>     2.12.0 (qemu-kvm-2.12.0-64.el8.2.27782638)
>>>   * Microsoft Windows Server with Hyper-V 2019 Datacenter edition
>>>   * Red Hat Enterprise Linux Kernel-based Virtual Machine (KVM) 9.0 and 9.1
>>>   * Red Hat Virtualization 4.3
>>>   * Ubuntu Hypervisor 22.04
>>>   * VMware vSphere Hypervisor (ESXi) 7.0.1, 7.0.2, and 7.0.3
>>>
>>> Is there any effort planned or on the way to have Proxmox added to the above list?
>>
>> We'd generally like to be on the supported hypervisor list, but currently
>> none of our efforts to contact NVIDIA regarding this were successful,
>> but i hope we can solve this sometime in the future...
> 
> I can try to report this via customer request to nvidia, where should I refer them to?

You can refer them directly to me (d.csapak@proxmox.com) or to our office mail (office@proxmox.com).
Maybe it helps if the request also comes from the customer side.

> 
>>
>>>
>>> As Ubuntu 22.04 is in it and the Proxmox kernel is derived from it, the technical effort may not 
>>> be so large.
>>
>> Yes, their current Linux KVM package (15.2) should work with our 5.15 kernel,
>> it's what i use here locally to test, e.g. [0]
> 
> We had varying success with 5.15 kernels, some versions work but others do not (refused to work 
> after kernel upgrade and had to pin older kernel). Maybe it would be worth to keep a list of known 
> to work/known not to work kernels?

Normally I run my tests with each update of the 5.15 kernel, and I have not seen
any particular problems there. The only recent change was that we had to adapt how we
clean up the mediated devices for their newer driver versions [0].

Note that I only test the latest supported GRID version (currently 15.2), though.

Regards
Dominik

0: https://git.proxmox.com/?p=qemu-server.git;a=commit;h=49c51a60db7f12d7fe2073b755d18b4d9b628fbd




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PVE-User] vGPU scheduling
  2023-05-25  7:43     ` Dominik Csapak
@ 2023-05-25  9:03       ` Thomas Lamprecht
  2024-05-31  7:59       ` Eneko Lacunza via pve-user
       [not found]       ` <cbb9c11d-5b8e-4f1a-98dd-b3e1cf4be45c@binovo.es>
  2 siblings, 0 replies; 7+ messages in thread
From: Thomas Lamprecht @ 2023-05-25  9:03 UTC (permalink / raw)
  To: Proxmox VE user list, Dominik Csapak, Alexandre Derumier

Am 25/05/2023 um 09:43 schrieb Dominik Csapak:
>>
>> Do you think it'll be merged for proxmox 8 ?
> 
> i don't know, but this also depends on the capacity of my colleagues to review 😉

Making it easy to digest and adding (good) tests will surely help to
accelerate this ;-P

But you're right, of course, and to be honest, while I'll try hard to get the access control
and some other foundations in, especially those where we can profit from the higher
freedom/flexibility of a major release, I cannot say for certain that the actual HW
mapping will make it into the initial 8.0 release.

For an initial major release I prefer having a bit fewer features and focusing more on
keeping the existing features working and on providing a stable, well-tested upgrade path.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PVE-User] vGPU scheduling
  2023-05-25  7:43     ` Dominik Csapak
  2023-05-25  9:03       ` Thomas Lamprecht
@ 2024-05-31  7:59       ` Eneko Lacunza via pve-user
       [not found]       ` <cbb9c11d-5b8e-4f1a-98dd-b3e1cf4be45c@binovo.es>
  2 siblings, 0 replies; 7+ messages in thread
From: Eneko Lacunza via pve-user @ 2024-05-31  7:59 UTC (permalink / raw)
  To: pve-user; +Cc: Eneko Lacunza

[-- Attachment #1: Type: message/rfc822, Size: 6471 bytes --]

From: Eneko Lacunza <elacunza@binovo.es>
To: pve-user@lists.proxmox.com
Subject: Re: [PVE-User] vGPU scheduling
Date: Fri, 31 May 2024 09:59:14 +0200
Message-ID: <cbb9c11d-5b8e-4f1a-98dd-b3e1cf4be45c@binovo.es>


Hi Dominik,

Do you have any expected timeline/version for this to be merged?

Thanks

El 25/5/23 a las 9:43, Dominik Csapak escribió:
> On 5/25/23 09:32, DERUMIER, Alexandre wrote:
>> Hi Dominik,
>>
>> any news about your patches "add cluster-wide hardware device mapping"
>
> i'm currently on a new version of this
> first part was my recent series for the section config/api array support
>
> i think i can send the new version for the backend this week
>
>> ?
>>
>> Do you think it'll be merged for proxmox 8 ?
>
> i don't know, but this also depends on the capacity of my colleagues 
> to review ;)
>
>>
>> I think it could help for this usecase.
>>
>
> yes i think so too, but it was not directly connected to the request so
> i did not mention it
>
>
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PVE-User] vGPU scheduling
       [not found]       ` <cbb9c11d-5b8e-4f1a-98dd-b3e1cf4be45c@binovo.es>
@ 2024-05-31  8:08         ` Dominik Csapak
  0 siblings, 0 replies; 7+ messages in thread
From: Dominik Csapak @ 2024-05-31  8:08 UTC (permalink / raw)
  To: Eneko Lacunza, pve-user

On 5/31/24 09:59, Eneko Lacunza wrote:
> 
> Hi Dominik,
> 
> Do you have any expected timeline/version for this to be merged?

The cluster-wide device mapping was already merged last year and is included in PVE 8.0
and onwards. Or do you mean something different?
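
For reference, a short Python sketch of reading those mappings via the API; the
/cluster/mapping/pci path exists in PVE 8, but the host, token and the response layout
shown here are placeholders and assumptions:

# Rough sketch: list the cluster-wide PCI resource mappings in PVE 8.x.
# HOST and the token are placeholders; the entry layout is an assumption.
import requests

HOST = "https://pve1.example.com:8006"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!sched=00000000-0000-0000-0000-000000000000"}

mappings = requests.get(f"{HOST}/api2/json/cluster/mapping/pci",
                        headers=HEADERS, verify=False).json()["data"]
for m in mappings:
    # each entry describes one named mapping and its per-node device entries
    print(m.get("id"), m)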

> 
> Thanks
> 
> El 25/5/23 a las 9:43, Dominik Csapak escribió:
>> On 5/25/23 09:32, DERUMIER, Alexandre wrote:
>>> Hi Dominik,
>>>
>>> any news about your patches "add cluster-wide hardware device mapping"
>>
>> i'm currently on a new version of this
>> first part was my recent series for the section config/api array support
>>
>> i think i can send the new version for the backend this week
>>
>>> ?
>>>
>>> Do you think it'll be merged for proxmox 8 ?
>>
>> i don't know, but this also depends on the capacity of my colleagues to review ;)
>>
>>>
>>> I think it could help for this usecase.
>>>
>>
>> yes i think so too, but it was not directly connected to the request so
>> i did not mention it
>>
>>
>> _______________________________________________
>> pve-user mailing list
>> pve-user@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>
> 
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
> 
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> 
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
> 
> 




^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-05-31  8:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <mailman.190.1684933430.348.pve-user@lists.proxmox.com>
2023-05-24 13:47 ` [PVE-User] vGPU scheduling Dominik Csapak
2023-05-25  7:32   ` DERUMIER, Alexandre
2023-05-25  7:43     ` Dominik Csapak
2023-05-25  9:03       ` Thomas Lamprecht
2024-05-31  7:59       ` Eneko Lacunza via pve-user
     [not found]       ` <cbb9c11d-5b8e-4f1a-98dd-b3e1cf4be45c@binovo.es>
2024-05-31  8:08         ` Dominik Csapak
     [not found]   ` <mailman.193.1684936457.348.pve-user@lists.proxmox.com>
     [not found]     ` <mailman.222.1684948008.348.pve-user@lists.proxmox.com>
     [not found]       ` <0e48cb4a-7fa0-2bd7-9d4e-f18ab8e03d20@proxmox.com>
     [not found]         ` <mailman.252.1685000179.348.pve-user@lists.proxmox.com>
2023-05-25  7:53           ` Dominik Csapak
