* Re: [PVE-User] vGPU scheduling
       [not found] <mailman.190.1684933430.348.pve-user@lists.proxmox.com>
@ 2023-05-24 13:47 ` Dominik Csapak
  2023-05-25  7:32   ` DERUMIER, Alexandre
       [not found]   ` <mailman.193.1684936457.348.pve-user@lists.proxmox.com>
  0 siblings, 2 replies; 7+ messages in thread
From: Dominik Csapak @ 2023-05-24 13:47 UTC (permalink / raw)
  To: pve-user

On 5/24/23 15:03, Eneko Lacunza via pve-user wrote:
> Hi,

Hi,


> 
> We're looking to move a PoC at a customer to full-scale production.
> 
> Proxmox/Ceph cluster will be for VDI, and some VMs will use vGPU.
> 
> I'd like to know if vGPU status is exposed right now (as of 7.4) for each node through the API, as 
> it is done for RAM/CPU, and if not, whether there are plans to implement that so that a scheduler (in our 
> case the UDS Enterprise VDI manager) can choose a node with free vGPUs to deploy VDIs.

what exactly do you mean by vGPU status?

there currently is no api call to see which pci devices are in use by a vm
(that could be done per node, though not really for mediated devices)

there is the /nodes/NODENAME/hardware/pci api call, which shows which devices exist
and whether they have mdev (mediated device) capability (e.g. NVIDIA GRID vGPU)

for those cards there is also the api call

/nodes/NODENAME/hardware/pci/PCIID/mdev

which gives a list of mdev types and how many of each are available
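
a minimal python sketch of how an external scheduler (like the one you mention)
could combine those two calls to count free vgpu (mdev) instances per node;
untested, and the field names ("mdev", "id", "available") are my reading of the
api viewer, so treat them as assumptions rather than a reference:

import requests

HOST = "https://pve.example.com:8006"          # placeholder host, self-signed cert assumed
HEADERS = {"Authorization": "PVEAPIToken=user@pam!sched=<token-secret>"}

def free_mdevs_per_node():
    """Return {node name: number of currently available mdev instances}."""
    counts = {}
    nodes = requests.get(f"{HOST}/api2/json/nodes",
                         headers=HEADERS, verify=False).json()["data"]
    for node in nodes:
        name = node["node"]
        # all pci devices on the node, including whether they support mediated devices
        pci = requests.get(f"{HOST}/api2/json/nodes/{name}/hardware/pci",
                           headers=HEADERS, verify=False).json()["data"]
        total = 0
        for dev in pci:
            if not dev.get("mdev"):            # skip devices without mdev capability
                continue
            # per-device list of mdev types with how many of each are still available
            types = requests.get(
                f"{HOST}/api2/json/nodes/{name}/hardware/pci/{dev['id']}/mdev",
                headers=HEADERS, verify=False).json()["data"]
            total += sum(t.get("available", 0) for t in types)
        counts[name] = total
    return counts

print(free_mdevs_per_node())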


does that help?

if you have more specific requirements (or i misunderstood you), please
open a bug/feature request on https://bugzilla.proxmox.com

> 
> Thanks

Kind Regards
Dominik





* Re: [PVE-User] vGPU scheduling
  2023-05-24 13:47 ` [PVE-User] vGPU scheduling Dominik Csapak
@ 2023-05-25  7:32   ` DERUMIER, Alexandre
  2023-05-25  7:43     ` Dominik Csapak
       [not found]   ` <mailman.193.1684936457.348.pve-user@lists.proxmox.com>
  1 sibling, 1 reply; 7+ messages in thread
From: DERUMIER, Alexandre @ 2023-05-25  7:32 UTC (permalink / raw)
  To: pve-user

Hi Dominik,

any news about your patches "add cluster-wide hardware device mapping"?

Do you think it'll be merged for proxmox 8?

I think it could help for this use case.

On Wednesday, 24 May 2023 at 15:47 +0200, Dominik Csapak wrote:
> On 5/24/23 15:03, Eneko Lacunza via pve-user wrote:
> > Hi,
> 
> Hi,
> 
> 
> > 
> > We're looking to move a PoC in a customer to full-scale production.
> > 
> > Proxmox/Ceph cluster will be for VDI, and some VMs will use vGPU.
> > 
> > I'd like to know if vGPU status is being exposed right now (as of
> > 7.4) for each node through API, as 
> > it is done for RAM/CPU, and if not, about any plans to implement
> > that so that a scheduler (in our 
> > case that should be UDS Enterprise VDI manager) can choose a node
> > with free vGPUs to deploy VDIs.
> 
> what exactly do you mean with vGPU status?
> 
> there currently is no api to see which pci devices are in use of a vm
> (though that could be done per node, not really for mediated devices
> though)
> 
> there is the /nodes/NODENAME/hardware/pci api call which shows what
> devices exist
> and if they have mdev (mediated device) capability (e.g. NVIDIA GRID
> vGPU)
> 
> for those cards there also exists the api call
> 
> /nodes/NODENAME/hardware/pci/PCIID/mdev
> 
> which gives a list of mdev types and how many are available of them
> 
> 
> does that help?
> 
> if you have more specific requirements (or i misunderstood you),
> please
> open a bug/feature request on
> https://bugzilla.proxmox.com
> 
> > 
> > Thanks
> 
> Kind Regards
> Dominik
> 
> 
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> 



* Re: [PVE-User] vGPU scheduling
  2023-05-25  7:32   ` DERUMIER, Alexandre
@ 2023-05-25  7:43     ` Dominik Csapak
  2023-05-25  9:03       ` Thomas Lamprecht
                         ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dominik Csapak @ 2023-05-25  7:43 UTC (permalink / raw)
  To: pve-user

On 5/25/23 09:32, DERUMIER, Alexandre wrote:
> Hi Dominik,
> 
> any news about your patches "add cluster-wide hardware device mapping"

i'm currently working on a new version of this;
the first part was my recent series for the section config/api array support

i think i can send the new version of the backend series this week

> ?
> 
> Do you think it'll be merged for proxmox 8 ?

i don't know, but this also depends on the capacity of my colleagues to review ;)

> 
> I think it could help for this usecase.
> 

yes i think so too, but it was not directly connected to the request so
i did not mention it





* Re: [PVE-User] vGPU scheduling
       [not found]         ` <mailman.252.1685000179.348.pve-user@lists.proxmox.com>
@ 2023-05-25  7:53           ` Dominik Csapak
  0 siblings, 0 replies; 7+ messages in thread
From: Dominik Csapak @ 2023-05-25  7:53 UTC (permalink / raw)
  To: pve-user

On 5/25/23 09:36, Eneko Lacunza via pve-user wrote:
> Hi Dominik,
> 
> 
> On 25/5/23 at 9:24, Dominik Csapak wrote:
>>
>>>     2.12.0 (qemu-kvm-2.12.0-64.el8.2.27782638)
>>>   * Microsoft Windows Server with Hyper-V 2019 Datacenter edition
>>>   * Red Hat Enterprise Linux Kernel-based Virtual Machine (KVM) 9.0 and 9.1
>>>   * Red Hat Virtualization 4.3
>>>   * Ubuntu Hypervisor 22.04
>>>   * VMware vSphere Hypervisor (ESXi) 7.0.1, 7.0.2, and 7.0.3
>>>
>>> Is there any effort planned or on the way to have Proxmox added to the above list?
>>
>> We'd generally like to be on the supported hypervisor list, but so far
>> none of our efforts to contact NVIDIA regarding this have been successful;
>> i hope we can solve this sometime in the future...
> 
> I can try to report this via a customer request to NVIDIA; where should I refer them to?

You can refer them directly to me (d.csapak@proxmox.com) or to our office mail (office@proxmox.com).
Maybe it helps if the request also comes from the customer side.

> 
>>
>>>
>>> As Ubuntu 22.04 is in it and the Proxmox kernel is derived from it, the technical effort may not 
>>> be so large.
>>
>> Yes, their current Linux KVM package (15.2) should work with our 5.15 kernel;
>> it's what i use here locally for testing, see e.g. [0]
> 
> We had varying success with 5.15 kernels: some versions work but others do not (vGPU refused to work 
> after a kernel upgrade and we had to pin an older kernel). Maybe it would be worth keeping a list of 
> kernels known to work / known not to work?

Normally i run my tests with each update of the 5.15 kernel and i did not see
any particular problems there. The only recent thing was that we had to change how we
clean up the mediated devices for their newer driver versions [0].

Note that i only test the latest supported GRID version though (currently 15.2)

Regards
Dominik

0: https://git.proxmox.com/?p=qemu-server.git;a=commit;h=49c51a60db7f12d7fe2073b755d18b4d9b628fbd





* Re: [PVE-User] vGPU scheduling
  2023-05-25  7:43     ` Dominik Csapak
@ 2023-05-25  9:03       ` Thomas Lamprecht
  2024-05-31  7:59       ` Eneko Lacunza via pve-user
       [not found]       ` <cbb9c11d-5b8e-4f1a-98dd-b3e1cf4be45c@binovo.es>
  2 siblings, 0 replies; 7+ messages in thread
From: Thomas Lamprecht @ 2023-05-25  9:03 UTC (permalink / raw)
  To: Proxmox VE user list, Dominik Csapak, Alexandre Derumier

On 25/05/2023 at 09:43, Dominik Csapak wrote:
>>
>> Do you think it'll be merged for proxmox 8 ?
> 
> i don't know, but this also depends on the capacity of my colleagues to review 😉

making it easy to digest and adding (good) tests will surely help to
accelerate this ;-P

But you're naturally right, and tbh, while I'll try hard to get the access control
and some other foundations in, especially those where we can profit from the higher
freedom/flexibility of a major release, I cannot say for certain that the actual HW
mapping will make it into the initial 8.0.

For the initial major release I prefer having a bit fewer features and focusing more on
making sure the existing features keep working and that there's a stable and well-tested upgrade path.





* Re: [PVE-User] vGPU scheduling
  2023-05-25  7:43     ` Dominik Csapak
  2023-05-25  9:03       ` Thomas Lamprecht
@ 2024-05-31  7:59       ` Eneko Lacunza via pve-user
       [not found]       ` <cbb9c11d-5b8e-4f1a-98dd-b3e1cf4be45c@binovo.es>
  2 siblings, 0 replies; 7+ messages in thread
From: Eneko Lacunza via pve-user @ 2024-05-31  7:59 UTC (permalink / raw)
  To: pve-user; +Cc: Eneko Lacunza


From: Eneko Lacunza <elacunza@binovo.es>
To: pve-user@lists.proxmox.com
Subject: Re: [PVE-User] vGPU scheduling
Date: Fri, 31 May 2024 09:59:14 +0200
Message-ID: <cbb9c11d-5b8e-4f1a-98dd-b3e1cf4be45c@binovo.es>


Hi Dominik,

Do you have any expected timeline/version for this to be merged?

Thanks

On 25/5/23 at 9:43, Dominik Csapak wrote:
> On 5/25/23 09:32, DERUMIER, Alexandre wrote:
>> Hi Dominik,
>>
>> any news about your patches "add cluster-wide hardware device mapping"
>
> i'm currently on a new version of this
> first part was my recent series for the section config/api array support
>
> i think i can send the new version for the backend this week
>
>> ?
>>
>> Do you think it'll be merged for proxmox 8 ?
>
> i don't know, but this also depends on the capacity of my colleagues 
> to review ;)
>
>>
>> I think it could help for this usecase.
>>
>
> yes i think so too, but it was not directly connected to the request so
> i did not mention it
>
>
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/





* Re: [PVE-User] vGPU scheduling
       [not found]       ` <cbb9c11d-5b8e-4f1a-98dd-b3e1cf4be45c@binovo.es>
@ 2024-05-31  8:08         ` Dominik Csapak
  0 siblings, 0 replies; 7+ messages in thread
From: Dominik Csapak @ 2024-05-31  8:08 UTC (permalink / raw)
  To: Eneko Lacunza, pve-user

On 5/31/24 09:59, Eneko Lacunza wrote:
> 
> Hi Dominik,
> 
> Do you have any expected timeline/version for this to be merged?

the cluster-wide device mapping was already merged last year and is included in pve 8.0 and
onwards. or do you mean something different?
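
in case it helps for the scheduling side, a rough, untested python sketch of listing
those mappings over the api; the endpoint name /cluster/mapping/pci and the field
names are my assumption from the 8.x api viewer, so double-check against your version:

import requests

HOST = "https://pve.example.com:8006"        # placeholder host
HEADERS = {"Authorization": "PVEAPIToken=user@pam!ro=<token-secret>"}

resp = requests.get(f"{HOST}/api2/json/cluster/mapping/pci",
                    headers=HEADERS, verify=False)
for mapping in resp.json()["data"]:
    # each mapping groups per-node pci devices under one cluster-wide id,
    # which a vm config can reference instead of a raw pci address
    print(mapping.get("id"), mapping.get("description", ""))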

> 
> Thanks
> 
> On 25/5/23 at 9:43, Dominik Csapak wrote:
>> On 5/25/23 09:32, DERUMIER, Alexandre wrote:
>>> Hi Dominik,
>>>
>>> any news about your patches "add cluster-wide hardware device mapping"
>>
>> i'm currently on a new version of this
>> first part was my recent series for the section config/api array support
>>
>> i think i can send the new version for the backend this week
>>
>>> ?
>>>
>>> Do you think it'll be merged for proxmox 8 ?
>>
>> i don't know, but this also depends on the capacity of my colleagues to review ;)
>>
>>>
>>> I think it could help for this usecase.
>>>
>>
>> yes i think so too, but it was not directly connected to the request so
>> i did not mention it
>>
>>
>> _______________________________________________
>> pve-user mailing list
>> pve-user@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>
> 
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
> 
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> 
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
> 
> 





end of thread

Thread overview: 7+ messages
     [not found] <mailman.190.1684933430.348.pve-user@lists.proxmox.com>
2023-05-24 13:47 ` [PVE-User] vGPU scheduling Dominik Csapak
2023-05-25  7:32   ` DERUMIER, Alexandre
2023-05-25  7:43     ` Dominik Csapak
2023-05-25  9:03       ` Thomas Lamprecht
2024-05-31  7:59       ` Eneko Lacunza via pve-user
     [not found]       ` <cbb9c11d-5b8e-4f1a-98dd-b3e1cf4be45c@binovo.es>
2024-05-31  8:08         ` Dominik Csapak
     [not found]   ` <mailman.193.1684936457.348.pve-user@lists.proxmox.com>
     [not found]     ` <mailman.222.1684948008.348.pve-user@lists.proxmox.com>
     [not found]       ` <0e48cb4a-7fa0-2bd7-9d4e-f18ab8e03d20@proxmox.com>
     [not found]         ` <mailman.252.1685000179.348.pve-user@lists.proxmox.com>
2023-05-25  7:53           ` Dominik Csapak
