From: Christoph Heiss <c.heiss@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH common/qemu-server/manager] adapt to nvidia vgpu api changes
Date: Thu, 17 Oct 2024 12:16:30 +0200
Message-ID: <ul3faxwxsi45hk4e44cl2fw4jomoncy53owx3you2wuvo7pymf@dchu53bqfa3p>
In-Reply-To: <20240806122203.2266054-1-d.csapak@proxmox.com>
Tested this entire series (plus the one prerequisite patch) using an RTX
A5000. Everything applied cleanly on the latest master of each
respective repo. For reference, I tested on the (latest) kernel,
6.8.12-2-pve.

vGPU datacenter resource mapping and VM PCI device setup worked as
expected once I had the drivers etc. properly set up.

Verified the available vGPU PCI devices both on the host and in the VM,
including with `nvidia-smi`. I also ran some CUDA workloads in the VM to
make sure everything was properly set up.

The only "regression" I noticed is that the NVIDIA devices now have an
empty description in the mdev device list, but that is not critical and
can be improved in the future, as also duly noted in the code.

I also looked through the code, and it looks good IMO. Having to
special-case "normal" mdevs and NVIDIA is rather unfortunate, but (as I
also discussed off-list with Dominik) implementing it as a plugin system
would be far more work than is justifiable in this case.

So please consider the entire series:

Tested-by: Christoph Heiss <c.heiss@proxmox.com>
Reviewed-by: Christoph Heiss <c.heiss@proxmox.com>

On Tue, Aug 06, 2024 at 02:21:57PM GMT, Dominik Csapak wrote:
> For many new cards, nvidia changed the kernel interface starting with
> kernel version 6.8. Instead of using mediated devices, they provide
> their own API.
>
> This series adapts to that without requiring any change to the VM
> config, and with only minimal changes to our API.
>
> The biggest change is that the mdev types can now be queried on
> /nodes/NODE/hardware/pci/<pciid-or-mapping>/mdev, either via a PCI id
> (as before) or via the name of a PCI mapping (in which case all local
> devices from that mapping are checked).
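
For illustration, a minimal Python sketch of that lookup (not the actual
Perl from the manager patch). It assumes a PCI id is recognised by its
bus:device[.function] shape and that mapping names never look like one;
`resolve_mdev_targets` and the layout of `mappings` are hypothetical:

```python
import re

# Assumed PCI id shapes: "0000:01:00.0", "01:00.0" or "01:00" (no function).
PCI_ID_RE = re.compile(
    r"^(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}(?:\.[0-7])?$"
)


def resolve_mdev_targets(param, node, mappings):
    """Return the local PCI ids whose mdev types should be listed.

    `mappings` is a hypothetical dict: mapping name -> list of
    {"node": <node name>, "path": <PCI id>} entries.
    """
    if PCI_ID_RE.match(param):
        return [param]  # plain PCI id: behave as before
    if param not in mappings:
        raise KeyError(f"no PCI mapping named {param!r}")
    # A mapping may span several nodes; only this node's devices matter.
    return [e["path"] for e in mappings[param] if e["node"] == node]
```

Checking the PCI id pattern first keeps the old behaviour for existing
callers; it only works as long as mapping names can never match it.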
>
> One possible improvement would be to parse the available vGPU types
> from nvidia-smi instead of sysfs, since sysfs does not always contain
> all types (see common patch 1/2 for details).
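
For reference, reading the type list from sysfs could look roughly like
the sketch below. It assumes the driver exposes the list in a file such
as /sys/bus/pci/devices/<id>/nvidia/creatable_vgpu_types, formatted as a
header line followed by "<id> : <name>" lines; both the path and the
format are assumptions here, and `parse_creatable_vgpu_types` is a
hypothetical helper. Any type missing from that file is necessarily
missing from the result too, which is the limitation described above.

```python
def parse_creatable_vgpu_types(text):
    """Turn '<id> : <name>' lines into a list of (id, name) tuples.

    Assumed format: an optional header line starting with "ID", then one
    line per creatable vGPU type, e.g. "1003  : <type name>".
    """
    types = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("ID"):
            continue  # skip blank lines and the assumed header
        type_id, sep, name = line.partition(":")
        type_id = type_id.strip()
        if not sep or not type_id.isdigit():
            continue  # ignore anything not matching the assumed layout
        types.append((type_id, name.strip()))
    return types
```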
>
> We could probably abstract the code that deals with the different types
> a bit more, but it seems OK to me for now, and finding a good API for
> that is hard with only three modes that are very different from each
> other (raw/mdev/nvidia).
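
To illustrate the three modes, a device could be classified by what it
exposes in sysfs. `mdev_supported_types/` is the standard directory of
the kernel's mediated device interface; the `nvidia/creatable_vgpu_types`
file for the new NVIDIA interface is an assumption, and `detect_mode` is
a hypothetical helper, not code from this series:

```python
import os


def detect_mode(sysfs_dev_path):
    """Classify a PCI device as 'mdev', 'nvidia' or 'raw'.

    Assumptions: classic mediated devices expose 'mdev_supported_types/';
    the newer NVIDIA interface is assumed to expose
    'nvidia/creatable_vgpu_types'. Anything else is plain passthrough.
    """
    if os.path.isdir(os.path.join(sysfs_dev_path, "mdev_supported_types")):
        return "mdev"
    nvidia_types = os.path.join(sysfs_dev_path, "nvidia", "creatable_vgpu_types")
    if os.path.isfile(nvidia_types):
        return "nvidia"
    return "raw"
```

Checking for the classic mdev directory first means a card that still
uses mediated devices keeps its current handling, while everything else
falls back to raw passthrough.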
>
> The qemu-server patches depend on the common patches, but the manager
> patch does not depend on any other patch in this series. It is,
> however, required for the user to be able to select types (under
> certain conditions).
>
> Note that this series requires my previous patch to SysFSTools to
> improve write reliability[0]; otherwise, the cleanup or creation may
> fail.
>
> 0: https://lists.proxmox.com/pipermail/pve-devel/2024-July/064814.html
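
For context, a retrying sysfs write might look like the sketch below.
This is a generic pattern and not necessarily what [0] does: treating
EBUSY/EAGAIN as transient, the retry count and the delay are all
assumptions, and `write_sysfs_with_retry` is a hypothetical name.

```python
import errno
import time


def write_sysfs_with_retry(path, value, retries=5, delay=0.1):
    """Write `value` to a sysfs attribute, retrying on transient errors.

    Returns the number of attempts used. Errors other than EBUSY/EAGAIN
    (assumed to be the transient ones) are raised immediately.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            # sysfs performs the write when the buffer is flushed on close,
            # so errors may surface when leaving the with-block.
            with open(path, "w") as f:
                f.write(value)
            return attempt + 1
        except OSError as e:
            if e.errno not in (errno.EBUSY, errno.EAGAIN) or attempt == retries - 1:
                raise
            time.sleep(delay)
```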
>
> pve-common:
>
> Dominik Csapak (2):
>   SysFSTools: handle new nvidia syfsapi as mdev
>   SysFSTools: lscpi: move mdev and iommugroup check outside of verbose
>
>  src/PVE/SysFSTools.pm | 83 ++++++++++++++++++++++++++-----------------
>  1 file changed, 51 insertions(+), 32 deletions(-)
>
> qemu-server:
>
> Dominik Csapak (3):
>   pci: choose devices: don't reserve pciids when vm is already running
>   pci: remove pci reservation: optionally give list of ids to remove
>   pci: mdev: adapt to nvidia interface with kernel >= 6.8
>
>  PVE/QemuServer.pm                | 30 +++++++++--
>  PVE/QemuServer/PCI.pm            | 92 +++++++++++++++++++++++++++++---
>  test/run_config2command_tests.pl |  8 ++-
>  3 files changed, 118 insertions(+), 12 deletions(-)
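
The idea behind qemu-server patch 2/3 can be sketched as follows,
assuming reservations are kept as a map from PCI id to the owning VMID.
That layout and `remove_pci_reservation` are hypothetical; the actual
implementation may well differ:

```python
def remove_pci_reservation(reservations, vmid, ids=None):
    """Drop reservations held by `vmid` from `reservations` (in place).

    `reservations` maps PCI id -> owning vmid (an assumed layout). If `ids`
    is given, only those PCI ids are released; otherwise all of the VM's
    reservations are removed.
    """
    wanted = None if ids is None else set(ids)
    for pciid in list(reservations):  # copy keys: we delete while iterating
        if reservations[pciid] != vmid:
            continue  # never touch another VM's reservation
        if wanted is None or pciid in wanted:
            del reservations[pciid]
    return reservations
```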
>
> pve-manager:
>
> Dominik Csapak (1):
>   api/ui: improve mdev listing for pci mappings
>
>  PVE/API2/Hardware/PCI.pm     | 45 +++++++++++++++++++++++++++++-------
>  www/manager6/qemu/PCIEdit.js | 12 +---------
>  2 files changed, 38 insertions(+), 19 deletions(-)
>
> --
> 2.39.2
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>
Thread overview: 12+ messages
2024-08-06 12:21 Dominik Csapak
2024-08-06 12:21 ` [pve-devel] [PATCH common 1/2] SysFSTools: handle new nvidia syfsapi as mdev Dominik Csapak
2024-08-06 12:21 ` [pve-devel] [PATCH common 2/2] SysFSTools: lscpi: move mdev and iommugroup check outside of verbose Dominik Csapak
2024-08-06 12:22 ` [pve-devel] [PATCH qemu-server 1/3] pci: choose devices: don't reserve pciids when vm is already running Dominik Csapak
2024-08-06 12:22 ` [pve-devel] [PATCH qemu-server 2/3] pci: remove pci reservation: optionally give list of ids to remove Dominik Csapak
2024-08-06 12:22 ` [pve-devel] [PATCH qemu-server 3/3] pci: mdev: adapt to nvidia interface with kernel >= 6.8 Dominik Csapak
2024-08-06 12:22 ` [pve-devel] [PATCH manager 1/1] api/ui: improve mdev listing for pci mappings Dominik Csapak
2024-10-24 16:53   ` Thomas Lamprecht
2024-10-30  8:42     ` Dominik Csapak
2024-10-30  8:46       ` Thomas Lamprecht
2024-10-17 10:16 ` Christoph Heiss [this message]
2024-10-24 17:06 ` [pve-devel] partially-applied: [PATCH common/qemu-server/manager] adapt to nvidia vgpu api changes Thomas Lamprecht