From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 17 Oct 2024 12:16:30 +0200
From: Christoph Heiss
To: Dominik Csapak
References: <20240806122203.2266054-1-d.csapak@proxmox.com>
In-Reply-To: <20240806122203.2266054-1-d.csapak@proxmox.com>
Subject: Re: [pve-devel] [PATCH common/qemu-server/manager] adapt to nvidia vgpu api changes

Tested this entire series (+ the one prerequisite patch) using an RTX A5000. Everything applied cleanly on latest master of each respective repo. Tested on (latest) kernel 6.8.12-2-pve for reference.

vGPU datacenter resource mapping and VM PCI device setup worked as it should, once I had the drivers & co. properly set up. Verified the available vGPU PCI devices on the host as well as in the VM, including via `nvidia-smi`. Further, I ran some CUDA workloads in the VM to ensure everything is properly set up.

The only "regression" I've noticed is that the Nvidia devices now have an empty description in the mdev device list, but that's not critical and can be improved in the future - as also duly noted in the code.

Also looked through the code, looks good IMO. Having to special-case "normal" mdevs vs. Nvidia devices is rather unfortunate, but - having also talked off-list a bit with Dominik about it - implementing it as a plugin system would be way more work than justifiable in this case.

So please consider the entire series:

Tested-by: Christoph Heiss
Reviewed-by: Christoph Heiss

On Tue, Aug 06, 2024 at 02:21:57PM GMT, Dominik Csapak wrote:
> For many new cards, Nvidia changed the kernel interface since kernel
> version 6.8. Instead of using mediated devices, they provide their own
> API.
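As a side note for anyone else wanting to verify this on their own setup: the distinction between the modes can be checked directly in sysfs. This is only a sketch of my reading of the patches - the `nvidia/` subdirectory name for the new vendor interface is an assumption on my part, while `mdev_supported_types` is the standard vfio-mdev ABI:

```shell
# Sketch only: how the three modes (raw/mdev/nvidia) might be told apart.
# The "nvidia" subdirectory name is an assumption; "mdev_supported_types"
# is the standard vfio-mdev sysfs ABI. SYSFS_ROOT is overridable so the
# logic can be tried against a fake tree without vGPU-capable hardware.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

vgpu_mode() {
    dev="$SYSFS_ROOT/bus/pci/devices/$1"
    if [ -d "$dev/nvidia" ]; then
        echo nvidia        # new vendor API (kernel >= 6.8, recent drivers)
    elif [ -d "$dev/mdev_supported_types" ]; then
        echo mdev          # classic mediated-device interface
    else
        echo raw           # plain passthrough, no vGPU types
    fi
}

vgpu_mode "${1:-0000:01:00.0}"   # example address, adjust to your card
```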
>
> This series adapts to that, with no required change to the VM config,
> and only minimal changes to our API.
>
> The biggest change is that the mdev types can now be queried on
> /nodes/NODE/hardware/pci/ (like it was before) or via the name of a
> PCI mapping (which now checks all local devices from that mapping).
>
> A thing to improve could be to parse the available vGPU types from
> nvidia-smi instead of the sysfs, since that does not always contain
> all types (see the common patch 1/2 for details).
>
> We could probably abstract the code that deals with the different types
> a bit more, but for me it seems OK for now, and finding a good API for
> that is hard with only 3 modes that are very different from each other
> (raw/mdev/nvidia).
>
> The qemu-server patches depend on the common patches, but the manager
> patch does not rely on any other in this series. It is required, though,
> for the user to be able to select types (in certain conditions).
>
> Note that this series requires my previous patch to the sysfstools to
> improve write reliability [0], otherwise the cleanup or creation may
> fail.
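Regarding parsing types from sysfs vs. nvidia-smi above: for reference, this is roughly what the classic mdev side exposes per device, following the standard vfio-mdev sysfs layout. The device address and the type name used here are purely illustrative examples, not actual Nvidia identifiers:

```shell
# Sketch of the classic mdev type listing in sysfs (standard vfio-mdev
# ABI). Each type directory carries a human-readable name and a count of
# instances that can still be created. SYSFS_ROOT is overridable so this
# can be exercised against a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

list_mdev_types() {
    base="$SYSFS_ROOT/bus/pci/devices/$1/mdev_supported_types"
    [ -d "$base" ] || return 0
    for t in "$base"/*/; do
        [ -d "$t" ] || continue
        printf '%s\t%s\t%s\n' \
            "$(basename "$t")" \
            "$(cat "$t/name" 2>/dev/null)" \
            "$(cat "$t/available_instances" 2>/dev/null)"
    done
}

list_mdev_types "${1:-0000:01:00.0}"   # example address
```

As the cover letter notes, this listing may miss types that the driver itself (via nvidia-smi) would report, which is why parsing nvidia-smi could be the more complete source.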
>
> 0: https://lists.proxmox.com/pipermail/pve-devel/2024-July/064814.html
>
> pve-common:
>
> Dominik Csapak (2):
>   SysFSTools: handle new nvidia syfsapi as mdev
>   SysFSTools: lscpi: move mdev and iommugroup check outside of verbose
>
>  src/PVE/SysFSTools.pm | 83 ++++++++++++++++++++++++++-----------------
>  1 file changed, 51 insertions(+), 32 deletions(-)
>
> qemu-server:
>
> Dominik Csapak (3):
>   pci: choose devices: don't reserve pciids when vm is already running
>   pci: remove pci reservation: optionally give list of ids to remove
>   pci: mdev: adapt to nvidia interface with kernel >= 6.8
>
>  PVE/QemuServer.pm                | 30 +++++++++--
>  PVE/QemuServer/PCI.pm            | 92 +++++++++++++++++++++++++++++---
>  test/run_config2command_tests.pl |  8 ++-
>  3 files changed, 118 insertions(+), 12 deletions(-)
>
> pve-manager:
>
> Dominik Csapak (1):
>   api/ui: improve mdev listing for pci mappings
>
>  PVE/API2/Hardware/PCI.pm     | 45 +++++++++++++++++++++++++++++-------
>  www/manager6/qemu/PCIEdit.js | 12 +---------
>  2 files changed, 38 insertions(+), 19 deletions(-)
>
> --
> 2.39.2
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel