From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id CDF441FF136
	for <inbox@lore.proxmox.com>; Mon, 23 Mar 2026 12:02:00 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 06171127AE;
	Mon, 23 Mar 2026 12:02:20 +0100 (CET)
Message-ID: <1df04afb-7dcd-4576-bc78-b36cfbe50a92@proxmox.com>
Date: Mon, 23 Mar 2026 12:01:44 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird Beta
Subject: Re: [PATCH proxmox-perl-rs 1/1] pve: add binding for accessing vgpu
 info
To: Christoph Heiss <c.heiss@proxmox.com>
References: <20260305091711.1221589-1-d.csapak@proxmox.com>
 <20260305091711.1221589-10-d.csapak@proxmox.com>
 <DH6PT65DFX1Q.36ZZKMY6K8LG5@proxmox.com>
Content-Language: en-US
From: Dominik Csapak <d.csapak@proxmox.com>
In-Reply-To: <DH6PT65DFX1Q.36ZZKMY6K8LG5@proxmox.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1774263659126
X-SPAM-LEVEL: Spam detection results:  0
	AWL                     0.041 Adjusted score from AWL reputation of From:
 address
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING             0.1 Missing DMARC policy
	KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict
 Alignment
	SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS               -0.001 SPF: sender matches SPF record
Message-ID-Hash: DWKCMSFGLGF3GQJ6SY7WFTDIETVZYDNR
X-Message-ID-Hash: DWKCMSFGLGF3GQJ6SY7WFTDIETVZYDNR
X-MailFrom: d.csapak@proxmox.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; emergency; member-moderation; nonmember-moderation;
 administrivia; implicit-dest; max-recipients; max-size; news-moderation;
 no-subject; digests; suspicious-header
CC: pve-devel@lists.proxmox.com
X-Mailman-Version: 3.3.10
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Owner: <mailto:pve-devel-owner@lists.proxmox.com>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Subscribe: <mailto:pve-devel-join@lists.proxmox.com>
List-Unsubscribe: <mailto:pve-devel-leave@lists.proxmox.com>


On 3/19/26 12:16 PM, Christoph Heiss wrote:
> Two comments inline.
> 
> Other than that, please consider it:
> 
> Reviewed-by: Christoph Heiss <c.heiss@proxmox.com>
> 
> On Thu Mar 5, 2026 at 10:16 AM CET, Dominik Csapak wrote:
> [..]
>> diff --git a/pve-rs/Cargo.toml b/pve-rs/Cargo.toml
>> index 45389b5..3b6c2fc 100644
>> --- a/pve-rs/Cargo.toml
>> +++ b/pve-rs/Cargo.toml
>> @@ -20,6 +20,7 @@ hex = "0.4"
>>   http = "1"
>>   libc = "0.2"
>>   nix = "0.29"
>> +nvml-wrapper = "0.12"
> 
> Missing the respective entry in d/control.
> 
> [..]
>> diff --git a/pve-rs/src/bindings/nvml.rs b/pve-rs/src/bindings/nvml.rs
>> new file mode 100644
>> index 0000000..0f4c81e
>> --- /dev/null
>> +++ b/pve-rs/src/bindings/nvml.rs
>> @@ -0,0 +1,91 @@
>> +//! Provides access to the state of NVIDIA (v)GPU devices connected to the system.
>> +
>> +#[perlmod::package(name = "PVE::RS::NVML", lib = "pve_rs")]
>> +pub mod pve_rs_nvml {
>> +    //! The `PVE::RS::NVML` package.
>> +    //!
>> +    //! Provides high level helpers to get info from the system with NVML.
>> +
>> +    use anyhow::Result;
>> +    use nvml_wrapper::Nvml;
>> +    use perlmod::Value;
>> +
>> +    /// Retrieves a list of *creatable* vGPU types for the specified GPU by bus id.
>> +    ///
>> +    /// The [`bus_id`] is of format "\<domain\>:\<bus\>:\<device\>.\<function\>",
>> +    /// e.g. "0000:01:01.0".
>> +    ///
>> +    /// # See also
>> +    ///
>> +    /// [`nvmlDeviceGetCreatableVgpus`]: <https://docs.nvidia.com/deploy/nvml-api/group__nvmlVgpu.html#group__nvmlVgpu_1ge86fff933c262740f7a374973c4747b6>
>> +    /// [`nvmlDeviceGetHandleByPciBusId_v2`]: <https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1gea7484bb9eac412c28e8a73842254c05>
>> +    /// [`struct nvmlPciInfo_t`]: <https://docs.nvidia.com/deploy/nvml-api/structnvmlPciInfo__t.html#structnvmlPciInfo__t_1a4d54ad9b596d7cab96ecc34613adbe4>
>> +    #[export]
>> +    fn creatable_vgpu_types_for_dev(bus_id: &str) -> Result<Vec<Value>> {
>> +        let nvml = Nvml::init()?;
> 
> Looking at this, I was wondering how expensive that call is, considering
> this path is triggered from the API. Same for
> supported_vgpu_types_for_dev() below.
> 
> Did some quick & simple benchmarking - on average, `Nvml::init()` took
> ~32ms, with quite some variance; at best ~26ms up to a worst case
> of >150ms.
> 
> IMO nothing worth blocking the series on, as this falls into premature
> optimization territory and can be fixed in the future, if needed.
> 
> Holding an instance in memory might also be problematic on driver
> upgrades? I.e. we keep an old version of the library loaded, and thus
> mismatched API.
> 
> The above results were done with one GPU only though, so potentially
> could be worse on multi-GPU systems.

What we could do is cache the results of these calls, either
here or in Perl (I think it's easier to do on the Perl side).

That way the cost only has to be paid once, and the amount
of data should be in the KBs only.

I think this should work because the available models/devices
can't change while the server is up?
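
For illustration, a rough sketch of what the Rust-side variant could look like. Everything here is made up for the example: `query_vgpu_types` is only a stand-in for the `Nvml::init()` + `device_by_pci_bus_id()` + type lookup, `cached_vgpu_types` and `QUERY_COUNT` are hypothetical names, and the cached value is a plain `Vec<String>` instead of the `Vec<Value>` the binding returns. It also assumes the answer to the question above is "yes", i.e. the data stays valid for the lifetime of the process.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

/// Counts how often the expensive lookup really runs (illustration only).
static QUERY_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the NVML lookup; NOT the real binding code.
fn query_vgpu_types(bus_id: &str) -> Result<Vec<String>, String> {
    QUERY_COUNT.fetch_add(1, Ordering::SeqCst);
    if bus_id.is_empty() {
        return Err("empty bus id".to_string());
    }
    // Placeholder data; the real code would build this from the NVML result.
    Ok(vec![format!("example-type-for-{bus_id}")])
}

/// Process-wide cache keyed by PCI bus id, created lazily on first use.
static CACHE: OnceLock<Mutex<HashMap<String, Vec<String>>>> = OnceLock::new();

/// Returns the cached list for `bus_id`, querying only on a cache miss.
/// Errors are passed through and not cached, so a failed lookup is retried.
fn cached_vgpu_types(bus_id: &str) -> Result<Vec<String>, String> {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut guard = cache.lock().map_err(|e| e.to_string())?;

    if let Some(types) = guard.get(bus_id) {
        return Ok(types.clone());
    }

    let types = query_vgpu_types(bus_id)?;
    guard.insert(bus_id.to_string(), types.clone());
    Ok(types)
}
```

With this, calling `cached_vgpu_types("0000:01:01.0")` twice only runs the lookup once. Since no NVML handle is kept around, only the resulting data, this would presumably also sidestep the driver upgrade concern, at the price of possibly serving stale data until the process restarts.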

> 
>> +        let device = nvml.device_by_pci_bus_id(bus_id)?;
>> +
>> +        build_vgpu_type_list(device.vgpu_creatable_types()?)
>> +    }
>> +
>> +    /// Retrieves a list of *supported* vGPU types for the specified GPU by bus id.
>> +    ///
>> +    /// The [`bus_id`] is of format "\<domain\>:\<bus\>:\<device\>.\<function\>",
>> +    /// e.g. "0000:01:01.0".
>> +    ///
>> +    /// # See also
>> +    ///
>> +    /// [`nvmlDeviceGetSupportedVgpus`]: <https://docs.nvidia.com/deploy/nvml-api/group__nvmlVgpu.html#group__nvmlVgpu_1ge084b87e80350165859500ebec714274>
>> +    /// [`nvmlDeviceGetHandleByPciBusId_v2`]: <https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1gea7484bb9eac412c28e8a73842254c05>
>> +    /// [`struct nvmlPciInfo_t`]: <https://docs.nvidia.com/deploy/nvml-api/structnvmlPciInfo__t.html#structnvmlPciInfo__t_1a4d54ad9b596d7cab96ecc34613adbe4>
>> +    #[export]
>> +    fn supported_vgpu_types_for_dev(bus_id: &str) -> Result<Vec<Value>> {
>> +        let nvml = Nvml::init()?;
>> +        let device = nvml.device_by_pci_bus_id(bus_id)?;
>> +
>> +        build_vgpu_type_list(device.vgpu_supported_types()?)
>> +    }