From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 8CBEC1FF15C for ; Wed, 30 Oct 2024 12:10:12 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 51CAE16186; Wed, 30 Oct 2024 12:10:15 +0100 (CET) Date: Wed, 30 Oct 2024 12:09:41 +0100 From: Christoph Heiss To: Dominik Csapak Message-ID: <6obxtzisy4patvz6yifrtib7xoudrfmrgagv3udr5wssbqbjxq@7krizy3j7ys7> References: <20241028113114.550887-1-c.heiss@proxmox.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-SPAM-LEVEL: Spam detection results: 0 AWL 0.029 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pve-devel] [RFC PATCH common] SysFSTools: mdev: retrieve Nvidia vGPU description from nvidia-smi X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox VE development discussion Cc: pve-devel@lists.proxmox.com Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pve-devel-bounces@lists.proxmox.com Sender: "pve-devel" Thanks for the review! On Wed, Oct 30, 2024 at 11:13:36AM GMT, Dominik Csapak wrote: > Hi, > > thanks for tackling this! > > a few comments inline > > On 10/28/24 12:31, Christoph Heiss wrote: > > [..] > > And FWIW, these properties could also be retrieved without going through > > nvidia-smi using the NVML API directly [0], the same API nvidia-smi uses > > anyway under the hood. > > > > But that would require either using something like e.g. DynaLoader in > > perl [1] or calling it from Rust using e.g. the nvml-wrapper-sys [2] and > > wrapping it using perlmod. > > > > Both ways would be a bit involved of course, but also a lot more > > future-proof than parsing the human-readable output from `nvidia-smi`. > > If preferred I'd be happy to re-write it in some way or another. > > another alternative could be to wrap the few necessary calls in a small > c binary that we can call in place of nvidia-smi. > This tool could then output more structured (e.g. json) output, > though not sure if it's worth that, Sounds like a good idea, TBH. Although that should then be probably be packaged separately from libpve-common-perl, so there is still that "overhead". > since I hope that the nvidia-smi output is relatively stable I'd hope too, but I (personally, at least) wouldn't want to count on it. It is still a proprietary tool after all, so really could be changed any way seen fit w/o notice. > > (we can still change this method should parsing nvidia-smi output > turn out to be too brittle) Using it as a stop-gap measure basically would work for me, so that this isn't blocked unnecessarily long and gives us time to properly implement something like this. > > > [..] > > diff --git a/src/PVE/SysFSTools.pm b/src/PVE/SysFSTools.pm > > index 0bde6d7..fc6282d 100644 > > --- a/src/PVE/SysFSTools.pm > > +++ b/src/PVE/SysFSTools.pm > > @@ -4,8 +4,10 @@ use strict; > > [..] > > + > > + my $command = ['nvidia-smi', 'vgpu', '--creatable', '--verbose']; > > IMO we should not use 'creatable' here but 'available' (or however it's named) FWIW, for reference, would be `--supported`. > AFAIR the 'creatable' are only the ones currently allowed to be created, > not the ones the cards generally support. > > This could become a problem e.g. when a vm is started with some profile, > then 'creatable' will only return models compatible to that, and even if > the vm is stopped after, since we cached the output, will not show the > description of the others) Oh I see, yeah, that makes sense then. I'll change it for v2, thanks! > > Of course we could instead change the caching logic to reparse it, whenever > we don't find the particular id in the cached config, but that > seems too much work if we can just have a list of all 'available' > > > + my $parsefn = sub { > > + my ($line) = @_; > > + return if $line =~ m/^GPU/; > > + > > + my @parts = split(':', $line); > > + return if scalar(@parts) != 2; > > + > > + my ($key, $value) = @parts; > > + > > + $key =~ s/^\s+|\s+$//g; # trim whitespace from start and end > > + $value =~ s/\s+//g; # trim all whitespace > > + $value =~ s/,/-/g; # replace any commas with dashes > > + > > + if ($key eq 'vGPU Type ID') { > > + $cur_id = hex($value); > > + } elsif (defined($generic_propmap->{$key}) && $value ne 'N/A') { > > + $configs->{$cur_id}->{$generic_propmap->{$key}} = $value; > > + } > > + > > + # `nvidia-smi` prints these keys/values in a deterministic order, > > + # so the order they appear in can be relied upon. > > + if ($key eq 'Maximum X Resolution') { > > + $configs->{$cur_id}->{'max-resolution'} = $value; > > + } elsif ($key eq 'Maximum Y Resolution') { > > + $configs->{$cur_id}->{'max-resolution'} .= "x$value"; > > + } > > + }; > > it would be great to have some tests here with some example output, that > way we can compare output if it changes, and it makes reasoning about the > parsing logic a bit easier. Sure, will add some tests with the next revision! > > > + > > + eval { > > + PVE::Tools::run_command($command, outfunc => $parsefn); > > + }; > > + > > + if (my $err = $@) { > > + warn "failed to run nvidia-smi: $err\n"; > > + return undef; > > + } > > + > > + for my $k (keys %$configs) { > > + $configs->{$k} = PVE::JSONSchema::print_property_string($configs->{$k}, $prop_schema); > > + } > > since the schema above is mostly emptly besides defining which keys there are, > is there a specific reason you don't just use something like (untested!): > > join(",", map { $_ ."=". $configs->{$_} } keys $configs->%*); > > ? > that way you could omit the prop_schema entirely No particular reason, other than of not thinking of it :^) I'll rework it, since the property schema is indeed mostly redundant w.r.t to the property map above it. _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel