public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [RFC PATCH common] SysFSTools: mdev: retrieve Nvidia vGPU description from nvidia-smi
@ 2024-10-28 11:31 Christoph Heiss
  2024-10-30 10:13 ` Dominik Csapak
  2024-10-30 12:08 ` Thomas Lamprecht
  0 siblings, 2 replies; 5+ messages in thread
From: Christoph Heiss @ 2024-10-28 11:31 UTC (permalink / raw)
  To: pve-devel

This calls `nvidia-smi` to retrieve the vGPU type properties and parses
them into a property string - much like the old vGPU mdev interface,
which presented them directly as the description.

Unfortunately, `nvidia-smi` does not support any machine-readable
output format for the `vgpu` subcommand, so we're basically stuck with
parsing the human-readable output.

The result is cached in /var/tmp, so that subsequent invocations do not
need to call `nvidia-smi` and parse the whole output again, as suggested
by Dominik off-list.

The final description for the devices is a proper property string and
looks something like e.g. this:

class=NVS,framebuffer-size=24576MiB,license=GRID-Virtual-Apps-3.0,max-instances=1,max-instances-per-vm=1,max-resolution=1280x1024,num-heads=1,fps-limit=60FPS
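
That property string is assembled from the human-readable `nvidia-smi
vgpu` output. Trimmed down to just the fields the parser cares about,
that output looks roughly like this (values are made up here to match
the example above; the exact formatting/indentation may differ):

    GPU 00000000:01:00.0
        vGPU Type ID                  : 0x12
            Class                     : NVS
            Max Instances             : 1
            Max Instances Per VM      : 1
            Display Heads             : 1
            Maximum X Resolution      : 1280
            Maximum Y Resolution      : 1024
            Frame Rate Limit          : 60 FPS
            FB Memory                 : 24576 MiB
            GRID License              : GRID-Virtual-Apps,3.0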

Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
---
Sending this as RFC for now, to see if the current approach is
acceptable.

There is also `/usr/share/nvidia/vgpu/vgpuConfig.xml`, which contains
information about all the available profiles, but it's missing some
important (runtime) information unfortunately - such as frame rate
limit.

And FWIW, these properties could also be retrieved without going through
nvidia-smi using the NVML API directly [0], the same API nvidia-smi uses
anyway under the hood.

But that would require either using something like e.g. DynaLoader in
perl [1] or calling it from Rust using e.g. the nvml-wrapper-sys [2] and
wrapping it using perlmod.

Both ways would be a bit involved of course, but also a lot more
future-proof than parsing the human-readable output from `nvidia-smi`.
If preferred I'd be happy to re-write it in some way or another.

[0] https://docs.nvidia.com/deploy/nvml-api/group__nvmlVgpu.html#group__nvmlVgpu
[1] https://perldoc.perl.org/DynaLoader
[2] https://docs.rs/nvml-wrapper-sys/0.8.0/nvml_wrapper_sys/index.html
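
Purely for illustration, a rough sketch of what a Perl-side NVML
binding could look like - using FFI::Platypus here instead of raw
DynaLoader just to keep the sketch short; the library name and the vGPU
type ID below are placeholders, and none of this is tested against a
real driver:

    use strict;
    use warnings;

    use FFI::Platypus 2.00;

    # NVML library as shipped with the vGPU host driver; the actual
    # name/path depends on the driver packaging.
    my $ffi = FFI::Platypus->new(api => 2, lib => ['libnvidia-ml.so.1']);

    # nvmlReturn_t is an enum (plain int), 0 means NVML_SUCCESS
    $ffi->attach(nvmlInit_v2 => [] => 'int');
    $ffi->attach(nvmlShutdown => [] => 'int');
    # nvmlVgpuTypeGetFrameRateLimit(nvmlVgpuTypeId_t id, unsigned int *fps)
    $ffi->attach(nvmlVgpuTypeGetFrameRateLimit => ['uint', 'uint*'] => 'int');

    die "failed to initialize NVML\n" if nvmlInit_v2() != 0;

    my $fps = 0;
    my $vgpu_type_id = 0x12; # made-up ID, would come from the supported-vGPU list
    if (nvmlVgpuTypeGetFrameRateLimit($vgpu_type_id, \$fps) == 0) {
        print "fps-limit=${fps}FPS\n";
    }

    nvmlShutdown();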

 src/PVE/SysFSTools.pm | 99 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 98 insertions(+), 1 deletion(-)

diff --git a/src/PVE/SysFSTools.pm b/src/PVE/SysFSTools.pm
index 0bde6d7..fc6282d 100644
--- a/src/PVE/SysFSTools.pm
+++ b/src/PVE/SysFSTools.pm
@@ -4,8 +4,10 @@ use strict;
 use warnings;
 
 use IO::File;
+use JSON qw(decode_json encode_json);
 
 use PVE::Tools qw(file_read_firstline dir_glob_foreach);
+use PVE::JSONSchema;
 
 my $pcisysfs = "/sys/bus/pci";
 my $domainregex = "[a-f0-9]{4,}";
@@ -145,6 +147,98 @@ sub lspci {
     return $devices;
 }
 
+my sub nvidia_parse_vgpu_config_from_smi {
+    # generic properties which values will be taken as-is
+    my $generic_propmap = {
+	'Class' => 'class',
+	'Max Instances' => 'max-instances',
+	'Max Instances Per VM' => 'max-instances-per-vm',
+	'FB Memory' => 'framebuffer-size',
+	'Frame Rate Limit' => 'fps-limit',
+	'Display Heads' => 'num-heads',
+	'Placement Size' => 'placement-size',
+	'GRID License' => 'license',
+    };
+
+    my $prop_schema = {
+	'class' => {},
+	'max-instances' => {},
+	'max-instances-per-vm' => {},
+	'framebuffer-size' => {},
+	'num-heads' => {},
+	'max-resolution' => {},
+	'license' => {},
+	'fps-limit' => { optional => 1 },
+	'placement-size' => { optional => 1 },
+    };
+
+    my $configs = {};
+    my $cur_id;
+
+    my $command = ['nvidia-smi', 'vgpu', '--creatable', '--verbose'];
+    my $parsefn = sub {
+	my ($line) = @_;
+	return if $line =~ m/^GPU/;
+
+	my @parts = split(':', $line);
+	return if scalar(@parts) != 2;
+
+	my ($key, $value) = @parts;
+
+	$key =~ s/^\s+|\s+$//g; # trim whitespace from start and end
+	$value =~ s/\s+//g; # trim all whitespace
+	$value =~ s/,/-/g; # replace any commas with dashes
+
+	if ($key eq 'vGPU Type ID') {
+	    $cur_id = hex($value);
+	} elsif (defined($generic_propmap->{$key}) && $value ne 'N/A') {
+	    $configs->{$cur_id}->{$generic_propmap->{$key}} = $value;
+	}
+
+	# `nvidia-smi` prints these keys/values in a deterministic order,
+	# so the order they appear in can be relied upon.
+	if ($key eq 'Maximum X Resolution') {
+	    $configs->{$cur_id}->{'max-resolution'} = $value;
+	} elsif ($key eq 'Maximum Y Resolution') {
+	    $configs->{$cur_id}->{'max-resolution'} .= "x$value";
+	}
+    };
+
+    eval {
+	PVE::Tools::run_command($command, outfunc => $parsefn);
+    };
+
+    if (my $err = $@) {
+	warn "failed to run nvidia-smi: $err\n";
+	return undef;
+    }
+
+    for my $k (keys %$configs) {
+	$configs->{$k} = PVE::JSONSchema::print_property_string($configs->{$k}, $prop_schema);
+    }
+
+    return $configs;
+}
+
+my sub nvidia_parse_vgpu_config_cached {
+    my $cachefile = '/var/tmp/pve-nvidia-vgpu-configs.json';
+
+    # First try reading from the cached file
+    my $cached = eval {
+	decode_json(PVE::Tools::file_get_contents($cachefile));
+    };
+    return $cached if defined($cached);
+
+    # Otherwise, go the slow path and parse it from nvidia-smi
+    my $configs = nvidia_parse_vgpu_config_from_smi();
+    return {} if !defined($configs);
+
+    # .. and cache it
+    PVE::Tools::file_set_contents($cachefile, encode_json($configs));
+
+    return $configs;
+}
+
 #
 # return format:
 # [
@@ -152,6 +246,7 @@ sub lspci {
 #         type => 'FooType_1',
 #         description => "a longer description with custom format\nand newlines",
 #         available => 5,
+#         name => "human-readable name of mdev/vGPU"
 #     },
 #     ...
 # ]
@@ -188,6 +283,8 @@ sub get_mdev_types {
 	});
     } elsif (-f $nvidia_path) {
 	my $creatable = PVE::Tools::file_get_contents($nvidia_path);
+	my $configs = nvidia_parse_vgpu_config_cached();
+
 	for my $line (split("\n", $creatable)) {
 	    next if $line =~ m/^ID/; # header
 	    next if $line !~ m/^(.*?)\s*:\s*(.*)$/;
@@ -196,7 +293,7 @@ sub get_mdev_types {
 
 	    push $types->@*, {
 		type => "nvidia-$id", # backwards compatibility
-		description => "", # TODO, read from xml/nvidia-smi ?
+		description => $configs->{$id} || '',
 		available => 1,
 		name  => $name,
 	    }
-- 
2.46.0




* Re: [pve-devel] [RFC PATCH common] SysFSTools: mdev: retrieve Nvidia vGPU description from nvidia-smi
  2024-10-28 11:31 [pve-devel] [RFC PATCH common] SysFSTools: mdev: retrieve Nvidia vGPU description from nvidia-smi Christoph Heiss
@ 2024-10-30 10:13 ` Dominik Csapak
  2024-10-30 11:09   ` Christoph Heiss
  2024-10-30 12:08 ` Thomas Lamprecht
  1 sibling, 1 reply; 5+ messages in thread
From: Dominik Csapak @ 2024-10-30 10:13 UTC (permalink / raw)
  To: Christoph Heiss, pve-devel

Hi,

thanks for tackling this!

a few comments inline

On 10/28/24 12:31, Christoph Heiss wrote:
> This calls `nvidia-smi` to retrieve the vGPU type properties and parses
> them into a property string - much like the old vGPU mdev interface,
> which presented them directly as the description.
> 
> Unfortunately, `nvidia-smi` does not support any machine-readable
> output format for the `vgpu` subcommand, so we're basically stuck with
> parsing the human-readable output.
> 
> The result is cached in /var/tmp, so that subsequent invocations do not
> need to call `nvidia-smi` and parse the whole output again, as suggested
> by Dominik off-list.
> 
> The final description for the devices is a proper property string and
> looks something like e.g. this:
> 
> class=NVS,framebuffer-size=24576MiB,license=GRID-Virtual-Apps-3.0,max-instances=1,max-instances-per-vm=1,max-resolution=1280x1024,num-heads=1,fps-limit=60FPS
> 
> Signed-off-by: Christoph Heiss <c.heiss@proxmox.com>
> ---
> Sending this as RFC for now, to see if the current approach is
> acceptable.
> 
> There is also `/usr/share/nvidia/vgpu/vgpuConfig.xml`, which contains
> information about all the available profiles, but it's missing some
> important (runtime) information unfortunately - such as frame rate
> limit.
> 
> And FWIW, these properties could also be retrieved without going through
> nvidia-smi using the NVML API directly [0], the same API nvidia-smi uses
> anyway under the hood.
> 
> But that would require either using something like e.g. DynaLoader in
> perl [1] or calling it from Rust using e.g. the nvml-wrapper-sys [2] and
> wrapping it using perlmod.
> 
> Both ways would be a bit involved of course, but also a lot more
> future-proof than parsing the human-readable output from `nvidia-smi`.
> If preferred I'd be happy to re-write it in some way or another.

another alternative could be to wrap the few necessary calls in a small
c binary that we can call in place of nvidia-smi.
This tool could then output more structured (e.g. json) output,
though not sure if it's worth that, since I hope that the nvidia-smi
output is relatively stable

(we can still change this method should parsing nvidia-smi output
turn out to be too brittle)

> 
> [0] https://docs.nvidia.com/deploy/nvml-api/group__nvmlVgpu.html#group__nvmlVgpu
> [1] https://perldoc.perl.org/DynaLoader
> [2] https://docs.rs/nvml-wrapper-sys/0.8.0/nvml_wrapper_sys/index.html
> 
>   src/PVE/SysFSTools.pm | 99 ++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 98 insertions(+), 1 deletion(-)
> 
> diff --git a/src/PVE/SysFSTools.pm b/src/PVE/SysFSTools.pm
> index 0bde6d7..fc6282d 100644
> --- a/src/PVE/SysFSTools.pm
> +++ b/src/PVE/SysFSTools.pm
> @@ -4,8 +4,10 @@ use strict;
>   use warnings;
>   
>   use IO::File;
> +use JSON qw(decode_json encode_json);
>   
>   use PVE::Tools qw(file_read_firstline dir_glob_foreach);
> +use PVE::JSONSchema;
>   
>   my $pcisysfs = "/sys/bus/pci";
>   my $domainregex = "[a-f0-9]{4,}";
> @@ -145,6 +147,98 @@ sub lspci {
>       return $devices;
>   }
>   
> +my sub nvidia_parse_vgpu_config_from_smi {
> +    # generic properties which values will be taken as-is
> +    my $generic_propmap = {
> +	'Class' => 'class',
> +	'Max Instances' => 'max-instances',
> +	'Max Instances Per VM' => 'max-instances-per-vm',
> +	'FB Memory' => 'framebuffer-size',
> +	'Frame Rate Limit' => 'fps-limit',
> +	'Display Heads' => 'num-heads',
> +	'Placement Size' => 'placement-size',
> +	'GRID License' => 'license',
> +    };
> +
> +    my $prop_schema = {
> +	'class' => {},
> +	'max-instances' => {},
> +	'max-instances-per-vm' => {},
> +	'framebuffer-size' => {},
> +	'num-heads' => {},
> +	'max-resolution' => {},
> +	'license' => {},
> +	'fps-limit' => { optional => 1 },
> +	'placement-size' => { optional => 1 },
> +    };
> +
> +    my $configs = {};
> +    my $cur_id;
> +
> +    my $command = ['nvidia-smi', 'vgpu', '--creatable', '--verbose'];

IMO we should not use 'creatable' here but 'available' (or however it's named)
AFAIR the 'creatable' are only the ones currently allowed to be created,
not the ones the cards generally support.

This could become a problem e.g. when a VM is started with some profile:
then 'creatable' will only return models compatible with that, and even if
the VM is stopped afterwards, since we cached the output, we will not show
the description of the others.

Of course we could instead change the caching logic to reparse it, whenever
we don't find the particular id in the cached config, but that
seems too much work if we can just have a list of all 'available'

> +    my $parsefn = sub {
> +	my ($line) = @_;
> +	return if $line =~ m/^GPU/;
> +
> +	my @parts = split(':', $line);
> +	return if scalar(@parts) != 2;
> +
> +	my ($key, $value) = @parts;
> +
> +	$key =~ s/^\s+|\s+$//g; # trim whitespace from start and end
> +	$value =~ s/\s+//g; # trim all whitespace
> +	$value =~ s/,/-/g; # replace any commas with dashes
> +
> +	if ($key eq 'vGPU Type ID') {
> +	    $cur_id = hex($value);
> +	} elsif (defined($generic_propmap->{$key}) && $value ne 'N/A') {
> +	    $configs->{$cur_id}->{$generic_propmap->{$key}} = $value;
> +	}
> +
> +	# `nvidia-smi` prints these keys/values in a deterministic order,
> +	# so the order they appear in can be relied upon.
> +	if ($key eq 'Maximum X Resolution') {
> +	    $configs->{$cur_id}->{'max-resolution'} = $value;
> +	} elsif ($key eq 'Maximum Y Resolution') {
> +	    $configs->{$cur_id}->{'max-resolution'} .= "x$value";
> +	}
> +    };

it would be great to have some tests here with some example output, that
way we can compare output if it changes, and it makes reasoning about the
parsing logic a bit easier.
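
Rough sketch of what I have in mind (untested, and assuming the
per-line parsing gets factored out into a plain function that can be
fed captured output - `parse_nvidia_vgpu_output` is just a made-up name
for that; this compares the per-type hash before it gets turned into a
property string):

    use Test::More;

    # trimmed, made-up excerpt in the shape nvidia-smi prints it
    my @lines = (
        'GPU 00000000:01:00.0',
        '    vGPU Type ID             : 0x12',
        '        Class                : NVS',
        '        FB Memory            : 24576 MiB',
        '        Frame Rate Limit     : 60 FPS',
    );

    my $configs = parse_nvidia_vgpu_output(\@lines);

    is_deeply($configs->{0x12}, {
        'class' => 'NVS',
        'framebuffer-size' => '24576MiB',
        'fps-limit' => '60FPS',
    }, 'vGPU type properties get parsed and normalized');

    done_testing();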

> +
> +    eval {
> +	PVE::Tools::run_command($command, outfunc => $parsefn);
> +    };
> +
> +    if (my $err = $@) {
> +	warn "failed to run nvidia-smi: $err\n";
> +	return undef;
> +    }
> +
> +    for my $k (keys %$configs) {
> +	$configs->{$k} = PVE::JSONSchema::print_property_string($configs->{$k}, $prop_schema);
> +    }

since the schema above is mostly empty besides defining which keys there are,
is there a specific reason you don't just use something like (untested!):

join(",", map { $_ ."=". $configs->{$_} } keys $configs->%*);

?
that way you could omit the prop_schema entirely

> +
> +    return $configs;
> +}
> +
> +my sub nvidia_parse_vgpu_config_cached {
> +    my $cachefile = '/var/tmp/pve-nvidia-vgpu-configs.json';
> +
> +    # First try reading from the cached file
> +    my $cached = eval {
> +	decode_json(PVE::Tools::file_get_contents($cachefile));
> +    };
> +    return $cached if defined($cached);
> +
> +    # Otherwise, go the slow path and parse it from nvidia-smi
> +    my $configs = nvidia_parse_vgpu_config_from_smi();
> +    return {} if !defined($configs);
> +
> +    # .. and cache it
> +    PVE::Tools::file_set_contents($cachefile, encode_json($configs));
> +
> +    return $configs;
> +}
> +
>   #
>   # return format:
>   # [
> @@ -152,6 +246,7 @@ sub lspci {
>   #         type => 'FooType_1',
>   #         description => "a longer description with custom format\nand newlines",
>   #         available => 5,
> +#         name => "human-readable name of mdev/vGPU"
>   #     },
>   #     ...
>   # ]
> @@ -188,6 +283,8 @@ sub get_mdev_types {
>   	});
>       } elsif (-f $nvidia_path) {
>   	my $creatable = PVE::Tools::file_get_contents($nvidia_path);
> +	my $configs = nvidia_parse_vgpu_config_cached();
> +
>   	for my $line (split("\n", $creatable)) {
>   	    next if $line =~ m/^ID/; # header
>   	    next if $line !~ m/^(.*?)\s*:\s*(.*)$/;
> @@ -196,7 +293,7 @@ sub get_mdev_types {
>   
>   	    push $types->@*, {
>   		type => "nvidia-$id", # backwards compatibility
> -		description => "", # TODO, read from xml/nvidia-smi ?
> +		description => $configs->{$id} || '',
>   		available => 1,
>   		name  => $name,
>   	    }




* Re: [pve-devel] [RFC PATCH common] SysFSTools: mdev: retrieve Nvidia vGPU description from nvidia-smi
  2024-10-30 10:13 ` Dominik Csapak
@ 2024-10-30 11:09   ` Christoph Heiss
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Heiss @ 2024-10-30 11:09 UTC (permalink / raw)
  To: Dominik Csapak; +Cc: pve-devel

Thanks for the review!

On Wed, Oct 30, 2024 at 11:13:36AM GMT, Dominik Csapak wrote:
> Hi,
>
> thanks for tackling this!
>
> a few comments inline
>
> On 10/28/24 12:31, Christoph Heiss wrote:
> > [..]
> > And FWIW, these properties could also be retrieved without going through
> > nvidia-smi using the NVML API directly [0], the same API nvidia-smi uses
> > anyway under the hood.
> >
> > But that would require either using something like e.g. DynaLoader in
> > perl [1] or calling it from Rust using e.g. the nvml-wrapper-sys [2] and
> > wrapping it using perlmod.
> >
> > Both ways would be a bit involved of course, but also a lot more
> > future-proof than parsing the human-readable output from `nvidia-smi`.
> > If preferred I'd be happy to re-write it in some way or another.
>
> another alternative could be to wrap the few necessary calls in a small
> c binary that we can call in place of nvidia-smi.
> This tool could then output more structured (e.g. json) output,
> though not sure if it's worth that,

Sounds like a good idea, TBH. Although that should then probably be
packaged separately from libpve-common-perl, so there is still that
"overhead".

> since I hope that the nvidia-smi output is relatively stable

I'd hope so too, but I (personally, at least) wouldn't want to count on it.
It is still a proprietary tool after all, so it really could be changed
in any way seen fit w/o notice.

>
> (we can still change this method should parsing nvidia-smi output
> turn out to be too brittle)

Using it as a stop-gap measure basically would work for me, so that this
isn't blocked unnecessarily long and we have time to properly implement
something like this.

>
> > [..]
> > diff --git a/src/PVE/SysFSTools.pm b/src/PVE/SysFSTools.pm
> > index 0bde6d7..fc6282d 100644
> > --- a/src/PVE/SysFSTools.pm
> > +++ b/src/PVE/SysFSTools.pm
> > @@ -4,8 +4,10 @@ use strict;
> > [..]
> > +
> > +    my $command = ['nvidia-smi', 'vgpu', '--creatable', '--verbose'];
>
> IMO we should not use 'creatable' here but 'available' (or however it's named)

FWIW, for reference, that would be `--supported`.

> AFAIR the 'creatable' are only the ones currently allowed to be created,
> not the ones the cards generally support.
>
> This could become a problem e.g. when a VM is started with some profile:
> then 'creatable' will only return models compatible with that, and even if
> the VM is stopped afterwards, since we cached the output, we will not show
> the description of the others.

Oh I see, yeah, that makes sense then. I'll change it for v2, thanks!

>
> Of course we could instead change the caching logic to reparse it, whenever
> we don't find the particular id in the cached config, but that
> seems too much work if we can just have a list of all 'available'
>
> > +    my $parsefn = sub {
> > +	my ($line) = @_;
> > +	return if $line =~ m/^GPU/;
> > +
> > +	my @parts = split(':', $line);
> > +	return if scalar(@parts) != 2;
> > +
> > +	my ($key, $value) = @parts;
> > +
> > +	$key =~ s/^\s+|\s+$//g; # trim whitespace from start and end
> > +	$value =~ s/\s+//g; # trim all whitespace
> > +	$value =~ s/,/-/g; # replace any commas with dashes
> > +
> > +	if ($key eq 'vGPU Type ID') {
> > +	    $cur_id = hex($value);
> > +	} elsif (defined($generic_propmap->{$key}) && $value ne 'N/A') {
> > +	    $configs->{$cur_id}->{$generic_propmap->{$key}} = $value;
> > +	}
> > +
> > +	# `nvidia-smi` prints these keys/values in a deterministic order,
> > +	# so the order they appear in can be relied upon.
> > +	if ($key eq 'Maximum X Resolution') {
> > +	    $configs->{$cur_id}->{'max-resolution'} = $value;
> > +	} elsif ($key eq 'Maximum Y Resolution') {
> > +	    $configs->{$cur_id}->{'max-resolution'} .= "x$value";
> > +	}
> > +    };
>
> it would be great to have some tests here with some example output, that
> way we can compare output if it changes, and it makes reasoning about the
> parsing logic a bit easier.

Sure, will add some tests with the next revision!

>
> > +
> > +    eval {
> > +	PVE::Tools::run_command($command, outfunc => $parsefn);
> > +    };
> > +
> > +    if (my $err = $@) {
> > +	warn "failed to run nvidia-smi: $err\n";
> > +	return undef;
> > +    }
> > +
> > +    for my $k (keys %$configs) {
> > +	$configs->{$k} = PVE::JSONSchema::print_property_string($configs->{$k}, $prop_schema);
> > +    }
>
> since the schema above is mostly empty besides defining which keys there are,
> is there a specific reason you don't just use something like (untested!):
>
> join(",", map { $_ ."=". $configs->{$_} } keys $configs->%*);
>
> ?
> that way you could omit the prop_schema entirely

No particular reason, other than not thinking of it :^)

I'll rework it, since the property schema is indeed mostly redundant
w.r.t. the property map above it.




* Re: [pve-devel] [RFC PATCH common] SysFSTools: mdev: retrieve Nvidia vGPU description from nvidia-smi
  2024-10-28 11:31 [pve-devel] [RFC PATCH common] SysFSTools: mdev: retrieve Nvidia vGPU description from nvidia-smi Christoph Heiss
  2024-10-30 10:13 ` Dominik Csapak
@ 2024-10-30 12:08 ` Thomas Lamprecht
  2024-10-30 13:50   ` Christoph Heiss
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Lamprecht @ 2024-10-30 12:08 UTC (permalink / raw)
  To: Proxmox VE development discussion, Christoph Heiss, Dominik Csapak

higher level comment, so CCing also Dominik:

Is pve-common, and possibly even SysFSTools the right place for this?
As in, from a quick look this has nothing to do with sysfs directly, and
from recent development it seems like placing such things into a more
specific package, probably also a (binary) package, might be better.

Don't get me wrong, I'm fine with the code living in pve-common; while
I'm a fan of having more modular packages, at least for bigger projects
like PVE, this does correlate with the amount of git repos I'd like
to manage.

As pve-common, and lots of its perl modules, are pretty overloaded already,
I'd like to avoid expanding this further.

In the long term, i.e. not a requirement for this series, I'd like to
have not only the perl module split up (*stares at PVE::Tools*), but also
then leverage that and provide more binary packages so that we can better
reuse specific things without pulling so many dependencies transitively.
We sometimes just copy over common helpers to avoid that, e.g. for some
non-PVE (infra) tooling, so the pain *is* real.

So while you certainly do not need to fix all of that just to get your
changes here in, it would be OTOH great if we could not make this tech
debt worse; so let's create a new package and possibly also finer-grained
modules for this.

The hard thing is to carve out what belongs together, and let's not be
strictly limited by the existing module layout; methods from those can be
split and merged.

Some rough/unfinished ideas/proposal from top of my head:

- possible package names:
  - libpve-device-common (or host-device / hw-device as slight alterations)
  - libpve-sysfs-common (if sysfs is really only what this does, which then
    should not need any external tools or compiled programs)
  - libpve-hw-passthrough-common

- modules: The basename might depend a bit on the package name chosen, could
  be e.g. "PVE::Device" or "PVE::SysFS"
  - $basename::USB
  - $basename::PCI
  - $basename::PCI::NVIDIA

In general, it can be also fine to have a very generic (micro) package with
just sysfs helpers, or keep that in existing pve-common, and then depend
on that in a more specific package that provides passthrough related stuff
on top of that, mixing sysfs query/writing with some potential calling of
external tools.

As said, the direction should be that, and it would be great if some parts of pve-common
can be improved "for free" w.r.t. not being huge modules in a huge package,
but there's no need to rework all of pve-common right now just for this.



* Re: [pve-devel] [RFC PATCH common] SysFSTools: mdev: retrieve Nvidia vGPU description from nvidia-smi
  2024-10-30 12:08 ` Thomas Lamprecht
@ 2024-10-30 13:50   ` Christoph Heiss
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Heiss @ 2024-10-30 13:50 UTC (permalink / raw)
  To: Thomas Lamprecht; +Cc: Proxmox VE development discussion

Thanks for chiming in!

On Wed, Oct 30, 2024 at 01:08:28PM GMT, Thomas Lamprecht wrote:
> higher level comment, so CCing also Dominik:
>
> Is pve-common, and possibly even SysFSTools the right place for this?

To be honest, that was kinda my thought too when I started implementing
this, but I didn't quite know where else to put it - other than in a
completely new package, and whether that would be worth the additional effort.

So no, not really.

> [..]
> from recent development it seems like placing such things into a more
> specific package, probably also a (binary) package, might be better.
>
> So while you certainly do not need to fix all of that just to get your
> changes here in, it would be OTOH great if we could not make this tech
> debt worse; so let's create a new package and possibly also finer-grained
> modules for this.

If that's the way we want to go though - I'd be happy to put something
concrete together to get started. This change would be a good starting
point for all of this anyway, I think.

And if we split it out anyway, I would really go the way of directly
using libnvml to retrieve all the information. It is not that much more
effort overall, and since a binary package is fine too, it seems
fitting.

>
> The hard thing is to carve out what belongs together, and let's not be
> strictly limited by the existing module layout; methods from those can be
> split and merged.
>
> Some rough/unfinished ideas/proposal from top of my head:
>
> - possible package names:
>   - libpve-device-common (or host-device / hw-device as slight alterations)

`libpve-host-device(-common)` sounds pretty good IMO, in that it is
quite descriptive about what devices the package actually works with.

`device` alone would be a bit ambiguous, in my mind.

In the long term, I guess the mdev-related stuff could be moved there
as well, being quite a good fit for that.

>   - libpve-sysfs-common (if sysfs is really only what this does, which then
>     should not need any external tools or compiled programs)
>   - libpve-hw-passthrough-common
>
> - modules: The basename might depend a bit on the package name chosen, could
>   be e.g. "PVE::Device" or "PVE::SysFS"
>   - $basename::USB
>   - $basename::PCI
>   - $basename::PCI::NVIDIA

.. and splitting up such things into vendor-specific modules further
down is a good idea too. It keeps together the things that belong together.

>
> In general, it can be also fine to have a very generic (micro) package with
> just sysfs helpers, or keep that in existing pve-common, and then depend
> on that in a more specific package that provides passthrough related stuff
> on top of that, mixing sysfs query/writing with some potential calling of
> external tools.
>
> As said, the direction should be that, and it would be great if some parts of pve-common
> can be improved "for free" w.r.t. not being huge modules in a huge package,
> but there's no need to rework all of pve-common right now just for this.


