public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH manager stable-7] pve7to8: add check for nvidia-vgpu-mgr
@ 2023-06-12 10:00 Dominik Csapak
  2023-06-12 15:15 ` [pve-devel] applied: " Thomas Lamprecht
  0 siblings, 1 reply; 2+ messages in thread
From: Dominik Csapak @ 2023-06-12 10:00 UTC (permalink / raw)
  To: pve-devel

Currently, the NVIDIA vGPU host driver (15.2) does not support kernels
newer than 6.0 and thus will not work with Bookworm-based releases for now.

Fail when the service is running, and warn if it only exists, but is
disabled/stopped (in case a user installed it sometime but did not need
it and disabled it).

In any case, link to the known issues section in the upgrade guide
(which we can update to contain up-to-date information).

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
I opted to not parse more specific information about the driver (like
version, etc.) since it increases the complexity of the check but
without any real upside currently. If there is some future version that
supports it, we can update that to only warn/error for not supported
versions.

I'll add the section to the upgrade guide shortly

 PVE/CLI/pve7to8.pm | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/PVE/CLI/pve7to8.pm b/PVE/CLI/pve7to8.pm
index 6b51e98e..dbcb87ff 100644
--- a/PVE/CLI/pve7to8.pm
+++ b/PVE/CLI/pve7to8.pm
@@ -1215,6 +1215,27 @@ sub check_apt_repos {
     }
 }
 
+sub check_nvidia_vgpu_service {
+    log_info("Checking for existence of NVIDIA vGPU Manager..");
+
+    my $state = $get_systemd_unit_state->("nvidia-vgpu-mgr.service");
+    if ($state && $state eq 'active') {
+	log_fail(
+	    "Running NVIDIA vGPU Service found, possibly not compatible with newer kernel versions,"
+	    ." check with their documentation and"
+	    ." https://pve.proxmox.com/wiki/Upgrade_from_7_to_8#Known_upgrade_issues."
+	);
+    } elsif ($state && $state ne 'unknown') {
+	log_warn(
+	    "NVIDIA vGPU Service found, possibly not compatible with newer kernel versions,"
+	    ." check with their documentation and"
+	    ." https://pve.proxmox.com/wiki/Upgrade_from_7_to_8#Known_upgrade_issues."
+	);
+    } else {
+	log_pass("No NVIDIA vGPU Service found.");
+    }
+}
+
 sub check_time_sync {
     my $unit_active = sub { return $get_systemd_unit_state->($_[0], 1) eq 'active' ? $_[0] : undef };
 
@@ -1337,6 +1358,7 @@ sub check_misc {
     check_lxcfs_fuse_version();
     check_node_and_guest_configurations();
     check_apt_repos();
+    check_nvidia_vgpu_service();
 }
 
 my sub colored_if {
-- 
2.30.2






* [pve-devel] applied: [PATCH manager stable-7] pve7to8: add check for nvidia-vgpu-mgr
  2023-06-12 10:00 [pve-devel] [PATCH manager stable-7] pve7to8: add check for nvidia-vgpu-mgr Dominik Csapak
@ 2023-06-12 15:15 ` Thomas Lamprecht
  0 siblings, 0 replies; 2+ messages in thread
From: Thomas Lamprecht @ 2023-06-12 15:15 UTC (permalink / raw)
  To: Proxmox VE development discussion, Dominik Csapak

On 12/06/2023 at 12:00, Dominik Csapak wrote:
> Currently, the NVIDIA vGPU host driver (15.2) does not support kernels
> newer than 6.0 and thus will not work with Bookworm-based releases for now.
> 
> Fail when the service is running, and warn if it only exists, but is
> disabled/stopped (in case a user installed it sometime but did not need
> it and disabled it).
> 
> In any case, link to the known issues section in the upgrade guide
> (which we can update to contain up-to-date information).
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> I opted to not parse more specific information about the driver (like
> version, etc.) since it increases the complexity of the check but
> without any real upside currently. If there is some future version that
> supports it, we can update that to only warn/error for not supported
> versions.
> 
> I'll add the section to the upgrade guide shortly
> 
>  PVE/CLI/pve7to8.pm | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
>

applied, thanks!

But I made some follow-ups:

- fix typo and factor common message into single variable

- pass the suppress_stderr param to get_systemd_unit_state to avoid an ugly message
  on unaffected systems, e.g.:
  "Failed to get unit file state for nvidia-vgpu-mgr.service: No such file or directory"

- downgraded the failure again to a warning, reversing my initial recommendation to you,
  mostly due to future proofing for the case where NVIDIA fixes this, as in that case we'd
  need to tell users that they should ignore a failure, which is not good – my bad for not
  thinking of this earlier.
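
For illustration only, the check with the three follow-ups folded in might look
roughly like the sketch below. This is not the committed code. The logging subs
and the lookup-table version of $get_systemd_unit_state are hypothetical
stand-ins so the snippet runs outside of pve7to8, and the assumption that a
missing unit is reported as 'unknown' is taken from the comparison in the
original patch.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the pve7to8 logging helpers; they only record messages.
my @log;
sub log_info { push @log, "INFO: $_[0]" }
sub log_warn { push @log, "WARN: $_[0]" }
sub log_pass { push @log, "PASS: $_[0]" }

# Hypothetical stub: the real helper asks systemd, this one reads a lookup table.
# Assumption: a unit that does not exist is reported as 'unknown'.
my %fake_units;
my $get_systemd_unit_state = sub {
    my ($unit, $suppress_stderr) = @_;
    return $fake_units{$unit} // 'unknown';
};

sub check_nvidia_vgpu_service {
    log_info("Checking for existence of NVIDIA vGPU Manager..");

    # Common message kept in a single variable instead of being repeated.
    my $msg = "NVIDIA vGPU Service found, possibly not compatible with newer kernel versions,"
        ." check with their documentation and"
        ." https://pve.proxmox.com/wiki/Upgrade_from_7_to_8#Known_upgrade_issues.";

    # Second argument set (as check_time_sync does) to suppress stderr output.
    my $state = $get_systemd_unit_state->("nvidia-vgpu-mgr.service", 1);
    if ($state && $state eq 'active') {
        # Warning instead of failure, so a fixed driver needs no "ignore this" advice.
        log_warn("Running $msg");
    } elsif ($state && $state ne 'unknown') {
        log_warn($msg);
    } else {
        log_pass("No NVIDIA vGPU Service found.");
    }
}
```

With the stub, setting $fake_units{'nvidia-vgpu-mgr.service'} to 'active' or
'inactive' should end in a warning, and leaving it unset should end in a pass.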




