From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <d.csapak@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id DAD9694937
 for <pve-devel@lists.proxmox.com>; Fri, 24 Feb 2023 14:05:03 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id A6F0C9BD3
 for <pve-devel@lists.proxmox.com>; Fri, 24 Feb 2023 14:04:33 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-devel@lists.proxmox.com>; Fri, 24 Feb 2023 14:04:32 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 1600A48400
 for <pve-devel@lists.proxmox.com>; Fri, 24 Feb 2023 14:04:32 +0100 (CET)
From: Dominik Csapak <d.csapak@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Fri, 24 Feb 2023 14:04:31 +0100
Message-Id: <20230224130431.1174277-1-d.csapak@proxmox.com>
X-Mailer: git-send-email 2.30.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.061 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: [pve-devel] [PATCH qemu-server] pci: workaround nvidia driver issue
 on mdev cleanup
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Fri, 24 Feb 2023 13:05:03 -0000

in some nvidia grid drivers (e.g. 14.4 and 15.x), their kernel module
tries to clean up the mdev device when the vm is shutdown and if it
cannot do that (e.g. becaues we already cleaned it up), their removal
process cancels with an error such that the vgpu does still exist inside
their book-keeping, but can't be used/recreated/freed until a reboot.

since there seems no obvious way to detect if thats the case besides
either parsing dmesg (which is racy), or the nvidia kernel module
version(which i'd rather not do), we simply test the pci device vendor
for nvidia and add a 10s sleep. that should give the driver enough time
to clean up and we will not find the path anymore and skip the cleanup.

This way, it works with both the newer and older versions of the driver
(some of the older drivers are LTS releases, so they're still
supported).

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 PVE/QemuServer.pm | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 40be44db..096e7f0d 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6161,6 +6161,15 @@ sub cleanup_pci_devices {
 	    # NOTE: avoid PVE::SysFSTools::pci_cleanup_mdev_device as it requires PCI ID and we
 	    # don't want to break ABI just for this two liner
 	    my $dev_sysfs_dir = "/sys/bus/mdev/devices/$uuid";
+
+	    # some nvidia vgpu driver versions want to clean the mdevs up themselves, and error
+	    # out when we do it first. so wait for 10 seconds and then try it
+	    my $pciid = $d->{pciid}->[0]->{id};
+	    my $info = PVE::SysFSTools::pci_device_info("$pciid");
+	    if ($info->{vendor} eq '10de') {
+		sleep 10;
+	    }
+
 	    PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1") if -e $dev_sysfs_dir;
 	}
     }
-- 
2.30.2