From: Dominik Csapak <d.csapak@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Thu, 7 Mar 2024 10:33:37 +0100
Message-Id: <20240307093337.1104294-1-d.csapak@proxmox.com>
Subject: [pve-devel] [PATCH qemu-server] mediated devices: fix race condition in vm reboot

when triggering a vm reboot from the host (via cli/api), the reboot code is
called under a guest lock. It creates a reboot request, shuts down the vm and
calls the regular cleanup code (which includes the mdev cleanup).

in parallel, qmeventd sees the vanished vm and starts 'qm cleanup', which is
(among other tasks) used to restart a vm with a pending reboot request. It
does this also under a guest lock, with a default timeout of 10 seconds.

Since we currently wait 10 seconds for the nvidia driver to clean up the mdev,
this creates a race condition for the cleanup lock. In other words, when the
call to `qm cleanup` starts before we started to sleep for 10 seconds, it will
not be able to acquire its lock and will not start the vm again.

To fix it, do two things:
* increase the lock timeout in `qm cleanup` to 60 seconds (technically this
  could still run into a timeout, as we can configure up to 16 mediated
  devices with up to 10 seconds of sleep each, but realistically most users
  won't configure more than two or three of them, if even that)
* change the fixed `sleep 10` to a loop that sleeps for 1 second before
  checking the state again (see the sketch below). This shortens the wait
  whenever the driver manages to clean up the mdev in the meantime.

Further, add a bit of logging, so we can properly see in the (task) log what
is happening when.
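The second point boils down to the following waiting logic (a minimal sketch
of the relevant part of cleanup_pci_devices(), matching the hunk further
down; $uuid and $dev_sysfs_dir are set up as in the existing code):

    my $dev_sysfs_dir = "/sys/bus/mdev/devices/$uuid";

    # poll once per second instead of sleeping a fixed 10 seconds, and stop
    # as soon as the driver has removed the mdev on its own
    my $count = 0;
    while (-e $dev_sysfs_dir && $count < 10) {
        sleep 1;
        $count++;
    }
    warn "waited $count seconds for mdev cleanup\n";

    # only remove the mdev ourselves if the driver did not get to it
    if (-e $dev_sysfs_dir) {
        warn "cleaning up mediated device $uuid\n";
        PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1");
    }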
Fixes: 49c51a60 (pci: workaround nvidia driver issue on mdev cleanup)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 PVE/CLI/qm.pm     |  3 ++-
 PVE/QemuServer.pm | 16 ++++++++++++----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
index b17b4fe2..dce6c7a1 100755
--- a/PVE/CLI/qm.pm
+++ b/PVE/CLI/qm.pm
@@ -915,7 +915,8 @@ __PACKAGE__->register_method({
 	    my $storecfg = PVE::Storage::config();
 	    warn "Starting cleanup for $vmid\n";
 
-	    PVE::QemuConfig->lock_config($vmid, sub {
+	    # mdev cleanup can take a while, so wait up to 60 seconds
+	    PVE::QemuConfig->lock_config_full($vmid, 60, sub {
 		my $conf = PVE::QemuConfig->load_config ($vmid);
 		my $pid = PVE::QemuServer::check_running ($vmid);
 		die "vm still running\n" if $pid;
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index b45507aa..efe93c62 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6133,12 +6133,20 @@ sub cleanup_pci_devices {
 	    my $dev_sysfs_dir = "/sys/bus/mdev/devices/$uuid";
 
 	    # some nvidia vgpu driver versions want to clean the mdevs up themselves, and error
-	    # out when we do it first. so wait for 10 seconds and then try it
-	    if ($d->{ids}->[0]->[0]->{vendor} =~ m/^(0x)?10de$/) {
-		sleep 10;
+	    # out when we do it first. so wait for up to 10 seconds and then try it manually
+	    if ($d->{ids}->[0]->[0]->{vendor} =~ m/^(0x)?10de$/ && -e $dev_sysfs_dir) {
+		my $count = 0;
+		while (-e $dev_sysfs_dir && $count < 10) {
+		    sleep 1;
+		    $count++;
+		}
+		warn "waited $count seconds for mdev cleanup\n";
 	    }
 
-	    PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1") if -e $dev_sysfs_dir;
+	    if (-e $dev_sysfs_dir) {
+		warn "cleaning up mediated device $uuid\n";
+		PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1");
+	    }
 	}
     }
 
     PVE::QemuServer::PCI::remove_pci_reservation($vmid);
-- 
2.39.2
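For context on the first hunk: the message above notes that the guest lock in
`qm cleanup` previously used a default timeout of 10 seconds. A rough sketch
of the relationship between the two lock helpers (the exact default lives in
the config locking code and is an assumption here, not part of this patch):

    # lock_config() waits for the guest lock with the default timeout,
    # roughly equivalent to:
    #     PVE::QemuConfig->lock_config_full($vmid, 10, $code);
    # while the patched call asks for up to 60 seconds explicitly:
    PVE::QemuConfig->lock_config_full($vmid, 60, sub {
        # cleanup and restart-on-pending-reboot logic runs here, now with
        # enough headroom for the (up to) 10 second per-mdev wait above
    });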