From: Friedrich Weber
To: pve-devel@lists.proxmox.com
Date: Fri, 6 Oct 2023 14:15:33 +0200
Message-Id: <20231006121532.90772-1-f.weber@proxmox.com>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [pve-devel] [PATCH qemu-server v2] vm start: set higher timeout if using PCI passthrough
List-Id: Proxmox VE development discussion

The default VM startup timeout is `max(30, VM memory in GiB)` seconds.
Multiple reports in the forum [0] [1] and the bug tracker [2] suggest
this is too short when using PCI passthrough with a large amount of VM
memory, since QEMU needs to map the whole memory during startup (see
comment #2 in [2]). As a result, VM startup fails with "got timeout".

To work around this, set a larger default timeout if at least one PCI
device is passed through. The question remains how to choose an
appropriate timeout. Users reported the following startup times:

ref | RAM | time  | ratio (s/GiB)
---------------------------------
[1] | 60G | 135s  | 2.25
[1] | 70G | 157s  | 2.24
[1] | 80G | 277s  | 3.46
[2] | 65G | 213s  | 3.28
[2] | 96G | >290s | >3.02

The data does not really indicate any simple (e.g. linear)
relationship between RAM and startup time (even data from the same
source). However, to keep the heuristic simple, assume linear growth
and multiply the default timeout by 4 if at least one `hostpci[n]`
option is present, obtaining `4 * max(30, VM memory in GiB)`. This
covers all cases above, and should still leave some headroom.

[0]: https://forum.proxmox.com/threads/83765/post-552071
[1]: https://forum.proxmox.com/threads/126398/post-592826
[2]: https://bugzilla.proxmox.com/show_bug.cgi?id=3502

Suggested-by: Fiona Ebner
Signed-off-by: Friedrich Weber
---

Notes:
    changes since v1 (was called "vm start: set minimum timeout of 300s
    if using PCI passthrough",
    20230503133723.165739-1-f.weber@proxmox.com):
    * Use a constant multiplier as suggested by Fiona (thx!)

    Another workaround is offered by an unapplied patch series [3] of
    bug 3502 [2] that makes it possible to set VM-specific timeouts
    (also in the GUI). Users could use this option to manually set a
    higher timeout for VMs that use PCI passthrough. However, it is not
    immediately obvious that a higher timeout is necessary when using
    PCI passthrough. Since the problem seems to come up somewhat
    frequently, I think it makes sense to have the heuristic choose a
    higher timeout by default.

    As discussed in v1, I'll also pick up the patch series to allow
    users to set custom timeouts [3], also to offer a workaround for
    cases where the new heuristic chooses a timeout that is still too
    short.

    [2]: https://bugzilla.proxmox.com/show_bug.cgi?id=3502
    [3]: https://lists.proxmox.com/pipermail/pve-devel/2023-January/055352.html

 PVE/QemuServer/Helpers.pm | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/PVE/QemuServer/Helpers.pm b/PVE/QemuServer/Helpers.pm
index 8817427..0afb631 100644
--- a/PVE/QemuServer/Helpers.pm
+++ b/PVE/QemuServer/Helpers.pm
@@ -152,6 +152,13 @@ sub config_aware_timeout {
         $timeout = int($memory/1024);
     }
 
+    # When using PCI passthrough, users reported much higher startup times,
+    # growing with the amount of memory configured. Constant factor chosen
+    # based on user reports.
+    if (grep(/^hostpci[0-9]+$/, keys %$config)) {
+        $timeout *= 4;
+    }
+
     if ($is_suspended && $timeout < 300) {
         $timeout = 300;
     }
-- 
2.39.2
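
For illustration only, here is a rough standalone sketch of the heuristic applied to the data points reported above. `sketch_start_timeout` is a hypothetical stand-in, not the real `config_aware_timeout`, which also handles suspended VMs and other cases. It assumes `memory` is given in MiB (as `int($memory/1024)` in the hunk suggests), and the `hostpci0` value is a made-up placeholder.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for the patched heuristic.
# Assumption: $config->{memory} is the VM memory in MiB.
sub sketch_start_timeout {
    my ($config) = @_;

    # Baseline: max(30, VM memory in GiB) seconds.
    my $timeout = 30;
    my $memory_gib = int($config->{memory} / 1024);
    $timeout = $memory_gib if $memory_gib > $timeout;

    # Multiply by 4 if at least one hostpci[n] option is present.
    $timeout *= 4 if grep { /^hostpci[0-9]+$/ } keys %$config;

    return $timeout;
}

# Data points from the commit message: [RAM in GiB, reported startup seconds].
# The 96 GiB entry is only a lower bound (reported as ">290s").
my @reports = ([60, 135], [70, 157], [80, 277], [65, 213], [96, 290]);

for my $report (@reports) {
    my ($gib, $seen) = @$report;
    # The PCI address is a placeholder; only the key name matters here.
    my $timeout = sketch_start_timeout(
        { memory => $gib * 1024, hostpci0 => '0000:01:00.0' });
    printf "%3d GiB: timeout %3ds, reported %3ds\n", $gib, $timeout, $seen;
}
```

Under these assumptions, the sketch yields 240s, 280s, 320s, 260s and 384s for the five reports, each above the reported startup time, while a VM without any `hostpci[n]` key keeps the unmodified baseline.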