From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 2928B1FF15D for ; Thu, 25 Jul 2024 14:32:35 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id AC78B5945; Thu, 25 Jul 2024 14:32:33 +0200 (CEST) From: Fiona Ebner To: pve-devel@lists.proxmox.com Date: Thu, 25 Jul 2024 14:32:26 +0200 Message-Id: <20240725123226.51239-1-f.ebner@proxmox.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 X-SPAM-LEVEL: Spam detection results: 0 AWL -0.060 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pve-devel] [PATCH v2 qemu-server] resume: bump timeout for query-status X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox VE development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pve-devel-bounces@lists.proxmox.com Sender: "pve-devel" As reported in the community forum [0], after migration, the VM might not immediately be able to respond to QMP commands, which means the VM could fail to resume and stay in paused state on the target. The reason is that activating the block drives in QEMU can take a bit of time. For example, it might be necessary to invalidate the caches (where for raw devices a flush might be needed) and the request alignment and size of the block device needs to be queried. In [0], an external Ceph cluster with krbd is used, and the initial read to the block device after migration, for probing the request alignment, takes a bit over 10 seconds[1]. Use 60 seconds as the new timeout to be on the safe side for the future. All callers are inside workers or via the 'qm' CLI command, so bumping beyond 30 seconds is fine. [0]: https://forum.proxmox.com/threads/149610/ Signed-off-by: Fiona Ebner --- Changes in v2: * improve commit message with new findings from the forum thread PVE/QemuServer.pm | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm index bf59b091..9e840912 100644 --- a/PVE/QemuServer.pm +++ b/PVE/QemuServer.pm @@ -6461,7 +6461,9 @@ sub vm_resume { my ($vmid, $skiplock, $nocheck) = @_; PVE::QemuConfig->lock_config($vmid, sub { - my $res = mon_cmd($vmid, 'query-status'); + # After migration, the VM might not immediately be able to respond to QMP commands, because + # activating the block devices might take a bit of time. + my $res = mon_cmd($vmid, 'query-status', timeout => 60); my $resume_cmd = 'cont'; my $reset = 0; my $conf; -- 2.39.2 _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel