From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 961861FF2CA
	for <inbox@lore.proxmox.com>; Tue, 23 Jul 2024 15:44:59 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 88F144767;
	Tue, 23 Jul 2024 15:45:32 +0200 (CEST)
From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Tue, 23 Jul 2024 15:44:55 +0200
Message-Id: <20240723134455.87923-1-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.061 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [qemuserver.pm, proxmox.com]
Subject: [pve-devel] [PATCH qemu-server] resume: bump timeout for
 query-status
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

As reported in the community forum [0], after migration, the VM might
not immediately be able to respond to QMP commands, which means the VM
could fail to resume and stay in paused state on the target.

The reason seems to be that activating the block drives in QEMU can
take a bit of time. For example, it might be necessary to invalidate
the caches (where for raw devices a flush might be needed) and the
size of the block device needs to be queried.

In [0], an external Ceph cluster is used, but there doesn't seem to be
a flush. It also shows that the required timeout is a bit over 10
seconds, so use 60 to be on the safe side for the future.

All callers are inside workers or via the 'qm' CLI command, so bumping
beyond 30 seconds is fine.

[0]: https://forum.proxmox.com/threads/149610/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 PVE/QemuServer.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index bf59b091..9e840912 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6461,7 +6461,9 @@ sub vm_resume {
     my ($vmid, $skiplock, $nocheck) = @_;
 
     PVE::QemuConfig->lock_config($vmid, sub {
-	my $res = mon_cmd($vmid, 'query-status');
+	# After migration, the VM might not immediately be able to respond to QMP commands, because
+	# activating the block devices might take a bit of time.
+	my $res = mon_cmd($vmid, 'query-status', timeout => 60);
 	my $resume_cmd = 'cont';
 	my $reset = 0;
 	my $conf;
-- 
2.39.2


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel