From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.ebner@proxmox.com>
From: Fabian Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Mon, 25 Apr 2022 14:31:12 +0200
Message-Id: <20220425123112.37261-2-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20220425123112.37261-1-f.ebner@proxmox.com>
References: <20220425123112.37261-1-f.ebner@proxmox.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [pve-devel] [PATCH qemu-server 2/2] migrate: resume initially
 running VM when failing after convergence
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Mon, 25 Apr 2022 12:31:18 -0000

If phase2() is aborted after the migration has already converged, the VM
might be left in the POSTMIGRATE state after migrate_cancel.

(QEMU's migration_iteration_finish() also has a conditional for the
SHUTDOWN state, so it is likely possible to end up there if the VM is
shut down at the right time during migration, but there is no need to
resume in that case.)

Detect the POSTMIGRATE state and resume the VM if it wasn't paused at
the beginning of the migration. There is no direct transition from
POSTMIGRATE to PAUSED, so if the VM was paused at the beginning of the
migration, only log an error.

Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
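
Not part of the commit: a minimal standalone sketch of the decision that
phase2_cleanup() makes after migrate_cancel, for easier review. The
helper post_cancel_action() and its return values ('cont', 'log-error',
'none') are hypothetical and exist only for this illustration. The real
code calls mon_cmd() and $self->log() directly, as in the diff below.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of qemu-server: decide what to do after
# migrate_cancel, given the status string reported by query-status and
# whether the VM was paused when the migration started.
sub post_cancel_action {
    my ($vm_status, $vm_was_paused) = @_;

    # query-status failed, or the VM is in any other state: nothing to do.
    return 'none' if !defined($vm_status) || $vm_status ne 'postmigrate';

    # There is no direct transition from postmigrate to paused, so only a
    # VM that was running before the migration can be restored via 'cont'.
    return $vm_was_paused ? 'log-error' : 'cont';
}

# Print the action chosen for each combination of interest.
for my $case (['postmigrate', 0], ['postmigrate', 1], ['running', 0], [undef, 0]) {
    my ($status, $paused) = @$case;
    printf "%-12s paused=%d -> %s\n",
	$status // '(unknown)', $paused, post_cancel_action($status, $paused);
}
```

Only the first case leads to a 'cont'. A failed query-status (undefined
status) is treated like any non-postmigrate state, which matches the
"$vm_status && ..." guard in the patch.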
 PVE/QemuMigrate.pm | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index c293e294..dfe92325 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1056,6 +1056,22 @@ sub phase2_cleanup {
     };
     $self->log('info', "migrate_cancel error: $@") if $@;
 
+    my $vm_status = eval {
+	mon_cmd($vmid, 'query-status')->{status} or die "no 'status' in result\n";
+    };
+    $self->log('err', "query-status error: $@") if $@;
+
+    # Can end up in POSTMIGRATE state if failure occurred after convergence. Try going back to
+    # original state. Unfortunately, direct transition from POSTMIGRATE to PAUSED is not possible.
+    if ($vm_status && $vm_status eq 'postmigrate') {
+	if (!$self->{vm_was_paused}) {
+	    eval { mon_cmd($vmid, 'cont'); };
+	    $self->log('err', "resuming VM failed: $@") if $@;
+	} else {
+	    $self->log('err', "VM was paused, but ended in postmigrate state");
+	}
+    }
+
     my $conf = $self->{vmconf};
     delete $conf->{lock};
     eval { PVE::QemuConfig->write_config($vmid, $conf) };
-- 
2.30.2