From: Fabian Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH v2 qemu-server 13/13] migration: move finishing block jobs to phase2 for better/uniform error handling
Date: Fri, 29 Jan 2021 16:11:43 +0100
Message-ID: <20210129151143.10014-14-f.ebner@proxmox.com>
In-Reply-To: <20210129151143.10014-1-f.ebner@proxmox.com>

This avoids the possibility of dying during phase3_cleanup. Instead of needing
to duplicate the cleanup ourselves, we benefit from phase2_cleanup doing it.

The duplicate cleanup was also very incomplete: it didn't stop the remote kvm
process (leading to 'VM already running' when trying to migrate again
afterwards) even though it removed that VM's disks, and it didn't unlock the
config, didn't close the tunnel and didn't cancel the block-dirty bitmaps.

Since migrate_cancel should do nothing after the (non-storage) migrate process
has completed, even that cleanup step is fine here.

Since phase3 is empty at the moment, the order of operations is still the same.

Also add a test that, without this patch, would complain about finish_tunnel
not being called. The test also checks that local disks are not removed before
the block jobs are finished.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

New in v2

The test would also expose the temporary breakage caused by the wrong order of
patches #8 and #9.

With and without this patch: when dying here, i.e. while finishing the block
jobs, the VM is left in a blocked state (postmigrate) afterwards, because the
(non-storage) migration was successful. Simply resuming it seems to work just
fine. Would it be worth adding a (guarded) resume call to the cleanup too?
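
One possible shape for such a guarded resume, as a rough sketch only and not
part of this patch: the helper name resume_if_postmigrate and the two injected
code refs are hypothetical. In qemu-server they would presumably wrap the QMP
'query-status' and 'cont' commands, but that wiring is an assumption and is
replaced by stubs here so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: resume the VM only if it is stuck in 'postmigrate'.
# Both callbacks are assumptions standing in for the real monitor calls.
# Returns 1 if a resume was issued, 0 otherwise. Never dies, so it is safe
# to call from a cleanup path.
sub resume_if_postmigrate {
    my ($vmid, $query_status, $resume) = @_;

    my $status = eval { $query_status->($vmid) };
    if (my $err = $@) {
        warn "could not query status of VM $vmid - $err";
        return 0;
    }
    # guard: leave VMs in any other state alone
    return 0 if !defined($status) || $status ne 'postmigrate';

    eval { $resume->($vmid) };
    if (my $err = $@) {
        warn "could not resume VM $vmid - $err";
        return 0;
    }
    return 1;
}

# Stubs in place of a real VM monitor.
my %state = (149 => 'postmigrate', 150 => 'running');
my $query = sub {
    my ($vmid) = @_;
    die "no such VM $vmid\n" if !exists $state{$vmid};
    return $state{$vmid};
};
my $resume = sub {
    my ($vmid) = @_;
    $state{$vmid} = 'running';
};
```

Calling resume_if_postmigrate(149, $query, $resume) with these stubs resumes
the VM, while VM 150 is left untouched because it is not in postmigrate.
Errors from either callback are only logged, so a failed resume would not mask
the original migration error.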

 PVE/QemuMigrate.pm                    | 23 ++++++++----------
 test/MigrationTest/QemuMigrateMock.pm |  6 +++++
 test/run_qemu_migrate_tests.pl        | 35 +++++++++++++++++++++++++++
 3 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index b503601..435c1f7 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1134,6 +1134,16 @@ sub phase2 {
 	    die "unable to parse migration status '$stat->{status}' - aborting\n";
 	}
     }
+
+    if ($self->{storage_migration}) {
+	# finish block-job with block-job-cancel, to disconnect source VM from NBD
+	# to avoid it trying to re-establish it. We are in blockjob ready state,
+	# thus, this command changes it to blockjob complete (see qapi docs)
+	eval { PVE::QemuServer::qemu_drive_mirror_monitor($vmid, undef, $self->{storage_migration_jobs}, 'cancel'); };
+	if (my $err = $@) {
+	    die "Failed to complete storage migration: $err\n";
+	}
+    }
 }
 
 sub phase2_cleanup {
@@ -1209,19 +1219,6 @@ sub phase3_cleanup {
 
     my $tunnel = $self->{tunnel};
 
-    if ($self->{storage_migration}) {
-	# finish block-job with block-job-cancel, to disconnect source VM from NBD
-	# to avoid it trying to re-establish it. We are in blockjob ready state,
-	# thus, this command changes to it to blockjob complete (see qapi docs)
-	eval { PVE::QemuServer::qemu_drive_mirror_monitor($vmid, undef, $self->{storage_migration_jobs}, 'cancel'); };
-
-	if (my $err = $@) {
-	    eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $self->{storage_migration_jobs}) };
-	    eval { PVE::QemuMigrate::cleanup_remotedisks($self) };
-	    die "Failed to complete storage migration: $err\n";
-	}
-    }
-
     if ($self->{volume_map}) {
 	my $target_drives = $self->{target_drive};
 
diff --git a/test/MigrationTest/QemuMigrateMock.pm b/test/MigrationTest/QemuMigrateMock.pm
index 2d424e0..8e0b7d0 100644
--- a/test/MigrationTest/QemuMigrateMock.pm
+++ b/test/MigrationTest/QemuMigrateMock.pm
@@ -139,6 +139,12 @@ $MigrationTest::Shared::qemu_server_module->mock(
 	file_set_contents("${RUN_DIR_PATH}/nbd_info", to_json($nbd_info));
     },
     qemu_drive_mirror_monitor => sub {
+	my ($vmid, $vmiddst, $jobs, $completion, $qga) = @_;
+
+	if ($fail_config->{qemu_drive_mirror_monitor} &&
+	    $fail_config->{qemu_drive_mirror_monitor} eq $completion) {
+	    die "qemu_drive_mirror_monitor '$completion' error\n";
+	}
 	return;
     },
     set_migration_caps => sub {
diff --git a/test/run_qemu_migrate_tests.pl b/test/run_qemu_migrate_tests.pl
index 4f7f021..5edea7b 100755
--- a/test/run_qemu_migrate_tests.pl
+++ b/test/run_qemu_migrate_tests.pl
@@ -1444,6 +1444,41 @@ my $tests = [
 	    },
 	},
     },
+    {
+	name => '149_running_unused_block_job_cancel_fail',
+	target => 'pve1',
+	vmid => 149,
+	vm_status => {
+	    running => 1,
+	    runningmachine => 'pc-q35-5.0+pve0',
+	},
+	opts => {
+	    online => 1,
+	    'with-local-disks' => 1,
+	},
+	config_patch => {
+	    scsi1 => undef,
+	    unused0 => 'local-dir:149/vm-149-disk-0.qcow2',
+	},
+	expected_calls => {},
+	expect_die => "qemu_drive_mirror_monitor 'cancel' error",
+	# note that 'cancel' is also used to finish and that's what this test is about
+	fail_config => {
+	    'qemu_drive_mirror_monitor' => 'cancel',
+	},
+	expected => {
+	    source_volids => local_volids_for_vm(149),
+	    target_volids => {},
+	    vm_config => get_patched_config(149, {
+		scsi1 => undef,
+		unused0 => 'local-dir:149/vm-149-disk-0.qcow2',
+	    }),
+	    vm_status => {
+		running => 1,
+		runningmachine => 'pc-q35-5.0+pve0',
+	    },
+	},
+    },
     {
 	name => '149_offline',
 	target => 'pve1',
-- 
2.20.1





Thread overview: 16+ messages
2021-01-29 15:11 [pve-devel] [PATCH-SERIES v2 qemu-server] Cleanup migration code and improve migration disk cleanup Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 01/13] test: migration: add parse_volume_id calls Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 02/13] migration: split sync_disks into two functions Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 03/13] migration: avoid re-scanning all volumes Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 04/13] migration: split out config_update_local_disksizes from scan_local_volumes Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 05/13] migration: fix calculation of bandwith limit for non-disk migration Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 06/13] migration: save targetstorage and bwlimit in local_volumes hash and re-use information Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 07/13] migration: add nbd migrated volumes to volume_map earlier Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 08/13] migration: simplify removal of local volumes and get rid of self->{volumes} Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 09/13] migration: cleanup_remotedisks: simplify and include more disks Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 10/13] migration: use storage_migration for checks instead of online_local_volumes Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 11/13] migration: keep track of replicated volumes via local_volumes Fabian Ebner
2021-01-29 15:11 ` [pve-devel] [PATCH v2 qemu-server 12/13] migration: split out replication from scan_local_volumes Fabian Ebner
2021-01-29 15:11 ` Fabian Ebner [this message]
2021-04-19  6:49 ` [pve-devel] [PATCH-SERIES v2 qemu-server] Cleanup migration code and improve migration disk cleanup Fabian Ebner
2021-04-19 11:50 ` [pve-devel] applied-series: " Thomas Lamprecht
