public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH v2 qemu-server 1/4] migration: avoid crash with heavy IO on local VM disk
@ 2024-05-28  8:50 Fiona Ebner
From: Fiona Ebner @ 2024-05-28  8:50 UTC
  To: pve-devel

The drive-mirror job might not be done yet when the migration wants to
inactivate the source's block drives, in which case QEMU fails with:

> bdrv_co_write_req_prepare: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.

This can be prevented by using the 'write-blocking' copy mode (also
called active mode) for the mirror. However, with active mode, the
guest write speed is limited by the synchronous writes to the mirror
target. For this reason, a way to start out in the faster 'background'
mode and later switch to active mode was introduced in QEMU 8.2.

The switch is done only once the mirror jobs for all drives are ready
to be completed, to minimize the time during which guest IO is limited.
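
As a rough illustration of the switch-and-wait logic described above, here is
a minimal, self-contained sketch. It runs against a fake monitor instead of a
real QMP socket: make_fake_monitor and switch_and_wait are hypothetical names
used only here, and the drive IDs are made up. The check for a vanished job is
an addition of this sketch and is not part of the patch. The real helper also
sleeps one second between polls.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a QMP monitor (not part of qemu-server): once a
# job was switched to 'write-blocking', it reports 'actively-synced' only
# after the given number of further polls.
sub make_fake_monitor {
    my (%polls_needed) = @_;
    my $mode = {};
    return sub {
        my ($cmd, %args) = @_;
        if ($cmd eq 'block-job-change') {
            $mode->{$args{id}} = $args{'copy-mode'};
            return {};
        }
        if ($cmd eq 'query-block-jobs') {
            my @jobs;
            for my $id (sort keys %polls_needed) {
                my $synced = 0;
                if (($mode->{$id} // '') eq 'write-blocking') {
                    $synced = 1 if $polls_needed{$id}-- <= 0;
                }
                push @jobs, { device => $id, 'actively-synced' => $synced };
            }
            return \@jobs;
        }
        die "unexpected command '$cmd'\n";
    };
}

# Request active mode for every job first, then poll until all of them
# report 'actively-synced'. Returns the number of polls that were needed.
sub switch_and_wait {
    my ($mon, @jobs) = @_;

    $mon->('block-job-change', id => $_, type => 'mirror', 'copy-mode' => 'write-blocking')
        for @jobs;

    my %pending = map { $_ => 1 } @jobs;
    my $polls = 0;
    while (%pending) {
        $polls++;
        my %by_device = map { $_->{device} => $_ } @{ $mon->('query-block-jobs') };
        for my $job (sort keys %pending) {
            # Assumption of this sketch: treat a missing job as a hard error.
            die "$job: job vanished\n" if !$by_device{$job};
            delete $pending{$job} if $by_device{$job}->{'actively-synced'};
        }
    }
    return $polls;
}

my $mon = make_fake_monitor('drive-scsi0' => 1, 'drive-scsi1' => 3);
my $polls = switch_and_wait($mon, 'drive-scsi0', 'drive-scsi1');
print "all jobs actively synced after $polls polls\n";
```

Requesting the mode change for all jobs before polling keeps the total wait
close to that of the slowest job rather than the sum over all jobs.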

Reported rarely, but steadily over the years:
https://forum.proxmox.com/threads/78954/post-353651
https://forum.proxmox.com/threads/78954/post-380015
https://forum.proxmox.com/threads/100020/post-431660
https://forum.proxmox.com/threads/111831/post-482425
https://forum.proxmox.com/threads/111831/post-499807
https://forum.proxmox.com/threads/137849/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
    * check for running QEMU version instead of installed version
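
To illustrate why the running version matters: a VM started before a package
upgrade keeps using the old binary, so the installed version can be newer
than what the VM actually runs. The sketch below is only an assumption about
how such a check could look. version_at_least is a hypothetical name, the
version values are made up, and the hash shape mirrors the 'qemu' member of
a QMP 'query-version' reply. How runs_at_least_qemu_version is implemented
is not shown in this patch.

```perl
use strict;
use warnings;

# Returns 1 if the version hash (major/minor/micro keys) is at least
# $major.$minor, else 0. The micro version is ignored on purpose.
sub version_at_least {
    my ($ver, $major, $minor) = @_;
    return 1 if $ver->{major} > $major;
    return 0 if $ver->{major} < $major;
    return $ver->{minor} >= $minor ? 1 : 0;
}

# Hypothetical values: the VM was started with 8.1, the package was
# upgraded to 8.2 afterwards.
my $running   = { major => 8, minor => 1, micro => 5 };
my $installed = { major => 8, minor => 2, micro => 0 };

print "installed binary is new enough: ",
    (version_at_least($installed, 8, 2) ? "yes" : "no"), "\n";
print "running binary is new enough:   ",
    (version_at_least($running, 8, 2) ? "yes" : "no"), "\n";
```

Only the second check says whether the running VM accepts 'block-job-change'.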

 PVE/QemuMigrate.pm                    |  8 ++++++
 PVE/QemuServer.pm                     | 41 +++++++++++++++++++++++++++
 test/MigrationTest/QemuMigrateMock.pm |  6 ++++
 3 files changed, 55 insertions(+)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 33d5b2d1..d7ee4a5b 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -1145,6 +1145,14 @@ sub phase2 {
 	    $self->log('info', "$drive: start migration to $nbd_uri");
 	    PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
 	}
+
+	if (PVE::QemuServer::Machine::runs_at_least_qemu_version($vmid, 8, 2)) {
+	    $self->log('info', "switching mirror jobs to actively synced mode");
+	    PVE::QemuServer::qemu_drive_mirror_switch_to_active_mode(
+		$vmid,
+		$self->{storage_migration_jobs},
+	    );
+	}
     }
 
     $self->log('info', "starting online/live migration on $migrate_uri");
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 5df0c96d..d472e805 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -8122,6 +8122,47 @@ sub qemu_blockjobs_cancel {
     }
 }
 
+# Callers should version guard this (only available with a binary >= QEMU 8.2)
+sub qemu_drive_mirror_switch_to_active_mode {
+    my ($vmid, $jobs) = @_;
+
+    my $switching = {};
+
+    for my $job (sort keys $jobs->%*) {
+	print "$job: switching to actively synced mode\n";
+
+	eval {
+	    mon_cmd(
+		$vmid,
+		"block-job-change",
+		id => $job,
+		type => 'mirror',
+		'copy-mode' => 'write-blocking',
+	    );
+	    $switching->{$job} = 1;
+	};
+	die "could not switch mirror job $job to active mode - $@\n" if $@;
+    }
+
+    while (1) {
+	my $stats = mon_cmd($vmid, "query-block-jobs");
+
+	my $running_jobs = {};
+	$running_jobs->{$_->{device}} = $_ for $stats->@*;
+
+	for my $job (sort keys $switching->%*) {
+	    if ($running_jobs->{$job}->{'actively-synced'}) {
+		print "$job: successfully switched to actively synced mode\n";
+		delete $switching->{$job};
+	    }
+	}
+
+	last if scalar(keys $switching->%*) == 0;
+
+	sleep 1;
+    }
+}
+
 # Check for bug #4525: drive-mirror will open the target drive with the same aio setting as the
 # source, but some storages have problems with io_uring, sometimes even leading to crashes.
 my sub clone_disk_check_io_uring {
diff --git a/test/MigrationTest/QemuMigrateMock.pm b/test/MigrationTest/QemuMigrateMock.pm
index 1efabe24..f5b44424 100644
--- a/test/MigrationTest/QemuMigrateMock.pm
+++ b/test/MigrationTest/QemuMigrateMock.pm
@@ -152,6 +152,9 @@ $MigrationTest::Shared::qemu_server_module->mock(
 	}
 	return;
     },
+    qemu_drive_mirror_switch_to_active_mode => sub {
+	return;
+    },
     set_migration_caps => sub {
 	return;
     },
@@ -185,6 +188,9 @@ $qemu_server_machine_module->mock(
 	    if !defined($vm_status->{runningmachine});
 	return $vm_status->{runningmachine};
     },
+    runs_at_least_qemu_version => sub {
+	return 1;
+    },
 );
 
 my $ssh_info_module = Test::MockModule->new("PVE::SSHInfo");
-- 
2.39.2



