[pve-devel] [PATCH v2 qemu-server 0/2] remote-migration: migration with different cpu

public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed

* [pve-devel] [PATCH v2 qemu-server 0/2] remote-migration: migration with different cpu
@ 2023-04-25 16:52 Alexandre Derumier
  2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 1/2] migration: move livemigration code in a dedicated sub Alexandre Derumier
  2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param Alexandre Derumier
  0 siblings, 2 replies; 11+ messages in thread
From: Alexandre Derumier @ 2023-04-25 16:52 UTC (permalink / raw)
  To: pve-devel

This patch series allow remote migration between cluster with different cpu model.

A new param is introduced: "target-cpu"

When target-cpu is defined, the live migration with memory transfert
is skipped (as anyway, the target will die with a different cpu).

Then, after the storage copy, we call agent fsfreeze or suspend the vm
to have coherent data.

Then we stop the source vm and stop/start the target vm.

Like this, we can reduce the downtime of migration to only 1 restart.

Changelog v2:

The first version was simply shuting down the target vm,
wihout doing the block-job-complete.

After doing production migration with around 400vms, I had
some fs corruption, like some datas was still in buffer.

This v2 has been tested with another 400vms batch, without
any corruption.

Alexandre Derumier (2):
  migration: move livemigration code in a dedicated sub
  remote-migration: add target-cpu param

 PVE/API2/Qemu.pm   |  18 ++
 PVE/CLI/qm.pm      |   6 +
 PVE/QemuMigrate.pm | 439 ++++++++++++++++++++++++---------------------
 3 files changed, 260 insertions(+), 203 deletions(-)

-- 
2.30.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pve-devel] [PATCH v2 qemu-server 1/2] migration: move livemigration code in a dedicated sub
  2023-04-25 16:52 [pve-devel] [PATCH v2 qemu-server 0/2] remote-migration: migration with different cpu Alexandre Derumier
@ 2023-04-25 16:52 ` Alexandre Derumier
  2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param Alexandre Derumier
  1 sibling, 0 replies; 11+ messages in thread
From: Alexandre Derumier @ 2023-04-25 16:52 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 PVE/QemuMigrate.pm | 420 +++++++++++++++++++++++----------------------
 1 file changed, 214 insertions(+), 206 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 09cc1d8..e182415 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -728,6 +728,219 @@ sub cleanup_bitmaps {
     }
 }
 
+sub live_migration {
+    my ($self, $vmid, $migrate_uri, $spice_port) = @_;
+
+    my $conf = $self->{vmconf};
+
+    $self->log('info', "starting online/live migration on $migrate_uri");
+    $self->{livemigration} = 1;
+
+    # load_defaults
+    my $defaults = PVE::QemuServer::load_defaults();
+
+    $self->log('info', "set migration capabilities");
+    eval { PVE::QemuServer::set_migration_caps($vmid) };
+    warn $@ if $@;
+
+    my $qemu_migrate_params = {};
+
+    # migrate speed can be set via bwlimit (datacenter.cfg and API) and via the
+    # migrate_speed parameter in qm.conf - take the lower of the two.
+    my $bwlimit = $self->get_bwlimit();
+
+    my $migrate_speed = $conf->{migrate_speed} // 0;
+    $migrate_speed *= 1024; # migrate_speed is in MB/s, bwlimit in KB/s
+
+    if ($bwlimit && $migrate_speed) {
+	$migrate_speed = ($bwlimit < $migrate_speed) ? $bwlimit : $migrate_speed;
+    } else {
+	$migrate_speed ||= $bwlimit;
+    }
+    $migrate_speed ||= ($defaults->{migrate_speed} || 0) * 1024;
+
+    if ($migrate_speed) {
+	$migrate_speed *= 1024; # qmp takes migrate_speed in B/s.
+	$self->log('info', "migration speed limit: ". render_bytes($migrate_speed, 1) ."/s");
+    } else {
+	# always set migrate speed as QEMU default to 128 MiBps == 1 Gbps, use 16 GiBps == 128 Gbps
+	$migrate_speed = (16 << 30);
+    }
+    $qemu_migrate_params->{'max-bandwidth'} = int($migrate_speed);
+
+    my $migrate_downtime = $defaults->{migrate_downtime};
+    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
+    # migrate-set-parameters expects limit in ms
+    $migrate_downtime *= 1000;
+    $self->log('info', "migration downtime limit: $migrate_downtime ms");
+    $qemu_migrate_params->{'downtime-limit'} = int($migrate_downtime);
+
+    # set cachesize to 10% of the total memory
+    my $memory =  $conf->{memory} || $defaults->{memory};
+    my $cachesize = int($memory * 1048576 / 10);
+    $cachesize = round_powerof2($cachesize);
+
+    $self->log('info', "migration cachesize: " . render_bytes($cachesize, 1));
+    $qemu_migrate_params->{'xbzrle-cache-size'} = int($cachesize);
+
+    $self->log('info', "set migration parameters");
+    eval {
+	mon_cmd($vmid, "migrate-set-parameters", %{$qemu_migrate_params});
+    };
+    $self->log('info', "migrate-set-parameters error: $@") if $@;
+
+    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga}) && !$self->{opts}->{remote}) {
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my (undef, $proxyticket) = PVE::AccessControl::assemble_spice_ticket($authuser, $vmid, $self->{node});
+
+	my $filename = "/etc/pve/nodes/$self->{node}/pve-ssl.pem";
+	my $subject =  PVE::AccessControl::read_x509_subject_spice($filename);
+
+	$self->log('info', "spice client_migrate_info");
+
+	eval {
+	    mon_cmd($vmid, "client_migrate_info", protocol => 'spice',
+						hostname => $proxyticket, 'port' => 0, 'tls-port' => $spice_port,
+						'cert-subject' => $subject);
+	};
+	$self->log('info', "client_migrate_info error: $@") if $@;
+
+    }
+
+    my $start = time();
+
+    $self->log('info', "start migrate command to $migrate_uri");
+    eval {
+	mon_cmd($vmid, "migrate", uri => $migrate_uri);
+    };
+    my $merr = $@;
+    $self->log('info', "migrate uri => $migrate_uri failed: $merr") if $merr;
+
+    my $last_mem_transferred = 0;
+    my $usleep = 1000000;
+    my $i = 0;
+    my $err_count = 0;
+    my $lastrem = undef;
+    my $downtimecounter = 0;
+    while (1) {
+	$i++;
+	my $avglstat = $last_mem_transferred ? $last_mem_transferred / $i : 0;
+
+	usleep($usleep);
+
+	my $stat = eval { mon_cmd($vmid, "query-migrate") };
+	if (my $err = $@) {
+	    $err_count++;
+	    warn "query migrate failed: $err\n";
+	    $self->log('info', "query migrate failed: $err");
+	    if ($err_count <= 5) {
+		usleep(1_000_000);
+		next;
+	    }
+	    die "too many query migrate failures - aborting\n";
+	}
+
+	my $status = $stat->{status};
+	if (defined($status) && $status =~ m/^(setup)$/im) {
+	    sleep(1);
+	    next;
+	}
+
+	if (!defined($status) || $status !~ m/^(active|completed|failed|cancelled)$/im) {
+	    die $merr if $merr;
+	    die "unable to parse migration status '$status' - aborting\n";
+	}
+	$merr = undef;
+	$err_count = 0;
+
+	my $memstat = $stat->{ram};
+
+	if ($status eq 'completed') {
+	    my $delay = time() - $start;
+	    if ($delay > 0) {
+		my $total = $memstat->{total} || 0;
+		my $avg_speed = render_bytes($total / $delay, 1);
+		my $downtime = $stat->{downtime} || 0;
+		$self->log('info', "average migration speed: $avg_speed/s - downtime $downtime ms");
+	    }
+	}
+
+	if ($status eq 'failed' || $status eq 'cancelled') {
+	    my $message = $stat->{'error-desc'} ? "$status - $stat->{'error-desc'}" : $status;
+	    $self->log('info', "migration status error: $message");
+	    die "aborting\n"
+	}
+
+	if ($status ne 'active') {
+	    $self->log('info', "migration status: $status");
+	    last;
+	}
+
+	if ($memstat->{transferred} ne $last_mem_transferred) {
+	    my $trans = $memstat->{transferred} || 0;
+	    my $rem = $memstat->{remaining} || 0;
+	    my $total = $memstat->{total} || 0;
+	    my $speed = ($memstat->{'pages-per-second'} // 0) * ($memstat->{'page-size'} // 0);
+	    my $dirty_rate = ($memstat->{'dirty-pages-rate'} // 0) * ($memstat->{'page-size'} // 0);
+
+	    # reduce sleep if remainig memory is lower than the average transfer speed
+	    $usleep = 100_000 if $avglstat && $rem < $avglstat;
+
+	    # also reduce loggin if we poll more frequent
+	    my $should_log = $usleep > 100_000 ? 1 : ($i % 10) == 0;
+
+	    my $total_h = render_bytes($total, 1);
+	    my $transferred_h = render_bytes($trans, 1);
+	    my $speed_h = render_bytes($speed, 1);
+
+	    my $progress = "transferred $transferred_h of $total_h VM-state, ${speed_h}/s";
+
+	    if ($dirty_rate > $speed) {
+		my $dirty_rate_h = render_bytes($dirty_rate, 1);
+		$progress .= ", VM dirties lots of memory: $dirty_rate_h/s";
+	    }
+
+	    $self->log('info', "migration $status, $progress") if $should_log;
+
+	    my $xbzrle = $stat->{"xbzrle-cache"} || {};
+	    my ($xbzrlebytes, $xbzrlepages) = $xbzrle->@{'bytes', 'pages'};
+	    if ($xbzrlebytes || $xbzrlepages) {
+		my $bytes_h = render_bytes($xbzrlebytes, 1);
+
+		my $msg = "send updates to $xbzrlepages pages in $bytes_h encoded memory";
+
+		$msg .= sprintf(", cache-miss %.2f%%", $xbzrle->{'cache-miss-rate'} * 100)
+		    if $xbzrle->{'cache-miss-rate'};
+
+		$msg .= ", overflow $xbzrle->{overflow}" if $xbzrle->{overflow};
+
+		$self->log('info', "xbzrle: $msg") if $should_log;
+	    }
+
+	    if (($lastrem && $rem > $lastrem) || ($rem == 0)) {
+		$downtimecounter++;
+	    }
+	    $lastrem = $rem;
+
+	    if ($downtimecounter > 5) {
+		$downtimecounter = 0;
+		$migrate_downtime *= 2;
+		$self->log('info', "auto-increased downtime to continue migration: $migrate_downtime ms");
+		eval {
+		    # migrate-set-parameters does not touch values not
+		    # specified, so this only changes downtime-limit
+		    mon_cmd($vmid, "migrate-set-parameters", 'downtime-limit' => int($migrate_downtime));
+		};
+		$self->log('info', "migrate-set-parameters error: $@") if $@;
+	    }
+	}
+
+	$last_mem_transferred = $memstat->{transferred};
+    }
+}
+
 sub phase1 {
     my ($self, $vmid) = @_;
 
@@ -1138,212 +1351,7 @@ sub phase2 {
 	}
     }
 
-    $self->log('info', "starting online/live migration on $migrate_uri");
-    $self->{livemigration} = 1;
-
-    # load_defaults
-    my $defaults = PVE::QemuServer::load_defaults();
-
-    $self->log('info', "set migration capabilities");
-    eval { PVE::QemuServer::set_migration_caps($vmid) };
-    warn $@ if $@;
-
-    my $qemu_migrate_params = {};
-
-    # migrate speed can be set via bwlimit (datacenter.cfg and API) and via the
-    # migrate_speed parameter in qm.conf - take the lower of the two.
-    my $bwlimit = $self->get_bwlimit();
-
-    my $migrate_speed = $conf->{migrate_speed} // 0;
-    $migrate_speed *= 1024; # migrate_speed is in MB/s, bwlimit in KB/s
-
-    if ($bwlimit && $migrate_speed) {
-	$migrate_speed = ($bwlimit < $migrate_speed) ? $bwlimit : $migrate_speed;
-    } else {
-	$migrate_speed ||= $bwlimit;
-    }
-    $migrate_speed ||= ($defaults->{migrate_speed} || 0) * 1024;
-
-    if ($migrate_speed) {
-	$migrate_speed *= 1024; # qmp takes migrate_speed in B/s.
-	$self->log('info', "migration speed limit: ". render_bytes($migrate_speed, 1) ."/s");
-    } else {
-	# always set migrate speed as QEMU default to 128 MiBps == 1 Gbps, use 16 GiBps == 128 Gbps
-	$migrate_speed = (16 << 30);
-    }
-    $qemu_migrate_params->{'max-bandwidth'} = int($migrate_speed);
-
-    my $migrate_downtime = $defaults->{migrate_downtime};
-    $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime});
-    # migrate-set-parameters expects limit in ms
-    $migrate_downtime *= 1000;
-    $self->log('info', "migration downtime limit: $migrate_downtime ms");
-    $qemu_migrate_params->{'downtime-limit'} = int($migrate_downtime);
-
-    # set cachesize to 10% of the total memory
-    my $memory =  $conf->{memory} || $defaults->{memory};
-    my $cachesize = int($memory * 1048576 / 10);
-    $cachesize = round_powerof2($cachesize);
-
-    $self->log('info', "migration cachesize: " . render_bytes($cachesize, 1));
-    $qemu_migrate_params->{'xbzrle-cache-size'} = int($cachesize);
-
-    $self->log('info', "set migration parameters");
-    eval {
-	mon_cmd($vmid, "migrate-set-parameters", %{$qemu_migrate_params});
-    };
-    $self->log('info', "migrate-set-parameters error: $@") if $@;
-
-    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga}) && !$self->{opts}->{remote}) {
-	my $rpcenv = PVE::RPCEnvironment::get();
-	my $authuser = $rpcenv->get_user();
-
-	my (undef, $proxyticket) = PVE::AccessControl::assemble_spice_ticket($authuser, $vmid, $self->{node});
-
-	my $filename = "/etc/pve/nodes/$self->{node}/pve-ssl.pem";
-	my $subject =  PVE::AccessControl::read_x509_subject_spice($filename);
-
-	$self->log('info', "spice client_migrate_info");
-
-	eval {
-	    mon_cmd($vmid, "client_migrate_info", protocol => 'spice',
-						hostname => $proxyticket, 'port' => 0, 'tls-port' => $spice_port,
-						'cert-subject' => $subject);
-	};
-	$self->log('info', "client_migrate_info error: $@") if $@;
-
-    }
-
-    my $start = time();
-
-    $self->log('info', "start migrate command to $migrate_uri");
-    eval {
-	mon_cmd($vmid, "migrate", uri => $migrate_uri);
-    };
-    my $merr = $@;
-    $self->log('info', "migrate uri => $migrate_uri failed: $merr") if $merr;
-
-    my $last_mem_transferred = 0;
-    my $usleep = 1000000;
-    my $i = 0;
-    my $err_count = 0;
-    my $lastrem = undef;
-    my $downtimecounter = 0;
-    while (1) {
-	$i++;
-	my $avglstat = $last_mem_transferred ? $last_mem_transferred / $i : 0;
-
-	usleep($usleep);
-
-	my $stat = eval { mon_cmd($vmid, "query-migrate") };
-	if (my $err = $@) {
-	    $err_count++;
-	    warn "query migrate failed: $err\n";
-	    $self->log('info', "query migrate failed: $err");
-	    if ($err_count <= 5) {
-		usleep(1_000_000);
-		next;
-	    }
-	    die "too many query migrate failures - aborting\n";
-	}
-
-	my $status = $stat->{status};
-	if (defined($status) && $status =~ m/^(setup)$/im) {
-	    sleep(1);
-	    next;
-	}
-
-	if (!defined($status) || $status !~ m/^(active|completed|failed|cancelled)$/im) {
-	    die $merr if $merr;
-	    die "unable to parse migration status '$status' - aborting\n";
-	}
-	$merr = undef;
-	$err_count = 0;
-
-	my $memstat = $stat->{ram};
-
-	if ($status eq 'completed') {
-	    my $delay = time() - $start;
-	    if ($delay > 0) {
-		my $total = $memstat->{total} || 0;
-		my $avg_speed = render_bytes($total / $delay, 1);
-		my $downtime = $stat->{downtime} || 0;
-		$self->log('info', "average migration speed: $avg_speed/s - downtime $downtime ms");
-	    }
-	}
-
-	if ($status eq 'failed' || $status eq 'cancelled') {
-	    my $message = $stat->{'error-desc'} ? "$status - $stat->{'error-desc'}" : $status;
-	    $self->log('info', "migration status error: $message");
-	    die "aborting\n"
-	}
-
-	if ($status ne 'active') {
-	    $self->log('info', "migration status: $status");
-	    last;
-	}
-
-	if ($memstat->{transferred} ne $last_mem_transferred) {
-	    my $trans = $memstat->{transferred} || 0;
-	    my $rem = $memstat->{remaining} || 0;
-	    my $total = $memstat->{total} || 0;
-	    my $speed = ($memstat->{'pages-per-second'} // 0) * ($memstat->{'page-size'} // 0);
-	    my $dirty_rate = ($memstat->{'dirty-pages-rate'} // 0) * ($memstat->{'page-size'} // 0);
-
-	    # reduce sleep if remainig memory is lower than the average transfer speed
-	    $usleep = 100_000 if $avglstat && $rem < $avglstat;
-
-	    # also reduce loggin if we poll more frequent
-	    my $should_log = $usleep > 100_000 ? 1 : ($i % 10) == 0;
-
-	    my $total_h = render_bytes($total, 1);
-	    my $transferred_h = render_bytes($trans, 1);
-	    my $speed_h = render_bytes($speed, 1);
-
-	    my $progress = "transferred $transferred_h of $total_h VM-state, ${speed_h}/s";
-
-	    if ($dirty_rate > $speed) {
-		my $dirty_rate_h = render_bytes($dirty_rate, 1);
-		$progress .= ", VM dirties lots of memory: $dirty_rate_h/s";
-	    }
-
-	    $self->log('info', "migration $status, $progress") if $should_log;
-
-	    my $xbzrle = $stat->{"xbzrle-cache"} || {};
-	    my ($xbzrlebytes, $xbzrlepages) = $xbzrle->@{'bytes', 'pages'};
-	    if ($xbzrlebytes || $xbzrlepages) {
-		my $bytes_h = render_bytes($xbzrlebytes, 1);
-
-		my $msg = "send updates to $xbzrlepages pages in $bytes_h encoded memory";
-
-		$msg .= sprintf(", cache-miss %.2f%%", $xbzrle->{'cache-miss-rate'} * 100)
-		    if $xbzrle->{'cache-miss-rate'};
-
-		$msg .= ", overflow $xbzrle->{overflow}" if $xbzrle->{overflow};
-
-		$self->log('info', "xbzrle: $msg") if $should_log;
-	    }
-
-	    if (($lastrem && $rem > $lastrem) || ($rem == 0)) {
-		$downtimecounter++;
-	    }
-	    $lastrem = $rem;
-
-	    if ($downtimecounter > 5) {
-		$downtimecounter = 0;
-		$migrate_downtime *= 2;
-		$self->log('info', "auto-increased downtime to continue migration: $migrate_downtime ms");
-		eval {
-		    # migrate-set-parameters does not touch values not
-		    # specified, so this only changes downtime-limit
-		    mon_cmd($vmid, "migrate-set-parameters", 'downtime-limit' => int($migrate_downtime));
-		};
-		$self->log('info', "migrate-set-parameters error: $@") if $@;
-	    }
-	}
-
-	$last_mem_transferred = $memstat->{transferred};
-    }
+    live_migration($self, $vmid, $migrate_uri, $spice_port);
 
     if ($self->{storage_migration}) {
 	# finish block-job with block-job-cancel, to disconnect source VM from NBD
-- 
2.30.2




^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param
  2023-04-25 16:52 [pve-devel] [PATCH v2 qemu-server 0/2] remote-migration: migration with different cpu Alexandre Derumier
  2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 1/2] migration: move livemigration code in a dedicated sub Alexandre Derumier
@ 2023-04-25 16:52 ` Alexandre Derumier
  2023-04-26 13:14   ` Fabian Grünbichler
  1 sibling, 1 reply; 11+ messages in thread
From: Alexandre Derumier @ 2023-04-25 16:52 UTC (permalink / raw)
  To: pve-devel

This patch add support for remote migration when target
cpu model is different.

The target vm is restart after the migration

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 PVE/API2/Qemu.pm   | 18 ++++++++++++++++++
 PVE/CLI/qm.pm      |  6 ++++++
 PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 587bb22..6703c87 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({
 		optional => 1,
 		default => 0,
 	    },
+	    'target-cpu' => {
+		optional => 1,
+		description => "Target Emulated CPU model. For online migration, the storage is live migrate, but the memory migration is skipped and the target vm is restarted.",
+		type => 'string',
+		format => 'pve-vm-cpu-conf',
+	    },
 	    'target-storage' => get_standard_option('pve-targetstorage', {
 		completion => \&PVE::QemuServer::complete_migration_storage,
 		optional => 0,
@@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({
 	raise_param_exc({ 'target-bridge' => "failed to parse bridge map: $@" })
 	    if $@;
 
+	my $target_cpu = extract_param($param, 'target-cpu');
+
 	die "remote migration requires explicit storage mapping!\n"
 	    if $storagemap->{identity};
 
 	$param->{storagemap} = $storagemap;
 	$param->{bridgemap} = $bridgemap;
+	$param->{targetcpu} = $target_cpu;
 	$param->{remote} = {
 	    conn => $conn_args, # re-use fingerprint for tunnel
 	    client => $api_client,
@@ -5604,6 +5613,15 @@ __PACKAGE__->register_method({
 		    PVE::QemuServer::nbd_stop($state->{vmid});
 		    return;
 		},
+		'restart' => sub {
+		    PVE::QemuServer::vm_stop(undef, $state->{vmid}, 1, 1);
+		    my $info = PVE::QemuServer::vm_start_nolock(
+			$state->{storecfg},
+			$state->{vmid},
+			$state->{conf},
+		    );
+		    return;
+		},
 		'resume' => sub {
 		    if (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) {
 			PVE::QemuServer::vm_resume($state->{vmid}, 1, 1);
diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
index c3c2982..06c74c1 100755
--- a/PVE/CLI/qm.pm
+++ b/PVE/CLI/qm.pm
@@ -189,6 +189,12 @@ __PACKAGE__->register_method({
 		optional => 1,
 		default => 0,
 	    },
+	    'target-cpu' => {
+		optional => 1,
+		description => "Target Emulated CPU model. For online migration, the storage is live migrate, but the memory migration is skipped and the target vm is restarted.",
+		type => 'string',
+		format => 'pve-vm-cpu-conf',
+	    },
 	    'target-storage' => get_standard_option('pve-targetstorage', {
 		completion => \&PVE::QemuServer::complete_migration_storage,
 		optional => 0,
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index e182415..04f8053 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -731,6 +731,11 @@ sub cleanup_bitmaps {
 sub live_migration {
     my ($self, $vmid, $migrate_uri, $spice_port) = @_;
 
+    if($self->{opts}->{targetcpu}){
+        $self->log('info', "target cpu is different - skip live migration.");
+        return;
+    }
+
     my $conf = $self->{vmconf};
 
     $self->log('info', "starting online/live migration on $migrate_uri");
@@ -995,6 +1000,7 @@ sub phase1_remote {
     my $remote_conf = PVE::QemuConfig->load_config($vmid);
     PVE::QemuConfig->update_volume_ids($remote_conf, $self->{volume_map});
 
+    $remote_conf->{cpu} = $self->{opts}->{targetcpu};
     my $bridges = map_bridges($remote_conf, $self->{opts}->{bridgemap});
     for my $target (keys $bridges->%*) {
 	for my $nic (keys $bridges->{$target}->%*) {
@@ -1354,6 +1360,21 @@ sub phase2 {
     live_migration($self, $vmid, $migrate_uri, $spice_port);
 
     if ($self->{storage_migration}) {
+
+        #freeze source vm io/s if target cpu is different (no livemigration)
+	if ($self->{opts}->{targetcpu}) {
+	    my $agent_running = $self->{conf}->{agent} && PVE::QemuServer::qga_check_running($vmid);
+	    if ($agent_running) {
+		print "freeze filesystem\n";
+		eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); };
+		die $@ if $@;
+	    } else {
+		print "suspend vm\n";
+		eval { PVE::QemuServer::vm_suspend($vmid, 1); };
+		warn $@ if $@;
+	    }
+	}
+
 	# finish block-job with block-job-cancel, to disconnect source VM from NBD
 	# to avoid it trying to re-establish it. We are in blockjob ready state,
 	# thus, this command changes to it to blockjob complete (see qapi docs)
@@ -1608,6 +1629,10 @@ sub phase3_cleanup {
     # clear migrate lock
     if ($tunnel && $tunnel->{version} >= 2) {
 	PVE::Tunnel::write_tunnel($tunnel, 10, "unlock");
+	if ($self->{opts}->{targetcpu}) {
+	    $self->log('info', "target cpu is different - restart target vm.");
+	    PVE::Tunnel::write_tunnel($tunnel, 10, 'restart');
+	}
 
 	PVE::Tunnel::finish_tunnel($tunnel);
     } else {
-- 
2.30.2




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param
  2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param Alexandre Derumier
@ 2023-04-26 13:14   ` Fabian Grünbichler
  2023-04-27  5:50     ` DERUMIER, Alexandre
  2023-09-28 14:58     ` DERUMIER, Alexandre
  0 siblings, 2 replies; 11+ messages in thread
From: Fabian Grünbichler @ 2023-04-26 13:14 UTC (permalink / raw)
  To: Proxmox VE development discussion

On April 25, 2023 6:52 pm, Alexandre Derumier wrote:
> This patch add support for remote migration when target
> cpu model is different.
> 
> The target vm is restart after the migration

so this effectively introduces a new "hybrid" migration mode ;) the
changes are a bit smaller than I expected (in part thanks to patch #1),
which is good.

there are semi-frequent requests for another variant (also applicable to
containers) in the form of a two phase migration
- storage migrate
- stop guest
- incremental storage migrate
- start guest on target

given that it might make sense to save-guard this implementation here,
and maybe switch to a new "mode" parameter?

online => switching CPU not allowed
offline or however-we-call-this-new-mode (or in the future, two-phase-restart) => switching CPU allowed

> 
> Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> ---
>  PVE/API2/Qemu.pm   | 18 ++++++++++++++++++
>  PVE/CLI/qm.pm      |  6 ++++++
>  PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++
>  3 files changed, 49 insertions(+)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index 587bb22..6703c87 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({
>  		optional => 1,
>  		default => 0,
>  	    },
> +	    'target-cpu' => {
> +		optional => 1,
> +		description => "Target Emulated CPU model. For online migration, the storage is live migrate, but the memory migration is skipped and the target vm is restarted.",
> +		type => 'string',
> +		format => 'pve-vm-cpu-conf',
> +	    },
>  	    'target-storage' => get_standard_option('pve-targetstorage', {
>  		completion => \&PVE::QemuServer::complete_migration_storage,
>  		optional => 0,
> @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({
>  	raise_param_exc({ 'target-bridge' => "failed to parse bridge map: $@" })
>  	    if $@;
>  
> +	my $target_cpu = extract_param($param, 'target-cpu');

this is okay

> +
>  	die "remote migration requires explicit storage mapping!\n"
>  	    if $storagemap->{identity};
>  
>  	$param->{storagemap} = $storagemap;
>  	$param->{bridgemap} = $bridgemap;
> +	$param->{targetcpu} = $target_cpu;

but this is a bit confusing with the variable/hash key naming ;)

>  	$param->{remote} = {
>  	    conn => $conn_args, # re-use fingerprint for tunnel
>  	    client => $api_client,
> @@ -5604,6 +5613,15 @@ __PACKAGE__->register_method({
>  		    PVE::QemuServer::nbd_stop($state->{vmid});
>  		    return;
>  		},
> +		'restart' => sub {
> +		    PVE::QemuServer::vm_stop(undef, $state->{vmid}, 1, 1);
> +		    my $info = PVE::QemuServer::vm_start_nolock(
> +			$state->{storecfg},
> +			$state->{vmid},
> +			$state->{conf},
> +		    );
> +		    return;
> +		},
>  		'resume' => sub {
>  		    if (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) {
>  			PVE::QemuServer::vm_resume($state->{vmid}, 1, 1);
> diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
> index c3c2982..06c74c1 100755
> --- a/PVE/CLI/qm.pm
> +++ b/PVE/CLI/qm.pm
> @@ -189,6 +189,12 @@ __PACKAGE__->register_method({
>  		optional => 1,
>  		default => 0,
>  	    },
> +	    'target-cpu' => {
> +		optional => 1,
> +		description => "Target Emulated CPU model. For online migration, the storage is live migrate, but the memory migration is skipped and the target vm is restarted.",
> +		type => 'string',
> +		format => 'pve-vm-cpu-conf',
> +	    },
>  	    'target-storage' => get_standard_option('pve-targetstorage', {
>  		completion => \&PVE::QemuServer::complete_migration_storage,
>  		optional => 0,
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index e182415..04f8053 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -731,6 +731,11 @@ sub cleanup_bitmaps {
>  sub live_migration {
>      my ($self, $vmid, $migrate_uri, $spice_port) = @_;
>  
> +    if($self->{opts}->{targetcpu}){
> +        $self->log('info', "target cpu is different - skip live migration.");
> +        return;
> +    }
> +
>      my $conf = $self->{vmconf};
>  
>      $self->log('info', "starting online/live migration on $migrate_uri");
> @@ -995,6 +1000,7 @@ sub phase1_remote {
>      my $remote_conf = PVE::QemuConfig->load_config($vmid);
>      PVE::QemuConfig->update_volume_ids($remote_conf, $self->{volume_map});
>  
> +    $remote_conf->{cpu} = $self->{opts}->{targetcpu};

do we need permission checks here (or better, somewhere early on, for doing this here)

>      my $bridges = map_bridges($remote_conf, $self->{opts}->{bridgemap});
>      for my $target (keys $bridges->%*) {
>  	for my $nic (keys $bridges->{$target}->%*) {
> @@ -1354,6 +1360,21 @@ sub phase2 {
>      live_migration($self, $vmid, $migrate_uri, $spice_port);
>  
>      if ($self->{storage_migration}) {
> +
> +        #freeze source vm io/s if target cpu is different (no livemigration)
> +	if ($self->{opts}->{targetcpu}) {
> +	    my $agent_running = $self->{conf}->{agent} && PVE::QemuServer::qga_check_running($vmid);
> +	    if ($agent_running) {
> +		print "freeze filesystem\n";
> +		eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); };
> +		die $@ if $@;

die here

> +	    } else {
> +		print "suspend vm\n";
> +		eval { PVE::QemuServer::vm_suspend($vmid, 1); };
> +		warn $@ if $@;

but warn here?

I'd like some more rationale for these two variants, what are the pros
and cons? should we make it configurable?

> +	    }
> +	}
> +
>  	# finish block-job with block-job-cancel, to disconnect source VM from NBD
>  	# to avoid it trying to re-establish it. We are in blockjob ready state,
>  	# thus, this command changes to it to blockjob complete (see qapi docs)
> @@ -1608,6 +1629,10 @@ sub phase3_cleanup {
>      # clear migrate lock
>      if ($tunnel && $tunnel->{version} >= 2) {
>  	PVE::Tunnel::write_tunnel($tunnel, 10, "unlock");
> +	if ($self->{opts}->{targetcpu}) {
> +	    $self->log('info', "target cpu is different - restart target vm.");
> +	    PVE::Tunnel::write_tunnel($tunnel, 10, 'restart');
> +	}
>  
>  	PVE::Tunnel::finish_tunnel($tunnel);
>      } else {
> -- 
> 2.30.2
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> 




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param
  2023-04-26 13:14   ` Fabian Grünbichler
@ 2023-04-27  5:50     ` DERUMIER, Alexandre
  2023-04-27  7:32       ` Fabian Grünbichler
  2023-09-28 14:58     ` DERUMIER, Alexandre
  1 sibling, 1 reply; 11+ messages in thread
From: DERUMIER, Alexandre @ 2023-04-27  5:50 UTC (permalink / raw)
  To: pve-devel

Hi,

Le mercredi 26 avril 2023 à 15:14 +0200, Fabian Grünbichler a écrit :
> On April 25, 2023 6:52 pm, Alexandre Derumier wrote:
> > This patch add support for remote migration when target
> > cpu model is different.
> > 
> > The target vm is restart after the migration
> 
> so this effectively introduces a new "hybrid" migration mode ;) the
> changes are a bit smaller than I expected (in part thanks to patch
> #1),
> which is good.
> 
> there are semi-frequent requests for another variant (also applicable
> to
> containers) in the form of a two phase migration
> - storage migrate
> - stop guest
> - incremental storage migrate
> - start guest on target
> 

But I'm not sure how to to an incremental storage migrate, without
storage snapshot send|receiv.  (so zfs && rbd could work).

- Vm/ct is running
- do a first snapshot + sync to target with zfs|rbd send|receive
- stop the guest
- do a second snapshot + incremental sync + sync to target with zfs|rbd
send|receive
- start the guest on remote


(or maybe for vm, without snapshot, with a dirty bitmap ? But we need
to be able to write the dirty map content to disk somewhere after vm
stop, and reread it for the last increment )

- vm is running
- create a dirty-bitmap and start sync with qemu-block-storage
- stop the vm && save the dirty bitmap
- reread the dirtymap && do incremental sync (with the new qemu-daemon-
storage or starting the vm paused ?


And currently we don't support yet offline storage migration. (BTW,
This is also breaking migration with unused disk).
I don't known if we can send send|receiv transfert through the tunnel ?
(I never tested it)


> given that it might make sense to save-guard this implementation
> here,
> and maybe switch to a new "mode" parameter?
> 
> online => switching CPU not allowed
> offline or however-we-call-this-new-mode (or in the future, two-
> phase-restart) => switching CPU allowed
> 

Yes, I was thinking about that too.
Maybe not "offline", because maybe we want to implement a real offline
mode later.
But simply "restart" ?



> > 
> > Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> > ---
> >  PVE/API2/Qemu.pm   | 18 ++++++++++++++++++
> >  PVE/CLI/qm.pm      |  6 ++++++
> >  PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++
> >  3 files changed, 49 insertions(+)
> > 
> > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> > index 587bb22..6703c87 100644
> > --- a/PVE/API2/Qemu.pm
> > +++ b/PVE/API2/Qemu.pm
> > @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({
> >                 optional => 1,
> >                 default => 0,
> >             },
> > +           'target-cpu' => {
> > +               optional => 1,
> > +               description => "Target Emulated CPU model. For
> > online migration, the storage is live migrate, but the memory
> > migration is skipped and the target vm is restarted.",
> > +               type => 'string',
> > +               format => 'pve-vm-cpu-conf',
> > +           },
> >             'target-storage' => get_standard_option('pve-
> > targetstorage', {
> >                 completion =>
> > \&PVE::QemuServer::complete_migration_storage,
> >                 optional => 0,
> > @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({
> >         raise_param_exc({ 'target-bridge' => "failed to parse
> > bridge map: $@" })
> >             if $@;
> >  
> > +       my $target_cpu = extract_param($param, 'target-cpu');
> 
> this is okay
> 
> > +
> >         die "remote migration requires explicit storage mapping!\n"
> >             if $storagemap->{identity};
> >  
> >         $param->{storagemap} = $storagemap;
> >         $param->{bridgemap} = $bridgemap;
> > +       $param->{targetcpu} = $target_cpu;
> 
> but this is a bit confusing with the variable/hash key naming ;)
> 
> >         $param->{remote} = {
> >             conn => $conn_args, # re-use fingerprint for tunnel
> >             client => $api_client,
> > @@ -5604,6 +5613,15 @@ __PACKAGE__->register_method({
> >                     PVE::QemuServer::nbd_stop($state->{vmid});
> >                     return;
> >                 },
> > +               'restart' => sub {
> > +                   PVE::QemuServer::vm_stop(undef, $state->{vmid},
> > 1, 1);
> > +                   my $info = PVE::QemuServer::vm_start_nolock(
> > +                       $state->{storecfg},
> > +                       $state->{vmid},
> > +                       $state->{conf},
> > +                   );
> > +                   return;
> > +               },
> >                 'resume' => sub {
> >                     if
> > (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) {
> >                         PVE::QemuServer::vm_resume($state->{vmid},
> > 1, 1);
> > diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
> > index c3c2982..06c74c1 100755
> > --- a/PVE/CLI/qm.pm
> > +++ b/PVE/CLI/qm.pm
> > @@ -189,6 +189,12 @@ __PACKAGE__->register_method({
> >                 optional => 1,
> >                 default => 0,
> >             },
> > +           'target-cpu' => {
> > +               optional => 1,
> > +               description => "Target Emulated CPU model. For
> > online migration, the storage is live migrate, but the memory
> > migration is skipped and the target vm is restarted.",
> > +               type => 'string',
> > +               format => 'pve-vm-cpu-conf',
> > +           },
> >             'target-storage' => get_standard_option('pve-
> > targetstorage', {
> >                 completion =>
> > \&PVE::QemuServer::complete_migration_storage,
> >                 optional => 0,
> > diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> > index e182415..04f8053 100644
> > --- a/PVE/QemuMigrate.pm
> > +++ b/PVE/QemuMigrate.pm
> > @@ -731,6 +731,11 @@ sub cleanup_bitmaps {
> >  sub live_migration {
> >      my ($self, $vmid, $migrate_uri, $spice_port) = @_;
> >  
> > +    if($self->{opts}->{targetcpu}){
> > +        $self->log('info', "target cpu is different - skip live
> > migration.");
> > +        return;
> > +    }
> > +
> >      my $conf = $self->{vmconf};
> >  
> >      $self->log('info', "starting online/live migration on
> > $migrate_uri");
> > @@ -995,6 +1000,7 @@ sub phase1_remote {
> >      my $remote_conf = PVE::QemuConfig->load_config($vmid);
> >      PVE::QemuConfig->update_volume_ids($remote_conf, $self-
> > >{volume_map});
> >  
> > +    $remote_conf->{cpu} = $self->{opts}->{targetcpu};
> 
> do we need permission checks here (or better, somewhere early on, for
> doing this here)
> 
> >      my $bridges = map_bridges($remote_conf, $self->{opts}-
> > >{bridgemap});
> >      for my $target (keys $bridges->%*) {
> >         for my $nic (keys $bridges->{$target}->%*) {
> > @@ -1354,6 +1360,21 @@ sub phase2 {
> >      live_migration($self, $vmid, $migrate_uri, $spice_port);
> >  
> >      if ($self->{storage_migration}) {
> > +
> > +        #freeze source vm io/s if target cpu is different (no
> > livemigration)
> > +       if ($self->{opts}->{targetcpu}) {
> > +           my $agent_running = $self->{conf}->{agent} &&
> > PVE::QemuServer::qga_check_running($vmid);
> > +           if ($agent_running) {
> > +               print "freeze filesystem\n";
> > +               eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); };
> > +               die $@ if $@;
> 
> die here
> 
> > +           } else {
> > +               print "suspend vm\n";
> > +               eval { PVE::QemuServer::vm_suspend($vmid, 1); };
> > +               warn $@ if $@;
> 
> but warn here?
> 
> I'd like some more rationale for these two variants, what are the
> pros
> and cons? should we make it configurable?
> 
> > +           }
> > +       }
> > +
> >         # finish block-job with block-job-cancel, to disconnect
> > source VM from NBD
> >         # to avoid it trying to re-establish it. We are in blockjob
> > ready state,
> >         # thus, this command changes to it to blockjob complete
> > (see qapi docs)
> > @@ -1608,6 +1629,10 @@ sub phase3_cleanup {
> >      # clear migrate lock
> >      if ($tunnel && $tunnel->{version} >= 2) {
> >         PVE::Tunnel::write_tunnel($tunnel, 10, "unlock");
> > +       if ($self->{opts}->{targetcpu}) {
> > +           $self->log('info', "target cpu is different - restart
> > target vm.");
> > +           PVE::Tunnel::write_tunnel($tunnel, 10, 'restart');
> > +       }
> >  
> >         PVE::Tunnel::finish_tunnel($tunnel);
> >      } else {
> > -- 
> > 2.30.2
> > 
> > 
> > _______________________________________________
> > pve-devel mailing list
> > pve-devel@lists.proxmox.com
> > https://antiphishing.cetsi.fr/proxy/v3?i=Zk92VEFKaGQ4Ums4cnZEUWMTpfHaXFQGRw1_CnOoOH0&r=bHA1dGV3NWJQVUloaWNFUZPm0fiiBviaiy_RDav2GQ1U4uy6lsDDv3uBszpvvWYQN5FqKqFD6WPYupfAUP1c9g&f=SlhDbE9uS2laS2JaZFpNWvmsxai1zlJP9llgnl5HIv-4jAji8Dh2BQawzxID5bzr6Uv-3EQd-eluQbsPfcUOTg&u=https%3A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel&k=XRKU
> > 
> > 
> > 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://antiphishing.cetsi.fr/proxy/v3?i=Zk92VEFKaGQ4Ums4cnZEUWMTpfHaXFQGRw1_CnOoOH0&r=bHA1dGV3NWJQVUloaWNFUZPm0fiiBviaiy_RDav2GQ1U4uy6lsDDv3uBszpvvWYQN5FqKqFD6WPYupfAUP1c9g&f=SlhDbE9uS2laS2JaZFpNWvmsxai1zlJP9llgnl5HIv-4jAji8Dh2BQawzxID5bzr6Uv-3EQd-eluQbsPfcUOTg&u=https%3A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel&k=XRKU
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param
  2023-04-27  5:50     ` DERUMIER, Alexandre
@ 2023-04-27  7:32       ` Fabian Grünbichler
  2023-04-28  6:43         ` DERUMIER, Alexandre
  0 siblings, 1 reply; 11+ messages in thread
From: Fabian Grünbichler @ 2023-04-27  7:32 UTC (permalink / raw)
  To: Proxmox VE development discussion

On April 27, 2023 7:50 am, DERUMIER, Alexandre wrote:
> Hi,
> 
> Le mercredi 26 avril 2023 à 15:14 +0200, Fabian Grünbichler a écrit :
>> On April 25, 2023 6:52 pm, Alexandre Derumier wrote:
>> > This patch add support for remote migration when target
>> > cpu model is different.
>> > 
>> > The target vm is restart after the migration
>> 
>> so this effectively introduces a new "hybrid" migration mode ;) the
>> changes are a bit smaller than I expected (in part thanks to patch
>> #1),
>> which is good.
>> 
>> there are semi-frequent requests for another variant (also applicable
>> to
>> containers) in the form of a two phase migration
>> - storage migrate
>> - stop guest
>> - incremental storage migrate
>> - start guest on target
>> 
> 
> But I'm not sure how to to an incremental storage migrate, without
> storage snapshot send|receiv.  (so zfs && rbd could work).
> 
> - Vm/ct is running
> - do a first snapshot + sync to target with zfs|rbd send|receive
> - stop the guest
> - do a second snapshot + incremental sync + sync to target with zfs|rbd
> send|receive
> - start the guest on remote
> 
> 
> (or maybe for vm, without snapshot, with a dirty bitmap ? But we need
> to be able to write the dirty map content to disk somewhere after vm
> stop, and reread it for the last increment )

theoretically, we could support such a mode for non-snapshot storages by
using bitmaps+block-mirror, yes. either with a target VM, or with
qemu-storage-daemon on the target node exposing the target volumes

> - vm is running
> - create a dirty-bitmap and start sync with qemu-block-storage
> - stop the vm && save the dirty bitmap
> - reread the dirtymap && do incremental sync (with the new qemu-daemon-
> storage or starting the vm paused ?

stop here could also just mean stop the guest OS, but leave the process
for the incremental sync, so it would not need persistent bitmap
support.

> And currently we don't support yet offline storage migration. (BTW,
> This is also breaking migration with unused disk).
> I don't known if we can send send|receiv transfert through the tunnel ?
> (I never tested it)

we do, but maybe you tested with RBD which doesn't support storage
migration yet? withing a cluster it doesn't need to, since it's a shared
storage, but between cluster we need to implement it (it's on my TODO
list and shouldn't be too hard since there is 'rbd export/import').

>> given that it might make sense to save-guard this implementation
>> here,
>> and maybe switch to a new "mode" parameter?
>> 
>> online => switching CPU not allowed
>> offline or however-we-call-this-new-mode (or in the future, two-
>> phase-restart) => switching CPU allowed
>> 
> 
> Yes, I was thinking about that too.
> Maybe not "offline", because maybe we want to implement a real offline
> mode later.
> But simply "restart" ?

no, I meant moving the existing --online switch to a new mode parameter,
then we'd have "online" and "offline", and then add your new mode on top
"however-we-call-this-new-mode", and then we could in the future also
add "two-phase-restart" for the sync-twice mode I described :)

target-cpu would of course also be supported for the (existing) offline
mode, since it just needs to adapt the target-cpu in the config.

the main thing I'd want to avoid is somebody accidentally setting
"target-cpu", not knowing/noticing that that entails what amounts to a
reset of the VM as part of the migration..

there were a few things down below that might also be worthy of
discussion. I also wonder whether the two variants of "freeze FS" and
"suspend without state" are enough - that only ensures that no more I/O
happens so the volumes are bitwise identical, but shouldn't we also at
least have the option of doing a clean shutdown at that point so that
applications can serialize/flush their state properly and that gets
synced across as well? else this is the equivalent of cutting the power
cord, which might not be a good fit for all use cases ;)

>> > 
>> > Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
>> > ---
>> >  PVE/API2/Qemu.pm   | 18 ++++++++++++++++++
>> >  PVE/CLI/qm.pm      |  6 ++++++
>> >  PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++
>> >  3 files changed, 49 insertions(+)
>> > 
>> > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
>> > index 587bb22..6703c87 100644
>> > --- a/PVE/API2/Qemu.pm
>> > +++ b/PVE/API2/Qemu.pm
>> > @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({
>> >                 optional => 1,
>> >                 default => 0,
>> >             },
>> > +           'target-cpu' => {
>> > +               optional => 1,
>> > +               description => "Target Emulated CPU model. For
>> > online migration, the storage is live migrate, but the memory
>> > migration is skipped and the target vm is restarted.",
>> > +               type => 'string',
>> > +               format => 'pve-vm-cpu-conf',
>> > +           },
>> >             'target-storage' => get_standard_option('pve-
>> > targetstorage', {
>> >                 completion =>
>> > \&PVE::QemuServer::complete_migration_storage,
>> >                 optional => 0,
>> > @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({
>> >         raise_param_exc({ 'target-bridge' => "failed to parse
>> > bridge map: $@" })
>> >             if $@;
>> >  
>> > +       my $target_cpu = extract_param($param, 'target-cpu');
>> 
>> this is okay
>> 
>> > +
>> >         die "remote migration requires explicit storage mapping!\n"
>> >             if $storagemap->{identity};
>> >  
>> >         $param->{storagemap} = $storagemap;
>> >         $param->{bridgemap} = $bridgemap;
>> > +       $param->{targetcpu} = $target_cpu;
>> 
>> but this is a bit confusing with the variable/hash key naming ;)
>> 
>> >         $param->{remote} = {
>> >             conn => $conn_args, # re-use fingerprint for tunnel
>> >             client => $api_client,
>> > @@ -5604,6 +5613,15 @@ __PACKAGE__->register_method({
>> >                     PVE::QemuServer::nbd_stop($state->{vmid});
>> >                     return;
>> >                 },
>> > +               'restart' => sub {
>> > +                   PVE::QemuServer::vm_stop(undef, $state->{vmid},
>> > 1, 1);
>> > +                   my $info = PVE::QemuServer::vm_start_nolock(
>> > +                       $state->{storecfg},
>> > +                       $state->{vmid},
>> > +                       $state->{conf},
>> > +                   );
>> > +                   return;
>> > +               },
>> >                 'resume' => sub {
>> >                     if
>> > (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) {
>> >                         PVE::QemuServer::vm_resume($state->{vmid},
>> > 1, 1);
>> > diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
>> > index c3c2982..06c74c1 100755
>> > --- a/PVE/CLI/qm.pm
>> > +++ b/PVE/CLI/qm.pm
>> > @@ -189,6 +189,12 @@ __PACKAGE__->register_method({
>> >                 optional => 1,
>> >                 default => 0,
>> >             },
>> > +           'target-cpu' => {
>> > +               optional => 1,
>> > +               description => "Target Emulated CPU model. For
>> > online migration, the storage is live migrate, but the memory
>> > migration is skipped and the target vm is restarted.",
>> > +               type => 'string',
>> > +               format => 'pve-vm-cpu-conf',
>> > +           },
>> >             'target-storage' => get_standard_option('pve-
>> > targetstorage', {
>> >                 completion =>
>> > \&PVE::QemuServer::complete_migration_storage,
>> >                 optional => 0,
>> > diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
>> > index e182415..04f8053 100644
>> > --- a/PVE/QemuMigrate.pm
>> > +++ b/PVE/QemuMigrate.pm
>> > @@ -731,6 +731,11 @@ sub cleanup_bitmaps {
>> >  sub live_migration {
>> >      my ($self, $vmid, $migrate_uri, $spice_port) = @_;
>> >  
>> > +    if($self->{opts}->{targetcpu}){
>> > +        $self->log('info', "target cpu is different - skip live
>> > migration.");
>> > +        return;
>> > +    }
>> > +
>> >      my $conf = $self->{vmconf};
>> >  
>> >      $self->log('info', "starting online/live migration on
>> > $migrate_uri");
>> > @@ -995,6 +1000,7 @@ sub phase1_remote {
>> >      my $remote_conf = PVE::QemuConfig->load_config($vmid);
>> >      PVE::QemuConfig->update_volume_ids($remote_conf, $self-
>> > >{volume_map});
>> >  
>> > +    $remote_conf->{cpu} = $self->{opts}->{targetcpu};
>> 
>> do we need permission checks here (or better, somewhere early on, for
>> doing this here)
>> 
>> >      my $bridges = map_bridges($remote_conf, $self->{opts}-
>> > >{bridgemap});
>> >      for my $target (keys $bridges->%*) {
>> >         for my $nic (keys $bridges->{$target}->%*) {
>> > @@ -1354,6 +1360,21 @@ sub phase2 {
>> >      live_migration($self, $vmid, $migrate_uri, $spice_port);
>> >  
>> >      if ($self->{storage_migration}) {
>> > +
>> > +        #freeze source vm io/s if target cpu is different (no
>> > livemigration)
>> > +       if ($self->{opts}->{targetcpu}) {
>> > +           my $agent_running = $self->{conf}->{agent} &&
>> > PVE::QemuServer::qga_check_running($vmid);
>> > +           if ($agent_running) {
>> > +               print "freeze filesystem\n";
>> > +               eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); };
>> > +               die $@ if $@;
>> 
>> die here
>> 
>> > +           } else {
>> > +               print "suspend vm\n";
>> > +               eval { PVE::QemuServer::vm_suspend($vmid, 1); };
>> > +               warn $@ if $@;
>> 
>> but warn here?
>> 
>> I'd like some more rationale for these two variants, what are the
>> pros
>> and cons? should we make it configurable?
>> > +           }
>> > +       }
>> > +
>> >         # finish block-job with block-job-cancel, to disconnect
>> > source VM from NBD
>> >         # to avoid it trying to re-establish it. We are in blockjob
>> > ready state,
>> >         # thus, this command changes to it to blockjob complete
>> > (see qapi docs)
>> > @@ -1608,6 +1629,10 @@ sub phase3_cleanup {
>> >      # clear migrate lock
>> >      if ($tunnel && $tunnel->{version} >= 2) {
>> >         PVE::Tunnel::write_tunnel($tunnel, 10, "unlock");
>> > +       if ($self->{opts}->{targetcpu}) {
>> > +           $self->log('info', "target cpu is different - restart
>> > target vm.");
>> > +           PVE::Tunnel::write_tunnel($tunnel, 10, 'restart');
>> > +       }
>> >  
>> >         PVE::Tunnel::finish_tunnel($tunnel);
>> >      } else {
>> > -- 
>> > 2.30.2
>> > 
>> > 
>> > _______________________________________________
>> > pve-devel mailing list
>> > pve-devel@lists.proxmox.com
>> > https://antiphishing.cetsi.fr/proxy/v3?i=Zk92VEFKaGQ4Ums4cnZEUWMTpfHaXFQGRw1_CnOoOH0&r=bHA1dGV3NWJQVUloaWNFUZPm0fiiBviaiy_RDav2GQ1U4uy6lsDDv3uBszpvvWYQN5FqKqFD6WPYupfAUP1c9g&f=SlhDbE9uS2laS2JaZFpNWvmsxai1zlJP9llgnl5HIv-4jAji8Dh2BQawzxID5bzr6Uv-3EQd-eluQbsPfcUOTg&u=https%3A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel&k=XRKU
>> > 
>> > 
>> > 
>> 
>> 
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel@lists.proxmox.com
>> https://antiphishing.cetsi.fr/proxy/v3?i=Zk92VEFKaGQ4Ums4cnZEUWMTpfHaXFQGRw1_CnOoOH0&r=bHA1dGV3NWJQVUloaWNFUZPm0fiiBviaiy_RDav2GQ1U4uy6lsDDv3uBszpvvWYQN5FqKqFD6WPYupfAUP1c9g&f=SlhDbE9uS2laS2JaZFpNWvmsxai1zlJP9llgnl5HIv-4jAji8Dh2BQawzxID5bzr6Uv-3EQd-eluQbsPfcUOTg&u=https%3A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel&k=XRKU
>> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param
  2023-04-27  7:32       ` Fabian Grünbichler
@ 2023-04-28  6:43         ` DERUMIER, Alexandre
  2023-04-28  9:12           ` Fabian Grünbichler
  0 siblings, 1 reply; 11+ messages in thread
From: DERUMIER, Alexandre @ 2023-04-28  6:43 UTC (permalink / raw)
  To: pve-devel

> >
>> And currently we don't support yet offline storage migration. (BTW,
>> This is also breaking migration with unused disk).
>> I don't known if we can send send|receiv transfert through the
tunnel ?
>> (I never tested it)

> we do, but maybe you tested with RBD which doesn't support storage
> migration yet? withing a cluster it doesn't need to, since it's a
> shared
> storage, but between cluster we need to implement it (it's on my TODO
> list and shouldn't be too hard since there is 'rbd export/import').
> 
Yes, this was with an unused rbd device indeed.
(Another way could be to implement qemu-storage-daemon (never tested
it) for offline sync with any storage, like lvm)

Also cloud-init drive seem to be unmigratable currently. (I wonder if
we couldn't simply regenerate it on target, as now we have cloud-init
pending section, we can correctly generate the cloudinit with current
running config).



> > > given that it might make sense to save-guard this implementation
> > > here,
> > > and maybe switch to a new "mode" parameter?
> > > 
> > > online => switching CPU not allowed
> > > offline or however-we-call-this-new-mode (or in the future, two-
> > > phase-restart) => switching CPU allowed
> > > 
> > 
> > Yes, I was thinking about that too.
> > Maybe not "offline", because maybe we want to implement a real
> > offline
> > mode later.
> > But simply "restart" ?
> 
> no, I meant moving the existing --online switch to a new mode
> parameter,
> then we'd have "online" and "offline", and then add your new mode on
> top
> "however-we-call-this-new-mode", and then we could in the future also
> add "two-phase-restart" for the sync-twice mode I described :)
> 
> target-cpu would of course also be supported for the (existing)
> offline
> mode, since it just needs to adapt the target-cpu in the config.
> 
> the main thing I'd want to avoid is somebody accidentally setting
> "target-cpu", not knowing/noticing that that entails what amounts to
> a
> reset of the VM as part of the migration..
> 
Yes, that what I had understanded
 ;)

It's was more about "offline" term, because we don't offline the source
vm until the disk migration is finished. (to reduce downtime)
More like "online-restart" instead "offline".

Offline for me , is really, we shut the vm, then do the disk migration.
 

> there were a few things down below that might also be worthy of
> discussion. I also wonder whether the two variants of "freeze FS" and
> "suspend without state" are enough - that only ensures that no more
> I/O
> happens so the volumes are bitwise identical, but shouldn't we also
> at
> least have the option of doing a clean shutdown at that point so that
> applications can serialize/flush their state properly and that gets
> synced across as well? else this is the equivalent of cutting the
> power
> cord, which might not be a good fit for all use cases ;)
> 
I had try the clean shutdown in my v1 patch 
https://lists.proxmox.com/pipermail/pve-devel/2023-March/056291.html
(without doing the block-job-complete) in phase3,  and I have fs
coruption sometime.
Not sure why exactly (Maybe os didn't have correctly shutdown or maybe
some datas in the buffer ?)
Maybe doing the block-job-complete before should make it safe.
(transfert access to the nbd , then do the clean shutdown).

I'll give a try in the V3. 

I just wonder if we can add a new param, like:

--online --fsfreeze

--online --shutdown

--online --2phase-restart

?




 (I'm currently migrating a lot of vm between an old intel cluster to
the new amd cluster, on different datacenter, with a different ceph
cluster, so I can still do real production tests)




> > > > 
> > > > Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> > > > ---
> > > >  PVE/API2/Qemu.pm   | 18 ++++++++++++++++++
> > > >  PVE/CLI/qm.pm      |  6 ++++++
> > > >  PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++
> > > >  3 files changed, 49 insertions(+)
> > > > 
> > > > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> > > > index 587bb22..6703c87 100644
> > > > --- a/PVE/API2/Qemu.pm
> > > > +++ b/PVE/API2/Qemu.pm
> > > > @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({
> > > >                 optional => 1,
> > > >                 default => 0,
> > > >             },
> > > > +           'target-cpu' => {
> > > > +               optional => 1,
> > > > +               description => "Target Emulated CPU model. For
> > > > online migration, the storage is live migrate, but the memory
> > > > migration is skipped and the target vm is restarted.",
> > > > +               type => 'string',
> > > > +               format => 'pve-vm-cpu-conf',
> > > > +           },
> > > >             'target-storage' => get_standard_option('pve-
> > > > targetstorage', {
> > > >                 completion =>
> > > > \&PVE::QemuServer::complete_migration_storage,
> > > >                 optional => 0,
> > > > @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({
> > > >         raise_param_exc({ 'target-bridge' => "failed to parse
> > > > bridge map: $@" })
> > > >             if $@;
> > > >  
> > > > +       my $target_cpu = extract_param($param, 'target-cpu');
> > > 
> > > this is okay
> > > 
> > > > +
> > > >         die "remote migration requires explicit storage
> > > > mapping!\n"
> > > >             if $storagemap->{identity};
> > > >  
> > > >         $param->{storagemap} = $storagemap;
> > > >         $param->{bridgemap} = $bridgemap;
> > > > +       $param->{targetcpu} = $target_cpu;
> > > 
> > > but this is a bit confusing with the variable/hash key naming ;)
> > > 
> > > >         $param->{remote} = {
> > > >             conn => $conn_args, # re-use fingerprint for tunnel
> > > >             client => $api_client,
> > > > @@ -5604,6 +5613,15 @@ __PACKAGE__->register_method({
> > > >                     PVE::QemuServer::nbd_stop($state->{vmid});
> > > >                     return;
> > > >                 },
> > > > +               'restart' => sub {
> > > > +                   PVE::QemuServer::vm_stop(undef, $state-
> > > > >{vmid},
> > > > 1, 1);
> > > > +                   my $info =
> > > > PVE::QemuServer::vm_start_nolock(
> > > > +                       $state->{storecfg},
> > > > +                       $state->{vmid},
> > > > +                       $state->{conf},
> > > > +                   );
> > > > +                   return;
> > > > +               },
> > > >                 'resume' => sub {
> > > >                     if
> > > > (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid}))
> > > > {
> > > >                         PVE::QemuServer::vm_resume($state-
> > > > >{vmid},
> > > > 1, 1);
> > > > diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
> > > > index c3c2982..06c74c1 100755
> > > > --- a/PVE/CLI/qm.pm
> > > > +++ b/PVE/CLI/qm.pm
> > > > @@ -189,6 +189,12 @@ __PACKAGE__->register_method({
> > > >                 optional => 1,
> > > >                 default => 0,
> > > >             },
> > > > +           'target-cpu' => {
> > > > +               optional => 1,
> > > > +               description => "Target Emulated CPU model. For
> > > > online migration, the storage is live migrate, but the memory
> > > > migration is skipped and the target vm is restarted.",
> > > > +               type => 'string',
> > > > +               format => 'pve-vm-cpu-conf',
> > > > +           },
> > > >             'target-storage' => get_standard_option('pve-
> > > > targetstorage', {
> > > >                 completion =>
> > > > \&PVE::QemuServer::complete_migration_storage,
> > > >                 optional => 0,
> > > > diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> > > > index e182415..04f8053 100644
> > > > --- a/PVE/QemuMigrate.pm
> > > > +++ b/PVE/QemuMigrate.pm
> > > > @@ -731,6 +731,11 @@ sub cleanup_bitmaps {
> > > >  sub live_migration {
> > > >      my ($self, $vmid, $migrate_uri, $spice_port) = @_;
> > > >  
> > > > +    if($self->{opts}->{targetcpu}){
> > > > +        $self->log('info', "target cpu is different - skip
> > > > live
> > > > migration.");
> > > > +        return;
> > > > +    }
> > > > +
> > > >      my $conf = $self->{vmconf};
> > > >  
> > > >      $self->log('info', "starting online/live migration on
> > > > $migrate_uri");
> > > > @@ -995,6 +1000,7 @@ sub phase1_remote {
> > > >      my $remote_conf = PVE::QemuConfig->load_config($vmid);
> > > >      PVE::QemuConfig->update_volume_ids($remote_conf, $self-
> > > > > {volume_map});
> > > >  
> > > > +    $remote_conf->{cpu} = $self->{opts}->{targetcpu};
> > > 
> > > do we need permission checks here (or better, somewhere early on,
> > > for
> > > doing this here)
> > > 
> > > >      my $bridges = map_bridges($remote_conf, $self->{opts}-
> > > > > {bridgemap});
> > > >      for my $target (keys $bridges->%*) {
> > > >         for my $nic (keys $bridges->{$target}->%*) {
> > > > @@ -1354,6 +1360,21 @@ sub phase2 {
> > > >      live_migration($self, $vmid, $migrate_uri, $spice_port);
> > > >  
> > > >      if ($self->{storage_migration}) {
> > > > +
> > > > +        #freeze source vm io/s if target cpu is different (no
> > > > livemigration)
> > > > +       if ($self->{opts}->{targetcpu}) {
> > > > +           my $agent_running = $self->{conf}->{agent} &&
> > > > PVE::QemuServer::qga_check_running($vmid);
> > > > +           if ($agent_running) {
> > > > +               print "freeze filesystem\n";
> > > > +               eval { mon_cmd($vmid, "guest-fsfreeze-freeze");
> > > > };
> > > > +               die $@ if $@;
> > > 
> > > die here
> > > 
> > > > +           } else {
> > > > +               print "suspend vm\n";
> > > > +               eval { PVE::QemuServer::vm_suspend($vmid, 1);
> > > > };
> > > > +               warn $@ if $@;
> > > 
> > > but warn here?
> > > 
> > > I'd like some more rationale for these two variants, what are the
> > > pros
> > > and cons? should we make it configurable?
> > > > +           }
> > > > +       }
> > > > +
> > > >         # finish block-job with block-job-cancel, to disconnect
> > > > source VM from NBD
> > > >         # to avoid it trying to re-establish it. We are in
> > > > blockjob
> > > > ready state,
> > > >         # thus, this command changes to it to blockjob complete
> > > > (see qapi docs)
> > > > @@ -1608,6 +1629,10 @@ sub phase3_cleanup {
> > > >      # clear migrate lock
> > > >      if ($tunnel && $tunnel->{version} >= 2) {
> > > >         PVE::Tunnel::write_tunnel($tunnel, 10, "unlock");
> > > > +       if ($self->{opts}->{targetcpu}) {
> > > > +           $self->log('info', "target cpu is different -
> > > > restart
> > > > target vm.");
> > > > +           PVE::Tunnel::write_tunnel($tunnel, 10, 'restart');
> > > > +       }
> > > >  
> > > >         PVE::Tunnel::finish_tunnel($tunnel);
> > > >      } else {
> > > > -- 
> > > > 2.30.2
> > > > 
> > > > 
> > > > _______________________________________________
> > > > pve-devel mailing list
> > > > pve-devel@lists.proxmox.com
> > > > https://antiphishing.cetsi.fr/proxy/v3?i=MlZSTzBhZFZ6Nzl4c3EyN7fbSKDePLMxi5u5_onpAoI&r=cm1qVmRYUWk2WXhYZVFHWA0PXtTaYxz7-FIOTkZBm34_dHdSch-gXn7ST9eGhQLN&f=S1Zkd042VWdrZG5qQUxxWk5ps4t67kNuHsBZzdzhpquLKuXqTZLIq2K1DfKr9N61yBafm7AuAITd6bHtRU4zEQ&u=https%3A//antiphishing.cetsi.fr/proxy/v3%3Fi%3DZk92VEFKaGQ4Ums4cnZEUWMTpfHaXFQGRw1_CnOoOH0%26r%3DbHA1dGV3NWJQVUloaWNFUZPm0fiiBviaiy_RDav2GQ1U4uy6lsDDv3uBszpvvWYQN5FqKqFD6WPYupfAUP1c9g%26f%3DSlhDbE9uS2laS2JaZFpNWvmsxai1zlJP9llgnl5HIv-4jAji8Dh2BQawzxID5bzr6Uv-3EQd-eluQbsPfcUOTg%26u%3Dhttps%253A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel%26k%3DXRKU&k=F1is
> > > > 
> > > > 
> > > > 
> > > 
> > > 
> > > _______________________________________________
> > > pve-devel mailing list
> > > pve-devel@lists.proxmox.com
> > > https://antiphishing.cetsi.fr/proxy/v3?i=MlZSTzBhZFZ6Nzl4c3EyN7fbSKDePLMxi5u5_onpAoI&r=cm1qVmRYUWk2WXhYZVFHWA0PXtTaYxz7-FIOTkZBm34_dHdSch-gXn7ST9eGhQLN&f=S1Zkd042VWdrZG5qQUxxWk5ps4t67kNuHsBZzdzhpquLKuXqTZLIq2K1DfKr9N61yBafm7AuAITd6bHtRU4zEQ&u=https%3A//antiphishing.cetsi.fr/proxy/v3%3Fi%3DZk92VEFKaGQ4Ums4cnZEUWMTpfHaXFQGRw1_CnOoOH0%26r%3DbHA1dGV3NWJQVUloaWNFUZPm0fiiBviaiy_RDav2GQ1U4uy6lsDDv3uBszpvvWYQN5FqKqFD6WPYupfAUP1c9g%26f%3DSlhDbE9uS2laS2JaZFpNWvmsxai1zlJP9llgnl5HIv-4jAji8Dh2BQawzxID5bzr6Uv-3EQd-eluQbsPfcUOTg%26u%3Dhttps%253A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel%26k%3DXRKU&k=F1is
> > > 
> > 
> > _______________________________________________
> > pve-devel mailing list
> > pve-devel@lists.proxmox.com
> > https://antiphishing.cetsi.fr/proxy/v3?i=MlZSTzBhZFZ6Nzl4c3EyN7fbSKDePLMxi5u5_onpAoI&r=cm1qVmRYUWk2WXhYZVFHWA0PXtTaYxz7-FIOTkZBm34_dHdSch-gXn7ST9eGhQLN&f=S1Zkd042VWdrZG5qQUxxWk5ps4t67kNuHsBZzdzhpquLKuXqTZLIq2K1DfKr9N61yBafm7AuAITd6bHtRU4zEQ&u=https%3A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel&k=F1is
> > 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://antiphishing.cetsi.fr/proxy/v3?i=MlZSTzBhZFZ6Nzl4c3EyN7fbSKDePLMxi5u5_onpAoI&r=cm1qVmRYUWk2WXhYZVFHWA0PXtTaYxz7-FIOTkZBm34_dHdSch-gXn7ST9eGhQLN&f=S1Zkd042VWdrZG5qQUxxWk5ps4t67kNuHsBZzdzhpquLKuXqTZLIq2K1DfKr9N61yBafm7AuAITd6bHtRU4zEQ&u=https%3A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel&k=F1is


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param
  2023-04-28  6:43         ` DERUMIER, Alexandre
@ 2023-04-28  9:12           ` Fabian Grünbichler
  2023-04-29  7:57             ` Thomas Lamprecht
  0 siblings, 1 reply; 11+ messages in thread
From: Fabian Grünbichler @ 2023-04-28  9:12 UTC (permalink / raw)
  To: Proxmox VE development discussion

On April 28, 2023 8:43 am, DERUMIER, Alexandre wrote:
>> >
>>> And currently we don't support yet offline storage migration. (BTW,
>>> This is also breaking migration with unused disk).
>>> I don't known if we can send send|receiv transfert through the
> tunnel ?
>>> (I never tested it)
> 
>> we do, but maybe you tested with RBD which doesn't support storage
>> migration yet? withing a cluster it doesn't need to, since it's a
>> shared
>> storage, but between cluster we need to implement it (it's on my TODO
>> list and shouldn't be too hard since there is 'rbd export/import').
>> 
> Yes, this was with an unused rbd device indeed.
> (Another way could be to implement qemu-storage-daemon (never tested
> it) for offline sync with any storage, like lvm)
> 
> Also cloud-init drive seem to be unmigratable currently. (I wonder if
> we couldn't simply regenerate it on target, as now we have cloud-init
> pending section, we can correctly generate the cloudinit with current
> running config).
> 
> 
> 
>> > > given that it might make sense to save-guard this implementation
>> > > here,
>> > > and maybe switch to a new "mode" parameter?
>> > > 
>> > > online => switching CPU not allowed
>> > > offline or however-we-call-this-new-mode (or in the future, two-
>> > > phase-restart) => switching CPU allowed
>> > > 
>> > 
>> > Yes, I was thinking about that too.
>> > Maybe not "offline", because maybe we want to implement a real
>> > offline
>> > mode later.
>> > But simply "restart" ?
>> 
>> no, I meant moving the existing --online switch to a new mode
>> parameter,
>> then we'd have "online" and "offline", and then add your new mode on
>> top
>> "however-we-call-this-new-mode", and then we could in the future also
>> add "two-phase-restart" for the sync-twice mode I described :)
>> 
>> target-cpu would of course also be supported for the (existing)
>> offline
>> mode, since it just needs to adapt the target-cpu in the config.
>> 
>> the main thing I'd want to avoid is somebody accidentally setting
>> "target-cpu", not knowing/noticing that that entails what amounts to
>> a
>> reset of the VM as part of the migration..
>> 
> Yes, that what I had understanded
>  ;)
> 
> It's was more about "offline" term, because we don't offline the source
> vm until the disk migration is finished. (to reduce downtime)
> More like "online-restart" instead "offline".
> 
> Offline for me , is really, we shut the vm, then do the disk migration.

hmm, I guess how you see it. for me, online means without interruption,
anything else is offline :) but yeah, naming is hard, as always ;)

>> there were a few things down below that might also be worthy of
>> discussion. I also wonder whether the two variants of "freeze FS" and
>> "suspend without state" are enough - that only ensures that no more
>> I/O
>> happens so the volumes are bitwise identical, but shouldn't we also
>> at
>> least have the option of doing a clean shutdown at that point so that
>> applications can serialize/flush their state properly and that gets
>> synced across as well? else this is the equivalent of cutting the
>> power
>> cord, which might not be a good fit for all use cases ;)
>> 
> I had try the clean shutdown in my v1 patch 
> https://lists.proxmox.com/pipermail/pve-devel/2023-March/056291.html
> (without doing the block-job-complete) in phase3,  and I have fs
> coruption sometime.
> Not sure why exactly (Maybe os didn't have correctly shutdown or maybe
> some datas in the buffer ?)
> Maybe doing the block-job-complete before should make it safe.
> (transfert access to the nbd , then do the clean shutdown).

possibly we need a special "shutdown guest, but leave qemu running" way
of shutting down (so that the guest and any applications within can do
their thing, and the block job can transfer all the delta across).
completing or cancelling the block job before the guest has shut down
would mean the source and target are not consistent (since shutdown can
change the disk content, and that would then not be mirrored anymore?),
so I don't see any way that that could be an improvement. it would mean
that starting the shutdown is already the point of no return -
cancelling before would mean writes are not transferred to the target,
completing before would mean writes are not written to the source
anymore, so we can't fallback to the source node in error handling.

I guess we could have to approaches:

A - freeze or suspend (depending on QGA availability), then complete
block job and (re)start target VM
B - shutdown guest OS, complete, then exit source VM and (re)start
target VM

as always, there's a tradeoff there - A is faster, but less consistent
from the guests point of view (somwhat similar to pulling the power
cable). B can take a while (== service downtime!), but it has the same
semantics as a reboot.

there are also IMHO multiple ways to think about the target side:

A start VM in migration mode, but kill it without ever doing any
migration, then start it again with modified config (current approach)
B start VM paused (like when doing a backup of a stopped VM, without
incoming migration), but with overridden CPU parameter, then just
'resume' it when the block migration is finished
C don't start a VM at all, just the block devices via
qemu-storage-daemon for the block migration, then do a regular start
after the block migration and config update are done

B has the advantage over A that we don't risk the VM not being able to
restart (e.g., because of a race for memory or pass-through resources),
and also the resume should be (depending on exact environment possibly
quite a bit) faster than kill+start
C has the advantage over A and B that the migration itself is cheaper
resource-wise, but the big downside that we don't even know if the VM is
startable on the target node, and of course, it's a lot more code to
write. possibly I just included it because I am looking for an excuse to
play around with qemu-storage-daemon - it's probably the least relevant
variant for now ;)

> 
> I'll give a try in the V3. 
> 
> I just wonder if we can add a new param, like:
> 
> --online --fsfreeze
> 
> --online --shutdown
> 
> --online --2phase-restart

that would also be an option. not sure by heart if it's possible to
make --online into a property string that is backwards compatible with
the "plain boolean" option? if so, we could do

--online [mode=live,qga,suspend,shutdown,2phase,..]

with live being the default (not supporting target-cpu) and
qga,suspend,shutdown all handling target-cpu (2phase just included for
completeness sake)

alternative, if that doesn't work, having --online [--online-mode live,qga,suspend,..]

would be my second choice I guess, if we are reasonable sure that all
the possible extensions would be for running VMs only. the only thing
counter to that that I can think of would be storage migration using
qemu-storage-daemon (e.g., like you said, to somehow bolt on
incremental support using persistent bitmaps for storages/image formats
that don't support that otherwise), and there I am not even sure whether
that couldn't be somehow handled in pve-storage anyway

>  (I'm currently migrating a lot of vm between an old intel cluster to
> the new amd cluster, on different datacenter, with a different ceph
> cluster, so I can still do real production tests)

technically, target-cpu might also be a worthwhile feature for
heterogenous clusters where a proper/full live migration is not possible
for certain node/CPU combinations.. we do already update volume IDs when
using 'targetstorage', so also updating the CPU should be doable there
as well. using the still experimental remote migration as a field for
evaluation is fine, just something to keep in mind while thinking about
options, so that we don't accidentally maneuver ourselves into a corner
that makes that part impossible :)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param
  2023-04-28  9:12           ` Fabian Grünbichler
@ 2023-04-29  7:57             ` Thomas Lamprecht
  2023-05-02  8:30               ` Fabian Grünbichler
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Lamprecht @ 2023-04-29  7:57 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

Am 28/04/2023 um 11:12 schrieb Fabian Grünbichler:
>> It's was more about "offline" term, because we don't offline the source
>> vm until the disk migration is finished. (to reduce downtime)
>> More like "online-restart" instead "offline".
>>
>> Offline for me , is really, we shut the vm, then do the disk migration.
> hmm, I guess how you see it. for me, online means without interruption,
> anything else is offline 😄 but yeah, naming is hard, as always 😉

FWIW, in Proxmox Container land that's currently basically the "most online"
it gets, and there it's named "restore migration" – at least if we go for the
"clean reboot for actual moving the guest over" approach.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param
  2023-04-29  7:57             ` Thomas Lamprecht
@ 2023-05-02  8:30               ` Fabian Grünbichler
  0 siblings, 0 replies; 11+ messages in thread
From: Fabian Grünbichler @ 2023-05-02  8:30 UTC (permalink / raw)
  To: Proxmox VE development discussion, Thomas Lamprecht

On April 29, 2023 9:57 am, Thomas Lamprecht wrote:
> Am 28/04/2023 um 11:12 schrieb Fabian Grünbichler:
>>> It's was more about "offline" term, because we don't offline the source
>>> vm until the disk migration is finished. (to reduce downtime)
>>> More like "online-restart" instead "offline".
>>>
>>> Offline for me , is really, we shut the vm, then do the disk migration.
>> hmm, I guess how you see it. for me, online means without interruption,
>> anything else is offline 😄 but yeah, naming is hard, as always 😉
> 
> FWIW, in Proxmox Container land that's currently basically the "most online"
> it gets, and there it's named "restore migration" – at least if we go for the
> "clean reboot for actual moving the guest over" approach.

"restart", you meant? yes, but it's explicitly not "online", it's a
second parameter besides that called "restart" that cannot be combined
with "online" (nothing can, since setting "online" leads to a hard error
if the VM is running).

it also does the following:

- stop CT on source node (if running)
- storage migration (via pve-storage)
- start CT again on target node (if previously running)

similar to this series, but not quite:

- start storage migration (via pve-storage for unused/.., live via qemu
  for currently used volumes)
- wait for storage migration convergence
- stop VM on source node / complete block job (details still to be hashed out)
- start VM on target node

so naming what this series does "restart" might be confusing, since the
most fundamental part is different (the downtime is only for the restart
part, as opposed to for restart+storage migration).




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param
  2023-04-26 13:14   ` Fabian Grünbichler
  2023-04-27  5:50     ` DERUMIER, Alexandre
@ 2023-09-28 14:58     ` DERUMIER, Alexandre
  1 sibling, 0 replies; 11+ messages in thread
From: DERUMIER, Alexandre @ 2023-09-28 14:58 UTC (permalink / raw)
  To: f.gruenbichler; +Cc: pve-devel

Le mercredi 26 avril 2023 à 15:14 +0200, Fabian Grünbichler a écrit :
> On April 25, 2023 6:52 pm, Alexandre Derumier wrote:
> > This patch add support for remote migration when target
> > cpu model is different.
> > 
> > The target vm is restart after the migration
> 
> so this effectively introduces a new "hybrid" migration mode ;) the
> changes are a bit smaller than I expected (in part thanks to patch
> #1),
> which is good.
> 
> there are semi-frequent requests for another variant (also applicable
> to
> containers) in the form of a two phase migration
> - storage migrate
> - stop guest
> - incremental storage migrate
> - start guest on target
> 
> given that it might make sense to save-guard this implementation
> here,
> and maybe switch to a new "mode" parameter?
> 

I have implemented in v3 a working switch to remote nbd.

so, after the disk migration, we do a block-job-complete,
source vm is still running and now is running over nbd through the
target-vm.
Then the source vm is shutdown, flushing last pending writes through
nbd.
then the target vm is restarted



> online => switching CPU not allowed
> offline or however-we-call-this-new-mode (or in the future, two-
> phase-restart) => switching CPU allowed
> 
> > 
Still unsure about it, I have added an extra flag  in v3 "-target-
reboot"

- online : check if source vm is online
- target-cpu: change the targetcpu.  (only change value on targetvm)
- target-reboot: skip live migration, do shutdown of source vm and
restart of target vm.



> > Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> > ---
> >  PVE/API2/Qemu.pm   | 18 ++++++++++++++++++
> >  PVE/CLI/qm.pm      |  6 ++++++
> >  PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++
> >  3 files changed, 49 insertions(+)
> > 
> > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> > index 587bb22..6703c87 100644
> > --- a/PVE/API2/Qemu.pm
> > +++ b/PVE/API2/Qemu.pm
> > @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({
> >                 optional => 1,
> >                 default => 0,
> >             },
> > +           'target-cpu' => {
> > +               optional => 1,
> > +               description => "Target Emulated CPU model. For
> > online migration, the storage is live migrate, but the memory
> > migration is skipped and the target vm is restarted.",
> > +               type => 'string',
> > +               format => 'pve-vm-cpu-conf',
> > +           },
> >             'target-storage' => get_standard_option('pve-
> > targetstorage', {
> >                 completion =>
> > \&PVE::QemuServer::complete_migration_storage,
> >                 optional => 0,
> > @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({
> >         raise_param_exc({ 'target-bridge' => "failed to parse
> > bridge map: $@" })
> >             if $@;
> >  
> > +       my $target_cpu = extract_param($param, 'target-cpu');
> 
> this is okay
> 
> > +
> >         die "remote migration requires explicit storage mapping!\n"
> >             if $storagemap->{identity};
> >  
> >         $param->{storagemap} = $storagemap;
> >         $param->{bridgemap} = $bridgemap;
> > +       $param->{targetcpu} = $target_cpu;
> 
> but this is a bit confusing with the variable/hash key naming ;)
> 
Fixed in the v4

...
> >  
> > +    $remote_conf->{cpu} = $self->{opts}->{targetcpu};
> 
> do we need permission checks here (or better, somewhere early on, for
> doing this here)
> 
> 
> 
fixed in v4: do not override cpuconfig is targetcpu is empty.

About permission, I'm not sure but we don't have specific permission
for cpu.  Do we need to check perm on vm.config ? 
Because Anyway,we should already a have permission to create a vm on
target cluster.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-09-28 14:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-25 16:52 [pve-devel] [PATCH v2 qemu-server 0/2] remote-migration: migration with different cpu Alexandre Derumier
2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 1/2] migration: move livemigration code in a dedicated sub Alexandre Derumier
2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param Alexandre Derumier
2023-04-26 13:14   ` Fabian Grünbichler
2023-04-27  5:50     ` DERUMIER, Alexandre
2023-04-27  7:32       ` Fabian Grünbichler
2023-04-28  6:43         ` DERUMIER, Alexandre
2023-04-28  9:12           ` Fabian Grünbichler
2023-04-29  7:57             ` Thomas Lamprecht
2023-05-02  8:30               ` Fabian Grünbichler
2023-09-28 14:58     ` DERUMIER, Alexandre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox

Service provided by Proxmox Server Solutions GmbH | Privacy | Legal