* [pve-devel] [PATCH v2 qemu-server 0/2] remote-migration: migration with different cpu
@ 2023-04-25 16:52 Alexandre Derumier
  2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 1/2] migration: move livemigration code in a dedicated sub Alexandre Derumier
  2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param Alexandre Derumier
  0 siblings, 2 replies; 11+ messages in thread

From: Alexandre Derumier @ 2023-04-25 16:52 UTC (permalink / raw)
To: pve-devel

This patch series allows remote migration between clusters with different CPU models.

A new parameter is introduced: "target-cpu". When target-cpu is defined, the live
migration with memory transfer is skipped (the target would die with a different CPU
anyway). Then, after the storage copy, we call the guest agent fsfreeze or suspend the
VM to get coherent data, stop the source VM, and stop/start the target VM. This way,
the migration downtime is reduced to a single restart.

Changelog v2:

The first version simply shut down the target VM without doing the block-job-complete.
After doing production migrations with around 400 VMs, I had some filesystem corruption,
as if some data was still sitting in a buffer. This v2 has been tested with another
batch of 400 VMs, without any corruption.

Alexandre Derumier (2):
  migration: move livemigration code in a dedicated sub
  remote-migration: add target-cpu param

 PVE/API2/Qemu.pm   |  18 ++
 PVE/CLI/qm.pm      |   6 +
 PVE/QemuMigrate.pm | 439 ++++++++++++++++++++++++---------------------
 3 files changed, 260 insertions(+), 203 deletions(-)

-- 
2.30.2

^ permalink raw reply	[flat|nested] 11+ messages in thread
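For readers following along, a hypothetical invocation of the new parameter could look
like the following; endpoint, storage, bridge and CPU model values are placeholders, and
only --target-cpu is new in this series:

    qm remote-migrate 100 100 \
        'host=target.example.com,apitoken=PVEAPIToken=root@pam!migrate=<secret>,fingerprint=<fp>' \
        --target-bridge vmbr0 \
        --target-storage local-zfs \
        --target-cpu EPYC \
        --online

With --target-cpu set, the disks are still mirrored while the source VM keeps running,
but the memory is not transferred and the VM is restarted on the target cluster.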
* [pve-devel] [PATCH v2 qemu-server 1/2] migration: move livemigration code in a dedicated sub 2023-04-25 16:52 [pve-devel] [PATCH v2 qemu-server 0/2] remote-migration: migration with different cpu Alexandre Derumier @ 2023-04-25 16:52 ` Alexandre Derumier 2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param Alexandre Derumier 1 sibling, 0 replies; 11+ messages in thread From: Alexandre Derumier @ 2023-04-25 16:52 UTC (permalink / raw) To: pve-devel Signed-off-by: Alexandre Derumier <aderumier@odiso.com> --- PVE/QemuMigrate.pm | 420 +++++++++++++++++++++++---------------------- 1 file changed, 214 insertions(+), 206 deletions(-) diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm index 09cc1d8..e182415 100644 --- a/PVE/QemuMigrate.pm +++ b/PVE/QemuMigrate.pm @@ -728,6 +728,219 @@ sub cleanup_bitmaps { } } +sub live_migration { + my ($self, $vmid, $migrate_uri, $spice_port) = @_; + + my $conf = $self->{vmconf}; + + $self->log('info', "starting online/live migration on $migrate_uri"); + $self->{livemigration} = 1; + + # load_defaults + my $defaults = PVE::QemuServer::load_defaults(); + + $self->log('info', "set migration capabilities"); + eval { PVE::QemuServer::set_migration_caps($vmid) }; + warn $@ if $@; + + my $qemu_migrate_params = {}; + + # migrate speed can be set via bwlimit (datacenter.cfg and API) and via the + # migrate_speed parameter in qm.conf - take the lower of the two. + my $bwlimit = $self->get_bwlimit(); + + my $migrate_speed = $conf->{migrate_speed} // 0; + $migrate_speed *= 1024; # migrate_speed is in MB/s, bwlimit in KB/s + + if ($bwlimit && $migrate_speed) { + $migrate_speed = ($bwlimit < $migrate_speed) ? $bwlimit : $migrate_speed; + } else { + $migrate_speed ||= $bwlimit; + } + $migrate_speed ||= ($defaults->{migrate_speed} || 0) * 1024; + + if ($migrate_speed) { + $migrate_speed *= 1024; # qmp takes migrate_speed in B/s. + $self->log('info', "migration speed limit: ". render_bytes($migrate_speed, 1) ."/s"); + } else { + # always set migrate speed as QEMU default to 128 MiBps == 1 Gbps, use 16 GiBps == 128 Gbps + $migrate_speed = (16 << 30); + } + $qemu_migrate_params->{'max-bandwidth'} = int($migrate_speed); + + my $migrate_downtime = $defaults->{migrate_downtime}; + $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime}); + # migrate-set-parameters expects limit in ms + $migrate_downtime *= 1000; + $self->log('info', "migration downtime limit: $migrate_downtime ms"); + $qemu_migrate_params->{'downtime-limit'} = int($migrate_downtime); + + # set cachesize to 10% of the total memory + my $memory = $conf->{memory} || $defaults->{memory}; + my $cachesize = int($memory * 1048576 / 10); + $cachesize = round_powerof2($cachesize); + + $self->log('info', "migration cachesize: " . 
render_bytes($cachesize, 1)); + $qemu_migrate_params->{'xbzrle-cache-size'} = int($cachesize); + + $self->log('info', "set migration parameters"); + eval { + mon_cmd($vmid, "migrate-set-parameters", %{$qemu_migrate_params}); + }; + $self->log('info', "migrate-set-parameters error: $@") if $@; + + if (PVE::QemuServer::vga_conf_has_spice($conf->{vga}) && !$self->{opts}->{remote}) { + my $rpcenv = PVE::RPCEnvironment::get(); + my $authuser = $rpcenv->get_user(); + + my (undef, $proxyticket) = PVE::AccessControl::assemble_spice_ticket($authuser, $vmid, $self->{node}); + + my $filename = "/etc/pve/nodes/$self->{node}/pve-ssl.pem"; + my $subject = PVE::AccessControl::read_x509_subject_spice($filename); + + $self->log('info', "spice client_migrate_info"); + + eval { + mon_cmd($vmid, "client_migrate_info", protocol => 'spice', + hostname => $proxyticket, 'port' => 0, 'tls-port' => $spice_port, + 'cert-subject' => $subject); + }; + $self->log('info', "client_migrate_info error: $@") if $@; + + } + + my $start = time(); + + $self->log('info', "start migrate command to $migrate_uri"); + eval { + mon_cmd($vmid, "migrate", uri => $migrate_uri); + }; + my $merr = $@; + $self->log('info', "migrate uri => $migrate_uri failed: $merr") if $merr; + + my $last_mem_transferred = 0; + my $usleep = 1000000; + my $i = 0; + my $err_count = 0; + my $lastrem = undef; + my $downtimecounter = 0; + while (1) { + $i++; + my $avglstat = $last_mem_transferred ? $last_mem_transferred / $i : 0; + + usleep($usleep); + + my $stat = eval { mon_cmd($vmid, "query-migrate") }; + if (my $err = $@) { + $err_count++; + warn "query migrate failed: $err\n"; + $self->log('info', "query migrate failed: $err"); + if ($err_count <= 5) { + usleep(1_000_000); + next; + } + die "too many query migrate failures - aborting\n"; + } + + my $status = $stat->{status}; + if (defined($status) && $status =~ m/^(setup)$/im) { + sleep(1); + next; + } + + if (!defined($status) || $status !~ m/^(active|completed|failed|cancelled)$/im) { + die $merr if $merr; + die "unable to parse migration status '$status' - aborting\n"; + } + $merr = undef; + $err_count = 0; + + my $memstat = $stat->{ram}; + + if ($status eq 'completed') { + my $delay = time() - $start; + if ($delay > 0) { + my $total = $memstat->{total} || 0; + my $avg_speed = render_bytes($total / $delay, 1); + my $downtime = $stat->{downtime} || 0; + $self->log('info', "average migration speed: $avg_speed/s - downtime $downtime ms"); + } + } + + if ($status eq 'failed' || $status eq 'cancelled') { + my $message = $stat->{'error-desc'} ? "$status - $stat->{'error-desc'}" : $status; + $self->log('info', "migration status error: $message"); + die "aborting\n" + } + + if ($status ne 'active') { + $self->log('info', "migration status: $status"); + last; + } + + if ($memstat->{transferred} ne $last_mem_transferred) { + my $trans = $memstat->{transferred} || 0; + my $rem = $memstat->{remaining} || 0; + my $total = $memstat->{total} || 0; + my $speed = ($memstat->{'pages-per-second'} // 0) * ($memstat->{'page-size'} // 0); + my $dirty_rate = ($memstat->{'dirty-pages-rate'} // 0) * ($memstat->{'page-size'} // 0); + + # reduce sleep if remainig memory is lower than the average transfer speed + $usleep = 100_000 if $avglstat && $rem < $avglstat; + + # also reduce loggin if we poll more frequent + my $should_log = $usleep > 100_000 ? 
1 : ($i % 10) == 0; + + my $total_h = render_bytes($total, 1); + my $transferred_h = render_bytes($trans, 1); + my $speed_h = render_bytes($speed, 1); + + my $progress = "transferred $transferred_h of $total_h VM-state, ${speed_h}/s"; + + if ($dirty_rate > $speed) { + my $dirty_rate_h = render_bytes($dirty_rate, 1); + $progress .= ", VM dirties lots of memory: $dirty_rate_h/s"; + } + + $self->log('info', "migration $status, $progress") if $should_log; + + my $xbzrle = $stat->{"xbzrle-cache"} || {}; + my ($xbzrlebytes, $xbzrlepages) = $xbzrle->@{'bytes', 'pages'}; + if ($xbzrlebytes || $xbzrlepages) { + my $bytes_h = render_bytes($xbzrlebytes, 1); + + my $msg = "send updates to $xbzrlepages pages in $bytes_h encoded memory"; + + $msg .= sprintf(", cache-miss %.2f%%", $xbzrle->{'cache-miss-rate'} * 100) + if $xbzrle->{'cache-miss-rate'}; + + $msg .= ", overflow $xbzrle->{overflow}" if $xbzrle->{overflow}; + + $self->log('info', "xbzrle: $msg") if $should_log; + } + + if (($lastrem && $rem > $lastrem) || ($rem == 0)) { + $downtimecounter++; + } + $lastrem = $rem; + + if ($downtimecounter > 5) { + $downtimecounter = 0; + $migrate_downtime *= 2; + $self->log('info', "auto-increased downtime to continue migration: $migrate_downtime ms"); + eval { + # migrate-set-parameters does not touch values not + # specified, so this only changes downtime-limit + mon_cmd($vmid, "migrate-set-parameters", 'downtime-limit' => int($migrate_downtime)); + }; + $self->log('info', "migrate-set-parameters error: $@") if $@; + } + } + + $last_mem_transferred = $memstat->{transferred}; + } +} + sub phase1 { my ($self, $vmid) = @_; @@ -1138,212 +1351,7 @@ sub phase2 { } } - $self->log('info', "starting online/live migration on $migrate_uri"); - $self->{livemigration} = 1; - - # load_defaults - my $defaults = PVE::QemuServer::load_defaults(); - - $self->log('info', "set migration capabilities"); - eval { PVE::QemuServer::set_migration_caps($vmid) }; - warn $@ if $@; - - my $qemu_migrate_params = {}; - - # migrate speed can be set via bwlimit (datacenter.cfg and API) and via the - # migrate_speed parameter in qm.conf - take the lower of the two. - my $bwlimit = $self->get_bwlimit(); - - my $migrate_speed = $conf->{migrate_speed} // 0; - $migrate_speed *= 1024; # migrate_speed is in MB/s, bwlimit in KB/s - - if ($bwlimit && $migrate_speed) { - $migrate_speed = ($bwlimit < $migrate_speed) ? $bwlimit : $migrate_speed; - } else { - $migrate_speed ||= $bwlimit; - } - $migrate_speed ||= ($defaults->{migrate_speed} || 0) * 1024; - - if ($migrate_speed) { - $migrate_speed *= 1024; # qmp takes migrate_speed in B/s. - $self->log('info', "migration speed limit: ". render_bytes($migrate_speed, 1) ."/s"); - } else { - # always set migrate speed as QEMU default to 128 MiBps == 1 Gbps, use 16 GiBps == 128 Gbps - $migrate_speed = (16 << 30); - } - $qemu_migrate_params->{'max-bandwidth'} = int($migrate_speed); - - my $migrate_downtime = $defaults->{migrate_downtime}; - $migrate_downtime = $conf->{migrate_downtime} if defined($conf->{migrate_downtime}); - # migrate-set-parameters expects limit in ms - $migrate_downtime *= 1000; - $self->log('info', "migration downtime limit: $migrate_downtime ms"); - $qemu_migrate_params->{'downtime-limit'} = int($migrate_downtime); - - # set cachesize to 10% of the total memory - my $memory = $conf->{memory} || $defaults->{memory}; - my $cachesize = int($memory * 1048576 / 10); - $cachesize = round_powerof2($cachesize); - - $self->log('info', "migration cachesize: " . 
render_bytes($cachesize, 1)); - $qemu_migrate_params->{'xbzrle-cache-size'} = int($cachesize); - - $self->log('info', "set migration parameters"); - eval { - mon_cmd($vmid, "migrate-set-parameters", %{$qemu_migrate_params}); - }; - $self->log('info', "migrate-set-parameters error: $@") if $@; - - if (PVE::QemuServer::vga_conf_has_spice($conf->{vga}) && !$self->{opts}->{remote}) { - my $rpcenv = PVE::RPCEnvironment::get(); - my $authuser = $rpcenv->get_user(); - - my (undef, $proxyticket) = PVE::AccessControl::assemble_spice_ticket($authuser, $vmid, $self->{node}); - - my $filename = "/etc/pve/nodes/$self->{node}/pve-ssl.pem"; - my $subject = PVE::AccessControl::read_x509_subject_spice($filename); - - $self->log('info', "spice client_migrate_info"); - - eval { - mon_cmd($vmid, "client_migrate_info", protocol => 'spice', - hostname => $proxyticket, 'port' => 0, 'tls-port' => $spice_port, - 'cert-subject' => $subject); - }; - $self->log('info', "client_migrate_info error: $@") if $@; - - } - - my $start = time(); - - $self->log('info', "start migrate command to $migrate_uri"); - eval { - mon_cmd($vmid, "migrate", uri => $migrate_uri); - }; - my $merr = $@; - $self->log('info', "migrate uri => $migrate_uri failed: $merr") if $merr; - - my $last_mem_transferred = 0; - my $usleep = 1000000; - my $i = 0; - my $err_count = 0; - my $lastrem = undef; - my $downtimecounter = 0; - while (1) { - $i++; - my $avglstat = $last_mem_transferred ? $last_mem_transferred / $i : 0; - - usleep($usleep); - - my $stat = eval { mon_cmd($vmid, "query-migrate") }; - if (my $err = $@) { - $err_count++; - warn "query migrate failed: $err\n"; - $self->log('info', "query migrate failed: $err"); - if ($err_count <= 5) { - usleep(1_000_000); - next; - } - die "too many query migrate failures - aborting\n"; - } - - my $status = $stat->{status}; - if (defined($status) && $status =~ m/^(setup)$/im) { - sleep(1); - next; - } - - if (!defined($status) || $status !~ m/^(active|completed|failed|cancelled)$/im) { - die $merr if $merr; - die "unable to parse migration status '$status' - aborting\n"; - } - $merr = undef; - $err_count = 0; - - my $memstat = $stat->{ram}; - - if ($status eq 'completed') { - my $delay = time() - $start; - if ($delay > 0) { - my $total = $memstat->{total} || 0; - my $avg_speed = render_bytes($total / $delay, 1); - my $downtime = $stat->{downtime} || 0; - $self->log('info', "average migration speed: $avg_speed/s - downtime $downtime ms"); - } - } - - if ($status eq 'failed' || $status eq 'cancelled') { - my $message = $stat->{'error-desc'} ? "$status - $stat->{'error-desc'}" : $status; - $self->log('info', "migration status error: $message"); - die "aborting\n" - } - - if ($status ne 'active') { - $self->log('info', "migration status: $status"); - last; - } - - if ($memstat->{transferred} ne $last_mem_transferred) { - my $trans = $memstat->{transferred} || 0; - my $rem = $memstat->{remaining} || 0; - my $total = $memstat->{total} || 0; - my $speed = ($memstat->{'pages-per-second'} // 0) * ($memstat->{'page-size'} // 0); - my $dirty_rate = ($memstat->{'dirty-pages-rate'} // 0) * ($memstat->{'page-size'} // 0); - - # reduce sleep if remainig memory is lower than the average transfer speed - $usleep = 100_000 if $avglstat && $rem < $avglstat; - - # also reduce loggin if we poll more frequent - my $should_log = $usleep > 100_000 ? 
1 : ($i % 10) == 0; - - my $total_h = render_bytes($total, 1); - my $transferred_h = render_bytes($trans, 1); - my $speed_h = render_bytes($speed, 1); - - my $progress = "transferred $transferred_h of $total_h VM-state, ${speed_h}/s"; - - if ($dirty_rate > $speed) { - my $dirty_rate_h = render_bytes($dirty_rate, 1); - $progress .= ", VM dirties lots of memory: $dirty_rate_h/s"; - } - - $self->log('info', "migration $status, $progress") if $should_log; - - my $xbzrle = $stat->{"xbzrle-cache"} || {}; - my ($xbzrlebytes, $xbzrlepages) = $xbzrle->@{'bytes', 'pages'}; - if ($xbzrlebytes || $xbzrlepages) { - my $bytes_h = render_bytes($xbzrlebytes, 1); - - my $msg = "send updates to $xbzrlepages pages in $bytes_h encoded memory"; - - $msg .= sprintf(", cache-miss %.2f%%", $xbzrle->{'cache-miss-rate'} * 100) - if $xbzrle->{'cache-miss-rate'}; - - $msg .= ", overflow $xbzrle->{overflow}" if $xbzrle->{overflow}; - - $self->log('info', "xbzrle: $msg") if $should_log; - } - - if (($lastrem && $rem > $lastrem) || ($rem == 0)) { - $downtimecounter++; - } - $lastrem = $rem; - - if ($downtimecounter > 5) { - $downtimecounter = 0; - $migrate_downtime *= 2; - $self->log('info', "auto-increased downtime to continue migration: $migrate_downtime ms"); - eval { - # migrate-set-parameters does not touch values not - # specified, so this only changes downtime-limit - mon_cmd($vmid, "migrate-set-parameters", 'downtime-limit' => int($migrate_downtime)); - }; - $self->log('info', "migrate-set-parameters error: $@") if $@; - } - } - - $last_mem_transferred = $memstat->{transferred}; - } + live_migration($self, $vmid, $migrate_uri, $spice_port); if ($self->{storage_migration}) { # finish block-job with block-job-cancel, to disconnect source VM from NBD -- 2.30.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
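One detail worth spelling out from the hunk above: round_powerof2() is used for the
xbzrle cache size but is not part of this diff (it is a pre-existing helper not shown
here). A minimal sketch of what such a rounding helper does, for readers without the
full file at hand (not necessarily the exact implementation):

    # round up to the next power of two, e.g. a 1.6 GiB cache request becomes 2 GiB
    sub round_powerof2 {
        my ($value) = @_;
        return 1 if $value < 2;
        my $exponent = int(log($value - 1) / log(2)); # highest set bit of ($value - 1)
        return 2 ** ($exponent + 1);
    }

So with the 10% rule above, a VM with 16384 MiB of memory would get
int(16384 * 1048576 / 10), roughly 1.6 GiB, rounded up to a 2 GiB xbzrle cache.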
* [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param 2023-04-25 16:52 [pve-devel] [PATCH v2 qemu-server 0/2] remote-migration: migration with different cpu Alexandre Derumier 2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 1/2] migration: move livemigration code in a dedicated sub Alexandre Derumier @ 2023-04-25 16:52 ` Alexandre Derumier 2023-04-26 13:14 ` Fabian Grünbichler 1 sibling, 1 reply; 11+ messages in thread From: Alexandre Derumier @ 2023-04-25 16:52 UTC (permalink / raw) To: pve-devel This patch add support for remote migration when target cpu model is different. The target vm is restart after the migration Signed-off-by: Alexandre Derumier <aderumier@odiso.com> --- PVE/API2/Qemu.pm | 18 ++++++++++++++++++ PVE/CLI/qm.pm | 6 ++++++ PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++ 3 files changed, 49 insertions(+) diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm index 587bb22..6703c87 100644 --- a/PVE/API2/Qemu.pm +++ b/PVE/API2/Qemu.pm @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({ optional => 1, default => 0, }, + 'target-cpu' => { + optional => 1, + description => "Target Emulated CPU model. For online migration, the storage is live migrate, but the memory migration is skipped and the target vm is restarted.", + type => 'string', + format => 'pve-vm-cpu-conf', + }, 'target-storage' => get_standard_option('pve-targetstorage', { completion => \&PVE::QemuServer::complete_migration_storage, optional => 0, @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({ raise_param_exc({ 'target-bridge' => "failed to parse bridge map: $@" }) if $@; + my $target_cpu = extract_param($param, 'target-cpu'); + die "remote migration requires explicit storage mapping!\n" if $storagemap->{identity}; $param->{storagemap} = $storagemap; $param->{bridgemap} = $bridgemap; + $param->{targetcpu} = $target_cpu; $param->{remote} = { conn => $conn_args, # re-use fingerprint for tunnel client => $api_client, @@ -5604,6 +5613,15 @@ __PACKAGE__->register_method({ PVE::QemuServer::nbd_stop($state->{vmid}); return; }, + 'restart' => sub { + PVE::QemuServer::vm_stop(undef, $state->{vmid}, 1, 1); + my $info = PVE::QemuServer::vm_start_nolock( + $state->{storecfg}, + $state->{vmid}, + $state->{conf}, + ); + return; + }, 'resume' => sub { if (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) { PVE::QemuServer::vm_resume($state->{vmid}, 1, 1); diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm index c3c2982..06c74c1 100755 --- a/PVE/CLI/qm.pm +++ b/PVE/CLI/qm.pm @@ -189,6 +189,12 @@ __PACKAGE__->register_method({ optional => 1, default => 0, }, + 'target-cpu' => { + optional => 1, + description => "Target Emulated CPU model. 
For online migration, the storage is live migrate, but the memory migration is skipped and the target vm is restarted.", + type => 'string', + format => 'pve-vm-cpu-conf', + }, 'target-storage' => get_standard_option('pve-targetstorage', { completion => \&PVE::QemuServer::complete_migration_storage, optional => 0, diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm index e182415..04f8053 100644 --- a/PVE/QemuMigrate.pm +++ b/PVE/QemuMigrate.pm @@ -731,6 +731,11 @@ sub cleanup_bitmaps { sub live_migration { my ($self, $vmid, $migrate_uri, $spice_port) = @_; + if($self->{opts}->{targetcpu}){ + $self->log('info', "target cpu is different - skip live migration."); + return; + } + my $conf = $self->{vmconf}; $self->log('info', "starting online/live migration on $migrate_uri"); @@ -995,6 +1000,7 @@ sub phase1_remote { my $remote_conf = PVE::QemuConfig->load_config($vmid); PVE::QemuConfig->update_volume_ids($remote_conf, $self->{volume_map}); + $remote_conf->{cpu} = $self->{opts}->{targetcpu}; my $bridges = map_bridges($remote_conf, $self->{opts}->{bridgemap}); for my $target (keys $bridges->%*) { for my $nic (keys $bridges->{$target}->%*) { @@ -1354,6 +1360,21 @@ sub phase2 { live_migration($self, $vmid, $migrate_uri, $spice_port); if ($self->{storage_migration}) { + + #freeze source vm io/s if target cpu is different (no livemigration) + if ($self->{opts}->{targetcpu}) { + my $agent_running = $self->{conf}->{agent} && PVE::QemuServer::qga_check_running($vmid); + if ($agent_running) { + print "freeze filesystem\n"; + eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); }; + die $@ if $@; + } else { + print "suspend vm\n"; + eval { PVE::QemuServer::vm_suspend($vmid, 1); }; + warn $@ if $@; + } + } + # finish block-job with block-job-cancel, to disconnect source VM from NBD # to avoid it trying to re-establish it. We are in blockjob ready state, # thus, this command changes to it to blockjob complete (see qapi docs) @@ -1608,6 +1629,10 @@ sub phase3_cleanup { # clear migrate lock if ($tunnel && $tunnel->{version} >= 2) { PVE::Tunnel::write_tunnel($tunnel, 10, "unlock"); + if ($self->{opts}->{targetcpu}) { + $self->log('info', "target cpu is different - restart target vm."); + PVE::Tunnel::write_tunnel($tunnel, 10, 'restart'); + } PVE::Tunnel::finish_tunnel($tunnel); } else { -- 2.30.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param 2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param Alexandre Derumier @ 2023-04-26 13:14 ` Fabian Grünbichler 2023-04-27 5:50 ` DERUMIER, Alexandre 2023-09-28 14:58 ` DERUMIER, Alexandre 0 siblings, 2 replies; 11+ messages in thread From: Fabian Grünbichler @ 2023-04-26 13:14 UTC (permalink / raw) To: Proxmox VE development discussion On April 25, 2023 6:52 pm, Alexandre Derumier wrote: > This patch add support for remote migration when target > cpu model is different. > > The target vm is restart after the migration so this effectively introduces a new "hybrid" migration mode ;) the changes are a bit smaller than I expected (in part thanks to patch #1), which is good. there are semi-frequent requests for another variant (also applicable to containers) in the form of a two phase migration - storage migrate - stop guest - incremental storage migrate - start guest on target given that it might make sense to save-guard this implementation here, and maybe switch to a new "mode" parameter? online => switching CPU not allowed offline or however-we-call-this-new-mode (or in the future, two-phase-restart) => switching CPU allowed > > Signed-off-by: Alexandre Derumier <aderumier@odiso.com> > --- > PVE/API2/Qemu.pm | 18 ++++++++++++++++++ > PVE/CLI/qm.pm | 6 ++++++ > PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++ > 3 files changed, 49 insertions(+) > > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm > index 587bb22..6703c87 100644 > --- a/PVE/API2/Qemu.pm > +++ b/PVE/API2/Qemu.pm > @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({ > optional => 1, > default => 0, > }, > + 'target-cpu' => { > + optional => 1, > + description => "Target Emulated CPU model. For online migration, the storage is live migrate, but the memory migration is skipped and the target vm is restarted.", > + type => 'string', > + format => 'pve-vm-cpu-conf', > + }, > 'target-storage' => get_standard_option('pve-targetstorage', { > completion => \&PVE::QemuServer::complete_migration_storage, > optional => 0, > @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({ > raise_param_exc({ 'target-bridge' => "failed to parse bridge map: $@" }) > if $@; > > + my $target_cpu = extract_param($param, 'target-cpu'); this is okay > + > die "remote migration requires explicit storage mapping!\n" > if $storagemap->{identity}; > > $param->{storagemap} = $storagemap; > $param->{bridgemap} = $bridgemap; > + $param->{targetcpu} = $target_cpu; but this is a bit confusing with the variable/hash key naming ;) > $param->{remote} = { > conn => $conn_args, # re-use fingerprint for tunnel > client => $api_client, > @@ -5604,6 +5613,15 @@ __PACKAGE__->register_method({ > PVE::QemuServer::nbd_stop($state->{vmid}); > return; > }, > + 'restart' => sub { > + PVE::QemuServer::vm_stop(undef, $state->{vmid}, 1, 1); > + my $info = PVE::QemuServer::vm_start_nolock( > + $state->{storecfg}, > + $state->{vmid}, > + $state->{conf}, > + ); > + return; > + }, > 'resume' => sub { > if (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) { > PVE::QemuServer::vm_resume($state->{vmid}, 1, 1); > diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm > index c3c2982..06c74c1 100755 > --- a/PVE/CLI/qm.pm > +++ b/PVE/CLI/qm.pm > @@ -189,6 +189,12 @@ __PACKAGE__->register_method({ > optional => 1, > default => 0, > }, > + 'target-cpu' => { > + optional => 1, > + description => "Target Emulated CPU model. 
For online migration, the storage is live migrate, but the memory migration is skipped and the target vm is restarted.", > + type => 'string', > + format => 'pve-vm-cpu-conf', > + }, > 'target-storage' => get_standard_option('pve-targetstorage', { > completion => \&PVE::QemuServer::complete_migration_storage, > optional => 0, > diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm > index e182415..04f8053 100644 > --- a/PVE/QemuMigrate.pm > +++ b/PVE/QemuMigrate.pm > @@ -731,6 +731,11 @@ sub cleanup_bitmaps { > sub live_migration { > my ($self, $vmid, $migrate_uri, $spice_port) = @_; > > + if($self->{opts}->{targetcpu}){ > + $self->log('info', "target cpu is different - skip live migration."); > + return; > + } > + > my $conf = $self->{vmconf}; > > $self->log('info', "starting online/live migration on $migrate_uri"); > @@ -995,6 +1000,7 @@ sub phase1_remote { > my $remote_conf = PVE::QemuConfig->load_config($vmid); > PVE::QemuConfig->update_volume_ids($remote_conf, $self->{volume_map}); > > + $remote_conf->{cpu} = $self->{opts}->{targetcpu}; do we need permission checks here (or better, somewhere early on, for doing this here) > my $bridges = map_bridges($remote_conf, $self->{opts}->{bridgemap}); > for my $target (keys $bridges->%*) { > for my $nic (keys $bridges->{$target}->%*) { > @@ -1354,6 +1360,21 @@ sub phase2 { > live_migration($self, $vmid, $migrate_uri, $spice_port); > > if ($self->{storage_migration}) { > + > + #freeze source vm io/s if target cpu is different (no livemigration) > + if ($self->{opts}->{targetcpu}) { > + my $agent_running = $self->{conf}->{agent} && PVE::QemuServer::qga_check_running($vmid); > + if ($agent_running) { > + print "freeze filesystem\n"; > + eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); }; > + die $@ if $@; die here > + } else { > + print "suspend vm\n"; > + eval { PVE::QemuServer::vm_suspend($vmid, 1); }; > + warn $@ if $@; but warn here? I'd like some more rationale for these two variants, what are the pros and cons? should we make it configurable? > + } > + } > + > # finish block-job with block-job-cancel, to disconnect source VM from NBD > # to avoid it trying to re-establish it. We are in blockjob ready state, > # thus, this command changes to it to blockjob complete (see qapi docs) > @@ -1608,6 +1629,10 @@ sub phase3_cleanup { > # clear migrate lock > if ($tunnel && $tunnel->{version} >= 2) { > PVE::Tunnel::write_tunnel($tunnel, 10, "unlock"); > + if ($self->{opts}->{targetcpu}) { > + $self->log('info', "target cpu is different - restart target vm."); > + PVE::Tunnel::write_tunnel($tunnel, 10, 'restart'); > + } > > PVE::Tunnel::finish_tunnel($tunnel); > } else { > -- > 2.30.2 > > > _______________________________________________ > pve-devel mailing list > pve-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
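On the permission question raised above: changing the 'cpu' property through the regular
config API is guarded by the VM.Config.CPU privilege, so presumably a check of that kind,
done early in the API handler where target-cpu is extracted, is what is being asked for.
A rough sketch, with the privilege name as used for 'cpu' elsewhere in PVE/API2/Qemu.pm
and the placement purely illustrative:

    # hypothetical: before forking the migration worker, with $rpcenv and
    # $authuser set up as usual in the handler
    if (defined($target_cpu)) {
        $rpcenv->check_vm_perm($authuser, $vmid, undef, ['VM.Config.CPU']);
    }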
* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param 2023-04-26 13:14 ` Fabian Grünbichler @ 2023-04-27 5:50 ` DERUMIER, Alexandre 2023-04-27 7:32 ` Fabian Grünbichler 2023-09-28 14:58 ` DERUMIER, Alexandre 1 sibling, 1 reply; 11+ messages in thread From: DERUMIER, Alexandre @ 2023-04-27 5:50 UTC (permalink / raw) To: pve-devel Hi, Le mercredi 26 avril 2023 à 15:14 +0200, Fabian Grünbichler a écrit : > On April 25, 2023 6:52 pm, Alexandre Derumier wrote: > > This patch add support for remote migration when target > > cpu model is different. > > > > The target vm is restart after the migration > > so this effectively introduces a new "hybrid" migration mode ;) the > changes are a bit smaller than I expected (in part thanks to patch > #1), > which is good. > > there are semi-frequent requests for another variant (also applicable > to > containers) in the form of a two phase migration > - storage migrate > - stop guest > - incremental storage migrate > - start guest on target > But I'm not sure how to to an incremental storage migrate, without storage snapshot send|receiv. (so zfs && rbd could work). - Vm/ct is running - do a first snapshot + sync to target with zfs|rbd send|receive - stop the guest - do a second snapshot + incremental sync + sync to target with zfs|rbd send|receive - start the guest on remote (or maybe for vm, without snapshot, with a dirty bitmap ? But we need to be able to write the dirty map content to disk somewhere after vm stop, and reread it for the last increment ) - vm is running - create a dirty-bitmap and start sync with qemu-block-storage - stop the vm && save the dirty bitmap - reread the dirtymap && do incremental sync (with the new qemu-daemon- storage or starting the vm paused ? And currently we don't support yet offline storage migration. (BTW, This is also breaking migration with unused disk). I don't known if we can send send|receiv transfert through the tunnel ? (I never tested it) > given that it might make sense to save-guard this implementation > here, > and maybe switch to a new "mode" parameter? > > online => switching CPU not allowed > offline or however-we-call-this-new-mode (or in the future, two- > phase-restart) => switching CPU allowed > Yes, I was thinking about that too. Maybe not "offline", because maybe we want to implement a real offline mode later. But simply "restart" ? > > > > Signed-off-by: Alexandre Derumier <aderumier@odiso.com> > > --- > > PVE/API2/Qemu.pm | 18 ++++++++++++++++++ > > PVE/CLI/qm.pm | 6 ++++++ > > PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++ > > 3 files changed, 49 insertions(+) > > > > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm > > index 587bb22..6703c87 100644 > > --- a/PVE/API2/Qemu.pm > > +++ b/PVE/API2/Qemu.pm > > @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({ > > optional => 1, > > default => 0, > > }, > > + 'target-cpu' => { > > + optional => 1, > > + description => "Target Emulated CPU model. 
For > > online migration, the storage is live migrate, but the memory > > migration is skipped and the target vm is restarted.", > > + type => 'string', > > + format => 'pve-vm-cpu-conf', > > + }, > > 'target-storage' => get_standard_option('pve- > > targetstorage', { > > completion => > > \&PVE::QemuServer::complete_migration_storage, > > optional => 0, > > @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({ > > raise_param_exc({ 'target-bridge' => "failed to parse > > bridge map: $@" }) > > if $@; > > > > + my $target_cpu = extract_param($param, 'target-cpu'); > > this is okay > > > + > > die "remote migration requires explicit storage mapping!\n" > > if $storagemap->{identity}; > > > > $param->{storagemap} = $storagemap; > > $param->{bridgemap} = $bridgemap; > > + $param->{targetcpu} = $target_cpu; > > but this is a bit confusing with the variable/hash key naming ;) > > > $param->{remote} = { > > conn => $conn_args, # re-use fingerprint for tunnel > > client => $api_client, > > @@ -5604,6 +5613,15 @@ __PACKAGE__->register_method({ > > PVE::QemuServer::nbd_stop($state->{vmid}); > > return; > > }, > > + 'restart' => sub { > > + PVE::QemuServer::vm_stop(undef, $state->{vmid}, > > 1, 1); > > + my $info = PVE::QemuServer::vm_start_nolock( > > + $state->{storecfg}, > > + $state->{vmid}, > > + $state->{conf}, > > + ); > > + return; > > + }, > > 'resume' => sub { > > if > > (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) { > > PVE::QemuServer::vm_resume($state->{vmid}, > > 1, 1); > > diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm > > index c3c2982..06c74c1 100755 > > --- a/PVE/CLI/qm.pm > > +++ b/PVE/CLI/qm.pm > > @@ -189,6 +189,12 @@ __PACKAGE__->register_method({ > > optional => 1, > > default => 0, > > }, > > + 'target-cpu' => { > > + optional => 1, > > + description => "Target Emulated CPU model. 
For > > online migration, the storage is live migrate, but the memory > > migration is skipped and the target vm is restarted.", > > + type => 'string', > > + format => 'pve-vm-cpu-conf', > > + }, > > 'target-storage' => get_standard_option('pve- > > targetstorage', { > > completion => > > \&PVE::QemuServer::complete_migration_storage, > > optional => 0, > > diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm > > index e182415..04f8053 100644 > > --- a/PVE/QemuMigrate.pm > > +++ b/PVE/QemuMigrate.pm > > @@ -731,6 +731,11 @@ sub cleanup_bitmaps { > > sub live_migration { > > my ($self, $vmid, $migrate_uri, $spice_port) = @_; > > > > + if($self->{opts}->{targetcpu}){ > > + $self->log('info', "target cpu is different - skip live > > migration."); > > + return; > > + } > > + > > my $conf = $self->{vmconf}; > > > > $self->log('info', "starting online/live migration on > > $migrate_uri"); > > @@ -995,6 +1000,7 @@ sub phase1_remote { > > my $remote_conf = PVE::QemuConfig->load_config($vmid); > > PVE::QemuConfig->update_volume_ids($remote_conf, $self- > > >{volume_map}); > > > > + $remote_conf->{cpu} = $self->{opts}->{targetcpu}; > > do we need permission checks here (or better, somewhere early on, for > doing this here) > > > my $bridges = map_bridges($remote_conf, $self->{opts}- > > >{bridgemap}); > > for my $target (keys $bridges->%*) { > > for my $nic (keys $bridges->{$target}->%*) { > > @@ -1354,6 +1360,21 @@ sub phase2 { > > live_migration($self, $vmid, $migrate_uri, $spice_port); > > > > if ($self->{storage_migration}) { > > + > > + #freeze source vm io/s if target cpu is different (no > > livemigration) > > + if ($self->{opts}->{targetcpu}) { > > + my $agent_running = $self->{conf}->{agent} && > > PVE::QemuServer::qga_check_running($vmid); > > + if ($agent_running) { > > + print "freeze filesystem\n"; > > + eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); }; > > + die $@ if $@; > > die here > > > + } else { > > + print "suspend vm\n"; > > + eval { PVE::QemuServer::vm_suspend($vmid, 1); }; > > + warn $@ if $@; > > but warn here? > > I'd like some more rationale for these two variants, what are the > pros > and cons? should we make it configurable? > > > + } > > + } > > + > > # finish block-job with block-job-cancel, to disconnect > > source VM from NBD > > # to avoid it trying to re-establish it. 
We are in blockjob > > ready state, > > # thus, this command changes to it to blockjob complete > > (see qapi docs) > > @@ -1608,6 +1629,10 @@ sub phase3_cleanup { > > # clear migrate lock > > if ($tunnel && $tunnel->{version} >= 2) { > > PVE::Tunnel::write_tunnel($tunnel, 10, "unlock"); > > + if ($self->{opts}->{targetcpu}) { > > + $self->log('info', "target cpu is different - restart > > target vm."); > > + PVE::Tunnel::write_tunnel($tunnel, 10, 'restart'); > > + } > > > > PVE::Tunnel::finish_tunnel($tunnel); > > } else { > > -- > > 2.30.2 > > > > > > _______________________________________________ > > pve-devel mailing list > > pve-devel@lists.proxmox.com > > https://antiphishing.cetsi.fr/proxy/v3?i=Zk92VEFKaGQ4Ums4cnZEUWMTpfHaXFQGRw1_CnOoOH0&r=bHA1dGV3NWJQVUloaWNFUZPm0fiiBviaiy_RDav2GQ1U4uy6lsDDv3uBszpvvWYQN5FqKqFD6WPYupfAUP1c9g&f=SlhDbE9uS2laS2JaZFpNWvmsxai1zlJP9llgnl5HIv-4jAji8Dh2BQawzxID5bzr6Uv-3EQd-eluQbsPfcUOTg&u=https%3A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel&k=XRKU > > > > > > > > > _______________________________________________ > pve-devel mailing list > pve-devel@lists.proxmox.com > https://antiphishing.cetsi.fr/proxy/v3?i=Zk92VEFKaGQ4Ums4cnZEUWMTpfHaXFQGRw1_CnOoOH0&r=bHA1dGV3NWJQVUloaWNFUZPm0fiiBviaiy_RDav2GQ1U4uy6lsDDv3uBszpvvWYQN5FqKqFD6WPYupfAUP1c9g&f=SlhDbE9uS2laS2JaZFpNWvmsxai1zlJP9llgnl5HIv-4jAji8Dh2BQawzxID5bzr6Uv-3EQd-eluQbsPfcUOTg&u=https%3A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel&k=XRKU > ^ permalink raw reply [flat|nested] 11+ messages in thread
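On the "write the dirty bitmap content to disk" idea earlier in this mail: QEMU already
supports persistent dirty bitmaps stored in qcow2 images, so for qcow2-backed disks one
conceivable building block would be something like the following (node and bitmap names
are placeholders, and JSON::true assumes "use JSON;" as elsewhere in qemu-server):

    # hypothetical: create a bitmap that is flushed into the qcow2 metadata on
    # shutdown, so a later paused VM or qemu-storage-daemon could run the
    # incremental sync based on it
    mon_cmd($vmid, 'block-dirty-bitmap-add',
        node => 'drive-scsi0',
        name => 'incremental-migrate',
        persistent => JSON::true,
    );

For raw, LVM or RBD volumes the bitmap only lives in memory, which is exactly why the
storage-snapshot variant (zfs/rbd send|receive) is mentioned as the easier path there.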
* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param 2023-04-27 5:50 ` DERUMIER, Alexandre @ 2023-04-27 7:32 ` Fabian Grünbichler 2023-04-28 6:43 ` DERUMIER, Alexandre 0 siblings, 1 reply; 11+ messages in thread From: Fabian Grünbichler @ 2023-04-27 7:32 UTC (permalink / raw) To: Proxmox VE development discussion On April 27, 2023 7:50 am, DERUMIER, Alexandre wrote: > Hi, > > Le mercredi 26 avril 2023 à 15:14 +0200, Fabian Grünbichler a écrit : >> On April 25, 2023 6:52 pm, Alexandre Derumier wrote: >> > This patch add support for remote migration when target >> > cpu model is different. >> > >> > The target vm is restart after the migration >> >> so this effectively introduces a new "hybrid" migration mode ;) the >> changes are a bit smaller than I expected (in part thanks to patch >> #1), >> which is good. >> >> there are semi-frequent requests for another variant (also applicable >> to >> containers) in the form of a two phase migration >> - storage migrate >> - stop guest >> - incremental storage migrate >> - start guest on target >> > > But I'm not sure how to to an incremental storage migrate, without > storage snapshot send|receiv. (so zfs && rbd could work). > > - Vm/ct is running > - do a first snapshot + sync to target with zfs|rbd send|receive > - stop the guest > - do a second snapshot + incremental sync + sync to target with zfs|rbd > send|receive > - start the guest on remote > > > (or maybe for vm, without snapshot, with a dirty bitmap ? But we need > to be able to write the dirty map content to disk somewhere after vm > stop, and reread it for the last increment ) theoretically, we could support such a mode for non-snapshot storages by using bitmaps+block-mirror, yes. either with a target VM, or with qemu-storage-daemon on the target node exposing the target volumes > - vm is running > - create a dirty-bitmap and start sync with qemu-block-storage > - stop the vm && save the dirty bitmap > - reread the dirtymap && do incremental sync (with the new qemu-daemon- > storage or starting the vm paused ? stop here could also just mean stop the guest OS, but leave the process for the incremental sync, so it would not need persistent bitmap support. > And currently we don't support yet offline storage migration. (BTW, > This is also breaking migration with unused disk). > I don't known if we can send send|receiv transfert through the tunnel ? > (I never tested it) we do, but maybe you tested with RBD which doesn't support storage migration yet? withing a cluster it doesn't need to, since it's a shared storage, but between cluster we need to implement it (it's on my TODO list and shouldn't be too hard since there is 'rbd export/import'). >> given that it might make sense to save-guard this implementation >> here, >> and maybe switch to a new "mode" parameter? >> >> online => switching CPU not allowed >> offline or however-we-call-this-new-mode (or in the future, two- >> phase-restart) => switching CPU allowed >> > > Yes, I was thinking about that too. > Maybe not "offline", because maybe we want to implement a real offline > mode later. > But simply "restart" ? 
no, I meant moving the existing --online switch to a new mode parameter, then we'd have "online" and "offline", and then add your new mode on top "however-we-call-this-new-mode", and then we could in the future also add "two-phase-restart" for the sync-twice mode I described :) target-cpu would of course also be supported for the (existing) offline mode, since it just needs to adapt the target-cpu in the config. the main thing I'd want to avoid is somebody accidentally setting "target-cpu", not knowing/noticing that that entails what amounts to a reset of the VM as part of the migration.. there were a few things down below that might also be worthy of discussion. I also wonder whether the two variants of "freeze FS" and "suspend without state" are enough - that only ensures that no more I/O happens so the volumes are bitwise identical, but shouldn't we also at least have the option of doing a clean shutdown at that point so that applications can serialize/flush their state properly and that gets synced across as well? else this is the equivalent of cutting the power cord, which might not be a good fit for all use cases ;) >> > >> > Signed-off-by: Alexandre Derumier <aderumier@odiso.com> >> > --- >> > PVE/API2/Qemu.pm | 18 ++++++++++++++++++ >> > PVE/CLI/qm.pm | 6 ++++++ >> > PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++ >> > 3 files changed, 49 insertions(+) >> > >> > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm >> > index 587bb22..6703c87 100644 >> > --- a/PVE/API2/Qemu.pm >> > +++ b/PVE/API2/Qemu.pm >> > @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({ >> > optional => 1, >> > default => 0, >> > }, >> > + 'target-cpu' => { >> > + optional => 1, >> > + description => "Target Emulated CPU model. For >> > online migration, the storage is live migrate, but the memory >> > migration is skipped and the target vm is restarted.", >> > + type => 'string', >> > + format => 'pve-vm-cpu-conf', >> > + }, >> > 'target-storage' => get_standard_option('pve- >> > targetstorage', { >> > completion => >> > \&PVE::QemuServer::complete_migration_storage, >> > optional => 0, >> > @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({ >> > raise_param_exc({ 'target-bridge' => "failed to parse >> > bridge map: $@" }) >> > if $@; >> > >> > + my $target_cpu = extract_param($param, 'target-cpu'); >> >> this is okay >> >> > + >> > die "remote migration requires explicit storage mapping!\n" >> > if $storagemap->{identity}; >> > >> > $param->{storagemap} = $storagemap; >> > $param->{bridgemap} = $bridgemap; >> > + $param->{targetcpu} = $target_cpu; >> >> but this is a bit confusing with the variable/hash key naming ;) >> >> > $param->{remote} = { >> > conn => $conn_args, # re-use fingerprint for tunnel >> > client => $api_client, >> > @@ -5604,6 +5613,15 @@ __PACKAGE__->register_method({ >> > PVE::QemuServer::nbd_stop($state->{vmid}); >> > return; >> > }, >> > + 'restart' => sub { >> > + PVE::QemuServer::vm_stop(undef, $state->{vmid}, >> > 1, 1); >> > + my $info = PVE::QemuServer::vm_start_nolock( >> > + $state->{storecfg}, >> > + $state->{vmid}, >> > + $state->{conf}, >> > + ); >> > + return; >> > + }, >> > 'resume' => sub { >> > if >> > (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) { >> > PVE::QemuServer::vm_resume($state->{vmid}, >> > 1, 1); >> > diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm >> > index c3c2982..06c74c1 100755 >> > --- a/PVE/CLI/qm.pm >> > +++ b/PVE/CLI/qm.pm >> > @@ -189,6 +189,12 @@ __PACKAGE__->register_method({ >> > optional => 1, >> > default => 0, >> > }, >> > + 
'target-cpu' => { >> > + optional => 1, >> > + description => "Target Emulated CPU model. For >> > online migration, the storage is live migrate, but the memory >> > migration is skipped and the target vm is restarted.", >> > + type => 'string', >> > + format => 'pve-vm-cpu-conf', >> > + }, >> > 'target-storage' => get_standard_option('pve- >> > targetstorage', { >> > completion => >> > \&PVE::QemuServer::complete_migration_storage, >> > optional => 0, >> > diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm >> > index e182415..04f8053 100644 >> > --- a/PVE/QemuMigrate.pm >> > +++ b/PVE/QemuMigrate.pm >> > @@ -731,6 +731,11 @@ sub cleanup_bitmaps { >> > sub live_migration { >> > my ($self, $vmid, $migrate_uri, $spice_port) = @_; >> > >> > + if($self->{opts}->{targetcpu}){ >> > + $self->log('info', "target cpu is different - skip live >> > migration."); >> > + return; >> > + } >> > + >> > my $conf = $self->{vmconf}; >> > >> > $self->log('info', "starting online/live migration on >> > $migrate_uri"); >> > @@ -995,6 +1000,7 @@ sub phase1_remote { >> > my $remote_conf = PVE::QemuConfig->load_config($vmid); >> > PVE::QemuConfig->update_volume_ids($remote_conf, $self- >> > >{volume_map}); >> > >> > + $remote_conf->{cpu} = $self->{opts}->{targetcpu}; >> >> do we need permission checks here (or better, somewhere early on, for >> doing this here) >> >> > my $bridges = map_bridges($remote_conf, $self->{opts}- >> > >{bridgemap}); >> > for my $target (keys $bridges->%*) { >> > for my $nic (keys $bridges->{$target}->%*) { >> > @@ -1354,6 +1360,21 @@ sub phase2 { >> > live_migration($self, $vmid, $migrate_uri, $spice_port); >> > >> > if ($self->{storage_migration}) { >> > + >> > + #freeze source vm io/s if target cpu is different (no >> > livemigration) >> > + if ($self->{opts}->{targetcpu}) { >> > + my $agent_running = $self->{conf}->{agent} && >> > PVE::QemuServer::qga_check_running($vmid); >> > + if ($agent_running) { >> > + print "freeze filesystem\n"; >> > + eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); }; >> > + die $@ if $@; >> >> die here >> >> > + } else { >> > + print "suspend vm\n"; >> > + eval { PVE::QemuServer::vm_suspend($vmid, 1); }; >> > + warn $@ if $@; >> >> but warn here? >> >> I'd like some more rationale for these two variants, what are the >> pros >> and cons? should we make it configurable? >> > + } >> > + } >> > + >> > # finish block-job with block-job-cancel, to disconnect >> > source VM from NBD >> > # to avoid it trying to re-establish it. 
We are in blockjob >> > ready state, >> > # thus, this command changes to it to blockjob complete >> > (see qapi docs) >> > @@ -1608,6 +1629,10 @@ sub phase3_cleanup { >> > # clear migrate lock >> > if ($tunnel && $tunnel->{version} >= 2) { >> > PVE::Tunnel::write_tunnel($tunnel, 10, "unlock"); >> > + if ($self->{opts}->{targetcpu}) { >> > + $self->log('info', "target cpu is different - restart >> > target vm."); >> > + PVE::Tunnel::write_tunnel($tunnel, 10, 'restart'); >> > + } >> > >> > PVE::Tunnel::finish_tunnel($tunnel); >> > } else { >> > -- >> > 2.30.2 >> > >> > >> > _______________________________________________ >> > pve-devel mailing list >> > pve-devel@lists.proxmox.com >> > https://antiphishing.cetsi.fr/proxy/v3?i=Zk92VEFKaGQ4Ums4cnZEUWMTpfHaXFQGRw1_CnOoOH0&r=bHA1dGV3NWJQVUloaWNFUZPm0fiiBviaiy_RDav2GQ1U4uy6lsDDv3uBszpvvWYQN5FqKqFD6WPYupfAUP1c9g&f=SlhDbE9uS2laS2JaZFpNWvmsxai1zlJP9llgnl5HIv-4jAji8Dh2BQawzxID5bzr6Uv-3EQd-eluQbsPfcUOTg&u=https%3A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel&k=XRKU >> > >> > >> > >> >> >> _______________________________________________ >> pve-devel mailing list >> pve-devel@lists.proxmox.com >> https://antiphishing.cetsi.fr/proxy/v3?i=Zk92VEFKaGQ4Ums4cnZEUWMTpfHaXFQGRw1_CnOoOH0&r=bHA1dGV3NWJQVUloaWNFUZPm0fiiBviaiy_RDav2GQ1U4uy6lsDDv3uBszpvvWYQN5FqKqFD6WPYupfAUP1c9g&f=SlhDbE9uS2laS2JaZFpNWvmsxai1zlJP9llgnl5HIv-4jAji8Dh2BQawzxID5bzr6Uv-3EQd-eluQbsPfcUOTg&u=https%3A//lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel&k=XRKU >> > > _______________________________________________ > pve-devel mailing list > pve-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > ^ permalink raw reply [flat|nested] 11+ messages in thread
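To make the proposed "mode" parameter a bit more concrete, it could take roughly the
following shape in the API schema; the name, values and default are purely illustrative
and nothing here is settled:

    'mode' => {
        type => 'string',
        # 'online'  - live migration, changing target-cpu not allowed
        # 'restart' - skip memory migration and restart the target VM (this series)
        # 'offline' - guest is stopped before the storage migration
        enum => ['online', 'restart', 'offline'],
        default => 'online',
        optional => 1,
        description => "Migration mode; only the non-live modes allow 'target-cpu'.",
    },

A future two-phase "sync twice, then restart" mode could then be added as another enum
value without touching the rest of the interface.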
* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param 2023-04-27 7:32 ` Fabian Grünbichler @ 2023-04-28 6:43 ` DERUMIER, Alexandre 2023-04-28 9:12 ` Fabian Grünbichler 0 siblings, 1 reply; 11+ messages in thread From: DERUMIER, Alexandre @ 2023-04-28 6:43 UTC (permalink / raw) To: pve-devel > > >> And currently we don't support yet offline storage migration. (BTW, >> This is also breaking migration with unused disk). >> I don't known if we can send send|receiv transfert through the tunnel ? >> (I never tested it) > we do, but maybe you tested with RBD which doesn't support storage > migration yet? withing a cluster it doesn't need to, since it's a > shared > storage, but between cluster we need to implement it (it's on my TODO > list and shouldn't be too hard since there is 'rbd export/import'). > Yes, this was with an unused rbd device indeed. (Another way could be to implement qemu-storage-daemon (never tested it) for offline sync with any storage, like lvm) Also cloud-init drive seem to be unmigratable currently. (I wonder if we couldn't simply regenerate it on target, as now we have cloud-init pending section, we can correctly generate the cloudinit with current running config). > > > given that it might make sense to save-guard this implementation > > > here, > > > and maybe switch to a new "mode" parameter? > > > > > > online => switching CPU not allowed > > > offline or however-we-call-this-new-mode (or in the future, two- > > > phase-restart) => switching CPU allowed > > > > > > > Yes, I was thinking about that too. > > Maybe not "offline", because maybe we want to implement a real > > offline > > mode later. > > But simply "restart" ? > > no, I meant moving the existing --online switch to a new mode > parameter, > then we'd have "online" and "offline", and then add your new mode on > top > "however-we-call-this-new-mode", and then we could in the future also > add "two-phase-restart" for the sync-twice mode I described :) > > target-cpu would of course also be supported for the (existing) > offline > mode, since it just needs to adapt the target-cpu in the config. > > the main thing I'd want to avoid is somebody accidentally setting > "target-cpu", not knowing/noticing that that entails what amounts to > a > reset of the VM as part of the migration.. > Yes, that what I had understanded ;) It's was more about "offline" term, because we don't offline the source vm until the disk migration is finished. (to reduce downtime) More like "online-restart" instead "offline". Offline for me , is really, we shut the vm, then do the disk migration. > there were a few things down below that might also be worthy of > discussion. I also wonder whether the two variants of "freeze FS" and > "suspend without state" are enough - that only ensures that no more > I/O > happens so the volumes are bitwise identical, but shouldn't we also > at > least have the option of doing a clean shutdown at that point so that > applications can serialize/flush their state properly and that gets > synced across as well? else this is the equivalent of cutting the > power > cord, which might not be a good fit for all use cases ;) > I had try the clean shutdown in my v1 patch https://lists.proxmox.com/pipermail/pve-devel/2023-March/056291.html (without doing the block-job-complete) in phase3, and I have fs coruption sometime. Not sure why exactly (Maybe os didn't have correctly shutdown or maybe some datas in the buffer ?) Maybe doing the block-job-complete before should make it safe. 
(transfert access to the nbd , then do the clean shutdown). I'll give a try in the V3. I just wonder if we can add a new param, like: --online --fsfreeze --online --shutdown --online --2phase-restart ? (I'm currently migrating a lot of vm between an old intel cluster to the new amd cluster, on different datacenter, with a different ceph cluster, so I can still do real production tests) > > > > > > > > Signed-off-by: Alexandre Derumier <aderumier@odiso.com> > > > > --- > > > > PVE/API2/Qemu.pm | 18 ++++++++++++++++++ > > > > PVE/CLI/qm.pm | 6 ++++++ > > > > PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++ > > > > 3 files changed, 49 insertions(+) > > > > > > > > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm > > > > index 587bb22..6703c87 100644 > > > > --- a/PVE/API2/Qemu.pm > > > > +++ b/PVE/API2/Qemu.pm > > > > @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({ > > > > optional => 1, > > > > default => 0, > > > > }, > > > > + 'target-cpu' => { > > > > + optional => 1, > > > > + description => "Target Emulated CPU model. For > > > > online migration, the storage is live migrate, but the memory > > > > migration is skipped and the target vm is restarted.", > > > > + type => 'string', > > > > + format => 'pve-vm-cpu-conf', > > > > + }, > > > > 'target-storage' => get_standard_option('pve- > > > > targetstorage', { > > > > completion => > > > > \&PVE::QemuServer::complete_migration_storage, > > > > optional => 0, > > > > @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({ > > > > raise_param_exc({ 'target-bridge' => "failed to parse > > > > bridge map: $@" }) > > > > if $@; > > > > > > > > + my $target_cpu = extract_param($param, 'target-cpu'); > > > > > > this is okay > > > > > > > + > > > > die "remote migration requires explicit storage > > > > mapping!\n" > > > > if $storagemap->{identity}; > > > > > > > > $param->{storagemap} = $storagemap; > > > > $param->{bridgemap} = $bridgemap; > > > > + $param->{targetcpu} = $target_cpu; > > > > > > but this is a bit confusing with the variable/hash key naming ;) > > > > > > > $param->{remote} = { > > > > conn => $conn_args, # re-use fingerprint for tunnel > > > > client => $api_client, > > > > @@ -5604,6 +5613,15 @@ __PACKAGE__->register_method({ > > > > PVE::QemuServer::nbd_stop($state->{vmid}); > > > > return; > > > > }, > > > > + 'restart' => sub { > > > > + PVE::QemuServer::vm_stop(undef, $state- > > > > >{vmid}, > > > > 1, 1); > > > > + my $info = > > > > PVE::QemuServer::vm_start_nolock( > > > > + $state->{storecfg}, > > > > + $state->{vmid}, > > > > + $state->{conf}, > > > > + ); > > > > + return; > > > > + }, > > > > 'resume' => sub { > > > > if > > > > (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) > > > > { > > > > PVE::QemuServer::vm_resume($state- > > > > >{vmid}, > > > > 1, 1); > > > > diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm > > > > index c3c2982..06c74c1 100755 > > > > --- a/PVE/CLI/qm.pm > > > > +++ b/PVE/CLI/qm.pm > > > > @@ -189,6 +189,12 @@ __PACKAGE__->register_method({ > > > > optional => 1, > > > > default => 0, > > > > }, > > > > + 'target-cpu' => { > > > > + optional => 1, > > > > + description => "Target Emulated CPU model. 
For > > > > online migration, the storage is live migrate, but the memory > > > > migration is skipped and the target vm is restarted.", > > > > + type => 'string', > > > > + format => 'pve-vm-cpu-conf', > > > > + }, > > > > 'target-storage' => get_standard_option('pve- > > > > targetstorage', { > > > > completion => > > > > \&PVE::QemuServer::complete_migration_storage, > > > > optional => 0, > > > > diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm > > > > index e182415..04f8053 100644 > > > > --- a/PVE/QemuMigrate.pm > > > > +++ b/PVE/QemuMigrate.pm > > > > @@ -731,6 +731,11 @@ sub cleanup_bitmaps { > > > > sub live_migration { > > > > my ($self, $vmid, $migrate_uri, $spice_port) = @_; > > > > > > > > + if($self->{opts}->{targetcpu}){ > > > > + $self->log('info', "target cpu is different - skip > > > > live > > > > migration."); > > > > + return; > > > > + } > > > > + > > > > my $conf = $self->{vmconf}; > > > > > > > > $self->log('info', "starting online/live migration on > > > > $migrate_uri"); > > > > @@ -995,6 +1000,7 @@ sub phase1_remote { > > > > my $remote_conf = PVE::QemuConfig->load_config($vmid); > > > > PVE::QemuConfig->update_volume_ids($remote_conf, $self- > > > > > {volume_map}); > > > > > > > > + $remote_conf->{cpu} = $self->{opts}->{targetcpu}; > > > > > > do we need permission checks here (or better, somewhere early on, > > > for > > > doing this here) > > > > > > > my $bridges = map_bridges($remote_conf, $self->{opts}- > > > > > {bridgemap}); > > > > for my $target (keys $bridges->%*) { > > > > for my $nic (keys $bridges->{$target}->%*) { > > > > @@ -1354,6 +1360,21 @@ sub phase2 { > > > > live_migration($self, $vmid, $migrate_uri, $spice_port); > > > > > > > > if ($self->{storage_migration}) { > > > > + > > > > + #freeze source vm io/s if target cpu is different (no > > > > livemigration) > > > > + if ($self->{opts}->{targetcpu}) { > > > > + my $agent_running = $self->{conf}->{agent} && > > > > PVE::QemuServer::qga_check_running($vmid); > > > > + if ($agent_running) { > > > > + print "freeze filesystem\n"; > > > > + eval { mon_cmd($vmid, "guest-fsfreeze-freeze"); > > > > }; > > > > + die $@ if $@; > > > > > > die here > > > > > > > + } else { > > > > + print "suspend vm\n"; > > > > + eval { PVE::QemuServer::vm_suspend($vmid, 1); > > > > }; > > > > + warn $@ if $@; > > > > > > but warn here? > > > > > > I'd like some more rationale for these two variants, what are the > > > pros > > > and cons? should we make it configurable? > > > > + } > > > > + } > > > > + > > > > # finish block-job with block-job-cancel, to disconnect > > > > source VM from NBD > > > > # to avoid it trying to re-establish it. 
We are in > > > > blockjob > > > > ready state, > > > > # thus, this command changes to it to blockjob complete > > > > (see qapi docs) > > > > @@ -1608,6 +1629,10 @@ sub phase3_cleanup { > > > > # clear migrate lock > > > > if ($tunnel && $tunnel->{version} >= 2) { > > > > PVE::Tunnel::write_tunnel($tunnel, 10, "unlock"); > > > > + if ($self->{opts}->{targetcpu}) { > > > > + $self->log('info', "target cpu is different - > > > > restart > > > > target vm."); > > > > + PVE::Tunnel::write_tunnel($tunnel, 10, 'restart'); > > > > + } > > > > > > > > PVE::Tunnel::finish_tunnel($tunnel); > > > > } else { > > > > -- > > > > 2.30.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
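To make the ordering proposed for v3 concrete (complete the mirror job so the source VM writes through the NBD connection to the target, then shut the source guest down cleanly, then ask the target side to restart), here is a rough sketch in the context of PVE/QemuMigrate.pm. It only reuses helper names visible in the quoted patches; the job hash and the vm_stop flags are assumptions for illustration, not the actual v3 code:

    # sketch only - assumes $self->{storage_migration_jobs} holds the drive-mirror job ids
    for my $job_id (sort keys %{$self->{storage_migration_jobs} // {}}) {
        # switch the ready block job to "complete": from here on the source VM
        # writes through the NBD export on the target instead of the local disk
        mon_cmd($vmid, 'block-job-complete', device => $job_id);
    }

    # clean guest shutdown on the source, flushing the remaining writes over NBD
    # (assumed call and flags - the real code path may differ)
    PVE::QemuServer::vm_stop($self->{storecfg}, $vmid, 1, 0, undef, 1);

    # finally tell the target side to start the VM with the new CPU model
    PVE::Tunnel::write_tunnel($tunnel, 10, 'restart');

The point of completing the job before the shutdown is the consistency concern discussed below: once the job is completed, every write the guest issues while shutting down still reaches the target volume.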
* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param 2023-04-28 6:43 ` DERUMIER, Alexandre @ 2023-04-28 9:12 ` Fabian Grünbichler 2023-04-29 7:57 ` Thomas Lamprecht 0 siblings, 1 reply; 11+ messages in thread From: Fabian Grünbichler @ 2023-04-28 9:12 UTC (permalink / raw) To: Proxmox VE development discussion On April 28, 2023 8:43 am, DERUMIER, Alexandre wrote: >> > >>> And currently we don't support yet offline storage migration. (BTW, >>> This is also breaking migration with unused disk). >>> I don't known if we can send send|receiv transfert through the > tunnel ? >>> (I never tested it) > >> we do, but maybe you tested with RBD which doesn't support storage >> migration yet? withing a cluster it doesn't need to, since it's a >> shared >> storage, but between cluster we need to implement it (it's on my TODO >> list and shouldn't be too hard since there is 'rbd export/import'). >> > Yes, this was with an unused rbd device indeed. > (Another way could be to implement qemu-storage-daemon (never tested > it) for offline sync with any storage, like lvm) > > Also cloud-init drive seem to be unmigratable currently. (I wonder if > we couldn't simply regenerate it on target, as now we have cloud-init > pending section, we can correctly generate the cloudinit with current > running config). > > > >> > > given that it might make sense to save-guard this implementation >> > > here, >> > > and maybe switch to a new "mode" parameter? >> > > >> > > online => switching CPU not allowed >> > > offline or however-we-call-this-new-mode (or in the future, two- >> > > phase-restart) => switching CPU allowed >> > > >> > >> > Yes, I was thinking about that too. >> > Maybe not "offline", because maybe we want to implement a real >> > offline >> > mode later. >> > But simply "restart" ? >> >> no, I meant moving the existing --online switch to a new mode >> parameter, >> then we'd have "online" and "offline", and then add your new mode on >> top >> "however-we-call-this-new-mode", and then we could in the future also >> add "two-phase-restart" for the sync-twice mode I described :) >> >> target-cpu would of course also be supported for the (existing) >> offline >> mode, since it just needs to adapt the target-cpu in the config. >> >> the main thing I'd want to avoid is somebody accidentally setting >> "target-cpu", not knowing/noticing that that entails what amounts to >> a >> reset of the VM as part of the migration.. >> > Yes, that what I had understanded > ;) > > It's was more about "offline" term, because we don't offline the source > vm until the disk migration is finished. (to reduce downtime) > More like "online-restart" instead "offline". > > Offline for me , is really, we shut the vm, then do the disk migration. hmm, I guess how you see it. for me, online means without interruption, anything else is offline :) but yeah, naming is hard, as always ;) >> there were a few things down below that might also be worthy of >> discussion. I also wonder whether the two variants of "freeze FS" and >> "suspend without state" are enough - that only ensures that no more >> I/O >> happens so the volumes are bitwise identical, but shouldn't we also >> at >> least have the option of doing a clean shutdown at that point so that >> applications can serialize/flush their state properly and that gets >> synced across as well? 
else this is the equivalent of cutting the >> power >> cord, which might not be a good fit for all use cases ;) >> > I had try the clean shutdown in my v1 patch > https://lists.proxmox.com/pipermail/pve-devel/2023-March/056291.html > (without doing the block-job-complete) in phase3, and I have fs > coruption sometime. > Not sure why exactly (Maybe os didn't have correctly shutdown or maybe > some datas in the buffer ?) > Maybe doing the block-job-complete before should make it safe. > (transfert access to the nbd , then do the clean shutdown). possibly we need a special "shutdown guest, but leave qemu running" way of shutting down (so that the guest and any applications within can do their thing, and the block job can transfer all the delta across). completing or cancelling the block job before the guest has shut down would mean the source and target are not consistent (since shutdown can change the disk content, and that would then not be mirrored anymore?), so I don't see any way that that could be an improvement. it would mean that starting the shutdown is already the point of no return - cancelling before would mean writes are not transferred to the target, completing before would mean writes are not written to the source anymore, so we can't fall back to the source node in error handling. I guess we could have two approaches: A - freeze or suspend (depending on QGA availability), then complete block job and (re)start target VM B - shutdown guest OS, complete, then exit source VM and (re)start target VM as always, there's a tradeoff there - A is faster, but less consistent from the guest's point of view (somewhat similar to pulling the power cable). B can take a while (== service downtime!), but it has the same semantics as a reboot. there are also IMHO multiple ways to think about the target side: A start VM in migration mode, but kill it without ever doing any migration, then start it again with modified config (current approach) B start VM paused (like when doing a backup of a stopped VM, without incoming migration), but with overridden CPU parameter, then just 'resume' it when the block migration is finished C don't start a VM at all, just the block devices via qemu-storage-daemon for the block migration, then do a regular start after the block migration and config update are done B has the advantage over A that we don't risk the VM not being able to restart (e.g., because of a race for memory or pass-through resources), and also the resume should be (depending on exact environment possibly quite a bit) faster than kill+start C has the advantage over A and B that the migration itself is cheaper resource-wise, but the big downside that we don't even know if the VM is startable on the target node, and of course, it's a lot more code to write. possibly I just included it because I am looking for an excuse to play around with qemu-storage-daemon - it's probably the least relevant variant for now ;) > > I'll give a try in the V3. > > I just wonder if we can add a new param, like: > > --online --fsfreeze > > --online --shutdown > > --online --2phase-restart that would also be an option. not sure by heart if it's possible to make --online into a property string that is backwards compatible with the "plain boolean" option? if so, we could do --online [mode=live,qga,suspend,shutdown,2phase,..]
with live being the default (not supporting target-cpu) and qga,suspend,shutdown all handling target-cpu (2phase just included for completeness' sake) alternatively, if that doesn't work, having --online [--online-mode live,qga,suspend,..] would be my second choice I guess, if we are reasonably sure that all the possible extensions would be for running VMs only. the only thing counter to that that I can think of would be storage migration using qemu-storage-daemon (e.g., like you said, to somehow bolt on incremental support using persistent bitmaps for storages/image formats that don't support that otherwise), and there I am not even sure whether that couldn't be somehow handled in pve-storage anyway > (I'm currently migrating a lot of vm between an old intel cluster to > the new amd cluster, on different datacenter, with a different ceph > cluster, so I can still do real production tests) technically, target-cpu might also be a worthwhile feature for heterogeneous clusters where a proper/full live migration is not possible for certain node/CPU combinations.. we do already update volume IDs when using 'targetstorage', so also updating the CPU should be doable there as well. using the still experimental remote migration as a field for evaluation is fine, just something to keep in mind while thinking about options, so that we don't accidentally maneuver ourselves into a corner that makes that part impossible :) ^ permalink raw reply [flat|nested] 11+ messages in thread
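For reference, a property string with a default key is roughly how the suggested --online modes could look at the schema level. This is a hypothetical sketch: the format name, the enum values, and whether a bare --online would still parse as mode=live (the backwards-compatibility question raised above) are all assumptions, not an existing qemu-server API:

    # hypothetical sketch - none of these names exist in qemu-server today
    my $online_mode_fmt = {
        mode => {
            default_key => 1,
            type => 'string',
            enum => [qw(live qga suspend shutdown 2phase)],
            default => 'live',
            description => "How a running VM is moved to the target.",
        },
    };
    PVE::JSONSchema::register_format('pve-migration-online', $online_mode_fmt);

    # in the migrate API parameters, instead of the current boolean:
    # 'online' => { type => 'string', format => 'pve-migration-online', optional => 1 },

A plain "--online" on the CLI would then have to be special-cased (or mapped to "mode=live") to stay backwards compatible, which is exactly the open question in the discussion above.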
* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param 2023-04-28 9:12 ` Fabian Grünbichler @ 2023-04-29 7:57 ` Thomas Lamprecht 2023-05-02 8:30 ` Fabian Grünbichler 0 siblings, 1 reply; 11+ messages in thread From: Thomas Lamprecht @ 2023-04-29 7:57 UTC (permalink / raw) To: Proxmox VE development discussion, Fabian Grünbichler Am 28/04/2023 um 11:12 schrieb Fabian Grünbichler: >> It's was more about "offline" term, because we don't offline the source >> vm until the disk migration is finished. (to reduce downtime) >> More like "online-restart" instead "offline". >> >> Offline for me , is really, we shut the vm, then do the disk migration. > hmm, I guess how you see it. for me, online means without interruption, > anything else is offline 😄 but yeah, naming is hard, as always 😉 FWIW, in Proxmox Container land that's currently basically the "most online" it gets, and there it's named "restore migration" – at least if we go for the "clean reboot for actual moving the guest over" approach. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param 2023-04-29 7:57 ` Thomas Lamprecht @ 2023-05-02 8:30 ` Fabian Grünbichler 0 siblings, 0 replies; 11+ messages in thread From: Fabian Grünbichler @ 2023-05-02 8:30 UTC (permalink / raw) To: Proxmox VE development discussion, Thomas Lamprecht On April 29, 2023 9:57 am, Thomas Lamprecht wrote: > Am 28/04/2023 um 11:12 schrieb Fabian Grünbichler: >>> It's was more about "offline" term, because we don't offline the source >>> vm until the disk migration is finished. (to reduce downtime) >>> More like "online-restart" instead "offline". >>> >>> Offline for me , is really, we shut the vm, then do the disk migration. >> hmm, I guess how you see it. for me, online means without interruption, >> anything else is offline 😄 but yeah, naming is hard, as always 😉 > > FWIW, in Proxmox Container land that's currently basically the "most online" > it gets, and there it's named "restore migration" – at least if we go for the > "clean reboot for actual moving the guest over" approach. "restart", you meant? yes, but it's explicitly not "online", it's a second parameter besides that called "restart" that cannot be combined with "online" (nothing can, since setting "online" leads to a hard error if the VM is running). it also does the following: - stop CT on source node (if running) - storage migration (via pve-storage) - start CT again on target node (if previously running) similar to this series, but not quite: - start storage migration (via pve-storage for unused/.., live via qemu for currently used volumes) - wait for storage migration convergence - stop VM on source node / complete block job (details still to be hashed out) - start VM on target node so naming what this series does "restart" might be confusing, since the most fundamental part is different (the downtime is only for the restart part, as opposed to for restart+storage migration). ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param 2023-04-26 13:14 ` Fabian Grünbichler 2023-04-27 5:50 ` DERUMIER, Alexandre @ 2023-09-28 14:58 ` DERUMIER, Alexandre 1 sibling, 0 replies; 11+ messages in thread From: DERUMIER, Alexandre @ 2023-09-28 14:58 UTC (permalink / raw) To: f.gruenbichler; +Cc: pve-devel On Wednesday, April 26, 2023 at 15:14 +0200, Fabian Grünbichler wrote: > On April 25, 2023 6:52 pm, Alexandre Derumier wrote: > > This patch add support for remote migration when target > > cpu model is different. > > > > The target vm is restart after the migration > > so this effectively introduces a new "hybrid" migration mode ;) the > changes are a bit smaller than I expected (in part thanks to patch > #1), > which is good. > > there are semi-frequent requests for another variant (also applicable > to > containers) in the form of a two phase migration > - storage migrate > - stop guest > - incremental storage migrate > - start guest on target > > given that it might make sense to save-guard this implementation > here, > and maybe switch to a new "mode" parameter? > I have implemented a working switch to the remote NBD in v3. So, after the disk migration, we do a block-job-complete; the source VM is still running and is now running over NBD through the target VM. Then the source VM is shut down, flushing the last pending writes through NBD, and then the target VM is restarted. > online => switching CPU not allowed > offline or however-we-call-this-new-mode (or in the future, two- > phase-restart) => switching CPU allowed > > > Still unsure about it, I have added an extra flag in v3, "target-reboot": - online: check if the source vm is online - target-cpu: change the target cpu (only changes the value on the target vm) - target-reboot: skip live migration, do a shutdown of the source vm and a restart of the target vm. > > Signed-off-by: Alexandre Derumier <aderumier@odiso.com> > > --- > > PVE/API2/Qemu.pm | 18 ++++++++++++++++++ > > PVE/CLI/qm.pm | 6 ++++++ > > PVE/QemuMigrate.pm | 25 +++++++++++++++++++++++++ > > 3 files changed, 49 insertions(+) > > > > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm > > index 587bb22..6703c87 100644 > > --- a/PVE/API2/Qemu.pm > > +++ b/PVE/API2/Qemu.pm > > @@ -4460,6 +4460,12 @@ __PACKAGE__->register_method({ > > optional => 1, > > default => 0, > > }, > > + 'target-cpu' => { > > + optional => 1, > > + description => "Target Emulated CPU model. For > > online migration, the storage is live migrate, but the memory > > migration is skipped and the target vm is restarted.", > > + type => 'string', > > + format => 'pve-vm-cpu-conf', > > + }, > > 'target-storage' => get_standard_option('pve- > > targetstorage', { > > completion => > > \&PVE::QemuServer::complete_migration_storage, > > optional => 0, > > @@ -4557,11 +4563,14 @@ __PACKAGE__->register_method({ > > raise_param_exc({ 'target-bridge' => "failed to parse > > bridge map: $@" }) > > if $@; > > > > + my $target_cpu = extract_param($param, 'target-cpu'); > > this is okay > > > + > > die "remote migration requires explicit storage mapping!\n" > > if $storagemap->{identity}; > > > > $param->{storagemap} = $storagemap; > > $param->{bridgemap} = $bridgemap; > > + $param->{targetcpu} = $target_cpu; > > but this is a bit confusing with the variable/hash key naming ;) > Fixed in the v4 ... > > > > + $remote_conf->{cpu} = $self->{opts}->{targetcpu}; > > do we need permission checks here (or better, somewhere early on, for > doing this here) > > > fixed in v4: do not override the cpu config if targetcpu is empty.
About permissions, I'm not sure; we don't have a specific permission just for the CPU. Do we need to check a permission on vm.config? In any case, we should already have permission to create a VM on the target cluster. ^ permalink raw reply [flat|nested] 11+ messages in thread
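For the permission question, one possible early check would mirror the privilege that regular config updates of the cpu option require, e.g. in the source-side API handler where target-cpu is extracted. This is only a sketch of the idea (the placement and the exact privilege are the open points), not code from any of the patch versions:

    # sketch - assumes $param and $vmid as in the remote-migration API handler
    my $rpcenv = PVE::RPCEnvironment::get();
    my $authuser = $rpcenv->get_user();
    if (defined($param->{'target-cpu'})) {
        # VM.Config.CPU is what CPU-related config changes normally require (assumption here)
        $rpcenv->check($authuser, "/vms/$vmid", ['VM.Config.CPU']);
    }

Whether such a check belongs on the source side, the target side, or both would still need to be decided, since the config is ultimately rewritten on the target cluster.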
end of thread. Thread overview: 11+ messages -- 2023-04-25 16:52 [pve-devel] [PATCH v2 qemu-server 0/2] remote-migration: migration with different cpu Alexandre Derumier 2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 1/2] migration: move livemigration code in a dedicated sub Alexandre Derumier 2023-04-25 16:52 ` [pve-devel] [PATCH v2 qemu-server 2/2] remote-migration: add target-cpu param Alexandre Derumier 2023-04-26 13:14 ` Fabian Grünbichler 2023-04-27 5:50 ` DERUMIER, Alexandre 2023-04-27 7:32 ` Fabian Grünbichler 2023-04-28 6:43 ` DERUMIER, Alexandre 2023-04-28 9:12 ` Fabian Grünbichler 2023-04-29 7:57 ` Thomas Lamprecht 2023-05-02 8:30 ` Fabian Grünbichler 2023-09-28 14:58 ` DERUMIER, Alexandre