Date: Thu, 11 Nov 2021 13:25:46 +0100
From: Fabian Grünbichler
To: Fabian Ebner, pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH qemu-server 09/10] migrate: add remote migration handling
Message-Id: <1636629713.r0pg1k3q56.astroid@nora.none>
In-Reply-To: <23b3c77c-e946-b7ae-c330-83ae7a451aae@proxmox.com>

On November 10, 2021 12:17 pm, Fabian Ebner wrote:
> On 05.11.21 at 14:03, Fabian Grünbichler wrote:
>> remote migration uses a websocket connection to a task worker running on
>> the target node instead of commands via SSH to control the migration.
>> this websocket tunnel is started earlier than the SSH tunnel, and allows
>> adding UNIX-socket forwarding over additional websocket connections
>> on-demand.
>> 
>> the main differences to regular intra-cluster migration are:
>> - source VM config and disks are only removed upon request via --delete
>> - shared storages are treated like local storages, since we can't
>>   assume they are shared across clusters (with potential to extend this
>>   by marking storages as shared)
>> - NBD migrated disks are explicitly pre-allocated on the target node via
>>   tunnel command before starting the target VM instance
>> - in addition to storages, network bridges and the VMID itself are
>>   transformed via a user-defined mapping
>> - all commands and migration data streams are sent via a WS tunnel proxy
>> 
>> Signed-off-by: Fabian Grünbichler
>> ---
>> 
>> Notes:
>>     requires proxmox-websocket-tunnel
>> 
>>  PVE/API2/Qemu.pm   |   4 +-
>>  PVE/QemuMigrate.pm | 647 +++++++++++++++++++++++++++++++++++++++------
>>  PVE/QemuServer.pm  |   8 +-
>>  3 files changed, 575 insertions(+), 84 deletions(-)
>> 
>> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
>> index a1a1813..24f5b98 100644
>> --- a/PVE/API2/Qemu.pm
>> +++ b/PVE/API2/Qemu.pm
>> @@ -4610,7 +4610,7 @@ __PACKAGE__->register_method({
>>      # bump/reset both for breaking changes
>>      # bump tunnel only for opt-in changes
> 
> Sorry for asking about this on this patch: shouldn't opt-in changes bump
> both?

yes. this was initially version and min version, then changed to version
and age like with the storage plugins..

> 
>>      return {
>> -	api => 2,
>> +	api => $PVE::QemuMigrate::WS_TUNNEL_VERSION,
>>  	age => 0,
>>      };
>>  },
>> @@ -4897,7 +4897,7 @@ __PACKAGE__->register_method({
>>  	    PVE::Firewall::remove_vmfw_conf($vmid);
>>  	}
>>  
>> -	if (my @volumes = keys $state->{cleanup}->{volumes}->$%) {
>> +	if (my @volumes = keys $state->{cleanup}->{volumes}->%*) {
>>  	    PVE::Storage::foreach_volid(@volumes, sub {
>>  		my ($volid, $sid, $volname, $d) = @_;
>>  
>> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
>> index 07b56eb..7378551 100644
>> --- a/PVE/QemuMigrate.pm
>> +++ b/PVE/QemuMigrate.pm
>> @@ -7,9 +7,15 @@ use IO::File;
>>  use IPC::Open2;
>>  use POSIX qw( WNOHANG );
>>  use Time::HiRes qw( usleep );
>> +use JSON qw(encode_json decode_json);
>> +use IO::Socket::UNIX;
>> +use Socket qw(SOCK_STREAM);
>> +use Storable qw(dclone);
>> +use URI::Escape;
>>  
>> -use PVE::Format qw(render_bytes);
>> +use PVE::APIClient::LWP;
>>  use PVE::Cluster;
>> +use PVE::Format qw(render_bytes);
>>  use PVE::GuestHelpers qw(safe_boolean_ne safe_string_ne);
>>  use PVE::INotify;
>>  use PVE::RPCEnvironment;
>> @@ -30,6 +36,9 @@ use PVE::QemuServer;
>>  use PVE::AbstractMigrate;
>>  use base qw(PVE::AbstractMigrate);
>>  
>> +# compared against remote end's minimum version
>> +our $WS_TUNNEL_VERSION = 2;
>> +
>>  sub fork_command_pipe {
>>      my ($self, $cmd) = @_;
>>  
>> @@ -85,7 +94,7 @@ sub finish_command_pipe {
>>  	}
>>      }
>>  
>> -    $self->log('info', "ssh tunnel still running - terminating now with SIGTERM\n");
>> +    $self->log('info', "tunnel still running - terminating now with SIGTERM\n");
>>      kill(15, $cpid);
>>  
>>      # wait again
>> @@ -94,11 +103,11 @@ sub finish_command_pipe {
>>  	sleep(1);
>>      }
>>  
>> -    $self->log('info', "ssh tunnel still running - terminating now with SIGKILL\n");
>> +    $self->log('info', "tunnel still running - terminating now with SIGKILL\n");
>>      kill 9, $cpid;
>>      sleep 1;
>>  
>> -    $self->log('err', "ssh tunnel child process (PID $cpid) couldn't be collected\n")
>> +    $self->log('err', "tunnel child process (PID $cpid) couldn't be collected\n")
>>  	if !&$collect_child_process();
>>  }
>>  
>> @@ -115,18 +124,28 @@ sub read_tunnel {
>>      };
>>      die "reading from tunnel failed: $@\n" if $@;
>>  
>> -    chomp $output;
>> +    chomp $output if defined($output);
>>  
>>      return $output;
>>  }
>>  
>>  sub write_tunnel {
>> -    my ($self, $tunnel, $timeout, $command) = @_;
>> +    my ($self, $tunnel, $timeout, $command, $params) = @_;
>>  
>>      $timeout = 60 if !defined($timeout);
>>  
>>      my $writer = $tunnel->{writer};
>>  
>> +    if ($tunnel->{version} && $tunnel->{version} >= 2) {
>> +	my $object = defined($params) ? dclone($params) : {};
>> +	$object->{cmd} = $command;
>> +
>> +	$command = eval { JSON::encode_json($object) };
>> +
>> +	die "failed to encode command as JSON - $@\n"
>> +	    if $@;
>> +    }
>> +
>>      eval {
>>  	PVE::Tools::run_with_timeout($timeout, sub {
>>  	    print $writer "$command\n";
>> @@ -136,13 +155,29 @@ sub write_tunnel {
>>      die "writing to tunnel failed: $@\n" if $@;
>>  
>>      if ($tunnel->{version} && $tunnel->{version} >= 1) {
>> -	my $res = eval { $self->read_tunnel($tunnel, 10); };
>> +	my $res = eval { $self->read_tunnel($tunnel, $timeout); };
>>  	die "no reply to command '$command': $@\n" if $@;
>>  
>> -	if ($res eq 'OK') {
>> -	    return;
>> +	if ($tunnel->{version} == 1) {
>> +	    if ($res eq 'OK') {
>> +		return;
>> +	    } else {
>> +		die "tunnel replied '$res' to command '$command'\n";
>> +	    }
>>  	} else {
>> -	    die "tunnel replied '$res' to command '$command'\n";
>> +	    my $parsed = eval { JSON::decode_json($res) };
>> +	    die "failed to decode tunnel reply '$res' (command '$command') - $@\n"
>> +		if $@;
>> +
>> +	    if (!$parsed->{success}) {
>> +		if (defined($parsed->{msg})) {
>> +		    die "error - tunnel command '$command' failed - $parsed->{msg}\n";
>> +		} else {
>> +		    die "error - tunnel command '$command' failed\n";
>> +		}
>> +	    }
>> +
>> +	    return $parsed;
>>  	}
>>      }
>>  }
>> @@ -185,10 +220,150 @@ sub fork_tunnel {
>>      return $tunnel;
>>  }
>>  
>> +my $forward_unix_socket = sub {
>> +    my ($self, $local, $remote) = @_;
>> +
>> +    my $params = dclone($self->{tunnel}->{params});
>> +    $params->{unix} = $local;
>> +    $params->{url} = $params->{url} ."socket=$remote&";
>> +    $params->{ticket} = { path => $remote };
>> +
>> +    my $cmd = encode_json({
>> +	control => JSON::true,
>> +	cmd => 'forward',
>> +	data => $params,
>> +    });
>> +
>> +    my $writer = $self->{tunnel}->{writer};
>> +    eval {
>> +	unlink $local;
>> +	PVE::Tools::run_with_timeout(15, sub {
>> +	    print $writer "$cmd\n";
>> +	    $writer->flush();
>> +	});
>> +    };
>> +    die "failed to write forwarding command - $@\n" if $@;
>> +
>> +    $self->read_tunnel($self->{tunnel});
>> +
>> +    $self->log('info', "Forwarded local unix socket '$local' to remote '$remote' via websocket tunnel");
>> +};
>> +
>> +sub fork_websocket_tunnel {
>> +    my ($self, $storages) = @_;
>> +
>> +    my $remote = $self->{opts}->{remote};
>> +    my $conn = $remote->{conn};
>> +
>> +    my $websocket_url = "https://$conn->{host}:$conn->{port}/api2/json/nodes/$self->{node}/qemu/$remote->{vmid}/mtunnelwebsocket";
>> +
>> +    my $params = {
>> +	url => $websocket_url,
>> +    };
>> +
>> +    if (my $apitoken = $conn->{apitoken}) {
>> +	$params->{headers} = [["Authorization", "$apitoken"]];
>> +    } else {
>> +	die "can't connect to remote host without credentials\n";
>> +    }
>> +
>> +    if (my $fps = $conn->{cached_fingerprints}) {
>> +	$params->{fingerprint} = (keys %$fps)[0];
>> +    }
>> +
>> +    my $api_client = PVE::APIClient::LWP->new(%$conn);
>> +    my $storage_list = join(',', keys %$storages);
>> +    my $res = $api_client->post("/nodes/$self->{node}/qemu/$remote->{vmid}/mtunnel", { storages => $storage_list });
>> +    $self->log('info', "remote: started migration tunnel worker '$res->{upid}'");
>> +    $params->{url} .= "?ticket=".uri_escape($res->{ticket});
>> +    $params->{url} .= "&socket=$res->{socket}";
> 
> Nit: could also be escaped.
> 

done

>> +
>> +    my $reader = IO::Pipe->new();
>> +    my $writer = IO::Pipe->new();
>> +
>> +    my $cpid = fork();
>> +    if ($cpid) {
>> +	$writer->writer();
>> +	$reader->reader();
>> +	my $tunnel = { writer => $writer, reader => $reader, pid => $cpid };
>> +
>> +	eval {
>> +	    my $writer = $tunnel->{writer};
>> +	    my $cmd = encode_json({
>> +		control => JSON::true,
>> +		cmd => 'connect',
>> +		data => $params,
>> +	    });
>> +
>> +	    eval {
>> +		PVE::Tools::run_with_timeout(15, sub {
>> +		    print {$writer} "$cmd\n";
>> +		    $writer->flush();
>> +		});
>> +	    };
>> +	    die "failed to write tunnel connect command - $@\n" if $@;
>> +	};
>> +	die "failed to connect via WS: $@\n" if $@;
>> +
>> +	my $err;
>> +	eval {
>> +	    my $writer = $tunnel->{writer};
>> +	    my $cmd = encode_json({
>> +		cmd => 'version',
>> +	    });
>> +
>> +	    eval {
>> +		PVE::Tools::run_with_timeout(15, sub {
>> +		    print {$writer} "$cmd\n";
>> +		    $writer->flush();
>> +		});
>> +	    };
>> +	    $err = "failed to write tunnel version command - $@\n" if $@;
>> +	    my $res = $self->read_tunnel($tunnel, 10);
>> +	    $res = JSON::decode_json($res);
>> +	    my $version = $res->{api};
>> +
>> +	    if ($version =~ /^(\d+)$/) {
>> +		$tunnel->{version} = $1;
>> +		$tunnel->{age} = $res->{age};
>> +		$self->log('info', "tunnel info: $version\n");
>> +	    } else {
>> +		$err = "received invalid tunnel version string '$version'\n" if !$err;
>> +	    }
>> +	};
>> +	$err = $@ if !$err;
>> +
>> +	if ($err) {
>> +	    $self->finish_command_pipe($tunnel);
>> +	    die "can't open migration tunnel - $err";
>> +	}
>> +
>> +	$params->{url} = "$websocket_url?";
>> +	$tunnel->{params} = $params; # for forwarding
>> +
>> +	return $tunnel;
>> +    } else {
>> +	eval {
>> +	    $writer->reader();
>> +	    $reader->writer();
>> +	    PVE::Tools::run_command(
>> +		['proxmox-websocket-tunnel'],
>> +		input => "<&".fileno($writer),
>> +		output => ">&".fileno($reader),
>> +		errfunc => sub { my $line = shift; print "tunnel: $line\n"; },
>> +	    );
>> +	};
>> +	warn "CMD websocket tunnel died: $@\n" if $@;
>> +	exit 0;
>> +    }
>> +}
>> +
>>  sub finish_tunnel {
>> -    my ($self, $tunnel) = @_;
>> +    my ($self, $tunnel, $cleanup) = @_;
>>  
>> -    eval { $self->write_tunnel($tunnel, 30, 'quit'); };
>> +    $cleanup = $cleanup ? 1 : 0;
>> +
>> +    eval { $self->write_tunnel($tunnel, 30, 'quit', { cleanup => $cleanup }); };
>>      my $err = $@;
>>  
>>      $self->finish_command_pipe($tunnel, 30);
> 
> Nit: below here is
>     if (my $unix_sockets = $tunnel->{unix_sockets}) {
> 	my $cmd = ['rm', '-f', @$unix_sockets];
> 	PVE::Tools::run_command($cmd);
> 
> 	# .. and just to be sure check on remote side
> 	unshift @{$cmd}, @{$self->{rem_ssh}};
> 	PVE::Tools::run_command($cmd);
>     }
> and if I'm not mistaken, $self->{rem_ssh} is undef for remote migration,
> resulting in an undef warning and $cmd being executed twice locally.

doesn't hurt, but also not optimal. conditionalized in v2
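
for reference, a rough sketch of what the conditionalized cleanup could
look like (hypothetical, same variable names as in the snippet quoted
above, not the actual v2 patch):

    if (my $unix_sockets = $tunnel->{unix_sockets}) {
	my $cmd = ['rm', '-f', @$unix_sockets];
	PVE::Tools::run_command($cmd);

	# only check the remote side if we actually have an SSH connection,
	# i.e. for intra-cluster migration - remote migration cleans up
	# via the tunnel instead
	if ($self->{rem_ssh}) {
	    unshift @{$cmd}, @{$self->{rem_ssh}};
	    PVE::Tools::run_command($cmd);
	}
    }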
>> @@ -338,23 +513,34 @@ sub prepare {
>>      }
>>  
>>      my $vollist = PVE::QemuServer::get_vm_volumes($conf);
>> +
>> +    my $storages = {};
>>      foreach my $volid (@$vollist) {
>>  	my ($sid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
>>  
>> -	# check if storage is available on both nodes
>> +	# check if storage is available on source node
>>  	my $scfg = PVE::Storage::storage_check_enabled($storecfg, $sid);
>>  
>>  	my $targetsid = $sid;
>> -	# NOTE: we currently ignore shared source storages in mappings so skip here too for now
>> -	if (!$scfg->{shared}) {
>> +	# NOTE: local ignores shared mappings, remote maps them
>> +	if (!$scfg->{shared} || $self->{opts}->{remote}) {
>>  	    $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $sid);
>>  	}
>>  
>> -	my $target_scfg = PVE::Storage::storage_check_enabled($storecfg, $targetsid, $self->{node});
>> -	my ($vtype) = PVE::Storage::parse_volname($storecfg, $volid);
>> +	$storages->{$targetsid} = 1;
>> +
>> +	if (!$self->{opts}->{remote}) {
>> +	    # check if storage is available on target node
>> +	    my $target_scfg = PVE::Storage::storage_check_enabled(
>> +		$storecfg,
>> +		$targetsid,
>> +		$self->{node},
>> +	    );
>> +	    my ($vtype) = PVE::Storage::parse_volname($storecfg, $volid);
>>  
>> -	die "$volid: content type '$vtype' is not available on storage '$targetsid'\n"
>> -	    if !$target_scfg->{content}->{$vtype};
>> +	    die "$volid: content type '$vtype' is not available on storage '$targetsid'\n"
>> +		if !$target_scfg->{content}->{$vtype};
>> +	}
>>  
>>  	if ($scfg->{shared}) {
>>  	    # PVE::Storage::activate_storage checks this for non-shared storages
>> @@ -364,10 +550,23 @@ sub prepare {
>>  	}
>>      }
>>  
>> -    # test ssh connection
>> -    my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
>> -    eval { $self->cmd_quiet($cmd); };
>> -    die "Can't connect to destination address using public key\n" if $@;
>> +    if ($self->{opts}->{remote}) {
>> +	# test & establish websocket connection
>> +	my $tunnel = $self->fork_websocket_tunnel($storages);
>> +	my $min_version = $tunnel->{version} - $tunnel->{age};
>> +	die "Remote tunnel endpoint not compatible, upgrade required (current: $WS_TUNNEL_VERSION, required: $min_version)\n"
>> +	    if $WS_TUNNEL_VERSION < $min_version;
>> +	die "Remote tunnel endpoint too old, upgrade required (local: $WS_TUNNEL_VERSION, remote: $tunnel->{version})"
> 
> Nit: missing '\n' in error, and while we're at it: style nit for >100
> character lines (are not the only instances in the series).
> 

fixed by switching to three info log statements, and shorter error
messages without the versions
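
roughly along these lines (sketch only, the exact v2 wording may differ):

    $self->log('info', "local WS tunnel version: $WS_TUNNEL_VERSION");
    $self->log('info', "remote WS tunnel version: $tunnel->{version}");
    my $min_version = $tunnel->{version} - $tunnel->{age};
    $self->log('info', "minimum required WS tunnel version: $min_version");

    die "Remote tunnel endpoint not compatible, upgrade required\n"
	if $WS_TUNNEL_VERSION < $min_version;
    die "Remote tunnel endpoint too old, upgrade required\n"
	if $WS_TUNNEL_VERSION > $tunnel->{version};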
>> +	    if $WS_TUNNEL_VERSION > $tunnel->{version};
>> +
>> +	print "websocket tunnel started\n";
>> +	$self->{tunnel} = $tunnel;
>> +    } else {
>> +	# test ssh connection
>> +	my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
>> +	eval { $self->cmd_quiet($cmd); };
>> +	die "Can't connect to destination address using public key\n" if $@;
>> +    }
>>  
>>      return $running;
>>  }
>> @@ -405,7 +604,7 @@ sub scan_local_volumes {
>>      my @sids = PVE::Storage::storage_ids($storecfg);
>>      foreach my $storeid (@sids) {
>>  	my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
>> -	next if $scfg->{shared};
>> +	next if $scfg->{shared} && !$self->{opts}->{remote};
>>  	next if !PVE::Storage::storage_check_enabled($storecfg, $storeid, undef, 1);
>>  
>>  	# get list from PVE::Storage (for unused volumes)
>> @@ -414,19 +613,24 @@ sub scan_local_volumes {
>>  	next if @{$dl->{$storeid}} == 0;
>>  
>>  	my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $storeid);
>> -	# check if storage is available on target node
>> -	my $target_scfg = PVE::Storage::storage_check_enabled(
>> -	    $storecfg,
>> -	    $targetsid,
>> -	    $self->{node},
>> -	);
>> -
>> -	die "content type 'images' is not available on storage '$targetsid'\n"
>> -	    if !$target_scfg->{content}->{images};
>> +	my $bwlimit_sids = [$storeid];
>> +	if (!$self->{opts}->{remote}) {
>> +	    # check if storage is available on target node
>> +	    my $target_scfg = PVE::Storage::storage_check_enabled(
>> +		$storecfg,
>> +		$targetsid,
>> +		$self->{node},
>> +	    );
>> +
>> +	    die "content type 'images' is not available on storage '$targetsid'\n"
>> +		if !$target_scfg->{content}->{images};
>> +
>> +	    push @$bwlimit_sids, $targetsid;
>> +	}
>>  
>>  	my $bwlimit = PVE::Storage::get_bandwidth_limit(
>>  	    'migration',
>> -	    [$targetsid, $storeid],
>> +	    $bwlimit_sids,
>>  	    $self->{opts}->{bwlimit},
>>  	);
>>  
>> @@ -482,14 +686,17 @@ sub scan_local_volumes {
>>  	    my $scfg = PVE::Storage::storage_check_enabled($storecfg, $sid);
>>  
>>  	    my $targetsid = $sid;
>> -	    # NOTE: we currently ignore shared source storages in mappings so skip here too for now
>> -	    if (!$scfg->{shared}) {
>> +	    # NOTE: local ignores shared mappings, remote maps them
>> +	    if (!$scfg->{shared} || $self->{opts}->{remote}) {
>>  		$targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $sid);
>>  	    }
>>  
>> -	    PVE::Storage::storage_check_enabled($storecfg, $targetsid, $self->{node});
>> +	    # check target storage on target node if intra-cluster migration
>> +	    if (!$self->{opts}->{remote}) {
>> +		PVE::Storage::storage_check_enabled($storecfg, $targetsid, $self->{node});
>>  
>> -	    return if $scfg->{shared};
>> +		return if $scfg->{shared};
>> +	    }
>>  
>>  	    $local_volumes->{$volid}->{ref} = $attr->{referenced_in_config} ? 'config' : 'snapshot';
>>  	    $local_volumes->{$volid}->{ref} = 'storage' if $attr->{is_unused};
>> @@ -578,6 +785,9 @@ sub scan_local_volumes {
>>  
>>  	    my $migratable = $scfg->{type} =~ /^(?:dir|btrfs|zfspool|lvmthin|lvm)$/;
>>  
>> +	    # TODO: what is this even here for?
>> +	    $migratable = 1 if $self->{opts}->{remote};
>> +
>>  	    die "can't migrate '$volid' - storage type '$scfg->{type}' not supported\n"
>>  		if !$migratable;
>>  
>> @@ -612,6 +822,10 @@ sub handle_replication {
>>      my $local_volumes = $self->{local_volumes};
>>  
>>      return if !$self->{replication_jobcfg};
>> +
>> +    die "can't migrate VM with replicated volumes to remote cluster/node\n"
>> +	if $self->{opts}->{remote};
> 
> We can add that later, asserting that no local removal will happen ;)
> Same for being a base VM referenced by a linked clone.
> 
>> +
>>      if ($self->{running}) {
>>  
>>  	my $version = PVE::QemuServer::kvm_user_version();
>> @@ -709,26 +923,133 @@ sub sync_offline_local_volumes {
>>      my $opts = $self->{opts};
>>  
>>      $self->log('info', "copying local disk images") if scalar(@volids);
>> -
>> +    my $forwarded = 0;
>>      foreach my $volid (@volids) {
>>  	my $targetsid = $local_volumes->{$volid}->{targetsid};
>> -	my $bwlimit = $local_volumes->{$volid}->{bwlimit};
>> -	$bwlimit = $bwlimit * 1024 if defined($bwlimit); # storage_migrate uses bps
>> -
>> -	my $storage_migrate_opts = {
>> -	    'ratelimit_bps' => $bwlimit,
>> -	    'insecure' => $opts->{migration_type} eq 'insecure',
>> -	    'with_snapshots' => $local_volumes->{$volid}->{snapshots},
>> -	    'allow_rename' => !$local_volumes->{$volid}->{is_vmstate},
>> -	};
>>  
>> -	my $logfunc = sub { $self->log('info', $_[0]); };
>> -	my $new_volid = eval {
>> -	    PVE::Storage::storage_migrate($storecfg, $volid, $self->{ssh_info},
>> -					  $targetsid, $storage_migrate_opts, $logfunc);
>> -	};
>> -	if (my $err = $@) {
>> -	    die "storage migration for '$volid' to storage '$targetsid' failed - $err\n";
>> +	my $new_volid;
>> +	
> 
> Style nit: whitespace error
> 
>> +	my $opts = $self->{opts};
>> +	if (my $remote = $opts->{remote}) {
>> +	    my $remote_vmid = $remote->{vmid};
>> +	    my ($sid, undef) = PVE::Storage::parse_volume_id($volid);
>> +	    my (undef, $name, undef, undef, undef, undef, $format) = PVE::Storage::parse_volname($storecfg, $volid);
>> +	    my $scfg = PVE::Storage::storage_config($storecfg, $sid);
>> +	    PVE::Storage::activate_volumes($storecfg, [$volid]);
>> +
>> +	    # use 'migrate' limit for transfer to other node
>> +	    my $bwlimit_opts = {
>> +		storage => $targetsid,
>> +		bwlimit => $opts->{bwlimit},
>> +	    };
>> +	    my $bwlimit = PVE::Storage::get_bandwidth_limit('migration', [$sid], $opts->{bwlimit});
> 
> Nit: could use
>     my $bwlimit = $local_volumes->{$volid}->{bwlimit};
> 
>> +	    my $remote_bwlimit = $self->write_tunnel($self->{tunnel}, 10, 'bwlimit', $bwlimit_opts);
>> +	    $remote_bwlimit = $remote_bwlimit->{bwlimit};
>> +	    if (defined($remote_bwlimit)) {
>> +		$bwlimit = $remote_bwlimit if !defined($bwlimit);
>> +		$bwlimit = $remote_bwlimit if $remote_bwlimit < $bwlimit;
>> +	    }
>> +
>> +	    # JSONSchema and get_bandwidth_limit use kbps - storage_migrate bps
>> +	    $bwlimit = $bwlimit * 1024 if defined($bwlimit);
>> +
>> +	    my $with_snapshots = $local_volumes->{$volid}->{snapshots} ? 1 : 0;
>> +	    my $snapshot;
>> +	    if ($scfg->{type} eq 'zfspool') {
>> +		$snapshot = '__migration__';
>> +		$with_snapshots = 1;
>> +		PVE::Storage::volume_snapshot($storecfg, $volid, $snapshot);
>> +	    }
>> +
>> +	    if ($self->{vmid} != $remote_vmid) {
>> +		$name =~ s/-$self->{vmid}-/-$remote_vmid-/g;
>> +		$name =~ s/^$self->{vmid}\//$remote_vmid\//;
>> +	    }
>> +
>> +	    my @export_formats = PVE::Storage::volume_export_formats($storecfg, $volid, undef, undef, $with_snapshots);
>> +
>> +	    my $storage_migrate_opts = {
> 
> Nit: maybe call it disk_import_opts
> 
>> +		format => $format,
>> +		storage => $targetsid,
>> +		'with-snapshots' => $with_snapshots,
>> +		'allow-rename' => !$local_volumes->{$volid}->{is_vmstate},
>> +		'export-formats' => @export_formats,
> 
> Doesn't this need to be converted to a string?
> 

yes, but it seems it works without when there's just a single entry,
which is true for everything but btrfs :)
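
for context: the array flattens into the surrounding list there, so with
exactly one element the hash keys still line up - with more than one, the
following keys would get shifted around. the fix is presumably an
explicit join:

    'export-formats' => join(',', @export_formats),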
>> +		volname => $name,
>> +	    };
>> +	    my $res = $self->write_tunnel($self->{tunnel}, 600, 'disk-import', $storage_migrate_opts);
>> +	    my $local = "/run/qemu-server/$self->{vmid}.storage";
>> +	    if (!$forwarded) {
>> +		$forward_unix_socket->($self, $local, $res->{socket});
>> +		$forwarded = 1;
>> +	    }
>> +	    my $socket = IO::Socket::UNIX->new(Peer => $local, Type => SOCK_STREAM())
>> +		or die "failed to connect to websocket tunnel at $local\n";
>> +	    # we won't be reading from the socket
>> +	    shutdown($socket, 0);
>> +	    my $send = ['pvesm', 'export', $volid, $res->{format}, '-', '-with-snapshots', $with_snapshots];
>> +	    push @$send, '-snapshot', $snapshot if $snapshot;
>> +
>> +	    my @cstream;
>> +	    if (defined($bwlimit)) {
>> +		@cstream = ([ '/usr/bin/cstream', '-t', $bwlimit ]);
>> +		$self->log('info', "using a bandwidth limit of $bwlimit bps for transferring '$volid'");
>> +	    }
>> +
>> +	    eval {
>> +		PVE::Tools::run_command(
>> +		    [$send, @cstream],
>> +		    output => '>&'.fileno($socket),
>> +		    errfunc => sub { my $line = shift; $self->log('warn', $line); },
>> +		);
>> +	    };
>> +	    my $send_error = $@;
>> +
>> +	    # don't close the connection entirely otherwise the
>> +	    # receiving end might not get all buffered data (and
>> +	    # fails with 'connection reset by peer')
>> +	    shutdown($socket, 1);
>> +
>> +	    # wait for the remote process to finish
>> +	    while ($res = $self->write_tunnel($self->{tunnel}, 10, 'query-disk-import')) {
>> +		if ($res->{status} eq 'pending') {
>> +		    $self->log('info', "waiting for disk import to finish..\n");
>> +		    sleep(1)
>> +		} elsif ($res->{status} eq 'complete') {
>> +		    $new_volid = $res->{volid};
>> +		    last;
>> +		} else {
>> +		    die "unknown query-disk-import result: $res->{status}\n";
>> +		}
>> +	    }
>> +
>> +	    # now close the socket
>> +	    close($socket);
>> +	    die $send_error if $send_error;
>> +	} else {
>> +	    my $bwlimit = $local_volumes->{$volid}->{bwlimit};
>> +	    $bwlimit = $bwlimit * 1024 if defined($bwlimit); # storage_migrate uses bps
>> +
>> +	    my $storage_migrate_opts = {
>> +		'ratelimit_bps' => $bwlimit,
>> +		'insecure' => $opts->{migration_type} eq 'insecure',
>> +		'with_snapshots' => $local_volumes->{$volid}->{snapshots},
>> +		'allow_rename' => !$local_volumes->{$volid}->{is_vmstate},
>> +	    };
>> +
>> +	    my $logfunc = sub { $self->log('info', $_[0]); };
>> +	    $new_volid = eval {
>> +		PVE::Storage::storage_migrate(
>> +		    $storecfg,
>> +		    $volid,
>> +		    $self->{ssh_info},
>> +		    $targetsid,
>> +		    $storage_migrate_opts,
>> +		    $logfunc,
>> +		);
>> +	    };
>> +	    if (my $err = $@) {
>> +		die "storage migration for '$volid' to storage '$targetsid' failed - $err\n";
>> +	    }
>>  	}
>>  
>>  	$self->{volume_map}->{$volid} = $new_volid;
>> @@ -744,6 +1065,12 @@ sub sync_offline_local_volumes {
>>  sub cleanup_remotedisks {
>>      my ($self) = @_;
>>  
> 
> Nit, not to be taken seriously: cleanup_remotedisks_and_maybe_tunnel ;)
> 
>> +    if ($self->{opts}->{remote}) {
>> +	$self->finish_tunnel($self->{tunnel}, 1);
>> +	delete $self->{tunnel};
>> +	return;
>> +    }
>> +
>>      my $local_volumes = $self->{local_volumes};
>>  
>>      foreach my $volid (values %{$self->{volume_map}}) {
>> @@ -793,8 +1120,84 @@ sub phase1 {
>>      $self->handle_replication($vmid);
>>  
>>      $self->sync_offline_local_volumes();
>> +    $self->phase1_remote($vmid) if $self->{opts}->{remote};
>>  };
>>  
>> +sub phase1_remote {
>> +    my ($self, $vmid) = @_;
>> +
>> +    my $remote_conf = PVE::QemuConfig->load_config($vmid);
>> +    PVE::QemuConfig->update_volume_ids($remote_conf, $self->{volume_map});
>> +
>> +    # TODO: check bridge availability earlier?
>> +    my $bridgemap = $self->{opts}->{bridgemap};
>> +    foreach my $opt (keys %$remote_conf) {
>> +	next if $opt !~ m/^net\d+$/;
>> +
>> +	next if !$remote_conf->{$opt};
>> +	my $d = PVE::QemuServer::parse_net($remote_conf->{$opt});
>> +	next if !$d || !$d->{bridge};
>> +
>> +	my $target_bridge = PVE::QemuServer::map_id($bridgemap, $d->{bridge});
>> +	$self->log('info', "mapped: $opt from $d->{bridge} to $target_bridge");
>> +	$d->{bridge} = $target_bridge;
>> +	$remote_conf->{$opt} = PVE::QemuServer::print_net($d);
>> +    }
>> +
>> +    my @online_local_volumes = $self->filter_local_volumes('online');
>> +
>> +    my $storage_map = $self->{opts}->{storagemap};
>> +    $self->{nbd} = {};
>> +    PVE::QemuConfig->foreach_volume($remote_conf, sub {
>> +	my ($ds, $drive) = @_;
>> +
>> +	# TODO eject CDROM?
>> +	return if PVE::QemuServer::drive_is_cdrom($drive);
>> +
>> +	my $volid = $drive->{file};
>> +	return if !$volid;
>> +
>> +	return if !grep { $_ eq $volid } @online_local_volumes;
>> +
>> +	my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid);
>> +	my $scfg = PVE::Storage::storage_config($self->{storecfg}, $storeid);
>> +	my $source_format = PVE::QemuServer::qemu_img_format($scfg, $volname);
>> +
>> +	# set by target cluster
>> +	my $oldvolid = delete $drive->{file};
>> +	delete $drive->{format};
>> +
>> +	my $targetsid = PVE::QemuServer::map_id($storage_map, $storeid);
>> +
>> +	my $params = {
>> +	    format => $source_format,
>> +	    storage => $targetsid,
>> +	    drive => $drive,
>> +	};
>> +
>> +	$self->log('info', "Allocating volume for drive '$ds' on remote storage '$targetsid'..");
>> +	my $res = $self->write_tunnel($self->{tunnel}, 600, 'disk', $params);
>> +
>> +	$self->log('info', "volume '$oldvolid' is '$res->{volid}' on the target\n");
>> +	$remote_conf->{$ds} = $res->{drivestr};
>> +	$self->{nbd}->{$ds} = $res;
>> +    });
>> +
>> +    my $conf_str = PVE::QemuServer::write_vm_config("remote", $remote_conf);
>> +
>> +    # TODO expose in PVE::Firewall?
>> +    my $vm_fw_conf_path = "/etc/pve/firewall/$vmid.fw";
>> +    my $fw_conf_str;
>> +    $fw_conf_str = PVE::Tools::file_get_contents($vm_fw_conf_path)
>> +	if -e $vm_fw_conf_path;
>> +    my $params = {
>> +	conf => $conf_str,
>> +	'firewall-config' => $fw_conf_str,
>> +    };
>> +
>> +    $self->write_tunnel($self->{tunnel}, 10, 'config', $params);
>> +}
>> +
>>  sub phase1_cleanup {
>>      my ($self, $vmid, $err) = @_;
>>  
>> @@ -825,7 +1228,6 @@ sub phase2_start_local_cluster {
>>      my $local_volumes = $self->{local_volumes};
>>      my @online_local_volumes = $self->filter_local_volumes('online');
>>  
>> -    $self->{storage_migration} = 1 if scalar(@online_local_volumes);
>>      my $start = $params->{start_params};
>>      my $migrate = $params->{migrate_opts};
>>  
>> @@ -948,10 +1350,34 @@ sub phase2_start_local_cluster {
>>      return ($tunnel_info, $spice_port);
>>  }
>>  
>> +sub phase2_start_remote_cluster {
>> +    my ($self, $vmid, $params) = @_;
>> +
>> +    die "insecure migration to remote cluster not implemented\n"
>> +	if $params->{migrate_opts}->{type} ne 'websocket';
>> +
>> +    my $remote_vmid = $self->{opts}->{remote}->{vmid};
>> +
>> +    my $res = $self->write_tunnel($self->{tunnel}, 10, "start", $params);
>> +
>> +    foreach my $drive (keys %{$res->{drives}}) {
>> +	$self->{stopnbd} = 1;
>> +	$self->{target_drive}->{$drive}->{drivestr} = $res->{drives}->{$drive}->{drivestr};
>> +	my $nbd_uri = $res->{drives}->{$drive}->{nbd_uri};
>> +	die "unexpected NBD uri for '$drive': $nbd_uri\n"
>> +	    if $nbd_uri !~ s!/run/qemu-server/$remote_vmid\_!/run/qemu-server/$vmid\_!;
>> +
>> +	$self->{target_drive}->{$drive}->{nbd_uri} = $nbd_uri;
>> +    }
>> +
>> +    return ($res->{migrate}, $res->{spice_port});
>> +}
>> +
>>  sub phase2 {
>>      my ($self, $vmid) = @_;
>>  
>>      my $conf = $self->{vmconf};
>> +    my $local_volumes = $self->{local_volumes};
>>  
>>      # version > 0 for unix socket support
>>      my $nbd_protocol_version = 1;
>> @@ -983,10 +1409,42 @@ sub phase2 {
>>  	},
>>      };
>>  
>> -    my ($tunnel_info, $spice_port) = $self->phase2_start_local_cluster($vmid, $params);
>> +    my ($tunnel_info, $spice_port);
>> +
>> +    my @online_local_volumes = $self->filter_local_volumes('online');
>> +    $self->{storage_migration} = 1 if scalar(@online_local_volumes);
>> +
>> +    if (my $remote = $self->{opts}->{remote}) {
>> +	my $remote_vmid = $remote->{vmid};
>> +	$params->{migrate_opts}->{remote_node} = $self->{node};
>> +	($tunnel_info, $spice_port) = $self->phase2_start_remote_cluster($vmid, $params);
>> +	die "only UNIX sockets are supported for remote migration\n"
>> +	    if $tunnel_info->{proto} ne 'unix';
>> +
>> +	my $forwarded = {};
>> +	my $remote_socket = $tunnel_info->{addr};
>> +	my $local_socket = $remote_socket;
>> +	$local_socket =~ s/$remote_vmid/$vmid/g;
>> +	$tunnel_info->{addr} = $local_socket;
>> +
>> +	$self->log('info', "Setting up tunnel for '$local_socket'");
>> +	$forward_unix_socket->($self, $local_socket, $remote_socket);
>> +	$forwarded->{$local_socket} = 1;
>> +
>> +	foreach my $remote_socket (@{$tunnel_info->{unix_sockets}}) {
>> +	    my $local_socket = $remote_socket;
>> +	    $local_socket =~ s/$remote_vmid/$vmid/g;
>> +	    next if $forwarded->{$local_socket};
>> +	    $self->log('info', "Setting up tunnel for '$local_socket'");
>> +	    $forward_unix_socket->($self, $local_socket, $remote_socket);
>> +	    $forwarded->{$local_socket} = 1;
>> +	}
>> +    } else {
>> +	($tunnel_info, $spice_port) = $self->phase2_start_local_cluster($vmid, $params);
>>  
>> -    $self->log('info', "start remote tunnel");
>> -    $self->start_remote_tunnel($tunnel_info);
>> +	$self->log('info', "start remote tunnel");
>> +	$self->start_remote_tunnel($tunnel_info);
>> +    }
>>  
>>      my $migrate_uri = "$tunnel_info->{proto}:$tunnel_info->{addr}";
>>      $migrate_uri .= ":$tunnel_info->{port}"
>> @@ -996,8 +1454,6 @@ sub phase2 {
>>  	$self->{storage_migration_jobs} = {};
>>  	$self->log('info', "starting storage migration");
>>  
>> -	my @online_local_volumes = $self->filter_local_volumes('online');
>> -
>>  	die "The number of local disks does not match between the source and the destination.\n"
>>  	    if (scalar(keys %{$self->{target_drive}}) != scalar(@online_local_volumes));
>>  	foreach my $drive (keys %{$self->{target_drive}}){
>> @@ -1070,7 +1526,7 @@ sub phase2 {
>>      };
>>      $self->log('info', "migrate-set-parameters error: $@") if $@;
>>  
>> -    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga})) {
>> +    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga} && !$self->{opts}->{remote})) {
>>  	my $rpcenv = PVE::RPCEnvironment::get();
>>  	my $authuser = $rpcenv->get_user();
>>  
>> @@ -1267,11 +1723,15 @@ sub phase2_cleanup {
>>  
>>      my $nodename = PVE::INotify::nodename();
>>  
>> -    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'stop', $vmid, '--skiplock', '--migratedfrom', $nodename];
>> -    eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
>> -    if (my $err = $@) {
>> -	$self->log('err', $err);
>> -	$self->{errors} = 1;
>> +    if ($self->{tunnel} && $self->{tunnel}->{version} >= 2) {
>> +	$self->write_tunnel($self->{tunnel}, 10, 'stop');
>> +    } else {
>> +	my $cmd = [@{$self->{rem_ssh}}, 'qm', 'stop', $vmid, '--skiplock', '--migratedfrom', $nodename];
>> +	eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
>> +	if (my $err = $@) {
>> +	    $self->log('err', $err);
>> +	    $self->{errors} = 1;
>> +	}
>>      }
>>  
>>      # cleanup after stopping, otherwise disks might be in-use by target VM!
>> @@ -1304,7 +1764,7 @@ sub phase3_cleanup {
>>  
>>      my $tunnel = $self->{tunnel};
>>  
>> -    if ($self->{volume_map}) {
>> +    if ($self->{volume_map} && !$self->{opts}->{remote}) {
>>  	my $target_drives = $self->{target_drive};
>>  
>>  	# FIXME: for NBD storage migration we now only update the volid, and
>> @@ -1321,26 +1781,33 @@ sub phase3_cleanup {
>>  
>>      # transfer replication state before move config
>>      $self->transfer_replication_state() if $self->{is_replicated};
>> -    PVE::QemuConfig->move_config_to_node($vmid, $self->{node});
>> +    if (!$self->{opts}->{remote}) {
>> +	PVE::QemuConfig->move_config_to_node($vmid, $self->{node});
>> +    }
>>      $self->switch_replication_job_target() if $self->{is_replicated};
> 
> All three lines could/should be guarded by the if.
> 

true (doesn't matter now since we error out on replicated VMs anyway,
but makes this more obvious when we change that fact ;))
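
i.e. something like (sketch):

    if (!$self->{opts}->{remote}) {
	# transfer replication state before move config
	$self->transfer_replication_state() if $self->{is_replicated};
	PVE::QemuConfig->move_config_to_node($vmid, $self->{node});
	$self->switch_replication_job_target() if $self->{is_replicated};
    }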
>>  
>>      if ($self->{livemigration}) {
>>  	if ($self->{stopnbd}) {
>>  	    $self->log('info', "stopping NBD storage migration server on target.");
>>  	    # stop nbd server on remote vm - requirement for resume since 2.9
>> -	    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'nbdstop', $vmid];
>> +	    if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 2) {
>> +		$self->write_tunnel($tunnel, 30, 'nbdstop');
>> +	    } else {
>> +		my $cmd = [@{$self->{rem_ssh}}, 'qm', 'nbdstop', $vmid];
>>  
>> -	    eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
>> -	    if (my $err = $@) {
>> -		$self->log('err', $err);
>> -		$self->{errors} = 1;
>> +		eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
>> +		if (my $err = $@) {
>> +		    $self->log('err', $err);
>> +		    $self->{errors} = 1;
>> +		}
>>  	    }
>>  	}
>>  
>>  	# config moved and nbd server stopped - now we can resume vm on target
>>  	if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 1) {
>> +	    my $cmd = $tunnel->{version} == 1 ? "resume $vmid" : "resume";
>>  	    eval {
>> -		$self->write_tunnel($tunnel, 30, "resume $vmid");
>> +		$self->write_tunnel($tunnel, 30, $cmd);
>>  	    };
>>  	    if (my $err = $@) {
>>  		$self->log('err', $err);
>> @@ -1360,18 +1827,24 @@ sub phase3_cleanup {
>>  	}
>>  
>>  	if ($self->{storage_migration} && PVE::QemuServer::parse_guest_agent($conf)->{fstrim_cloned_disks} && $self->{running}) {
>> -	    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'guest', 'cmd', $vmid, 'fstrim'];
>> -	    eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
>> +	    if ($self->{opts}->{remote}) {
>> +		$self->write_tunnel($self->{tunnel}, 600, 'fstrim');
>> +	    } else {
>> +		my $cmd = [@{$self->{rem_ssh}}, 'qm', 'guest', 'cmd', $vmid, 'fstrim'];
>> +		eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
>> +	    }
>>  	}
>>      }
>>  
>>      # close tunnel on successful migration, on error phase2_cleanup closed it
>> -    if ($tunnel) {
>> +    if ($tunnel && $tunnel->{version} == 1) {
>>  	eval { finish_tunnel($self, $tunnel); };
>>  	if (my $err = $@) {
>>  	    $self->log('err', $err);
>>  	    $self->{errors} = 1;
>>  	}
>> +	$tunnel = undef;
>> +	delete $self->{tunnel};
>>      }
>>  
>>      eval {
>> @@ -1409,6 +1882,9 @@ sub phase3_cleanup {
>>  
>>      # destroy local copies
>>      foreach my $volid (@not_replicated_volumes) {
>> +	# remote is cleaned up below
>> +	next if $self->{opts}->{remote};
>> +
>>  	eval { PVE::Storage::vdisk_free($self->{storecfg}, $volid); };
>>  	if (my $err = $@) {
>>  	    $self->log('err', "removing local copy of '$volid' failed - $err");
>> @@ -1418,8 +1894,19 @@ sub phase3_cleanup {
>>      }
>>  
>>      # clear migrate lock
>> -    my $cmd = [ @{$self->{rem_ssh}}, 'qm', 'unlock', $vmid ];
>> -    $self->cmd_logerr($cmd, errmsg => "failed to clear migrate lock");
>> +    if ($tunnel && $tunnel->{version} >= 2) {
>> +	$self->write_tunnel($tunnel, 10, "unlock");
>> +
>> +	$self->finish_tunnel($tunnel);
>> +    } else {
>> +	my $cmd = [ @{$self->{rem_ssh}}, 'qm', 'unlock', $vmid ];
>> +	$self->cmd_logerr($cmd, errmsg => "failed to clear migrate lock");
>> +    }
>> +
>> +    if ($self->{opts}->{remote} && $self->{opts}->{delete}) {
>> +	eval { PVE::QemuServer::destroy_vm($self->{storecfg}, $vmid, 1, undef, 0) };
>> +	warn "Failed to remove source VM - $@\n" if $@;
>> +    }
>>  }
>>  
>>  sub final_cleanup {
>> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
>> index d494cc0..bf05da2 100644
>> --- a/PVE/QemuServer.pm
>> +++ b/PVE/QemuServer.pm
>> @@ -5384,7 +5384,11 @@ sub vm_start_nolock {
>>      my $defaults = load_defaults();
>>  
>>      # set environment variable useful inside network script
>> -    $ENV{PVE_MIGRATED_FROM} = $migratedfrom if $migratedfrom;
>> +    if ($migrate_opts->{remote_node}) {
>> +	$ENV{PVE_MIGRATED_FROM} = $migrate_opts->{remote_node};
>> +    } elsif ($migratedfrom) {
>> +	$ENV{PVE_MIGRATED_FROM} = $migratedfrom;
>> +    }
> 
> But the network script tries to load the config from that node and if
> it's not in the cluster that doesn't work?
> 

this is a bit confusing, yeah.

$migratedfrom contains the source node, which is unusable on the remote
cluster

remote_node contains the target node, which actually has the full config
when we start the VM there over the tunnel (in contrast to a local
migration, where the target node doesn't yet have the config!)

so this should be correct? but even easier would be to just not set it
(for remote migrations), since the start MUST happen on the node where
mtunnel is running/the config is located.

>>  
>>      PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'pre-start', 1);
>>  
>> @@ -5621,7 +5625,7 @@ sub vm_start_nolock {
>>  
>>      my $migrate_storage_uri;
>>      # nbd_protocol_version > 0 for unix socket support
>> -    if ($nbd_protocol_version > 0 && $migration_type eq 'secure') {
>> +    if ($nbd_protocol_version > 0 && ($migration_type eq 'secure' || $migration_type eq 'websocket')) {
>>  	my $socket_path = "/run/qemu-server/$vmid\_nbd.migrate";
>>  	mon_cmd($vmid, "nbd-server-start", addr => { type => 'unix', data => { path => $socket_path } } );
>>  	$migrate_storage_uri = "nbd:unix:$socket_path";
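
the 'just not set it' variant would boil down to something like this
(sketch, not part of this patch):

    # for remote migration the config already lives on the node running
    # the mtunnel worker, so the network script doesn't need the hint
    $ENV{PVE_MIGRATED_FROM} = $migratedfrom
	if $migratedfrom && !$migrate_opts->{remote_node};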