From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id EDFE37E454
 for <pve-devel@lists.proxmox.com>; Wed, 10 Nov 2021 13:30:05 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 83EDB1BAF1
 for <pve-devel@lists.proxmox.com>; Wed, 10 Nov 2021 13:29:35 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 7175E1BAE6
 for <pve-devel@lists.proxmox.com>; Wed, 10 Nov 2021 13:29:34 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 43595439B3
 for <pve-devel@lists.proxmox.com>; Wed, 10 Nov 2021 13:29:34 +0100 (CET)
To: pve-devel@lists.proxmox.com,
 =?UTF-8?Q?Fabian_Gr=c3=bcnbichler?= <f.gruenbichler@proxmox.com>
References: <20211105130359.40803-1-f.gruenbichler@proxmox.com>
 <20211105130359.40803-23-f.gruenbichler@proxmox.com>
From: Fabian Ebner <f.ebner@proxmox.com>
Message-ID: <f642d966-71fa-ce89-534d-a5f39c5da30f@proxmox.com>
Date: Wed, 10 Nov 2021 13:29:32 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.14.0
MIME-Version: 1.0
In-Reply-To: <20211105130359.40803-23-f.gruenbichler@proxmox.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 1.046 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -1.678 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [qemu.pm]
Subject: Re: [pve-devel] [PATCH qemu-server 10/10] api: add remote migrate
 endpoint
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Wed, 10 Nov 2021 12:30:06 -0000

On 05.11.21 at 14:03, Fabian Grünbichler wrote:
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> 
> Notes:
>      the checks currently done before the actual migration worker is
>      forked could be either moved to the client calling this (that then
>      makes the required API calls) or extracted into a precond API call
>      like for regular migration.
>      
>      for testing it helps catch trivial mistakes early on, and the calls shouldn't
>      be too expensive, so I left them in for now..
>      
>      requires
>      - pve-common with bridge-pair format
>      - pve-guest-common with AbstractMigrate handling remote migration
> 
>   PVE/API2/Qemu.pm | 205 ++++++++++++++++++++++++++++++++++++++++++++++-
>   debian/control   |   2 +
>   2 files changed, 205 insertions(+), 2 deletions(-)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index 24f5b98..b931f04 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -14,6 +14,7 @@ use URI::Escape;
>   use Crypt::OpenSSL::Random;
>   use Socket qw(SOCK_STREAM);
>   
> +use PVE::APIClient::LWP;
>   use PVE::Cluster qw (cfs_read_file cfs_write_file);;
>   use PVE::RRD;
>   use PVE::SafeSyslog;
> @@ -51,8 +52,6 @@ BEGIN {
>       }
>   }
>   
> -use Data::Dumper; # fixme: remove
> -
>   use base qw(PVE::RESTHandler);
>   
>   my $opt_force_description = "Force physical removal. Without this, we simple remove the disk from the config file and create an additional configuration entry called 'unused[n]', which contains the volume ID. Unlink of unused[n] always cause physical removal.";
> @@ -3778,6 +3777,208 @@ __PACKAGE__->register_method({
>   
>       }});
>   
> +__PACKAGE__->register_method({
> +    name => 'remote_migrate_vm',
> +    path => '{vmid}/remote_migrate',
> +    method => 'POST',
> +    protected => 1,
> +    proxyto => 'node',
> +    description => "Migrate virtual machine to a remote cluster. Creates a new migration task.",
> +    permissions => {
> +	check => ['perm', '/vms/{vmid}', [ 'VM.Migrate' ]],
> +    },
> +    parameters => {
> +	additionalProperties => 0,
> +	properties => {
> +	    node => get_standard_option('pve-node'),
> +	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::QemuServer::complete_vmid }),
> +	    'target-vmid' => get_standard_option('pve-vmid', { optional => 1 }),
> +	    'target-node' => get_standard_option('pve-node', {
> +		description => "Target node on remote cluster.",
> +            }),
> +	    'target-endpoint' => get_standard_option('proxmox-remote', {
> +		description => "Remote target endpoint",
> +	    }),
> +	    online => {
> +		type => 'boolean',
> +		description => "Use online/live migration if VM is running. Ignored if VM is stopped.",
> +		optional => 1,
> +	    },
> +	    'migration-network' => {
> +		type => 'string', format => 'CIDR',
> +		description => "CIDR of the (sub) network that is used for migration.",
> +		optional => 1,
> +	    },
> +	    'with-local-disks' => {
> +		type => 'boolean',
> +		description => "Enable live storage migration for local disk",
> +		optional => 1,
> +	    },
> +	    delete => {
> +		type => 'boolean',
> +		description => "Delete the original VM and related data after successful migration. By default the original VM is kept on the source cluster in a stopped state.",
> +		optional => 1,
> +		default => 0,
> +	    },
> +            'target-storage' => get_standard_option('pve-targetstorage', {
> +		completion => \&PVE::QemuServer::complete_migration_storage,
> +		optional => 0,
> +            }),
> +	    'target-bridge' => {
> +		type => 'string',
> +		description => "Mapping from source to target bridges. Providing only a single bridge ID maps all source bridges to that bridge. Providing the special value '1' will map each source bridge to itself.",
> +		format => 'bridge-pair-list',
> +	    },
> +	    bwlimit => {
> +		description => "Override I/O bandwidth limit (in KiB/s).",
> +		optional => 1,
> +		type => 'integer',
> +		minimum => '0',
> +		default => 'migrate limit from datacenter or storage config',
> +	    },
> +	},
> +    },
> +    returns => {
> +	type => 'string',
> +	description => "the task ID.",
> +    },
> +    code => sub {
> +	my ($param) = @_;
> +
> +	my $rpcenv = PVE::RPCEnvironment::get();
> +	my $authuser = $rpcenv->get_user();
> +
> +	my $source_vmid = extract_param($param, 'vmid');
> +	my $target_endpoint = extract_param($param, 'target-endpoint');
> +	my $target_node = extract_param($param, 'target-node');
> +	my $target_vmid = extract_param($param, 'target-vmid') // $source_vmid;
> +
> +	my $localnode = PVE::INotify::nodename();

Nit: not used (and could've been $param->{node}).

> +	my $network = extract_param($param, 'migration-network');
> +	my $delete = extract_param($param, 'delete') // 0;
> +
> +	PVE::Cluster::check_cfs_quorum();
> +
> +	raise_param_exc({ 'migration-network' => "Only root may use this option." })
> +	    if $network && $authuser ne 'root@pam';

I might be missing something obvious, but where is the migration network 
actually used down the line for the remote migration?

> +
> +	# test if VM exists
> +	my $conf = PVE::QemuConfig->load_config($source_vmid);
> +
> +	PVE::QemuConfig->check_lock($conf);
> +
> +	raise_param_exc({ vmid => "cannot migrate HA-manage VM to remote cluster" })

s/manage/managed/

> +	    if PVE::HA::Config::vm_is_ha_managed($source_vmid);
> +
> +	my $remote = PVE::JSONSchema::parse_property_string('proxmox-remote', $target_endpoint);
> +
> +	# TODO: move this as helper somewhere appropriate?
> +	my $conn_args = {
> +	    protocol => 'https',
> +	    host => $remote->{host},
> +	    port => $remote->{port} // 8006,
> +	    apitoken => $remote->{apitoken},
> +	};
> +
> +	my $fp;
> +	if ($fp = $remote->{fingerprint}) {
> +	    $conn_args->{cached_fingerprints} = { uc($fp) => 1 };
> +	}
> +
> +	print "Establishing API connection with remote at '$remote->{host}'\n";
> +
> +	my $api_client = PVE::APIClient::LWP->new(%$conn_args);
> +	my $version = $api_client->get("/version");
> +	print "remote: version '$version->{version}\n";
> +
> +	if (!defined($fp)) {
> +	    my $cert_info = $api_client->get("/nodes/$target_node/certificates/info");
> +	    foreach my $cert (@$cert_info) {
> +		$fp = $cert->{fingerprint} if $cert->{filename} ne 'pve-root-ca.pem';
> +		last if $cert->{filename} eq 'pveproxy-ssl.pem';

Not future-proof if the API call is ever extended to return an 
additional certificate which is not a valid fall-back here.
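One option would be to make the fall-back an explicit allow-list instead of 
"anything but the CA cert". Rough sketch, untested, with the selection pulled 
into a helper (the helper name and sample data are just for illustration):

```perl
# Pick the fingerprint to pin: prefer pveproxy-ssl.pem, fall back to
# pve-ssl.pem, and ignore any other certificate the endpoint might
# return in the future.
sub pick_fingerprint {
    my ($cert_info) = @_;

    for my $filename (qw(pveproxy-ssl.pem pve-ssl.pem)) {
	my ($cert) = grep { $_->{filename} eq $filename } @$cert_info;
	return $cert->{fingerprint} if defined($cert);
    }

    return undef; # nothing usable returned by the endpoint
}
```

That way a new entry in the certificates/info result can't silently become 
the pinned fingerprint.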

> +	    }
> +	    $conn_args->{cached_fingerprints} = { uc($fp) => 1 }
> +		if defined($fp);
> +	}
> +
> +	if (PVE::QemuServer::check_running($source_vmid)) {
> +	    die "can't migrate running VM without --online\n" if !$param->{online};
> +
> +	    my $repl_conf = PVE::ReplicationConfig->new();
> +	    my $is_replicated = $repl_conf->check_for_existing_jobs($source_vmid, 1);
> +	    die "cannot remote-migrate replicated VM\n" if $is_replicated;
> +	} else {
> +	    warn "VM isn't running. Doing offline migration instead.\n" if $param->{online};
> +	    $param->{online} = 0;
> +	}
> +
> +	# FIXME: fork worker hear to avoid timeout? or poll these periodically
> +	# in pvestatd and access cached info here? all of the below is actually
> +	# checked at the remote end anyway once we call the mtunnel endpoint,
> +	# we could also punt it to the client and not do it here at all..
> +	my $resources = $api_client->get("/cluster/resources");
> +	if (grep { defined($_->{vmid}) && $_->{vmid} eq $target_vmid } @$resources) {
> +	    raise_param_exc({ target_vmid => "Guest with ID '$target_vmid' already exists on remote cluster" });
> +	}
> +
> +	my $storages = [ grep { $_->{type} eq 'storage' && $_->{node} eq $target_node } @$resources ];
> +	my $storecfg = PVE::Storage::config();
> +	my $target_storage = extract_param($param, 'target-storage');
> +	my $storagemap = eval { PVE::JSONSchema::parse_idmap($target_storage, 'pve-storage-id') };
> +	raise_param_exc({ 'target-storage' => "failed to parse storage map: $@" })
> +	    if $@;
> +
> +	my $target_bridge = extract_param($param, 'target-bridge');
> +	my $bridgemap = eval { PVE::JSONSchema::parse_idmap($target_bridge, 'pve-bridge-id') };
> +	raise_param_exc({ 'target-bridge' => "failed to parse bridge map: $@" })
> +	    if $@;
> +
> +	my $check_remote_storage = sub {
> +	    my ($storage) = @_;
> +	    my $found = [ grep { $_->{storage} eq $storage } @$storages ];
> +	    die "remote: storage '$storage' does not exist!\n"
> +		if !@$found;
> +
> +	    $found = @$found[0];
> +
> +	    my $content_types = [ PVE::Tools::split_list($found->{content}) ];
> +	    die "remote: storage '$storage' cannot store images\n"
> +		if !grep { $_ eq 'images' } @$content_types;
> +	};
> +
> +	foreach my $target_sid (values %{$storagemap->{entries}}) {
> +	    $check_remote_storage->($target_sid);
> +	}
> +
> +	$check_remote_storage->($storagemap->{default})
> +	    if $storagemap->{default};
> +
> +	# TODO: or check all referenced storages?
> +	die "remote migration requires explicit storage mapping!\n"
> +	    if $storagemap->{identity};
> +
> +	$param->{storagemap} = $storagemap;
> +	$param->{bridgemap} = $bridgemap;
> +	$param->{remote} = {
> +	    conn => $conn_args, # re-use fingerprint for tunnel
> +	    client => $api_client,
> +	    vmid => $target_vmid,
> +	};
> +	$param->{migration_type} = 'websocket';
> +	$param->{migration_network} = $network if $network;
> +	$param->{delete} = $delete if $delete;
> +
> +	my $realcmd = sub {
> +	    PVE::QemuMigrate->migrate($target_node, $remote->{host}, $source_vmid, $param);
> +	};
> +
> +	my $worker = sub {
> +	    return PVE::GuestHelpers::guest_migration_lock($source_vmid, 10, $realcmd);
> +	};
> +
> +	return $rpcenv->fork_worker('qmigrate', $source_vmid, $authuser, $worker);
> +    }});
> +
>   __PACKAGE__->register_method({
>       name => 'monitor',
>       path => '{vmid}/monitor',
> diff --git a/debian/control b/debian/control
> index 8032ae5..33e3916 100644
> --- a/debian/control
> +++ b/debian/control
> @@ -6,6 +6,7 @@ Build-Depends: debhelper (>= 12~),
>                  libglib2.0-dev,
>                  libio-multiplex-perl,
>                  libjson-c-dev,
> +               libpve-apiclient-perl,
>                  libpve-cluster-perl,
>                  libpve-common-perl (>= 6.3-3),
>                  libpve-guest-common-perl (>= 3.1-3),
> @@ -34,6 +35,7 @@ Depends: dbus,
>            libjson-xs-perl,
>            libnet-ssleay-perl,
>            libpve-access-control (>= 5.0-7),
> +         libpve-apiclient-perl,
>            libpve-cluster-perl,
>            libpve-common-perl (>= 7.0-3),
>            libpve-guest-common-perl (>= 3.1-3),
>