public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration
@ 2022-11-17 13:33 Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH container v7 1/3] migration: add " Fabian Grünbichler
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC
  To: pve-devel

this series adds remote migration for VMs and CTs.

both live and offline migration of VMs, including NBD and
storage-migrated disks, should work. containers don't have any live
migration, so offline and restart mode work identically except for the
restart part.

groundwork for extending to pvesr already laid.

uncovered (but still not fixed)
https://bugzilla.proxmox.com/show_bug.cgi?id=3873
(migration btrfs -> btrfs with snapshots)

follow-ups/todos:
- implement disk export/import for shared storages like rbd
- implement disk export/import raw+size for ZFS zvols
- extend ZFS replication via websocket tunnel to remote cluster
- extend replication to support RBD snapshot-based replication
- extend RBD replication via websocket tunnel to remote cluster
- switch regular migration SSH mtunnel to version 2 with json support
  (related -> s.hanreich's pre-/post-migrate-hook series)
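
for context on the last item: the websocket tunnel added by this series
already uses a line-based JSON framing, one JSON object per line with a
'cmd' key, answered by an object carrying 'success' (plus 'msg' on
error). the sketch below shows that framing together with the version
check done by the client. the handler table, the helper names
(reply_err, handle_line, check_compat) and the use of JSON::PP are
assumptions for illustration, not the code from the patches.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new->canonical(1);

# illustrative handlers only; the real ones act on guest config and storage
my $handlers = {
    version => sub { return { api => 2, age => 0 } },
    quit => sub { return {} },
};

sub reply_err {
    my ($msg) = @_;
    return $json->encode({ success => JSON::PP::false(), msg => $msg });
}

# handle one request line and return the reply line (without newline)
sub handle_line {
    my ($line) = @_;

    my $parsed = eval { $json->decode($line) };
    return reply_err("failed to parse command") if $@ || ref($parsed) ne 'HASH';

    my $cmd = delete $parsed->{cmd};
    return reply_err("'cmd' missing") if !defined($cmd);

    my $handler = $handlers->{$cmd};
    return reply_err("unknown command '$cmd' given") if !$handler;

    my $res = eval { $handler->($parsed) };
    return reply_err("failed to handle '$cmd' command - $@") if $@;

    $res //= {};
    $res->{success} = JSON::PP::true();
    return $json->encode($res);
}

# client side: the local version must lie within [api - age, api] of the remote
sub check_compat {
    my ($local, $remote_api, $remote_age) = @_;
    my $min = $remote_api - $remote_age;
    die "remote tunnel endpoint not compatible, upgrade required\n" if $local < $min;
    die "remote tunnel endpoint too old, upgrade required\n" if $local > $remote_api;
    return 1;
}

my $reply = $json->decode(handle_line('{"cmd":"version"}'));
check_compat(2, $reply->{api}, $reply->{age});
print handle_line('{"cmd":"bogus"}'), "\n";
```

keeping errors as regular replies (instead of closing the connection)
lets the client decide whether to retry, clean up or abort.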

new in v6:
- --with-local-disks always set and not a parameter
- `pct remote-migrate`
- new Sys.Incoming privilege + checks
- storage export taintedness bug fix
- properly take over pve-targetstorage option (qemu-server ->
  pve-common)
- review feedback addressed

new in v5: lots of edge cases fixed, PoC for pve-container, some more
helper moving for re-use in pve-container without duplication

new in v4: lots of small fixes, improved bwlimit handling, `qm` command
(thanks Fabian Ebner and Dominik Csapak for the feedback on v3!)

new in v3: lots of refactoring and edge-case handling

new in v2: dropped parts already applied, incorporated Fabian's and
Dominik's feedback (thanks!)

new in v1: explicit remote endpoint specified as part of API call
instead of remote.cfg
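
to illustrate the v1 change: the endpoint passed with the API call
carries the host, an optional port, the API token and an optional
fingerprint, and is turned into connection arguments for the API
client. rough sketch below. parse_endpoint is a hypothetical stand-in
(the patches use PVE::JSONSchema::parse_property_string with the
'proxmox-remote' format), and the host, token and fingerprint values
are made up.

```perl
use strict;
use warnings;

# hypothetical helper, NOT the PVE implementation: split a
# 'key=value,key=value' string into a hash. only the first '=' of each
# pair separates key from value, since the token value contains '='.
sub parse_endpoint {
    my ($str) = @_;
    my $res = {};
    for my $pair (split /,/, $str) {
	my ($k, $v) = split /=/, $pair, 2;
	die "malformed endpoint property '$pair'\n" if !defined($v) || $k eq '';
	$res->{$k} = $v;
    }
    die "endpoint is missing 'host'\n" if !defined($res->{host});
    return $res;
}

# mirrors how the remote_migrate endpoint assembles its connection args
sub conn_args {
    my ($remote) = @_;
    my $args = {
	protocol => 'https',
	host => $remote->{host},
	port => $remote->{port} // 8006,
	apitoken => $remote->{apitoken},
    };
    if (my $fp = $remote->{fingerprint}) {
	# fingerprints are compared in upper case
	$args->{cached_fingerprints} = { uc($fp) => 1 };
    }
    return $args;
}

my $remote = parse_endpoint(
    'host=pve2.example.com,apitoken=PVEAPIToken=root@pam!migrate=secret,fingerprint=ab:cd');
my $args = conn_args($remote);
print "$args->{host}:$args->{port}\n";
```

if no fingerprint is given, the patch instead looks it up via the
remote's certificate info and caches it for the tunnel connection.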

pve-container:

Fabian Grünbichler (3):
  migration: add remote migration
  pct: add 'remote-migrate' command
  migrate: print mapped volume in error

 debian/control         |   3 +-
 src/PVE/API2/LXC.pm    | 635 +++++++++++++++++++++++++++++++++++++++++
 src/PVE/CLI/pct.pm     | 124 ++++++++
 src/PVE/LXC/Migrate.pm | 248 +++++++++++++---
 4 files changed, 967 insertions(+), 43 deletions(-)

qemu-server:

Fabian Grünbichler (7):
  pending changes: allow skipping cloud-init
  pending: fix typo in variable name
  mtunnel: add API endpoints
  migrate: refactor remote VM/tunnel start
  migrate: add remote migration handling
  api: add remote migrate endpoint
  qm: add remote-migrate command

 PVE/API2/Qemu.pm   | 717 ++++++++++++++++++++++++++++++++++++++++++++-
 PVE/CLI/qm.pm      | 113 +++++++
 PVE/QemuMigrate.pm | 590 ++++++++++++++++++++++++++++---------
 PVE/QemuServer.pm  |  49 ++--
 debian/control     |   7 +-
 5 files changed, 1311 insertions(+), 165 deletions(-)

-- 
2.30.2






* [pve-devel] [PATCH container v7 1/3] migration: add remote migration
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
@ 2022-11-17 13:33 ` Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH container v7 2/3] pct: add 'remote-migrate' command Fabian Grünbichler
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC
  To: pve-devel

modelled after the VM migration, but folded into a single commit since
the actual migration changes are a lot smaller here.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    v7:
    - fix order of parsing parameters (thanks Stefan Hanreich!)
    - add libpve-access-control dependency (for Sys.Incoming privilege, but missing
      in general)
    - bump libpve-storage-perl dependency
    v6:
    - check for Sys.Incoming in mtunnel API endpoint
    - mark as experimental
    - test_mp fix for non-snapshot calls
    
    new in v5 - PoC to ensure helpers and abstractions are re-usable
    
 debian/control         |   3 +-
 src/PVE/API2/LXC.pm    | 635 +++++++++++++++++++++++++++++++++++++++++
 src/PVE/LXC/Migrate.pm | 245 +++++++++++++---
 3 files changed, 840 insertions(+), 43 deletions(-)

diff --git a/debian/control b/debian/control
index afef317..8d18f7f 100644
--- a/debian/control
+++ b/debian/control
@@ -19,10 +19,11 @@ Section: perl
 Priority: optional
 Architecture: all
 Depends: file,
+         libpve-access-control (>= 7.2-5),
          libpve-cluster-perl,
          libpve-common-perl (>= 7.2-4),
          libpve-guest-common-perl (>= 4.1-1),
-         libpve-storage-perl (>= 6.3-8),
+         libpve-storage-perl (>= 7.2-10),
          lxc-pve,
          pve-cluster (>= 4.0-8),
          pve-ha-manager (>= 3.0-9),
diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm
index 79aecaa..03d7ea0 100644
--- a/src/PVE/API2/LXC.pm
+++ b/src/PVE/API2/LXC.pm
@@ -3,6 +3,8 @@ package PVE::API2::LXC;
 use strict;
 use warnings;
 
+use Socket qw(SOCK_STREAM);
+
 use PVE::SafeSyslog;
 use PVE::Tools qw(extract_param run_command);
 use PVE::Exception qw(raise raise_param_exc raise_perm_exc);
@@ -1092,6 +1094,174 @@ __PACKAGE__->register_method ({
     }});
 
 
+__PACKAGE__->register_method({
+    name => 'remote_migrate_vm',
+    path => '{vmid}/remote_migrate',
+    method => 'POST',
+    protected => 1,
+    proxyto => 'node',
+    description => "Migrate the container to another cluster. Creates a new migration task. EXPERIMENTAL feature!",
+    permissions => {
+	check => ['perm', '/vms/{vmid}', [ 'VM.Migrate' ]],
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::LXC::complete_ctid }),
+	    'target-vmid' => get_standard_option('pve-vmid', { optional => 1 }),
+	    'target-endpoint' => get_standard_option('proxmox-remote', {
+		description => "Remote target endpoint",
+	    }),
+	    online => {
+		type => 'boolean',
+		description => "Use online/live migration.",
+		optional => 1,
+	    },
+	    restart => {
+		type => 'boolean',
+		description => "Use restart migration",
+		optional => 1,
+	    },
+	    timeout => {
+		type => 'integer',
+		description => "Timeout in seconds for shutdown for restart migration",
+		optional => 1,
+		default => 180,
+	    },
+	    delete => {
+		type => 'boolean',
+		description => "Delete the original CT and related data after successful migration. By default the original CT is kept on the source cluster in a stopped state.",
+		optional => 1,
+		default => 0,
+	    },
+	    'target-storage' => get_standard_option('pve-targetstorage', {
+		optional => 0,
+	    }),
+	    'target-bridge' => {
+		type => 'string',
+		description => "Mapping from source to target bridges. Providing only a single bridge ID maps all source bridges to that bridge. Providing the special value '1' will map each source bridge to itself.",
+		format => 'bridge-pair-list',
+	    },
+	    bwlimit => {
+		description => "Override I/O bandwidth limit (in KiB/s).",
+		optional => 1,
+		type => 'number',
+		minimum => '0',
+		default => 'migrate limit from datacenter or storage config',
+	    },
+	},
+    },
+    returns => {
+	type => 'string',
+	description => "the task ID.",
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $source_vmid = extract_param($param, 'vmid');
+	my $target_endpoint = extract_param($param, 'target-endpoint');
+	my $target_vmid = extract_param($param, 'target-vmid') // $source_vmid;
+
+	my $delete = extract_param($param, 'delete') // 0;
+
+	PVE::Cluster::check_cfs_quorum();
+
+	# test if CT exists
+	my $conf = PVE::LXC::Config->load_config($source_vmid);
+	PVE::LXC::Config->check_lock($conf);
+
+	# try to detect errors early
+	if (PVE::LXC::check_running($source_vmid)) {
+	    die "can't migrate running container without --online or --restart\n"
+		if !$param->{online} && !$param->{restart};
+	}
+
+	raise_param_exc({ vmid => "cannot migrate HA-managed CT to remote cluster" })
+	    if PVE::HA::Config::vm_is_ha_managed($source_vmid);
+
+	my $remote = PVE::JSONSchema::parse_property_string('proxmox-remote', $target_endpoint);
+
+	# TODO: move this as helper somewhere appropriate?
+	my $conn_args = {
+	    protocol => 'https',
+	    host => $remote->{host},
+	    port => $remote->{port} // 8006,
+	    apitoken => $remote->{apitoken},
+	};
+
+	my $fp;
+	if ($fp = $remote->{fingerprint}) {
+	    $conn_args->{cached_fingerprints} = { uc($fp) => 1 };
+	}
+
+	print "Establishing API connection with remote at '$remote->{host}'\n";
+
+	my $api_client = PVE::APIClient::LWP->new(%$conn_args);
+
+	if (!defined($fp)) {
+	    my $cert_info = $api_client->get("/nodes/localhost/certificates/info");
+	    foreach my $cert (@$cert_info) {
+		my $filename = $cert->{filename};
+		next if $filename ne 'pveproxy-ssl.pem' && $filename ne 'pve-ssl.pem';
+		$fp = $cert->{fingerprint} if !$fp || $filename eq 'pveproxy-ssl.pem';
+	    }
+	    $conn_args->{cached_fingerprints} = { uc($fp) => 1 }
+		if defined($fp);
+	}
+
+	my $storecfg = PVE::Storage::config();
+	my $target_storage = extract_param($param, 'target-storage');
+	my $storagemap = eval { PVE::JSONSchema::parse_idmap($target_storage, 'pve-storage-id') };
+	raise_param_exc({ 'target-storage' => "failed to parse storage map: $@" })
+	    if $@;
+
+	my $target_bridge = extract_param($param, 'target-bridge');
+	my $bridgemap = eval { PVE::JSONSchema::parse_idmap($target_bridge, 'pve-bridge-id') };
+	raise_param_exc({ 'target-bridge' => "failed to parse bridge map: $@" })
+	    if $@;
+
+	die "remote migration requires explicit storage mapping!\n"
+	    if $storagemap->{identity};
+
+	$param->{storagemap} = $storagemap;
+	$param->{bridgemap} = $bridgemap;
+	$param->{remote} = {
+	    conn => $conn_args, # re-use fingerprint for tunnel
+	    client => $api_client,
+	    vmid => $target_vmid,
+	};
+	$param->{migration_type} = 'websocket';
+	$param->{delete} = $delete if $delete;
+
+	my $cluster_status = $api_client->get("/cluster/status");
+	my $target_node;
+	foreach my $entry (@$cluster_status) {
+	    next if $entry->{type} ne 'node';
+	    if ($entry->{local}) {
+		$target_node = $entry->{name};
+		last;
+	    }
+	}
+
+	die "couldn't determine endpoint's node name\n"
+	    if !defined($target_node);
+
+	my $realcmd = sub {
+	    PVE::LXC::Migrate->migrate($target_node, $remote->{host}, $source_vmid, $param);
+	};
+
+	my $worker = sub {
+	    return PVE::GuestHelpers::guest_migration_lock($source_vmid, 10, $realcmd);
+	};
+
+	return $rpcenv->fork_worker('vzmigrate', $source_vmid, $authuser, $worker);
+    }});
+
+
 __PACKAGE__->register_method({
     name => 'migrate_vm',
     path => '{vmid}/migrate',
@@ -2321,4 +2491,469 @@ __PACKAGE__->register_method({
 	return PVE::GuestHelpers::config_with_pending_array($conf, $pending_delete_hash);
     }});
 
+__PACKAGE__->register_method({
+    name => 'mtunnel',
+    path => '{vmid}/mtunnel',
+    method => 'POST',
+    protected => 1,
+    description => 'Migration tunnel endpoint - only for internal use by CT migration.',
+    permissions => {
+	check =>
+	[ 'and',
+	  ['perm', '/vms/{vmid}', [ 'VM.Allocate' ]],
+	  ['perm', '/', [ 'Sys.Incoming' ]],
+	],
+	description => "You need 'VM.Allocate' permissions on '/vms/{vmid}' and Sys.Incoming" .
+	               " on '/'. Further permission checks happen during the actual migration.",
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid'),
+	    storages => {
+		type => 'string',
+		format => 'pve-storage-id-list',
+		optional => 1,
+		description => 'List of storages to check permission and availability. Will be checked again for all actually used storages during migration.',
+	    },
+	    bridges => {
+		type => 'string',
+		format => 'pve-bridge-id-list',
+		optional => 1,
+		description => 'List of network bridges to check availability. Will be checked again for actually used bridges during migration.',
+	    },
+	},
+    },
+    returns => {
+	additionalProperties => 0,
+	properties => {
+	    upid => { type => 'string' },
+	    ticket => { type => 'string' },
+	    socket => { type => 'string' },
+	},
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $node = extract_param($param, 'node');
+	my $vmid = extract_param($param, 'vmid');
+
+	my $storages = extract_param($param, 'storages');
+	my $bridges = extract_param($param, 'bridges');
+
+	my $nodename = PVE::INotify::nodename();
+
+	raise_param_exc({ node => "node needs to be 'localhost' or local hostname '$nodename'" })
+	    if $node ne 'localhost' && $node ne $nodename;
+
+	$node = $nodename;
+
+	my $storecfg = PVE::Storage::config();
+	foreach my $storeid (PVE::Tools::split_list($storages)) {
+	    $check_storage_access_migrate->($rpcenv, $authuser, $storecfg, $storeid, $node);
+	}
+
+	foreach my $bridge (PVE::Tools::split_list($bridges)) {
+	    PVE::Network::read_bridge_mtu($bridge);
+	}
+
+	PVE::Cluster::check_cfs_quorum();
+
+	my $socket_addr = "/run/pve/ct-$vmid.mtunnel";
+
+	my $lock = 'create';
+	eval { PVE::LXC::Config->create_and_lock_config($vmid, 0, $lock); };
+
+	raise_param_exc({ vmid => "unable to create empty CT config - $@"})
+	    if $@;
+
+	my $realcmd = sub {
+	    my $state = {
+		storecfg => PVE::Storage::config(),
+		lock => $lock,
+		vmid => $vmid,
+	    };
+
+	    my $run_locked = sub {
+		my ($code, $params) = @_;
+		return PVE::LXC::Config->lock_config($state->{vmid}, sub {
+		    my $conf = PVE::LXC::Config->load_config($state->{vmid});
+
+		    $state->{conf} = $conf;
+
+		    die "Encountered wrong lock - aborting mtunnel command handling.\n"
+			if $state->{lock} && !PVE::LXC::Config->has_lock($conf, $state->{lock});
+
+		    return $code->($params);
+		});
+	    };
+
+	    my $cmd_desc = {
+		config => {
+		    conf => {
+			type => 'string',
+			description => 'Full CT config, adapted for target cluster/node',
+		    },
+		    'firewall-config' => {
+			type => 'string',
+			description => 'CT firewall config',
+			optional => 1,
+		    },
+		},
+		ticket => {
+		    path => {
+			type => 'string',
+			description => 'socket path for which the ticket should be valid. must be known to current mtunnel instance.',
+		    },
+		},
+		quit => {
+		    cleanup => {
+			type => 'boolean',
+			description => 'remove CT config and volumes, aborting migration',
+			default => 0,
+		    },
+		},
+		'disk-import' => $PVE::StorageTunnel::cmd_schema->{'disk-import'},
+		'query-disk-import' => $PVE::StorageTunnel::cmd_schema->{'query-disk-import'},
+		bwlimit => $PVE::StorageTunnel::cmd_schema->{bwlimit},
+	    };
+
+	    my $cmd_handlers = {
+		'version' => sub {
+		    # compared against other end's version
+		    # bump/reset for breaking changes
+		    # bump/bump for opt-in changes
+		    return {
+			api => $PVE::LXC::Migrate::WS_TUNNEL_VERSION,
+			age => 0,
+		    };
+		},
+		'config' => sub {
+		    my ($params) = @_;
+
+		    # parse and write out VM FW config if given
+		    if (my $fw_conf = $params->{'firewall-config'}) {
+			my ($path, $fh) = PVE::Tools::tempfile_contents($fw_conf, 700);
+
+			my $empty_conf = {
+			    rules => [],
+			    options => {},
+			    aliases => {},
+			    ipset => {} ,
+			    ipset_comments => {},
+			};
+			my $cluster_fw_conf = PVE::Firewall::load_clusterfw_conf();
+
+			# TODO: add flag for strict parsing?
+			# TODO: add import sub that does all this given raw content?
+			my $vmfw_conf = PVE::Firewall::generic_fw_config_parser($path, $cluster_fw_conf, $empty_conf, 'vm');
+			$vmfw_conf->{vmid} = $state->{vmid};
+			PVE::Firewall::save_vmfw_conf($state->{vmid}, $vmfw_conf);
+
+			$state->{cleanup}->{fw} = 1;
+		    }
+
+		    my $conf_fn = "incoming/lxc/$state->{vmid}.conf";
+		    my $new_conf = PVE::LXC::Config::parse_pct_config($conf_fn, $params->{conf}, 1);
+		    delete $new_conf->{lock};
+		    delete $new_conf->{digest};
+
+		    my $unprivileged = delete $new_conf->{unprivileged};
+		    my $arch = delete $new_conf->{arch};
+
+		    # TODO handle properly?
+		    delete $new_conf->{snapshots};
+		    delete $new_conf->{parent};
+		    delete $new_conf->{pending};
+		    delete $new_conf->{lxc};
+
+		    PVE::LXC::Config->remove_lock($state->{vmid}, 'create');
+
+		    eval {
+			my $conf = {
+			    unprivileged => $unprivileged,
+			    arch => $arch,
+			};
+			PVE::LXC::check_ct_modify_config_perm(
+			    $rpcenv,
+			    $authuser,
+			    $state->{vmid},
+			    undef,
+			    $conf,
+			    $new_conf,
+			    undef,
+			    $unprivileged,
+			);
+			my $errors = PVE::LXC::Config->update_pct_config(
+			    $state->{vmid},
+			    $conf,
+			    0,
+			    $new_conf,
+			    [],
+			    [],
+			);
+			raise_param_exc($errors) if scalar(keys %$errors);
+			PVE::LXC::Config->write_config($state->{vmid}, $conf);
+			PVE::LXC::update_lxc_config($vmid, $conf);
+		    };
+		    if (my $err = $@) {
+			# revert to locked previous config
+			my $conf = PVE::LXC::Config->load_config($state->{vmid});
+			$conf->{lock} = 'create';
+			PVE::LXC::Config->write_config($state->{vmid}, $conf);
+
+			die $err;
+		    }
+
+		    my $conf = PVE::LXC::Config->load_config($state->{vmid});
+		    $conf->{lock} = 'migrate';
+		    PVE::LXC::Config->write_config($state->{vmid}, $conf);
+
+		    $state->{lock} = 'migrate';
+
+		    return;
+		},
+		'bwlimit' => sub {
+		    my ($params) = @_;
+		    return PVE::StorageTunnel::handle_bwlimit($params);
+		},
+		'disk-import' => sub {
+		    my ($params) = @_;
+
+		    $check_storage_access_migrate->(
+			$rpcenv,
+			$authuser,
+			$state->{storecfg},
+			$params->{storage},
+			$node
+		    );
+
+		    $params->{unix} = "/run/pve/ct-$state->{vmid}.storage";
+
+		    return PVE::StorageTunnel::handle_disk_import($state, $params);
+		},
+		'query-disk-import' => sub {
+		    my ($params) = @_;
+
+		    return PVE::StorageTunnel::handle_query_disk_import($state, $params);
+		},
+		'unlock' => sub {
+		    PVE::LXC::Config->remove_lock($state->{vmid}, $state->{lock});
+		    delete $state->{lock};
+		    return;
+		},
+		'start' => sub {
+		    PVE::LXC::vm_start(
+			$state->{vmid},
+			$state->{conf},
+			0
+		    );
+
+		    return;
+		},
+		'stop' => sub {
+		    PVE::LXC::vm_stop($state->{vmid}, 1, 10, 1);
+		    return;
+		},
+		'ticket' => sub {
+		    my ($params) = @_;
+
+		    my $path = $params->{path};
+
+		    die "Not allowed to generate ticket for unknown socket '$path'\n"
+			if !defined($state->{sockets}->{$path});
+
+		    return { ticket => PVE::AccessControl::assemble_tunnel_ticket($authuser, "/socket/$path") };
+		},
+		'quit' => sub {
+		    my ($params) = @_;
+
+		    if ($params->{cleanup}) {
+			if ($state->{cleanup}->{fw}) {
+			    PVE::Firewall::remove_vmfw_conf($state->{vmid});
+			}
+
+			for my $volid (keys $state->{cleanup}->{volumes}->%*) {
+			    print "freeing volume '$volid' as part of cleanup\n";
+			    eval { PVE::Storage::vdisk_free($state->{storecfg}, $volid) };
+			    warn $@ if $@;
+			}
+
+			PVE::LXC::destroy_lxc_container(
+			    $state->{storecfg},
+			    $state->{vmid},
+			    $state->{conf},
+			    undef,
+			    0,
+			);
+		    }
+
+		    print "switching to exit-mode, waiting for client to disconnect\n";
+		    $state->{exit} = 1;
+		    return;
+		},
+	    };
+
+	    $run_locked->(sub {
+		my $socket_addr = "/run/pve/ct-$state->{vmid}.mtunnel";
+		unlink $socket_addr;
+
+		$state->{socket} = IO::Socket::UNIX->new(
+		    Type => SOCK_STREAM(),
+		    Local => $socket_addr,
+		    Listen => 1,
+		);
+
+		$state->{socket_uid} = getpwnam('www-data')
+		    or die "Failed to resolve user 'www-data' to numeric UID\n";
+		chown $state->{socket_uid}, -1, $socket_addr;
+	    });
+
+	    print "mtunnel started\n";
+
+	    my $conn = eval { PVE::Tools::run_with_timeout(300, sub { $state->{socket}->accept() }) };
+	    if ($@) {
+		warn "Failed to accept tunnel connection - $@\n";
+
+		warn "Removing tunnel socket..\n";
+		unlink $socket_addr;
+
+		warn "Removing temporary CT config..\n";
+		$run_locked->(sub {
+		    PVE::LXC::destroy_config($state->{vmid});
+		});
+
+		die "Exiting mtunnel\n";
+	    }
+
+	    $state->{conn} = $conn;
+
+	    my $reply_err = sub {
+		my ($msg) = @_;
+
+		my $reply = JSON::encode_json({
+		    success => JSON::false,
+		    msg => $msg,
+		});
+		$conn->print("$reply\n");
+		$conn->flush();
+	    };
+
+	    my $reply_ok = sub {
+		my ($res) = @_;
+
+		$res->{success} = JSON::true;
+		my $reply = JSON::encode_json($res);
+		$conn->print("$reply\n");
+		$conn->flush();
+	    };
+
+	    while (my $line = <$conn>) {
+		chomp $line;
+
+		# untaint, we validate below if needed
+		($line) = $line =~ /^(.*)$/;
+		my $parsed = eval { JSON::decode_json($line) };
+		if ($@) {
+		    $reply_err->("failed to parse command - $@");
+		    next;
+		}
+
+		my $cmd = delete $parsed->{cmd};
+		if (!defined($cmd)) {
+		    $reply_err->("'cmd' missing");
+		} elsif ($state->{exit}) {
+		    $reply_err->("tunnel is in exit-mode, processing '$cmd' cmd not possible");
+		    next;
+		} elsif (my $handler = $cmd_handlers->{$cmd}) {
+		    print "received command '$cmd'\n";
+		    eval {
+			if ($cmd_desc->{$cmd}) {
+			    PVE::JSONSchema::validate($parsed, $cmd_desc->{$cmd});
+			} else {
+			    $parsed = {};
+			}
+			my $res = $run_locked->($handler, $parsed);
+			$reply_ok->($res);
+		    };
+		    $reply_err->("failed to handle '$cmd' command - $@")
+			if $@;
+		} else {
+		    $reply_err->("unknown command '$cmd' given");
+		}
+	    }
+
+	    if ($state->{exit}) {
+		print "mtunnel exited\n";
+	    } else {
+		die "mtunnel exited unexpectedly\n";
+	    }
+	};
+
+	my $ticket = PVE::AccessControl::assemble_tunnel_ticket($authuser, "/socket/$socket_addr");
+	my $upid = $rpcenv->fork_worker('vzmtunnel', $vmid, $authuser, $realcmd);
+
+	return {
+	    ticket => $ticket,
+	    upid => $upid,
+	    socket => $socket_addr,
+	};
+    }});
+
+__PACKAGE__->register_method({
+    name => 'mtunnelwebsocket',
+    path => '{vmid}/mtunnelwebsocket',
+    method => 'GET',
+    permissions => {
+	description => "You need to pass a ticket valid for the selected socket. Tickets can be created via the mtunnel API call, which will check permissions accordingly.",
+	user => 'all', # check inside
+    },
+    description => 'Migration tunnel endpoint for websocket upgrade - only for internal use by CT migration.',
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid'),
+	    socket => {
+		type => "string",
+		description => "unix socket to forward to",
+	    },
+	    ticket => {
+		type => "string",
+		description => "ticket returned by initial 'mtunnel' API call, or retrieved via 'ticket' tunnel command",
+	    },
+	},
+    },
+    returns => {
+	type => "object",
+	properties => {
+	    port => { type => 'string', optional => 1 },
+	    socket => { type => 'string', optional => 1 },
+	},
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $nodename = PVE::INotify::nodename();
+	my $node = extract_param($param, 'node');
+
+	raise_param_exc({ node => "node needs to be 'localhost' or local hostname '$nodename'" })
+	    if $node ne 'localhost' && $node ne $nodename;
+
+	my $vmid = $param->{vmid};
+	# check CT exists
+	PVE::LXC::Config->load_config($vmid);
+
+	my $socket = $param->{socket};
+	PVE::AccessControl::verify_tunnel_ticket($param->{ticket}, $authuser, "/socket/$socket");
+
+	return { socket => $socket };
+    }});
 1;
diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
index 5c5dcbe..82305c0 100644
--- a/src/PVE/LXC/Migrate.pm
+++ b/src/PVE/LXC/Migrate.pm
@@ -20,6 +20,9 @@ use PVE::LXC;
 use PVE::AbstractMigrate;
 use base qw(PVE::AbstractMigrate);
 
+# compared against remote end's minimum version
+our $WS_TUNNEL_VERSION = 2;
+
 sub lock_vm {
     my ($self, $vmid, $code, @param) = @_;
 
@@ -31,6 +34,7 @@ sub prepare {
 
     my $online = $self->{opts}->{online};
     my $restart= $self->{opts}->{restart};
+    my $remote = $self->{opts}->{remote};
 
     $self->{storecfg} = PVE::Storage::config();
 
@@ -47,6 +51,7 @@ sub prepare {
     }
     $self->{was_running} = $running;
 
+    my $storages = {};
     PVE::LXC::Config->foreach_volume_full($conf, { include_unused => 1 }, sub {
 	my ($ms, $mountpoint) = @_;
 
@@ -73,7 +78,7 @@ sub prepare {
 	die "content type 'rootdir' is not available on storage '$storage'\n"
 	    if !$scfg->{content}->{rootdir};
 
-	if ($scfg->{shared}) {
+	if ($scfg->{shared} && !$remote) {
 	    # PVE::Storage::activate_storage checks this for non-shared storages
 	    my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
 	    warn "Used shared storage '$storage' is not online on source node!\n"
@@ -86,18 +91,63 @@ sub prepare {
 	    $targetsid = PVE::JSONSchema::map_id($self->{opts}->{storagemap}, $storage);
 	}
 
-	my $target_scfg = PVE::Storage::storage_check_enabled($self->{storecfg}, $targetsid, $self->{node});
+	if (!$remote) {
+	    my $target_scfg = PVE::Storage::storage_check_enabled($self->{storecfg}, $targetsid, $self->{node});
+
+	    die "$volid: content type 'rootdir' is not available on storage '$targetsid'\n"
+		if !$target_scfg->{content}->{rootdir};
+	}
 
-	die "$volid: content type 'rootdir' is not available on storage '$targetsid'\n"
-	    if !$target_scfg->{content}->{rootdir};
+	$storages->{$targetsid} = 1;
     });
 
     # todo: test if VM uses local resources
 
-    # test ssh connection
-    my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
-    eval { $self->cmd_quiet($cmd); };
-    die "Can't connect to destination address using public key\n" if $@;
+    if ($remote) {
+	# test & establish websocket connection
+	my $bridges = map_bridges($conf, $self->{opts}->{bridgemap}, 1);
+
+	my $remote = $self->{opts}->{remote};
+	my $conn = $remote->{conn};
+
+	my $log = sub {
+	    my ($level, $msg) = @_;
+	    $self->log($level, $msg);
+	};
+
+	my $websocket_url = "https://$conn->{host}:$conn->{port}/api2/json/nodes/$self->{node}/lxc/$remote->{vmid}/mtunnelwebsocket";
+	my $url = "/nodes/$self->{node}/lxc/$remote->{vmid}/mtunnel";
+
+	my $tunnel_params = {
+	    url => $websocket_url,
+	};
+
+	my $storage_list = join(',', keys %$storages);
+	my $bridge_list = join(',', keys %$bridges);
+
+	my $req_params = {
+	    storages => $storage_list,
+	    bridges => $bridge_list,
+	};
+
+	my $tunnel = PVE::Tunnel::fork_websocket_tunnel($conn, $url, $req_params, $tunnel_params, $log);
+	my $min_version = $tunnel->{version} - $tunnel->{age};
+	$self->log('info', "local WS tunnel version: $WS_TUNNEL_VERSION");
+	$self->log('info', "remote WS tunnel version: $tunnel->{version}");
+	$self->log('info', "minimum required WS tunnel version: $min_version");
+	die "Remote tunnel endpoint not compatible, upgrade required\n"
+	    if $WS_TUNNEL_VERSION < $min_version;
+	die "Remote tunnel endpoint too old, upgrade required\n"
+	    if $WS_TUNNEL_VERSION > $tunnel->{version};
+
+	$self->log('info', "websocket tunnel started\n");
+	$self->{tunnel} = $tunnel;
+    } else {
+	# test ssh connection
+	my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
+	eval { $self->cmd_quiet($cmd); };
+	die "Can't connect to destination address using public key\n" if $@;
+    }
 
     # in restart mode, we shutdown the container before migrating
     if ($restart && $running) {
@@ -116,6 +166,8 @@ sub prepare {
 sub phase1 {
     my ($self, $vmid) = @_;
 
+    my $remote = $self->{opts}->{remote};
+
     $self->log('info', "starting migration of CT $self->{vmid} to node '$self->{node}' ($self->{nodeip})");
 
     my $conf = $self->{vmconf};
@@ -150,7 +202,7 @@ sub phase1 {
 
 	my $targetsid = $sid;
 
-	if ($scfg->{shared}) {
+	if ($scfg->{shared} && !$remote) {
 	    $self->log('info', "volume '$volid' is on shared storage '$sid'")
 		if !$snapname;
 	    return;
@@ -158,7 +210,8 @@ sub phase1 {
 	    $targetsid = PVE::JSONSchema::map_id($self->{opts}->{storagemap}, $sid);
 	}
 
-	PVE::Storage::storage_check_enabled($self->{storecfg}, $targetsid, $self->{node});
+	PVE::Storage::storage_check_enabled($self->{storecfg}, $targetsid, $self->{node})
+	    if !$remote;
 
 	my $bwlimit = $self->get_bwlimit($sid, $targetsid);
 
@@ -195,6 +248,9 @@ sub phase1 {
 
 	eval {
 	    &$test_volid($volid, $snapname);
+
+	    die "remote migration with snapshots not supported yet\n"
+		if $remote && $snapname;
 	};
 
 	&$log_error($@, $volid) if $@;
@@ -204,7 +260,7 @@ sub phase1 {
     my @sids = PVE::Storage::storage_ids($self->{storecfg});
     foreach my $storeid (@sids) {
 	my $scfg = PVE::Storage::storage_config($self->{storecfg}, $storeid);
-	next if $scfg->{shared};
+	next if $scfg->{shared} && !$remote;
 	next if !PVE::Storage::storage_check_enabled($self->{storecfg}, $storeid, undef, 1);
 
 	# get list from PVE::Storage (for unreferenced volumes)
@@ -214,10 +270,12 @@ sub phase1 {
 
 	# check if storage is available on target node
 	my $targetsid = PVE::JSONSchema::map_id($self->{opts}->{storagemap}, $storeid);
-	my $target_scfg = PVE::Storage::storage_check_enabled($self->{storecfg}, $targetsid, $self->{node});
+	if (!$remote) {
+	    my $target_scfg = PVE::Storage::storage_check_enabled($self->{storecfg}, $targetsid, $self->{node});
 
-	die "content type 'rootdir' is not available on storage '$targetsid'\n"
-	    if !$target_scfg->{content}->{rootdir};
+	    die "content type 'rootdir' is not available on storage '$targetsid'\n"
+		if !$target_scfg->{content}->{rootdir};
+	}
 
 	PVE::Storage::foreach_volid($dl, sub {
 	    my ($volid, $sid, $volname) = @_;
@@ -243,12 +301,21 @@ sub phase1 {
 	    my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
 	    my $scfg =  PVE::Storage::storage_config($self->{storecfg}, $sid);
 
-	    my $migratable = ($scfg->{type} eq 'dir') || ($scfg->{type} eq 'zfspool')
-		|| ($scfg->{type} eq 'lvmthin') || ($scfg->{type} eq 'lvm')
-		|| ($scfg->{type} eq 'btrfs');
+	    # TODO move to storage plugin layer?
+	    my $migratable_storages = [
+		'dir',
+		'zfspool',
+		'lvmthin',
+		'lvm',
+		'btrfs',
+	    ];
+	    if ($remote) {
+		push @$migratable_storages, 'cifs';
+		push @$migratable_storages, 'nfs';
+	    }
 
 	    die "storage type '$scfg->{type}' not supported\n"
-		if !$migratable;
+		if !grep { $_ eq $scfg->{type} } @$migratable_storages;
 
 	    # image is a linked clone on local storage, se we can't migrate.
 	    if (my $basename = (PVE::Storage::parse_volname($self->{storecfg}, $volid))[3]) {
@@ -283,7 +350,10 @@ sub phase1 {
 
     my $rep_cfg = PVE::ReplicationConfig->new();
 
-    if (my $jobcfg = $rep_cfg->find_local_replication_job($vmid, $self->{node})) {
+    if ($remote) {
+	die "cannot remote-migrate replicated VM\n"
+	    if $rep_cfg->check_for_existing_jobs($vmid, 1);
+    } elsif (my $jobcfg = $rep_cfg->find_local_replication_job($vmid, $self->{node})) {
 	die "can't live migrate VM with replicated volumes\n" if $self->{running};
 	my $start_time = time();
 	my $logfunc = sub { my ($msg) = @_;  $self->log('info', $msg); };
@@ -294,7 +364,6 @@ sub phase1 {
     my $opts = $self->{opts};
     foreach my $volid (keys %$volhash) {
 	next if $rep_volumes->{$volid};
-	my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
 	push @{$self->{volumes}}, $volid;
 
 	# JSONSchema and get_bandwidth_limit use kbps - storage_migrate bps
@@ -304,22 +373,39 @@ sub phase1 {
 	my $targetsid = $volhash->{$volid}->{targetsid};
 
 	my $new_volid = eval {
-	    my $storage_migrate_opts = {
-		'ratelimit_bps' => $bwlimit,
-		'insecure' => $opts->{migration_type} eq 'insecure',
-		'with_snapshots' => $volhash->{$volid}->{snapshots},
-		'allow_rename' => 1,
-	    };
-
-	    my $logfunc = sub { $self->log('info', $_[0]); };
-	    return PVE::Storage::storage_migrate(
-		$self->{storecfg},
-		$volid,
-		$self->{ssh_info},
-		$targetsid,
-		$storage_migrate_opts,
-		$logfunc,
-	    );
+	    if ($remote) {
+		my $log = sub {
+		    my ($level, $msg) = @_;
+		    $self->log($level, $msg);
+		};
+
+		return PVE::StorageTunnel::storage_migrate(
+		    $self->{tunnel},
+		    $self->{storecfg},
+		    $volid,
+		    $self->{vmid},
+		    $remote->{vmid},
+		    $volhash->{$volid},
+		    $log,
+		);
+	    } else {
+		my $storage_migrate_opts = {
+		    'ratelimit_bps' => $bwlimit,
+		    'insecure' => $opts->{migration_type} eq 'insecure',
+		    'with_snapshots' => $volhash->{$volid}->{snapshots},
+		    'allow_rename' => 1,
+		};
+
+		my $logfunc = sub { $self->log('info', $_[0]); };
+		return PVE::Storage::storage_migrate(
+		    $self->{storecfg},
+		    $volid,
+		    $self->{ssh_info},
+		    $targetsid,
+		    $storage_migrate_opts,
+		    $logfunc,
+		);
+	    }
 	};
 
 	if (my $err = $@) {
@@ -349,13 +435,38 @@ sub phase1 {
     my $vollist = PVE::LXC::Config->get_vm_volumes($conf);
     PVE::Storage::deactivate_volumes($self->{storecfg}, $vollist);
 
-    # transfer replication state before moving config
-    $self->transfer_replication_state() if $rep_volumes;
-    PVE::LXC::Config->update_volume_ids($conf, $self->{volume_map});
-    PVE::LXC::Config->write_config($vmid, $conf);
-    PVE::LXC::Config->move_config_to_node($vmid, $self->{node});
+    if ($remote) {
+	my $remote_conf = PVE::LXC::Config->load_config($vmid);
+	PVE::LXC::Config->update_volume_ids($remote_conf, $self->{volume_map});
+
+	my $bridges = map_bridges($remote_conf, $self->{opts}->{bridgemap});
+	for my $target (keys $bridges->%*) {
+	    for my $nic (keys $bridges->{$target}->%*) {
+		$self->log('info', "mapped: $nic from $bridges->{$target}->{$nic} to $target");
+	    }
+	}
+	my $conf_str = PVE::LXC::Config::write_pct_config("remote", $remote_conf);
+
+	# TODO expose in PVE::Firewall?
+	my $vm_fw_conf_path = "/etc/pve/firewall/$vmid.fw";
+	my $fw_conf_str;
+	$fw_conf_str = PVE::Tools::file_get_contents($vm_fw_conf_path)
+	    if -e $vm_fw_conf_path;
+	my $params = {
+	    conf => $conf_str,
+	    'firewall-config' => $fw_conf_str,
+	};
+
+	PVE::Tunnel::write_tunnel($self->{tunnel}, 10, 'config', $params);
+    } else {
+	# transfer replication state before moving config
+	$self->transfer_replication_state() if $rep_volumes;
+	PVE::LXC::Config->update_volume_ids($conf, $self->{volume_map});
+	PVE::LXC::Config->write_config($vmid, $conf);
+	PVE::LXC::Config->move_config_to_node($vmid, $self->{node});
+	$self->switch_replication_job_target() if $rep_volumes;
+    }
     $self->{conf_migrated} = 1;
-    $self->switch_replication_job_target() if $rep_volumes;
 }
 
 sub phase1_cleanup {
@@ -369,6 +480,12 @@ sub phase1_cleanup {
 	    # fixme: try to remove ?
 	}
     }
+
+    if ($self->{opts}->{remote}) {
+	# cleans up remote volumes
+	PVE::Tunnel::finish_tunnel($self->{tunnel}, 1);
+	delete $self->{tunnel};
+    }
 }
 
 sub phase3 {
@@ -376,6 +493,9 @@ sub phase3 {
 
     my $volids = $self->{volumes};
 
+    # handled below in final_cleanup
+    return if $self->{opts}->{remote};
+
     # destroy local copies
     foreach my $volid (@$volids) {
 	eval { PVE::Storage::vdisk_free($self->{storecfg}, $volid); };
@@ -403,6 +523,24 @@ sub final_cleanup {
 	    my $skiplock = 1;
 	    PVE::LXC::vm_start($vmid, $self->{vmconf}, $skiplock);
 	}
+    } elsif ($self->{opts}->{remote}) {
+	eval { PVE::Tunnel::write_tunnel($self->{tunnel}, 10, 'unlock') };
+	$self->log('err', "Failed to clear migrate lock - $@\n") if $@;
+
+	if ($self->{opts}->{restart} && $self->{was_running}) {
+	    $self->log('info', "start container on target node");
+	    PVE::Tunnel::write_tunnel($self->{tunnel}, 60, 'start');
+	}
+	if ($self->{opts}->{delete}) {
+	    PVE::LXC::destroy_lxc_container(
+		PVE::Storage::config(),
+		$vmid,
+		PVE::LXC::Config->load_config($vmid),
+		undef,
+		0,
+	    );
+	}
+	PVE::Tunnel::finish_tunnel($self->{tunnel});
     } else {
 	my $cmd = [ @{$self->{rem_ssh}}, 'pct', 'unlock', $vmid ];
 	$self->cmd_logerr($cmd, errmsg => "failed to clear migrate lock");
@@ -414,7 +552,30 @@ sub final_cleanup {
 	    $self->cmd($cmd);
 	}
     }
+}
+
+sub map_bridges {
+    my ($conf, $map, $scan_only) = @_;
+
+    my $bridges = {};
+
+    foreach my $opt (keys %$conf) {
+	next if $opt !~ m/^net\d+$/;
+
+	next if !$conf->{$opt};
+	my $d = PVE::LXC::Config->parse_lxc_network($conf->{$opt});
+	next if !$d || !$d->{bridge};
+
+	my $target_bridge = PVE::JSONSchema::map_id($map, $d->{bridge});
+	$bridges->{$target_bridge}->{$opt} = $d->{bridge};
+
+	next if $scan_only;
+
+	$d->{bridge} = $target_bridge;
+	$conf->{$opt} = PVE::LXC::Config->print_lxc_network($d);
+    }
 
+    return $bridges;
 }
 
 1;
-- 
2.30.2






* [pve-devel] [PATCH container v7 2/3] pct: add 'remote-migrate' command
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH container v7 1/3] migration: add " Fabian Grünbichler
@ 2022-11-17 13:33 ` Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH container v7 3/3] migrate: print mapped volume in error Fabian Grünbichler
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC (permalink / raw)
  To: pve-devel

works the same as `qm remote-migrate`, with the addition of `--restart`
and `--timeout` parameters.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    v6: new

 src/PVE/CLI/pct.pm | 124 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)

diff --git a/src/PVE/CLI/pct.pm b/src/PVE/CLI/pct.pm
index 23793ee..3ade2ba 100755
--- a/src/PVE/CLI/pct.pm
+++ b/src/PVE/CLI/pct.pm
@@ -10,6 +10,7 @@ use POSIX;
 use PVE::CLIHandler;
 use PVE::Cluster;
 use PVE::CpuSet;
+use PVE::Exception qw(raise_param_exc);
 use PVE::GuestHelpers;
 use PVE::INotify;
 use PVE::JSONSchema qw(get_standard_option);
@@ -803,6 +804,128 @@ __PACKAGE__->register_method ({
 	return undef;
     }});
 
+
+__PACKAGE__->register_method({
+    name => 'remote_migrate_vm',
+    path => 'remote_migrate_vm',
+    method => 'POST',
+    description => "Migrate container to a remote cluster. Creates a new migration task. EXPERIMENTAL feature!",
+    permissions => {
+	check => ['perm', '/vms/{vmid}', [ 'VM.Migrate' ]],
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::QemuServer::complete_vmid }),
+	    'target-vmid' => get_standard_option('pve-vmid', { optional => 1 }),
+	    'target-endpoint' => get_standard_option('proxmox-remote', {
+		description => "Remote target endpoint",
+	    }),
+	    online => {
+		type => 'boolean',
+		description => "Use online/live migration.",
+		optional => 1,
+	    },
+	    restart => {
+		type => 'boolean',
+		description => "Use restart migration",
+		optional => 1,
+	    },
+	    timeout => {
+		type => 'integer',
+		description => "Timeout in seconds for shutdown for restart migration",
+		optional => 1,
+		default => 180,
+	    },
+	    delete => {
+		type => 'boolean',
+		description => "Delete the original CT and related data after successful migration. By default the original CT is kept on the source cluster in a stopped state.",
+		optional => 1,
+		default => 0,
+	    },
+	    'target-storage' => get_standard_option('pve-targetstorage', {
+		completion => \&PVE::QemuServer::complete_migration_storage,
+		optional => 0,
+	    }),
+	    'target-bridge' => {
+		type => 'string',
+		description => "Mapping from source to target bridges. Providing only a single bridge ID maps all source bridges to that bridge. Providing the special value '1' will map each source bridge to itself.",
+		format => 'bridge-pair-list',
+	    },
+	    bwlimit => {
+		description => "Override I/O bandwidth limit (in KiB/s).",
+		optional => 1,
+		type => 'integer',
+		minimum => '0',
+		default => 'migrate limit from datacenter or storage config',
+	    },
+	},
+    },
+    returns => {
+	type => 'string',
+	description => "the task ID.",
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $source_vmid = $param->{vmid};
+	my $target_endpoint = $param->{'target-endpoint'};
+	my $target_vmid = $param->{'target-vmid'} // $source_vmid;
+
+	my $remote = PVE::JSONSchema::parse_property_string('proxmox-remote', $target_endpoint);
+
+	# TODO: move this as helper somewhere appropriate?
+	my $conn_args = {
+	    protocol => 'https',
+	    host => $remote->{host},
+	    port => $remote->{port} // 8006,
+	    apitoken => $remote->{apitoken},
+	};
+
+	$conn_args->{cached_fingerprints} = { uc($remote->{fingerprint}) => 1 }
+	    if defined($remote->{fingerprint});
+
+	my $api_client = PVE::APIClient::LWP->new(%$conn_args);
+	my $resources = $api_client->get("/cluster/resources", { type => 'vm' });
+	if (grep { defined($_->{vmid}) && $_->{vmid} eq $target_vmid } @$resources) {
+	    raise_param_exc({ 'target-vmid' => "Guest with ID '$target_vmid' already exists on remote cluster" });
+	}
+
+	my $storages = $api_client->get("/nodes/localhost/storage", { enabled => 1 });
+
+	my $storecfg = PVE::Storage::config();
+	my $target_storage = $param->{'target-storage'};
+	my $storagemap = eval { PVE::JSONSchema::parse_idmap($target_storage, 'pve-storage-id') };
+	raise_param_exc({ 'target-storage' => "failed to parse storage map: $@" })
+	    if $@;
+
+	my $check_remote_storage = sub {
+	    my ($storage) = @_;
+	    my $found = [ grep { $_->{storage} eq $storage } @$storages ];
+	    die "remote: storage '$storage' does not exist!\n"
+		if !@$found;
+
+	    $found = @$found[0];
+
+	    my $content_types = [ PVE::Tools::split_list($found->{content}) ];
+	    die "remote: storage '$storage' cannot store CT rootdir\n"
+		if !grep { $_ eq 'rootdir' } @$content_types;
+	};
+
+	foreach my $target_sid (values %{$storagemap->{entries}}) {
+	    $check_remote_storage->($target_sid);
+	}
+
+	$check_remote_storage->($storagemap->{default})
+	    if $storagemap->{default};
+
+	return PVE::API2::LXC->remote_migrate_vm($param);
+    }});
+
 our $cmddef = {
     list=> [ 'PVE::API2::LXC', 'vmlist', [], { node => $nodename }, sub {
 	my $res = shift;
@@ -851,6 +974,7 @@ our $cmddef = {
     migrate => [ "PVE::API2::LXC", 'migrate_vm', ['vmid', 'target'], { node => $nodename }, $upid_exit],
     'move-volume' => [ "PVE::API2::LXC", 'move_volume', ['vmid', 'volume', 'storage', 'target-vmid', 'target-volume'], { node => $nodename }, $upid_exit ],
     move_volume => { alias => 'move-volume' },
+    'remote-migrate' => [ __PACKAGE__, 'remote_migrate_vm', ['vmid', 'target-vmid', 'target-endpoint'], { node => $nodename }, $upid_exit ],
 
     snapshot => [ "PVE::API2::LXC::Snapshot", 'snapshot', ['vmid', 'snapname'], { node => $nodename } , $upid_exit ],
     delsnapshot => [ "PVE::API2::LXC::Snapshot", 'delsnapshot', ['vmid', 'snapname'], { node => $nodename } , $upid_exit ],
-- 
2.30.2






* [pve-devel] [PATCH container v7 3/3] migrate: print mapped volume in error
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH container v7 1/3] migration: add " Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH container v7 2/3] pct: add 'remote-migrate' command Fabian Grünbichler
@ 2022-11-17 13:33 ` Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 1/7] pending changes: allow skipping cloud-init Fabian Grünbichler
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC (permalink / raw)
  To: pve-devel

since that is the ID on the target node.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 src/PVE/LXC/Migrate.pm | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/PVE/LXC/Migrate.pm b/src/PVE/LXC/Migrate.pm
index 82305c0..35455e1 100644
--- a/src/PVE/LXC/Migrate.pm
+++ b/src/PVE/LXC/Migrate.pm
@@ -476,6 +476,9 @@ sub phase1_cleanup {
 
     if ($self->{volumes}) {
 	foreach my $volid (@{$self->{volumes}}) {
+	    if (my $mapped_volume = $self->{volume_map}->{$volid}) {
+		$volid = $mapped_volume;
+	    }
 	    $self->log('err', "found stale volume copy '$volid' on node '$self->{node}'");
 	    # fixme: try to remove ?
 	}
-- 
2.30.2






* [pve-devel] [PATCH qemu-server v7 1/7] pending changes: allow skipping cloud-init
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
                   ` (2 preceding siblings ...)
  2022-11-17 13:33 ` [pve-devel] [PATCH container v7 3/3] migrate: print mapped volume in error Fabian Grünbichler
@ 2022-11-17 13:33 ` Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 2/7] pending: fix typo in variable name Fabian Grünbichler
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC (permalink / raw)
  To: pve-devel

in case of remote migration, we use the `update_vm_api` helper for
checking permissions on the incoming config. this would also cause an
incoming cloud-init image to be overwritten, since the VM is not running
yet at this point.

provide a parameter which can be set by an incoming *remote* migration
to avoid having inconsistent cloud-init images on the source and target
side.
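
the resulting flag is effectively tri-state. a minimal sketch of that logic
follows; `should_generate_cloudinit` and its list of booleans are hypothetical
stand-ins for the loop over pending drives in `vmconfig_apply_pending`, not the
actual helper:

```perl
use strict;
use warnings;

# hypothetical stand-in: returns whether the cloud-init image would be
# regenerated, given the skip flag and one boolean per pending drive
sub should_generate_cloudinit {
    my ($skip_cloud_init, @drive_is_cloudinit) = @_;

    # 0 = explicitly disabled, undef = not decided yet
    my $generate = $skip_cloud_init ? 0 : undef;

    for my $is_cloudinit (@drive_is_cloudinit) {
        # '//=' only assigns while the flag is still undef, so an
        # explicit 0 from the skip parameter is never overwritten
        $generate //= 1 if $is_cloudinit;
    }

    return $generate ? 1 : 0;
}
```

a plain `= 1` assignment would override the skip request as soon as a
cloud-init drive shows up in the pending section, which is why the patch
switches to `//=`.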

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    new in v7

 PVE/QemuServer.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index dd3d3512..dea5f251 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5193,7 +5193,7 @@ sub vmconfig_delete_or_detach_drive {
 
 
 sub vmconfig_apply_pending {
-    my ($vmid, $conf, $storecfg, $errors) = @_;
+    my ($vmid, $conf, $storecfg, $errors, $skip_cloud_init) = @_;
 
     return if !scalar(keys %{$conf->{pending}});
 
@@ -5226,7 +5226,7 @@ sub vmconfig_apply_pending {
 
     PVE::QemuConfig->cleanup_pending($conf);
 
-    my $generate_cloudnit = undef;
+    my $generate_cloudnit = $skip_cloud_init ? 0 : undef;
 
     foreach my $opt (keys %{$conf->{pending}}) { # add/change
 	next if $opt eq 'delete'; # just to be sure
@@ -5241,7 +5241,7 @@ sub vmconfig_apply_pending {
 
 	    if (is_valid_drivename($opt)) {
 		my $drive = parse_drive($opt, $conf->{pending}->{$opt});
-		$generate_cloudnit = 1 if drive_is_cloudinit($drive);
+		$generate_cloudnit //= 1 if drive_is_cloudinit($drive);
 	    }
 
 	    $conf->{$opt} = delete $conf->{pending}->{$opt};
-- 
2.30.2






* [pve-devel] [PATCH qemu-server v7 2/7] pending: fix typo in variable name
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
                   ` (3 preceding siblings ...)
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 1/7] pending changes: allow skipping cloud-init Fabian Grünbichler
@ 2022-11-17 13:33 ` Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 3/7] mtunnel: add API endpoints Fabian Grünbichler
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    new in v7

 PVE/QemuServer.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index dea5f251..9a62b29d 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5226,7 +5226,7 @@ sub vmconfig_apply_pending {
 
     PVE::QemuConfig->cleanup_pending($conf);
 
-    my $generate_cloudnit = $skip_cloud_init ? 0 : undef;
+    my $generate_cloudinit = $skip_cloud_init ? 0 : undef;
 
     foreach my $opt (keys %{$conf->{pending}}) { # add/change
 	next if $opt eq 'delete'; # just to be sure
@@ -5241,7 +5241,7 @@ sub vmconfig_apply_pending {
 
 	    if (is_valid_drivename($opt)) {
 		my $drive = parse_drive($opt, $conf->{pending}->{$opt});
-		$generate_cloudnit //= 1 if drive_is_cloudinit($drive);
+		$generate_cloudinit //= 1 if drive_is_cloudinit($drive);
 	    }
 
 	    $conf->{$opt} = delete $conf->{pending}->{$opt};
@@ -5250,7 +5250,7 @@ sub vmconfig_apply_pending {
 
     # write all changes at once to avoid unnecessary i/o
     PVE::QemuConfig->write_config($vmid, $conf);
-    if ($generate_cloudnit) {
+    if ($generate_cloudinit) {
 	if (PVE::QemuServer::Cloudinit::apply_cloudinit_config($conf, $vmid)) {
 	    # After successful generation and if there were changes to be applied, update the
 	    # config to drop the {cloudinit} entry.
-- 
2.30.2






* [pve-devel] [PATCH qemu-server v7 3/7] mtunnel: add API endpoints
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
                   ` (4 preceding siblings ...)
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 2/7] pending: fix typo in variable name Fabian Grünbichler
@ 2022-11-17 13:33 ` Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 4/7] migrate: refactor remote VM/tunnel start Fabian Grünbichler
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC (permalink / raw)
  To: pve-devel

the following two endpoints are used for migration on the remote side

POST /nodes/NODE/qemu/VMID/mtunnel

which creates and locks an empty VM config, and spawns the main qmtunnel
worker which binds to a VM-specific UNIX socket.

this worker handles JSON-encoded migration commands coming in via this
UNIX socket:
- config (set target VM config)
-- checks permissions for updating config
-- strips pending changes and snapshots
-- sets (optional) firewall config
- disk (allocate disk for NBD migration)
-- checks permission for target storage
-- returns drive string for allocated volume
- disk-import, query-disk-import, bwlimit
-- handled by PVE::StorageTunnel
- start (returning migration info)
- fstrim (via agent)
- ticket (creates a ticket for a WS connection to a specific socket)
- resume
- stop
- nbdstop
- unlock
- quit (+ cleanup)

this worker serves as a replacement for both 'qm mtunnel' and various
manual calls via SSH. the API call will return a ticket valid for
connecting to the worker's UNIX socket via a websocket connection.
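
as a rough illustration of the command handling, the sketch below decodes one
JSON-encoded command, looks up its handler and encodes a reply. the `cmd`,
`success` and `msg` keys and the `handle_line` helper are assumptions made for
this example and are not taken from the patch; only the handler-table approach
and the `version` values mirror the code below.

```perl
use strict;
use warnings;
use JSON::PP;

# hypothetical subset of handlers; the real worker also implements
# 'config', 'disk', 'start', 'ticket', 'quit', ...
my $cmd_handlers = {
    version => sub { return { api => 2, age => 0 }; },
    unlock => sub { return; },
};

# decode one JSON-encoded command line, run its handler, encode the reply.
# the 'cmd', 'success' and 'msg' keys are assumptions about the wire format.
sub handle_line {
    my ($line) = @_;

    my $reply = eval {
        my $params = decode_json($line);
        my $cmd = delete($params->{cmd}) // die "no command given\n";
        my $handler = $cmd_handlers->{$cmd} // die "unknown command '$cmd'\n";
        my $res = $handler->($params) // {};
        $res->{success} = JSON::PP::true;
        $res;
    };
    if (my $err = $@) {
        chomp $err;
        $reply = { success => JSON::PP::false, msg => $err };
    }

    # canonical output keeps the key order stable
    return JSON::PP->new->canonical->encode($reply);
}
```

keeping all handlers in one hash means an unknown command is rejected in a
single place, before any handler code runs.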

GET+WebSocket upgrade /nodes/NODE/qemu/VMID/mtunnelwebsocket

gets called for connecting to a UNIX socket via websocket forwarding,
i.e. once for the main command mtunnel, and once each for the memory
migration and each NBD drive-mirror/storage migration.

access is guarded by a short-lived ticket binding the authenticated user
to the socket path. such tickets can be requested over the main mtunnel,
which keeps track of socket paths currently used by that
mtunnel/migration instance.
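
the binding of user, socket path and lifetime can be sketched as follows.
note that this example swaps in a plain HMAC over a made-up shared secret and
is not how `PVE::AccessControl::assemble_tunnel_ticket` works; the secret, the
30 second lifetime, the VMID 100 and both helper names are assumptions for
illustration only.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

my $secret = 'example-secret'; # assumption: illustration only
my $max_age = 30; # assumption: ticket lifetime in seconds

# socket paths registered by the current tunnel instance
my $known_sockets = { '/run/qemu-server/100.storage' => 1 };

# hypothetical helper: issue a ticket only for sockets this tunnel knows
sub assemble_ticket {
    my ($user, $path, $now) = @_;

    die "Not allowed to generate ticket for unknown socket '$path'\n"
        if !defined($known_sockets->{$path});

    my $sig = hmac_sha256_hex("$user:$path:$now", $secret);
    return "$now:$sig";
}

# hypothetical helper: check age, user and path of a presented ticket
sub verify_ticket {
    my ($ticket, $user, $path, $now) = @_;

    my ($ts, $sig) = split(/:/, $ticket, 2);
    return 0 if !defined($sig) || $ts !~ m/^\d+$/;
    return 0 if $ts > $now || $now - $ts > $max_age;

    # the signature covers user and path, so neither can be swapped
    return $sig eq hmac_sha256_hex("$user:$path:$ts", $secret) ? 1 : 0;
}
```

since the signature covers both the user and the socket path, a ticket issued
for one socket cannot be replayed against another one or by another user.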

each command handler should check privileges for the requested action if
necessary.

neither the mtunnel nor the mtunnelwebsocket endpoint is proxied; the
client/caller is responsible for ensuring that the passed 'node'
parameter matches the node actually handling the call.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    v7:
    - fix parameter order when parsing command (thanks Stefan Hanreich!)
    - pass-through $conf->{cloudinit} and tell update_vm_api to not regenerate the
      cloudinit image
    - bump d/control libpve-access-control dependency for Sys.Incoming privilege
    v6:
    - check for Sys.Incoming in mtunnel
    - add definedness checks in 'config' command
    - switch to vm_running_locally in 'resume' command
    - moved $socket_addr closer to usage
    v5:
    - use vm_running_locally
    - move '$socket_addr' declaration closer to usage
    v4:
    - add timeout to accept()
    - move 'bwlimit' to PVE::StorageTunnel and extend it
    - mark mtunnel(websocket) as non-proxied, and check $node accordingly
    v3:
    - handle meta and vmgenid better
    - handle failure of 'config' updating
    - move 'disk-import' and 'query-disk-import' handlers to pve-guest-common
    - improve tunnel exit by letting client close the connection
    - use strict VM config parser
    v2: incorporated Fabian Ebner's feedback, mainly:
    - use modified nbd alloc helper instead of duplicating
    - fix disk cleanup, also cleanup imported disks
    - fix firewall-conf vs firewall-config mismatch
    
    soft-requires
    - pve-http-server with websocket fixes (could be done via breaks? or bumped in
      pve-manager, or ignored..)

 PVE/API2/Qemu.pm | 535 ++++++++++++++++++++++++++++++++++++++++++++++-
 debian/control   |   2 +-
 2 files changed, 534 insertions(+), 3 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 0bb2d147..be84ff58 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -4,10 +4,13 @@ use strict;
 use warnings;
 use Cwd 'abs_path';
 use Net::SSLeay;
-use POSIX;
 use IO::Socket::IP;
+use IO::Socket::UNIX;
+use IPC::Open3;
+use JSON;
 use URI::Escape;
 use Crypt::OpenSSL::Random;
+use Socket qw(SOCK_STREAM);
 
 use PVE::Cluster qw (cfs_read_file cfs_write_file);;
 use PVE::RRD;
@@ -39,6 +42,7 @@ use PVE::VZDump::Plugin;
 use PVE::DataCenterConfig;
 use PVE::SSHInfo;
 use PVE::Replication;
+use PVE::StorageTunnel;
 
 BEGIN {
     if (!$ENV{PVE_GENERATING_DOCS}) {
@@ -1092,6 +1096,7 @@ __PACKAGE__->register_method({
 	    { subdir => 'spiceproxy' },
 	    { subdir => 'sendkey' },
 	    { subdir => 'firewall' },
+	    { subdir => 'mtunnel' },
 	    ];
 
 	return $res;
@@ -1447,6 +1452,8 @@ my $update_vm_api  = sub {
 
     my $background_delay = extract_param($param, 'background_delay');
 
+    my $skip_cloud_init = extract_param($param, 'skip_cloud_init');
+
     if (defined(my $cipassword = $param->{cipassword})) {
 	# Same logic as in cloud-init (but with the regex fixed...)
 	$param->{cipassword} = PVE::Tools::encrypt_pw($cipassword)
@@ -1804,7 +1811,8 @@ my $update_vm_api  = sub {
 	    if ($running) {
 		PVE::QemuServer::vmconfig_hotplug_pending($vmid, $conf, $storecfg, $modified, $errors);
 	    } else {
-		PVE::QemuServer::vmconfig_apply_pending($vmid, $conf, $storecfg, $errors);
+		# cloud_init must be skipped if we are in an incoming, remote live migration
+		PVE::QemuServer::vmconfig_apply_pending($vmid, $conf, $storecfg, $errors, $skip_cloud_init);
 	    }
 	    raise_param_exc($errors) if scalar(keys %$errors);
 
@@ -5099,4 +5107,527 @@ __PACKAGE__->register_method({
 	return PVE::QemuServer::Cloudinit::dump_cloudinit_config($conf, $param->{vmid}, $param->{type});
     }});
 
+__PACKAGE__->register_method({
+    name => 'mtunnel',
+    path => '{vmid}/mtunnel',
+    method => 'POST',
+    protected => 1,
+    description => 'Migration tunnel endpoint - only for internal use by VM migration.',
+    permissions => {
+	check =>
+	[ 'and',
+	  ['perm', '/vms/{vmid}', [ 'VM.Allocate' ]],
+	  ['perm', '/', [ 'Sys.Incoming' ]],
+	],
+	description => "You need 'VM.Allocate' permissions on '/vms/{vmid}' and Sys.Incoming" .
+	               " on '/'. Further permission checks happen during the actual migration.",
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid'),
+	    storages => {
+		type => 'string',
+		format => 'pve-storage-id-list',
+		optional => 1,
+		description => 'List of storages to check permission and availability. Will be checked again for all actually used storages during migration.',
+	    },
+	},
+    },
+    returns => {
+	additionalProperties => 0,
+	properties => {
+	    upid => { type => 'string' },
+	    ticket => { type => 'string' },
+	    socket => { type => 'string' },
+	},
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $node = extract_param($param, 'node');
+	my $vmid = extract_param($param, 'vmid');
+
+	my $storages = extract_param($param, 'storages');
+
+	my $nodename = PVE::INotify::nodename();
+
+	raise_param_exc({ node => "node needs to be 'localhost' or local hostname '$nodename'" })
+	    if $node ne 'localhost' && $node ne $nodename;
+
+	$node = $nodename;
+
+	my $storecfg = PVE::Storage::config();
+	foreach my $storeid (PVE::Tools::split_list($storages)) {
+	    $check_storage_access_migrate->($rpcenv, $authuser, $storecfg, $storeid, $node);
+	}
+
+	PVE::Cluster::check_cfs_quorum();
+
+	my $lock = 'create';
+	eval { PVE::QemuConfig->create_and_lock_config($vmid, 0, $lock); };
+
+	raise_param_exc({ vmid => "unable to create empty VM config - $@"})
+	    if $@;
+
+	my $realcmd = sub {
+	    my $state = {
+		storecfg => PVE::Storage::config(),
+		lock => $lock,
+		vmid => $vmid,
+	    };
+
+	    my $run_locked = sub {
+		my ($code, $params) = @_;
+		return PVE::QemuConfig->lock_config($state->{vmid}, sub {
+		    my $conf = PVE::QemuConfig->load_config($state->{vmid});
+
+		    $state->{conf} = $conf;
+
+		    die "Encountered wrong lock - aborting mtunnel command handling.\n"
+			if $state->{lock} && !PVE::QemuConfig->has_lock($conf, $state->{lock});
+
+		    return $code->($params);
+		});
+	    };
+
+	    my $cmd_desc = {
+		config => {
+		    conf => {
+			type => 'string',
+			description => 'Full VM config, adapted for target cluster/node',
+		    },
+		    'firewall-config' => {
+			type => 'string',
+			description => 'VM firewall config',
+			optional => 1,
+		    },
+		},
+		disk => {
+		    format => PVE::JSONSchema::get_standard_option('pve-qm-image-format'),
+		    storage => {
+			type => 'string',
+			format => 'pve-storage-id',
+		    },
+		    drive => {
+			type => 'object',
+			description => 'parsed drive information without volid and format',
+		    },
+		},
+		start => {
+		    start_params => {
+			type => 'object',
+			description => 'params passed to vm_start_nolock',
+		    },
+		    migrate_opts => {
+			type => 'object',
+			description => 'migrate_opts passed to vm_start_nolock',
+		    },
+		},
+		ticket => {
+		    path => {
+			type => 'string',
+			description => 'socket path for which the ticket should be valid. must be known to current mtunnel instance.',
+		    },
+		},
+		quit => {
+		    cleanup => {
+			type => 'boolean',
+			description => 'remove VM config and disks, aborting migration',
+			default => 0,
+		    },
+		},
+		'disk-import' => $PVE::StorageTunnel::cmd_schema->{'disk-import'},
+		'query-disk-import' => $PVE::StorageTunnel::cmd_schema->{'query-disk-import'},
+		bwlimit => $PVE::StorageTunnel::cmd_schema->{bwlimit},
+	    };
+
+	    my $cmd_handlers = {
+		'version' => sub {
+		    # compared against other end's version
+		    # bump/reset for breaking changes
+		    # bump/bump for opt-in changes
+		    return {
+			api => 2,
+			age => 0,
+		    };
+		},
+		'config' => sub {
+		    my ($params) = @_;
+
+		    # parse and write out VM FW config if given
+		    if (my $fw_conf = $params->{'firewall-config'}) {
+			my ($path, $fh) = PVE::Tools::tempfile_contents($fw_conf, 0700);
+
+			my $empty_conf = {
+			    rules => [],
+			    options => {},
+			    aliases => {},
+			    ipset => {} ,
+			    ipset_comments => {},
+			};
+			my $cluster_fw_conf = PVE::Firewall::load_clusterfw_conf();
+
+			# TODO: add flag for strict parsing?
+			# TODO: add import sub that does all this given raw content?
+			my $vmfw_conf = PVE::Firewall::generic_fw_config_parser($path, $cluster_fw_conf, $empty_conf, 'vm');
+			$vmfw_conf->{vmid} = $state->{vmid};
+			PVE::Firewall::save_vmfw_conf($state->{vmid}, $vmfw_conf);
+
+			$state->{cleanup}->{fw} = 1;
+		    }
+
+		    my $conf_fn = "incoming/qemu-server/$state->{vmid}.conf";
+		    my $new_conf = PVE::QemuServer::parse_vm_config($conf_fn, $params->{conf}, 1);
+		    delete $new_conf->{lock};
+		    delete $new_conf->{digest};
+
+		    # TODO handle properly?
+		    delete $new_conf->{snapshots};
+		    delete $new_conf->{parent};
+		    delete $new_conf->{pending};
+
+		    # not handled by update_vm_api
+		    my $vmgenid = delete $new_conf->{vmgenid};
+		    my $meta = delete $new_conf->{meta};
+		    my $cloudinit = delete $new_conf->{cloudinit}; # this is informational only
+		    $new_conf->{skip_cloud_init} = 1; # re-use image from source side
+
+		    $new_conf->{vmid} = $state->{vmid};
+		    $new_conf->{node} = $node;
+
+		    PVE::QemuConfig->remove_lock($state->{vmid}, 'create');
+
+		    eval {
+			$update_vm_api->($new_conf, 1);
+		    };
+		    if (my $err = $@) {
+			# revert to locked previous config
+			my $conf = PVE::QemuConfig->load_config($state->{vmid});
+			$conf->{lock} = 'create';
+			PVE::QemuConfig->write_config($state->{vmid}, $conf);
+
+			die $err;
+		    }
+
+		    my $conf = PVE::QemuConfig->load_config($state->{vmid});
+		    $conf->{lock} = 'migrate';
+		    $conf->{vmgenid} = $vmgenid if defined($vmgenid);
+		    $conf->{meta} = $meta if defined($meta);
+		    $conf->{cloudinit} = $cloudinit if defined($cloudinit);
+		    PVE::QemuConfig->write_config($state->{vmid}, $conf);
+
+		    $state->{lock} = 'migrate';
+
+		    return;
+		},
+		'bwlimit' => sub {
+		    my ($params) = @_;
+		    return PVE::StorageTunnel::handle_bwlimit($params);
+		},
+		'disk' => sub {
+		    my ($params) = @_;
+
+		    my $format = $params->{format};
+		    my $storeid = $params->{storage};
+		    my $drive = $params->{drive};
+
+		    $check_storage_access_migrate->($rpcenv, $authuser, $state->{storecfg}, $storeid, $node);
+
+		    my $storagemap = {
+			default => $storeid,
+		    };
+
+		    my $source_volumes = {
+			'disk' => [
+			    undef,
+			    $storeid,
+			    undef,
+			    $drive,
+			    0,
+			    $format,
+			],
+		    };
+
+		    my $res = PVE::QemuServer::vm_migrate_alloc_nbd_disks($state->{storecfg}, $state->{vmid}, $source_volumes, $storagemap);
+		    if (defined($res->{disk})) {
+			$state->{cleanup}->{volumes}->{$res->{disk}->{volid}} = 1;
+			return $res->{disk};
+		    } else {
+			die "failed to allocate NBD disk..\n";
+		    }
+		},
+		'disk-import' => sub {
+		    my ($params) = @_;
+
+		    $check_storage_access_migrate->(
+			$rpcenv,
+			$authuser,
+			$state->{storecfg},
+			$params->{storage},
+			$node
+		    );
+
+		    $params->{unix} = "/run/qemu-server/$state->{vmid}.storage";
+
+		    return PVE::StorageTunnel::handle_disk_import($state, $params);
+		},
+		'query-disk-import' => sub {
+		    my ($params) = @_;
+
+		    return PVE::StorageTunnel::handle_query_disk_import($state, $params);
+		},
+		'start' => sub {
+		    my ($params) = @_;
+
+		    my $info = PVE::QemuServer::vm_start_nolock(
+			$state->{storecfg},
+			$state->{vmid},
+			$state->{conf},
+			$params->{start_params},
+			$params->{migrate_opts},
+		    );
+
+
+		    if ($info->{migrate}->{proto} ne 'unix') {
+			PVE::QemuServer::vm_stop(undef, $state->{vmid}, 1, 1);
+			die "migration over non-UNIX sockets not possible\n";
+		    }
+
+		    my $socket = $info->{migrate}->{addr};
+		    chown $state->{socket_uid}, -1, $socket;
+		    $state->{sockets}->{$socket} = 1;
+
+		    my $unix_sockets = $info->{migrate}->{unix_sockets};
+		    foreach my $socket (@$unix_sockets) {
+			chown $state->{socket_uid}, -1, $socket;
+			$state->{sockets}->{$socket} = 1;
+		    }
+		    return $info;
+		},
+		'fstrim' => sub {
+		    if (PVE::QemuServer::qga_check_running($state->{vmid})) {
+			eval { mon_cmd($state->{vmid}, "guest-fstrim") };
+			warn "fstrim failed: $@\n" if $@;
+		    }
+		    return;
+		},
+		'stop' => sub {
+		    PVE::QemuServer::vm_stop(undef, $state->{vmid}, 1, 1);
+		    return;
+		},
+		'nbdstop' => sub {
+		    PVE::QemuServer::nbd_stop($state->{vmid});
+		    return;
+		},
+		'resume' => sub {
+		    if (PVE::QemuServer::Helpers::vm_running_locally($state->{vmid})) {
+			PVE::QemuServer::vm_resume($state->{vmid}, 1, 1);
+		    } else {
+			die "VM $state->{vmid} not running\n";
+		    }
+		    return;
+		},
+		'unlock' => sub {
+		    PVE::QemuConfig->remove_lock($state->{vmid}, $state->{lock});
+		    delete $state->{lock};
+		    return;
+		},
+		'ticket' => sub {
+		    my ($params) = @_;
+
+		    my $path = $params->{path};
+
+		    die "Not allowed to generate ticket for unknown socket '$path'\n"
+			if !defined($state->{sockets}->{$path});
+
+		    return { ticket => PVE::AccessControl::assemble_tunnel_ticket($authuser, "/socket/$path") };
+		},
+		'quit' => sub {
+		    my ($params) = @_;
+
+		    if ($params->{cleanup}) {
+			if ($state->{cleanup}->{fw}) {
+			    PVE::Firewall::remove_vmfw_conf($state->{vmid});
+			}
+
+			for my $volid (keys $state->{cleanup}->{volumes}->%*) {
+			    print "freeing volume '$volid' as part of cleanup\n";
+			    eval { PVE::Storage::vdisk_free($state->{storecfg}, $volid) };
+			    warn $@ if $@;
+			}
+
+			PVE::QemuServer::destroy_vm($state->{storecfg}, $state->{vmid}, 1);
+		    }
+
+		    print "switching to exit-mode, waiting for client to disconnect\n";
+		    $state->{exit} = 1;
+		    return;
+		},
+	    };
+
+	    $run_locked->(sub {
+		my $socket_addr = "/run/qemu-server/$state->{vmid}.mtunnel";
+		unlink $socket_addr;
+
+		$state->{socket} = IO::Socket::UNIX->new(
+	            Type => SOCK_STREAM(),
+		    Local => $socket_addr,
+		    Listen => 1,
+		);
+
+		$state->{socket_uid} = getpwnam('www-data')
+		    or die "Failed to resolve user 'www-data' to numeric UID\n";
+		chown $state->{socket_uid}, -1, $socket_addr;
+	    });
+
+	    print "mtunnel started\n";
+
+	    my $conn = eval { PVE::Tools::run_with_timeout(300, sub { $state->{socket}->accept() }) };
+	    if ($@) {
+		warn "Failed to accept tunnel connection - $@\n";
+
+		warn "Removing tunnel socket..\n";
+		unlink $state->{socket}->hostpath();
+
+		warn "Removing temporary VM config..\n";
+		$run_locked->(sub {
+		    PVE::QemuServer::destroy_vm($state->{storecfg}, $state->{vmid}, 1);
+		});
+
+		die "Exiting mtunnel\n";
+	    }
+
+	    $state->{conn} = $conn;
+
+	    my $reply_err = sub {
+		my ($msg) = @_;
+
+		my $reply = JSON::encode_json({
+		    success => JSON::false,
+		    msg => $msg,
+		});
+		$conn->print("$reply\n");
+		$conn->flush();
+	    };
+
+	    my $reply_ok = sub {
+		my ($res) = @_;
+
+		$res->{success} = JSON::true;
+		my $reply = JSON::encode_json($res);
+		$conn->print("$reply\n");
+		$conn->flush();
+	    };
+
+	    while (my $line = <$conn>) {
+		chomp $line;
+
+		# untaint, we validate below if needed
+		($line) = $line =~ /^(.*)$/;
+		my $parsed = eval { JSON::decode_json($line) };
+		if ($@) {
+		    $reply_err->("failed to parse command - $@");
+		    next;
+		}
+
+		my $cmd = delete $parsed->{cmd};
+		if (!defined($cmd)) {
+		    $reply_err->("'cmd' missing");
+		} elsif ($state->{exit}) {
+		    $reply_err->("tunnel is in exit-mode, processing '$cmd' cmd not possible");
+		    next;
+		} elsif (my $handler = $cmd_handlers->{$cmd}) {
+		    print "received command '$cmd'\n";
+		    eval {
+			if ($cmd_desc->{$cmd}) {
+			    PVE::JSONSchema::validate($parsed, $cmd_desc->{$cmd});
+			} else {
+			    $parsed = {};
+			}
+			my $res = $run_locked->($handler, $parsed);
+			$reply_ok->($res);
+		    };
+		    $reply_err->("failed to handle '$cmd' command - $@")
+			if $@;
+		} else {
+		    $reply_err->("unknown command '$cmd' given");
+		}
+	    }
+
+	    if ($state->{exit}) {
+		print "mtunnel exited\n";
+	    } else {
+		die "mtunnel exited unexpectedly\n";
+	    }
+	};
+
+	my $socket_addr = "/run/qemu-server/$vmid.mtunnel";
+	my $ticket = PVE::AccessControl::assemble_tunnel_ticket($authuser, "/socket/$socket_addr");
+	my $upid = $rpcenv->fork_worker('qmtunnel', $vmid, $authuser, $realcmd);
+
+	return {
+	    ticket => $ticket,
+	    upid => $upid,
+	    socket => $socket_addr,
+	};
+    }});
+
+__PACKAGE__->register_method({
+    name => 'mtunnelwebsocket',
+    path => '{vmid}/mtunnelwebsocket',
+    method => 'GET',
+    permissions => {
+	description => "You need to pass a ticket valid for the selected socket. Tickets can be created via the mtunnel API call, which will check permissions accordingly.",
+        user => 'all', # check inside
+    },
+    description => 'Migration tunnel endpoint for websocket upgrade - only for internal use by VM migration.',
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid'),
+	    socket => {
+		type => "string",
+		description => "unix socket to forward to",
+	    },
+	    ticket => {
+		type => "string",
+		description => "ticket returned by initial 'mtunnel' API call, or retrieved via 'ticket' tunnel command",
+	    },
+	},
+    },
+    returns => {
+	type => "object",
+	properties => {
+	    port => { type => 'string', optional => 1 },
+	    socket => { type => 'string', optional => 1 },
+	},
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $nodename = PVE::INotify::nodename();
+	my $node = extract_param($param, 'node');
+
+	raise_param_exc({ node => "node needs to be 'localhost' or local hostname '$nodename'" })
+	    if $node ne 'localhost' && $node ne $nodename;
+
+	my $vmid = $param->{vmid};
+	# check VM exists
+	PVE::QemuConfig->load_config($vmid);
+
+	my $socket = $param->{socket};
+	PVE::AccessControl::verify_tunnel_ticket($param->{ticket}, $authuser, "/socket/$socket");
+
+	return { socket => $socket };
+    }});
+
 1;
diff --git a/debian/control b/debian/control
index af11b8fe..a2a7ce48 100644
--- a/debian/control
+++ b/debian/control
@@ -33,7 +33,7 @@ Depends: dbus,
          libjson-perl,
          libjson-xs-perl,
          libnet-ssleay-perl,
-         libpve-access-control (>= 5.0-7),
+         libpve-access-control (>= 7.2-5),
          libpve-cluster-perl,
          libpve-common-perl (>= 7.2-5),
          libpve-guest-common-perl (>= 4.2-2),
-- 
2.30.2






* [pve-devel] [PATCH qemu-server v7 4/7] migrate: refactor remote VM/tunnel start
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
                   ` (5 preceding siblings ...)
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 3/7] mtunnel: add API endpoints Fabian Grünbichler
@ 2022-11-17 13:33 ` Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 5/7] migrate: add remote migration handling Fabian Grünbichler
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC (permalink / raw)
  To: pve-devel

no semantic changes intended, except for:
- no longer passing the main migration UNIX socket to SSH twice for
forwarding
- dropping the 'unix:' prefix in start_remote_tunnel's timeout error message
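
as a rough illustration of the refactored data flow (a sketch, not part of
this patch): the separate address/port/URI values are replaced by a single
tunnel_info hash, and the migration URI is assembled from it where needed.
the helper name build_migrate_uri and the example values are made up here,
the hash keys follow the comment added above start_remote_tunnel.

```perl
use strict;
use warnings;

# hypothetical helper (not in the patch): assemble a migration URI from a
# tunnel_info hash with the keys proto, addr and (optionally) port
sub build_migrate_uri {
    my ($tunnel_info) = @_;

    die "unable to detect remote migration address\n"
        if !$tunnel_info->{addr} || !$tunnel_info->{proto};

    my $uri = "$tunnel_info->{proto}:$tunnel_info->{addr}";
    $uri .= ":$tunnel_info->{port}" if defined($tunnel_info->{port});
    return $uri;
}

# example values, chosen for illustration only
my $unix = { proto => 'unix', addr => '/run/qemu-server/100.migrate' };
my $tcp = { proto => 'tcp', addr => 'localhost', port => 60000 };

print build_migrate_uri($unix), "\n";
print build_migrate_uri($tcp), "\n";
```

keeping proto, addr and port separate until the URI is actually needed means
the tunnel setup can decide on the protocol without parsing a URI string.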

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    v6:
    - rport/port
    - properly conditionalize 'skiplock'

 PVE/QemuMigrate.pm | 159 ++++++++++++++++++++++++++++-----------------
 PVE/QemuServer.pm  |  34 +++++-----
 2 files changed, 116 insertions(+), 77 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 5cd7d288..cbcd80bb 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -43,19 +43,24 @@ sub fork_tunnel {
     return PVE::Tunnel::fork_ssh_tunnel($self->{rem_ssh}, $cmd, $ssh_forward_info, $log);
 }
 
+# tunnel_info:
+#   proto: unix (secure) or tcp (insecure/legacy compat)
+#   addr: IP or UNIX socket path
+#   port: optional TCP port
+#   unix_sockets: additional UNIX socket paths to forward
 sub start_remote_tunnel {
-    my ($self, $raddr, $rport, $ruri, $unix_socket_info) = @_;
+    my ($self, $tunnel_info) = @_;
 
     my $nodename = PVE::INotify::nodename();
     my $migration_type = $self->{opts}->{migration_type};
 
     if ($migration_type eq 'secure') {
 
-	if ($ruri =~ /^unix:/) {
-	    my $ssh_forward_info = ["$raddr:$raddr"];
-	    $unix_socket_info->{$raddr} = 1;
+	if ($tunnel_info->{proto} eq 'unix') {
+	    my $ssh_forward_info = [];
 
-	    my $unix_sockets = [ keys %$unix_socket_info ];
+	    my $unix_sockets = [ keys %{$tunnel_info->{unix_sockets}} ];
+	    push @$unix_sockets, $tunnel_info->{addr};
 	    for my $sock (@$unix_sockets) {
 		push @$ssh_forward_info, "$sock:$sock";
 		unlink $sock;
@@ -82,23 +87,23 @@ sub start_remote_tunnel {
 	    if ($unix_socket_try > 100) {
 		$self->{errors} = 1;
 		PVE::Tunnel::finish_tunnel($self->{tunnel});
-		die "Timeout, migration socket $ruri did not get ready";
+		die "Timeout, migration socket $tunnel_info->{addr} did not get ready";
 	    }
 	    $self->{tunnel}->{unix_sockets} = $unix_sockets if (@$unix_sockets);
 
-	} elsif ($ruri =~ /^tcp:/) {
+	} elsif ($tunnel_info->{proto} eq 'tcp') {
 	    my $ssh_forward_info = [];
-	    if ($raddr eq "localhost") {
+	    if ($tunnel_info->{addr} eq "localhost") {
 		# for backwards compatibility with older qemu-server versions
 		my $pfamily = PVE::Tools::get_host_address_family($nodename);
 		my $lport = PVE::Tools::next_migrate_port($pfamily);
-		push @$ssh_forward_info, "$lport:localhost:$rport";
+		push @$ssh_forward_info, "$lport:localhost:$tunnel_info->{port}";
 	    }
 
 	    $self->{tunnel} = $self->fork_tunnel($ssh_forward_info);
 
 	} else {
-	    die "unsupported protocol in migration URI: $ruri\n";
+	    die "unsupported protocol in migration URI: $tunnel_info->{proto}\n";
 	}
     } else {
 	#fork tunnel for insecure migration, to send faster commands like resume
@@ -663,52 +668,45 @@ sub phase1_cleanup {
     }
 }
 
-sub phase2 {
-    my ($self, $vmid) = @_;
+sub phase2_start_local_cluster {
+    my ($self, $vmid, $params) = @_;
 
     my $conf = $self->{vmconf};
     my $local_volumes = $self->{local_volumes};
     my @online_local_volumes = $self->filter_local_volumes('online');
 
     $self->{storage_migration} = 1 if scalar(@online_local_volumes);
+    my $start = $params->{start_params};
+    my $migrate = $params->{migrate_opts};
 
     $self->log('info', "starting VM $vmid on remote node '$self->{node}'");
 
-    my $raddr;
-    my $rport;
-    my $ruri; # the whole migration dst. URI (protocol:address[:port])
-    my $nodename = PVE::INotify::nodename();
+    my $tunnel_info = {};
 
     ## start on remote node
     my $cmd = [@{$self->{rem_ssh}}];
 
-    my $spice_ticket;
-    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga})) {
-	my $res = mon_cmd($vmid, 'query-spice');
-	$spice_ticket = $res->{ticket};
+    push @$cmd, 'qm', 'start', $vmid;
+
+    if ($start->{skiplock}) {
+	push @$cmd, '--skiplock';
     }
 
-    push @$cmd , 'qm', 'start', $vmid, '--skiplock', '--migratedfrom', $nodename;
+    push @$cmd, '--migratedfrom', $migrate->{migratedfrom};
 
-    my $migration_type = $self->{opts}->{migration_type};
+    push @$cmd, '--migration_type', $migrate->{type};
 
-    push @$cmd, '--migration_type', $migration_type;
+    push @$cmd, '--migration_network', $migrate->{network}
+      if $migrate->{network};
 
-    push @$cmd, '--migration_network', $self->{opts}->{migration_network}
-      if $self->{opts}->{migration_network};
+    push @$cmd, '--stateuri', $start->{statefile};
 
-    if ($migration_type eq 'insecure') {
-	push @$cmd, '--stateuri', 'tcp';
-    } else {
-	push @$cmd, '--stateuri', 'unix';
-    }
-
-    if ($self->{forcemachine}) {
-	push @$cmd, '--machine', $self->{forcemachine};
+    if ($start->{forcemachine}) {
+	push @$cmd, '--machine', $start->{forcemachine};
     }
 
-    if ($self->{forcecpu}) {
-	push @$cmd, '--force-cpu', $self->{forcecpu};
+    if ($start->{forcecpu}) {
+	push @$cmd, '--force-cpu', $start->{forcecpu};
     }
 
     if ($self->{storage_migration}) {
@@ -716,10 +714,7 @@ sub phase2 {
     }
 
     my $spice_port;
-    my $unix_socket_info = {};
-    # version > 0 for unix socket support
-    my $nbd_protocol_version = 1;
-    my $input = "nbd_protocol_version: $nbd_protocol_version\n";
+    my $input = "nbd_protocol_version: $migrate->{nbd_proto_version}\n";
 
     my @offline_local_volumes = $self->filter_local_volumes('offline');
     for my $volid (@offline_local_volumes) {
@@ -737,7 +732,7 @@ sub phase2 {
 	}
     }
 
-    $input .= "spice_ticket: $spice_ticket\n" if $spice_ticket;
+    $input .= "spice_ticket: $migrate->{spice_ticket}\n" if $migrate->{spice_ticket};
 
     my @online_replicated_volumes = $self->filter_local_volumes('online', 1);
     foreach my $volid (@online_replicated_volumes) {
@@ -767,20 +762,20 @@ sub phase2 {
     my $exitcode = PVE::Tools::run_command($cmd, input => $input, outfunc => sub {
 	my $line = shift;
 
-	if ($line =~ m/^migration listens on tcp:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)$/) {
-	    $raddr = $1;
-	    $rport = int($2);
-	    $ruri = "tcp:$raddr:$rport";
+	if ($line =~ m/^migration listens on (tcp):(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)$/) {
+	    $tunnel_info->{addr} = $2;
+	    $tunnel_info->{port} = int($3);
+	    $tunnel_info->{proto} = $1;
 	}
-	elsif ($line =~ m!^migration listens on unix:(/run/qemu-server/(\d+)\.migrate)$!) {
-	    $raddr = $1;
-	    die "Destination UNIX sockets VMID does not match source VMID" if $vmid ne $2;
-	    $ruri = "unix:$raddr";
+	elsif ($line =~ m!^migration listens on (unix):(/run/qemu-server/(\d+)\.migrate)$!) {
+	    $tunnel_info->{addr} = $2;
+	    die "Destination UNIX sockets VMID does not match source VMID" if $vmid ne $3;
+	    $tunnel_info->{proto} = $1;
 	}
 	elsif ($line =~ m/^migration listens on port (\d+)$/) {
-	    $raddr = "localhost";
-	    $rport = int($1);
-	    $ruri = "tcp:$raddr:$rport";
+	    $tunnel_info->{addr} = "localhost";
+	    $tunnel_info->{port} = int($1);
+	    $tunnel_info->{proto} = "tcp";
 	}
 	elsif ($line =~ m/^spice listens on port (\d+)$/) {
 	    $spice_port = int($1);
@@ -801,7 +796,7 @@ sub phase2 {
 	    $targetdrive =~ s/drive-//g;
 
 	    $handle_storage_migration_listens->($targetdrive, $drivestr, $nbd_uri);
-	    $unix_socket_info->{$nbd_unix_addr} = 1;
+	    $tunnel_info->{unix_sockets}->{$nbd_unix_addr} = 1;
 	} elsif ($line =~ m/^re-using replicated volume: (\S+) - (.*)$/) {
 	    my $drive = $1;
 	    my $volid = $2;
@@ -816,19 +811,65 @@ sub phase2 {
 
     die "remote command failed with exit code $exitcode\n" if $exitcode;
 
-    die "unable to detect remote migration address\n" if !$raddr;
+    die "unable to detect remote migration address\n" if !$tunnel_info->{addr} || !$tunnel_info->{proto};
 
     if (scalar(keys %$target_replicated_volumes) != scalar(@online_replicated_volumes)) {
 	die "number of replicated disks on source and target node do not match - target node too old?\n"
     }
 
+    return ($tunnel_info, $spice_port);
+}
+
+sub phase2 {
+    my ($self, $vmid) = @_;
+
+    my $conf = $self->{vmconf};
+
+    # version > 0 for unix socket support
+    my $nbd_protocol_version = 1;
+
+    my $spice_ticket;
+    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga})) {
+	my $res = mon_cmd($vmid, 'query-spice');
+	$spice_ticket = $res->{ticket};
+    }
+
+    my $migration_type = $self->{opts}->{migration_type};
+    my $state_uri = $migration_type eq 'insecure' ? 'tcp' : 'unix';
+
+    my $params = {
+	start_params => {
+	    statefile => $state_uri,
+	    forcemachine => $self->{forcemachine},
+	    forcecpu => $self->{forcecpu},
+	    skiplock => 1,
+	},
+	migrate_opts => {
+	    spice_ticket => $spice_ticket,
+	    type => $migration_type,
+	    network => $self->{opts}->{migration_network},
+	    storagemap => $self->{opts}->{storagemap},
+	    migratedfrom => PVE::INotify::nodename(),
+	    nbd_proto_version => $nbd_protocol_version,
+	    nbd => $self->{nbd},
+	},
+    };
+
+    my ($tunnel_info, $spice_port) = $self->phase2_start_local_cluster($vmid, $params);
+
     $self->log('info', "start remote tunnel");
-    $self->start_remote_tunnel($raddr, $rport, $ruri, $unix_socket_info);
+    $self->start_remote_tunnel($tunnel_info);
+
+    my $migrate_uri = "$tunnel_info->{proto}:$tunnel_info->{addr}";
+    $migrate_uri .= ":$tunnel_info->{port}"
+	if defined($tunnel_info->{port});
 
     if ($self->{storage_migration}) {
 	$self->{storage_migration_jobs} = {};
 	$self->log('info', "starting storage migration");
 
+	my @online_local_volumes = $self->filter_local_volumes('online');
+
 	die "The number of local disks does not match between the source and the destination.\n"
 	    if (scalar(keys %{$self->{target_drive}}) != scalar(@online_local_volumes));
 	foreach my $drive (keys %{$self->{target_drive}}){
@@ -838,7 +879,7 @@ sub phase2 {
 	    my $source_drive = PVE::QemuServer::parse_drive($drive, $conf->{$drive});
 	    my $source_volid = $source_drive->{file};
 
-	    my $bwlimit = $local_volumes->{$source_volid}->{bwlimit};
+	    my $bwlimit = $self->{local_volumes}->{$source_volid}->{bwlimit};
 	    my $bitmap = $target->{bitmap};
 
 	    $self->log('info', "$drive: start migration to $nbd_uri");
@@ -846,7 +887,7 @@ sub phase2 {
 	}
     }
 
-    $self->log('info', "starting online/live migration on $ruri");
+    $self->log('info', "starting online/live migration on $migrate_uri");
     $self->{livemigration} = 1;
 
     # load_defaults
@@ -923,12 +964,12 @@ sub phase2 {
 
     my $start = time();
 
-    $self->log('info', "start migrate command to $ruri");
+    $self->log('info', "start migrate command to $migrate_uri");
     eval {
-	mon_cmd($vmid, "migrate", uri => $ruri);
+	mon_cmd($vmid, "migrate", uri => $migrate_uri);
     };
     my $merr = $@;
-    $self->log('info', "migrate uri => $ruri failed: $merr") if $merr;
+    $self->log('info', "migrate uri => $migrate_uri failed: $merr") if $merr;
 
     my $last_mem_transferred = 0;
     my $usleep = 1000000;
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 9a62b29d..d3ab43ee 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5717,10 +5717,10 @@ sub vm_start_nolock {
 	return $migration_ip;
     };
 
-    my $migrate_uri;
     if ($statefile) {
 	if ($statefile eq 'tcp') {
-	    my $localip = "localhost";
+	    my $migrate = $res->{migrate} = { proto => 'tcp' };
+	    $migrate->{addr} = "localhost";
 	    my $datacenterconf = PVE::Cluster::cfs_read_file('datacenter.cfg');
 	    my $nodename = nodename();
 
@@ -5733,26 +5733,26 @@ sub vm_start_nolock {
 	    }
 
 	    if ($migration_type eq 'insecure') {
-		$localip = $get_migration_ip->($nodename);
-		$localip = "[$localip]" if Net::IP::ip_is_ipv6($localip);
+		$migrate->{addr} = $get_migration_ip->($nodename);
+		$migrate->{addr} = "[$migrate->{addr}]" if Net::IP::ip_is_ipv6($migrate->{addr});
 	    }
 
 	    my $pfamily = PVE::Tools::get_host_address_family($nodename);
-	    my $migrate_port = PVE::Tools::next_migrate_port($pfamily);
-	    $migrate_uri = "tcp:${localip}:${migrate_port}";
-	    push @$cmd, '-incoming', $migrate_uri;
+	    $migrate->{port} = PVE::Tools::next_migrate_port($pfamily);
+	    $migrate->{uri} = "tcp:$migrate->{addr}:$migrate->{port}";
+	    push @$cmd, '-incoming', $migrate->{uri};
 	    push @$cmd, '-S';
 
 	} elsif ($statefile eq 'unix') {
 	    # should be default for secure migrations as a ssh TCP forward
 	    # tunnel is not deterministic reliable ready and fails regurarly
 	    # to set up in time, so use UNIX socket forwards
-	    my $socket_addr = "/run/qemu-server/$vmid.migrate";
-	    unlink $socket_addr;
+	    my $migrate = $res->{migrate} = { proto => 'unix' };
+	    $migrate->{addr} = "/run/qemu-server/$vmid.migrate";
+	    unlink $migrate->{addr};
 
-	    $migrate_uri = "unix:$socket_addr";
-
-	    push @$cmd, '-incoming', $migrate_uri;
+	    $migrate->{uri} = "unix:$migrate->{addr}";
+	    push @$cmd, '-incoming', $migrate->{uri};
 	    push @$cmd, '-S';
 
 	} elsif (-e $statefile) {
@@ -5907,10 +5907,9 @@ sub vm_start_nolock {
     eval { PVE::QemuServer::PCI::reserve_pci_usage($pci_id_list, $vmid, undef, $pid) };
     warn $@ if $@;
 
-    print "migration listens on $migrate_uri\n" if $migrate_uri;
-    $res->{migrate_uri} = $migrate_uri;
-
-    if ($statefile && $statefile ne 'tcp' && $statefile ne 'unix')  {
+    if (defined($res->{migrate})) {
+	print "migration listens on $res->{migrate}->{uri}\n";
+    } elsif ($statefile) {
 	eval { mon_cmd($vmid, "cont"); };
 	warn $@ if $@;
     }
@@ -5925,6 +5924,7 @@ sub vm_start_nolock {
 	    my $socket_path = "/run/qemu-server/$vmid\_nbd.migrate";
 	    mon_cmd($vmid, "nbd-server-start", addr => { type => 'unix', data => { path => $socket_path } } );
 	    $migrate_storage_uri = "nbd:unix:$socket_path";
+	    $res->{migrate}->{unix_sockets} = [$socket_path];
 	} else {
 	    my $nodename = nodename();
 	    my $localip = $get_migration_ip->($nodename);
@@ -5942,8 +5942,6 @@ sub vm_start_nolock {
 	    $migrate_storage_uri = "nbd:${localip}:${storage_migrate_port}";
 	}
 
-	$res->{migrate_storage_uri} = $migrate_storage_uri;
-
 	foreach my $opt (sort keys %$nbd) {
 	    my $drivestr = $nbd->{$opt}->{drivestr};
 	    my $volid = $nbd->{$opt}->{volid};
-- 
2.30.2






* [pve-devel] [PATCH qemu-server v7 5/7] migrate: add remote migration handling
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
                   ` (6 preceding siblings ...)
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 4/7] migrate: refactor remote VM/tunnel start Fabian Grünbichler
@ 2022-11-17 13:33 ` Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 6/7] api: add remote migrate endpoint Fabian Grünbichler
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC (permalink / raw)
  To: pve-devel

remote migration uses a websocket connection to a task worker running on
the target node instead of commands via SSH to control the migration.
this websocket tunnel is started earlier than the SSH tunnel, and allows
adding UNIX-socket forwarding over additional websocket connections
on-demand.
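
to make the control channel a bit more concrete: as implemented by the mtunnel
endpoint in patch 3/7, each command is one line of JSON containing a 'cmd'
key, and each reply is one line of JSON carrying a 'success' flag. the
following standalone sketch mimics that framing. handle_line, reply_err as a
named sub and the 'ping' command are illustrative only, and it uses the core
JSON::PP module rather than the JSON module used by the endpoint.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new->canonical(1);

# illustrative command table, 'ping' is made up for this sketch
my $cmd_handlers = {
    ping => sub {
        my ($params) = @_;
        return { echo => $params->{msg} };
    },
};

# encode an error reply as a single JSON line
sub reply_err {
    my ($msg) = @_;
    return $json->encode({ success => JSON::PP::false(), msg => $msg });
}

# decode one request line, dispatch on 'cmd', return one reply line
sub handle_line {
    my ($line) = @_;

    my $parsed = eval { $json->decode($line) };
    return reply_err("failed to parse command") if $@ || ref($parsed) ne 'HASH';

    my $cmd = delete $parsed->{cmd};
    return reply_err("'cmd' missing") if !defined($cmd);

    my $handler = $cmd_handlers->{$cmd}
        or return reply_err("unknown command '$cmd' given");

    my $res = eval { $handler->($parsed) };
    return reply_err("failed to handle '$cmd' command - $@") if $@;

    $res //= {};
    $res->{success} = JSON::PP::true();
    return $json->encode($res);
}

print handle_line('{"cmd":"ping","msg":"hello"}'), "\n";
print handle_line('{"cmd":"bogus"}'), "\n";
```

the sketch leaves out parameter validation against a per-command schema and
the locking around each handler, both of which the endpoint performs.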

the main differences to regular intra-cluster migration are:
- source VM config and disks are only removed upon request via --delete
- shared storages are treated like local storages, since we can't
assume they are shared across clusters (with potential to extend this
by marking storages as shared)
- NBD migrated disks are explicitly pre-allocated on the target node via
tunnel command before starting the target VM instance
- in addition to storages, network bridges and the VMID itself are
transformed via a user-defined mapping
- all commands and migration data streams are sent via a WS tunnel proxy
- pending changes and snapshots are discarded on the target side (for
  the time being)
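
the user-defined mapping mentioned above can be pictured as a lookup with a
fallback chain. this is a simplified stand-in, not the actual
PVE::JSONSchema::map_id implementation: the function name map_id_sketch, the
'entries' key and all storage/bridge names are assumptions for illustration
(only the 'default' key appears in this series).

```perl
use strict;
use warnings;

# simplified stand-in for an ID mapping lookup: explicit entry first, then
# the map's default, then the identity mapping
sub map_id_sketch {
    my ($map, $id) = @_;

    return $map->{entries}->{$id}
        if $map->{entries} && defined($map->{entries}->{$id});
    return $map->{default} if defined($map->{default});
    return $id;
}

# hypothetical maps for a remote migration
my $storagemap = {
    entries => { 'local-lvm' => 'remote-zfs' },
    default => 'remote-dir',
};
my $bridgemap = { entries => { vmbr0 => 'vmbr1' } };

for my $sid (qw(local-lvm cephpool)) {
    print "storage $sid -> ", map_id_sketch($storagemap, $sid), "\n";
}
for my $bridge (qw(vmbr0 vmbr2)) {
    print "bridge $bridge -> ", map_id_sketch($bridgemap, $bridge), "\n";
}
```

with such a lookup, a shared source storage is mapped just like a local one,
which matches treating shared storages as local for remote migration.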

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    v7:
    - bump libpve-storage-perl dependency
    
    v6:
    - add proxmox-websocket-tunnel dependency
    
    v5:
    - move merge_bwlimits helper to PVE::AbstractMigrate and extend it
    - adapt to map_id move
    - add check on source side for VM snapshots (not yet supported/implemented)
    
    v4:
    - new merge_bwlimits helper, improved bwlimit handling
    - use config-aware remote start timeout
    - switch tunnel log to match migration log sub
    
    v3:
    - move WS tunnel helpers to pve-guest-common-perl
    - check bridge mapping early
    - fix misplaced parentheses
    
    v2:
    - improve tunnel version info printing and error handling
    - don't clean up unix sockets twice
    - url escape remote socket path
    - cleanup nits and small issues

 PVE/API2/Qemu.pm   |   2 +-
 PVE/QemuMigrate.pm | 439 +++++++++++++++++++++++++++++++++++++--------
 PVE/QemuServer.pm  |   7 +-
 debian/control     |   3 +-
 4 files changed, 368 insertions(+), 83 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index be84ff58..1517eada 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -5252,7 +5252,7 @@ __PACKAGE__->register_method({
 		    # bump/reset for breaking changes
 		    # bump/bump for opt-in changes
 		    return {
-			api => 2,
+			api => $PVE::QemuMigrate::WS_TUNNEL_VERSION,
 			age => 0,
 		    };
 		},
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index cbcd80bb..5941cce6 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -5,11 +5,10 @@ use warnings;
 
 use IO::File;
 use IPC::Open2;
-use POSIX qw( WNOHANG );
 use Time::HiRes qw( usleep );
 
-use PVE::Format qw(render_bytes);
 use PVE::Cluster;
+use PVE::Format qw(render_bytes);
 use PVE::GuestHelpers qw(safe_boolean_ne safe_string_ne);
 use PVE::INotify;
 use PVE::RPCEnvironment;
@@ -17,6 +16,7 @@ use PVE::Replication;
 use PVE::ReplicationConfig;
 use PVE::ReplicationState;
 use PVE::Storage;
+use PVE::StorageTunnel;
 use PVE::Tools;
 use PVE::Tunnel;
 
@@ -31,6 +31,9 @@ use PVE::QemuServer;
 use PVE::AbstractMigrate;
 use base qw(PVE::AbstractMigrate);
 
+# compared against remote end's minimum version
+our $WS_TUNNEL_VERSION = 2;
+
 sub fork_tunnel {
     my ($self, $ssh_forward_info) = @_;
 
@@ -43,6 +46,35 @@ sub fork_tunnel {
     return PVE::Tunnel::fork_ssh_tunnel($self->{rem_ssh}, $cmd, $ssh_forward_info, $log);
 }
 
+sub fork_websocket_tunnel {
+    my ($self, $storages, $bridges) = @_;
+
+    my $remote = $self->{opts}->{remote};
+    my $conn = $remote->{conn};
+
+    my $log = sub {
+	my ($level, $msg) = @_;
+	$self->log($level, $msg);
+    };
+
+    my $websocket_url = "https://$conn->{host}:$conn->{port}/api2/json/nodes/$self->{node}/qemu/$remote->{vmid}/mtunnelwebsocket";
+    my $url = "/nodes/$self->{node}/qemu/$remote->{vmid}/mtunnel";
+
+    my $tunnel_params = {
+	url => $websocket_url,
+    };
+
+    my $storage_list = join(',', keys %$storages);
+    my $bridge_list = join(',', keys %$bridges);
+
+    my $req_params = {
+	storages => $storage_list,
+	bridges => $bridge_list,
+    };
+
+    return PVE::Tunnel::fork_websocket_tunnel($conn, $url, $req_params, $tunnel_params, $log);
+}
+
 # tunnel_info:
 #   proto: unix (secure) or tcp (insecure/legacy compat)
 #   addr: IP or UNIX socket path
@@ -188,23 +220,34 @@ sub prepare {
     }
 
     my $vollist = PVE::QemuServer::get_vm_volumes($conf);
+
+    my $storages = {};
     foreach my $volid (@$vollist) {
 	my ($sid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
 
-	# check if storage is available on both nodes
+	# check if storage is available on source node
 	my $scfg = PVE::Storage::storage_check_enabled($storecfg, $sid);
 
 	my $targetsid = $sid;
-	# NOTE: we currently ignore shared source storages in mappings so skip here too for now
-	if (!$scfg->{shared}) {
+	# NOTE: local ignores shared mappings, remote maps them
+	if (!$scfg->{shared} || $self->{opts}->{remote}) {
 	    $targetsid = PVE::JSONSchema::map_id($self->{opts}->{storagemap}, $sid);
 	}
 
-	my $target_scfg = PVE::Storage::storage_check_enabled($storecfg, $targetsid, $self->{node});
-	my ($vtype) = PVE::Storage::parse_volname($storecfg, $volid);
+	$storages->{$targetsid} = 1;
 
-	die "$volid: content type '$vtype' is not available on storage '$targetsid'\n"
-	    if !$target_scfg->{content}->{$vtype};
+	if (!$self->{opts}->{remote}) {
+	    # check if storage is available on target node
+	    my $target_scfg = PVE::Storage::storage_check_enabled(
+		$storecfg,
+		$targetsid,
+		$self->{node},
+	    );
+	    my ($vtype) = PVE::Storage::parse_volname($storecfg, $volid);
+
+	    die "$volid: content type '$vtype' is not available on storage '$targetsid'\n"
+		if !$target_scfg->{content}->{$vtype};
+	}
 
 	if ($scfg->{shared}) {
 	    # PVE::Storage::activate_storage checks this for non-shared storages
@@ -214,10 +257,27 @@ sub prepare {
 	}
     }
 
-    # test ssh connection
-    my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
-    eval { $self->cmd_quiet($cmd); };
-    die "Can't connect to destination address using public key\n" if $@;
+    if ($self->{opts}->{remote}) {
+	# test & establish websocket connection
+	my $bridges = map_bridges($conf, $self->{opts}->{bridgemap}, 1);
+	my $tunnel = $self->fork_websocket_tunnel($storages, $bridges);
+	my $min_version = $tunnel->{version} - $tunnel->{age};
+	$self->log('info', "local WS tunnel version: $WS_TUNNEL_VERSION");
+	$self->log('info', "remote WS tunnel version: $tunnel->{version}");
+	$self->log('info', "minimum required WS tunnel version: $min_version");
+	die "Remote tunnel endpoint not compatible, upgrade required\n"
+	    if $WS_TUNNEL_VERSION < $min_version;
+	 die "Remote tunnel endpoint too old, upgrade required\n"
+	    if $WS_TUNNEL_VERSION > $tunnel->{version};
+
+	print "websocket tunnel started\n";
+	$self->{tunnel} = $tunnel;
+    } else {
+	# test ssh connection
+	my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
+	eval { $self->cmd_quiet($cmd); };
+	die "Can't connect to destination address using public key\n" if $@;
+    }
 
     return $running;
 }
@@ -255,7 +315,7 @@ sub scan_local_volumes {
 	my @sids = PVE::Storage::storage_ids($storecfg);
 	foreach my $storeid (@sids) {
 	    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
-	    next if $scfg->{shared};
+	    next if $scfg->{shared} && !$self->{opts}->{remote};
 	    next if !PVE::Storage::storage_check_enabled($storecfg, $storeid, undef, 1);
 
 	    # get list from PVE::Storage (for unused volumes)
@@ -264,21 +324,20 @@ sub scan_local_volumes {
 	    next if @{$dl->{$storeid}} == 0;
 
 	    my $targetsid = PVE::JSONSchema::map_id($self->{opts}->{storagemap}, $storeid);
-	    # check if storage is available on target node
-	    my $target_scfg = PVE::Storage::storage_check_enabled(
-		$storecfg,
-		$targetsid,
-		$self->{node},
-	    );
+	    if (!$self->{opts}->{remote}) {
+		# check if storage is available on target node
+		my $target_scfg = PVE::Storage::storage_check_enabled(
+		    $storecfg,
+		    $targetsid,
+		    $self->{node},
+		);
 
-	    die "content type 'images' is not available on storage '$targetsid'\n"
-		if !$target_scfg->{content}->{images};
+		die "content type 'images' is not available on storage '$targetsid'\n"
+		    if !$target_scfg->{content}->{images};
 
-	    my $bwlimit = PVE::Storage::get_bandwidth_limit(
-		'migration',
-		[$targetsid, $storeid],
-		$self->{opts}->{bwlimit},
-	    );
+	    }
+
+	    my $bwlimit = $self->get_bwlimit($storeid, $targetsid);
 
 	    PVE::Storage::foreach_volid($dl, sub {
 		my ($volid, $sid, $volinfo) = @_;
@@ -332,14 +391,17 @@ sub scan_local_volumes {
 	    my $scfg = PVE::Storage::storage_check_enabled($storecfg, $sid);
 
 	    my $targetsid = $sid;
-	    # NOTE: we currently ignore shared source storages in mappings so skip here too for now
-	    if (!$scfg->{shared}) {
+	    # NOTE: local ignores shared mappings, remote maps them
+	    if (!$scfg->{shared} || $self->{opts}->{remote}) {
 		$targetsid = PVE::JSONSchema::map_id($self->{opts}->{storagemap}, $sid);
 	    }
 
-	    PVE::Storage::storage_check_enabled($storecfg, $targetsid, $self->{node});
+	    # check target storage on target node if intra-cluster migration
+	    if (!$self->{opts}->{remote}) {
+		PVE::Storage::storage_check_enabled($storecfg, $targetsid, $self->{node});
 
-	    return if $scfg->{shared};
+		return if $scfg->{shared};
+	    }
 
 	    $local_volumes->{$volid}->{ref} = $attr->{referenced_in_config} ? 'config' : 'snapshot';
 	    $local_volumes->{$volid}->{ref} = 'storage' if $attr->{is_unused};
@@ -372,6 +434,8 @@ sub scan_local_volumes {
 		# exceptions: 'zfspool' or 'qcow2' files (on directory storage)
 
 		die "online storage migration not possible if snapshot exists\n" if $self->{running};
+		die "remote migration with snapshots not supported yet\n" if $self->{opts}->{remote};
+
 		if (!($scfg->{type} eq 'zfspool'
 		    || ($scfg->{type} eq 'btrfs' && $local_volumes->{$volid}->{format} eq 'raw')
 		    || $local_volumes->{$volid}->{format} eq 'qcow2'
@@ -428,6 +492,9 @@ sub scan_local_volumes {
 
 	    my $migratable = $scfg->{type} =~ /^(?:dir|btrfs|zfspool|lvmthin|lvm)$/;
 
+	    # TODO: what is this even here for?
+	    $migratable = 1 if $self->{opts}->{remote};
+
 	    die "can't migrate '$volid' - storage type '$scfg->{type}' not supported\n"
 		if !$migratable;
 
@@ -462,6 +529,10 @@ sub handle_replication {
     my $local_volumes = $self->{local_volumes};
 
     return if !$self->{replication_jobcfg};
+
+    die "can't migrate VM with replicated volumes to remote cluster/node\n"
+	if $self->{opts}->{remote};
+
     if ($self->{running}) {
 
 	my $version = PVE::QemuServer::kvm_user_version();
@@ -561,24 +632,51 @@ sub sync_offline_local_volumes {
     $self->log('info', "copying local disk images") if scalar(@volids);
 
     foreach my $volid (@volids) {
-	my $targetsid = $local_volumes->{$volid}->{targetsid};
-	my $bwlimit = $local_volumes->{$volid}->{bwlimit};
-	$bwlimit = $bwlimit * 1024 if defined($bwlimit); # storage_migrate uses bps
-
-	my $storage_migrate_opts = {
-	    'ratelimit_bps' => $bwlimit,
-	    'insecure' => $opts->{migration_type} eq 'insecure',
-	    'with_snapshots' => $local_volumes->{$volid}->{snapshots},
-	    'allow_rename' => !$local_volumes->{$volid}->{is_vmstate},
-	};
+	my $new_volid;
 
-	my $logfunc = sub { $self->log('info', $_[0]); };
-	my $new_volid = eval {
-	    PVE::Storage::storage_migrate($storecfg, $volid, $self->{ssh_info},
-					  $targetsid, $storage_migrate_opts, $logfunc);
-	};
-	if (my $err = $@) {
-	    die "storage migration for '$volid' to storage '$targetsid' failed - $err\n";
+	my $opts = $self->{opts};
+	if ($opts->{remote}) {
+	    my $log = sub {
+		my ($level, $msg) = @_;
+		$self->log($level, $msg);
+	    };
+
+	    $new_volid = PVE::StorageTunnel::storage_migrate(
+		$self->{tunnel},
+		$storecfg,
+		$volid,
+		$self->{vmid},
+		$opts->{remote}->{vmid},
+		$local_volumes->{$volid},
+		$log,
+	    );
+	} else {
+	    my $targetsid = $local_volumes->{$volid}->{targetsid};
+
+	    my $bwlimit = $local_volumes->{$volid}->{bwlimit};
+	    $bwlimit = $bwlimit * 1024 if defined($bwlimit); # storage_migrate uses bps
+
+	    my $storage_migrate_opts = {
+		'ratelimit_bps' => $bwlimit,
+		'insecure' => $opts->{migration_type} eq 'insecure',
+		'with_snapshots' => $local_volumes->{$volid}->{snapshots},
+		'allow_rename' => !$local_volumes->{$volid}->{is_vmstate},
+	    };
+
+	    my $logfunc = sub { $self->log('info', $_[0]); };
+	    $new_volid = eval {
+		PVE::Storage::storage_migrate(
+		    $storecfg,
+		    $volid,
+		    $self->{ssh_info},
+		    $targetsid,
+		    $storage_migrate_opts,
+		    $logfunc,
+		);
+	    };
+	    if (my $err = $@) {
+		die "storage migration for '$volid' to storage '$targetsid' failed - $err\n";
+	    }
 	}
 
 	$self->{volume_map}->{$volid} = $new_volid;
@@ -594,6 +692,12 @@ sub sync_offline_local_volumes {
 sub cleanup_remotedisks {
     my ($self) = @_;
 
+    if ($self->{opts}->{remote}) {
+	PVE::Tunnel::finish_tunnel($self->{tunnel}, 1);
+	delete $self->{tunnel};
+	return;
+    }
+
     my $local_volumes = $self->{local_volumes};
 
     foreach my $volid (values %{$self->{volume_map}}) {
@@ -643,8 +747,100 @@ sub phase1 {
     $self->handle_replication($vmid);
 
     $self->sync_offline_local_volumes();
+    $self->phase1_remote($vmid) if $self->{opts}->{remote};
 };
 
+sub map_bridges {
+    my ($conf, $map, $scan_only) = @_;
+
+    my $bridges = {};
+
+    foreach my $opt (keys %$conf) {
+	next if $opt !~ m/^net\d+$/;
+
+	next if !$conf->{$opt};
+	my $d = PVE::QemuServer::parse_net($conf->{$opt});
+	next if !$d || !$d->{bridge};
+
+	my $target_bridge = PVE::JSONSchema::map_id($map, $d->{bridge});
+	$bridges->{$target_bridge}->{$opt} = $d->{bridge};
+
+	next if $scan_only;
+
+	$d->{bridge} = $target_bridge;
+	$conf->{$opt} = PVE::QemuServer::print_net($d);
+    }
+
+    return $bridges;
+}
+
+sub phase1_remote {
+    my ($self, $vmid) = @_;
+
+    my $remote_conf = PVE::QemuConfig->load_config($vmid);
+    PVE::QemuConfig->update_volume_ids($remote_conf, $self->{volume_map});
+
+    my $bridges = map_bridges($remote_conf, $self->{opts}->{bridgemap});
+    for my $target (keys $bridges->%*) {
+	for my $nic (keys $bridges->{$target}->%*) {
+	    $self->log('info', "mapped: $nic from $bridges->{$target}->{$nic} to $target");
+	}
+    }
+
+    my @online_local_volumes = $self->filter_local_volumes('online');
+
+    my $storage_map = $self->{opts}->{storagemap};
+    $self->{nbd} = {};
+    PVE::QemuConfig->foreach_volume($remote_conf, sub {
+	my ($ds, $drive) = @_;
+
+	# TODO eject CDROM?
+	return if PVE::QemuServer::drive_is_cdrom($drive);
+
+	my $volid = $drive->{file};
+	return if !$volid;
+
+	return if !grep { $_ eq $volid} @online_local_volumes;
+
+	my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid);
+	my $scfg = PVE::Storage::storage_config($self->{storecfg}, $storeid);
+	my $source_format = PVE::QemuServer::qemu_img_format($scfg, $volname);
+
+	# set by target cluster
+	my $oldvolid = delete $drive->{file};
+	delete $drive->{format};
+
+	my $targetsid = PVE::JSONSchema::map_id($storage_map, $storeid);
+
+	my $params = {
+	    format => $source_format,
+	    storage => $targetsid,
+	    drive => $drive,
+	};
+
+	$self->log('info', "Allocating volume for drive '$ds' on remote storage '$targetsid'..");
+	my $res = PVE::Tunnel::write_tunnel($self->{tunnel}, 600, 'disk', $params);
+
+	$self->log('info', "volume '$oldvolid' is '$res->{volid}' on the target\n");
+	$remote_conf->{$ds} = $res->{drivestr};
+	$self->{nbd}->{$ds} = $res;
+    });
+
+    my $conf_str = PVE::QemuServer::write_vm_config("remote", $remote_conf);
+
+    # TODO expose in PVE::Firewall?
+    my $vm_fw_conf_path = "/etc/pve/firewall/$vmid.fw";
+    my $fw_conf_str;
+    $fw_conf_str = PVE::Tools::file_get_contents($vm_fw_conf_path)
+	if -e $vm_fw_conf_path;
+    my $params = {
+	conf => $conf_str,
+	'firewall-config' => $fw_conf_str,
+    };
+
+    PVE::Tunnel::write_tunnel($self->{tunnel}, 10, 'config', $params);
+}
+
 sub phase1_cleanup {
     my ($self, $vmid, $err) = @_;
 
@@ -675,7 +871,6 @@ sub phase2_start_local_cluster {
     my $local_volumes = $self->{local_volumes};
     my @online_local_volumes = $self->filter_local_volumes('online');
 
-    $self->{storage_migration} = 1 if scalar(@online_local_volumes);
     my $start = $params->{start_params};
     my $migrate = $params->{migrate_opts};
 
@@ -820,10 +1015,37 @@ sub phase2_start_local_cluster {
     return ($tunnel_info, $spice_port);
 }
 
+sub phase2_start_remote_cluster {
+    my ($self, $vmid, $params) = @_;
+
+    die "insecure migration to remote cluster not implemented\n"
+	if $params->{migrate_opts}->{type} ne 'websocket';
+
+    my $remote_vmid = $self->{opts}->{remote}->{vmid};
+
+    # like regular start but with some overhead accounted for
+    my $timeout = PVE::QemuServer::Helpers::config_aware_timeout($self->{vmconf}) + 10;
+
+    my $res = PVE::Tunnel::write_tunnel($self->{tunnel}, $timeout, "start", $params);
+
+    foreach my $drive (keys %{$res->{drives}}) {
+	$self->{stopnbd} = 1;
+	$self->{target_drive}->{$drive}->{drivestr} = $res->{drives}->{$drive}->{drivestr};
+	my $nbd_uri = $res->{drives}->{$drive}->{nbd_uri};
+	die "unexpected NBD uri for '$drive': $nbd_uri\n"
+	    if $nbd_uri !~ s!/run/qemu-server/$remote_vmid\_!/run/qemu-server/$vmid\_!;
+
+	$self->{target_drive}->{$drive}->{nbd_uri} = $nbd_uri;
+    }
+
+    return ($res->{migrate}, $res->{spice_port});
+}
+
 sub phase2 {
     my ($self, $vmid) = @_;
 
     my $conf = $self->{vmconf};
+    my $local_volumes = $self->{local_volumes};
 
     # version > 0 for unix socket support
     my $nbd_protocol_version = 1;
@@ -855,10 +1077,39 @@ sub phase2 {
 	},
     };
 
-    my ($tunnel_info, $spice_port) = $self->phase2_start_local_cluster($vmid, $params);
+    my ($tunnel_info, $spice_port);
 
-    $self->log('info', "start remote tunnel");
-    $self->start_remote_tunnel($tunnel_info);
+    my @online_local_volumes = $self->filter_local_volumes('online');
+    $self->{storage_migration} = 1 if scalar(@online_local_volumes);
+
+    if (my $remote = $self->{opts}->{remote}) {
+	my $remote_vmid = $remote->{vmid};
+	$params->{migrate_opts}->{remote_node} = $self->{node};
+	($tunnel_info, $spice_port) = $self->phase2_start_remote_cluster($vmid, $params);
+	die "only UNIX sockets are supported for remote migration\n"
+	    if $tunnel_info->{proto} ne 'unix';
+
+	my $remote_socket = $tunnel_info->{addr};
+	my $local_socket = $remote_socket;
+	$local_socket =~ s/$remote_vmid/$vmid/g;
+	$tunnel_info->{addr} = $local_socket;
+
+	$self->log('info', "Setting up tunnel for '$local_socket'");
+	PVE::Tunnel::forward_unix_socket($self->{tunnel}, $local_socket, $remote_socket);
+
+	foreach my $remote_socket (@{$tunnel_info->{unix_sockets}}) {
+	    my $local_socket = $remote_socket;
+	    $local_socket =~ s/$remote_vmid/$vmid/g;
+	    next if $self->{tunnel}->{forwarded}->{$local_socket};
+	    $self->log('info', "Setting up tunnel for '$local_socket'");
+	    PVE::Tunnel::forward_unix_socket($self->{tunnel}, $local_socket, $remote_socket);
+	}
+    } else {
+	($tunnel_info, $spice_port) = $self->phase2_start_local_cluster($vmid, $params);
+
+	$self->log('info', "start remote tunnel");
+	$self->start_remote_tunnel($tunnel_info);
+    }
 
     my $migrate_uri = "$tunnel_info->{proto}:$tunnel_info->{addr}";
     $migrate_uri .= ":$tunnel_info->{port}"
@@ -868,8 +1119,6 @@ sub phase2 {
 	$self->{storage_migration_jobs} = {};
 	$self->log('info', "starting storage migration");
 
-	my @online_local_volumes = $self->filter_local_volumes('online');
-
 	die "The number of local disks does not match between the source and the destination.\n"
 	    if (scalar(keys %{$self->{target_drive}}) != scalar(@online_local_volumes));
 	foreach my $drive (keys %{$self->{target_drive}}){
@@ -901,7 +1150,8 @@ sub phase2 {
 
     # migrate speed can be set via bwlimit (datacenter.cfg and API) and via the
     # migrate_speed parameter in qm.conf - take the lower of the two.
-    my $bwlimit = PVE::Storage::get_bandwidth_limit('migration', undef, $self->{opts}->{bwlimit}) // 0;
+    my $bwlimit = $self->get_bwlimit();
+
     my $migrate_speed = $conf->{migrate_speed} // 0;
     $migrate_speed *= 1024; # migrate_speed is in MB/s, bwlimit in KB/s
 
@@ -942,7 +1192,7 @@ sub phase2 {
     };
     $self->log('info', "migrate-set-parameters error: $@") if $@;
 
-    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga})) {
+    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga}) && !$self->{opts}->{remote}) {
 	my $rpcenv = PVE::RPCEnvironment::get();
 	my $authuser = $rpcenv->get_user();
 
@@ -1155,11 +1405,15 @@ sub phase2_cleanup {
 
     my $nodename = PVE::INotify::nodename();
 
-    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'stop', $vmid, '--skiplock', '--migratedfrom', $nodename];
-    eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
-    if (my $err = $@) {
-        $self->log('err', $err);
-        $self->{errors} = 1;
+    if ($self->{tunnel} && $self->{tunnel}->{version} >= 2) {
+	PVE::Tunnel::write_tunnel($self->{tunnel}, 10, 'stop');
+    } else {
+	my $cmd = [@{$self->{rem_ssh}}, 'qm', 'stop', $vmid, '--skiplock', '--migratedfrom', $nodename];
+	eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
+	if (my $err = $@) {
+	    $self->log('err', $err);
+	    $self->{errors} = 1;
+	}
     }
 
     # cleanup after stopping, otherwise disks might be in-use by target VM!
@@ -1192,7 +1446,7 @@ sub phase3_cleanup {
 
     my $tunnel = $self->{tunnel};
 
-    if ($self->{volume_map}) {
+    if ($self->{volume_map} && !$self->{opts}->{remote}) {
 	my $target_drives = $self->{target_drive};
 
 	# FIXME: for NBD storage migration we now only update the volid, and
@@ -1208,20 +1462,26 @@ sub phase3_cleanup {
     }
 
     # transfer replication state before move config
-    $self->transfer_replication_state() if $self->{is_replicated};
-    PVE::QemuConfig->move_config_to_node($vmid, $self->{node});
-    $self->switch_replication_job_target() if $self->{is_replicated};
+    if (!$self->{opts}->{remote}) {
+	$self->transfer_replication_state() if $self->{is_replicated};
+	PVE::QemuConfig->move_config_to_node($vmid, $self->{node});
+	$self->switch_replication_job_target() if $self->{is_replicated};
+    }
 
     if ($self->{livemigration}) {
 	if ($self->{stopnbd}) {
 	    $self->log('info', "stopping NBD storage migration server on target.");
 	    # stop nbd server on remote vm - requirement for resume since 2.9
-	    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'nbdstop', $vmid];
+	    if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 2) {
+		PVE::Tunnel::write_tunnel($tunnel, 30, 'nbdstop');
+	    } else {
+		my $cmd = [@{$self->{rem_ssh}}, 'qm', 'nbdstop', $vmid];
 
-	    eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
-	    if (my $err = $@) {
-		$self->log('err', $err);
-		$self->{errors} = 1;
+		eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
+		if (my $err = $@) {
+		    $self->log('err', $err);
+		    $self->{errors} = 1;
+		}
 	    }
 	}
 
@@ -1231,8 +1491,9 @@ sub phase3_cleanup {
 	if (!$self->{vm_was_paused}) {
 	    # config moved and nbd server stopped - now we can resume vm on target
 	    if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 1) {
+		my $cmd = $tunnel->{version} == 1 ? "resume $vmid" : "resume";
 		eval {
-		    PVE::Tunnel::write_tunnel($tunnel, 30, "resume $vmid");
+		    PVE::Tunnel::write_tunnel($tunnel, 30, $cmd);
 		};
 		if (my $err = $@) {
 		    $self->log('err', $err);
@@ -1259,11 +1520,15 @@ sub phase3_cleanup {
 	) {
 	    if (!$self->{vm_was_paused}) {
 		$self->log('info', "issuing guest fstrim");
-		my $cmd = [@{$self->{rem_ssh}}, 'qm', 'guest', 'cmd', $vmid, 'fstrim'];
-		eval { PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
-		if (my $err = $@) {
-		    $self->log('err', "fstrim failed - $err");
-		    $self->{errors} = 1;
+		if ($self->{opts}->{remote}) {
+		    PVE::Tunnel::write_tunnel($self->{tunnel}, 600, 'fstrim');
+		} else {
+		    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'guest', 'cmd', $vmid, 'fstrim'];
+		    eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
+		    if (my $err = $@) {
+			$self->log('err', "fstrim failed - $err");
+			$self->{errors} = 1;
+		    }
 		}
 	    } else {
 		$self->log('info', "skipping guest fstrim, because VM is paused");
@@ -1272,12 +1537,14 @@ sub phase3_cleanup {
     }
 
     # close tunnel on successful migration, on error phase2_cleanup closed it
-    if ($tunnel) {
+    if ($tunnel && $tunnel->{version} == 1) {
 	eval { PVE::Tunnel::finish_tunnel($tunnel); };
 	if (my $err = $@) {
 	    $self->log('err', $err);
 	    $self->{errors} = 1;
 	}
+	$tunnel = undef;
+	delete $self->{tunnel};
     }
 
     eval {
@@ -1315,6 +1582,9 @@ sub phase3_cleanup {
 
     # destroy local copies
     foreach my $volid (@not_replicated_volumes) {
+	# remote is cleaned up below
+	next if $self->{opts}->{remote};
+
 	eval { PVE::Storage::vdisk_free($self->{storecfg}, $volid); };
 	if (my $err = $@) {
 	    $self->log('err', "removing local copy of '$volid' failed - $err");
@@ -1324,8 +1594,19 @@ sub phase3_cleanup {
     }
 
     # clear migrate lock
-    my $cmd = [ @{$self->{rem_ssh}}, 'qm', 'unlock', $vmid ];
-    $self->cmd_logerr($cmd, errmsg => "failed to clear migrate lock");
+    if ($tunnel && $tunnel->{version} >= 2) {
+	PVE::Tunnel::write_tunnel($tunnel, 10, "unlock");
+
+	PVE::Tunnel::finish_tunnel($tunnel);
+    } else {
+	my $cmd = [ @{$self->{rem_ssh}}, 'qm', 'unlock', $vmid ];
+	$self->cmd_logerr($cmd, errmsg => "failed to clear migrate lock");
+    }
+
+    if ($self->{opts}->{remote} && $self->{opts}->{delete}) {
+	eval { PVE::QemuServer::destroy_vm($self->{storecfg}, $vmid, 1, undef, 0) };
+	warn "Failed to remove source VM - $@\n" if $@;
+    }
 }
 
 sub final_cleanup {
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index d3ab43ee..1319e105 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5670,7 +5670,10 @@ sub vm_start_nolock {
     my $defaults = load_defaults();
 
     # set environment variable useful inside network script
-    $ENV{PVE_MIGRATED_FROM} = $migratedfrom if $migratedfrom;
+    # for remote migration the config is available on the target node!
+    if (!$migrate_opts->{remote_node}) {
+	$ENV{PVE_MIGRATED_FROM} = $migratedfrom;
+    }
 
     PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'pre-start', 1);
 
@@ -5920,7 +5923,7 @@ sub vm_start_nolock {
 
 	my $migrate_storage_uri;
 	# nbd_protocol_version > 0 for unix socket support
-	if ($nbd_protocol_version > 0 && $migration_type eq 'secure') {
+	if ($nbd_protocol_version > 0 && ($migration_type eq 'secure' || $migration_type eq 'websocket')) {
 	    my $socket_path = "/run/qemu-server/$vmid\_nbd.migrate";
 	    mon_cmd($vmid, "nbd-server-start", addr => { type => 'unix', data => { path => $socket_path } } );
 	    $migrate_storage_uri = "nbd:unix:$socket_path";
diff --git a/debian/control b/debian/control
index a2a7ce48..dd1e62f9 100644
--- a/debian/control
+++ b/debian/control
@@ -37,11 +37,12 @@ Depends: dbus,
          libpve-cluster-perl,
          libpve-common-perl (>= 7.2-5),
          libpve-guest-common-perl (>= 4.2-2),
-         libpve-storage-perl (>= 6.3-8),
+         libpve-storage-perl (>= 7.2-10),
          libterm-readline-gnu-perl,
          libuuid-perl,
          libxml-libxml-perl,
          perl (>= 5.10.0-19),
+         proxmox-websocket-tunnel,
          pve-cluster,
          pve-edk2-firmware (>= 3.20210831-1),
          pve-firewall,
-- 
2.30.2






* [pve-devel] [PATCH qemu-server v7 6/7] api: add remote migrate endpoint
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
                   ` (7 preceding siblings ...)
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 5/7] migrate: add remote migration handling Fabian Grünbichler
@ 2022-11-17 13:33 ` Fabian Grünbichler
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 7/7] qm: add remote-migrate command Fabian Grünbichler
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC (permalink / raw)
  To: pve-devel

entry point for the remote migration on the source side, mainly
preparing the API client that gets passed to the actual migration code
and doing some parameter parsing.

querying of the remote side's resources (like available storages, free
VMIDs, lookup of endpoint details for specific nodes, ...) should be
done by the client - see next commit with CLI example.
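
as an illustration of the parameter parsing mentioned above, here is a
minimal, self-contained sketch of how a 'target-endpoint' string could be
turned into connection arguments. parse_remote_endpoint and build_conn_args
are hypothetical helpers written for this example; the patch itself relies on
PVE::JSONSchema::parse_property_string with the 'proxmox-remote' format. the
host 192.0.2.10 and the shortened fingerprint are placeholders.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the 'proxmox-remote' property string parser:
# splits "key=value,key=value" pairs, splitting each pair on the FIRST '='
# only, because API token values themselves contain '='.
sub parse_remote_endpoint {
    my ($str) = @_;

    my $remote = {};
    for my $pair (split /,/, $str) {
        my ($key, $value) = split /=/, $pair, 2;
        die "invalid endpoint element '$pair'\n"
            if !defined($value) || $key eq '' || $value eq '';
        $remote->{$key} = $value;
    }

    die "endpoint is missing 'host'\n" if !defined($remote->{host});
    die "endpoint is missing 'apitoken'\n" if !defined($remote->{apitoken});

    return $remote;
}

# Mirrors the connection setup in the endpoint: HTTPS, default port 8006,
# and an upper-cased pinned fingerprint if one was given.
sub build_conn_args {
    my ($remote) = @_;

    my $conn_args = {
        protocol => 'https',
        host => $remote->{host},
        port => $remote->{port} // 8006,
        apitoken => $remote->{apitoken},
    };

    if (my $fp = $remote->{fingerprint}) {
        $conn_args->{cached_fingerprints} = { uc($fp) => 1 };
    }

    return $conn_args;
}

# placeholder endpoint (host and fingerprint are made up for this sketch)
my $endpoint = 'host=192.0.2.10,'
    . 'apitoken=PVEAPIToken=user@pve!incoming=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee,'
    . 'fingerprint=aa:bb:cc';
my $conn_args = build_conn_args(parse_remote_endpoint($endpoint));

print "$conn_args->{protocol}://$conn_args->{host}:$conn_args->{port}\n";
print "token: $conn_args->{apitoken}\n";
print "pinned: ", join(',', sort keys %{$conn_args->{cached_fingerprints}}), "\n";
```

splitting on the first '=' only is the important detail here, since the
token value has the form 'PVEAPIToken=user@realm!tokenid=secret'. when no
fingerprint is given, the endpoint instead queries the remote certificate
info and pins whatever it finds there.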

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    v6:
    - mark as experimental
    - remove `with-local-disks` from API parameters, always set to true
    v5:
    - add to API index
    v4:
    - removed target_node parameter, now determined by querying /cluster/status on the remote
    - moved checks to CLI

 PVE/API2/Qemu.pm | 213 ++++++++++++++++++++++++++++++++++++++++++++++-
 debian/control   |   2 +
 2 files changed, 212 insertions(+), 3 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 1517eada..6836c557 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -12,6 +12,7 @@ use URI::Escape;
 use Crypt::OpenSSL::Random;
 use Socket qw(SOCK_STREAM);
 
+use PVE::APIClient::LWP;
 use PVE::Cluster qw (cfs_read_file cfs_write_file);;
 use PVE::RRD;
 use PVE::SafeSyslog;
@@ -53,8 +54,6 @@ BEGIN {
     }
 }
 
-use Data::Dumper; # fixme: remove
-
 use base qw(PVE::RESTHandler);
 
 my $opt_force_description = "Force physical removal. Without this, we simple remove the disk from the config file and create an additional configuration entry called 'unused[n]', which contains the volume ID. Unlink of unused[n] always cause physical removal.";
@@ -1097,7 +1096,8 @@ __PACKAGE__->register_method({
 	    { subdir => 'sendkey' },
 	    { subdir => 'firewall' },
 	    { subdir => 'mtunnel' },
-	    ];
+	    { subdir => 'remote_migrate' },
+	];
 
 	return $res;
     }});
@@ -4427,6 +4427,202 @@ __PACKAGE__->register_method({
 
     }});
 
+__PACKAGE__->register_method({
+    name => 'remote_migrate_vm',
+    path => '{vmid}/remote_migrate',
+    method => 'POST',
+    protected => 1,
+    proxyto => 'node',
+    description => "Migrate virtual machine to a remote cluster. Creates a new migration task. EXPERIMENTAL feature!",
+    permissions => {
+	check => ['perm', '/vms/{vmid}', [ 'VM.Migrate' ]],
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::QemuServer::complete_vmid }),
+	    'target-vmid' => get_standard_option('pve-vmid', { optional => 1 }),
+	    'target-endpoint' => get_standard_option('proxmox-remote', {
+		description => "Remote target endpoint",
+	    }),
+	    online => {
+		type => 'boolean',
+		description => "Use online/live migration if VM is running. Ignored if VM is stopped.",
+		optional => 1,
+	    },
+	    delete => {
+		type => 'boolean',
+		description => "Delete the original VM and related data after successful migration. By default the original VM is kept on the source cluster in a stopped state.",
+		optional => 1,
+		default => 0,
+	    },
+	    'target-storage' => get_standard_option('pve-targetstorage', {
+		completion => \&PVE::QemuServer::complete_migration_storage,
+		optional => 0,
+	    }),
+	    'target-bridge' => {
+		type => 'string',
+		description => "Mapping from source to target bridges. Providing only a single bridge ID maps all source bridges to that bridge. Providing the special value '1' will map each source bridge to itself.",
+		format => 'bridge-pair-list',
+	    },
+	    bwlimit => {
+		description => "Override I/O bandwidth limit (in KiB/s).",
+		optional => 1,
+		type => 'integer',
+		minimum => '0',
+		default => 'migrate limit from datacenter or storage config',
+	    },
+	},
+    },
+    returns => {
+	type => 'string',
+	description => "the task ID.",
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $source_vmid = extract_param($param, 'vmid');
+	my $target_endpoint = extract_param($param, 'target-endpoint');
+	my $target_vmid = extract_param($param, 'target-vmid') // $source_vmid;
+
+	my $delete = extract_param($param, 'delete') // 0;
+
+	PVE::Cluster::check_cfs_quorum();
+
+	# test if VM exists
+	my $conf = PVE::QemuConfig->load_config($source_vmid);
+
+	PVE::QemuConfig->check_lock($conf);
+
+	raise_param_exc({ vmid => "cannot migrate HA-managed VM to remote cluster" })
+	    if PVE::HA::Config::vm_is_ha_managed($source_vmid);
+
+	my $remote = PVE::JSONSchema::parse_property_string('proxmox-remote', $target_endpoint);
+
+	# TODO: move this as helper somewhere appropriate?
+	my $conn_args = {
+	    protocol => 'https',
+	    host => $remote->{host},
+	    port => $remote->{port} // 8006,
+	    apitoken => $remote->{apitoken},
+	};
+
+	my $fp;
+	if ($fp = $remote->{fingerprint}) {
+	    $conn_args->{cached_fingerprints} = { uc($fp) => 1 };
+	}
+
+	print "Establishing API connection with remote at '$remote->{host}'\n";
+
+	my $api_client = PVE::APIClient::LWP->new(%$conn_args);
+
+	if (!defined($fp)) {
+	    my $cert_info = $api_client->get("/nodes/localhost/certificates/info");
+	    foreach my $cert (@$cert_info) {
+		my $filename = $cert->{filename};
+		next if $filename ne 'pveproxy-ssl.pem' && $filename ne 'pve-ssl.pem';
+		$fp = $cert->{fingerprint} if !$fp || $filename eq 'pveproxy-ssl.pem';
+	    }
+	    $conn_args->{cached_fingerprints} = { uc($fp) => 1 }
+		if defined($fp);
+	}
+
+	my $repl_conf = PVE::ReplicationConfig->new();
+	my $is_replicated = $repl_conf->check_for_existing_jobs($source_vmid, 1);
+	die "cannot remote-migrate replicated VM\n" if $is_replicated;
+
+	if (PVE::QemuServer::check_running($source_vmid)) {
+	    die "can't migrate running VM without --online\n" if !$param->{online};
+
+	} else {
+	    warn "VM isn't running. Doing offline migration instead.\n" if $param->{online};
+	    $param->{online} = 0;
+	}
+
+	# FIXME: fork worker hear to avoid timeout? or poll these periodically
+	# in pvestatd and access cached info here? all of the below is actually
+	# checked at the remote end anyway once we call the mtunnel endpoint,
+	# we could also punt it to the client and not do it here at all..
+	my $resources = $api_client->get("/cluster/resources", { type => 'vm' });
+	if (grep { defined($_->{vmid}) && $_->{vmid} eq $target_vmid } @$resources) {
+	    raise_param_exc({ target_vmid => "Guest with ID '$target_vmid' already exists on remote cluster" });
+	}
+
+	my $storages = $api_client->get("/nodes/localhost/storage", { enabled => 1 });
+
+	my $storecfg = PVE::Storage::config();
+	my $target_storage = extract_param($param, 'target-storage');
+	my $storagemap = eval { PVE::JSONSchema::parse_idmap($target_storage, 'pve-storage-id') };
+	raise_param_exc({ 'target-storage' => "failed to parse storage map: $@" })
+	    if $@;
+
+	my $target_bridge = extract_param($param, 'target-bridge');
+	my $bridgemap = eval { PVE::JSONSchema::parse_idmap($target_bridge, 'pve-bridge-id') };
+	raise_param_exc({ 'target-bridge' => "failed to parse bridge map: $@" })
+	    if $@;
+
+	my $check_remote_storage = sub {
+	    my ($storage) = @_;
+	    my $found = [ grep { $_->{storage} eq $storage } @$storages ];
+	    die "remote: storage '$storage' does not exist!\n"
+		if !@$found;
+
+	    $found = @$found[0];
+
+	    my $content_types = [ PVE::Tools::split_list($found->{content}) ];
+	    die "remote: storage '$storage' cannot store images\n"
+		if !grep { $_ eq 'images' } @$content_types;
+	};
+
+	foreach my $target_sid (values %{$storagemap->{entries}}) {
+	    $check_remote_storage->($target_sid);
+	}
+
+	$check_remote_storage->($storagemap->{default})
+	    if $storagemap->{default};
+
+	die "remote migration requires explicit storage mapping!\n"
+	    if $storagemap->{identity};
+
+	$param->{storagemap} = $storagemap;
+	$param->{bridgemap} = $bridgemap;
+	$param->{remote} = {
+	    conn => $conn_args, # re-use fingerprint for tunnel
+	    client => $api_client,
+	    vmid => $target_vmid,
+	};
+	$param->{migration_type} = 'websocket';
+	$param->{'with-local-disks'} = 1;
+	$param->{delete} = $delete if $delete;
+
+	my $cluster_status = $api_client->get("/cluster/status");
+	my $target_node;
+	foreach my $entry (@$cluster_status) {
+	    next if $entry->{type} ne 'node';
+	    if ($entry->{local}) {
+		$target_node = $entry->{name};
+		last;
+	    }
+	}
+
+	die "couldn't determine endpoint's node name\n"
+	    if !defined($target_node);
+
+	my $realcmd = sub {
+	    PVE::QemuMigrate->migrate($target_node, $remote->{host}, $source_vmid, $param);
+	};
+
+	my $worker = sub {
+	    return PVE::GuestHelpers::guest_migration_lock($source_vmid, 10, $realcmd);
+	};
+
+	return $rpcenv->fork_worker('qmigrate', $source_vmid, $authuser, $worker);
+    }});
+
 __PACKAGE__->register_method({
     name => 'monitor',
     path => '{vmid}/monitor',
@@ -5133,6 +5329,12 @@ __PACKAGE__->register_method({
 		optional => 1,
 		description => 'List of storages to check permission and availability. Will be checked again for all actually used storages during migration.',
 	    },
+	    bridges => {
+		type => 'string',
+		format => 'pve-bridge-id-list',
+		optional => 1,
+		description => 'List of network bridges to check availability. Will be checked again for actually used bridges during migration.',
+	    },
 	},
     },
     returns => {
@@ -5153,6 +5355,7 @@ __PACKAGE__->register_method({
 	my $vmid = extract_param($param, 'vmid');
 
 	my $storages = extract_param($param, 'storages');
+	my $bridges = extract_param($param, 'bridges');
 
 	my $nodename = PVE::INotify::nodename();
 
@@ -5166,6 +5369,10 @@ __PACKAGE__->register_method({
 	    $check_storage_access_migrate->($rpcenv, $authuser, $storecfg, $storeid, $node);
 	}
 
+	foreach my $bridge (PVE::Tools::split_list($bridges)) {
+	    PVE::Network::read_bridge_mtu($bridge);
+	}
+
 	PVE::Cluster::check_cfs_quorum();
 
 	my $lock = 'create';
diff --git a/debian/control b/debian/control
index dd1e62f9..db8efaf6 100644
--- a/debian/control
+++ b/debian/control
@@ -6,6 +6,7 @@ Build-Depends: debhelper (>= 12~),
                libglib2.0-dev,
                libio-multiplex-perl,
                libjson-c-dev,
+               libpve-apiclient-perl,
                libpve-cluster-perl,
                libpve-common-perl (>= 7.2-5),
                libpve-guest-common-perl (>= 4.1-1),
@@ -34,6 +35,7 @@ Depends: dbus,
          libjson-xs-perl,
          libnet-ssleay-perl,
          libpve-access-control (>= 7.2-5),
+         libpve-apiclient-perl,
          libpve-cluster-perl,
          libpve-common-perl (>= 7.2-5),
          libpve-guest-common-perl (>= 4.2-2),
-- 
2.30.2






* [pve-devel] [PATCH qemu-server v7 7/7] qm: add remote-migrate command
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
                   ` (8 preceding siblings ...)
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 6/7] api: add remote migrate endpoint Fabian Grünbichler
@ 2022-11-17 13:33 ` Fabian Grünbichler
  2022-11-17 13:39 ` [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
  2022-11-17 14:23 ` [pve-devel] applied-series: " Thomas Lamprecht
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:33 UTC (permalink / raw)
  To: pve-devel

which wraps the remote_migrate_vm API endpoint, but does the
precondition checks that can be done up front itself.

this now just leaves the FP retrieval and target node name lookup to the
sync part of the API endpoint, which should be do-able in <30s ..

an example invocation:

$ qm remote-migrate 1234 4321 'host=123.123.123.123,apitoken=PVEAPIToken=user@pve!incoming=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee,fingerprint=aa:bb:cc:dd:ee:ff:aa:bb:cc:dd:ee:ff:aa:bb:cc:dd:ee:ff:aa:bb:cc:dd:ee:ff:aa:bb:cc:dd:ee:ff:aa:bb' --target-bridge vmbr0 --target-storage zfs-a:rbd-b,nfs-c:dir-d,zfs-e --online

will migrate the local VM 1234 to the host 123.123.123.123 using the
given API token, mapping the VMID to 4321 on the target cluster, all its
virtual NICs to the target vm bridge 'vmbr0', any volumes on storage
zfs-a to storage rbd-b, any on storage nfs-c to storage dir-d, and any
other volumes to storage zfs-e. the source VM will be stopped but remain
on the source node/cluster after the migration has finished.
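
the mapping rules described above can be sketched as follows. this is a
simplified stand-in written for illustration, not the PVE::JSONSchema code
(parse_idmap / map_id) that the endpoint actually calls; parse_idmap_sketch
and map_id_sketch are hypothetical names, and 'local-lvm' and 'vmbr7' are
made-up source IDs.

```perl
use strict;
use warnings;

# Simplified ID map parser (an assumption, not the PVE implementation):
# "src:dst" adds an explicit entry, a bare ID sets the default target,
# and the special value '1' requests an identity mapping.
sub parse_idmap_sketch {
    my ($str) = @_;

    my $map = { entries => {} };
    for my $entry (split /,/, $str) {
        die "identity mapping cannot be combined with other mappings\n"
            if $map->{identity};

        if ($entry eq '1') {
            die "identity mapping cannot be combined with other mappings\n"
                if scalar(keys %{$map->{entries}}) || defined($map->{default});
            $map->{identity} = 1;
        } elsif ($entry =~ /^([^:]+):([^:]+)$/) {
            my ($source, $target) = ($1, $2);
            die "duplicate mapping for source '$source'\n"
                if exists $map->{entries}->{$source};
            $map->{entries}->{$source} = $target;
        } else {
            die "invalid map entry '$entry'\n" if $entry =~ /:/;
            die "default target ID can only be provided once\n"
                if defined($map->{default});
            $map->{default} = $entry;
        }
    }

    return $map;
}

# Lookup order: identity, explicit entry, default, then the source itself.
sub map_id_sketch {
    my ($map, $source) = @_;

    return $source if $map->{identity};
    return $map->{entries}->{$source} if exists $map->{entries}->{$source};
    return $map->{default} if defined($map->{default});
    return $source;
}

# storage map from the example invocation
my $storagemap = parse_idmap_sketch('zfs-a:rbd-b,nfs-c:dir-d,zfs-e');
for my $sid (qw(zfs-a nfs-c local-lvm)) {
    printf "%s -> %s\n", $sid, map_id_sketch($storagemap, $sid);
}

# a single bridge ID maps every source bridge to that bridge
my $bridgemap = parse_idmap_sketch('vmbr0');
printf "%s -> %s\n", 'vmbr7', map_id_sketch($bridgemap, 'vmbr7');
```

with this sketch, zfs-a resolves to rbd-b, nfs-c to dir-d, and any other
source storage to the default zfs-e. note that the API endpoint rejects an
identity storage map ("remote migration requires explicit storage
mapping!"), so '1' is only meaningful for bridges in this context.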

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    v7:
    - fix example in commit message
    - rebase on top of PVE::CLI::qm changes
    
    v6:
    - mark as experimental
    - drop `with-local-disks` parameter from API, always set to true
    - add example invocation to commit message
    
    v5: rename to 'remote-migrate'

 PVE/API2/Qemu.pm |  31 -------------
 PVE/CLI/qm.pm    | 113 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 113 insertions(+), 31 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 6836c557..b0c40fa5 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -4543,17 +4543,6 @@ __PACKAGE__->register_method({
 	    $param->{online} = 0;
 	}
 
-	# FIXME: fork worker hear to avoid timeout? or poll these periodically
-	# in pvestatd and access cached info here? all of the below is actually
-	# checked at the remote end anyway once we call the mtunnel endpoint,
-	# we could also punt it to the client and not do it here at all..
-	my $resources = $api_client->get("/cluster/resources", { type => 'vm' });
-	if (grep { defined($_->{vmid}) && $_->{vmid} eq $target_vmid } @$resources) {
-	    raise_param_exc({ target_vmid => "Guest with ID '$target_vmid' already exists on remote cluster" });
-	}
-
-	my $storages = $api_client->get("/nodes/localhost/storage", { enabled => 1 });
-
 	my $storecfg = PVE::Storage::config();
 	my $target_storage = extract_param($param, 'target-storage');
 	my $storagemap = eval { PVE::JSONSchema::parse_idmap($target_storage, 'pve-storage-id') };
@@ -4565,26 +4554,6 @@ __PACKAGE__->register_method({
 	raise_param_exc({ 'target-bridge' => "failed to parse bridge map: $@" })
 	    if $@;
 
-	my $check_remote_storage = sub {
-	    my ($storage) = @_;
-	    my $found = [ grep { $_->{storage} eq $storage } @$storages ];
-	    die "remote: storage '$storage' does not exist!\n"
-		if !@$found;
-
-	    $found = @$found[0];
-
-	    my $content_types = [ PVE::Tools::split_list($found->{content}) ];
-	    die "remote: storage '$storage' cannot store images\n"
-		if !grep { $_ eq 'images' } @$content_types;
-	};
-
-	foreach my $target_sid (values %{$storagemap->{entries}}) {
-	    $check_remote_storage->($target_sid);
-	}
-
-	$check_remote_storage->($storagemap->{default})
-	    if $storagemap->{default};
-
 	die "remote migration requires explicit storage mapping!\n"
 	    if $storagemap->{identity};
 
diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
index 6655842e..66feecce 100755
--- a/PVE/CLI/qm.pm
+++ b/PVE/CLI/qm.pm
@@ -15,6 +15,7 @@ use POSIX qw(strftime);
 use Term::ReadLine;
 use URI::Escape;
 
+use PVE::APIClient::LWP;
 use PVE::Cluster;
 use PVE::Exception qw(raise_param_exc);
 use PVE::GuestHelpers;
@@ -159,6 +160,117 @@ __PACKAGE__->register_method ({
 	return;
     }});
 
+
+__PACKAGE__->register_method({
+    name => 'remote_migrate_vm',
+    path => 'remote_migrate_vm',
+    method => 'POST',
+    description => "Migrate virtual machine to a remote cluster. Creates a new migration task. EXPERIMENTAL feature!",
+    permissions => {
+	check => ['perm', '/vms/{vmid}', [ 'VM.Migrate' ]],
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::QemuServer::complete_vmid }),
+	    'target-vmid' => get_standard_option('pve-vmid', { optional => 1 }),
+	    'target-endpoint' => get_standard_option('proxmox-remote', {
+		description => "Remote target endpoint",
+	    }),
+	    online => {
+		type => 'boolean',
+		description => "Use online/live migration if VM is running. Ignored if VM is stopped.",
+		optional => 1,
+	    },
+	    delete => {
+		type => 'boolean',
+		description => "Delete the original VM and related data after successful migration. By default the original VM is kept on the source cluster in a stopped state.",
+		optional => 1,
+		default => 0,
+	    },
+	    'target-storage' => get_standard_option('pve-targetstorage', {
+		completion => \&PVE::QemuServer::complete_migration_storage,
+		optional => 0,
+	    }),
+	    'target-bridge' => {
+		type => 'string',
+		description => "Mapping from source to target bridges. Providing only a single bridge ID maps all source bridges to that bridge. Providing the special value '1' will map each source bridge to itself.",
+		format => 'bridge-pair-list',
+	    },
+	    bwlimit => {
+		description => "Override I/O bandwidth limit (in KiB/s).",
+		optional => 1,
+		type => 'integer',
+		minimum => '0',
+		default => 'migrate limit from datacenter or storage config',
+	    },
+	},
+    },
+    returns => {
+	type => 'string',
+	description => "the task ID.",
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $source_vmid = $param->{vmid};
+	my $target_endpoint = $param->{'target-endpoint'};
+	my $target_vmid = $param->{'target-vmid'} // $source_vmid;
+
+	my $remote = PVE::JSONSchema::parse_property_string('proxmox-remote', $target_endpoint);
+
+	# TODO: move this as helper somewhere appropriate?
+	my $conn_args = {
+	    protocol => 'https',
+	    host => $remote->{host},
+	    port => $remote->{port} // 8006,
+	    apitoken => $remote->{apitoken},
+	};
+
+	$conn_args->{cached_fingerprints} = { uc($remote->{fingerprint}) => 1 }
+	    if defined($remote->{fingerprint});
+
+	my $api_client = PVE::APIClient::LWP->new(%$conn_args);
+	my $resources = $api_client->get("/cluster/resources", { type => 'vm' });
+	if (grep { defined($_->{vmid}) && $_->{vmid} eq $target_vmid } @$resources) {
+	    raise_param_exc({ target_vmid => "Guest with ID '$target_vmid' already exists on remote cluster" });
+	}
+
+	my $storages = $api_client->get("/nodes/localhost/storage", { enabled => 1 });
+
+	my $storecfg = PVE::Storage::config();
+	my $target_storage = $param->{'target-storage'};
+	my $storagemap = eval { PVE::JSONSchema::parse_idmap($target_storage, 'pve-storage-id') };
+	raise_param_exc({ 'target-storage' => "failed to parse storage map: $@" })
+	    if $@;
+
+	my $check_remote_storage = sub {
+	    my ($storage) = @_;
+	    my $found = [ grep { $_->{storage} eq $storage } @$storages ];
+	    die "remote: storage '$storage' does not exist!\n"
+		if !@$found;
+
+	    $found = @$found[0];
+
+	    my $content_types = [ PVE::Tools::split_list($found->{content}) ];
+	    die "remote: storage '$storage' cannot store images\n"
+		if !grep { $_ eq 'images' } @$content_types;
+	};
+
+	foreach my $target_sid (values %{$storagemap->{entries}}) {
+	    $check_remote_storage->($target_sid);
+	}
+
+	$check_remote_storage->($storagemap->{default})
+	    if $storagemap->{default};
+
+	return PVE::API2::Qemu->remote_migrate_vm($param);
+    }});
+
 __PACKAGE__->register_method ({
     name => 'status',
     path => 'status',
@@ -900,6 +1012,7 @@ our $cmddef = {
     clone => [ "PVE::API2::Qemu", 'clone_vm', ['vmid', 'newid'], { %node }, $upid_exit ],
 
     migrate => [ "PVE::API2::Qemu", 'migrate_vm', ['vmid', 'target'], { %node }, $upid_exit ],
+    'remote-migrate' => [ __PACKAGE__, 'remote_migrate_vm', ['vmid', 'target-vmid', 'target-endpoint'], { %node }, $upid_exit ],
 
     set => [ "PVE::API2::Qemu", 'update_vm', ['vmid'], { %node } ],
 
-- 
2.30.2






* Re: [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
                   ` (9 preceding siblings ...)
  2022-11-17 13:33 ` [pve-devel] [PATCH qemu-server v7 7/7] qm: add remote-migrate command Fabian Grünbichler
@ 2022-11-17 13:39 ` Fabian Grünbichler
  2022-11-17 14:23 ` [pve-devel] applied-series: " Thomas Lamprecht
  11 siblings, 0 replies; 13+ messages in thread
From: Fabian Grünbichler @ 2022-11-17 13:39 UTC (permalink / raw)
  To: Proxmox VE development discussion

On November 17, 2022 2:33 pm, Fabian Grünbichler wrote:
> this series adds remote migration for VMs and CTs.
> 
> both live and offline migration of VMs including NBD and
> storage-migrated disks should work, containers don't have any live
> migration so both offline and restart mode work identical except for the
> restart part.
> 
> groundwork for extending to pvesr already laid.
> 
> uncovered (but still not fixed)
> https://bugzilla.proxmox.com/show_bug.cgi?id=3873
> (migration btrfs -> btrfs with snapshots)
> 
> follow-ups/todos:
> - implement disk export/import for shared storages like rbd
> - implement disk export/import raw+size for ZFS zvols
> - extend ZFS replication via websocket tunnel to remote cluster
> - extend replication to support RBD snapshot-based replication
> - extend RBD replication via websocket tunnel to remote cluster
> - switch regular migration SSH mtunnel to version 2 with json support
>   (related -> s.hanreichs pre-/post-migrate-hook series)
> 

and obviously here is the place where I forgot to add the v7 high-level
changelog before sending:

new in v7:
- fixed parsing bug reported by Stefan Hanreich
- rebased
- qemu: adapted to PVE::CLI::qm changes
- qemu: fixed $conf->{cloudinit} and cloudinit image handling (patch #1 and #3)

> new in v6:
> - --with-local-disks always set and not a parameter
> - `pct remote-migrate`
> - new Sys.Incoming privilege + checks
> - storage export taintedness bug fix
> - properly take over pve-targetstorage option (qemu-server ->
>   pve-common)
> - review feedback addressed
> 
> new in v5: lots of edge cases fixed, PoC for pve-container, some more
> helper moving for re-use in pve-container without duplication
> 
> new in v4: lots of small fixes, improved bwlimit handling, `qm` command
> (thanks Fabian Ebner and Dominik Csapak for the feedback on v3!)
> 
> new in v3: lots of refactoring and edge-case handling
> 
> new in v2: dropped parts already applied, incorporated Fabian's and
> Dominik's feedback (thanks!)
> 
> new in v1: explicit remote endpoint specified as part of API call
> instead of remote.cfg
> 
> pve-container:
> 
> Fabian Grünbichler (3):
>   migration: add remote migration
>   pct: add 'remote-migrate' command
>   migrate: print mapped volume in error
> 
>  debian/control         |   3 +-
>  src/PVE/API2/LXC.pm    | 635 +++++++++++++++++++++++++++++++++++++++++
>  src/PVE/CLI/pct.pm     | 124 ++++++++
>  src/PVE/LXC/Migrate.pm | 248 +++++++++++++---
>  4 files changed, 967 insertions(+), 43 deletions(-)
> 
> qemu-server:
> 
> Fabian Grünbichler (7):
>   pending changes: allow skipping cloud-init
>   pending: fix typo in variable name
>   mtunnel: add API endpoints
>   migrate: refactor remote VM/tunnel start
>   migrate: add remote migration handling
>   api: add remote migrate endpoint
>   qm: add remote-migrate command
> 
>  PVE/API2/Qemu.pm   | 717 ++++++++++++++++++++++++++++++++++++++++++++-
>  PVE/CLI/qm.pm      | 113 +++++++
>  PVE/QemuMigrate.pm | 590 ++++++++++++++++++++++++++++---------
>  PVE/QemuServer.pm  |  49 ++--
>  debian/control     |   7 +-
>  5 files changed, 1311 insertions(+), 165 deletions(-)
> 
> -- 
> 2.30.2
> 
> 
> 





* [pve-devel] applied-series: [PATCH-series container/qemu-server v7 0/10] remote migration
  2022-11-17 13:33 [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
                   ` (10 preceding siblings ...)
  2022-11-17 13:39 ` [pve-devel] [PATCH-series container/qemu-server v7 0/10] remote migration Fabian Grünbichler
@ 2022-11-17 14:23 ` Thomas Lamprecht
  11 siblings, 0 replies; 13+ messages in thread
From: Thomas Lamprecht @ 2022-11-17 14:23 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

On 17/11/2022 at 14:33, Fabian Grünbichler wrote:
> pve-container:
> 
> Fabian Grünbichler (3):
>   migration: add remote migration
>   pct: add 'remote-migrate' command
>   migrate: print mapped volume in error
> 
>  debian/control         |   3 +-
>  src/PVE/API2/LXC.pm    | 635 +++++++++++++++++++++++++++++++++++++++++
>  src/PVE/CLI/pct.pm     | 124 ++++++++
>  src/PVE/LXC/Migrate.pm | 248 +++++++++++++---
>  4 files changed, 967 insertions(+), 43 deletions(-)
> 
> qemu-server:
> 
> Fabian Grünbichler (7):
>   pending changes: allow skipping cloud-init
>   pending: fix typo in variable name
>   mtunnel: add API endpoints
>   migrate: refactor remote VM/tunnel start
>   migrate: add remote migration handling
>   api: add remote migrate endpoint
>   qm: add remote-migrate command
> 
>  PVE/API2/Qemu.pm   | 717 ++++++++++++++++++++++++++++++++++++++++++++-
>  PVE/CLI/qm.pm      | 113 +++++++
>  PVE/QemuMigrate.pm | 590 ++++++++++++++++++++++++++++---------
>  PVE/QemuServer.pm  |  49 ++--
>  debian/control     |   7 +-
>  5 files changed, 1311 insertions(+), 165 deletions(-)
> 


applied, thanks!





