* [pve-devel] [RFC] qemu-server: add migration_type=insecure to remote-migrate
@ 2026-04-25 1:10 Bogdan Ionescu
From: Bogdan Ionescu @ 2026-04-25 1:10 UTC
To: pve-devel
Hi all,
I'd like to gauge interest in adding a migration_type=insecure option to
the qm remote-migrate / remote_migrate_vm endpoint, before investing
time in a review-ready patch series.
== Motivation ==
The current remote-migrate implementation tunnels both control plane
and data plane through the websocket connection to the target's API
endpoint on 8006/tcp. This is the right default for trust reasons
(API token + TLS fingerprint, no SSH trust between clusters needed),
but the data plane throughput is severely bottlenecked by:
- userspace bouncing through PVE::Tunnel + pveproxy + qmtunnel
(3 Perl processes in the data path, each context-switching per
chunk)
- per-byte WebSocket masking in pure Perl (RFC 6455 §5.3)
- TLS framing on top
- lack of zero-copy / TSO offload for the streamed bytes
- multiple TCP segments end-to-end with independent flow control
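To illustrate the masking cost in the second bullet: RFC 6455 §5.3 requires every client-to-server payload byte to be XOR-ed with a rotating 4-byte key, so a naive per-byte loop (as in a pure-Perl implementation) touches each byte of the migration stream individually. A minimal sketch of the operation, in Python purely for illustration:

```python
# RFC 6455 §5.3 client-to-server masking: XOR each payload byte with
# a rotating 4-byte key. This is the per-byte work the pure-Perl
# tunnel pays on every chunk of the data plane.
def mask_payload(payload: bytes, key: bytes) -> bytes:
    assert len(key) == 4
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))

# Masking is an involution: applying the same key twice restores the data.
data = b"migration stream chunk"
key = b"\x12\x34\x56\x78"
assert mask_payload(mask_payload(data, key), key) == data
```

Optimized implementations vectorize this XOR, but a byte-at-a-time loop in an interpreted language caps throughput on a single core.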
In our deployment between two DCs connected by WireGuard over a
10 Gbps link, we observe sustained ~1 MB/s for remote-migrate while
intra-cluster `qm migrate --migration_type insecure` between the same
hosts sustains 300+ MB/s. The bottleneck is clearly the WS tunnel
data path, pinned to a single Perl-bound core, not the network.
For VMs with 32+ GB of RAM, this is the difference between a
migration finishing in a couple of minutes and failing to converge
at all, because the dirty rate exceeds the available throughput.
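The convergence argument can be made concrete with back-of-envelope numbers. The RAM size and throughputs are the figures from above; the 50 MB/s dirty rate is an illustrative assumption, not a measurement:

```python
# Pre-copy live migration only converges if copy throughput exceeds
# the rate at which the guest dirties memory.
ram_mb = 32 * 1024   # 32 GiB of guest RAM, in MiB
dirty_rate = 50      # MB/s the guest dirties memory (assumed figure)

def converges(throughput_mb_s: float) -> bool:
    return throughput_mb_s > dirty_rate

def first_pass_seconds(throughput_mb_s: float) -> float:
    # Time to stream the full RAM once, ignoring dirty re-sends.
    return ram_mb / throughput_mb_s

assert not converges(1)    # WS-tunneled path: never catches up
assert converges(300)      # direct TCP: converges
assert first_pass_seconds(300) < 120   # first full pass under 2 minutes
```

At 1 MB/s even the first pass over 32 GiB would take over nine hours, during which far more memory is dirtied than transferred.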
== Proposal ==
Mirror the local-cluster migration model: keep secure (WS-tunneled) as
the default, allow opt-in 'insecure' for trusted networks where the
operator has out-of-band guarantees (private cross-connect, VPN,
overlay encryption at L2/L3).
qm remote-migrate <vmid> <target-vmid> 'apitoken=...,host=...,fp=...' \
--target-storage ... --target-bridge ... --online \
--migration_type insecure \
--migration_network 10.50.0.0/24
Semantics:
- control plane (config, NBD allocation requests, tunnel commands,
spice ticket, etc.) still goes through the WS tunnel as today
- data plane (QEMU memory stream + NBD storage drive-mirror) goes
direct TCP between source and target on the standard
60000-60050 range, with the target's listener IP resolved from
--migration_network (same logic as local-cluster insecure)
- root-only on the source side, consistent with migrate_vm
- target advertises an 'insecure-remote' capability in the mtunnel
version response so source can fail closed on older targets
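The listener-IP resolution in the second bullet can be sketched as follows; the function name is hypothetical and only mirrors the intent of the existing local-cluster logic (pick the target's local address that falls inside the given CIDR), not qemu-server's actual code:

```python
# Hypothetical sketch: resolve the target's data-plane listener address
# from --migration_network by selecting the local IP inside the CIDR.
import ipaddress

def pick_listener_ip(local_ips, migration_network):
    net = ipaddress.ip_network(migration_network)
    for ip in local_ips:
        if ipaddress.ip_address(ip) in net:
            return ip
    raise RuntimeError(f"no local address inside {migration_network}")

# Target has a management address and a migration-network address;
# only the latter matches 10.50.0.0/24.
assert pick_listener_ip(["192.168.1.5", "10.50.0.7"], "10.50.0.0/24") == "10.50.0.7"
```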
== Backward compatibility approach ==
Rather than bumping WS_TUNNEL_VERSION (which would break
new-source -> old-target combinations because of the existing
"$WS_TUNNEL_VERSION > $tunnel->{version}" check), I'd add a
forward-compatible 'caps' field to the version response. Old sources
ignore unknown JSON keys; new sources require 'insecure-remote' to be
present in caps before allowing migration_type=insecure, and otherwise
fall through to the existing WS-tunneled path with no behavioral
change.
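The proposed source-side check can be sketched like this; the function and field names are illustrative, not the actual qemu-server API:

```python
# Sketch of the forward-compatible 'caps' check on the source side.
# Old targets simply omit the 'caps' key; unknown keys are ignored by
# old sources, so no version bump is needed.
import json

def check_remote_caps(version_response: str, migration_type: str) -> None:
    info = json.loads(version_response)
    caps = info.get("caps", [])          # absent on unpatched targets
    if migration_type == "insecure" and "insecure-remote" not in caps:
        raise RuntimeError("upgrade target or omit migration_type=insecure")

# Patched target advertising the capability: insecure allowed.
check_remote_caps('{"version": 2, "caps": ["insecure-remote"]}', "insecure")
# Unpatched target without 'caps': the secure path is unaffected.
check_remote_caps('{"version": 2}', "secure")
```

Requesting insecure against a target without the capability fails closed on the source, before any state is touched on the target.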
This means all four combinations of the compatibility matrix are clean:
- patched <-> patched, secure: identical to today
- unpatched src -> patched tgt: caps ignored, WS path as today
- patched src -> unpatched tgt, secure: caps absent, not checked,
WS path as today
- patched src -> unpatched tgt, insecure: source dies early with a
clear "upgrade target or omit migration_type=insecure" error,
no side effects on target
== Security considerations ==
- root-only at the API/CLI layer, same as the local-cluster knob
- documented as requiring trusted/private network between clusters
- no change to control plane or auth (still API token + TLS fp)
- data plane confidentiality drops to network-layer controls only,
which is identical to the trade-off operators already make for
intra-cluster insecure migration
- no new ports beyond the existing 60000-60050 range that
insecure migration already uses
- source-side caps check ensures no silent downgrade when target
doesn't support it
== Open questions ==
1. Is this direction acceptable in principle, or would you prefer
a different approach entirely?
2. Should the 'caps' mechanism be added in a standalone preliminary
patch (useful as future-proofing for any opt-in mtunnel feature),
or rolled into the same series?
3. Should NBD direct-TCP be gated by a separate flag, or is it fine
to have migration_type=insecure imply both RAM and NBD direct?
The intra-cluster knob ties them together today.
4. Any preference on the parameter name? I matched migrate_vm
('migration_type', 'migration_network') for consistency, but
'data-direct-tcp' or similar would also work and arguably be
less misleading since the control plane is still encrypted.
Thanks,
Bogdan