From: Bogdan Ionescu <bogdan@ionescu.at>
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>
Subject: [pve-devel] [RFC] qemu-server: add migration_type=insecure to remote-migrate
Date: Sat, 25 Apr 2026 01:10:36 +0000 [thread overview]
Message-ID: <seafvOtCUaxpKaU9e1NllFBL1k4UGGejSJqGIsJuUBfiiBC_cGhLAlDi5t-Yb6HarnuUCACmvRvObiM3cKACfZUKpQfgyYQ93jobOI8HTL0=@ionescu.at> (raw)
Hi all,
I'd like to gauge interest in adding a migration_type=insecure option to
the qm remote-migrate / remote_migrate_vm endpoint, before investing
time in a review-ready patch series.
== Motivation ==
The current remote-migrate implementation tunnels both control plane
and data plane through the websocket connection to the target's API
endpoint on 8006/tcp. This is the right default for trust reasons
(API token + TLS fingerprint, no SSH trust between clusters needed),
but the data plane throughput is severely bottlenecked by:
- userspace bouncing through PVE::Tunnel + pveproxy + qmtunnel
(3 Perl processes in the data path, each context-switching per
chunk)
- per-byte WebSocket masking in pure Perl (RFC 6455 §5.3)
- TLS framing on top
- lack of zero-copy / TSO offload for the streamed bytes
- multiple TCP segments end-to-end with independent flow control
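To make the masking cost concrete: RFC 6455 §5.3 masking XORs every payload byte with a rotating 4-byte key, so an interpreted per-byte loop (as in a pure-Perl implementation) pays interpreter overhead for each byte of the stream. A quick illustrative sketch (Python here purely for demonstration, not PVE code) comparing the naive loop with the same transform done on wide integers:

```python
import os

def mask_per_byte(payload: bytes, key: bytes) -> bytes:
    # Naive per-byte loop, analogous to what a pure-Perl masking loop does.
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))

def mask_vectorized(payload: bytes, key: bytes) -> bytes:
    # Same transform as one big XOR on wide integers: far fewer
    # interpreter operations per byte of payload.
    rep = (key * (len(payload) // 4 + 1))[:len(payload)]
    return (int.from_bytes(payload, 'little')
            ^ int.from_bytes(rep, 'little')).to_bytes(len(payload), 'little')

key = os.urandom(4)
data = os.urandom(1 << 16)
assert mask_per_byte(data, key) == mask_vectorized(data, key)
# Masking is an involution: applying it twice restores the payload.
assert mask_per_byte(mask_per_byte(data, key), key) == data
```

The transform itself is trivial; the point is that it must touch every byte of the data plane, which is exactly where a single interpreter-bound core becomes the ceiling.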
In our deployment between two DCs connected by WireGuard over a
10 Gbps link, we observe sustained ~1 MB/s for remote-migrate while
intra-cluster `qm migrate --migration_type insecure` between the same
hosts saturates the link at ~300+ MB/s. The bottleneck is clearly
the WS tunnel data path on a single Perl-bound core, not the network.
For VMs with 32+ GB of RAM, this gap is the difference between a
migration finishing in about 2 minutes and failing to converge at all,
because the guest's dirty rate exceeds the available throughput.
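Back-of-the-envelope arithmetic with the numbers above (the 20 MiB/s dirty rate is an assumed example figure; real guest dirty rates vary widely):

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

ram = 32 * GIB
dirty_rate = 20 * MIB  # ASSUMED guest dirty rate for illustration

def converges(throughput, dirty_rate):
    # Live migration only converges if memory is copied faster than
    # the guest dirties it.
    return throughput > dirty_rate

def first_pass_seconds(ram, throughput):
    # Time to stream all of RAM once, ignoring dirty-page re-sends.
    return ram / throughput

assert converges(300 * MIB, dirty_rate)    # direct TCP: converges
assert not converges(1 * MIB, dirty_rate)  # WS tunnel: dirtying outruns copy
print(round(first_pass_seconds(ram, 300 * MIB)))  # -> 109 (seconds)
```

At ~300 MiB/s the first pass over 32 GiB takes under 2 minutes; at ~1 MiB/s it would take over 9 hours even before re-sending dirtied pages, so convergence is hopeless for any non-idle guest.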
== Proposal ==
Mirror the local-cluster migration model: keep secure (WS-tunneled) as
the default, allow opt-in 'insecure' for trusted networks where the
operator has out-of-band guarantees (private cross-connect, VPN,
overlay encryption at L2/L3).
qm remote-migrate <vmid> <target-vmid> 'apitoken=...,host=...,fp=...' \
--target-storage ... --target-bridge ... --online \
--migration_type insecure \
--migration_network 10.50.0.0/24
Semantics:
- control plane (config, NBD allocation requests, tunnel commands,
spice ticket, etc.) still goes through the WS tunnel as today
- data plane (QEMU memory stream + NBD storage drive-mirror) goes
direct TCP between source and target on the standard
60000-60050 range, with the target's listener IP resolved from
--migration_network (same logic as local-cluster insecure)
- root-only on the source side, consistent with migrate_vm
- target advertises an 'insecure-remote' capability in the mtunnel
version response so source can fail closed on older targets
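The listener-IP selection in the second bullet can be sketched as follows. This is a hypothetical illustration, not the actual PVE code (the real logic lives in the migration modules and differs in detail); it just shows the intent of reusing the local-cluster rule "pick the target address inside the migration network":

```python
import ipaddress

def pick_listener_ip(target_addrs, migration_network):
    # Choose the target address that falls inside the given CIDR,
    # mirroring the local-cluster insecure-migration selection.
    net = ipaddress.ip_network(migration_network)
    matches = [a for a in target_addrs if ipaddress.ip_address(a) in net]
    if not matches:
        raise RuntimeError(
            f"no target address within migration network {migration_network}")
    return matches[0]

# Target has a management IP plus a dedicated migration-network IP:
addrs = ["192.0.2.10", "10.50.0.7"]
print(pick_listener_ip(addrs, "10.50.0.0/24"))  # -> 10.50.0.7
```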
== Backward compatibility approach ==
Rather than bumping WS_TUNNEL_VERSION (which would break
new-source -> old-target combinations because of the existing
"$WS_TUNNEL_VERSION > $tunnel->{version}" check), I'd add a
forward-compatible 'caps' field to the version response. Old sources
ignore unknown JSON keys; new sources require 'insecure-remote' to be
present in caps before allowing migration_type=insecure, and otherwise
fall through to the existing WS-tunneled path with no behavioral
change.
This means all four combinations in the compatibility matrix are clean:
- patched <-> patched, secure: identical to today
- unpatched src -> patched tgt: caps ignored, WS path as today
- patched src -> unpatched tgt, secure: caps absent, not checked,
WS path as today
- patched src -> unpatched tgt, insecure: source dies early with a
clear "upgrade target or omit migration_type=insecure" error,
no side effects on target
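The caps handshake described above could be sketched like this (field and capability names are illustrative, matching the proposal rather than any shipped code):

```python
import json

def check_insecure_allowed(resp, migration_type):
    # Source-side gate: fail closed when insecure is requested but the
    # target's version response does not advertise the capability.
    if migration_type != "insecure":
        return  # secure path: behaves exactly as today
    caps = resp.get("caps", [])  # old targets simply omit the field
    if "insecure-remote" not in caps:
        raise RuntimeError(
            "target does not support migration_type=insecure; "
            "upgrade target or omit migration_type=insecure")

# What a patched target might answer:
patched = json.loads('{"tunnel": {"version": 2}, "caps": ["insecure-remote"]}')
old = json.loads('{"tunnel": {"version": 2}}')

check_insecure_allowed(patched, "insecure")  # ok
check_insecure_allowed(old, "secure")        # ok: caps never consulted
```

Because old sources ignore unknown JSON keys, the 'caps' field is invisible to them, and the gate above only ever tightens behavior on the new, opt-in path.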
== Security considerations ==
- root-only at the API/CLI layer, same as the local-cluster knob
- documented as requiring trusted/private network between clusters
- no change to control plane or auth (still API token + TLS fp)
- data plane confidentiality drops to network-layer controls only,
which is identical to the trade-off operators already make for
intra-cluster insecure migration
- no new ports beyond the existing 60000-60050 range that
insecure migration already uses
- source-side caps check ensures no silent downgrade when target
doesn't support it
== Open questions ==
1. Is this direction acceptable in principle, or would you prefer
a different direction?
2. Should the 'caps' mechanism be added in a standalone preliminary
patch (useful as future-proofing for any opt-in mtunnel feature),
or rolled into the same series?
3. Should NBD direct-TCP be gated by a separate flag, or is it fine
to have migration_type=insecure imply both RAM and NBD direct?
The intra-cluster knob ties them together today.
4. Any preference on the parameter name? I matched migrate_vm
('migration_type', 'migration_network') for consistency, but
'data-direct-tcp' or similar would also work and arguably be
less misleading since the control plane is still encrypted.
Thanks,
Bogdan