From: Bogdan Ionescu
Date: Sat, 25 Apr 2026 01:10:36 +0000
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [RFC] qemu-server: add migration_type=insecure to remote-migrate

Hi all,

I'd like to gauge interest in adding a migration_type=insecure option to
the qm remote-migrate / remote_migrate_vm endpoint, before investing
time in a review-ready patch series.

== Motivation ==

The current remote-migrate implementation tunnels both control plane and
data plane through the websocket connection to the target's API endpoint
on 8006/tcp. This is the right default for trust reasons (API token +
TLS fingerprint, no SSH trust between clusters needed), but the data
plane throughput is severely bottlenecked by:

- userspace bouncing through PVE::Tunnel + pveproxy + qmtunnel (3 Perl
  processes in the data path, each context-switching per chunk)
- per-byte WebSocket masking in pure Perl (RFC 6455 §5.3)
- TLS framing on top
- lack of zero-copy / TSO offload for the streamed bytes
- multiple TCP segments end-to-end with independent flow control

In our deployment between two DCs connected by WireGuard over a 10 Gbps
link, we observe sustained ~1 MB/s for remote-migrate while
intra-cluster `qm migrate --migration_type insecure` between the same
hosts saturates the link at ~300+ MB/s. The bottleneck is clearly the WS
tunnel data path on a single Perl-bound core, not the network.

For VMs with 32+ GB of RAM, this difference is the difference between a
migration finishing in 2 minutes vs. failing to converge entirely
because the dirty rate exceeds the throughput.
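To put rough numbers on the convergence argument, here is a
back-of-the-envelope sketch in Python. It assumes a deliberately
simplified pre-copy model (memory only shrinks if pages are sent faster
than the guest dirties them) and treats MB/GB as MiB/GiB. The 50 MiB/s
dirty rate and both function names are illustrative assumptions, not
measurements or existing code; only the 32 GB size and the ~1 and ~300
MB/s rates come from the figures above.

```python
# Back-of-the-envelope model; the dirty rate is a hypothetical figure,
# the RAM size and throughputs are the ones quoted in this mail.

def single_pass_seconds(ram_mib: float, throughput_mib_s: float) -> float:
    """Time to stream the full RAM image once at a given throughput."""
    return ram_mib / throughput_mib_s


def can_converge(dirty_mib_s: float, throughput_mib_s: float) -> bool:
    """Pre-copy only shrinks the remaining set if pages are sent
    faster than the guest dirties them."""
    return dirty_mib_s < throughput_mib_s


if __name__ == "__main__":
    ram_mib = 32 * 1024      # 32 GiB guest
    dirty_mib_s = 50.0       # hypothetical guest dirty rate
    for label, rate in (("WS tunnel", 1.0), ("direct TCP", 300.0)):
        secs = single_pass_seconds(ram_mib, rate)
        print(f"{label}: first pass {secs / 60:.1f} min, "
              f"converges={can_converge(dirty_mib_s, rate)}")
```

Under these assumptions a single pass over 32 GiB takes a bit under two
minutes at 300 MiB/s and roughly nine hours at 1 MiB/s, and any guest
dirtying more than 1 MiB/s cannot converge over the tunnel at all.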
== Proposal ==

Mirror the local-cluster migration model: keep secure (WS-tunneled) as
the default, allow opt-in 'insecure' for trusted networks where the
operator has out-of-band guarantees (private cross-connect, VPN, overlay
encryption at L2/L3).

    qm remote-migrate 'apitoken=...,host=...,fp=...' \
        --target-storage ... --target-bridge ... --online \
        --migration_type insecure \
        --migration_network 10.50.0.0/24

Semantics:

- control plane (config, NBD allocation requests, tunnel commands, spice
  ticket, etc.) still goes through the WS tunnel as today
- data plane (QEMU memory stream + NBD storage drive-mirror) goes direct
  TCP between source and target on the standard 60000-60050 range, with
  the target's listener IP resolved from --migration_network (same logic
  as local-cluster insecure)
- root-only on the source side, consistent with migrate_vm
- target advertises an 'insecure-remote' capability in the mtunnel
  version response so source can fail closed on older targets

== Backward compatibility approach ==

Rather than bumping WS_TUNNEL_VERSION (which would break new-source ->
old-target combinations because of the existing
"$WS_TUNNEL_VERSION > $tunnel->{version}" check), I'd add a
forward-compatible 'caps' field to the version response. Old sources
ignore unknown JSON keys; new sources require 'insecure-remote' to be
present in caps before allowing migration_type=insecure, and otherwise
fall through to the existing WS-tunneled path with no behavioral change.
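As a rough illustration of the source-side check, here is a minimal
Python sketch (the actual change would be Perl in qemu-server). The
function names, the dict shape of the version response, the "api" key
and the error wording are all assumptions made for illustration; only
the 'caps' field and the 'insecure-remote' value come from the proposal
above.

```python
import json

INSECURE_REMOTE_CAP = "insecure-remote"  # capability name from the proposal


def select_data_path(version_response: dict, migration_type: str = "secure") -> str:
    """Pick the data path on the source, failing closed for 'insecure'.

    Hypothetical helper: returns 'ws-tunnel' or 'direct-tcp', or raises
    before anything has been requested from the target.
    """
    if migration_type != "insecure":
        # Secure path: caps are not consulted at all, same as today.
        return "ws-tunnel"
    caps = version_response.get("caps") or []
    if INSECURE_REMOTE_CAP not in caps:
        raise RuntimeError(
            "target does not advertise 'insecure-remote': "
            "upgrade target or omit migration_type=insecure"
        )
    return "direct-tcp"


def old_source_version(raw: str) -> int:
    """An unpatched source reads only the key it knows and ignores 'caps'.

    The 'api' key name is an assumption about the response layout.
    """
    return json.loads(raw)["api"]


if __name__ == "__main__":
    patched = {"api": 2, "caps": [INSECURE_REMOTE_CAP]}
    unpatched = {"api": 2}
    print(select_data_path(patched, "insecure"))
    print(select_data_path(unpatched, "secure"))
    print(old_source_version(json.dumps(patched)))
```

The point of the sketch is that the secure branch never looks at caps,
so existing behavior cannot change, while the insecure branch raises
before any direct-TCP setup is attempted.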
This means all four combinations in the mix matrix are clean:

- patched <-> patched, secure: identical to today
- unpatched src -> patched tgt: caps ignored, WS path as today
- patched src -> unpatched tgt, secure: caps absent, not checked, WS
  path as today
- patched src -> unpatched tgt, insecure: source dies early with a clear
  "upgrade target or omit migration_type=insecure" error, no side
  effects on target

== Security considerations ==

- root-only at the API/CLI layer, same as the local-cluster knob
- documented as requiring trusted/private network between clusters
- no change to control plane or auth (still API token + TLS fp)
- data plane confidentiality drops to network-layer controls only, which
  is identical to the trade-off operators already make for intra-cluster
  insecure migration
- no new ports beyond the existing 60000-60050 range that insecure
  migration already uses
- source-side caps check ensures no silent downgrade when target doesn't
  support it

== Open questions ==

1. Is this direction acceptable in principle, or would you prefer a
   different direction?
2. Should the 'caps' mechanism be added in a standalone preliminary
   patch (useful as future-proofing for any opt-in mtunnel feature), or
   rolled into the same series?
3. Should NBD direct-TCP be gated by a separate flag, or is it fine to
   have migration_type=insecure imply both RAM and NBD direct? The
   intra-cluster knob ties them together today.
4. Any preference on the parameter name? I matched migrate_vm
   ('migration_type', 'migration_network') for consistency, but
   'data-direct-tcp' or similar would also work and arguably be less
   misleading since the control plane is still encrypted.

Thanks,
Bogdan