From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 0C2561FF143
	for <inbox@lore.proxmox.com>; Sat, 25 Apr 2026 03:19:51 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 7F64A21252;
	Sat, 25 Apr 2026 03:19:49 +0200 (CEST)
Date: Sat, 25 Apr 2026 01:10:36 +0000
To: "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>
From: Bogdan Ionescu <bogdan@ionescu.at>
Subject: [pve-devel] [RFC] qemu-server: add migration_type=insecure to
 remote-migrate
Message-ID: 
 <seafvOtCUaxpKaU9e1NllFBL1k4UGGejSJqGIsJuUBfiiBC_cGhLAlDi5t-Yb6HarnuUCACmvRvObiM3cKACfZUKpQfgyYQ93jobOI8HTL0=@ionescu.at>
Feedback-ID: 36014335:user:proton
X-Pm-Message-ID: 7de58de9d49bfba91721be329775f073e55eca6a
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Owner: <mailto:pve-devel-owner@lists.proxmox.com>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Subscribe: <mailto:pve-devel-join@lists.proxmox.com>
List-Unsubscribe: <mailto:pve-devel-leave@lists.proxmox.com>

Hi all,

I'd like to gauge interest in adding a migration_type=insecure option to
the qm remote-migrate / remote_migrate_vm endpoint before investing
time in a review-ready patch series.

== Motivation ==

The current remote-migrate implementation tunnels both the control
plane and the data plane through the WebSocket connection to the
target's API endpoint on 8006/tcp. This is the right default for trust
reasons (API token + TLS fingerprint, no SSH trust needed between
clusters), but data plane throughput is severely bottlenecked by:

  - userspace copies through PVE::Tunnel + pveproxy + qmtunnel
    (3 Perl processes in the data path, each context-switching per
    chunk)
  - per-byte WebSocket masking in pure Perl (RFC 6455 §5.3)
  - TLS framing on top
  - no zero-copy / TSO offload for the streamed bytes
  - several chained TCP connections end-to-end, each with independent
    flow control
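
To make the masking cost concrete: RFC 6455 §5.3 has the client XOR
payload octet i with octet (i mod 4) of a 4-byte masking key. A rough
Python sketch (not PVE code; the function names are made up for
illustration) of a per-byte loop, which does one interpreter-level
operation per byte, next to the same transform done as one wide XOR:

```python
import os


def mask_per_byte(payload: bytes, key: bytes) -> bytes:
    """RFC 6455 §5.3 masking: octet i is XORed with key[i % 4].

    One interpreter-level operation per byte, i.e. the cost profile of
    a naive per-byte loop in any scripting language.
    """
    if len(key) != 4:
        raise ValueError("masking key must be 4 bytes")
    out = bytearray(len(payload))
    for i, b in enumerate(payload):
        out[i] = b ^ key[i % 4]
    return bytes(out)


def mask_wide(payload: bytes, key: bytes) -> bytes:
    """Same transform as one wide XOR against the key repeated to length."""
    if len(key) != 4:
        raise ValueError("masking key must be 4 bytes")
    n = len(payload)
    stream = (key * (n // 4 + 1))[:n]  # byte i of stream is key[i % 4]
    x = int.from_bytes(payload, "big") ^ int.from_bytes(stream, "big")
    return x.to_bytes(n, "big")


if __name__ == "__main__":
    key = os.urandom(4)
    chunk = os.urandom(64 * 1024)
    assert mask_per_byte(chunk, key) == mask_wide(chunk, key)
    # Masking is its own inverse, so the receiver unmasks the same way.
    assert mask_per_byte(mask_per_byte(chunk, key), key) == chunk
```

The wide variant is only there to show that the work is a plain XOR
stream that vectorizes well; it is not a proposal for how the Perl side
should change.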

In our deployment, between two DCs connected by WireGuard over a
10 Gbps link, we observe a sustained ~1 MB/s for remote-migrate, while
intra-cluster `qm migrate --migration_type insecure` between the same
hosts reaches ~300+ MB/s. The bottleneck is clearly the WS tunnel
data path, bound to a single core by Perl, not the network.

For VMs with 32+ GB of RAM, this gap decides whether a migration
finishes in about 2 minutes or fails to converge entirely because the
guest's dirty rate exceeds the throughput.
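
A back-of-envelope check with a deliberately simplified pre-copy model
(each round resends what was dirtied while the previous round was on
the wire; no auto-converge, compression or XBZRLE). The 50 MB/s dirty
rate and the 0.3 s downtime budget are assumed values for
illustration, not measurements from our setup, and the function is a
hypothetical sketch, not QEMU's algorithm:

```python
def precopy(ram_bytes, throughput_bps, dirty_bps,
            downtime_budget_s=0.3, max_rounds=30):
    """Simplified iterative pre-copy model (hypothetical).

    Round 0 sends all of RAM; every later round resends the pages
    dirtied during the previous round. Migration "converges" once the
    remainder fits inside the stop-and-copy budget.
    Returns (converged, rounds, elapsed_seconds).
    """
    remaining = float(ram_bytes)
    elapsed = 0.0
    for rnd in range(max_rounds):
        t = remaining / throughput_bps
        if t <= downtime_budget_s:
            # Final stop-and-copy of the small remainder.
            return True, rnd, elapsed + t
        elapsed += t
        # Pages dirtied during this round; cannot exceed total RAM.
        remaining = min(float(ram_bytes), dirty_bps * t)
    return False, max_rounds, elapsed


if __name__ == "__main__":
    GB, MB = 1e9, 1e6
    for tput in (300 * MB, 1 * MB):
        ok, rounds, secs = precopy(32 * GB, tput, dirty_bps=50 * MB)
        print(f"{tput / MB:>5.0f} MB/s: converged={ok} "
              f"rounds={rounds} time={secs:.0f}s")
```

At 300 MB/s the first pass over 32 GB takes about 107 s (32e9 / 300e6)
and the shrinking remainder converges a few rounds later. At 1 MB/s
the first pass alone takes about 32000 s, during which the assumed
dirty rate re-dirties all of RAM, so the model never converges.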

== Proposal ==

Mirror the local-cluster migration model: keep secure (WS-tunneled) as
the default, and allow opt-in 'insecure' for trusted networks where the
operator has out-of-band guarantees (private cross-connect, VPN,
overlay encryption at L2/L3).

  qm remote-migrate <vmid> <target-vmid> 'apitoken=...,host=...,fingerprint=...' \
      --target-storage ... --target-bridge ... --online \
      --migration_type insecure \
      --migration_network 10.50.0.0/24

Semantics:
  - control plane (config, NBD allocation requests, tunnel commands,
    spice ticket, etc.) still goes through the WS tunnel as today
  - data plane (QEMU memory stream + NBD storage drive-mirror) goes
    over direct TCP between source and target on the standard
    60000-60050 range, with the target's listener IP resolved from
    --migration_network (same logic as local-cluster insecure)
  - root-only on the source side, consistent with migrate_vm
  - target advertises an 'insecure-remote' capability in the mtunnel
    version response so the source can fail closed on older targets
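
Target-side listener setup for the direct data plane, sketched in
Python under stated assumptions: the host's addresses are passed in as
a plain list (collecting them from the interfaces is out of scope), the
helper names are hypothetical, and this only mirrors the idea of
picking the local IP inside --migration_network and a free port in
60000-60050; it does not reproduce the actual PVE helpers:

```python
import ipaddress
import socket

# Port range insecure migration already uses (inclusive upper bound).
MIGRATION_PORTS = range(60000, 60051)


def local_ip_in_cidr(cidr, local_addrs):
    """Return the first local address inside `cidr`, or None."""
    net = ipaddress.ip_network(cidr, strict=False)
    for addr in local_addrs:
        ip = ipaddress.ip_address(addr)
        if ip in net:  # always False on an IPv4/IPv6 mismatch
            return str(ip)
    return None


def listen_on_first_free_port(host, ports=MIGRATION_PORTS):
    """Bind a TCP listener on `host` using the first free port in `ports`."""
    if ipaddress.ip_address(host).version == 6:
        family = socket.AF_INET6
    else:
        family = socket.AF_INET
    for port in ports:
        sock = socket.socket(family, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(1)
            return sock, port
        except OSError:
            sock.close()  # port taken, try the next one
    raise RuntimeError(f"no free port in {ports.start}-{ports.stop - 1}")


if __name__ == "__main__":
    print(local_ip_in_cidr("10.50.0.0/24", ["192.0.2.10", "10.50.0.7"]))  # 10.50.0.7
```

The target would then send the resulting IP and port back to the
source over the (still WS-tunneled) control plane, which is where the
source learns where to point the QEMU migration and NBD streams.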

== Backward compatibility approach ==

Rather than bumping WS_TUNNEL_VERSION (which would break
new-source -> old-target combinations because of the existing
"$WS_TUNNEL_VERSION > $tunnel->{version}" check), I'd add a
forward-compatible 'caps' field to the version response. Old sources
ignore unknown JSON keys. New sources require 'insecure-remote' to be
present in caps before allowing migration_type=insecure and abort
otherwise; without migration_type=insecure they take the existing
WS-tunneled path with no behavioral change.

This keeps all four combinations in the version mix matrix clean:
  - patched <-> patched, secure: identical to today
  - unpatched src -> patched tgt: caps ignored, WS path as today
  - patched src -> unpatched tgt, secure: caps absent but not checked,
    WS path as today
  - patched src -> unpatched tgt, insecure: source dies early with a
    clear "upgrade target or omit migration_type=insecure" error,
    no side effects on target
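
The source-side decision for a patched source, sketched in Python. The
reply shape {"version": N, "caps": [...]}, the WS_TUNNEL_VERSION value
and all names here are assumptions for illustration; the real mtunnel
reply fields may differ. The unpatched-source row needs no code, since
an old source never reads caps:

```python
import json

# Hypothetical value; the real constant lives in qemu-server.
WS_TUNNEL_VERSION = 2


class MigrationError(Exception):
    """Raised when the source must abort before touching the target."""


def choose_data_path(version_reply: str, migration_type: str = "secure") -> str:
    """Pick the data path from the target's mtunnel 'version' reply.

    Returns "ws-tunnel" or "direct-tcp". The reply is assumed to look
    like {"version": N, "caps": [...]}, with "caps" absent on old targets.
    """
    reply = json.loads(version_reply)
    # Existing-style gate: refuse targets older than what we speak.
    if WS_TUNNEL_VERSION > reply.get("version", 0):
        raise MigrationError("target mtunnel version too old")
    if migration_type != "insecure":
        # Secure path never looks at caps, so old targets behave as today.
        return "ws-tunnel"
    if "insecure-remote" not in reply.get("caps", []):
        # Fail closed: no silent downgrade to the WS path.
        raise MigrationError(
            "target lacks 'insecure-remote': upgrade target or omit "
            "migration_type=insecure")
    return "direct-tcp"


if __name__ == "__main__":
    new_target = '{"version": 2, "caps": ["insecure-remote"]}'
    old_target = '{"version": 2}'
    print(choose_data_path(new_target, "insecure"))  # direct-tcp
    print(choose_data_path(old_target))              # ws-tunnel
```

Because the check runs on the version reply, before any disk or config
is allocated on the target, the insecure-on-old-target case aborts with
nothing to clean up.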

== Security considerations ==

  - root-only at the API/CLI layer, same as the local-cluster knob
  - documented as requiring trusted/private network between clusters
  - no change to control plane or auth (still API token + TLS fp)
  - data plane confidentiality drops to network-layer controls only,
    which is identical to the trade-off operators already make for
    intra-cluster insecure migration
  - no new ports beyond the existing 60000-60050 range that
    insecure migration already uses
  - source-side caps check ensures no silent downgrade when target
    doesn't support it

== Open questions ==

  1. Is this direction acceptable in principle, or would you prefer
     a different approach?

  2. Should the 'caps' mechanism be added in a standalone preliminary
     patch (useful as future-proofing for any opt-in mtunnel feature),
     or rolled into the same series?

  3. Should NBD direct-TCP be gated by a separate flag, or is it fine
     to have migration_type=insecure imply both RAM and NBD direct?
     The intra-cluster knob ties them together today.

  4. Any preference on the parameter name? I matched migrate_vm
     ('migration_type', 'migration_network') for consistency, but
     'data-direct-tcp' or similar would also work and arguably be
     less misleading since the control plane is still encrypted.


Thanks,
Bogdan