From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Date: Tue, 16 Jun 2026 15:03:36 +0200
Subject: Re: [PATCH qemu-server] fix #7627: net: virtio: disable host_tunnel feature again with 11.0+pve1
To: Fiona Ebner, pve-devel@lists.proxmox.com
References: <20260603152127.901085-1-f.ebner@proxmox.com> <2d28bb02-bf70-4f9a-a91b-b5c8162527d6@proxmox.com>
In-Reply-To: <2d28bb02-bf70-4f9a-a91b-b5c8162527d6@proxmox.com>
Message-Id: <1781614973.8wdi1mzrwu.astroid@yuna.none>
List-Id: Proxmox VE development discussion

On June 9, 2026 11:32 am, Fiona Ebner wrote:
> On 03.06.26 at 5:21 PM, Fiona Ebner wrote:
>> QEMU machine version 10.2 started exposing the new features
>> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM
>> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO
>> VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM
>> VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO
>>
>> but the host tunnel one causes issues with certain guest network
>> configurations, in particular when using VXLAN [0][1][2][3] when the
>> traffic goes over a physical NIC, at least when the NIC does not have
>> support for these features itself.
>>
>> The negotiation in QEMU does not consider the physical NIC, it just
>> looks whether the vhost-net device and the guest both support it and
>> then turns on the feature for the tap device. However, it seems like
>> the tap device does not itself add the inner TCP checksums for the
>> encapsulated traffic. It's not entirely clear yet if this is a kernel
>> issue or if the common configuration with bridged tap interface
>> going to physical NIC is not supported in this configuration without
>> some additional tweaks. When the traffic does not go via a physical
>> NIC, it seems to work (i.e. both source and target VM on the same
>> host).
>>
>> For now, disable this advanced host tunnel feature again, until the
>> issue can be properly diagnosed and fixed (if there is a fix to be
>> made). If users do require the feature again, it can be exposed via
>> the schema as CLI-only and maybe in the UI as an advanced
>> configuration option.
>>
>> [0]: https://bugzilla.proxmox.com/show_bug.cgi?id=7627
>> [1]: https://forum.proxmox.com/threads/183494/post-855144
>> [2]: https://forum.proxmox.com/threads/182328/post-854627
>> [3]: https://forum.proxmox.com/threads/183963/#post-855737
>>
>> Signed-off-by: Fiona Ebner
>> ---
>>
>> Many thanks to Stefan and Gabriel for discussions and continuing to
>> analyze the issue!
>> For now, let's make a stop-gap fix and turn the
>> problematic host tunnel feature back off. I will also send a mail
>> upstream asking about the issue, but not today, as I have to leave.
>
> There is a patch now [0], but since the issue was in the virtio-net
> driver, the fix will need to be rolled out to guest kernels, which we
> don't have control over. While there is an easy workaround with pinning
> the machine version to 10.1 for affected guests, I still wonder if we
> should go for disabling the feature by default with 11.0+pve1 for now,
> to avoid more people running into the regression? Maybe re-enabling it
> with the next major PVE release next summer?

Sounds sensible. Do we want to offer an escape hatch for setups with fixed
guest kernels that want to benefit performance-wise?

>
> [0]:
> https://lore.kernel.org/qemu-devel/566e0cc5-9a50-43b8-9866-f599a4657004@proxmox.com/
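
To make the escape-hatch question concrete, the default-off-with-opt-in
policy could be sketched roughly as follows. This is only an illustration
in Python, not qemu-server code: the function names, the `opt_in` flag and
the version parsing are hypothetical, and it assumes that machine versions
from 10.2 up to (but excluding) 11.0+pve1 keep the feature enabled so that
existing guests see no change.

```python
import re


def parse_machine_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR' or 'MAJOR.MINOR+pveN' into a comparable tuple.

    Hypothetical helper; a missing '+pveN' suffix counts as revision 0.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\+pve(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported machine version: {version!r}")
    major, minor, pve = match.groups()
    return int(major), int(minor), int(pve or 0)


def host_tunnel_enabled(machine_version: str, opt_in: bool = False) -> bool:
    """Decide whether the host tunnel features are offered to the guest."""
    parsed = parse_machine_version(machine_version)
    if parsed < (10, 2, 0):
        # Not exposed before machine version 10.2, which is why pinning
        # the machine version to 10.1 works as a workaround.
        return False
    if parsed < (11, 0, 1):
        # Assumption: versions before 11.0+pve1 keep the current behavior.
        return True
    # 11.0+pve1 and later: off by default, on only with the hypothetical
    # opt-in for setups whose guest kernels already carry the fix.
    return opt_in
```

With this sketch, `host_tunnel_enabled("11.0+pve1")` is false, while
`host_tunnel_enabled("11.0+pve1", opt_in=True)` turns the feature back on
for guests known to be fixed.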