Message-ID: <2d28bb02-bf70-4f9a-a91b-b5c8162527d6@proxmox.com>
Date: Tue, 9 Jun 2026 11:32:12 +0200
Subject: Re: [PATCH qemu-server] fix #7627: net: virtio: disable host_tunnel feature again with 11.0+pve1
From: Fiona Ebner
To: pve-devel@lists.proxmox.com
References: <20260603152127.901085-1-f.ebner@proxmox.com>
In-Reply-To: <20260603152127.901085-1-f.ebner@proxmox.com>
List-Id: Proxmox VE development discussion

On 03.06.26 at 5:21 PM, Fiona Ebner wrote:
> QEMU machine version 10.2 started
> exposing the new features
> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM
> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO
> VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM
> VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO
>
> but the host tunnel one causes issues with certain guest network
> configurations, in particular when using VXLAN [0][1][2][3] when the
> traffic goes over a physical NIC, at least when the NIC does not have
> support for these features itself.
>
> The negotiation in QEMU does not consider the physical NIC, it just
> looks whether the vhost-net device and the guest both support it and
> then turns on the feature for the tap device. However, it seems like
> the tap device does not itself add the inner TCP checksums for the
> encapsulated traffic. It's not entirely clear yet if this is a kernel
> issue or if the common configuration with bridged tap interface
> going to physical NIC is not supported in this configuration without
> some additional tweaks. When the traffic does not go via a physical
> NIC, it seems to work (i.e. both source and target VM on the same
> host).
>
> For now, disable this advanced host tunnel feature again, until the
> issue can be properly diagnosed and fixed (if there is a fix to be
> made). If users do require the feature again, it can be exposed via
> the schema as CLI-only and maybe in the UI as an advanced
> configuration option.
>
> [0]: https://bugzilla.proxmox.com/show_bug.cgi?id=7627
> [1]: https://forum.proxmox.com/threads/183494/post-855144
> [2]: https://forum.proxmox.com/threads/182328/post-854627
> [3]: https://forum.proxmox.com/threads/183963/#post-855737
>
> Signed-off-by: Fiona Ebner
> ---
>
> Many thanks to Stefan and Gabriel for discussions and continuing to
> analyze the issue! For now, let's make a stop-gap fix and turn the
> problematic host tunnel feature back off. I will also send a mail
> upstream asking about the issue, but not today, as I have to leave.
There is a patch now [0], but since the issue was in the virtio-net
driver, the fix will need to be rolled out to guest kernels, which we
don't have control over.

While there is an easy workaround of pinning the machine version to
10.1 for affected guests, I still wonder whether we should disable the
feature by default with 11.0+pve1 for now, to avoid more people running
into the regression. Maybe re-enable it with the next major PVE
release next summer?

[0]: https://lore.kernel.org/qemu-devel/566e0cc5-9a50-43b8-9866-f599a4657004@proxmox.com/
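
To make the proposal concrete, here is a minimal sketch of the version
gate, written in Python purely for illustration. qemu-server itself is
Perl and none of the helper names below exist there. The sketch assumes
that the QEMU device property is called host_tunnel (taken from the
patch subject) and that machine versions compare as (major, minor, pve
revision), so 10.2 and plain 11.0 keep their current behavior while
11.0+pve1 and later get the feature turned off:

```python
import re
from typing import Tuple


def parse_machine_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR' or 'MAJOR.MINOR+pveN' into (major, minor, pve_rev).

    Hypothetical helper for illustration; a missing '+pveN' suffix is
    treated as pve revision 0.
    """
    m = re.fullmatch(r"(\d+)\.(\d+)(?:\+pve(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized machine version: {version!r}")
    major, minor, pve = m.groups()
    return int(major), int(minor), int(pve or 0)


def host_tunnel_disabled(version: str) -> bool:
    """Return True if the proposal would turn the host tunnel feature off."""
    # Tuple comparison is lexicographic: major first, then minor, then pve rev.
    return parse_machine_version(version) >= (11, 0, 1)


def virtio_net_device_args(netdev_id: str, machine_version: str) -> str:
    """Build a simplified virtio-net device string (illustrative only)."""
    args = f"virtio-net-pci,netdev={netdev_id}"
    if host_tunnel_disabled(machine_version):
        # Assumption: 'host_tunnel' is the property name, as in the subject.
        args += ",host_tunnel=off"
    return args


if __name__ == "__main__":
    for version in ("10.1", "10.2", "11.0", "11.0+pve1"):
        print(version, "->", virtio_net_device_args("net0", version))
```

Guests pinned to machine version 10.1 fall below the gate and are left
untouched here, since that machine version predates the new features.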