Subject: Re: [pve-devel] [PATCH qemu-server v3 stable-bookworm 8/8] migration: preserve host_mtu for virtio-net devices
From: Fiona Ebner
To: Thomas Lamprecht, Proxmox VE development discussion, Fabian Grünbichler
Date: Mon, 8 Sep 2025 15:16:29 +0200
Message-ID: <096e0dc3-57de-4a71-b821-186b79b7fde8@proxmox.com>
In-Reply-To: <4a9a4b8d-16cd-4d74-9ef5-bd010412cf2c@proxmox.com>
References: <20250904124113.81772-1-f.ebner@proxmox.com> <20250904124113.81772-9-f.ebner@proxmox.com> <4a9a4b8d-16cd-4d74-9ef5-bd010412cf2c@proxmox.com>

On 05.09.25 at 11:17 AM, Fiona Ebner wrote:
> On 05.09.25 at 11:09 AM, Thomas Lamprecht wrote:
>> On 05.09.25 at 10:54, Fiona Ebner wrote:
>>> On 04.09.25 at 8:11 PM, Thomas Lamprecht wrote:
>>>> On 04.09.25 at 14:42, Fiona Ebner wrote:
>>>>> The virtual hardware is generated differently (at least for i440fx
>>>>> machines) when host_mtu is set or not set on the netdev command line
>>>>> [0]. When the MTU is the same value as the default 1500, Proxmox VE
>>>>> did not add a host_mtu parameter. This is problematic for migration,
>>>>> where host_mtu is present on one end of the migration but not on the
>>>>> other [1]. Moreover, the effective setting in the guest (state) will
>>>>> still be the host_mtu from the source side, even if a different value
>>>>> is used for host_mtu on the target instance's command line. This will
>>>>> not lead to an error loading the migration stream in QEMU, but having
>>>>> a larger host_mtu than the bridge MTU is still problematic for certain
>>>>> network traffic like
>>>>>
>>>>>> iperf3 -c 10.10.10.11 -u -l 2k
>>>>>
>>>>> when host_mtu=9000 and bridge MTU=1500.
>>>>>
>>>>> Pass the values from the source to the target during migration to be
>>>>> able to preserve them.
>>>>
>>>> Which breaks migration from new to old. That can be fine, but it seems
>>>> avoidable, given that we have a tunnel that we can query stuff over.
>>>
>>> How can we query? The old tunnel only supports very specific commands
>>> like 'quit' and 'resume $vmid'. Note that remote migration using the
>>> new tunnel version is not broken - an old node will just ignore the
>>> additional parameter in the passed-along JSON.
>>
>> The absence of a command also gives you information.
>
> Okay, so you mean adding a new command and using that to detect whether
> the node is recent enough? What should that command be? The capabilities
> one you suggest below?
>
>>>
>>> We could do something like
>>>
>>> ssh ... qm start 0 --nets-host-mtu
>>>
>>> and match for "Unknown option: nets-host-mtu" for detection.
>>
>> Yeah, that's exactly what I wrote later in my reply.
>
> I thought you meant matching the error for the actual command. My
> suggestion is using a dummy command for early detection and guarding
> the use of the new option for the actual command based on that.
>
>>> Alternatively, we could bump the pve-manager version and guard adding
>>> the option via the pmxcfs 'version-info' node kv. That mechanism
>>> wasn't super reliable in the past though.
>>
>> FWIW, we now re-broadcast that periodically and IIRC even on pmxcfs
>> start-up.
>
> Yes, and if we really can't get the info, we can err on the side of
> "assume it's recent enough".
>
>>>> Maybe we could at least catch the "Unknown option: nets-host-mtu"
>>>> error explicitly and add some context that the target likely just
>>>> needs to be updated to make the migration work.
>>>
>>> If we don't want to go for either of the above, or if there isn't
>>> another way to query, I'll go for that?
>>
>> Would be fine for me, it's the simplest thing to do for now.
>>
>> Adding a more fleshed-out general approach for such things might be
>> nice to have available for the future. That could be some versioning,
>> or a more structured capabilities query that is split into required
>> capabilities (which block the migration) and hints for best-effort
>> stuff. It should probably also include some basic version info, like
>> qemu-server's, as that is often needed to know whether a capability
>> is required or not. Like here: when migrating to another 8.x node it
>> won't matter, but for a 9.x target node we should enforce that e.g.
>> nets-host-mtu is available.
>
> Sounds sensible.

Unfortunately, this is currently impossible with tunnel version 1,
because when we set up the tunnel via SSH, we require the Unix sockets
for forwarding up-front, but we only get the socket information after
the VM start on the remote side. Allowing JSON replies would also
require an additional patch for tunnel version 1.

Should we switch to always using tunnel version 2, also for local
migration? @Fabian, thoughts?

For the issue at hand, I'll go for the detection via the dummy qm
command (rough sketches below).
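To make the problem concrete, here is a rough illustration of how the
generated virtio-net device line differs depending on whether host_mtu
is added (the MAC address is made up and the real qemu-server command
line carries more properties; host_mtu is a QEMU virtio-net device
property):

    # MTU matching the default of 1500: no host_mtu property was added
    -device virtio-net-pci,mac=AA:BB:CC:DD:EE:FF,netdev=net0

    # explicitly configured MTU, e.g. 9000: host_mtu is appended
    -device virtio-net-pci,mac=AA:BB:CC:DD:EE:FF,netdev=net0,host_mtu=9000

On i440fx machines the two forms yield different virtual hardware,
which is what makes the two ends of the migration disagree when one
side generates host_mtu and the other does not.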
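And a minimal Perl sketch of the dummy-command detection, assuming
PVE::Tools::run_command is used to invoke qm on the target via SSH.
The helper name is made up; the probe command and the error text it
matches follow the discussion above:

    use PVE::Tools;

    # Probe whether the target's qm knows the new option. 'qm start 0'
    # fails on any version (0 is not a valid VMID), but only an old qm
    # fails with the unknown-option error.
    sub target_supports_nets_host_mtu {
        my ($ssh_cmd) = @_; # e.g. ['/usr/bin/ssh', 'root@target']

        my $output = '';
        my $collect = sub { $output .= shift() . "\n" };
        PVE::Tools::run_command(
            [@$ssh_cmd, 'qm', 'start', '0', '--nets-host-mtu'],
            outfunc => $collect,
            errfunc => $collect,
            noerr => 1, # the probe is expected to fail, don't die
        );

        return $output !~ /Unknown option: nets-host-mtu/;
    }

The caller would run this once per migration and only pass the real
--nets-host-mtu value along to the actual 'qm start' invocation when
the probe returns true.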