From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Hanreich <s.hanreich@proxmox.com>
Date: Thu, 2 Apr 2026 11:16:27 +0200
Subject: Re: ifupdown2: Severe race conditions and architectural issues due to async usage of netlink
To: pve-devel@lists.proxmox.com
List-Id: Proxmox VE development discussion
Content-Type: text/plain; charset=UTF-8

On 3/31/26 11:15 PM, Robin Christ wrote:
> Hey folks,
>
> While trying to debug a seemingly harmless issue, "info: : ipv6 addrgen is disabled on device with MTU lower than 1280 (current mtu 0): cannot set addrgen off", I realized ifupdown2 has severe race conditions and architectural issues.
> The issue mentioned above happens due to the following sequence:
>
> 1. Enqueue interface creation, async!
>
>    Put a barebones version of the interface into the cache to signal that
>    interface creation has been started and to make the interface generally known.
>
> 2. Enqueue MTU change, async!
>
>    address.py "self.process_mtu(ifaceobj, ifaceobj_getfunc)" is called
>    immediately after. The RTM_NEWLINK most likely has **not yet** arrived.
>
>    This will ultimately call "link_set_mtu(self, ifname, mtu)" in nlcache.py,
>    which in turn calls "self.cache.override_link_mtu(ifname, mtu)".
>    This call will fail:
>
>        self._link_cache[ifname].attributes[Link.IFLA_MTU].value = mtu
>
>    attributes[Link.IFLA_MTU] is **not yet** present, because the cache only
>    has the barebones version, which does not have this attribute.
>
> 3. Immediately call self.up_ipv6_addrgen(ifaceobj).
>
>    This causes the abovementioned error: it tries to read the MTU of the
>    interface, which most likely is **still not present** because the
>    RTM_NEWLINK most likely has **not yet** arrived, so it fails after
>    getting the "default" value of 0 back from get_link_mtu.
>
> 4. At some point, receive RTM_NEWLINK.
>
>    This updates the cache, but to a **stale** MTU value. That value may be
>    correct at the time the RTM_NEWLINK message is received, but it is not
>    the intended or final MTU, so any other method reading this MTU will now
>    get a stale value (just as bad as getting NO value).
>
> 5. At some point, receive RTM_NEWNETCONF.
>
>    This updates the cache, and now the MTU is correct.
>
> As you can see, we have lots of race conditions here due to the underlying
> architectural issue, which is that interface intended state is held in the
> netlink cache.

:/ - maybe @Christoph can take a look as well?
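The failure mode in steps 2 and 3 can be reproduced in isolation with a small standalone sketch. The class and function names below are simplified stand-ins for the real nlcache.py structures, not the actual ifupdown2 code:

```python
# Sketch of the race: the barebones cache entry has no IFLA_MTU attribute
# yet, so the override fails, and the later read returns the default 0.

IFLA_MTU = 4  # netlink attribute id for the link MTU

class Attribute:
    def __init__(self, value):
        self.value = value

class CachedLink:
    def __init__(self):
        # barebones entry created right after the async creation request;
        # the full attribute set only arrives with the RTM_NEWLINK message
        self.attributes = {}

_link_cache = {"vmbr0": CachedLink()}

def override_link_mtu(ifname, mtu):
    # mirrors: self._link_cache[ifname].attributes[Link.IFLA_MTU].value = mtu
    _link_cache[ifname].attributes[IFLA_MTU].value = mtu

def get_link_mtu(ifname):
    # mirrors the cache read that falls back to a "default" of 0
    attr = _link_cache[ifname].attributes.get(IFLA_MTU)
    return attr.value if attr else 0

try:
    override_link_mtu("vmbr0", 9000)   # step 2: raises KeyError, override lost
except KeyError:
    pass

print(get_link_mtu("vmbr0"))           # step 3: up_ipv6_addrgen sees MTU 0
```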
> Even if you patch override_link_mtu to have a conditional
> self._link_cache[ifname].add_attribute(Link.IFLA_MTU, mtu), you still have
> the window between step 4 and 5 where you will have a stale MTU value in
> the cache.
>
> I am considering a shitfix for up_ipv6_addrgen, or specifically
> link_set_ipv6_addrgen, which just adds an optional "intended_mtu" argument.
> But this only solves the issue for this specific function call in this
> specific sequence. There are several other places which use "get_link_mtu()".
>
> Additionally, the bridge MTU logic appears to be a bit inconsistent
> (broken). Commit a3df9e6930b2000d0ca104a317214f65ab94ed15 ("addons:
> address: mtu: set bridge mtu with policy default") resets the bridge MTU to
> the default unless it is explicitly set... But when you explicitly set the
> bridge MTU, you get a nice "bridge inherits mtu from its ports. There is no
> need to assign mtu on a bridge" warning (or syntax validation error).

yeah, that should probably be fixed - or at least the warning removed,
because that is *not* the case - the MTU gets set and it's explicitly logged.

> As far as I'm aware, bridge MTU doesn't have an impact on bridged traffic
> itself (only the interfaces through which the traffic exits matter); it
> only matters for locally originated IP traffic through bridge interfaces /
> SVIs, netfilter stuff (again: interface MTU?) and IGMP / MLD query
> generation.

Unless I'm misremembering, only the MTU of the destination port should
matter in the forwarding decision.

> BUT it matters for Proxmox, as Proxmox automatically sets the interface MTU
> based on the bridge.
>
> Wouldn't it make sense to set the bridge MTU, unless overridden manually,
> to the lowest MTU of any enslaved interface that is **specified in the
> ifupdown2 config** (to avoid issues with VM interfaces, which have a lower
> MTU, lowering the overall bridge MTU)?

Is this really the case - do you have a reproducer? Couldn't get the bridge
MTU to get lowered when plugging a tap interface.
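For illustration, the conditional variant of override_link_mtu mentioned above could look roughly like this (again with simplified stand-in classes, not the real nlcache.py types). As noted, this only avoids the KeyError in step 2; it does nothing about the stale value between steps 4 and 5:

```python
# Hypothetical conditional fix: create the IFLA_MTU attribute on a
# barebones cache entry instead of raising, otherwise overwrite it.

IFLA_MTU = 4  # netlink attribute id for the link MTU

class Attribute:
    def __init__(self, value):
        self.value = value

class CachedLink:
    def __init__(self):
        self.attributes = {}

class LinkCache:
    def __init__(self):
        self._link_cache = {}

    def override_link_mtu(self, ifname, mtu):
        attrs = self._link_cache[ifname].attributes
        if IFLA_MTU in attrs:
            attrs[IFLA_MTU].value = mtu          # normal case: overwrite
        else:
            attrs[IFLA_MTU] = Attribute(mtu)     # barebones entry: create

cache = LinkCache()
cache._link_cache["vmbr0"] = CachedLink()  # barebones, no IFLA_MTU yet
cache.override_link_mtu("vmbr0", 9000)     # no longer raises KeyError
print(cache._link_cache["vmbr0"].attributes[IFLA_MTU].value)
```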
Both containers and VMs have their interfaces on the bridge with the bridge
MTU; inside, they get the MTU that is set via the MTU parameter on the
network device (either by setting it on the other end of the veth, or via
the VirtIO driver; for everything else the MTU gets forced to the bridge
MTU). The bridge MTU gets set explicitly by ifupdown2 in any case (if the
logs are to be believed). That would mean the mechanisms in the bridge
driver kick in that prevent any messing with the MTU, *if* the MTU has been
set explicitly on the bridge interface [1]. In any case, the MTU should just
be set explicitly everywhere in the configuration to avoid this altogether...

> Given the overall bad shape of ifupdown2 - What is the way forward? Will
> ifupdown2 be replaced in the not-so-distant future?

We've been discussing this internally and we're also watching the discussion
upstream on which network manager will be used for forky. But aside from
that, I cannot give you any concrete plans atm.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_if.c?h=v7.0-rc6#n518
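For reference, setting the MTU explicitly everywhere in /etc/network/interfaces would look something like this (interface names and values are illustrative):

```
auto eno1
iface eno1 inet manual
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 192.0.2.10/24
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    mtu 9000
```

With the MTU pinned on both the bridge and every enslaved port, no inheritance logic in the bridge driver or in ifupdown2 has anything left to decide.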