* ifupdown2: Severe race conditions and architectural issues due to async usage of netlink
@ 2026-03-31 21:15 Robin Christ
  2026-04-02  9:16 ` Stefan Hanreich
  0 siblings, 1 reply; 3+ messages in thread
From: Robin Christ @ 2026-03-31 21:15 UTC (permalink / raw)
  To: pve-devel

Hey folks,

While trying to debug a seemingly harmless issue ("info: <vxlanif>: ipv6 addrgen is disabled on device with MTU lower than 1280 (current mtu 0): cannot set addrgen off"), I realized that ifupdown2 has severe race conditions and architectural issues.

The issue mentioned above happens due to the following sequence:

1. Enqueue interface creation, async!
A barebones version of the interface is put into the cache to signal that interface creation has started and to make the interface generally known.

2. Enqueue MTU change, async!
In address.py, "self.process_mtu(ifaceobj, ifaceobj_getfunc)" is called immediately afterwards.
At this point the RTM_NEWLINK for the interface most likely has **not yet** arrived.

This ultimately calls "link_set_mtu(self, ifname, mtu)" in nlcache.py, which in turn calls "self.cache.override_link_mtu(ifname, mtu)".
That call fails on:
self._link_cache[ifname].attributes[Link.IFLA_MTU].value = mtu

because attributes[Link.IFLA_MTU] is **not yet** present: the cache only holds the barebones version, which does not have this attribute.

3. Immediately call self.up_ipv6_addrgen(ifaceobj)

This causes the abovementioned error: it tries to read the MTU from the interface, which most likely is **still not present** because the RTM_NEWLINK most likely has **not yet** arrived, so get_link_mtu returns its "default" value of 0.


4. At some point, receive RTM_NEWLINK

This updates the cache, but with a **stale** MTU value. That value may be correct at the time the RTM_NEWLINK message is received, but it is not the intended or final MTU, so any other method reading the MTU from the cache now gets a stale value (just as bad as getting NO value).

5. At some point, receive RTM_NEWNETCONF

This will update the cache, and now the MTU is correct.
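The five steps above can be condensed into a runnable toy model (not ifupdown2 code; a plain dict stands in for the netlink cache, and all names are illustrative). It shows why a read between steps 1 and 4 returns the default of 0, and a read between steps 4 and 5 returns whatever stale value the kernel reported:

```python
IFLA_MTU = "IFLA_MTU"  # stand-in for the real netlink attribute constant

class LinkCache:
    def __init__(self):
        self._link_cache = {}

    def add_barebones_link(self, ifname):
        # step 1: creation enqueued; only a stub entry exists in the cache
        self._link_cache[ifname] = {}

    def on_rtm_newlink(self, ifname, attrs):
        # step 4: kernel message arrives, possibly carrying a stale MTU
        self._link_cache[ifname].update(attrs)

    def get_link_mtu(self, ifname):
        # mirrors get_link_mtu() returning a default of 0 when the
        # attribute is not (yet) cached
        return self._link_cache.get(ifname, {}).get(IFLA_MTU, 0)

cache = LinkCache()
cache.add_barebones_link("vxlan0")

# step 3 happens here, before any netlink message was processed:
mtu_seen_by_addrgen = cache.get_link_mtu("vxlan0")  # 0 -> spurious warning

cache.on_rtm_newlink("vxlan0", {IFLA_MTU: 1500})    # step 4: stale kernel value
mtu_after_newlink = cache.get_link_mtu("vxlan0")    # 1500, not the intended MTU

print(mtu_seen_by_addrgen, mtu_after_newlink)
```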

As you can see, we have lots of race conditions here due to the underlying architectural issue, which is that the interface's intended state is held in the netlink cache, even though the cache only reflects (delayed) kernel state.


Even if you patch override_link_mtu to conditionally call self._link_cache[ifname].add_attribute(Link.IFLA_MTU, mtu), you still have the window between steps 4 and 5 where the cache holds a stale MTU value.
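For illustration, such a conditional patch might look like the following sketch (plain dicts and a toy Attr class stand in for the real Link cache objects; as noted, this only avoids the KeyError, it does not close the stale window):

```python
class Attr:
    """Toy stand-in for an ifupdown2 Link attribute object."""
    def __init__(self, value):
        self.value = value

def override_link_mtu(link_cache, ifname, mtu):
    entry = link_cache.get(ifname)
    if entry is None:
        return False  # interface not known at all
    attrs = entry["attributes"]
    if "IFLA_MTU" in attrs:
        attrs["IFLA_MTU"].value = mtu   # existing behavior: overwrite in place
    else:
        attrs["IFLA_MTU"] = Attr(mtu)   # conditional add_attribute(...) path
    return True

cache = {"vxlan0": {"attributes": {}}}   # barebones entry, as after step 1
override_link_mtu(cache, "vxlan0", 9000) # no longer raises KeyError
print(cache["vxlan0"]["attributes"]["IFLA_MTU"].value)
```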

I am considering a hacky fix for up_ipv6_addrgen, or specifically link_set_ipv6_addrgen, which simply adds an optional "intended_mtu" argument. But this only solves the issue for this specific function call in this specific sequence; there are several other places that use "get_link_mtu()".
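A sketch of that workaround with the hypothetical "intended_mtu" argument (signature and return value are simplified for illustration, not the real nlcache.py API; the real code logs an info message rather than returning a bool):

```python
def link_set_ipv6_addrgen(ifname, addrgen, get_link_mtu, intended_mtu=None):
    # Hypothetical optional argument: trust the MTU the caller intends to
    # configure instead of the possibly stale or missing cached value.
    mtu = intended_mtu if intended_mtu is not None else get_link_mtu(ifname)
    if mtu < 1280 and not addrgen:
        # mirrors the spurious message from the original report
        print(f"info: {ifname}: ipv6 addrgen is disabled on device with MTU "
              f"lower than 1280 (current mtu {mtu}): cannot set addrgen off")
        return False
    return True

def stale_cache_mtu(ifname):
    return 0  # cache not yet populated: default value of 0

# Without the hint the spurious warning fires; with it, it does not:
without_hint = link_set_ipv6_addrgen("vxlan0", False, stale_cache_mtu)
with_hint = link_set_ipv6_addrgen("vxlan0", False, stale_cache_mtu,
                                  intended_mtu=9000)
print(without_hint, with_hint)
```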


Additionally, the bridge MTU logic appears to be a bit inconsistent (broken). Commit a3df9e6930b2000d0ca104a317214f65ab94ed15 ("addons: address: mtu: set bridge mtu with policy default") resets the bridge MTU to the policy default unless it is explicitly set... But when you do explicitly set the bridge MTU, you get a nice "bridge inherits mtu from its ports. There is no need to assign mtu on a bridge" warning (or a syntax validation error).

As far as I'm aware, bridge MTU doesn't have an impact on bridged traffic itself (only the interfaces through which the traffic exits matter), and only matters for locally originated IP traffic through bridge interfaces / SVIs, netfilter stuff (again: interface MTU?) and IGMP / MLD query generation.

BUT it matters for Proxmox, as Proxmox automatically sets the interface MTU based on the bridge.

Wouldn't it make sense to set the bridge MTU, unless overridden manually, to the lowest MTU of any enslaved interface that is **specified in the ifupdown2 config** (to avoid VM interfaces with a lower MTU dragging down the overall bridge MTU)?
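A minimal sketch of that policy (all names and the default value are illustrative, not existing ifupdown2 code): only ports listed in the config contribute to the minimum, so runtime-attached ports such as VM tap interfaces are ignored.

```python
DEFAULT_MTU = 1500  # assumed policy default, for illustration only

def effective_bridge_mtu(explicit_mtu, configured_port_mtus):
    """Proposed rule: manual override wins; otherwise take the lowest MTU
    among ports specified in the ifupdown2 config; fall back to the policy
    default when no ports are configured."""
    if explicit_mtu is not None:
        return explicit_mtu
    if not configured_port_mtus:
        return DEFAULT_MTU
    return min(configured_port_mtus)

# A tap interface with MTU 1450 attached at runtime is NOT in the config,
# so it does not drag the bridge MTU down:
print(effective_bridge_mtu(None, [9000, 1500]))  # configured ports only
print(effective_bridge_mtu(1400, [9000, 1500])) # explicit override wins
```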


Given the overall bad shape of ifupdown2 - What is the way forward? Will ifupdown2 be replaced in the not-so-distant future?


Cheers,
Robin





Thread overview: 3+ messages
2026-03-31 21:15 ifupdown2: Severe race conditions and architectural issues due to async usage of netlink Robin Christ
2026-04-02  9:16 ` Stefan Hanreich
2026-04-02 15:05   ` Robin Christ
