* ifupdown2: Severe race conditions and architectural issues due to async usage of netlink
@ 2026-03-31 21:15 Robin Christ
2026-04-02 9:16 ` Stefan Hanreich
0 siblings, 1 reply; 3+ messages in thread
From: Robin Christ @ 2026-03-31 21:15 UTC (permalink / raw)
To: pve-devel
Hey folks,
While trying to debug a seemingly harmless issue - "info: <vxlanif>: ipv6 addrgen is disabled on device with MTU lower than 1280 (current mtu 0): cannot set addrgen off" - I realized that ifupdown2 has severe race conditions and architectural issues.
The issue mentioned above happens due to the following sequence:
1. Enqueue interface creation, async!
Put a barebones version of the interface into the cache to signal that the interface creation has been started and make the interface generally known.
2. Enqueue MTU change, async!
In address.py, "self.process_mtu(ifaceobj, ifaceobj_getfunc)" is called immediately afterwards.
The RTM_NEWLINK most likely has **not yet** arrived.
This will ultimately call "link_set_mtu(self, ifname, mtu)" in nlcache.py, which in turn calls "self.cache.override_link_mtu(ifname, mtu)".
This call will fail:
self._link_cache[ifname].attributes[Link.IFLA_MTU].value = mtu
attributes[Link.IFLA_MTU] is **not yet** present, because the cache only has the barebones version, which does not have this attribute
3. Immediately call self.up_ipv6_addrgen(ifaceobj)
This will cause the above-mentioned error: it tries to read the MTU from the interface, which most likely is **still not present** because the RTM_NEWLINK most likely has **not yet** arrived, and thus fails as it gets the "default" value of 0 back from get_link_mtu.
4. At some point, receive RTM_NEWLINK
This will update the cache, but to a **stale** MTU value. That value may be correct at the time the RTM_NEWLINK message is received, but it is not the intended or final MTU, so any other method reading this MTU will now get a stale value (just as bad as getting NO value).
5. At some point, receive RTM_NEWNETCONF
This will update the cache, and now the MTU is correct.
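The sequence above can be condensed into a small, self-contained toy model. This is an illustration only - the class and function names are hypothetical, loosely modeled on nlcache.py, not the real ifupdown2 code:

```python
# Toy model of the cache race in steps 1-5 (hypothetical names, not the
# real nlcache.py API): the cache holds a "barebones" link entry until the
# kernel's RTM_NEWLINK arrives, so an early MTU override finds no IFLA_MTU
# attribute to update.

IFLA_MTU = "IFLA_MTU"

class Attribute:
    def __init__(self, value):
        self.value = value

class CachedLink:
    def __init__(self):
        # step 1: barebones entry, no attributes populated yet
        self.attributes = {}

link_cache = {"vxlan0": CachedLink()}

def override_link_mtu(ifname, mtu):
    # mirrors the failing assignment: the attribute does not exist yet,
    # so the KeyError is (silently) swallowed
    try:
        link_cache[ifname].attributes[IFLA_MTU].value = mtu
        return True
    except KeyError:
        return False

def get_link_mtu(ifname):
    # mirrors get_link_mtu returning a "default" of 0 for a missing attribute
    attr = link_cache[ifname].attributes.get(IFLA_MTU)
    return attr.value if attr else 0

assert override_link_mtu("vxlan0", 1450) is False   # step 2: override fails
assert get_link_mtu("vxlan0") == 0                  # step 3: addrgen sees mtu 0

# step 4: RTM_NEWLINK finally arrives, carrying the kernel's *old* MTU
link_cache["vxlan0"].attributes[IFLA_MTU] = Attribute(1500)
assert get_link_mtu("vxlan0") == 1500               # stale; intended was 1450
```

Only at step 5, when the next notification arrives, would the cache converge on 1450 - any reader in between sees either 0 or the stale 1500.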
As you can see, we have lots of race conditions here due to the underlying architectural issue, which is that interface intended state is held in the netlink cache.
Even if you patch override_link_mtu to have a conditional self._link_cache[ifname].add_attribute(Link.IFLA_MTU, mtu), you still have this window between step 4 and 5 where you will have a stale MTU value in the cache.
I am considering a shitfix for up_ipv6_addrgen - or specifically link_set_ipv6_addrgen - which just adds an optional "intended_mtu" argument. But this only solves the issue for this specific function call in this specific sequence; there are several other places which use "get_link_mtu()".
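A minimal sketch of what that optional argument could look like (hypothetical signature and return value - the real link_set_ipv6_addrgen in nlcache.py differs): a caller that has just requested an MTU change passes the value it asked for, so the IPv6 minimum-MTU check does not have to trust the possibly stale (or still empty) cache.

```python
IPV6_MIN_MTU = 1280

def link_set_ipv6_addrgen(ifname, addrgen, cached_mtu, intended_mtu=None):
    # Prefer the MTU the caller just requested over the possibly-stale cache.
    mtu = intended_mtu if intended_mtu is not None else cached_mtu
    if mtu < IPV6_MIN_MTU:
        # mirrors the log message from the original report
        raise ValueError(
            "%s: ipv6 addrgen is disabled on device with MTU lower than 1280 "
            "(current mtu %s): cannot set addrgen %s" % (ifname, mtu, addrgen)
        )
    return "%s: addrgen %s" % (ifname, addrgen)
```

With an empty cache (mtu 0) this still succeeds when intended_mtu=1450 is passed, and fails exactly like today when it is not - which is the point: it only papers over this one call site.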
Additionally, the bridge MTU logic appears to be a bit inconsistent (broken). Commit a3df9e6930b2000d0ca104a317214f65ab94ed15 ("addons: address: mtu: set bridge mtu with policy default") resets bridge MTU to default unless explicitly set... But when you explicitly set the bridge MTU, you get a nice "bridge inherits mtu from its ports. There is no need to assign mtu on a bridge" warning (or syntax validation error).
As far as I'm aware, bridge MTU doesn't have an impact on bridged traffic itself (only the interfaces through which the traffic exits matter), and only matters for locally originated IP traffic through bridge interfaces / SVIs, netfilter stuff (again: interface MTU?) and IGMP / MLD query generation.
BUT it matters for Proxmox, as Proxmox automatically sets the interface MTU based on the bridge.
Wouldn't it make sense to set the bridge MTU, unless overridden manually, to the lowest MTU of any enslaved interface that is **specified in the ifupdown2 config** (to avoid issues with VM interfaces, that have a lower MTU, lowering the overall bridge MTU)?
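A sketch of that proposed policy (this is an assumption about how it could be implemented, not existing ifupdown2 code): derive the bridge MTU from the lowest MTU among ports that appear in the ifupdown2 configuration, ignoring runtime-only ports such as VM tap devices.

```python
def derive_bridge_mtu(enslaved_ports, configured_mtus, default_mtu=1500):
    """enslaved_ports: interfaces currently enslaved to the bridge;
    configured_mtus: ifname -> mtu for interfaces present in the
    ifupdown2 config. Runtime-only ports (e.g. VM tap devices) are
    not in configured_mtus and therefore ignored."""
    mtus = [configured_mtus[p] for p in enslaved_ports if p in configured_mtus]
    return min(mtus) if mtus else default_mtu

# tap100i0 (a hypothetical VM interface with a lower MTU) is enslaved but
# not configured, so it does not drag the bridge MTU down:
assert derive_bridge_mtu(
    ["eno1", "vxlan0", "tap100i0"],
    {"eno1": 9000, "vxlan0": 1450},
) == 1450
```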
Given the overall bad shape of ifupdown2 - What is the way forward? Will ifupdown2 be replaced in the not-so-distant future?
Cheers,
Robin
* Re: ifupdown2: Severe race conditions and architectural issues due to async usage of netlink
From: Stefan Hanreich @ 2026-04-02 9:16 UTC (permalink / raw)
To: pve-devel

On 3/31/26 11:15 PM, Robin Christ wrote:
> [snip]
>
> As you can see, we have lots of race conditions here due to the underlying architectural issue, which is that interface intended state is held in the netlink cache.

:/ - maybe @Christoph can take a look as well?

> [snip]
>
> Additionally, the bridge MTU logic appears to be a bit inconsistent (broken). Commit a3df9e6930b2000d0ca104a317214f65ab94ed15 ("addons: address: mtu: set bridge mtu with policy default") resets bridge MTU to default unless explicitly set... But when you explicitly set the bridge MTU, you get a nice "bridge inherits mtu from its ports. There is no need to assign mtu on a bridge" warning (or syntax validation error).

yeah, that should probably be fixed - or at least the warning removed,
because that is *not* the case - the MTU gets set and it's explicitly
logged.

> As far as I'm aware, bridge MTU doesn't have an impact on bridged traffic itself (only the interfaces through which the traffic exits matter), and only matters for locally originated IP traffic through bridge interfaces / SVIs, netfilter stuff (again: interface MTU?) and IGMP / MLD query generation.

Unless I'm misremembering, only the MTU of the destination port should
matter in the forwarding decision.

> BUT it matters for Proxmox, as Proxmox automatically sets the interface MTU based on the bridge.
>
> Wouldn't it make sense to set the bridge MTU, unless overridden manually, to the lowest MTU of any enslaved interface that is **specified in the ifupdown2 config** (to avoid issues with VM interfaces, that have a lower MTU, lowering the overall bridge MTU)?

Is this really the case - do you have a reproducer? Couldn't get the
bridge MTU to get lowered when plugging a tap interface.

Both containers and VMs have their IFs on the bridge with the bridge
MTU; inside, they get the MTU that is set via the MTU param on the
network device (either by setting it on the other end of the veth, or
via the VirtIO driver; for everything else the MTU gets forced to the
bridge MTU).

The bridge MTU gets set explicitly by ifupdown2 in any case (if the
logs are to be believed). That would mean the mechanisms in the bridge
driver kick in that prevent any messing with the MTU, *if* the MTU has
been set explicitly on the bridge interface [1].

In any case, the MTU should just be set everywhere explicitly in the
configuration to avoid this altogether...

> Given the overall bad shape of ifupdown2 - What is the way forward? Will ifupdown2 be replaced in the not-so-distant future?

We've been discussing this internally and we're also watching the
discussion upstream on what network manager will be used for forky. But
aside from that, I cannot give you any concrete plans atm.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_if.c?h=v7.0-rc6#n518
* Re: ifupdown2: Severe race conditions and architectural issues due to async usage of netlink
From: Robin Christ @ 2026-04-02 15:05 UTC (permalink / raw)
To: Stefan Hanreich, pve-devel, Christoph Heiss

>> 5. At some point, receive RTM_NEWNETCONF
>>
>> This will update the cache, and now the MTU is correct.
>> [snip]
>
> :/ - maybe @Christoph can take a look as well?

I have to stand corrected, though: step 5 is not RTM_NEWNETCONF (that
message is probably received too, but it does not seem to actually be
used). Step 5 is triggered upon receiving RTM_NEWLINK (you receive an
RTM_NEWLINK after every configuration change). But the conclusion
remains the same: you may (or rather: will) end up with periods where
stale config is in the cache.

> yeah, that should probably be fixed - or at least the warning removed,
> because that is *not* the case - the MTU gets set and it's explicitly
> logged.

I did fix the warning in my patches, which I'm attaching for reference.

> Unless I'm misremembering, only the MTU of the destination port should
> matter in the forwarding decision.

That's what I was trying to say: it doesn't matter for the bridged
traffic itself while it's in the bridge, but the MTU of the interfaces
in the bridge matters. There may still be some weird side effects,
check Ivan's blog post [1] for some inspiration.

>> Wouldn't it make sense to set the bridge MTU, unless overridden manually, to the lowest MTU of any enslaved interface that is **specified in the ifupdown2 config** (to avoid issues with VM interfaces, that have a lower MTU, lowering the overall bridge MTU)?
>
> Is this really the case - do you have a reproducer? Couldn't get the
> bridge MTU to get lowered when plugging a tap interface. [...]

Sorry, I should've been clearer: I was not referring to the bridge MTU
getting lowered automatically, e.g. by the Linux kernel.

What I meant was: IF automatic bridge MTU is implemented in ifupdown2,
it should NOT simply fetch the enslaved interfaces and iterate through
them. If you do an ifreload while VMs are already running, ifupdown2
could fetch VM interfaces with a lower MTU and thus accidentally lower
the bridge MTU. Therefore, IF automatic bridge MTU is implemented in
ifupdown2, it should only iterate through the slave interfaces that are
configured in the ifupdown2 config.

> In any case, the MTU should just be set everywhere explicitly in the
> configuration to avoid this altogether...

This is the best case, but I think sane default behaviour for users who
do not do this would make sense.

>> Given the overall bad shape of ifupdown2 - What is the way forward? Will ifupdown2 be replaced in the not-so-distant future?
>
> We've been discussing this internally and we're also watching the
> discussion upstream on what network manager will be used for forky. But
> aside from that, I cannot give you any concrete plans atm.

Alright, thanks for the info!

I am attaching my patches from yesterday as a basis for discussion,
please let me know what you think! But those are not exactly in a shape
where I'd consider sending them to the mailing list as a pull request...

[1] https://blog.ipspace.net/2025/03/linux-bridge-mtu-hell/

Patch 1/2: 0020-netlink-implement-dirty-workaround-for-mtu-cache-architecture.patch

From: Robin Christ <r.christ@partimus.com>
Date: Wed, 1 Apr 2026 14:07:30 +0200
Subject: netlink: Implement dirty workaround for MTU cache architectural issue

---
 ifupdown2/lib/iproute2.py |  3 +++
 ifupdown2/lib/nlcache.py  | 40 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/ifupdown2/lib/iproute2.py b/ifupdown2/lib/iproute2.py
index 894afd3..8760d3f 100644
--- a/ifupdown2/lib/iproute2.py
+++ b/ifupdown2/lib/iproute2.py
@@ -550,6 +550,9 @@ class IPRoute2(Cache, Requirements):
         if is_link_up:
             self.link_down_force(ifname)

+        # iproute2 should use netlink in the background - so this SHOULD work
+        # even when the RTM_NEWLINK notifying a potential earlier change in
+        # MTU has not arrived yet
         self.__execute_or_batch(
             utils.ip_cmd,
             "link set dev %s addrgenmode %s" % (ifname, Link.ifla_inet6_addr_gen_mode_dict.get(addrgen))
diff --git a/ifupdown2/lib/nlcache.py b/ifupdown2/lib/nlcache.py
index 2d37443..a8e11ff 100644
--- a/ifupdown2/lib/nlcache.py
+++ b/ifupdown2/lib/nlcache.py
@@ -405,7 +405,21 @@ class _NetlinkCache:
         """
         try:
             with self._cache_lock:
-                self._link_cache[ifname].attributes[Link.IFLA_MTU].value = mtu
+                # This function has an issue:
+                # Sometimes when it's used, we only have a barebones version
+                # of the link in the cache, because the netlink notification
+                # with all the attributes, aka RTM_NEWLINK, has not been
+                # received yet. This barebones version of the link does not
+                # have the Link.IFLA_MTU attribute, hence setting this value
+                # would fail.
+                #
+                # This is a deeper architectural issue with how the netlink
+                # cache is used.
+                if Link.IFLA_MTU not in self._link_cache[ifname].attributes:
+                    self._link_cache[ifname].add_attribute(Link.IFLA_MTU, mtu)
+                    self._link_cache[ifname].attributes[Link.IFLA_MTU].is_override_value = True
+                else:
+                    self._link_cache[ifname].attributes[Link.IFLA_MTU].value = mtu
         except Exception:
             pass
@@ -1355,6 +1369,30 @@ class _NetlinkCache:
             # here just in case?
             pass

+        # Dirty workaround: merge the MTU.
+        # If there is already an MTU value with is_override_value set,
+        # override_link_mtu was called for this link and we should keep the
+        # override value instead of the new one coming from the kernel, which will be stale.
+        try:
+            cached_mtu = self._link_cache[ifname].attributes.get(Link.IFLA_MTU)
+
+            if cached_mtu and ("is_override_value" in cached_mtu.__dict__) and cached_mtu.is_override_value:
+                mtu_from_packet = None
+                if link.attributes.get(Link.IFLA_MTU):
+                    mtu_from_packet = link.attributes[Link.IFLA_MTU].value
+
+                if mtu_from_packet != cached_mtu.value:
+                    log.debug(f'_NetlinkCache add_link: Cached MTU for {ifname} is an override value, keeping cached value instead of new value, cached mtu: {cached_mtu.value}, new value in packet: {mtu_from_packet}')
+                    link.attributes[Link.IFLA_MTU] = cached_mtu
+                else:
+                    log.debug(f'_NetlinkCache add_link: Cached MTU for {ifname} is an override value but new value in packet is the same as cached value, using value from packet to remove override status, cached / new mtu: {cached_mtu.value}')
+
+        except KeyError:
+            # link is not present in the cache
+            pass
+        except AttributeError:
+            pass
+
         self._link_cache[ifname] = link

         # For each altname, also cache the reference to the `Link` object

Patch 2/2: 0021-bridge-fix-broken-mtu-warning.patch

From: Robin Christ <r.christ@partimus.com>
Date: Wed, 1 Apr 2026 14:39:47 +0200
Subject: bridge: Fix broken MTU warning

Bridge would complain if you specify an MTU... Although it's quite the
opposite: it should inform you that if you don't set an MTU, ifupdown2
will do stupid things (those stupid things are so deeply rooted that we
don't bother to fix them right now...)

---
 ifupdown2/addons/address.py | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/ifupdown2/addons/address.py b/ifupdown2/addons/address.py
index 46226a9..e739af7 100644
--- a/ifupdown2/addons/address.py
+++ b/ifupdown2/addons/address.py
@@ -431,7 +431,8 @@ class address(AddonWithIpBlackList, moduleBase):
                 self.logger.warning("%s: invalid mtu %s: %s" % (ifaceobj.name, mtu_str, str(e)))
                 return False
             return self._check_mtu_config(ifaceobj, mtu_int, ifaceobj_getfunc, syntaxcheck=True)
-        return True
+        else:
+            return self._check_mtu_config_none(ifaceobj, ifaceobj_getfunc, syntaxcheck=True)

     def syntax_check_addr_allowed_on(self, ifaceobj, syntax_check=False):
         if ifaceobj.get_attr_value('address'):
@@ -801,13 +802,7 @@ class address(AddonWithIpBlackList, moduleBase):
         """
         retval = True

-        if (ifaceobj.link_kind & ifaceLinkKind.BRIDGE):
-            if syntaxcheck:
-                self.logger.warning('%s: bridge inherits mtu from its ports. There is no need to assign mtu on a bridge' %ifaceobj.name)
-                retval = False
-            else:
-                self.logger.info('%s: bridge inherits mtu from its ports. There is no need to assign mtu on a bridge' %ifaceobj.name)
-        elif ifaceobj_getfunc:
+        if ifaceobj_getfunc:
             if ((ifaceobj.link_privflags & ifaceLinkPrivFlags.BOND_SLAVE) and
                     ifaceobj.upperifaces):
                 masterobj = ifaceobj_getfunc(ifaceobj.upperifaces[0])
@@ -847,6 +842,17 @@ class address(AddonWithIpBlackList, moduleBase):
                         retval = False
         return retval

+    def _check_mtu_config_none(self, ifaceobj, ifaceobj_getfunc, syntaxcheck=False):
+        retval = True
+        if (ifaceobj.link_kind & ifaceLinkKind.BRIDGE):
+            if syntaxcheck:
+                self.logger.info(f'bridge {ifaceobj.name} does not have mtu assigned, will be set to default mtu from config, NO MTU INHERITANCE FROM ITS PORTS!!')
+            else:
+                self.logger.info(f'bridge {ifaceobj.name} does not have mtu assigned, will be set to default mtu from config, NO MTU INHERITANCE FROM ITS PORTS!!')
+
+        return retval
+
     def _propagate_mtu_to_upper_devs(self, ifaceobj, mtu, ifaceobj_getfunc):
         """
         :param ifaceobj:
@@ -893,11 +899,14 @@ class address(AddonWithIpBlackList, moduleBase):
                 self._propagate_mtu_to_upper_devs(ifaceobj, mtu, ifaceobj_getfunc)
         return

-    def _process_mtu_config_mtu_none(self, ifaceobj):
+    def _process_mtu_config_mtu_none(self, ifaceobj, ifaceobj_getfunc):

         if (ifaceobj.link_privflags & ifaceLinkPrivFlags.MGMT_INTF):
             return

+        if not self._check_mtu_config_none(ifaceobj, ifaceobj_getfunc):
+            return
+
         cached_link_mtu = self.cache.get_link_mtu(ifaceobj.name)

         if ifaceobj.link_kind:
@@ -1126,7 +1135,7 @@ class address(AddonWithIpBlackList, moduleBase):
                 self._process_mtu_config_mtu_valid(ifaceobj, ifaceobj_getfunc, mtu_int)
             else:
-                self._process_mtu_config_mtu_none(ifaceobj)
+                self._process_mtu_config_mtu_none(ifaceobj, ifaceobj_getfunc)

     def up_ipv6_addrgen(self, ifaceobj):
         user_configured_ipv6_addrgen = ifaceobj.get_attr_value_first('ipv6-addrgen')