From: Stefan Hanreich <s.hanreich@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: Re: ifupdown2: Severe race conditions and architectural issues due to async usage of netlink
Date: Thu, 2 Apr 2026 11:16:27 +0200	[thread overview]
Message-ID: <fa6bd524-0f0d-4e9c-87cb-92365007149b@proxmox.com> (raw)
In-Reply-To: <DHHA24SZLBP5.26PD4HSWU3HZY@rchrist.io>

On 3/31/26 11:15 PM, Robin Christ wrote:
> Hey folks,
> 
> While trying to debug a seemingly harmless issue "info: <vxlanif>: ipv6 addrgen is disabled on device with MTU lower than 1280 (current mtu 0): cannot set addrgen off", I realized ifupdown2 has severe race conditions and architectural issues.
> 
> The issue mentioned above happens due to the following sequence:
> 
> 1. Enqueue interface creation, async!
> Put a barebones version of the interface into the cache to signal that the interface creation has been started and make the interface generally known.
> 
> 2. Enqueue MTU change, async!
> address.py "self.process_mtu(ifaceobj, ifaceobj_getfunc)" is called immediately after
> The RTM_NEWLINK most likely has **not yet** arrived
> 
> This will ultimately call "link_set_mtu(self, ifname, mtu)" in nlcache.py, which in turn calls "self.cache.override_link_mtu(ifname, mtu)".
> This call will fail:
> self._link_cache[ifname].attributes[Link.IFLA_MTU].value = mtu
> 
> attributes[Link.IFLA_MTU] is **not yet** present, because the cache only has the barebones version, which does not have this attribute
> 
> 3. Immediately call self.up_ipv6_addrgen(ifaceobj)
> 
> This will cause the abovementioned error, as it tries to read the MTU from the interface, which most likely is **still not present** because the RTM_NEWLINK most likely has **not yet** arrived. It thus fails, getting the "default" value of 0 back from get_link_mtu.
> 
> 
> 4. At some point, receive RTM_NEWLINK
> 
> This will update the cache, but with a **stale** MTU value. That value may be correct at the time the RTM_NEWLINK message is received, but it is not the intended or final MTU, so any other method reading this MTU will now get a stale value (just as bad as getting NO value).
> 
> 5. At some point, receive RTM_NEWNETCONF
> 
> This will update the cache, and now the MTU is correct.
> 
> As you can see, we have lots of race conditions here due to the underlying architectural issue, which is that the intended interface state is held in the netlink cache.
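The failure mode in steps 1-3 can be modeled in a few lines. This is a toy sketch with made-up names (ToyCache, add_barebones_link), not actual ifupdown2 code: the cache entry created at enqueue time has no IFLA_MTU attribute yet, so both the synchronous override and the subsequent read misbehave.

```python
class Link:
    IFLA_MTU = 4  # netlink attribute id for the MTU


class ToyCache:
    def __init__(self):
        self._link_cache = {}

    def add_barebones_link(self, ifname):
        # step 1: creation enqueued; only a stub entry is cached,
        # attributes arrive later via RTM_NEWLINK
        self._link_cache[ifname] = {"attributes": {}}

    def override_link_mtu(self, ifname, mtu):
        # mirrors the failure: setting the value only works if
        # IFLA_MTU was already populated by an RTM_NEWLINK
        attrs = self._link_cache[ifname]["attributes"]
        if Link.IFLA_MTU not in attrs:
            return False  # step 2: the override is silently lost
        attrs[Link.IFLA_MTU] = mtu
        return True

    def get_link_mtu(self, ifname):
        # returns the "default" 0 when the attribute is absent,
        # which is what triggers the addrgen warning in step 3
        return self._link_cache[ifname]["attributes"].get(Link.IFLA_MTU, 0)


cache = ToyCache()
cache.add_barebones_link("vxlan0")
cache.override_link_mtu("vxlan0", 1550)  # returns False, override lost
print(cache.get_link_mtu("vxlan0"))      # -> 0, "MTU lower than 1280" fires
```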

:/ - maybe @Christoph can take a look as well?

> Even if you patch override_link_mtu to have a conditional self._link_cache[ifname].add_attribute(Link.IFLA_MTU, mtu), you still have this window between steps 4 and 5 where you will have a stale MTU value in the cache.
> 
> I am considering a shitfix for up_ipv6_addrgen, or specifically link_set_ipv6_addrgen, which just adds an optional "intended_mtu" argument. But this only solves the issue for this specific function call in this specific sequence. There are several other places which use "get_link_mtu()".
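For what it's worth, the shape of that workaround might look roughly like this (hypothetical signature and names, not the real nlcache.py API): the caller passes the MTU it just requested, so the addrgen check does not have to trust the possibly stale cache.

```python
def link_set_ipv6_addrgen(ifname, addrgen_off, cached_mtu, intended_mtu=None):
    # Prefer the MTU the caller just requested over the cached value,
    # which may still be the 0 "default" if no RTM_NEWLINK arrived yet.
    mtu = intended_mtu if intended_mtu is not None else cached_mtu
    if addrgen_off and mtu < 1280:
        # today this path fires spuriously when cached_mtu is still 0
        return "ipv6 addrgen is disabled on device with MTU lower than 1280"
    return "ok"


# cache not yet populated -> cached_mtu == 0
print(link_set_ipv6_addrgen("vxlan0", True, cached_mtu=0))                     # spurious warning
print(link_set_ipv6_addrgen("vxlan0", True, cached_mtu=0, intended_mtu=1550))  # -> ok
```

As noted above, this only papers over one call site; every other consumer of get_link_mtu() would need the same treatment.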
> 
> 
> Additionally, the bridge MTU logic appears to be a bit inconsistent (broken). Commit a3df9e6930b2000d0ca104a317214f65ab94ed15 ("addons: address: mtu: set bridge mtu with policy default") resets bridge MTU to default unless explicitly set... But when you explicitly set the bridge MTU, you get a nice "bridge inherits mtu from its ports. There is no need to assign mtu on a bridge" warning (or syntax validation error).

yeah, that should probably be fixed - or at least the warning removed,
because that is *not* the case - the MTU gets set and it's explicitly
logged.

> As far as I'm aware, bridge MTU doesn't have an impact on bridged traffic itself (only the interfaces through which the traffic exits matter), and only matters for locally originated IP traffic through bridge interfaces / SVIs, netfilter stuff (again: interface MTU?) and IGMP / MLD query generation.

Unless I'm misremembering, only the MTU of the destination port should
matter in the forwarding decision.

> BUT it matters for Proxmox, as Proxmox automatically sets the interface MTU based on the bridge.
> 
> Wouldn't it make sense to set the bridge MTU, unless overridden manually, to the lowest MTU of any enslaved interface that is **specified in the ifupdown2 config** (to avoid issues with VM interfaces, that have a lower MTU, lowering the overall bridge MTU)?

Is this really the case - do you have a reproducer? I couldn't get the
bridge MTU to be lowered when plugging in a tap interface. Both
containers and VMs have their interfaces on the bridge with the bridge
MTU; inside, they get the MTU that is set via the MTU parameter on the
network device (either by setting it on the other end of the veth, or
via the VirtIO driver; for everything else the MTU gets forced to the
bridge MTU).

The bridge MTU gets set explicitly by ifupdown2 in any case (if the
logs are to be believed). That would mean the mechanisms in the bridge
driver kick in that prevent any messing with the MTU, *if* the MTU has
been set explicitly on the bridge interface [1].

In any case, the MTU should just be set everywhere explicitly in the
configuration to avoid this altogether...
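Something along these lines in /etc/network/interfaces, i.e. a matching
mtu stanza on the bridge and on every enslaved port (interface names
and values are placeholders for a typical setup):

```
auto eno1
iface eno1 inet manual
	mtu 9000

auto vmbr0
iface vmbr0 inet static
	address 192.0.2.10/24
	bridge-ports eno1
	bridge-stp off
	bridge-fd 0
	mtu 9000
```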

> Given the overall bad shape of ifupdown2 - What is the way forward? Will ifupdown2 be replaced in the not-so-distant future?

We've been discussing this internally and we're also watching the
discussion upstream on what network manager will be used for forky. But
aside from that, I cannot give you any concrete plans atm.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_if.c?h=v7.0-rc6#n518



