public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Robin Christ <r.christ@partimus.com>
To: "Stefan Hanreich" <s.hanreich@proxmox.com>,
	<pve-devel@lists.proxmox.com>,
	"Christoph Heiss" <c.heiss@proxmox.com>
Subject: Re: ifupdown2: Severe race conditions and architectural issues due to async usage of netlink
Date: Thu, 2 Apr 2026 17:05:12 +0200	[thread overview]
Message-ID: <DHIRFK8FXOSN.1FGM3V4XV5HIG@partimus.com> (raw)
In-Reply-To: <fa6bd524-0f0d-4e9c-87cb-92365007149b@proxmox.com>

>> 5. At some point, receive RTM_NEWNETCONF
>>
>> This will update the cache, and now the MTU is correct.
>> [snip]
>
> :/ - maybe @Christoph can take a look as well?
>

I have to stand corrected though: Step 5 is not RTM_NEWNETCONF (this one will probably be received too, but it's not really used as it seems)
Step 5 is triggered upon receiving RTM_NEWLINK (you receive a RTM_NEWLINK after every configuration change), but the conclusion remains the same:

You may (or rather: will) end up with periods where stale config is in the cache

> yeah, that should probably be fixed - or at least the warning removed,
> because that is *not* the case - the MTU gets set and it's explicitly
> logged.

I did fix the warning in my patches, which I'm attaching for reference.

> Unless I'm misremembering, only the MTU of the destination port should
> matter in the forwarding decision.

That's what I was trying to say. I doesn't matter for the bridged traffic itself while it's in the bridge, but the MTU of the interfaces in the bridge matters.
There may still be some weird side effects, check Ivan's blog post [1] for some inspiration..

>> Wouldn't it make sense to set the bridge MTU, unless overridden manually, to the lowest MTU of any enslaved interface that is **specified in the ifupdown2 config** (to avoid issues with VM interfaces, that have a lower MTU, lowering the overall bridge MTU)?
>
> Is this really the case - do you have a reproducer? Couldn't get the
> bridge MTU to get lowered when plugging a tap interface. [...]

Sorry, I should've been clearer: I was not referring to the bridge MTU getting lowered automatically, e.g. by the Linux kernel.

What I meant was: IF automatic bridge MTU is implemented in ifupdown2, it should NOT simply fetch the enslaved interfaces and 
iterate through them. This is because if you do ifreload and already have VMs running, ifupdown could fetch VM interfaces with a lower MTU and thus accidently lower the bridge mtu.
Therefore, IF automatic bridge MTU is implemented in ifupdown2, it should only iterate through the slave interfaces that are configured in the ifupdown2 config.

> In any case, the MTU should just be set everywhere explicitly in the
> configuration to avoid this altogether...

This is the best case, but I think sane default behaviour for users who do not do this would make sense.

>> Given the overall bad shape of ifupdown2 - What is the way forward? Will ifupdown2 be replaced in the not-so-distant future?
>
> We've been discussing this internally and we're also watching the
> discussion upstream on what network manager will be used for forky. But
> aside from that, I cannot give you any concrete plans atm.
>

Alright, thanks for the info!

I am attaching my patches from yesterday as a basis for discussion, please let me know what you think!
But those are not exactly in a shape where I'd consider sending them to the mailing list as a pull request...

[1] https://blog.ipspace.net/2025/03/linux-bridge-mtu-hell/


Patches:

Patch 1/2: 0020-netlink-implement-dirty-workaround-for-mtu-cache-architecture.patch
From: Robin Christ <r.christ@partimus.com>
Date: Wed, 1 Apr 2026 14:07:30 +0200
Subject: netlink: Implement dirty workaround for MTU cache architectural
 issue

---
 ifupdown2/lib/iproute2.py |  3 +++
 ifupdown2/lib/nlcache.py  | 40 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/ifupdown2/lib/iproute2.py b/ifupdown2/lib/iproute2.py
index 894afd3..8760d3f 100644
--- a/ifupdown2/lib/iproute2.py
+++ b/ifupdown2/lib/iproute2.py
@@ -550,6 +550,9 @@ class IPRoute2(Cache, Requirements):
         if is_link_up:
             self.link_down_force(ifname)
 
+        # iproute2 should use netlink in the background - so this SHOULD WORK even
+        # when the RTM_NEWLINK notifying a potential change in MTU earlier
+        # has not arrived yet
         self.__execute_or_batch(
             utils.ip_cmd,
             "link set dev %s addrgenmode %s" % (ifname, Link.ifla_inet6_addr_gen_mode_dict.get(addrgen))
diff --git a/ifupdown2/lib/nlcache.py b/ifupdown2/lib/nlcache.py
index 2d37443..a8e11ff 100644
--- a/ifupdown2/lib/nlcache.py
+++ b/ifupdown2/lib/nlcache.py
@@ -405,7 +405,21 @@ class _NetlinkCache:
         """
         try:
             with self._cache_lock:
-                self._link_cache[ifname].attributes[Link.IFLA_MTU].value = mtu
+                # This function has an issue:
+                # Sometimes when it's used, we only have a barebones version of the link in the cache
+                # because the netlink notification with all the attributes, aka RTM_NEWLINK,
+                # has not been received yet
+                # This barebones version of the link in the cache
+                # does not have the Link.IFLA_MTU attribute, hence setting this value will fail
+                # 
+                # this is a deeper architectural issue with how the netlink cache is used.
+
+                # Dirty workaround for deeper architectural issues..
+                if Link.IFLA_MTU not in self._link_cache[ifname].attributes:
+                    self._link_cache[ifname].add_attribute(Link.IFLA_MTU, mtu)
+                    self._link_cache[ifname].attributes[Link.IFLA_MTU].is_override_value = True
+                else:
+                    self._link_cache[ifname].attributes[Link.IFLA_MTU].value = mtu
         except Exception:
             pass
 
@@ -1355,6 +1369,30 @@ class _NetlinkCache:
                 # here just in case?
                 pass
 
+            # Dirty Workaround: Merge the MTU
+            # if there is already a MTU value with is_override_value set,
+            # this means override_link_mtu was called for this link and we should keep
+            # the override value instead of the new one coming from the kernel, which will be stale
+            try:
+                cached_mtu = self._link_cache[ifname].attributes.get(Link.IFLA_MTU)
+
+                if cached_mtu and ("is_override_value" in cached_mtu.__dict__) and cached_mtu.is_override_value:
+                    mtu_from_packet = None
+                    if link.attributes.get(Link.IFLA_MTU):
+                        mtu_from_packet = link.attributes[Link.IFLA_MTU].value
+
+                    if mtu_from_packet != cached_mtu.value:
+                        log.debug(f'_NetlinkCache add_link: Cached MTU for {ifname} is an override value, keeping cached value instead of new value, cached mtu: {cached_mtu.value}, new value in packet: {mtu_from_packet}')
+                        link.attributes[Link.IFLA_MTU] = cached_mtu
+                    else:
+                        log.debug(f'_NetlinkCache add_link: Cached MTU for {ifname} is an override value but new value in packet is the same as cached value, using value from packet to remove override status, cached / new mtu: {cached_mtu.value}')
+
+            except KeyError:
+                # link is not present in the cache
+                pass
+            except AttributeError:
+                pass
+
             self._link_cache[ifname] = link
 
             # For each altname, also cache the reference to the `Link` object


Patch 2/2: 0021-bridge-fix-broken-mtu-warning.patch

From: Robin Christ <r.christ@partimus.com>
Date: Wed, 1 Apr 2026 14:39:47 +0200
Subject: bridge: Fix broken MTU warning

Bridge would complain if you specify an MTU... Although it's quite the opposite:
It should inform you that if you don't set an MTU, ifupdown2 will do stupid things
(those stupid things are so deeply rooted that we don't bother to fix them right now...)
---
 ifupdown2/addons/address.py | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/ifupdown2/addons/address.py b/ifupdown2/addons/address.py
index 46226a9..e739af7 100644
--- a/ifupdown2/addons/address.py
+++ b/ifupdown2/addons/address.py
@@ -431,7 +431,8 @@ class address(AddonWithIpBlackList, moduleBase):
                 self.logger.warning("%s: invalid mtu %s: %s" % (ifaceobj.name, mtu_str, str(e)))
                 return False
             return self._check_mtu_config(ifaceobj, mtu_int, ifaceobj_getfunc, syntaxcheck=True)
-        return True
+        else:
+            return self._check_mtu_config_none(ifaceobj, ifaceobj_getfunc, syntaxcheck=True)
 
     def syntax_check_addr_allowed_on(self, ifaceobj, syntax_check=False):
         if ifaceobj.get_attr_value('address'):
@@ -801,13 +802,7 @@ class address(AddonWithIpBlackList, moduleBase):
         """
 
         retval = True
-        if (ifaceobj.link_kind & ifaceLinkKind.BRIDGE):
-            if syntaxcheck:
-                self.logger.warning('%s: bridge inherits mtu from its ports. There is no need to assign mtu on a bridge' %ifaceobj.name)
-                retval = False
-            else:
-                self.logger.info('%s: bridge inherits mtu from its ports. There is no need to assign mtu on a bridge' %ifaceobj.name)
-        elif ifaceobj_getfunc:
+        if ifaceobj_getfunc:
             if ((ifaceobj.link_privflags & ifaceLinkPrivFlags.BOND_SLAVE) and
                 ifaceobj.upperifaces):
                 masterobj = ifaceobj_getfunc(ifaceobj.upperifaces[0])
@@ -847,6 +842,17 @@ class address(AddonWithIpBlackList, moduleBase):
                 retval = False
         return retval
 
+
+    def _check_mtu_config_none(self, ifaceobj, ifaceobj_getfunc, syntaxcheck=False):
+        retval = True
+        if (ifaceobj.link_kind & ifaceLinkKind.BRIDGE):
+            if syntaxcheck:
+                self.logger.info(f'bridge {ifaceobj.name} does not have mtu assigned, will be set to default mtu from config, NO MTU INHERITANCE FROM ITS PORTS!!')
+            else:
+                self.logger.info(f'bridge {ifaceobj.name} does not have mtu assigned, will be set to default mtu from config, NO MTU INHERITANCE FROM ITS PORTS!!')
+
+        return retval
+
     def _propagate_mtu_to_upper_devs(self, ifaceobj, mtu, ifaceobj_getfunc):
         """
         :param ifaceobj:
@@ -893,11 +899,14 @@ class address(AddonWithIpBlackList, moduleBase):
                 self._propagate_mtu_to_upper_devs(ifaceobj, mtu, ifaceobj_getfunc)
         return
 
-    def _process_mtu_config_mtu_none(self, ifaceobj):
+    def _process_mtu_config_mtu_none(self, ifaceobj, ifaceobj_getfunc):
 
         if (ifaceobj.link_privflags & ifaceLinkPrivFlags.MGMT_INTF):
             return
 
+        if not self._check_mtu_config_none(ifaceobj, ifaceobj_getfunc):
+            return
+
         cached_link_mtu = self.cache.get_link_mtu(ifaceobj.name)
 
         if ifaceobj.link_kind:
@@ -1126,7 +1135,7 @@ class address(AddonWithIpBlackList, moduleBase):
 
             self._process_mtu_config_mtu_valid(ifaceobj, ifaceobj_getfunc, mtu_int)
         else:
-            self._process_mtu_config_mtu_none(ifaceobj)
+            self._process_mtu_config_mtu_none(ifaceobj, ifaceobj_getfunc)
 
     def up_ipv6_addrgen(self, ifaceobj):
         user_configured_ipv6_addrgen = ifaceobj.get_attr_value_first('ipv6-addrgen')





      reply	other threads:[~2026-04-02 15:05 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-31 21:15 Robin Christ
2026-04-02  9:16 ` Stefan Hanreich
2026-04-02 15:05   ` Robin Christ [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DHIRFK8FXOSN.1FGM3V4XV5HIG@partimus.com \
    --to=r.christ@partimus.com \
    --cc=c.heiss@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    --cc=s.hanreich@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal