From: Gabriel Goller <g.goller@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH frr 1/2] frr: backport #21166 and #21958, fixing EVPN IPv4 routes with IPv6 nexthop
Date: Fri, 15 May 2026 17:23:56 +0200
Message-ID: <20260515152400.726794-2-g.goller@proxmox.com>
In-Reply-To: <20260515152400.726794-1-g.goller@proxmox.com>

When leaking EVPN routes with an IPv4 prefix and an IPv6 nexthop (e.g.
on IPv6 VTEPs), the routes in the destination VRF end up with a nexthop
of 0.0.0.0. This is because the EVPN AF in bgpd sets the
BGP_ATTR_NEXT_HOP flag, which means only the legacy BGP next-hop
attribute is checked and not the BGP MP (multiprotocol) next-hop, which
is the one that contains the IPv6 address. bgpd therefore ends up with
an all-zero IPv4 address and sends that to zebra. A previous upstream
change [1] cleaned up part of the flag handling but did not fix this
particular issue, so the remaining fix has been submitted upstream as
well [2].

[1]: https://github.com/FRRouting/frr/pull/21166
[2]: https://github.com/FRRouting/frr/pull/21958

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
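For readers unfamiliar with the flag semantics, here is a minimal,
self-contained sketch of the failure mode described above. It is not
bgpd code: struct sketch_attr and all sketch_* names are hypothetical
stand-ins, and the classification rule is only a simplified model of
what BGP_ATTR_NEXTHOP_AFI_IP6() is described as doing in the backported
commit messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for bgpd's struct attr. */
struct sketch_attr {
	bool legacy_nh_flag;    /* models the BGP_ATTR_NEXT_HOP flag bit */
	uint32_t nexthop_v4;    /* models attr.nexthop; 0 is 0.0.0.0 */
	uint32_t mp_nexthop_v4; /* models attr.mp_nexthop_global_in */
	uint8_t mp_nexthop_len; /* 4 = IPv4, 16 = global IPv6 MP nexthop */
};

enum sketch_nh_family { SKETCH_NH_V4, SKETCH_NH_V6 };

static bool sketch_mp_len_is_v6(const struct sketch_attr *a)
{
	return a->mp_nexthop_len == 16;
}

/* Assumed rule: the legacy flag takes precedence over the MP length. */
static enum sketch_nh_family sketch_classify_nh(const struct sketch_attr *a)
{
	if (a->legacy_nh_flag)
		return SKETCH_NH_V4;
	return sketch_mp_len_is_v6(a) ? SKETCH_NH_V6 : SKETCH_NH_V4;
}

/* Import of an IPv4 prefix into a VRF; 'fixed' selects the new behaviour. */
static void sketch_import_ipv4(struct sketch_attr *a, bool fixed)
{
	if (fixed && sketch_mp_len_is_v6(a))
		return; /* leave the route as an MP-nexthop route */
	a->nexthop_v4 = a->mp_nexthop_v4;
	a->legacy_nh_flag = true;
}

/* Attribute as it might look for a route learned from an IPv6 VTEP. */
static struct sketch_attr sketch_v6_vtep_attr(void)
{
	struct sketch_attr a = {
		.legacy_nh_flag = false,
		.nexthop_v4 = 0,
		.mp_nexthop_v4 = 0,
		.mp_nexthop_len = 16,
	};
	return a;
}

/* Walks through both behaviours for the same input attribute. */
static void sketch_flag_selfcheck(void)
{
	struct sketch_attr before = sketch_v6_vtep_attr();
	struct sketch_attr after = sketch_v6_vtep_attr();

	/* Old behaviour: flag always set, nexthop degrades to 0.0.0.0. */
	sketch_import_ipv4(&before, false);
	assert(sketch_classify_nh(&before) == SKETCH_NH_V4);
	assert(before.nexthop_v4 == 0);

	/* Fixed behaviour: the IPv6 MP nexthop stays authoritative. */
	sketch_import_ipv4(&after, true);
	assert(sketch_classify_nh(&after) == SKETCH_NH_V6);
}
```

Calling sketch_flag_selfcheck() from a test main() exercises both paths;
the point is only that setting the flag on a route whose MP nexthop is
IPv6 hides the real nexthop from any later family check.
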
 debian/patches/series                         |   2 +
 ...R_NEXT_HOP-flag-handling-in-bgp_attr.patch | 149 ++++++++++++++++++
 ...v6-nexthops-when-importing-EVPN-IPv4.patch | 107 +++++++++++++
 3 files changed, 258 insertions(+)
 create mode 100644 debian/patches/upstream/0005-bgpd-fix-BGP_ATTR_NEXT_HOP-flag-handling-in-bgp_attr.patch
 create mode 100644 debian/patches/upstream/0006-bgpd-preserve-IPv6-nexthops-when-importing-EVPN-IPv4.patch

diff --git a/debian/patches/series b/debian/patches/series
index fed297922f2d..51b5fe2f29f4 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -2,6 +2,8 @@ upstream/0001-bgpd-fix-EVPN-VRF-auto-RT-deletion-collision.patch
 upstream/0002-bgpd-export-local-rt2-mac-ip-entries-to-unicast.patch
 upstream/0003-bgpd-do-not-add-local-vtep-as-remote.patch
 upstream/0004-topotests-add-bgp_evpn_rt2_local_leak.patch
+upstream/0005-bgpd-fix-BGP_ATTR_NEXT_HOP-flag-handling-in-bgp_attr.patch
+upstream/0006-bgpd-preserve-IPv6-nexthops-when-importing-EVPN-IPv4.patch
 pve/0001-enable-bgp-bfd-daemons.patch
 pve/0002-bgpd-add-an-option-for-RT-auto-derivation-to-force-A.patch
 pve/0003-tests-add-bgp-evpn-autort-test.patch
diff --git a/debian/patches/upstream/0005-bgpd-fix-BGP_ATTR_NEXT_HOP-flag-handling-in-bgp_attr.patch b/debian/patches/upstream/0005-bgpd-fix-BGP_ATTR_NEXT_HOP-flag-handling-in-bgp_attr.patch
new file mode 100644
index 000000000000..290afb92eb17
--- /dev/null
+++ b/debian/patches/upstream/0005-bgpd-fix-BGP_ATTR_NEXT_HOP-flag-handling-in-bgp_attr.patch
@@ -0,0 +1,149 @@
+From c8bf184649db651a7cee4e9509ace06aeabf79d6 Mon Sep 17 00:00:00 2001
+From: Enke Chen <enchen@paloaltonetworks.com>
+Date: Mon, 16 Mar 2026 14:27:07 -0700
+Subject: [PATCH 1/2] bgpd: fix BGP_ATTR_NEXT_HOP flag handling in
+ bgp_attr_default_set()
+
+bgp_attr_default_set() unconditionally set the BGP_ATTR_NEXT_HOP flag
+on every call, even though attr.nexthop (the IPv4 address field) is
+all-zeros and not yet assigned. This flag is used by
+BGP_ATTR_NEXTHOP_AFI_IP6 to distinguish IPv4 vs IPv6 nexthops, so
+having it always set caused non-IPv4 routes to be misidentified.
+Callers were working around this by manually calling UNSET_FLAG for
+non-IPv4 cases, which was fragile and error-prone.
+
+Remove the unconditional flag from bgp_attr_default_set() and enforce
+the invariant that BGP_ATTR_NEXT_HOP is set where and only where
+attr.nexthop is assigned as an actual IPv4 nexthop:
+
+- bgp_evpn_vtep_ip_to_attr_nh(): set the flag alongside attr->nexthop
+  for IPv4 VTEPs, covering all EVPN call sites through this helper.
+- bgp_evpn_fill_rmac_nh_to_attr(): set the flag in both IPv4 nexthop
+  assignment paths (anycast-IP and PIP).
+- bgp_static_update(): set the flag explicitly for AFI_IP; remove the
+  UNSET_FLAG workaround from the else branch.
+- bgp_redistribute_add(): set the flag in all three IPv4 nexthop cases
+  (NEXTHOP_TYPE_IFINDEX/IPv4, NEXTHOP_TYPE_IPV4[_IFINDEX],
+  NEXTHOP_TYPE_BLACKHOLE/IPv4); remove the blanket UNSET_FLAG workaround.
+- subgroup_default_originate(): set the flag for the IPv4
+  default-originate path.
+
+Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
+---
+ bgpd/bgp_attr.c       |  1 -
+ bgpd/bgp_evpn.c       |  2 ++
+ bgpd/bgp_evpn_mh.c    |  1 +
+ bgpd/bgp_route.c      | 10 ++++++----
+ bgpd/bgp_updgrp_adv.c |  2 ++
+ 5 files changed, 11 insertions(+), 5 deletions(-)
+
+diff --git a/bgpd/bgp_attr.c b/bgpd/bgp_attr.c
+index 09d4948ab866..afe23a07a054 100644
+--- a/bgpd/bgp_attr.c
++++ b/bgpd/bgp_attr.c
+@@ -1396,7 +1396,6 @@ struct attr *bgp_attr_default_set(struct attr *attr, struct bgp *bgp,
+ 	attr->tag = 0;
+ 	attr->label_index = BGP_INVALID_LABEL_INDEX;
+ 	attr->label = MPLS_INVALID_LABEL;
+-	bgp_attr_set(attr, BGP_ATTR_NEXT_HOP);
+ 	attr->mp_nexthop_len = IPV6_MAX_BYTELEN;
+ 	attr->local_pref = bgp->default_local_pref;
+ 
+diff --git a/bgpd/bgp_evpn.c b/bgpd/bgp_evpn.c
+index 8e3569b54419..0b0eb1d623cd 100644
+--- a/bgpd/bgp_evpn.c
++++ b/bgpd/bgp_evpn.c
+@@ -8429,6 +8429,7 @@ void bgp_evpn_fill_rmac_nh_to_attr(struct bgp *bgp_vrf, struct attr *attr, struc
+ 			attr->nexthop = bgp_vrf->originator_ip.ipaddr_v4;
+ 			attr->mp_nexthop_global_in = bgp_vrf->originator_ip.ipaddr_v4;
+ 			attr->mp_nexthop_len = BGP_ATTR_NHLEN_IPV4;
++			bgp_attr_set(attr, BGP_ATTR_NEXT_HOP);
+ 		} else {
+ 			IPV6_ADDR_COPY(&attr->mp_nexthop_global, &bgp_vrf->originator_ip.ipaddr_v6);
+ 			attr->mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL;
+@@ -8449,6 +8450,7 @@ void bgp_evpn_fill_rmac_nh_to_attr(struct bgp *bgp_vrf, struct attr *attr, struc
+ 			if (bgp_vrf->evpn_info->pip_ip.ipaddr_v4.s_addr != INADDR_ANY) {
+ 				attr->nexthop = bgp_vrf->evpn_info->pip_ip.ipaddr_v4;
+ 				attr->mp_nexthop_global_in = bgp_vrf->evpn_info->pip_ip.ipaddr_v4;
++				bgp_attr_set(attr, BGP_ATTR_NEXT_HOP);
+ 			} else if (bgp_vrf->evpn_info->pip_ip.ipaddr_v4.s_addr == INADDR_ANY) {
+ 				if (bgp_debug_zebra(NULL))
+ 					zlog_debug("VRF %s evp %pFX advertise-pip primary ip is not configured",
+diff --git a/bgpd/bgp_evpn_mh.c b/bgpd/bgp_evpn_mh.c
+index f79b65c69a97..fa3e60dde759 100644
+--- a/bgpd/bgp_evpn_mh.c
++++ b/bgpd/bgp_evpn_mh.c
+@@ -100,6 +100,7 @@ void bgp_evpn_vtep_ip_to_attr_nh(const struct ipaddr *vtep_ip, struct attr *attr
+ 		attr->nexthop = vtep_ip->ipaddr_v4;
+ 		attr->mp_nexthop_global_in = vtep_ip->ipaddr_v4;
+ 		attr->mp_nexthop_len = BGP_ATTR_NHLEN_IPV4;
++		bgp_attr_set(attr, BGP_ATTR_NEXT_HOP);
+ 	} else if (IS_IPADDR_V6(vtep_ip)) {
+ 		IPV6_ADDR_COPY(&attr->mp_nexthop_global, &vtep_ip->ipaddr_v6);
+ 		attr->mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL;
+diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
+index ddbd24d9aafb..0a7fb527dce7 100644
+--- a/bgpd/bgp_route.c
++++ b/bgpd/bgp_route.c
+@@ -8267,8 +8267,10 @@ void bgp_static_update(struct bgp *bgp, const struct prefix *p,
+ 
+ 	bgp_attr_default_set(&attr, bgp, BGP_ORIGIN_IGP);
+ 
+-	if (afi == AFI_IP)
++	if (afi == AFI_IP) {
+ 		nh_length = IPV4_MAX_BYTELEN;
++		bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
++	}
+ 
+ 	/* NHC */
+ 	nhc = XCALLOC(MTYPE_BGP_NHC, sizeof(struct bgp_nhc));
+@@ -10575,9 +10577,6 @@ void bgp_redistribute_add(struct bgp *bgp, struct prefix *p,
+ 	 */
+ 	assert(attr.aspath);
+ 
+-	if (p->family == AF_INET6)
+-		UNSET_FLAG(attr.flag, ATTR_FLAG_BIT(BGP_ATTR_NEXT_HOP));
+-
+ 	switch (nhtype) {
+ 	case NEXTHOP_TYPE_IFINDEX:
+ 		switch (p->family) {
+@@ -10585,6 +10584,7 @@ void bgp_redistribute_add(struct bgp *bgp, struct prefix *p,
+ 			attr.nexthop.s_addr = INADDR_ANY;
+ 			attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV4;
+ 			attr.mp_nexthop_global_in.s_addr = INADDR_ANY;
++			bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
+ 			break;
+ 		case AF_INET6:
+ 			memset(&attr.mp_nexthop_global, 0,
+@@ -10598,6 +10598,7 @@ void bgp_redistribute_add(struct bgp *bgp, struct prefix *p,
+ 		attr.nexthop = nexthop->ipv4;
+ 		attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV4;
+ 		attr.mp_nexthop_global_in = nexthop->ipv4;
++		bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
+ 		break;
+ 	case NEXTHOP_TYPE_IPV6:
+ 	case NEXTHOP_TYPE_IPV6_IFINDEX:
+@@ -10610,6 +10611,7 @@ void bgp_redistribute_add(struct bgp *bgp, struct prefix *p,
+ 			attr.nexthop.s_addr = INADDR_ANY;
+ 			attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV4;
+ 			attr.mp_nexthop_global_in.s_addr = INADDR_ANY;
++			bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
+ 			break;
+ 		case AF_INET6:
+ 			memset(&attr.mp_nexthop_global, 0,
+diff --git a/bgpd/bgp_updgrp_adv.c b/bgpd/bgp_updgrp_adv.c
+index 07b532e2324c..9947948c995e 100644
+--- a/bgpd/bgp_updgrp_adv.c
++++ b/bgpd/bgp_updgrp_adv.c
+@@ -987,6 +987,8 @@ void subgroup_default_originate(struct update_subgroup *subgrp, bool withdraw)
+ 		if (peer->shared_network
+ 		    && !IN6_IS_ADDR_UNSPECIFIED(&peer->nexthop.v6_local))
+ 			attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL;
++	} else {
++		bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
+ 	}
+ 
+ 	if (peer->default_rmap[afi][safi].name) {
+-- 
+2.47.3
+
diff --git a/debian/patches/upstream/0006-bgpd-preserve-IPv6-nexthops-when-importing-EVPN-IPv4.patch b/debian/patches/upstream/0006-bgpd-preserve-IPv6-nexthops-when-importing-EVPN-IPv4.patch
new file mode 100644
index 000000000000..ffa78c29f30d
--- /dev/null
+++ b/debian/patches/upstream/0006-bgpd-preserve-IPv6-nexthops-when-importing-EVPN-IPv4.patch
@@ -0,0 +1,107 @@
+From f512fac23368ddc1be4cdab95601410d907a8e92 Mon Sep 17 00:00:00 2001
+From: Gabriel Goller <g.goller@proxmox.com>
+Date: Fri, 15 May 2026 16:04:25 +0200
+Subject: [PATCH 2/2] bgpd: preserve IPv6 nexthops when importing EVPN IPv4
+ routes
+
+When importing an EVPN route into a VRF unicast table,
+install_evpn_route_entry_in_vrf() converted every imported IPv4 route
+into a route with the legacy IPv4 NEXT_HOP attribute set:
+
+    attr.nexthop = attr.mp_nexthop_global_in;
+    SET_FLAG(attr.flag, ATTR_FLAG_BIT(BGP_ATTR_NEXT_HOP));
+
+This is only valid when the imported EVPN nexthop is IPv4. With IPv6
+VTEPs we can get IPv4 prefixes with IPv6 nexthops and the route already
+has the real nexthop encoded in the MP nexthop fields. In that case
+setting BGP_ATTR_NEXT_HOP creates an inconsistent attribute: the route
+has an IPv6 MP nexthop, but is also marked as having a classic IPv4
+NEXT_HOP.
+
+This breaks code that uses BGP_ATTR_NEXTHOP_AFI_IP6() to determine
+the nexthop address family. BGP_ATTR_NEXTHOP_AFI_IP6() sees
+BGP_ATTR_NEXT_HOP and thinks this is an IPv4 route with an IPv4 nexthop
+even though mp_nexthop_len indicates an IPv6 nexthop. The result is that
+VRF import/leak drops the IPv6 nexthop and sends a 0.0.0.0 nexthop to
+zebra.
+
+Fix this by only assigning attr.nexthop and setting BGP_ATTR_NEXT_HOP
+when the imported EVPN route does not have an IPv6 MP nexthop. EVPN IPv4
+routes with IPv6 nexthops are left as MP-nexthop routes.
+
+This is related to the previous BGP_ATTR_NEXT_HOP cleanup (#21166) and
+was probably missed there.
+
+Also make the nexthop-change detection handle this case by comparing the
+MP IPv6 nexthop for IPv4 routes that carry one.
+
+Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
+---
+ bgpd/bgp_evpn.c | 36 ++++++++++++++++++++++--------------
+ 1 file changed, 22 insertions(+), 14 deletions(-)
+
+diff --git a/bgpd/bgp_evpn.c b/bgpd/bgp_evpn.c
+index 0b0eb1d623cd..b1de8948e4d3 100644
+--- a/bgpd/bgp_evpn.c
++++ b/bgpd/bgp_evpn.c
+@@ -3215,11 +3215,11 @@ static int install_evpn_route_entry_in_vrf(struct bgp *bgp_vrf,
+ 	} else
+ 		return 0;
+ 
+-	/* EVPN routes currently only support a IPv4 next hop which corresponds
+-	 * to the remote VTEP. When importing into a VRF, if it is IPv6 host
+-	 * or prefix route, we have to convert the next hop to an IPv4-mapped
+-	 * address for the rest of the code to flow through. In the case of IPv4,
+-	 * make sure to set the flag for next hop attribute.
++	/* EVPN routes may carry either an IPv4 or IPv6 next hop corresponding
++	 * to the remote VTEP. When importing into a VRF, IPv6 host/prefix routes
++	 * use an IPv6 MP nexthop. For IPv4 routes, set the legacy NEXT_HOP
++	 * attribute only when the imported nexthop is IPv4; IPv6 nexthops are
++	 * preserved as MP nexthops.
+ 	 */
+ 	attr = *parent_pi->attr;
+ 	bre = bgp_attr_get_evpn_overlay(&attr);
+@@ -3245,11 +3245,13 @@ static int install_evpn_route_entry_in_vrf(struct bgp *bgp_vrf,
+ 			SET_FLAG(attr.flag, ATTR_FLAG_BIT(BGP_ATTR_NEXT_HOP));
+ 		}
+ 	} else {
+-		if (afi == AFI_IP6)
++		if (afi == AFI_IP) {
++			if (!BGP_ATTR_MP_NEXTHOP_LEN_IP6(&attr)) {
++				attr.nexthop = attr.mp_nexthop_global_in;
++				bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
++			}
++		} else if (afi == AFI_IP6) {
+ 			evpn_convert_nexthop_to_ipv6(&attr);
+-		else {
+-			attr.nexthop = attr.mp_nexthop_global_in;
+-			SET_FLAG(attr.flag, ATTR_FLAG_BIT(BGP_ATTR_NEXT_HOP));
+ 		}
+ 	}
+ 
+@@ -3287,11 +3289,17 @@ static int install_evpn_route_entry_in_vrf(struct bgp *bgp_vrf,
+ 			bgp_path_info_restore(dest, pi);
+ 
+ 		/* Mark if nexthop has changed. */
+-		if ((afi == AFI_IP
+-		     && !IPV4_ADDR_SAME(&pi->attr->nexthop, &attr_new->nexthop))
+-		    || (afi == AFI_IP6
+-			&& !IPV6_ADDR_SAME(&pi->attr->mp_nexthop_global,
+-					   &attr_new->mp_nexthop_global)))
++		if (afi == AFI_IP) {
++			bool old_v6nh = BGP_ATTR_MP_NEXTHOP_LEN_IP6(pi->attr);
++			bool new_v6nh = BGP_ATTR_MP_NEXTHOP_LEN_IP6(attr_new);
++
++			if (old_v6nh != new_v6nh ||
++			    (old_v6nh && !IPV6_ADDR_SAME(&pi->attr->mp_nexthop_global,
++							 &attr_new->mp_nexthop_global)) ||
++			    (!old_v6nh && !IPV4_ADDR_SAME(&pi->attr->nexthop, &attr_new->nexthop)))
++				SET_FLAG(pi->flags, BGP_PATH_IGP_CHANGED);
++		} else if (afi == AFI_IP6 && !IPV6_ADDR_SAME(&pi->attr->mp_nexthop_global,
++							     &attr_new->mp_nexthop_global))
+ 			SET_FLAG(pi->flags, BGP_PATH_IGP_CHANGED);
+ 
+ 		bgp_path_info_set_flag(dest, pi, BGP_PATH_ATTR_CHANGED);
+-- 
+2.47.3
+
-- 
2.47.3

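The nexthop-change detection for IPv4 routes in the second backported
patch can be read as a three-way comparison. The following is a
standalone sketch of that logic under simplifying assumptions: struct
sketch_nh and the sketch_* functions are hypothetical and do not exist
in bgpd, and an MP nexthop length of 16 stands in for whatever
BGP_ATTR_MP_NEXTHOP_LEN_IP6() accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced view of the nexthop fields of a path. */
struct sketch_nh {
	uint8_t mp_len; /* 16 models an IPv6 MP nexthop, otherwise IPv4 */
	uint32_t v4;    /* models attr->nexthop */
	uint8_t v6[16]; /* models attr->mp_nexthop_global */
};

/* Returns true when an IPv4 route's nexthop should count as changed. */
static bool sketch_ipv4_route_nh_changed(const struct sketch_nh *old_nh,
					 const struct sketch_nh *new_nh)
{
	bool old_v6 = old_nh->mp_len == 16;
	bool new_v6 = new_nh->mp_len == 16;

	if (old_v6 != new_v6)
		return true; /* nexthop address family switched */
	if (old_v6)
		return memcmp(old_nh->v6, new_nh->v6, sizeof(old_nh->v6)) != 0;
	return old_nh->v4 != new_nh->v4;
}

/* Exercises the three branches with made-up addresses. */
static void sketch_change_selfcheck(void)
{
	struct sketch_nh v4_a = { .mp_len = 4, .v4 = 0x0a000001 };
	struct sketch_nh v4_b = { .mp_len = 4, .v4 = 0x0a000002 };
	struct sketch_nh v6_a = { .mp_len = 16, .v6 = { 0x20, 0x01, [15] = 1 } };
	struct sketch_nh v6_b = { .mp_len = 16, .v6 = { 0x20, 0x01, [15] = 2 } };

	assert(!sketch_ipv4_route_nh_changed(&v4_a, &v4_a));
	assert(sketch_ipv4_route_nh_changed(&v4_a, &v4_b));
	assert(!sketch_ipv4_route_nh_changed(&v6_a, &v6_a));
	assert(sketch_ipv4_route_nh_changed(&v6_a, &v6_b));
	assert(sketch_ipv4_route_nh_changed(&v4_a, &v6_a));
}
```

Comparing only the legacy IPv4 field, as the removed code did for
AFI_IP, would report "unchanged" for v6_a versus v6_b in this model,
since both leave the IPv4 field at zero.
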




Thread overview: 5+ messages
2026-05-15 15:23 [PATCH frr 0/2] Fix leaked EVPN routes having wrong nexthop on IPv4 via IPv6 routes Gabriel Goller
2026-05-15 15:23 ` Gabriel Goller [this message]
2026-05-15 16:06   ` [PATCH frr 1/2] frr: backport #21166 and #21958, fixing EVPN IPv4 routes with IPv6 nexthop Gabriel Goller
2026-05-15 15:23 ` [PATCH frr 2/2] bump to version 10.6.1-1+pve2 Gabriel Goller
2026-05-16 23:59 ` [PATCH frr 0/2] Fix leaked EVPN routes having wrong nexthop on IPv4 via IPv6 routes Thomas Lamprecht
