[PATCH frr 1/2] frr: backport #21166 and #21958, fixing EVPN IPv4 routes with IPv6 nexthop

From: Gabriel Goller <g.goller@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH frr 1/2] frr: backport #21166 and #21958, fixing EVPN IPv4 routes with IPv6 nexthop
Date: Fri, 15 May 2026 17:23:56 +0200
Message-ID: <20260515152400.726794-2-g.goller@proxmox.com>
In-Reply-To: <20260515152400.726794-1-g.goller@proxmox.com>

When leaking EVPN routes with an IPv4 prefix and an IPv6 nexthop (e.g.
with IPv6 VTEPs), the routes in the destination VRF end up with a
nexthop of 0.0.0.0. This happens because the EVPN import code in bgpd
sets the BGP_ATTR_NEXT_HOP flag, so only the legacy IPv4 NEXT_HOP
attribute is checked and not the MP-BGP (multiprotocol) nexthop, which
is the one that holds the IPv6 address. As a result, bgpd sends an
all-zeros IPv4 nexthop to zebra. An earlier upstream change [1] cleaned
up parts of the BGP_ATTR_NEXT_HOP handling but did not fix this
particular case, so the remaining fix was submitted upstream as well
[2].

[1]: https://github.com/FRRouting/frr/pull/21166
[2]: https://github.com/FRRouting/frr/pull/21958

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
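Note for reviewers: the failure mode described above can be reproduced
in a small stand-alone model. The sketch below is only an illustration
under stated assumptions. Every toy_* name and the struct layout are
hypothetical and are not FRR code; the check in toy_nh_is_ipv6() is
modeled on how the commit messages describe BGP_ATTR_NEXTHOP_AFI_IP6(),
not copied from bgpd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical nexthop lengths, standing in for BGP_ATTR_NHLEN_*. */
enum { TOY_NHLEN_IPV4 = 4, TOY_NHLEN_IPV6 = 16 };

/* Hypothetical, heavily simplified stand-in for bgpd's struct attr. */
struct toy_attr {
	bool legacy_nh_flag;   /* models the BGP_ATTR_NEXT_HOP flag */
	uint8_t mp_nh_len;     /* models mp_nexthop_len */
	uint8_t legacy_nh[4];  /* models attr.nexthop */
	uint8_t mp_nh_v4[4];   /* models mp_nexthop_global_in */
	uint8_t mp_nh_v6[16];  /* models mp_nexthop_global */
};

/* Assumed check: the nexthop counts as IPv6 only if the legacy flag is
 * clear and the MP nexthop length says IPv6. */
static bool toy_nh_is_ipv6(const struct toy_attr *a)
{
	assert(a != NULL);
	return !a->legacy_nh_flag && a->mp_nh_len == TOY_NHLEN_IPV6;
}

/* Old import behaviour for IPv4 prefixes: always copy the IPv4 MP field
 * (all-zeros for an IPv6 VTEP) and always set the legacy flag. */
static void toy_import_old(struct toy_attr *a)
{
	memcpy(a->legacy_nh, a->mp_nh_v4, sizeof(a->legacy_nh));
	a->legacy_nh_flag = true;
}

/* New import behaviour: leave routes with an IPv6 MP nexthop untouched. */
static void toy_import_new(struct toy_attr *a)
{
	if (a->mp_nh_len == TOY_NHLEN_IPV6)
		return;
	memcpy(a->legacy_nh, a->mp_nh_v4, sizeof(a->legacy_nh));
	a->legacy_nh_flag = true;
}

/* Nexthop-change detection for IPv4 prefixes, following the shape of the
 * new comparison: family switch, then compare the matching field. */
static bool toy_nh_changed(const struct toy_attr *old_attr,
			   const struct toy_attr *new_attr)
{
	bool old_v6 = old_attr->mp_nh_len == TOY_NHLEN_IPV6;
	bool new_v6 = new_attr->mp_nh_len == TOY_NHLEN_IPV6;

	if (old_v6 != new_v6)
		return true;
	if (old_v6)
		return memcmp(old_attr->mp_nh_v6, new_attr->mp_nh_v6,
			      sizeof(old_attr->mp_nh_v6)) != 0;
	return memcmp(old_attr->legacy_nh, new_attr->legacy_nh,
		      sizeof(old_attr->legacy_nh)) != 0;
}
```

In this model, running toy_import_old() on an attribute with an IPv6 MP
nexthop makes toy_nh_is_ipv6() return false while legacy_nh stays
all-zeros, which corresponds to the 0.0.0.0 nexthop seen in the
destination VRF. With toy_import_new() the IPv6 nexthop is kept.
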
 debian/patches/series                         |   2 +
 ...R_NEXT_HOP-flag-handling-in-bgp_attr.patch | 149 ++++++++++++++++++
 ...v6-nexthops-when-importing-EVPN-IPv4.patch | 107 +++++++++++++
 3 files changed, 258 insertions(+)
 create mode 100644 debian/patches/upstream/0005-bgpd-fix-BGP_ATTR_NEXT_HOP-flag-handling-in-bgp_attr.patch
 create mode 100644 debian/patches/upstream/0006-bgpd-preserve-IPv6-nexthops-when-importing-EVPN-IPv4.patch

diff --git a/debian/patches/series b/debian/patches/series
index fed297922f2d..51b5fe2f29f4 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -2,6 +2,8 @@ upstream/0001-bgpd-fix-EVPN-VRF-auto-RT-deletion-collision.patch
 upstream/0002-bgpd-export-local-rt2-mac-ip-entries-to-unicast.patch
 upstream/0003-bgpd-do-not-add-local-vtep-as-remote.patch
 upstream/0004-topotests-add-bgp_evpn_rt2_local_leak.patch
+upstream/0005-bgpd-fix-BGP_ATTR_NEXT_HOP-flag-handling-in-bgp_attr.patch
+upstream/0006-bgpd-preserve-IPv6-nexthops-when-importing-EVPN-IPv4.patch
 pve/0001-enable-bgp-bfd-daemons.patch
 pve/0002-bgpd-add-an-option-for-RT-auto-derivation-to-force-A.patch
 pve/0003-tests-add-bgp-evpn-autort-test.patch
diff --git a/debian/patches/upstream/0005-bgpd-fix-BGP_ATTR_NEXT_HOP-flag-handling-in-bgp_attr.patch b/debian/patches/upstream/0005-bgpd-fix-BGP_ATTR_NEXT_HOP-flag-handling-in-bgp_attr.patch
new file mode 100644
index 000000000000..290afb92eb17
--- /dev/null
+++ b/debian/patches/upstream/0005-bgpd-fix-BGP_ATTR_NEXT_HOP-flag-handling-in-bgp_attr.patch
@@ -0,0 +1,149 @@
+From c8bf184649db651a7cee4e9509ace06aeabf79d6 Mon Sep 17 00:00:00 2001
+From: Enke Chen <enchen@paloaltonetworks.com>
+Date: Mon, 16 Mar 2026 14:27:07 -0700
+Subject: [PATCH 1/2] bgpd: fix BGP_ATTR_NEXT_HOP flag handling in
+ bgp_attr_default_set()
+
+bgp_attr_default_set() unconditionally set the BGP_ATTR_NEXT_HOP flag
+on every call, even though attr.nexthop (the IPv4 address field) is
+all-zeros and not yet assigned. This flag is used by
+BGP_ATTR_NEXTHOP_AFI_IP6 to distinguish IPv4 vs IPv6 nexthops, so
+having it always set caused non-IPv4 routes to be misidentified.
+Callers were working around this by manually calling UNSET_FLAG for
+non-IPv4 cases, which was fragile and error-prone.
+
+Remove the unconditional flag from bgp_attr_default_set() and enforce
+the invariant that BGP_ATTR_NEXT_HOP is set where and only where
+attr.nexthop is assigned as an actual IPv4 nexthop:
+
+- bgp_evpn_vtep_ip_to_attr_nh(): set the flag alongside attr->nexthop
+  for IPv4 VTEPs, covering all EVPN call sites through this helper.
+- bgp_evpn_fill_rmac_nh_to_attr(): set the flag in both IPv4 nexthop
+  assignment paths (anycast-IP and PIP).
+- bgp_static_update(): set the flag explicitly for AFI_IP; remove the
+  UNSET_FLAG workaround from the else branch.
+- bgp_redistribute_add(): set the flag in all three IPv4 nexthop cases
+  (NEXTHOP_TYPE_IFINDEX/IPv4, NEXTHOP_TYPE_IPV4[_IFINDEX],
+  NEXTHOP_TYPE_BLACKHOLE/IPv4); remove the blanket UNSET_FLAG workaround.
+- subgroup_default_originate(): set the flag for the IPv4
+  default-originate path.
+
+Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
+---
+ bgpd/bgp_attr.c       |  1 -
+ bgpd/bgp_evpn.c       |  2 ++
+ bgpd/bgp_evpn_mh.c    |  1 +
+ bgpd/bgp_route.c      | 10 ++++++----
+ bgpd/bgp_updgrp_adv.c |  2 ++
+ 5 files changed, 11 insertions(+), 5 deletions(-)
+
+diff --git a/bgpd/bgp_attr.c b/bgpd/bgp_attr.c
+index 09d4948ab866..afe23a07a054 100644
+--- a/bgpd/bgp_attr.c
++++ b/bgpd/bgp_attr.c
+@@ -1396,7 +1396,6 @@ struct attr *bgp_attr_default_set(struct attr *attr, struct bgp *bgp,
+ 	attr->tag = 0;
+ 	attr->label_index = BGP_INVALID_LABEL_INDEX;
+ 	attr->label = MPLS_INVALID_LABEL;
+-	bgp_attr_set(attr, BGP_ATTR_NEXT_HOP);
+ 	attr->mp_nexthop_len = IPV6_MAX_BYTELEN;
+ 	attr->local_pref = bgp->default_local_pref;
+ 
+diff --git a/bgpd/bgp_evpn.c b/bgpd/bgp_evpn.c
+index 8e3569b54419..0b0eb1d623cd 100644
+--- a/bgpd/bgp_evpn.c
++++ b/bgpd/bgp_evpn.c
+@@ -8429,6 +8429,7 @@ void bgp_evpn_fill_rmac_nh_to_attr(struct bgp *bgp_vrf, struct attr *attr, struc
+ 			attr->nexthop = bgp_vrf->originator_ip.ipaddr_v4;
+ 			attr->mp_nexthop_global_in = bgp_vrf->originator_ip.ipaddr_v4;
+ 			attr->mp_nexthop_len = BGP_ATTR_NHLEN_IPV4;
++			bgp_attr_set(attr, BGP_ATTR_NEXT_HOP);
+ 		} else {
+ 			IPV6_ADDR_COPY(&attr->mp_nexthop_global, &bgp_vrf->originator_ip.ipaddr_v6);
+ 			attr->mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL;
+@@ -8449,6 +8450,7 @@ void bgp_evpn_fill_rmac_nh_to_attr(struct bgp *bgp_vrf, struct attr *attr, struc
+ 			if (bgp_vrf->evpn_info->pip_ip.ipaddr_v4.s_addr != INADDR_ANY) {
+ 				attr->nexthop = bgp_vrf->evpn_info->pip_ip.ipaddr_v4;
+ 				attr->mp_nexthop_global_in = bgp_vrf->evpn_info->pip_ip.ipaddr_v4;
++				bgp_attr_set(attr, BGP_ATTR_NEXT_HOP);
+ 			} else if (bgp_vrf->evpn_info->pip_ip.ipaddr_v4.s_addr == INADDR_ANY) {
+ 				if (bgp_debug_zebra(NULL))
+ 					zlog_debug("VRF %s evp %pFX advertise-pip primary ip is not configured",
+diff --git a/bgpd/bgp_evpn_mh.c b/bgpd/bgp_evpn_mh.c
+index f79b65c69a97..fa3e60dde759 100644
+--- a/bgpd/bgp_evpn_mh.c
++++ b/bgpd/bgp_evpn_mh.c
+@@ -100,6 +100,7 @@ void bgp_evpn_vtep_ip_to_attr_nh(const struct ipaddr *vtep_ip, struct attr *attr
+ 		attr->nexthop = vtep_ip->ipaddr_v4;
+ 		attr->mp_nexthop_global_in = vtep_ip->ipaddr_v4;
+ 		attr->mp_nexthop_len = BGP_ATTR_NHLEN_IPV4;
++		bgp_attr_set(attr, BGP_ATTR_NEXT_HOP);
+ 	} else if (IS_IPADDR_V6(vtep_ip)) {
+ 		IPV6_ADDR_COPY(&attr->mp_nexthop_global, &vtep_ip->ipaddr_v6);
+ 		attr->mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL;
+diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
+index ddbd24d9aafb..0a7fb527dce7 100644
+--- a/bgpd/bgp_route.c
++++ b/bgpd/bgp_route.c
+@@ -8267,8 +8267,10 @@ void bgp_static_update(struct bgp *bgp, const struct prefix *p,
+ 
+ 	bgp_attr_default_set(&attr, bgp, BGP_ORIGIN_IGP);
+ 
+-	if (afi == AFI_IP)
++	if (afi == AFI_IP) {
+ 		nh_length = IPV4_MAX_BYTELEN;
++		bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
++	}
+ 
+ 	/* NHC */
+ 	nhc = XCALLOC(MTYPE_BGP_NHC, sizeof(struct bgp_nhc));
+@@ -10575,9 +10577,6 @@ void bgp_redistribute_add(struct bgp *bgp, struct prefix *p,
+ 	 */
+ 	assert(attr.aspath);
+ 
+-	if (p->family == AF_INET6)
+-		UNSET_FLAG(attr.flag, ATTR_FLAG_BIT(BGP_ATTR_NEXT_HOP));
+-
+ 	switch (nhtype) {
+ 	case NEXTHOP_TYPE_IFINDEX:
+ 		switch (p->family) {
+@@ -10585,6 +10584,7 @@ void bgp_redistribute_add(struct bgp *bgp, struct prefix *p,
+ 			attr.nexthop.s_addr = INADDR_ANY;
+ 			attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV4;
+ 			attr.mp_nexthop_global_in.s_addr = INADDR_ANY;
++			bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
+ 			break;
+ 		case AF_INET6:
+ 			memset(&attr.mp_nexthop_global, 0,
+@@ -10598,6 +10598,7 @@ void bgp_redistribute_add(struct bgp *bgp, struct prefix *p,
+ 		attr.nexthop = nexthop->ipv4;
+ 		attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV4;
+ 		attr.mp_nexthop_global_in = nexthop->ipv4;
++		bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
+ 		break;
+ 	case NEXTHOP_TYPE_IPV6:
+ 	case NEXTHOP_TYPE_IPV6_IFINDEX:
+@@ -10610,6 +10611,7 @@ void bgp_redistribute_add(struct bgp *bgp, struct prefix *p,
+ 			attr.nexthop.s_addr = INADDR_ANY;
+ 			attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV4;
+ 			attr.mp_nexthop_global_in.s_addr = INADDR_ANY;
++			bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
+ 			break;
+ 		case AF_INET6:
+ 			memset(&attr.mp_nexthop_global, 0,
+diff --git a/bgpd/bgp_updgrp_adv.c b/bgpd/bgp_updgrp_adv.c
+index 07b532e2324c..9947948c995e 100644
+--- a/bgpd/bgp_updgrp_adv.c
++++ b/bgpd/bgp_updgrp_adv.c
+@@ -987,6 +987,8 @@ void subgroup_default_originate(struct update_subgroup *subgrp, bool withdraw)
+ 		if (peer->shared_network
+ 		    && !IN6_IS_ADDR_UNSPECIFIED(&peer->nexthop.v6_local))
+ 			attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL;
++	} else {
++		bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
+ 	}
+ 
+ 	if (peer->default_rmap[afi][safi].name) {
+-- 
+2.47.3
+
diff --git a/debian/patches/upstream/0006-bgpd-preserve-IPv6-nexthops-when-importing-EVPN-IPv4.patch b/debian/patches/upstream/0006-bgpd-preserve-IPv6-nexthops-when-importing-EVPN-IPv4.patch
new file mode 100644
index 000000000000..ffa78c29f30d
--- /dev/null
+++ b/debian/patches/upstream/0006-bgpd-preserve-IPv6-nexthops-when-importing-EVPN-IPv4.patch
@@ -0,0 +1,107 @@
+From f512fac23368ddc1be4cdab95601410d907a8e92 Mon Sep 17 00:00:00 2001
+From: Gabriel Goller <g.goller@proxmox.com>
+Date: Fri, 15 May 2026 16:04:25 +0200
+Subject: [PATCH 2/2] bgpd: preserve IPv6 nexthops when importing EVPN IPv4
+ routes
+
+When importing an EVPN route into a VRF unicast table,
+install_evpn_route_entry_in_vrf() converted every imported IPv4 route
+into a route with the legacy IPv4 NEXT_HOP attribute set:
+
+    attr.nexthop = attr.mp_nexthop_global_in;
+    SET_FLAG(attr.flag, ATTR_FLAG_BIT(BGP_ATTR_NEXT_HOP));
+
+This is only valid when the imported EVPN nexthop is IPv4. With IPv6
+VTEPs we can get IPv4 prefixes with IPv6 nexthops and the route already
+has the real nexthop encoded in the MP nexthop fields. In that case
+setting BGP_ATTR_NEXT_HOP creates an inconsistent attribute: the route
+has an IPv6 MP nexthop, but is also marked as having a classic IPv4
+NEXT_HOP.
+
+This breaks code that uses BGP_ATTR_NEXTHOP_AFI_IP6() to determine
+the nexthop address family. BGP_ATTR_NEXTHOP_AFI_IP6() sees
+BGP_ATTR_NEXT_HOP and thinks this is a IPv4 route with a IPv4 nexthop
+even though mp_nexthop_len indicates an IPv6 nexthop. The result is that
+VRF import/leak drops the IPv6 nexthop and sends a 0.0.0.0 nexthop to
+zebra.
+
+Fix this by only assigning attr.nexthop and setting BGP_ATTR_NEXT_HOP
+when the imported EVPN route does not have an IPv6 MP nexthop. EVPN IPv4
+routes with IPv6 nexthops are left as MP-nexthop routes.
+
+This is related to the previous BGP_ATTR_NEXT_HOP cleanup (#21166) and
+was probably missed there.
+
+Also make the nexthop-change detection handle this case by comparing the
+MP IPv6 nexthop for IPv4 routes that carry one.
+
+Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
+---
+ bgpd/bgp_evpn.c | 36 ++++++++++++++++++++++--------------
+ 1 file changed, 22 insertions(+), 14 deletions(-)
+
+diff --git a/bgpd/bgp_evpn.c b/bgpd/bgp_evpn.c
+index 0b0eb1d623cd..b1de8948e4d3 100644
+--- a/bgpd/bgp_evpn.c
++++ b/bgpd/bgp_evpn.c
+@@ -3215,11 +3215,11 @@ static int install_evpn_route_entry_in_vrf(struct bgp *bgp_vrf,
+ 	} else
+ 		return 0;
+ 
+-	/* EVPN routes currently only support a IPv4 next hop which corresponds
+-	 * to the remote VTEP. When importing into a VRF, if it is IPv6 host
+-	 * or prefix route, we have to convert the next hop to an IPv4-mapped
+-	 * address for the rest of the code to flow through. In the case of IPv4,
+-	 * make sure to set the flag for next hop attribute.
++	/* EVPN routes may carry either an IPv4 or IPv6 next hop corresponding
++	 * to the remote VTEP. When importing into a VRF, IPv6 host/prefix routes
++	 * use an IPv6 MP nexthop. For IPv4 routes, set the legacy NEXT_HOP
++	 * attribute only when the imported nexthop is IPv4; IPv6 nexthops are
++	 * preserved as MP nexthops.
+ 	 */
+ 	attr = *parent_pi->attr;
+ 	bre = bgp_attr_get_evpn_overlay(&attr);
+@@ -3245,11 +3245,13 @@ static int install_evpn_route_entry_in_vrf(struct bgp *bgp_vrf,
+ 			SET_FLAG(attr.flag, ATTR_FLAG_BIT(BGP_ATTR_NEXT_HOP));
+ 		}
+ 	} else {
+-		if (afi == AFI_IP6)
++		if (afi == AFI_IP) {
++			if (!BGP_ATTR_MP_NEXTHOP_LEN_IP6(&attr)) {
++				attr.nexthop = attr.mp_nexthop_global_in;
++				bgp_attr_set(&attr, BGP_ATTR_NEXT_HOP);
++			}
++		} else if (afi == AFI_IP6) {
+ 			evpn_convert_nexthop_to_ipv6(&attr);
+-		else {
+-			attr.nexthop = attr.mp_nexthop_global_in;
+-			SET_FLAG(attr.flag, ATTR_FLAG_BIT(BGP_ATTR_NEXT_HOP));
+ 		}
+ 	}
+ 
+@@ -3287,11 +3289,17 @@ static int install_evpn_route_entry_in_vrf(struct bgp *bgp_vrf,
+ 			bgp_path_info_restore(dest, pi);
+ 
+ 		/* Mark if nexthop has changed. */
+-		if ((afi == AFI_IP
+-		     && !IPV4_ADDR_SAME(&pi->attr->nexthop, &attr_new->nexthop))
+-		    || (afi == AFI_IP6
+-			&& !IPV6_ADDR_SAME(&pi->attr->mp_nexthop_global,
+-					   &attr_new->mp_nexthop_global)))
++		if (afi == AFI_IP) {
++			bool old_v6nh = BGP_ATTR_MP_NEXTHOP_LEN_IP6(pi->attr);
++			bool new_v6nh = BGP_ATTR_MP_NEXTHOP_LEN_IP6(attr_new);
++
++			if (old_v6nh != new_v6nh ||
++			    (old_v6nh && !IPV6_ADDR_SAME(&pi->attr->mp_nexthop_global,
++							 &attr_new->mp_nexthop_global)) ||
++			    (!old_v6nh && !IPV4_ADDR_SAME(&pi->attr->nexthop, &attr_new->nexthop)))
++				SET_FLAG(pi->flags, BGP_PATH_IGP_CHANGED);
++		} else if (afi == AFI_IP6 && !IPV6_ADDR_SAME(&pi->attr->mp_nexthop_global,
++							     &attr_new->mp_nexthop_global))
+ 			SET_FLAG(pi->flags, BGP_PATH_IGP_CHANGED);
+ 
+ 		bgp_path_info_set_flag(dest, pi, BGP_PATH_ATTR_CHANGED);
+-- 
+2.47.3
+
-- 
2.47.3
