From: Gabriel Goller <g.goller@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Mon, 15 Sep 2025 20:08:50 +0200
Message-ID: <20250915180851.423438-1-g.goller@proxmox.com>
X-Mailer: git-send-email 2.47.3
Subject: [pve-devel] [PATCH] fix: sdn: fabrics: always add node-ip to all fabric interfaces

OpenFabric can usually handle completely unnumbered interfaces, i.e.
interfaces without any ip configured. This works because FRR searches
all the other fabric interfaces for a source ip and selects the first
one. We always set the node-ip on the dummy interface, so OpenFabric
"borrows" the dummy interface ip (the node ip) as the source ip on
every fabric interface.

The problem is that this can cause issues with ARP. ARP doesn't know
anything about the fabric and won't look up the routing table (routing
table = layer 3, ARP = layer 2), so it doesn't know which source ip
address to select. It just iterates through all interfaces and selects
the first one with an ip. This is obviously not always correct, so you
get ARP requests like the following:

    who-has 10.0.1.3 tell 172.16.0.1

where 10.0.1.3 is a fabric ip (from a dummy interface) and 172.16.0.1
is an ip from a completely different, unrelated interface.

Now the tricky thing is that a normal ping will still work, because
ping looks up the ip in the routing table, which has an entry like:

    10.0.1.3 nhid 27 via 10.0.1.2 dev ens21 proto openfabric src 10.0.1.1 metric 20 onlink

The "src 10.0.1.1" tells ARP to set the source ip to 10.0.1.1,
generating an ARP message like this:

    who-has 10.0.1.3 tell 10.0.1.2

which is correct and will get an answer.
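To recap the setup that causes this: with the current code an
unnumbered fabric interface is rendered without any address, roughly
like this (interface names and addresses are only examples, and the
dummy stanza is a sketch of the node-ip setup described above, not
verbatim generator output):

    auto dummy_fabric0
    iface dummy_fabric0
    	address 10.0.1.1/32

    auto ens21
    iface ens21
    	ip-forward 1

So the fabric interface (ens21 here) has no address of its own, and
both FRR and ARP have to borrow a source ip from somewhere else.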
This is implemented in net/ipv4/arp.c:333: if the source ip is
RTN_LOCAL (meaning it is on this host), we search for a matching local
ip (this is the case when pinging one node from another directly). If
it isn't, we iterate over all interfaces in search of another ip
address. When forwarding a packet, the second option comes into play:
when e.g. forwarding a packet from node1 to node3 over node2, node2
sees a non-local source address on the packet and searches for another
address (by iterating over all interfaces, so it sometimes picks the
wrong one). This means it sends out ARP requests to find the mac
address of node3 using a wrong source address.

We could set arp_announce=1, which would make ARP find the correct
local source ip when forwarding, but then we can't ping the other
nodes anymore because the check at net/ipv4/arp.c:363 fails (the ip
addresses aren't in the same subnet, so they're not "local").

The easiest way is to just set the node ip address on every fabric
interface, which makes ARP select the local interface ip as the
source.

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
 pve-rs/src/bindings/sdn/fabrics.rs | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/pve-rs/src/bindings/sdn/fabrics.rs b/pve-rs/src/bindings/sdn/fabrics.rs
index 587b1d68c8fb..9d5fa6c53d70 100644
--- a/pve-rs/src/bindings/sdn/fabrics.rs
+++ b/pve-rs/src/bindings/sdn/fabrics.rs
@@ -544,12 +544,21 @@ pub mod pve_rs_sdn_fabrics {
                 write!(interfaces, "{interface}")?;
             }
 
-            // If not ip is configured, add auto and empty iface to bring interface up
+            // If no ip is configured, add auto and iface with node ip to bring interface up
+            // OpenFabric doesn't really need an ip on the interface, but the problem
+            // is that arp can't tell which source address to use in some cases, so
+            // it's better if we set the node address on all the fabric interfaces.
             if let (None, None) = (interface.ip(), interface.ip6()) {
+                let cidr = Cidr::from(if let Some(ip) = node.ip() {
+                    IpAddr::from(ip)
+                } else if let Some(ip) = node.ip6() {
+                    IpAddr::from(ip)
+                } else {
+                    anyhow::bail!("there has to be a ipv4 or ipv6 node address");
+                });
+                let interface = render_interface(interface.name(), cidr, false)?;
                 writeln!(interfaces)?;
-                writeln!(interfaces, "auto {}", interface.name())?;
-                writeln!(interfaces, "iface {}", interface.name())?;
-                writeln!(interfaces, "\tip-forward 1")?;
+                write!(interfaces, "{interface}")?;
             }
         }
     }
-- 
2.47.3


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel