From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 15A211FF16B
	for <inbox@lore.proxmox.com>; Thu,  3 Apr 2025 10:30:22 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 992143A063;
	Thu,  3 Apr 2025 10:30:10 +0200 (CEST)
Message-ID: <c3d5e091-7a92-4f2f-be6a-4753b5691492@proxmox.com>
Date: Thu, 3 Apr 2025 10:30:05 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
From: Friedrich Weber <f.weber@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Gabriel Goller <g.goller@proxmox.com>
References: <20250328171340.885413-1-g.goller@proxmox.com>
Content-Language: en-US
In-Reply-To: <20250328171340.885413-1-g.goller@proxmox.com>
X-SPAM-LEVEL: Spam detection results:  0
	AWL 0.010 Adjusted score from AWL reputation of From: address
	BAYES_00 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING 0.1 Missing DMARC policy
	KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
	SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] [PATCH cluster/docs/manager/network/proxmox{, -ve-rs, -firewall, -perl-rs} 00/52] Add SDN Fabrics
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>,
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>,
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

On 28/03/2025 18:12, Gabriel Goller wrote:
> This series allows the user to add fabrics such as OpenFabric and OSPF over
> their clusters.
>
> Overview
> ========
>
> This series allows the user to create routed networks ('fabrics') across their
> clusters, which can be used as the underlay network for a EVPN cluster, or for
> creating Ceph full mesh clusters easily.
>
> This patch series adds the initial support for two routing protocols:
> * OpenFabric
> * OSPF

I tested a bit with packages Gabriel built for me (thanks!), both OSPF and OpenFabric, and also set up a Ceph full mesh over OpenFabric. Overall it looked quite smooth! I didn't notice any major issues, but I have some minor points:

- I think the error message shown when frr+frr-pythontools are not installed looks a bit scary. It's on me for not reading the docs, but it might still be nice to have a friendlier error message in that case :)

- after adding one node, I used the "Add Node" dialog to add another one. Multiple times, I left "Node" at its default, the first node (which I had already defined), while thinking I was configuring the second one, and only noticed when I submitted and got "node already exists". Then, when I changed "Node" to the correct one, I lost my form input :) I understand that we need to reload when changing "Node" (the other node might have other interfaces), but to avoid the above, maybe the dialog could preselect a node that is not yet defined?

- when removing a fabric, the IP addresses defined on the interfaces remain until the next reboot.
  I guess the reason is that ifupdown2 doesn't remove IP addresses when the corresponding stanza vanishes. Not sure if this can be easily fixed -- if not, maybe this would be worth a note in the docs?

- when removing the only fabric and applying, the srvreload task shows a couple of spurious error messages:

  > 2025-04-03 09:35:59,354 ERROR: Filename /etc/frr/frr.conf is an empty file
  > frr reload command fail: command '/usr/lib/frr/frr-reload.py --stdout --reload /etc/frr/frr.conf' failed: exit code 1
  > Restarting frr. at /usr/share/perl5/PVE/Network/SDN/Frr.pm line 74.
  > TASK OK

- regarding the hello/csnp intervals: it would be nice to mention what the default values are. Also, probably not relevant for this patch series, but I wanted to mention it anyway: for running a Ceph full mesh over a fabric, one probably wants to set relatively low values here (as our wiki guide does [3])? If there is ever a guide for setting up a Ceph full mesh over a fabric, it would be nice if it mentioned that.

- I'm not so sure about this, but maybe it would be nice to show the hello/csnp interval columns, which are hidden by default, if I have entered a value there?

- when I remove the hello interval+multiplier and the csnp interval via the GUI, I get the following warnings in the journal:

  > Apr 03 10:20:50 fabric159 pveproxy[9244]: Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.
  > Apr 03 10:21:02 fabric159 pveproxy[9246]: Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.
  > Apr 03 10:21:02 fabric159 pveproxy[9246]: Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.

- after setting up an OSPF fabric in a 3-node full mesh, I couldn't ping the loopback addresses until I rebooted all nodes. I've attached the task logs of the srvreloads and the ospf.cfg below [1].
  After a reboot, the pings work fine. Could it be because an OSPF fabric with the same area existed previously?

- probably a user error, but: after setting up an OpenFabric fabric and rebooting, the routes didn't come up automatically. My openfabric.cfg is in [2]. systemctl status frr shows the following:

  > Apr 03 10:02:20 fabric159 systemd[1]: Started frr.service - FRRouting.
  > Apr 03 10:02:21 fabric159 fabricd[699]: [NBV6R-CM3PT] OpenFabric: Needed to resync LSPDB using CSNP!
  > Apr 03 10:03:48 fabric159 fabricd[699]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
  > Apr 03 10:02:23 fabric160 systemd[1]: Started frr.service - FRRouting.
  > Apr 03 10:02:24 fabric160 fabricd[674]: [MZS0T-YRAMC] OpenFabric: Initial synchronization on ens19 complete.
  > Apr 03 10:03:48 fabric160 fabricd[674]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
  > Apr 03 10:02:19 fabric161 systemd[1]: Started frr.service - FRRouting.
  > Apr 03 10:02:21 fabric161 fabricd[681]: [MZS0T-YRAMC] OpenFabric: Initial synchronization on ens20 complete.
  > Apr 03 10:03:48 fabric161 fabricd[681]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers

  Maybe I'm just too impatient, but restarting frr and waiting for ~30 seconds fixes it.
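Regarding the "Add Node" item above: one way to preselect a node would be to pick the first cluster node that is not yet part of the fabric. A minimal sketch of that logic, written in Python purely for illustration (the real dialog is part of the web UI and would be implemented differently); `default_node` is a hypothetical helper, and it assumes, based on the section IDs in [1] and [2] (e.g. `1234_fabric159`), that node sections are named `<fabric id>_<node name>`:

```python
from typing import Iterable, Optional


def default_node(
    cluster_nodes: Iterable[str], fabric_id: str, section_ids: Iterable[str]
) -> Optional[str]:
    """Return the first cluster node not yet defined in the fabric, or None.

    Hypothetical helper: assumes node sections are named
    "<fabric id>_<node name>", as in the ospf.cfg/openfabric.cfg dumps.
    """
    prefix = f"{fabric_id}_"
    # Node names that already have a section in this fabric's config.
    defined = {sid[len(prefix):] for sid in section_ids if sid.startswith(prefix)}
    for node in cluster_nodes:
        if node not in defined:
            return node
    # Every cluster node is already part of the fabric.
    return None


if __name__ == "__main__":
    nodes = ["fabric159", "fabric160", "fabric161"]
    print(default_node(nodes, "1234", ["1234", "1234_fabric159"]))  # prints fabric160
```

With only `1234_fabric159` defined, this preselects fabric160; once all nodes are defined it returns None, and the dialog would have nothing sensible to preselect.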

[1]

fabric159:

2025-04-03 09:30:06,673  INFO: Called via "Namespace(input=None, reload=True, test=False, debug=False, log_level='info', stdout=True, pathspace=None, filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr', rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)"
2025-04-03 09:30:06,673  INFO: Loading Config object from file /etc/frr/frr.conf
2025-04-03 09:30:06,690  INFO: Loading Config object from vtysh show running
2025-04-03 09:30:06,697  INFO: "frr defaults traditional" cannot be removed
2025-04-03 09:30:06,703  INFO: Executed "ip forwarding"
2025-04-03 09:30:06,709  INFO: Executed "ipv6 forwarding"
2025-04-03 09:30:06,709  INFO: /var/run/frr/reload-B14N3D.txt content ['frr defaults datacenter\n', 'log syslog informational\n', 'router ospf\nexit\n', 'router ospf\n ospf router-id 172.16.0.159\nexit\n', 'interface dummy_1234\nexit\n', 'interface dummy_1234\n ip ospf area 1234\nexit\n', 'interface dummy_1234\n ip ospf passive\nexit\n', 'interface ens19\nexit\n', 'interface ens19\n ip ospf area 1234\nexit\n', 'interface ens20\nexit\n', 'interface ens20\n ip ospf area 1234\nexit\n', 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 'route-map ospf permit 100\nexit\n', 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 'route-map ospf permit 100\n set src 172.16.0.159\nexit\n', 'ip protocol ospf route-map ospf\n', 'line vty\nexit\n']
[1667|mgmtd] sending configuration
[1668|zebra] sending configuration
[1671|ospfd] sending configuration
[1674|bgpd] sending configuration
[1668|zebra] done
[1682|watchfrr] sending configuration
[1684|staticd] sending configuration
[1685|bfdd] sending configuration
Waiting for children to finish applying config...
[1682|watchfrr] done
[1674|bgpd] done
[1684|staticd] done
[1685|bfdd] done
[1667|mgmtd] done
[1671|ospfd] done
2025-04-03 09:30:06,721  INFO: Loading Config object from vtysh show running
2025-04-03 09:30:06,729  INFO: /var/run/frr/reload-UJJQIC.txt content ['line vty\nexit\n', 'frr defaults datacenter\n', 'log syslog informational\n', 'router ospf\nexit\n', 'router ospf\n ospf router-id 172.16.0.159\nexit\n', 'interface dummy_1234\nexit\n', 'interface dummy_1234\n ip ospf area 1234\nexit\n', 'interface dummy_1234\n ip ospf passive\nexit\n', 'interface ens19\nexit\n', 'interface ens19\n ip ospf area 1234\nexit\n', 'interface ens20\nexit\n', 'interface ens20\n ip ospf area 1234\nexit\n', 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 'route-map ospf permit 100\nexit\n', 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 'route-map ospf permit 100\n set src 172.16.0.159\nexit\n', 'ip protocol ospf route-map ospf\n', 'line vty\nexit\n']
[1692|mgmtd] sending configuration
[1693|zebra] sending configuration
[1696|ospfd] sending configuration
[1699|bgpd] sending configuration
[1693|zebra] done
[1707|watchfrr] sending configuration
[1709|staticd] sending configuration
[1710|bfdd] sending configuration
Waiting for children to finish applying config...
[1707|watchfrr] done
[1696|ospfd] done
MGMTD: No changes found to be committed!
[1692|mgmtd] done
[1709|staticd] done
[1699|bgpd] done
[1710|bfdd] done
TASK OK

fabric160:

2025-04-03 09:30:09,972  INFO: Called via "Namespace(input=None, reload=True, test=False, debug=False, log_level='info', stdout=True, pathspace=None, filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr', rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)"
2025-04-03 09:30:09,972  INFO: Loading Config object from file /etc/frr/frr.conf
2025-04-03 09:30:09,985  INFO: Loading Config object from vtysh show running
2025-04-03 09:30:09,992  INFO: "frr defaults traditional" cannot be removed
2025-04-03 09:30:09,998  INFO: Executed "ip forwarding"
2025-04-03 09:30:10,004  INFO: Executed "ipv6 forwarding"
2025-04-03 09:30:10,004  INFO: /var/run/frr/reload-5ATLT2.txt content ['frr defaults datacenter\n', 'log syslog informational\n', 'router ospf\nexit\n', 'router ospf\n ospf router-id 172.16.0.160\nexit\n', 'interface dummy_1234\nexit\n', 'interface dummy_1234\n ip ospf area 1234\nexit\n', 'interface dummy_1234\n ip ospf passive\nexit\n', 'interface ens19\nexit\n', 'interface ens19\n ip ospf area 1234\nexit\n', 'interface ens20\nexit\n', 'interface ens20\n ip ospf area 1234\nexit\n', 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 'route-map ospf permit 100\nexit\n', 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 'route-map ospf permit 100\n set src 172.16.0.160\nexit\n', 'ip protocol ospf route-map ospf\n', 'line vty\nexit\n']
[1699|mgmtd] sending configuration
[1700|zebra] sending configuration
[1703|ospfd] sending configuration
[1706|bgpd] sending configuration
[1700|zebra] done
[1714|watchfrr] sending configuration
[1716|staticd] sending configuration
[1717|bfdd] sending configuration
Waiting for children to finish applying config...
[1714|watchfrr] done
[1716|staticd] done
[1706|bgpd] done
[1717|bfdd] done
[1699|mgmtd] done
[1703|ospfd] done
2025-04-03 09:30:10,016  INFO: Loading Config object from vtysh show running
2025-04-03 09:30:10,023  INFO: /var/run/frr/reload-NFS4UM.txt content ['line vty\nexit\n', 'frr defaults datacenter\n', 'log syslog informational\n', 'router ospf\nexit\n', 'router ospf\n ospf router-id 172.16.0.160\nexit\n', 'interface dummy_1234\nexit\n', 'interface dummy_1234\n ip ospf area 1234\nexit\n', 'interface dummy_1234\n ip ospf passive\nexit\n', 'interface ens19\nexit\n', 'interface ens19\n ip ospf area 1234\nexit\n', 'interface ens20\nexit\n', 'interface ens20\n ip ospf area 1234\nexit\n', 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 'route-map ospf permit 100\nexit\n', 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 'route-map ospf permit 100\n set src 172.16.0.160\nexit\n', 'ip protocol ospf route-map ospf\n', 'line vty\nexit\n']
[1724|mgmtd] sending configuration
[1725|zebra] sending configuration
[1728|ospfd] sending configuration
[1731|bgpd] sending configuration
[1739|watchfrr] sending configuration
[1725|zebra] done
[1741|staticd] sending configuration
[1742|bfdd] sending configuration
Waiting for children to finish applying config...
[1739|watchfrr] done
[1741|staticd] done
[1728|ospfd] done
[1731|bgpd] done
[1742|bfdd] done
MGMTD: No changes found to be committed!
[1724|mgmtd] done
TASK OK

fabric161:

2025-04-03 09:30:08,321  INFO: Called via "Namespace(input=None, reload=True, test=False, debug=False, log_level='info', stdout=True, pathspace=None, filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr', rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)"
2025-04-03 09:30:08,321  INFO: Loading Config object from file /etc/frr/frr.conf
2025-04-03 09:30:08,334  INFO: Loading Config object from vtysh show running
2025-04-03 09:30:08,342  INFO: "frr defaults traditional" cannot be removed
2025-04-03 09:30:08,348  INFO: Executed "ip forwarding"
2025-04-03 09:30:08,354  INFO: Executed "ipv6 forwarding"
2025-04-03 09:30:08,354  INFO: /var/run/frr/reload-PVFBCH.txt content ['frr defaults datacenter\n', 'log syslog informational\n', 'router ospf\nexit\n', 'router ospf\n ospf router-id 172.16.0.161\nexit\n', 'interface dummy_1234\nexit\n', 'interface dummy_1234\n ip ospf area 1234\nexit\n', 'interface dummy_1234\n ip ospf passive\nexit\n', 'interface ens19\nexit\n', 'interface ens19\n ip ospf area 1234\nexit\n', 'interface ens20\nexit\n', 'interface ens20\n ip ospf area 1234\nexit\n', 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 'route-map ospf permit 100\nexit\n', 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 'route-map ospf permit 100\n set src 172.16.0.161\nexit\n', 'ip protocol ospf route-map ospf\n', 'line vty\nexit\n']
[1671|mgmtd] sending configuration
[1672|zebra] sending configuration
[1675|ospfd] sending configuration
[1678|bgpd] sending configuration
[1686|watchfrr] sending configuration
[1688|staticd] sending configuration
[1672|zebra] done
[1689|bfdd] sending configuration
Waiting for children to finish applying config...
[1688|staticd] done
[1686|watchfrr] done
[1689|bfdd] done
[1678|bgpd] done
[1671|mgmtd] done
[1675|ospfd] done
2025-04-03 09:30:08,367  INFO: Loading Config object from vtysh show running
2025-04-03 09:30:08,374  INFO: /var/run/frr/reload-SKOSWJ.txt content ['line vty\nexit\n', 'frr defaults datacenter\n', 'log syslog informational\n', 'router ospf\nexit\n', 'router ospf\n ospf router-id 172.16.0.161\nexit\n', 'interface dummy_1234\nexit\n', 'interface dummy_1234\n ip ospf area 1234\nexit\n', 'interface dummy_1234\n ip ospf passive\nexit\n', 'interface ens19\nexit\n', 'interface ens19\n ip ospf area 1234\nexit\n', 'interface ens20\nexit\n', 'interface ens20\n ip ospf area 1234\nexit\n', 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 'route-map ospf permit 100\nexit\n', 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 'route-map ospf permit 100\n set src 172.16.0.161\nexit\n', 'ip protocol ospf route-map ospf\n', 'line vty\nexit\n']
[1696|mgmtd] sending configuration
[1697|zebra] sending configuration
[1700|ospfd] sending configuration
[1703|bgpd] sending configuration
[1697|zebra] done
[1711|watchfrr] sending configuration
[1713|staticd] sending configuration
Waiting for children to finish applying config...
[1714|bfdd] sending configuration
[1711|watchfrr] done
[1713|staticd] done
[1714|bfdd] done
[1700|ospfd] done
[1703|bgpd] done
MGMTD: No changes found to be committed!
[1696|mgmtd] done
TASK OK

# cat /etc/pve/sdn/fabrics/ospf.cfg
fabric: 1234
        loopback_prefix 172.16.0.0/24

node: 1234_fabric159
        interface name=ens19,ip=172.31.0.159/24
        interface name=ens20,ip=172.31.2.159/24
        router_id 172.16.0.159

node: 1234_fabric160
        interface name=ens19,ip=172.31.0.160/24
        interface name=ens20,ip=172.31.1.160/24
        router_id 172.16.0.160

node: 1234_fabric161
        interface name=ens19,ip=172.31.1.161/24
        interface name=ens20,ip=172.31.2.161/24
        router_id 172.16.0.161

[2]

# cat /etc/pve/sdn/fabrics/openfabric.cfg
fabric: fabric
        hello_interval 2
        loopback_prefix 172.16.0.0/24

node: fabric_fabric159
        interface name=ens19,ip=172.31.0.159/24
        interface name=ens20,ip=172.31.2.159/24
        router_id 172.16.0.159

node: fabric_fabric160
        interface name=ens19,ip=172.31.0.160/24
        interface name=ens20,ip=172.31.1.160/24
        router_id 172.16.0.160

node: fabric_fabric161
        interface name=ens19,ip=172.31.1.161/24
        interface name=ens20,ip=172.31.2.161/24
        router_id 172.16.0.161

[3] https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Routed_Setup_(with_Fallback)

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel