From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 15A211FF16B
	for <inbox@lore.proxmox.com>; Thu,  3 Apr 2025 10:30:22 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 992143A063;
	Thu,  3 Apr 2025 10:30:10 +0200 (CEST)
Message-ID: <c3d5e091-7a92-4f2f-be6a-4753b5691492@proxmox.com>
Date: Thu, 3 Apr 2025 10:30:05 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
From: Friedrich Weber <f.weber@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Gabriel Goller <g.goller@proxmox.com>
References: <20250328171340.885413-1-g.goller@proxmox.com>
Content-Language: en-US
In-Reply-To: <20250328171340.885413-1-g.goller@proxmox.com>
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.010 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] [PATCH cluster/docs/manager/network/proxmox{, -ve-rs,
 -firewall, -perl-rs} 00/52] Add SDN Fabrics
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

On 28/03/2025 18:12, Gabriel Goller wrote:
> This series allows the user to add fabrics such as OpenFabric and OSPF over
> their clusters.
> 
> Overview
> ========
> 
> This series allows the user to create routed networks ('fabrics') across their
> clusters, which can be used as the underlay network for a EVPN cluster, or for
> creating Ceph full mesh clusters easily.
> 
> This patch series adds the initial support for two routing protocols:
> * OpenFabric
> * OSPF

I tested a bit with packages that Gabriel built for me (thanks!),
covering both OSPF and OpenFabric, and also set up a Ceph full mesh over
OpenFabric. Overall it looked quite smooth! I didn't notice any huge
issues, but have some minor points below:

- I think the error message when frr+frr-pythontools are not installed
looked a bit scary. It's on me for not reading the docs, but still, it
might be nice to have a friendlier error message in that case :)

- having already added one node, and then adding another using the "Add
Node" dialog, it happened multiple times that I left "Node" at the
default first node (which I had already defined) while I thought I was
configuring the second one, and only noticed when I submitted and got
"node already exists". Then, when I changed "Node" to the correct
one, I lost my form input :) I understand that we need to reload when
changing "Node" (the other node might have other interfaces), but to
avoid the above, maybe the dialog could preselect a node that is not yet
defined?
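
To illustrate the preselection idea, here is a rough sketch in Python. It
is only meant to show the selection logic, not how the dialog is actually
implemented; the function name and both arguments are made up for this
example.

```python
def pick_default_node(cluster_nodes, defined_nodes):
    """Return the first cluster node that has no entry in the fabric yet.

    Both arguments are hypothetical: cluster_nodes is the ordered list the
    "Node" selector would show, defined_nodes the nodes already in the fabric.
    Returns None if every node is already defined.
    """
    defined = set(defined_nodes)
    for node in cluster_nodes:
        if node not in defined:
            return node
    return None


# With fabric159 already added, the dialog would start on fabric160.
print(pick_default_node(["fabric159", "fabric160", "fabric161"], ["fabric159"]))
```

If all nodes are already part of the fabric, the sketch returns None, in
which case the dialog could fall back to its current default.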

- when removing a fabric, the IP addresses defined on the interfaces
remain until the next reboot. I guess the reason is that ifupdown2
doesn't remove IP addresses when the corresponding stanza vanishes. Not
sure if this can be easily fixed -- if not, maybe this would be worth a
note in the docs?

- when removing the only fabric and applying, the srvreload task has a
couple of spurious error messages:

> 2025-04-03 09:35:59,354   ERROR: Filename /etc/frr/frr.conf is an empty file 
> frr reload command fail: command '/usr/lib/frr/frr-reload.py --stdout --reload /etc/frr/frr.conf' failed: exit code 1
> Restarting frr. at /usr/share/perl5/PVE/Network/SDN/Frr.pm line 74.
> TASK OK
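
One way to avoid the spurious messages might be to check for an empty
config before attempting the reload. The following Python sketch only
illustrates that check; the real logic lives in Frr.pm, and the function
name and return values here are invented for the example.

```python
import os


def frr_action(conf_path):
    """Decide how to apply conf_path (hypothetical helper).

    Returns "reload" if the file has content. For an empty or missing file
    it returns "restart", skipping the reload attempt that the log above
    shows failing before the restart.
    """
    try:
        if os.path.getsize(conf_path) > 0:
            return "reload"
    except OSError:
        pass  # a missing or unreadable file is treated like an empty one
    return "restart"


print(frr_action("/etc/frr/frr.conf"))
```

Whether going straight to a restart is the right behavior for an empty
config is a separate question; the sketch just mirrors what the task log
shows happening after the failed reload.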

- regarding the hello/csnp intervals: it would be nice to mention what the
default values are. Also, probably not relevant for this patch series, but
I wanted to mention it anyway: for running a Ceph full mesh over a fabric,
one probably wants to set relatively low values here (as our wiki guide
does [3])? If there is a guide for setting up a Ceph full mesh over a
fabric in the future, it would be nice if it mentioned that.

- I'm not so sure about this, but maybe it would be nice to show the
default-hidden hello/csnp interval columns if I have entered a value
there?

- when I remove the hello interval+multiplier and the csnp interval via
the GUI, I get the following warnings in the journal:

> Apr 03 10:20:50 fabric159 pveproxy[9244]: Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.
> Apr 03 10:21:02 fabric159 pveproxy[9246]: Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.
> Apr 03 10:21:02 fabric159 pveproxy[9246]: Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.

- after setting up an OSPF fabric in a 3-node full mesh, I couldn't ping
the loopback addresses until I rebooted all nodes. I've attached the
task logs of the srvreloads and the ospf.cfg below [1]. After a reboot,
the pings work fine. Could it be because an OSPF fabric with the same
area existed previously?

- probably a user error, but: after setting up an OpenFabric fabric and
rebooting, the routes didn't come up automatically. My openfabric.cfg is
in [2]. systemctl status frr shows the following:

> Apr 03 10:02:20 fabric159 systemd[1]: Started frr.service - FRRouting.
> Apr 03 10:02:21 fabric159 fabricd[699]: [NBV6R-CM3PT] OpenFabric: Needed to resync LSPDB using CSNP!
> Apr 03 10:03:48 fabric159 fabricd[699]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers

> Apr 03 10:02:23 fabric160 systemd[1]: Started frr.service - FRRouting.
> Apr 03 10:02:24 fabric160 fabricd[674]: [MZS0T-YRAMC] OpenFabric: Initial synchronization on ens19 complete.
> Apr 03 10:03:48 fabric160 fabricd[674]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers

> Apr 03 10:02:19 fabric161 systemd[1]: Started frr.service - FRRouting.
> Apr 03 10:02:21 fabric161 fabricd[681]: [MZS0T-YRAMC] OpenFabric: Initial synchronization on ens20 complete.
> Apr 03 10:03:48 fabric161 fabricd[681]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers

Maybe I'm just too impatient, but restarting frr and waiting for ~30 seconds fixes it.

[1]

fabric159:

2025-04-03 09:30:06,673  INFO: Called via "Namespace(input=None, reload=True, test=False, debug=False, log_level='info', stdout=True, pathspace=None, filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr', rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)" 
2025-04-03 09:30:06,673  INFO: Loading Config object from file /etc/frr/frr.conf 
2025-04-03 09:30:06,690  INFO: Loading Config object from vtysh show running 
2025-04-03 09:30:06,697  INFO: "frr defaults traditional" cannot be removed 
2025-04-03 09:30:06,703  INFO: Executed "ip forwarding" 
2025-04-03 09:30:06,709  INFO: Executed "ipv6 forwarding" 
2025-04-03 09:30:06,709  INFO: /var/run/frr/reload-B14N3D.txt content 
['frr defaults datacenter\n', 
 'log syslog informational\n', 
 'router ospf\nexit\n', 
 'router ospf\n ospf router-id 172.16.0.159\nexit\n', 
 'interface dummy_1234\nexit\n', 
 'interface dummy_1234\n ip ospf area 1234\nexit\n', 
 'interface dummy_1234\n ip ospf passive\nexit\n', 
 'interface ens19\nexit\n', 
 'interface ens19\n ip ospf area 1234\nexit\n', 
 'interface ens20\nexit\n', 
 'interface ens20\n ip ospf area 1234\nexit\n', 
 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 
 'route-map ospf permit 100\nexit\n', 
 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 
 'route-map ospf permit 100\n set src 172.16.0.159\nexit\n', 
 'ip protocol ospf route-map ospf\n', 
 'line vty\nexit\n'] 
[1667|mgmtd] sending configuration
[1668|zebra] sending configuration
[1671|ospfd] sending configuration
[1674|bgpd] sending configuration
[1668|zebra] done 
[1682|watchfrr] sending configuration
[1684|staticd] sending configuration
[1685|bfdd] sending configuration
Waiting for children to finish applying config...
[1682|watchfrr] done 
[1674|bgpd] done 
[1684|staticd] done 
[1685|bfdd] done 
[1667|mgmtd] done 
[1671|ospfd] done 
2025-04-03 09:30:06,721  INFO: Loading Config object from vtysh show running 
2025-04-03 09:30:06,729  INFO: /var/run/frr/reload-UJJQIC.txt content 
['line vty\nexit\n', 
 'frr defaults datacenter\n', 
 'log syslog informational\n', 
 'router ospf\nexit\n', 
 'router ospf\n ospf router-id 172.16.0.159\nexit\n', 
 'interface dummy_1234\nexit\n', 
 'interface dummy_1234\n ip ospf area 1234\nexit\n', 
 'interface dummy_1234\n ip ospf passive\nexit\n', 
 'interface ens19\nexit\n', 
 'interface ens19\n ip ospf area 1234\nexit\n', 
 'interface ens20\nexit\n', 
 'interface ens20\n ip ospf area 1234\nexit\n', 
 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 
 'route-map ospf permit 100\nexit\n', 
 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 
 'route-map ospf permit 100\n set src 172.16.0.159\nexit\n', 
 'ip protocol ospf route-map ospf\n', 
 'line vty\nexit\n'] 
[1692|mgmtd] sending configuration
[1693|zebra] sending configuration
[1696|ospfd] sending configuration
[1699|bgpd] sending configuration
[1693|zebra] done 
[1707|watchfrr] sending configuration
[1709|staticd] sending configuration
[1710|bfdd] sending configuration
Waiting for children to finish applying config...
[1707|watchfrr] done 
[1696|ospfd] done 
MGMTD: No changes found to be committed!
[1692|mgmtd] done 
[1709|staticd] done 
[1699|bgpd] done 
[1710|bfdd] done 
TASK OK

fabric160:

2025-04-03 09:30:09,972  INFO: Called via "Namespace(input=None, reload=True, test=False, debug=False, log_level='info', stdout=True, pathspace=None, filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr', rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)" 
2025-04-03 09:30:09,972  INFO: Loading Config object from file /etc/frr/frr.conf 
2025-04-03 09:30:09,985  INFO: Loading Config object from vtysh show running 
2025-04-03 09:30:09,992  INFO: "frr defaults traditional" cannot be removed 
2025-04-03 09:30:09,998  INFO: Executed "ip forwarding" 
2025-04-03 09:30:10,004  INFO: Executed "ipv6 forwarding" 
2025-04-03 09:30:10,004  INFO: /var/run/frr/reload-5ATLT2.txt content 
['frr defaults datacenter\n', 
 'log syslog informational\n', 
 'router ospf\nexit\n', 
 'router ospf\n ospf router-id 172.16.0.160\nexit\n', 
 'interface dummy_1234\nexit\n', 
 'interface dummy_1234\n ip ospf area 1234\nexit\n', 
 'interface dummy_1234\n ip ospf passive\nexit\n', 
 'interface ens19\nexit\n', 
 'interface ens19\n ip ospf area 1234\nexit\n', 
 'interface ens20\nexit\n', 
 'interface ens20\n ip ospf area 1234\nexit\n', 
 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 
 'route-map ospf permit 100\nexit\n', 
 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 
 'route-map ospf permit 100\n set src 172.16.0.160\nexit\n', 
 'ip protocol ospf route-map ospf\n', 
 'line vty\nexit\n'] 
[1699|mgmtd] sending configuration
[1700|zebra] sending configuration
[1703|ospfd] sending configuration
[1706|bgpd] sending configuration
[1700|zebra] done 
[1714|watchfrr] sending configuration
[1716|staticd] sending configuration
[1717|bfdd] sending configuration
Waiting for children to finish applying config...
[1714|watchfrr] done 
[1716|staticd] done 
[1706|bgpd] done 
[1717|bfdd] done 
[1699|mgmtd] done 
[1703|ospfd] done 
2025-04-03 09:30:10,016  INFO: Loading Config object from vtysh show running 
2025-04-03 09:30:10,023  INFO: /var/run/frr/reload-NFS4UM.txt content 
['line vty\nexit\n', 
 'frr defaults datacenter\n', 
 'log syslog informational\n', 
 'router ospf\nexit\n', 
 'router ospf\n ospf router-id 172.16.0.160\nexit\n', 
 'interface dummy_1234\nexit\n', 
 'interface dummy_1234\n ip ospf area 1234\nexit\n', 
 'interface dummy_1234\n ip ospf passive\nexit\n', 
 'interface ens19\nexit\n', 
 'interface ens19\n ip ospf area 1234\nexit\n', 
 'interface ens20\nexit\n', 
 'interface ens20\n ip ospf area 1234\nexit\n', 
 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 
 'route-map ospf permit 100\nexit\n', 
 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 
 'route-map ospf permit 100\n set src 172.16.0.160\nexit\n', 
 'ip protocol ospf route-map ospf\n', 
 'line vty\nexit\n'] 
[1724|mgmtd] sending configuration
[1725|zebra] sending configuration
[1728|ospfd] sending configuration
[1731|bgpd] sending configuration
[1739|watchfrr] sending configuration
[1725|zebra] done 
[1741|staticd] sending configuration
[1742|bfdd] sending configuration
Waiting for children to finish applying config...
[1739|watchfrr] done 
[1741|staticd] done 
[1728|ospfd] done 
[1731|bgpd] done 
[1742|bfdd] done 
MGMTD: No changes found to be committed!
[1724|mgmtd] done 
TASK OK

fabric161:

2025-04-03 09:30:08,321  INFO: Called via "Namespace(input=None, reload=True, test=False, debug=False, log_level='info', stdout=True, pathspace=None, filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr', rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)" 
2025-04-03 09:30:08,321  INFO: Loading Config object from file /etc/frr/frr.conf 
2025-04-03 09:30:08,334  INFO: Loading Config object from vtysh show running 
2025-04-03 09:30:08,342  INFO: "frr defaults traditional" cannot be removed 
2025-04-03 09:30:08,348  INFO: Executed "ip forwarding" 
2025-04-03 09:30:08,354  INFO: Executed "ipv6 forwarding" 
2025-04-03 09:30:08,354  INFO: /var/run/frr/reload-PVFBCH.txt content 
['frr defaults datacenter\n', 
 'log syslog informational\n', 
 'router ospf\nexit\n', 
 'router ospf\n ospf router-id 172.16.0.161\nexit\n', 
 'interface dummy_1234\nexit\n', 
 'interface dummy_1234\n ip ospf area 1234\nexit\n', 
 'interface dummy_1234\n ip ospf passive\nexit\n', 
 'interface ens19\nexit\n', 
 'interface ens19\n ip ospf area 1234\nexit\n', 
 'interface ens20\nexit\n', 
 'interface ens20\n ip ospf area 1234\nexit\n', 
 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 
 'route-map ospf permit 100\nexit\n', 
 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 
 'route-map ospf permit 100\n set src 172.16.0.161\nexit\n', 
 'ip protocol ospf route-map ospf\n', 
 'line vty\nexit\n'] 
[1671|mgmtd] sending configuration
[1672|zebra] sending configuration
[1675|ospfd] sending configuration
[1678|bgpd] sending configuration
[1686|watchfrr] sending configuration
[1688|staticd] sending configuration
[1672|zebra] done 
[1689|bfdd] sending configuration
Waiting for children to finish applying config...
[1688|staticd] done 
[1686|watchfrr] done 
[1689|bfdd] done 
[1678|bgpd] done 
[1671|mgmtd] done 
[1675|ospfd] done 
2025-04-03 09:30:08,367  INFO: Loading Config object from vtysh show running 
2025-04-03 09:30:08,374  INFO: /var/run/frr/reload-SKOSWJ.txt content 
['line vty\nexit\n', 
 'frr defaults datacenter\n', 
 'log syslog informational\n', 
 'router ospf\nexit\n', 
 'router ospf\n ospf router-id 172.16.0.161\nexit\n', 
 'interface dummy_1234\nexit\n', 
 'interface dummy_1234\n ip ospf area 1234\nexit\n', 
 'interface dummy_1234\n ip ospf passive\nexit\n', 
 'interface ens19\nexit\n', 
 'interface ens19\n ip ospf area 1234\nexit\n', 
 'interface ens20\nexit\n', 
 'interface ens20\n ip ospf area 1234\nexit\n', 
 'access-list ospf_1234_ips permit 172.16.0.0/24\n', 
 'route-map ospf permit 100\nexit\n', 
 'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n', 
 'route-map ospf permit 100\n set src 172.16.0.161\nexit\n', 
 'ip protocol ospf route-map ospf\n', 
 'line vty\nexit\n'] 
[1696|mgmtd] sending configuration
[1697|zebra] sending configuration
[1700|ospfd] sending configuration
[1703|bgpd] sending configuration
[1697|zebra] done 
[1711|watchfrr] sending configuration
[1713|staticd] sending configuration
Waiting for children to finish applying config...
[1714|bfdd] sending configuration
[1711|watchfrr] done 
[1713|staticd] done 
[1714|bfdd] done 
[1700|ospfd] done 
[1703|bgpd] done 
MGMTD: No changes found to be committed!
[1696|mgmtd] done 
TASK OK

# cat /etc/pve/sdn/fabrics/ospf.cfg
fabric: 1234
	loopback_prefix 172.16.0.0/24

node: 1234_fabric159
	interface name=ens19,ip=172.31.0.159/24
	interface name=ens20,ip=172.31.2.159/24
	router_id 172.16.0.159

node: 1234_fabric160
	interface name=ens19,ip=172.31.0.160/24
	interface name=ens20,ip=172.31.1.160/24
	router_id 172.16.0.160

node: 1234_fabric161
	interface name=ens19,ip=172.31.1.161/24
	interface name=ens20,ip=172.31.2.161/24
	router_id 172.16.0.161

[2]

# cat /etc/pve/sdn/fabrics/openfabric.cfg
fabric: fabric
	hello_interval 2
	loopback_prefix 172.16.0.0/24

node: fabric_fabric159
	interface name=ens19,ip=172.31.0.159/24
	interface name=ens20,ip=172.31.2.159/24
	router_id 172.16.0.159

node: fabric_fabric160
	interface name=ens19,ip=172.31.0.160/24
	interface name=ens20,ip=172.31.1.160/24
	router_id 172.16.0.160

node: fabric_fabric161
	interface name=ens19,ip=172.31.1.161/24
	interface name=ens20,ip=172.31.2.161/24
	router_id 172.16.0.161
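
As a sanity check that the config above really describes a full mesh, the
interfaces can be grouped by subnet: every point-to-point link should show
up as one subnet shared by exactly two nodes. The Python sketch below does
that. It is not the real section config parser, it only understands the
few line types visible in this paste, and the function names are made up.

```python
import ipaddress
from collections import defaultdict

# Config from [2], indentation normalized to spaces.
CFG = """\
fabric: fabric
    hello_interval 2
    loopback_prefix 172.16.0.0/24

node: fabric_fabric159
    interface name=ens19,ip=172.31.0.159/24
    interface name=ens20,ip=172.31.2.159/24
    router_id 172.16.0.159

node: fabric_fabric160
    interface name=ens19,ip=172.31.0.160/24
    interface name=ens20,ip=172.31.1.160/24
    router_id 172.16.0.160

node: fabric_fabric161
    interface name=ens19,ip=172.31.1.161/24
    interface name=ens20,ip=172.31.2.161/24
    router_id 172.16.0.161
"""


def links_by_subnet(cfg_text):
    """Group (node, interface name) pairs by the subnet of the interface IP."""
    links = defaultdict(list)
    node = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("node:"):
            node = line.split(":", 1)[1].strip()
        elif line.startswith("fabric:"):
            node = None  # fabric-level properties carry no interfaces
        elif line.startswith("interface ") and node is not None:
            # "interface name=ens19,ip=172.31.0.159/24" -> {"name": ..., "ip": ...}
            props = dict(kv.split("=", 1) for kv in line.split(None, 1)[1].split(","))
            if "ip" in props:
                subnet = ipaddress.ip_interface(props["ip"]).network
                links[str(subnet)].append((node, props.get("name")))
    return dict(links)


def suspicious_links(links):
    """Return the subnets that do not connect exactly two distinct nodes."""
    return {
        net: ends
        for net, ends in links.items()
        if len({node for node, _ in ends}) != 2
    }


links = links_by_subnet(CFG)
for net, ends in sorted(links.items()):
    print(net, ends)
print(suspicious_links(links))  # prints {} for the config in [2]
```

For the config in [2] all three subnets connect exactly two nodes, so the
addressing itself looks consistent and is probably not what kept the
routes from coming up.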

[3] https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Routed_Setup_(with_Fallback)

