From: Friedrich Weber
To: Hannes Duerr, Proxmox VE development discussion
Date: Tue, 29 Jul 2025 18:25:59 +0200
Subject: Re: [pve-devel] [PATCH docs v3] pvecm, network: add section on corosync over bonds
Message-ID: <3f12167e-892a-4945-a43d-cab642464e2d@proxmox.com>
In-Reply-To: <86841dab-96d5-403d-a91f-a69d17b716a7@proxmox.com>
References: <20250725140312.250936-1-f.weber@proxmox.com> <86841dab-96d5-403d-a91f-a69d17b716a7@proxmox.com>

Thanks for taking a look!

Discussed with HD off-list:

- having the justification for the recommendations in the docs is good
- but since the justification is somewhat complex, it is probably not good to
  have it right at the beginning of the new section.
- It might be better to have the recommendations first ("We recommend [...]"
  plus the list of bond modes), and the justification below that, so readers
  immediately see the important part and can optionally still read the
  justification.

Hence I'll send v3 that rearranges the paragraphs.

On 28/07/2025 18:16, Hannes Duerr wrote:
> 
> On 7/25/25 4:03 PM, Friedrich Weber wrote:
>> +Corosync Over Bonds
>> +~~~~~~~~~~~~~~~~~~~
>> +
>> +Using a xref:sysadmin_network_bond[bond] as a Corosync link can be problematic
>> +in certain failure scenarios. If one of the bonded interfaces fails and stops
>> +transmitting packets, but its link state stays up, and there are no other
>> +Corosync links available
> I thought it can also occur if there are still other Corosync links available?

In my tests so far, it didn't. Even if the bond is the primary corosync link,
as long as there is still a fallback link available, corosync seems to simply
switch over to the fallback link.
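For reference, the two-link setup used in these tests would look roughly like
the following in /etc/pve/corosync.conf. This is a minimal sketch reconstructed
from the addresses in the output below, not the exact test file; only the
nodelist is shown, and node names are illustrative:

    nodelist {
      node {
        name: pve1
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 172.16.0.101   # link 0: on the LACP-bonded network
        ring1_addr: 192.168.0.101  # link 1: fallback network
      }
      node {
        name: pve2
        nodeid: 2
        quorum_votes: 1
        ring0_addr: 172.16.0.102
        ring1_addr: 192.168.0.102
      }
      # nodes 3 and 4 analogous
    }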
Here 172.16.0.0/24 is the LACP-bonded network, and I stopped traffic on one
bonded NIC of node 2. corosync just says:

On node 1:

Jul 29 11:31:39 pve1 corosync[841]: [KNET ] link: host: 2 link: 0 is down
Jul 29 11:31:39 pve1 corosync[841]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 1)
Jul 29 11:31:39 pve1 corosync[841]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 1)

On node 2:

Jul 29 11:31:39 pve2 corosync[837]: [KNET ] link: host: 4 link: 0 is down
Jul 29 11:31:39 pve2 corosync[837]: [KNET ] link: host: 1 link: 0 is down
Jul 29 11:31:39 pve2 corosync[837]: [KNET ] host: host: 4 (passive) best link: 1 (pri: 1)
Jul 29 11:31:39 pve2 corosync[837]: [KNET ] host: host: 1 (passive) best link: 1 (pri: 1)

And nothing on the other nodes.

corosync-cfgtool reports the following on the four nodes; note that link 0 is
not "connected" for the 1<->2 and 2<->4 node pairs:

Local node ID 1, transport knet
nodeid: 2 reachable
   LINK: 0 udp (172.16.0.101->172.16.0.102) enabled mtu: 1397
   LINK: 1 udp (192.168.0.101->192.168.0.102) enabled connected mtu: 1397
nodeid: 3 reachable
   LINK: 0 udp (172.16.0.101->172.16.0.103) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.101->192.168.0.103) enabled connected mtu: 1397
nodeid: 4 reachable
   LINK: 0 udp (172.16.0.101->172.16.0.104) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.101->192.168.0.104) enabled connected mtu: 1397

Local node ID 2, transport knet
nodeid: 1 reachable
   LINK: 0 udp (172.16.0.102->172.16.0.101) enabled mtu: 1397
   LINK: 1 udp (192.168.0.102->192.168.0.101) enabled connected mtu: 1397
nodeid: 3 reachable
   LINK: 0 udp (172.16.0.102->172.16.0.103) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.102->192.168.0.103) enabled connected mtu: 1397
nodeid: 4 reachable
   LINK: 0 udp (172.16.0.102->172.16.0.104) enabled mtu: 1397
   LINK: 1 udp (192.168.0.102->192.168.0.104) enabled connected mtu: 1397

Local node ID 3, transport knet
nodeid: 1 reachable
   LINK: 0 udp (172.16.0.103->172.16.0.101) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.103->192.168.0.101) enabled connected mtu: 1397
nodeid: 2 reachable
   LINK: 0 udp (172.16.0.103->172.16.0.102) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.103->192.168.0.102) enabled connected mtu: 1397
nodeid: 4 reachable
   LINK: 0 udp (172.16.0.103->172.16.0.104) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.103->192.168.0.104) enabled connected mtu: 1397

Local node ID 4, transport knet
nodeid: 1 reachable
   LINK: 0 udp (172.16.0.104->172.16.0.101) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.104->192.168.0.101) enabled connected mtu: 1397
nodeid: 2 reachable
   LINK: 0 udp (172.16.0.104->172.16.0.102) enabled mtu: 1397
   LINK: 1 udp (192.168.0.104->192.168.0.102) enabled connected mtu: 1397
nodeid: 3 reachable
   LINK: 0 udp (172.16.0.104->172.16.0.103) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.104->192.168.0.103) enabled connected mtu: 1397

With `bond-lacp-rate slow`, this switches over to "connected" for all four
affected link entries after ~90 seconds.
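For context, `bond-lacp-rate` is set on the bond in /etc/network/interfaces. A
minimal sketch of such a bond definition (member NIC names are hypothetical;
the address matches node 2's link-0 network from the test):

    auto bond0
    iface bond0 inet static
            address 172.16.0.102/24
            bond-slaves ens18 ens19    # example member NICs
            bond-mode 802.3ad
            bond-miimon 100
            # slow = LACPDUs every 30s, so a member that stops transmitting is
            # only dropped from the aggregate after the 90s long timeout;
            # fast = LACPDUs every 1s with a 3s timeout
            bond-lacp-rate slow

The 90s long timeout would also explain the ~90 seconds observed above. And
since LACP hashes each flow onto one member NIC, presumably only the flows
that happened to hash onto the stopped NIC (here, node 2's link-0 traffic to
nodes 1 and 4) are affected, which matches the corosync-cfgtool output.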