From: Friedrich Weber <f.weber@proxmox.com>
To: Hannes Duerr <h.duerr@proxmox.com>,
Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH docs v3] pvecm, network: add section on corosync over bonds
Date: Tue, 29 Jul 2025 18:25:59 +0200
Message-ID: <3f12167e-892a-4945-a43d-cab642464e2d@proxmox.com>
In-Reply-To: <86841dab-96d5-403d-a91f-a69d17b716a7@proxmox.com>
Thanks for taking a look!
Discussed with HD off-list:
- having the justification for the recommendations in the docs is good
- but since the justification is somewhat complex, it is probably not
good to have it right at the beginning of the new section.
- It might be better to have the recommendations first ("We recommend
[...]" plus the list of bond modes), and the justification below that,
so readers immediately see the important part, and can optionally still
read about the justification.
Hence I'll send a new version that rearranges the paragraphs.
On 28/07/2025 18:16, Hannes Duerr wrote:
>
> On 7/25/25 4:03 PM, Friedrich Weber wrote:
>> +Corosync Over Bonds
>> +~~~~~~~~~~~~~~~~~~~
>> +
>> +Using a xref:sysadmin_network_bond[bond] as a Corosync link can be problematic
>> +in certain failure scenarios. If one of the bonded interfaces fails and stops
>> +transmitting packets, but its link state stays up, and there are no other
>> +Corosync links available
> I thought it can also occur if there are still other Corosync links available?
In my tests so far, it didn't. Even if the bond is the primary corosync
link, as long as there is still a fallback link available, corosync
seems to simply switch over to the fallback link.
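For reference, this is roughly what the two-link setup looks like in
corosync.conf (cluster name, node names and the priority values are
illustrative assumptions; the quorum section is omitted):

    totem {
      version: 2
      cluster_name: testcluster
      transport: knet
      # link 0: the LACP-bonded 172.16.0.0/24 network, preferred while up
      interface {
        linknumber: 0
        knet_link_priority: 2
      }
      # link 1: the 192.168.0.0/24 fallback network
      interface {
        linknumber: 1
        knet_link_priority: 1
      }
    }

    nodelist {
      node {
        name: pve1
        nodeid: 1
        ring0_addr: 172.16.0.101
        ring1_addr: 192.168.0.101
      }
      # nodes 2-4 analogous
    }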
Here 172.16.0.0/24 is the LACP-bonded network, and I stopped traffic on
one bonded NIC of node 2.
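(In case someone wants to reproduce this: one way to stop traffic on a
bond member without changing its link state is a 100% egress drop via
netem; the interface name is a placeholder for the bond member NIC.)

    # on node 2: drop all outgoing packets on one bond member,
    # the carrier/link state stays up
    tc qdisc add dev eth1 root netem loss 100%
    # to undo:
    tc qdisc del dev eth1 root

With the traffic stopped, corosync just says: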
On node 1:
Jul 29 11:31:39 pve1 corosync[841]: [KNET ] link: host: 2 link: 0 is down
Jul 29 11:31:39 pve1 corosync[841]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 1)
Jul 29 11:31:39 pve1 corosync[841]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 1)
On node 2:
Jul 29 11:31:39 pve2 corosync[837]: [KNET ] link: host: 4 link: 0 is down
Jul 29 11:31:39 pve2 corosync[837]: [KNET ] link: host: 1 link: 0 is down
Jul 29 11:31:39 pve2 corosync[837]: [KNET ] host: host: 4 (passive) best link: 1 (pri: 1)
Jul 29 11:31:39 pve2 corosync[837]: [KNET ] host: host: 1 (passive) best link: 1 (pri: 1)
And nothing on the other nodes.
corosync-cfgtool -n reports the following on the four nodes; note that
only the 1<->2 and 2<->4 pairs are not "connected" on link 0:
Local node ID 1, transport knet
nodeid: 2 reachable
   LINK: 0 udp (172.16.0.101->172.16.0.102) enabled mtu: 1397
   LINK: 1 udp (192.168.0.101->192.168.0.102) enabled connected mtu: 1397
nodeid: 3 reachable
   LINK: 0 udp (172.16.0.101->172.16.0.103) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.101->192.168.0.103) enabled connected mtu: 1397
nodeid: 4 reachable
   LINK: 0 udp (172.16.0.101->172.16.0.104) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.101->192.168.0.104) enabled connected mtu: 1397

Local node ID 2, transport knet
nodeid: 1 reachable
   LINK: 0 udp (172.16.0.102->172.16.0.101) enabled mtu: 1397
   LINK: 1 udp (192.168.0.102->192.168.0.101) enabled connected mtu: 1397
nodeid: 3 reachable
   LINK: 0 udp (172.16.0.102->172.16.0.103) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.102->192.168.0.103) enabled connected mtu: 1397
nodeid: 4 reachable
   LINK: 0 udp (172.16.0.102->172.16.0.104) enabled mtu: 1397
   LINK: 1 udp (192.168.0.102->192.168.0.104) enabled connected mtu: 1397

Local node ID 3, transport knet
nodeid: 1 reachable
   LINK: 0 udp (172.16.0.103->172.16.0.101) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.103->192.168.0.101) enabled connected mtu: 1397
nodeid: 2 reachable
   LINK: 0 udp (172.16.0.103->172.16.0.102) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.103->192.168.0.102) enabled connected mtu: 1397
nodeid: 4 reachable
   LINK: 0 udp (172.16.0.103->172.16.0.104) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.103->192.168.0.104) enabled connected mtu: 1397

Local node ID 4, transport knet
nodeid: 1 reachable
   LINK: 0 udp (172.16.0.104->172.16.0.101) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.104->192.168.0.101) enabled connected mtu: 1397
nodeid: 2 reachable
   LINK: 0 udp (172.16.0.104->172.16.0.102) enabled mtu: 1397
   LINK: 1 udp (192.168.0.104->192.168.0.102) enabled connected mtu: 1397
nodeid: 3 reachable
   LINK: 0 udp (172.16.0.104->172.16.0.103) enabled connected mtu: 1397
   LINK: 1 udp (192.168.0.104->192.168.0.103) enabled connected mtu: 1397
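(Side note: the member state can also be checked from the bond side,
independently of corosync; bond0 is a placeholder for the actual bond
name.)

    # shows per-member MII status and LACP actor/partner details
    cat /proc/net/bonding/bond0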
With `bond-lacp-rate slow`, all four affected links switch back to
"connected" after ~90 seconds: at the slow rate, LACPDUs are sent every
30 seconds and a peer is declared dead after three missed ones, so it
takes up to 90 seconds until LACP notices the failure and the bond fails
over to the healthy NIC.
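For comparison, this is roughly what a bond stanza with the fast rate
would look like in /etc/network/interfaces (interface names are
placeholders); it brings LACP failure detection down to ~3 seconds:

    auto bond0
    iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-mode 802.3ad
        bond-miimon 100
        # LACPDU every second, peer declared dead after 3 missed ones
        bond-lacp-rate fast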