* [pve-devel] [PATCH docs v2] pvecm, network: add section on corosync over bonds
@ 2025-07-25 11:39 Friedrich Weber
2025-07-25 11:50 ` Friedrich Weber
2025-07-25 14:04 ` [pve-devel] superseded: " Friedrich Weber
0 siblings, 2 replies; 5+ messages in thread
From: Friedrich Weber @ 2025-07-25 11:39 UTC (permalink / raw)
To: pve-devel
Testing has shown that running corosync (only) over a bond can be
problematic in some failure scenarios and for certain bond modes. The
documentation only discourages bonds for corosync because corosync can
switch between available networks itself, but does not mention other
caveats when using bonds for corosync.
Hence, extend the documentation with recommendations and caveats
regarding bonds for corosync.
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
Notes:
Aaron suggested we could expose the bond-lacp-rate in the GUI to
make it easier to change the setting on the PVE side. I'd open a
feature report for this.
Changes since v1:
- move to its own section under "Cluster Network"
- reword remarks about bond-lacp-rate fast
- reword remark under "Requirements"
pve-network.adoc | 4 +++-
pvecm.adoc | 45 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 44 insertions(+), 5 deletions(-)
diff --git a/pve-network.adoc b/pve-network.adoc
index 2dec882..b361f97 100644
--- a/pve-network.adoc
+++ b/pve-network.adoc
@@ -495,7 +495,9 @@ use the active-backup mode.
For the cluster network (Corosync) we recommend configuring it with multiple
networks. Corosync does not need a bond for network redundancy as it can switch
-between networks by itself, if one becomes unusable.
+between networks by itself, if one becomes unusable. Some bond modes are known
+to be problematic for Corosync, see
+xref:pvecm_corosync_over_bonds[Corosync Over Bonds].
The following bond configuration can be used as distributed/shared
storage network. The benefit would be that you get more speed and the
diff --git a/pvecm.adoc b/pvecm.adoc
index 312a26f..23fcebb 100644
--- a/pvecm.adoc
+++ b/pvecm.adoc
@@ -89,10 +89,8 @@ NOTE: To ensure reliable Corosync redundancy, it is essential to have at least
another link on a different physical network. This enables Corosync to keep the
cluster communication alive should the dedicated network be down.
+
-NOTE: A single link backed by a bond is not enough to provide Corosync
-redundancy. When a bonded interface fails and Corosync cannot fall back to
-another link, it can lead to asymmetric communication in the cluster, which in
-turn can lead to the cluster losing quorum.
+NOTE: A single link backed by a bond can be problematic in certain failure
+scenarios, see xref:pvecm_corosync_over_bonds[Corosync Over Bonds].
* The root password of a cluster node is required for adding nodes.
@@ -606,6 +604,45 @@ transport to `udp` or `udpu` in your xref:pvecm_edit_corosync_conf[corosync.conf
but keep in mind that this will disable all cryptography and redundancy support.
This is therefore not recommended.
+[[pvecm_corosync_over_bonds]]
+Corosync Over Bonds
+~~~~~~~~~~~~~~~~~~~
+
+Using a xref:sysadmin_network_bond[bond] as the only Corosync link can be
+problematic in certain failure scenarios. If one of the bonded interfaces fails
+and stops transmitting packets, but its link state stays up, some bond modes
+may cause a state of asymmetric connectivity where cluster nodes can only
+communicate with different subsets of other nodes. In case of asymmetric
+connectivity, Corosync may not be able to form a stable quorum in the cluster.
+If this state persists and HA is enabled, nodes may fence themselves, even if
+their respective bond is still fully functioning. In the worst case, the whole
+cluster may fence itself.
+
+For this reason, our recommendations are as follows.
+
+* We recommend a dedicated physical NIC for the primary Corosync link. Bonds
+ can be used as additional links for increased redundancy.
+
+* We *advise against* using bond modes *balance-rr*, *balance-xor*,
+ *balance-tlb*, or *balance-alb* for Corosync traffic. As explained above,
+ they can cause asymmetric connectivity in certain failure scenarios.
+
+* *IEEE 802.3ad (LACP)*: This bond mode can cause asymmetric connectivity in
+ certain failure scenarios as explained above, but it can recover from this
+ state, as each side can stop using a bonded interface if it has not received
+ three LACPDUs in a row. However, with default settings, LACPDUs are only sent
+ every 30 seconds, yielding a failover time of 90 seconds. This is too long,
+ as nodes with HA resources will fence themselves already after roughly one
+ minute without a stable quorum. If LACP bonds are used for corosync traffic,
+ we recommend setting `bond-lacp-rate fast` *on the Proxmox VE node and the
+ switch*! Setting this option on one side requests the other side to send an
+ LACPDU every second. Setting this option on both sides can reduce the
+ failover time in the scenario above to 3 seconds and thus prevent fencing.
+
+* Bond mode *active-backup* will not cause asymmetric connectivity in the
+ failure scenario described above, but the affected node may lose connection
+ to the cluster and, if HA is enabled, fence itself.
+
Separate Cluster Network
~~~~~~~~~~~~~~~~~~~~~~~~
--
2.47.2
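
For reference, the timing argument in the 802.3ad bullet above can be
sketched as a small Python calculation. The three-LACPDU threshold and the
30 s and 1 s intervals come from the patch text; the 60 s fence timeout only
stands in for "roughly one minute", and the function names are hypothetical,
chosen for illustration.

```python
# Rough sketch of the LACP failover arithmetic described in the patch.
# This is not code from Proxmox VE or the Linux bonding driver.

LACPDUS_MISSED_BEFORE_FAILOVER = 3  # "three LACPDUs in a row" (patch text)
SLOW_INTERVAL_S = 30                # default rate according to the patch
FAST_INTERVAL_S = 1                 # with bond-lacp-rate fast
HA_FENCE_TIMEOUT_S = 60             # assumption: "roughly one minute"


def lacp_failover_time(interval_s: int) -> int:
    """Seconds until a side stops using a member that went silent."""
    return LACPDUS_MISSED_BEFORE_FAILOVER * interval_s


def risks_fencing(interval_s: int,
                  fence_timeout_s: int = HA_FENCE_TIMEOUT_S) -> bool:
    """True if failover takes at least as long as the assumed fence timeout."""
    return lacp_failover_time(interval_s) >= fence_timeout_s


if __name__ == "__main__":
    for name, interval in (("slow", SLOW_INTERVAL_S), ("fast", FAST_INTERVAL_S)):
        print(name, lacp_failover_time(interval), risks_fencing(interval))
```

With these numbers the slow rate gives 90 seconds, which exceeds the assumed
fence timeout, while the fast rate gives 3 seconds.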
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [pve-devel] [PATCH docs v2] pvecm, network: add section on corosync over bonds
2025-07-25 11:39 [pve-devel] [PATCH docs v2] pvecm, network: add section on corosync over bonds Friedrich Weber
@ 2025-07-25 11:50 ` Friedrich Weber
2025-07-25 12:22 ` Mira Limbeck
2025-07-25 14:04 ` [pve-devel] superseded: " Friedrich Weber
1 sibling, 1 reply; 5+ messages in thread
From: Friedrich Weber @ 2025-07-25 11:50 UTC (permalink / raw)
To: pve-devel
On 25/07/2025 13:39, Friedrich Weber wrote:
> [...]
> +Corosync Over Bonds
> +~~~~~~~~~~~~~~~~~~~
> +
> +Using a xref:sysadmin_network_bond[bond] as the only Corosync link can be
> +problematic in certain failure scenarios. If one of the bonded interfaces fails
> +and stops transmitting packets, but its link state stays up, some bond modes
> +may cause a state of asymmetric connectivity where cluster nodes can only
> +communicate with different subsets of other nodes. In case of asymmetric
> +connectivity, Corosync may not be able to form a stable quorum in the cluster.
> +If this state persists and HA is enabled, nodes may fence themselves, even if
> +their respective bond is still fully functioning. In the worst case, the whole
> +cluster may fence itself.
> +
> +For this reason, our recommendations are as follows.
> +
> +* We recommend a dedicated physical NIC for the primary Corosync link. Bonds
> + can be used as additional links for increased redundancy.
These recommendations are still not 100% clear: Are we fine with a setup
with
- link 0: dedicated corosync link
- link 1: corosync link over a bond with a problematic mode (such as
balance-rr or LACP with bond-lacp-rate slow)
?
In my tests, as long as the dedicated link 0 is completely online, it
doesn't matter if a bond runs into the failure scenario above (one of
the bonded NICs stops transmitting packets), corosync will just continue
using link 0. But as soon as link 0 goes down and the failure scenario
happens, the whole-cluster fence may happen. So should our
recommendation be the relatively strict "if you put corosync on a bond
(even if it is only a redundant link), use only active-backup or
LACP+bond-lacp-rate fast"?
> +
> +* We *advise against* using bond modes *balance-rr*, *balance-xor*,
> + *balance-tlb*, or *balance-alb* for Corosync traffic. As explained above,
> + they can cause asymmetric connectivity in certain failure scenarios.
> +
> +* *IEEE 802.3ad (LACP)*: This bond mode can cause asymmetric connectivity in
> + certain failure scenarios as explained above, but it can recover from this
> + state, as each side can stop using a bonded interface if it has not received
> + three LACPDUs in a row. However, with default settings, LACPDUs are only sent
> + every 30 seconds, yielding a failover time of 90 seconds. This is too long,
> + as nodes with HA resources will fence themselves already after roughly one
> + minute without a stable quorum. If LACP bonds are used for corosync traffic,
> + we recommend setting `bond-lacp-rate fast` *on the Proxmox VE node and the
> + switch*! Setting this option on one side requests the other side to send an
> + LACPDU every second. Setting this option on both sides can reduce the
> + failover time in the scenario above to 3 seconds and thus prevent fencing.
> +
> +* Bond mode *active-backup* will not cause asymmetric connectivity in the
> + failure scenario described above, but the affected node may lose connection
> + to the cluster and, if HA is enabled, fence itself.
> +
> Separate Cluster Network
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
* Re: [pve-devel] [PATCH docs v2] pvecm, network: add section on corosync over bonds
2025-07-25 11:50 ` Friedrich Weber
@ 2025-07-25 12:22 ` Mira Limbeck
2025-07-25 14:05 ` Friedrich Weber
0 siblings, 1 reply; 5+ messages in thread
From: Mira Limbeck @ 2025-07-25 12:22 UTC (permalink / raw)
To: pve-devel
On 7/25/25 13:50, Friedrich Weber wrote:
> On 25/07/2025 13:39, Friedrich Weber wrote:
>> [...]
>> +Corosync Over Bonds
>> +~~~~~~~~~~~~~~~~~~~
>> +
>> +Using a xref:sysadmin_network_bond[bond] as the only Corosync link can be
>> +problematic in certain failure scenarios. If one of the bonded interfaces fails
>> +and stops transmitting packets, but its link state stays up, some bond modes
>> +may cause a state of asymmetric connectivity where cluster nodes can only
>> +communicate with different subsets of other nodes. In case of asymmetric
>> +connectivity, Corosync may not be able to form a stable quorum in the cluster.
>> +If this state persists and HA is enabled, nodes may fence themselves, even if
>> +their respective bond is still fully functioning. In the worst case, the whole
>> +cluster may fence itself.
>> +
>> +For this reason, our recommendations are as follows.
>> +
>> +* We recommend a dedicated physical NIC for the primary Corosync link. Bonds
>> + can be used as additional links for increased redundancy.
>
> These recommendations are still not 100% clear: Are we fine with a setup
> with
>
> - link 0: dedicated corosync link
> - link 1: corosync link over a bond with a problematic mode (such as
> balance-rr or LACP with bond-lacp-rate slow)
>
> ?
> In my tests, as long as the dedicated link 0 is completely online, it
> doesn't matter if a bond runs into the failure scenario above (one of
> the bonded NICs stops transmitting packets), corosync will just continue
> using link 0. But as soon as link 0 goes down and the failure scenario
> happens, the whole-cluster fence may happen. So should our
> recommendation be the relatively strict "if you put corosync on a bond
> (even if it is only a redundant link), use only active-backup or
> LACP+bond-lacp-rate fast"?
I'd say yes, the recommendation should be either dedicated link
directly, or a bond as redundant link with active-backup or
LACP+lacp-rate fast only.
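
The recommendation as discussed so far could be written down as a small
lookup, shown here as a hedged Python sketch. The mode names are the ones
used in this thread ("802.3ad" is assumed to be the spelling of the LACP
mode in the bond configuration); the function name and the returned labels
are hypothetical, and modes not mentioned in the thread are reported as
"not covered" rather than judged.

```python
# Sketch of the bond-mode recommendation for Corosync links discussed in
# this thread. Purely illustrative; not part of any Proxmox VE tooling.

ADVISED_AGAINST = {"balance-rr", "balance-xor", "balance-tlb", "balance-alb"}


def corosync_bond_verdict(mode: str, lacp_rate: str = "slow") -> str:
    """Return 'ok', 'avoid' or 'not covered' for a bond carrying Corosync."""
    if mode == "active-backup":
        # No asymmetric connectivity, though the affected node may still
        # lose its connection to the cluster.
        return "ok"
    if mode == "802.3ad":
        # LACP is only acceptable with bond-lacp-rate fast (on both sides).
        return "ok" if lacp_rate == "fast" else "avoid"
    if mode in ADVISED_AGAINST:
        return "avoid"
    return "not covered"  # mode not discussed in this thread


if __name__ == "__main__":
    print(corosync_bond_verdict("802.3ad", "fast"))
    print(corosync_bond_verdict("balance-rr"))
```

Note that the check only looks at the node side; the thread asks for the
fast LACP rate to be set on the switch as well, which this sketch cannot
verify.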
* Re: [pve-devel] superseded: [PATCH docs v2] pvecm, network: add section on corosync over bonds
2025-07-25 11:39 [pve-devel] [PATCH docs v2] pvecm, network: add section on corosync over bonds Friedrich Weber
2025-07-25 11:50 ` Friedrich Weber
@ 2025-07-25 14:04 ` Friedrich Weber
1 sibling, 0 replies; 5+ messages in thread
From: Friedrich Weber @ 2025-07-25 14:04 UTC (permalink / raw)
To: pve-devel
Superseded by:
https://lore.proxmox.com/pve-devel/20250725140312.250936-1-f.weber@proxmox.com/T/
* Re: [pve-devel] [PATCH docs v2] pvecm, network: add section on corosync over bonds
2025-07-25 12:22 ` Mira Limbeck
@ 2025-07-25 14:05 ` Friedrich Weber
0 siblings, 0 replies; 5+ messages in thread
From: Friedrich Weber @ 2025-07-25 14:05 UTC (permalink / raw)
To: Proxmox VE development discussion, Mira Limbeck
On 25/07/2025 14:23, Mira Limbeck wrote:
>
>
> On 7/25/25 13:50, Friedrich Weber wrote:
>> On 25/07/2025 13:39, Friedrich Weber wrote:
>>> [...]
>>> +Corosync Over Bonds
>>> +~~~~~~~~~~~~~~~~~~~
>>> +
>>> +Using a xref:sysadmin_network_bond[bond] as the only Corosync link can be
>>> +problematic in certain failure scenarios. If one of the bonded interfaces fails
>>> +and stops transmitting packets, but its link state stays up, some bond modes
>>> +may cause a state of asymmetric connectivity where cluster nodes can only
>>> +communicate with different subsets of other nodes. In case of asymmetric
>>> +connectivity, Corosync may not be able to form a stable quorum in the cluster.
>>> +If this state persists and HA is enabled, nodes may fence themselves, even if
>>> +their respective bond is still fully functioning. In the worst case, the whole
>>> +cluster may fence itself.
>>> +
>>> +For this reason, our recommendations are as follows.
>>> +
>>> +* We recommend a dedicated physical NIC for the primary Corosync link. Bonds
>>> + can be used as additional links for increased redundancy.
>>
>> These recommendations are still not 100% clear: Are we fine with a setup
>> with
>>
>> - link 0: dedicated corosync link
>> - link 1: corosync link over a bond with a problematic mode (such as
>> balance-rr or LACP with bond-lacp-rate slow)
>>
>> ?
>> In my tests, as long as the dedicated link 0 is completely online, it
>> doesn't matter if a bond runs into the failure scenario above (one of
>> the bonded NICs stops transmitting packets), corosync will just continue
>> using link 0. But as soon as link 0 goes down and the failure scenario
>> happens, the whole-cluster fence may happen. So should our
>> recommendation be the relatively strict "if you put corosync on a bond
>> (even if it is only a redundant link), use only active-backup or
>> LACP+bond-lacp-rate fast"?
>
> I'd say yes, the recommendation should be either dedicated link
> directly, or a bond as redundant link with active-backup or
> LACP+lacp-rate fast only.
Thanks for the input. I've rephrased the section (and made some other
adjustments) to make it clear that the caveats apply whenever a bond is
used for corosync traffic.
v3:
https://lore.proxmox.com/pve-devel/20250725140312.250936-1-f.weber@proxmox.com/T/