From: Mira Limbeck <m.limbeck@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH docs v4] pvecm, network: add section on corosync over bonds
Date: Mon, 4 Aug 2025 16:09:29 +0200
Message-ID: <1e99dced-9619-417a-93a8-603abfc3579e@proxmox.com>
In-Reply-To: <20250730085836.147270-1-f.weber@proxmox.com>

On 7/30/25 10:59, Friedrich Weber wrote:
> Testing has shown that running corosync (only) over a bond can be
> problematic in some failure scenarios and for certain bond modes. The
> documentation only discourages bonds for corosync because corosync can
> switch between available networks itself, but does not mention other
> caveats when using bonds for corosync.
>
> Hence, extend the documentation with recommendations and caveats
> regarding bonds for corosync.
>
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
>
> Notes:
> Aaron suggested we could expose the bond-lacp-rate in the GUI to
> make it easier to change the setting on the PVE side. I'd open a
> feature report for this.
>
> Changes since v3:
> - describe recommendations first, and further details for interested
> readers below. Consequently, rephrase failure scenario description
> (thx HD!)
>
> Changes since v2:
> - fix wording in the failure scenario description
> - explain that load-balancing bond modes are affected and why
> - clarify that the caveats apply whenever a bond is used for Corosync
> traffic (even if only as a redundant link)
>
> Changes since v1:
> - move to its own section under "Cluster Network"
> - reword remarks about bond-lacp-rate fast
> - reword remark under "Requirements"
>
> pve-network.adoc | 4 ++-
> pvecm.adoc | 68 +++++++++++++++++++++++++++++++++++++++++++++---
> 2 files changed, 67 insertions(+), 5 deletions(-)
>
> diff --git a/pve-network.adoc b/pve-network.adoc
> index 2dec882..b361f97 100644
> --- a/pve-network.adoc
> +++ b/pve-network.adoc
> @@ -495,7 +495,9 @@ use the active-backup mode.
>
> For the cluster network (Corosync) we recommend configuring it with multiple
> networks. Corosync does not need a bond for network redundancy as it can switch
> -between networks by itself, if one becomes unusable.
> +between networks by itself, if one becomes unusable. Some bond modes are known
> +to be problematic for Corosync, see
> +xref:pvecm_corosync_over_bonds[Corosync Over Bonds].
>
> The following bond configuration can be used as distributed/shared
> storage network. The benefit would be that you get more speed and the
> diff --git a/pvecm.adoc b/pvecm.adoc
> index 312a26f..3af1a06 100644
> --- a/pvecm.adoc
> +++ b/pvecm.adoc
> @@ -89,10 +89,8 @@ NOTE: To ensure reliable Corosync redundancy, it is essential to have at least
> another link on a different physical network. This enables Corosync to keep the
> cluster communication alive should the dedicated network be down.
> +
> -NOTE: A single link backed by a bond is not enough to provide Corosync
> -redundancy. When a bonded interface fails and Corosync cannot fall back to
> -another link, it can lead to asymmetric communication in the cluster, which in
> -turn can lead to the cluster losing quorum.
> +NOTE: A single link backed by a bond can be problematic in certain failure
> +scenarios, see xref:pvecm_corosync_over_bonds[Corosync Over Bonds].
>
> * The root password of a cluster node is required for adding nodes.
>
> @@ -606,6 +604,68 @@ transport to `udp` or `udpu` in your xref:pvecm_edit_corosync_conf[corosync.conf
> but keep in mind that this will disable all cryptography and redundancy support.
> This is therefore not recommended.
>
> +[[pvecm_corosync_over_bonds]]
> +Corosync Over Bonds
> +~~~~~~~~~~~~~~~~~~~
> +
> +Recommendations
> +^^^^^^^^^^^^^^^
> +
> +We recommend at least one dedicated physical NIC for the primary Corosync link,
> +see xref:pvecm_cluster_requirements[Requirements].
> +xref:sysadmin_network_bond[Bonds] may be used as additional links for increased
> +redundancy. The following caveats apply *whenever a bond is used for Corosync
> +traffic*:
> +
> +* Bond mode *active-backup* may not provide the expected redundancy in certain
> + failure scenarios, see below for details.
> +
> +* We *advise against* using bond modes *balance-rr*, *balance-xor*,
> + *balance-tlb*, or *balance-alb* for Corosync traffic. They are known to be
> + problematic in certain failure scenarios, see below for details.
> +
> +* *IEEE 802.3ad (LACP)*: If LACP bonds are used for Corosync traffic, we
> + strongly recommend setting `bond-lacp-rate fast` *on the Proxmox VE node and
> + the switch*! With the default setting `bond-lacp-rate slow`, this mode is
In the rendered version, the monospaced `bond-lacp-rate fast` directly
followed by a bold phrase looks a bit heavy. Maybe we could limit the
bold parts to just `Proxmox VE` and `switch` here instead?
> + known to be problematic in certain failure scenarios, see below for details.
> +
> +Background
> +^^^^^^^^^^
> +
> +Using a xref:sysadmin_network_bond[bond] as a Corosync link can be problematic
> +in certain failure scenarios. Consider the failure scenario where one of the
> +bonded interfaces fails and stops transmitting packets, but its link state
> +stays up, and there are no other Corosync links available. In this scenario,
> +some bond modes may cause a state of asymmetric connectivity where cluster
> +nodes can only communicate with different subsets of other nodes. Affected are
> +bond modes that provide load balancing, as these modes may still try to send
> +out a subset of packets via the failed interface. In case of asymmetric
> +connectivity, Corosync may not be able to form a stable quorum in the cluster.
> +If this state persists and HA is enabled, even nodes whose bond does not have
> +any issues may fence themselves. In the worst case, the whole cluster may fence
> +itself.
> +
> +The bond mode *active-backup* will not cause asymmetric connectivity in the
Maybe we can make the `not` here bold as well, to better differentiate
its behavior from the other bond modes?
> +failure scenario described above. However, the bond with the interface failure
> +may not switch over to the backup link. The node may lose connection to the
> +cluster and, if HA is enabled, fence itself.
> +
> +Bond modes *balance-rr*, *balance-xor*, *balance-tlb*, or *balance-alb* may
> +cause asymmetric connectivity in the failure scenario above, which can lead to
> +unexpected fencing if HA is enabled.
> +
> +Bond mode *IEEE 802.3ad (LACP)* can cause asymmetric connectivity in the
> +failure scenario above, but it can recover from this state, as each side of the
> +bond (Proxmox VE node and switch) can stop using a bonded interface if it has
> +not received three LACPDUs in a row on it. However, with default settings,
> +LACPDUs are only sent every 30 seconds, yielding a failover time of 90 seconds.
> +This is too long, as nodes with HA resources will fence themselves already
> +after roughly one minute without a stable quorum. If LACP bonds are used for
> +Corosync traffic, we recommend setting `bond-lacp-rate fast` on the Proxmox VE
> +node and the switch! Setting this option on one side requests the other side to
This should match the formatting of the same recommendation in the list
above and be bold as well.
> +send an LACPDU every second. Setting this option on both sides can reduce the
> +failover time in the scenario above to 3 seconds and thus prevent fencing.
> +
> Separate Cluster Network
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
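As a side note, the failover arithmetic in the LACP paragraph can be
sketched in a few lines of Python. This is only an illustration using the
numbers stated in the patch (three missed LACPDUs, 30 second vs. 1 second
intervals, roughly one minute until HA fencing). The function and constant
names are made up for this example and are not part of Proxmox VE,
Corosync, or the kernel bonding driver, and the 60 second threshold is an
approximation of "roughly one minute".

```python
# Illustrative sketch only: all names here are hypothetical.
# The numbers are taken from the patch text, not measured.

# A side stops using a bonded interface after this many missed LACPDUs.
MISSED_LACPDUS_BEFORE_FAILOVER = 3

# Approximation of "roughly one minute" without a stable quorum.
FENCE_AFTER_SECONDS = 60

# Interval between LACPDUs for each bond-lacp-rate setting.
LACPDU_INTERVAL_SECONDS = {"slow": 30, "fast": 1}


def failover_time(lacp_rate: str) -> int:
    """Seconds until a silent interface is dropped from the bond."""
    return MISSED_LACPDUS_BEFORE_FAILOVER * LACPDU_INTERVAL_SECONDS[lacp_rate]


def risks_fencing(lacp_rate: str) -> bool:
    """True if failover takes at least as long as the fencing threshold."""
    return failover_time(lacp_rate) >= FENCE_AFTER_SECONDS


if __name__ == "__main__":
    for rate in ("slow", "fast"):
        print(rate, failover_time(rate), risks_fencing(rate))
```

With these assumptions, `slow` gives 90 seconds, which exceeds the fencing
threshold, while `fast` gives 3 seconds, which stays well below it.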
The changes look good to me, so consider this:
Reviewed-by: Mira Limbeck <m.limbeck@proxmox.com>
Thread overview:
2025-07-30  8:58 Friedrich Weber
2025-08-04 14:09 ` Mira Limbeck [this message]
2025-08-04 14:53 ` [pve-devel] applied: " Aaron Lauterer