From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 200321FF141
	for <inbox@lore.proxmox.com>; Tue, 05 May 2026 16:50:32 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 8B1967445;
	Tue,  5 May 2026 16:50:29 +0200 (CEST)
From: Friedrich Weber <f.weber@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH docs] pvecm: elaborate when and how to change the token
 coefficient
Date: Tue,  5 May 2026 16:49:25 +0200
Message-ID: <20260505144946.234522-1-f.weber@proxmox.com>
X-Mailer: git-send-email 2.47.3
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1777992485053
X-SPAM-LEVEL: Spam detection results:  0
	AWL                     0.013 Adjusted score from AWL reputation of From:
 address
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING             0.1 Missing DMARC policy
	KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict
 Alignment
	SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS               -0.001 SPF: sender matches SPF record
	URIBL_BLOCKED           0.001 ADMINISTRATOR NOTICE: The query to URIBL was
 blocked.  See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
 for more information. [proxmox.com]
Message-ID-Hash: RDPP2IJG4AG7NZSME6GKHWJBSNO3U3YP
X-Message-ID-Hash: RDPP2IJG4AG7NZSME6GKHWJBSNO3U3YP
X-MailFrom: f.weber@proxmox.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; emergency; member-moderation; nonmember-moderation;
 administrivia; implicit-dest; max-recipients; max-size; news-moderation;
 no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.10
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Owner: <mailto:pve-devel-owner@lists.proxmox.com>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Subscribe: <mailto:pve-devel-join@lists.proxmox.com>
List-Unsubscribe: <mailto:pve-devel-leave@lists.proxmox.com>

Since pve-cluster 9.1.0, more specifically commit a7b1c76 ("corosync
config: allow to override token coefficient and lower default"), new
corosync clusters are created with a token_coefficient of 125ms (the
default being 650ms), primarily to avoid issues with larger clusters
in combination with HA.

Existing clusters may need manual adjustment of the
token_coefficient. Hence, expand the "Changing the Token Coefficient"
section with instructions on when and how to change the token
coefficient.

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---

Notes:
    Based on the pve-docs patches from Michael's patch series [1]. This
    patch also refers to warnings added by that series.
    
    Thanks @Michael for feedback on an initial draft of this!
    
    [1] https://lore.proxmox.com/all/20260427170548.307698-1-m.koeppl@proxmox.com/
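
For sanity-checking the thresholds, the formula quoted in the new section
can be sketched as a small shell function. This is only an illustration and
not part of the patch: the name `totem_timeouts` is hypothetical, and both
the integer rounding of the consensus value and the clamping for clusters
with fewer than three nodes are assumptions.

```shell
#!/bin/sh
# Hypothetical helper, not part of pve-cluster or corosync.
# Prints "token consensus sum" in milliseconds for a given node count
# and token_coefficient, following the formula from the documentation:
#   token     = 3000 + (number_of_nodes - 2) * token_coefficient
#   consensus = 1.2 * token
totem_timeouts() {
    nodes=$1
    coeff=$2
    extra=$(( nodes - 2 ))
    # Assumption: the coefficient adds nothing below three nodes.
    if [ "$extra" -lt 0 ]; then
        extra=0
    fi
    token=$(( 3000 + extra * coeff ))
    # 1.2 * token in integer arithmetic (the rounding is an assumption).
    consensus=$(( token * 12 / 10 ))
    echo "$token $consensus $(( token + consensus ))"
}

totem_timeouts 5 650    # 5 nodes, corosync default coefficient
totem_timeouts 30 650   # 30 nodes, corosync default coefficient
totem_timeouts 30 125   # 30 nodes, coefficient used for new clusters
```

With 5 nodes and a coefficient of 650, this prints `4950 5940 10890`,
matching the example output in the patch. Under the same formula, 30 nodes
with a coefficient of 650 yield a sum of 46640 ms, above the 45 second
threshold, while a coefficient of 125 yields 14300 ms.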

 pvecm.adoc | 95 +++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 79 insertions(+), 16 deletions(-)

diff --git a/pvecm.adoc b/pvecm.adoc
index 3d65265..f3625c5 100644
--- a/pvecm.adoc
+++ b/pvecm.adoc
@@ -1386,23 +1386,86 @@ Changing the Token Coefficient
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The token coefficient can be configured in the `totem` section in
+`/etc/pve/corosync.conf`. If the token coefficient is not explicitly set, it
+defaults to 650 milliseconds. New clusters are created with a lower token
+coefficient of 125 milliseconds that is explicitly set in
 `/etc/pve/corosync.conf`. corosync uses the token coefficient to calculate
-several timeouts in relation to the cluster size.footnote:[
-`token_coefficient` in the corosync manual page
-https://manpages.debian.org/stable/corosync/corosync.conf.5.en.html#token_coefficient]
-
-If the token coefficient is not explicitly set, it defaults to 650 milliseconds.
-New clusters are created with a lower token coefficient of 125 milliseconds that
-is explicitly set in `/etc/pve/corosync.conf`.
-
-You can change the token coefficient of an existing cluster by
-xref:pvecm_edit_corosync_conf[editing corosync.conf]. Corosync will then
-automatically adopt the new value for the cluster.
-
-Cluster commands may display a warning if the sum of the Corosync token and
-consensus timeouts is considered too high (e.g., "Changing the token coefficient
-is recommended"). To resolve this warning, it is recommended to lower the token
-coefficient.
+several timeouts in relation to the cluster size footnote:[ `token_coefficient`
+in the corosync manual page
+https://manpages.debian.org/stable/corosync/corosync.conf.5.en.html#token_coefficient],
+most importantly the token timeout and consensus timeout.
+
+Corosync implements a token-passing protocol. The token timeout specifies how
+long a node waits for the token until it declares the token to be lost
+footnote:[ `token` in the corosync manual page
+https://manpages.debian.org/stable/corosync/corosync.conf.5.en.html#token]. The
+consensus timeout footnote:[ `consensus` in the corosync manual page
+https://manpages.debian.org/stable/corosync/corosync.conf.5.en.html#consensus]
+specifies the time nodes wait for a consensus on a new cluster membership. The
+sum of token and consensus timeouts defines the minimum time needed to
+reestablish a new cluster membership after a node goes offline.
+
+Keeping the sum of token and consensus timeouts below 30 seconds reduces the
+time needed for reestablishing a new cluster membership after a node failure.
+When HA is enabled, it is especially important that this time stays below 45
+seconds to ensure that a new cluster membership is formed before the
+xref:ha_manager_crm[watchdog timeout] of 60 seconds expires, which would
+trigger a node fence. The recommended mechanism for lowering the token and
+consensus timeouts is lowering the token coefficient as explained below.
+
+You can check the current token and consensus timeouts (in milliseconds) with
+the following command:
+
+[source,bash]
+----
+corosync-cmapctl | grep -Ew 'runtime.config.totem.token|runtime.config.totem.consensus'
+----
+For example:
+[source,bash]
+----
+runtime.config.totem.consensus (u32) = 5940
+runtime.config.totem.token (u32) = 4950
+----
+
+The sum of these two values (10.89 seconds in the example) defines the minimum
+time needed to reestablish a new cluster membership after a node goes offline.
+Lowering the token coefficient is
+
+* strongly recommended if this value exceeds 45 seconds,
+* recommended if it exceeds 40 seconds,
+* a suggested optimization if it exceeds 30 seconds.
+
+Cluster commands like `pvecm status` will display corresponding warnings based
+on the sum of the token and consensus timeouts. When joining a new node into
+the cluster, the GUI will display a warning if adding the node would increase
+the timeouts above any of the recommended thresholds.
+
+To lower the token coefficient, first make sure your setup adheres
+xref:pvecm_cluster_network_requirements[to the network requirements]. Then:
+
+* If the `token_coefficient` is not yet set explicitly to 125 milliseconds in
+  corosync.conf, xref:pvecm_edit_corosync_conf[edit corosync.conf] and add
+  `token_coefficient: 125` to the `totem` section. Do not forget to
+  xref:pvecm_edit_corosync_conf[increase the `config_version`].
+* If the `token_coefficient` is already set explicitly to 125 milliseconds,
+  select a `token_coefficient` with which the token and consensus timeouts sum
+  up to at most 45 seconds. By default, corosync computes the token and
+  consensus timeouts (in milliseconds) according to the following formula:
++
+----
+token = 3000 + (number_of_nodes - 2) * token_coefficient
+consensus = 1.2 * token
+----
++
+xref:pvecm_edit_corosync_conf[Edit corosync.conf] and add a corresponding
+`token_coefficient` option to the `totem` section. Do not forget to
+xref:pvecm_edit_corosync_conf[increase the `config_version`]. Test your setup
+thoroughly for stability!
+
+After adjusting the `token_coefficient` in `corosync.conf`, recent corosync
+versions will automatically adopt the new value for the cluster. For corosync
+versions below `3.1.10-pve1`, corosync needs to be restarted on all nodes for
+the change to take effect.
 
 Troubleshooting
 ~~~~~~~~~~~~~~~
-- 
2.47.3