* Re: [PATCH pve-cluster 2/2] api: cluster config: create new clusters with lower token coefficient
2026-02-12 11:57 ` [PATCH pve-cluster 2/2] api: cluster config: create new clusters with lower " Friedrich Weber
@ 2026-02-16 16:00 ` Maximiliano Sandoval
2026-02-16 19:36 ` Thomas Lamprecht
2026-02-16 16:09 ` Maximiliano Sandoval
1 sibling, 1 reply; 10+ messages in thread
From: Maximiliano Sandoval @ 2026-02-16 16:00 UTC (permalink / raw)
To: Friedrich Weber; +Cc: pve-devel
Friedrich Weber <f.weber@proxmox.com> writes:
A comment below
> corosync makes use of several timeouts, in particular the token and
> consensus timeouts. The sum of these two timeouts yields the minimum
> time a cluster needs to reestablish a membership after a token loss
> due to a complete node failure.
>
> By default, corosync sets the timeouts based on the cluster size [1]:
>
> token timeout = token + (#nodes - 2) * token_coefficient
> consensus timeout = 1.2 * token timeout
>
> token defaults to 3000ms, token_coefficient defaults to 650ms.
>
> With more than ~30 nodes in the default settings, the sum of token and
> consensus timeouts gets close to or exceeds 50-60s. As a result, after
> a token loss due to a complete node failure in an HA cluster, the
> watchdog may fence nodes because it takes too long to reestablish a
> new membership and quorum.
>
> One way to avoid this is to lower the sum of the token and consensus
> timeouts. The consensus timeout is intentionally slightly larger than
> the token timeout [2], so the definition of the consensus timeout in
> terms of the token timeout should be preserved. Since it does make
> sense to define both timeouts in terms of the cluster size, the most
> viable option to lower the timeouts appears to be to adjust the
> token_coefficient. Experiments suggest that the default 650ms is
> overly conservative considering the low-latency network requirements
> postulated in the admin guide [3].
>
> Hence, create new clusters with a default token coefficient of 125ms.
> This keeps the sum of token and consensus timeouts well below 50s for
> realistic cluster sizes. Users who prefer a larger token coefficient
> can manually override the token coefficient when creating a cluster
> via pvecm create. The token coefficient can also be changed for an
> existing cluster, this will be documented separately.
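For reference, the numbers above can be reproduced with a small calculation. This is a minimal Python sketch, assuming only the two formulas and the token default of 3000ms quoted from the commit message. The function names are made up for illustration and are not part of corosync or pve-cluster, and treating clusters with fewer than three nodes as having no size-based term is an assumption on my part.

```python
# Illustrative sketch of the timeout formulas quoted above.
# All names here are hypothetical; nothing below is corosync or pve-cluster API.
TOKEN_MS = 3000  # default "token" value stated in the commit message


def token_timeout_ms(nodes, coefficient_ms, token_ms=TOKEN_MS):
    # token timeout = token + (#nodes - 2) * token_coefficient
    # Assumption: clusters with fewer than 3 nodes get no size-based increase.
    return token_ms + max(nodes - 2, 0) * coefficient_ms


def consensus_timeout_ms(token_timeout):
    # consensus timeout = 1.2 * token timeout
    # Integer arithmetic avoids floating-point rounding noise.
    return token_timeout * 12 // 10


def membership_sum_ms(nodes, coefficient_ms):
    # Sum of token and consensus timeouts, i.e. the minimum time needed to
    # reestablish a membership after a token loss, per the commit message.
    token = token_timeout_ms(nodes, coefficient_ms)
    return token + consensus_timeout_ms(token)


if __name__ == "__main__":
    for nodes in (8, 16, 30, 32, 40):
        old = membership_sum_ms(nodes, 650)
        new = membership_sum_ms(nodes, 125)
        print(f"{nodes:>3} nodes: 650ms -> {old / 1000:.2f}s, 125ms -> {new / 1000:.2f}s")
```

For 30 nodes this gives a sum of 46.64s with a coefficient of 650ms versus 14.30s with 125ms, which is in line with the figures given in the commit message.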
>
> Note that knet_ping_interval and knet_ping_timeout are derived from
> the token timeout, hence, a lower token coefficient will result in
> more frequent kronosnet pings and shorter ping timeouts.
>
> With this change, newly created clusters will always set an explicit
> token_coefficient in their corosync.conf.
>
> [1] https://manpages.debian.org/trixie/corosync/corosync.conf.5.en.html#token_coefficient
> [2] https://github.com/corosync/corosync/commit/b3e19b29058eafc3e808ded7f4c2440c3f957392
> [3] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_cluster_network_requirements
>
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
> src/PVE/API2/ClusterConfig.pm | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/src/PVE/API2/ClusterConfig.pm b/src/PVE/API2/ClusterConfig.pm
> index 1bc7bcf..8df257a 100644
> --- a/src/PVE/API2/ClusterConfig.pm
> +++ b/src/PVE/API2/ClusterConfig.pm
> @@ -111,12 +111,21 @@ __PACKAGE__->register_method({
> minimum => 1,
> optional => 1,
> },
> + 'token-coefficient' => {
> + type => 'integer',
> + description => "Token coefficient to set in the corosync configuration.",
> + default => 125,
> + minimum => 0,
From man 5 corosync.conf's token_coefficient documentation: "This value
can be set to 0 resulting in effective removal of this feature.". If we
want to expose setting this to 0, I would document that it has a special
meaning and what this entails. I would personally feel more
comfortable setting `minimum => 1` for now instead.
> + optional => 1,
> + },
> }),
> },
> returns => { type => 'string' },
> code => sub {
> my ($param) = @_;
>
> + $param->{'token-coefficient'} //= 125;
> +
> die "cluster config '$clusterconf' already exists\n" if -f $clusterconf;
>
> my $rpcenv = PVE::RPCEnvironment::get();
--
Maximiliano
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH pve-cluster 2/2] api: cluster config: create new clusters with lower token coefficient
2026-02-16 16:00 ` Maximiliano Sandoval
@ 2026-02-16 19:36 ` Thomas Lamprecht
2026-02-17 12:44 ` Maximiliano Sandoval
0 siblings, 1 reply; 10+ messages in thread
From: Thomas Lamprecht @ 2026-02-16 19:36 UTC (permalink / raw)
To: Maximiliano Sandoval, Friedrich Weber; +Cc: pve-devel
Am 16.02.26 um 17:00 schrieb Maximiliano Sandoval:
>> + 'token-coefficient' => {
>> + type => 'integer',
>> + description => "Token coefficient to set in the corosync configuration.",
>> + default => 125,
>> + minimum => 0,
> From man 5 corosync.conf's token_coefficient documentation: "This value
> can be set to 0 resulting in effective removal of this feature.". If we
> want to expose setting this to 0, I would document that it has a special
> meaning and what this entails. I would personally feel more
> comfortable setting `minimum => 1` for now instead.
At least a "see `man 5 corosync.conf` for details" might be nice; adding some
extra hints here, like how it's roughly used and special values, could
indeed be nice too; some of that might be better off in the docs or the
verbose_descriptions property though.
But I'm not so sure about the actual value to the user of restricting this
here. I mean, if we ever exposed this in the UI in some advanced section,
one could show clear hints for such special/odd values and their potential
implications. For the CLI that's mostly the job of the docs and maybe an extra
informal "log" print. But forcing users to edit the corosync.conf manually in
case they want to try this, for whatever reason, seems to worsen the UX rather
than improve it.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH pve-cluster 2/2] api: cluster config: create new clusters with lower token coefficient
2026-02-16 19:36 ` Thomas Lamprecht
@ 2026-02-17 12:44 ` Maximiliano Sandoval
2026-02-17 12:50 ` Friedrich Weber
0 siblings, 1 reply; 10+ messages in thread
From: Maximiliano Sandoval @ 2026-02-17 12:44 UTC (permalink / raw)
To: Thomas Lamprecht; +Cc: pve-devel
Thomas Lamprecht <t.lamprecht@proxmox.com> writes:
> Am 16.02.26 um 17:00 schrieb Maximiliano Sandoval:
>>> + 'token-coefficient' => {
>>> + type => 'integer',
>>> + description => "Token coefficient to set in the corosync configuration.",
>>> + default => 125,
>>> + minimum => 0,
>> From man 5 corosync.conf's token_coefficient documentation: "This value
>> can be set to 0 resulting in effective removal of this feature.". If we
>> want to expose setting this to 0, I would document that it has a special
>> meaning and what this entails. I would personally feel more
>> comfortable setting `minimum => 1` for now instead.
>
> At least a "see `man 5 corosync.conf` for details" might be nice; adding some
> extra hints here, like how it's roughly used and special values, could
> indeed be nice too; some of that might be better off in the docs or the
> verbose_descriptions property though.
>
> But I'm not so sure about the actual value to the user of restricting this
> here. I mean, if we ever exposed this in the UI in some advanced section,
> one could show clear hints for such special/odd values and their potential
> implications. For the CLI that's mostly the job of the docs and maybe an extra
> informal "log" print. But forcing users to edit the corosync.conf manually in
> case they want to try this, for whatever reason, seems to worsen the UX rather
> than improve it.
From corosync.conf(5) I wrongly got the feeling that `0` had some
special-casing going on, but it actually does not. The docs just say in
a somewhat verbose fashion that multiplying with zero generally results
in zero.
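Put differently, with a coefficient of 0 the token timeout simply stops depending on the cluster size. A minimal Python sketch of that, assuming the formula from the commit message with the default token of 3000ms; the function name is hypothetical, and the handling of clusters with fewer than three nodes is an assumption:

```python
def token_timeout_ms(nodes, coefficient_ms, token_ms=3000):
    # Formula from the commit message:
    #   token timeout = token + (#nodes - 2) * token_coefficient
    # Assumption: no size-based term for clusters with fewer than 3 nodes.
    return token_ms + max(nodes - 2, 0) * coefficient_ms


# With a coefficient of 0, the size-dependent term vanishes for every size,
# so the token timeout is just the configured token value.
timeouts_at_zero = {nodes: token_timeout_ms(nodes, 0) for nodes in (3, 10, 30, 50)}
print(timeouts_at_zero)

# For comparison, any non-zero coefficient grows linearly with the node count.
timeouts_at_125 = {nodes: token_timeout_ms(nodes, 125) for nodes in (3, 10, 30, 50)}
print(timeouts_at_125)
```

So 0 is not special-cased in the formula itself; it is just the value for which the timeout no longer scales with the number of nodes.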
We discussed this off-list a bit and my suggestion in my other reply,
namely:
"Coefficient used to determine Corosync's token timeout. See the
corosync.conf(5) manual for more details."
is OK.
--
Maximiliano
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH pve-cluster 2/2] api: cluster config: create new clusters with lower token coefficient
2026-02-17 12:44 ` Maximiliano Sandoval
@ 2026-02-17 12:50 ` Friedrich Weber
0 siblings, 0 replies; 10+ messages in thread
From: Friedrich Weber @ 2026-02-17 12:50 UTC (permalink / raw)
To: Maximiliano Sandoval, Thomas Lamprecht; +Cc: pve-devel
Thanks for the review!
On 17/02/2026 13:44, Maximiliano Sandoval wrote:
> Thomas Lamprecht <t.lamprecht@proxmox.com> writes:
>
>> Am 16.02.26 um 17:00 schrieb Maximiliano Sandoval:
>>>> + 'token-coefficient' => {
>>>> + type => 'integer',
>>>> + description => "Token coefficient to set in the corosync configuration.",
>>>> + default => 125,
>>>> + minimum => 0,
>>> From man 5 corosync.conf's token_coefficient documentation: "This value
>>> can be set to 0 resulting in effective removal of this feature.". If we
>>> want to expose setting this to 0, I would document that it has a special
>>> meaning and what this entails. I would personally feel more
>>> comfortable setting `minimum => 1` for now instead.
>>
>> At least a "see `man 5 corosync.conf` for details" might be nice; adding some
>> extra hints here, like how it's roughly used and special values, could
>> indeed be nice too; some of that might be better off in the docs or the
>> verbose_descriptions property though.
>>
>> But I'm not so sure about the actual value to the user of restricting this
>> here. I mean, if we ever exposed this in the UI in some advanced section,
>> one could show clear hints for such special/odd values and their potential
>> implications. For the CLI that's mostly the job of the docs and maybe an extra
>> informal "log" print. But forcing users to edit the corosync.conf manually in
>> case they want to try this, for whatever reason, seems to worsen the UX rather
>> than improve it.
>
> From corosync.conf(5) I wrongly got the feeling that `0` had some
> special-casing going on, but it actually does not. The docs just say in
> a somewhat verbose fashion that multiplying with zero generally results
> in zero.
>
> We discussed this off-list a bit and my suggestion in my other reply,
> namely:
>
> "Coefficient used to determine Corosync's token timeout. See the
> corosync.conf(5) manual for more details."
>
> is OK.
Yes, I agree my original description was not that fitting. I can send a
v2 with this updated description.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH pve-cluster 2/2] api: cluster config: create new clusters with lower token coefficient
2026-02-12 11:57 ` [PATCH pve-cluster 2/2] api: cluster config: create new clusters with lower " Friedrich Weber
2026-02-16 16:00 ` Maximiliano Sandoval
@ 2026-02-16 16:09 ` Maximiliano Sandoval
1 sibling, 0 replies; 10+ messages in thread
From: Maximiliano Sandoval @ 2026-02-16 16:09 UTC (permalink / raw)
To: Friedrich Weber; +Cc: pve-devel
Friedrich Weber <f.weber@proxmox.com> writes:
> corosync makes use of several timeouts, in particular the token and
> consensus timeouts. The sum of these two timeouts yields the minimum
> time a cluster needs to reestablish a membership after a token loss
> due to a complete node failure.
>
> By default, corosync sets the timeouts based on the cluster size [1]:
>
> token timeout = token + (#nodes - 2) * token_coefficient
> consensus timeout = 1.2 * token timeout
>
> token defaults to 3000ms, token_coefficient defaults to 650ms.
>
> With more than ~30 nodes in the default settings, the sum of token and
> consensus timeouts gets close to or exceeds 50-60s. As a result, after
> a token loss due to a complete node failure in an HA cluster, the
> watchdog may fence nodes because it takes too long to reestablish a
> new membership and quorum.
>
> One way to avoid this is to lower the sum of the token and consensus
> timeouts. The consensus timeout is intentionally slightly larger than
> the token timeout [2], so the definition of the consensus timeout in
> terms of the token timeout should be preserved. Since it does make
> sense to define both timeouts in terms of the cluster size, the most
> viable option to lower the timeouts appears to be to adjust the
> token_coefficient. Experiments suggest that the default 650ms is
> overly conservative considering the low-latency network requirements
> postulated in the admin guide [3].
>
> Hence, create new clusters with a default token coefficient of 125ms.
> This keeps the sum of token and consensus timeouts well below 50s for
> realistic cluster sizes. Users who prefer a larger token coefficient
> can manually override the token coefficient when creating a cluster
> via pvecm create. The token coefficient can also be changed for an
> existing cluster, this will be documented separately.
>
> Note that knet_ping_interval and knet_ping_timeout are derived from
> the token timeout, hence, a lower token coefficient will result in
> more frequent kronosnet pings and shorter ping timeouts.
>
> With this change, newly created clusters will always set an explicit
> token_coefficient in their corosync.conf.
>
> [1] https://manpages.debian.org/trixie/corosync/corosync.conf.5.en.html#token_coefficient
> [2] https://github.com/corosync/corosync/commit/b3e19b29058eafc3e808ded7f4c2440c3f957392
> [3] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_cluster_network_requirements
>
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
> src/PVE/API2/ClusterConfig.pm | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/src/PVE/API2/ClusterConfig.pm b/src/PVE/API2/ClusterConfig.pm
> index 1bc7bcf..8df257a 100644
> --- a/src/PVE/API2/ClusterConfig.pm
> +++ b/src/PVE/API2/ClusterConfig.pm
> @@ -111,12 +111,21 @@ __PACKAGE__->register_method({
> minimum => 1,
> optional => 1,
> },
> + 'token-coefficient' => {
> + type => 'integer',
> + description => "Token coefficient to set in the corosync configuration.",
This description does not help in understanding what the option does, at
least no more than its name does. It would perhaps be preferable to say
something along the lines of:
"Coefficient used to determine Corosync's token timeout. See the
corosync.conf(5) manual for more details."
> + default => 125,
> + minimum => 0,
> + optional => 1,
> + },
> }),
> },
> returns => { type => 'string' },
> code => sub {
> my ($param) = @_;
>
> + $param->{'token-coefficient'} //= 125;
> +
> die "cluster config '$clusterconf' already exists\n" if -f $clusterconf;
>
> my $rpcenv = PVE::RPCEnvironment::get();
--
Maximiliano
^ permalink raw reply [flat|nested] 10+ messages in thread