From: Friedrich Weber <f.weber@proxmox.com>
To: "Michael Köppl" <m.koeppl@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH cluster 1/5] add functions to determine warning level for high token timeouts
Date: Fri, 17 Apr 2026 10:33:16 +0200
Message-ID: <a0d4baa9-8e7e-4cc5-8117-c8cf0bfc74b0@proxmox.com>
In-Reply-To: <20260330144321.321072-2-m.koeppl@proxmox.com>
On 30/03/2026 16:46, Michael Köppl wrote:
> High token timeouts can lead to stability problems in clusters. To
> inform users about the timeout in their current setup (or expected
> timeouts when adding nodes) and give recommendations regarding the token
> coefficient setting, introduce function to calculate the timeout as well
> as determine the warning / recommendation levels.
>
> Signed-off-by: Michael Köppl <m.koeppl@proxmox.com>
> ---
> The timeouts are chosen according to Friedrich's description in [0].
>
> [0] https://bugzilla.proxmox.com/show_bug.cgi?id=7398
>
> src/PVE/Corosync.pm | 50 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/src/PVE/Corosync.pm b/src/PVE/Corosync.pm
> index aef0d31..41d4c6f 100644
> --- a/src/PVE/Corosync.pm
> +++ b/src/PVE/Corosync.pm
> @@ -534,4 +534,54 @@ sub resolve_hostname_like_corosync {
> return $match_ip_and_version->($resolved_ip);
> }
>
> +sub calculate_total_timeout {
I think "total timeout" is a little too vague, especially because it's
also user-facing. I don't think totem/corosync have a term for "sum of
token and consensus timeout", and that phrase itself is too long.
Perhaps something like "recovery timeout", though that's not perfect
because "Recovery" is a specific state in the totem state machine.
Maybe "membership convergence timeout" (though that's a bit long and
obscure)?
> +    my ($totemcfg, $node_count) = @_;
> +
> +    my $token_timeout = $totemcfg->{token} // 3000;
> +    my $token_coefficient = $totemcfg->{token_coefficient} // 650;
> +
> +    my $expected_token_timeout = $token_timeout;
> +    if ($node_count > 2) {
> +        $expected_token_timeout += ($node_count - 2) * $token_coefficient;
> +    }
> +
> +    my $expected_consensus_timeout = $totemcfg->{consensus} // $expected_token_timeout * 1.2;
> +    return ($expected_token_timeout + $expected_consensus_timeout) / 1000.0;
> +}
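For reference, here is a small standalone sketch (not part of the patch)
that mirrors the arithmetic of the quoted function, to show what the
sum comes out to for a few node counts. The helper name
sum_timeout_secs is hypothetical, and the defaults (3000 ms token,
650 ms coefficient, consensus = 1.2 * token timeout) are simply the ones
used in the quoted code; an explicit consensus setting is not handled
here.

```perl
use strict;
use warnings;

# Hypothetical standalone helper mirroring the arithmetic of the quoted
# calculate_total_timeout; the defaults are the ones used in the patch.
sub sum_timeout_secs {
    my ($node_count, $token, $coefficient) = @_;
    $token //= 3000;
    $coefficient //= 650;

    # the coefficient only applies to each node beyond the second
    my $extra_nodes = $node_count > 2 ? $node_count - 2 : 0;
    my $token_timeout = $token + $extra_nodes * $coefficient;

    # assumption: consensus is not set explicitly, so 1.2 * token timeout
    my $consensus_timeout = $token_timeout * 1.2;

    # convert from milliseconds to seconds
    return ($token_timeout + $consensus_timeout) / 1000.0;
}

for my $nodes (2, 8, 16, 19, 26, 33) {
    printf "%2d nodes: %.2fs\n", $nodes, sum_timeout_secs($nodes);
}
```

With these defaults, the sum first exceeds 30s at 19 nodes, 40s at
26 nodes and 50s at 33 nodes.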
> +
> +sub get_timeout_warning_level {
> +    my ($total_timeout_secs) = @_;
> +
> +    if ($total_timeout_secs > 50) {
> +        return 'change-strongly-recommended';
I realize I'm the source of these numbers :) But since >50 is actually
pretty bad already, if we phrase it as "strongly recommended" we can
probably go for a slightly lower number:
- > 45: change-strongly-recommended
- > 40: change-recommended
- > 30: optimize
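
As a rough, untested sketch of what that could look like (same
structure and level names as the quoted function, only the top
threshold lowered from 50 to 45; the exact numbers are of course up for
discussion):

```perl
use strict;
use warnings;

# Sketch only: same level names as the quoted get_timeout_warning_level,
# with the top threshold lowered from 50 to 45 as suggested above.
sub get_timeout_warning_level {
    my ($total_timeout_secs) = @_;

    return 'change-strongly-recommended' if $total_timeout_secs > 45;
    return 'change-recommended' if $total_timeout_secs > 40;
    return 'optimize' if $total_timeout_secs > 30;

    # at or below 30s no warning is needed
    return undef;
}

for my $secs (26.62, 30.91, 40.92, 46.64) {
    my $level = get_timeout_warning_level($secs) // 'no warning';
    print "${secs}s: $level\n";
}
```

The comparisons stay strict (>), so a sum of exactly 30s still produces
no warning, as in the original.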
> +    } elsif ($total_timeout_secs > 40) {
> +        return 'change-recommended';
> +    } elsif ($total_timeout_secs > 30) {
> +        return 'optimize';
> +    }
> +
> +    return undef;
> +}
> +
> +sub get_timeout_warning {
> +    my ($total_timeout_secs) = @_;
> +
> +    my $level = get_timeout_warning_level($total_timeout_secs);
> +    return undef if !defined($level);
> +
> +    my $level_msg;
> +    if ($level eq 'change-strongly-recommended') {
> +        $level_msg = "Changing the token coefficient is strongly recommended";
> +    } elsif ($level eq 'change-recommended') {
> +        $level_msg = "Changing the token coefficient is recommended";
> +    } elsif ($level eq 'optimize') {
> +        $level_msg = "Token coefficient can be optimized";
> +    }
> +
> +    return
> +        "Sum of Corosync token and consensus timeout is ${total_timeout_secs}s. "
> +        . "$level_msg. "
> +        . "See https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_changing_the_token_coefficient for details.";
> +}
> +
> 1;