From: Friedrich Weber <f.weber@proxmox.com>
To: Michael Köppl, pve-devel@lists.proxmox.com
Date: Fri, 17 Apr 2026 10:33:39 +0200
Subject: Re: [PATCH cluster 1/5] add functions to determine warning level for high token timeouts
In-Reply-To: <20260330144321.321072-2-m.koeppl@proxmox.com>

On 30/03/2026 16:46, Michael Köppl wrote:
> High token timeouts can lead to stability problems in clusters. To
> inform users about the timeout in their current setup (or expected
> timeouts when adding nodes) and give recommendations regarding the token
> coefficient setting, introduce function to calculate the timeout as well
> as determine the warning / recommendation levels.
>
> Signed-off-by: Michael Köppl
> ---
> The timeouts are chosen according to Friedrich's description in [0].
>
> [0] https://bugzilla.proxmox.com/show_bug.cgi?id=7398
>
>  src/PVE/Corosync.pm | 50 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
>
> diff --git a/src/PVE/Corosync.pm b/src/PVE/Corosync.pm
> index aef0d31..41d4c6f 100644
> --- a/src/PVE/Corosync.pm
> +++ b/src/PVE/Corosync.pm
> @@ -534,4 +534,54 @@ sub resolve_hostname_like_corosync {
>      return $match_ip_and_version->($resolved_ip);
>  }
>
> +sub calculate_total_timeout {
> +    my ($totemcfg, $node_count) = @_;
> +
> +    my $token_timeout = $totemcfg->{token} // 3000;
> +    my $token_coefficient = $totemcfg->{token_coefficient} // 650;
> +
> +    my $expected_token_timeout = $token_timeout;
> +    if ($node_count > 2) {
> +        $expected_token_timeout += ($node_count - 2) * $token_coefficient;
> +    }
> +
> +    my $expected_consensus_timeout = $totemcfg->{consensus} // $expected_token_timeout * 1.2;
> +    return ($expected_token_timeout + $expected_consensus_timeout) / 1000.0;
> +}
> +
> +sub get_timeout_warning_level {
> +    my ($total_timeout_secs) = @_;
> +
> +    if ($total_timeout_secs > 50) {
> +        return 'change-strongly-recommended';
> +    } elsif ($total_timeout_secs > 40) {
> +        return 'change-recommended';
> +    } elsif ($total_timeout_secs > 30) {
> +        return 'optimize';
> +    }
> +
> +    return undef;
> +}
> +
> +sub get_timeout_warning {
> +    my ($total_timeout_secs) = @_;
> +
> +    my $level = get_timeout_warning_level($total_timeout_secs);
> +    return undef if !defined($level);
> +
> +    my $level_msg;
> +    if ($level eq 'change-strongly-recommended') {
> +        $level_msg = "Changing the token coefficient is strongly recommended";
> +    } elsif ($level eq 'change-recommended') {
> +        $level_msg = "Changing the token coefficient is recommended";
> +    } elsif ($level eq 'optimize') {
> +        $level_msg = "Token coefficient can be optimized";
> +    }
> +
> +    return
> +        "Sum of Corosync token and consensus timeout is ${total_timeout_secs}s. "
> +        . "$level_msg. "
> +        . "See https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_changing_the_token_coefficient for details.";
> +}

nit: I think a line break before the "See" would be nice.

I'm not sure what our policy is regarding docs links in the CLI. If we don't want them, we could have something like "See the admin guide for details" -- the "token coefficient" phrase should make the relevant section easy to find.
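For illustration, an untested sketch of how the message could look with both suggestions applied: the "See ..." hint on its own line and a generic pointer to the admin guide instead of the URL (the wording is only a suggestion). calculate_total_timeout and get_timeout_warning_level are copied unchanged from the patch. The loop at the end assumes an empty totem section, i.e. the defaults token = 3000ms and token_coefficient = 650ms, and prints the total timeout and warning level for a few node counts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# calculate_total_timeout and get_timeout_warning_level: unchanged from the patch.
sub calculate_total_timeout {
    my ($totemcfg, $node_count) = @_;

    my $token_timeout = $totemcfg->{token} // 3000;
    my $token_coefficient = $totemcfg->{token_coefficient} // 650;

    my $expected_token_timeout = $token_timeout;
    if ($node_count > 2) {
        $expected_token_timeout += ($node_count - 2) * $token_coefficient;
    }

    my $expected_consensus_timeout = $totemcfg->{consensus} // $expected_token_timeout * 1.2;
    return ($expected_token_timeout + $expected_consensus_timeout) / 1000.0;
}

sub get_timeout_warning_level {
    my ($total_timeout_secs) = @_;

    if ($total_timeout_secs > 50) {
        return 'change-strongly-recommended';
    } elsif ($total_timeout_secs > 40) {
        return 'change-recommended';
    } elsif ($total_timeout_secs > 30) {
        return 'optimize';
    }

    return undef;
}

# Suggested variant: "See ..." starts on its own line and refers to the
# admin guide instead of linking to a URL.
sub get_timeout_warning {
    my ($total_timeout_secs) = @_;

    my $level = get_timeout_warning_level($total_timeout_secs);
    return undef if !defined($level);

    my $level_msg;
    if ($level eq 'change-strongly-recommended') {
        $level_msg = "Changing the token coefficient is strongly recommended";
    } elsif ($level eq 'change-recommended') {
        $level_msg = "Changing the token coefficient is recommended";
    } elsif ($level eq 'optimize') {
        $level_msg = "Token coefficient can be optimized";
    }

    return
        "Sum of Corosync token and consensus timeout is ${total_timeout_secs}s. "
        . "$level_msg.\n"
        . "See the admin guide for details.";
}

# Empty totem section, so the default token and token_coefficient apply:
# print the total timeout and resulting warning level per node count.
for my $nodes (18, 19, 26, 33) {
    my $total = calculate_total_timeout({}, $nodes);
    my $level = get_timeout_warning_level($total) // 'none';
    printf("%2d nodes: %.2fs -> %s\n", $nodes, $total, $level);
}
```

With these defaults, the total is 2.2 * (3000 + 650 * (nodes - 2)) ms, so the thresholds are crossed at 19 nodes (optimize), 26 nodes (change-recommended) and 33 nodes (change-strongly-recommended).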