From: "Lukas Sichert"
To: Michael Köppl
Date: Wed, 22 Apr 2026 13:40:00 +0200
Subject: Re: [PATCH cluster v2 4/8] add functions to determine warning level for high token timeouts
References: <20260420164314.370023-1-m.koeppl@proxmox.com> <20260420164314.370023-5-m.koeppl@proxmox.com>
In-Reply-To: <20260420164314.370023-5-m.koeppl@proxmox.com>
On 2026-04-20 18:43, Michael Köppl wrote:
> High token timeouts can lead to stability problems in clusters. To
> inform users about the timeout in their current setup (or expected
> timeouts when adding nodes) and give recommendations regarding the token
> coefficient setting, introduce function to calculate the timeout as well
> as determine the warning / recommendation levels.
>
> Signed-off-by: Michael Köppl
> ---
>  src/PVE/Corosync.pm | 50 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
>
> diff --git a/src/PVE/Corosync.pm b/src/PVE/Corosync.pm
> index aef0d31..6391e3c 100644
> --- a/src/PVE/Corosync.pm
> +++ b/src/PVE/Corosync.pm
> @@ -534,4 +534,54 @@ sub resolve_hostname_like_corosync {
>      return $match_ip_and_version->($resolved_ip);
>  }
>
> +sub calculate_membership_recovery_timeout {
> +    my ($totemcfg, $node_count) = @_;
> +
> +    my $token_timeout = $totemcfg->{token} // 3000;
> +    my $token_coefficient = $totemcfg->{token_coefficient} // 650;
> +
> +    my $expected_token_timeout = $token_timeout;
> +    if ($node_count > 2) {
> +        $expected_token_timeout += ($node_count - 2) * $token_coefficient;
> +    }
> +
> +    my $expected_consensus_timeout = $totemcfg->{consensus} // $expected_token_timeout * 1.2;
> +    return ($expected_token_timeout + $expected_consensus_timeout) / 1000.0;
> +}
> +
> +sub get_timeout_warning_level {
> +    my ($total_timeout_secs) = @_;
> +
> +    if ($total_timeout_secs > 45) {
> +        return 'change-strongly-recommended';
> +    } elsif ($total_timeout_secs > 40) {
> +        return 'change-recommended';
> +    } elsif ($total_timeout_secs > 30) {
> +        return 'optimize';
> +    }
> +
> +    return undef;
> +}
> +
> +sub get_timeout_warning {
> +    my ($total_timeout_secs) = @_;
> +
> +    my $level = get_timeout_warning_level($total_timeout_secs);
> +    return undef if !defined($level);
> +
> +    my $level_msg;
> +    if ($level eq 'change-strongly-recommended') {
> +        $level_msg = "Changing the token coefficient is strongly recommended";

In my opinion, "lowering" would be clearer than "changing" here, because
this warning is only emitted when the timeout is too high.

> +    } elsif ($level eq 'change-recommended') {
> +        $level_msg = "Changing the token coefficient is recommended";
> +    } elsif ($level eq 'optimize') {
> +        $level_msg = "Token coefficient can be optimized";
> +    }
> +
> +    return
> +        "Sum of Corosync token and consensus timeout is ${total_timeout_secs}s. "
> +        . "$level_msg. "
> +        . "See https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_changing_the_token_coefficient for details.";

Maybe it would be better not to hardcode the 'pve.proxmox.com' URL, but
to derive the URL from '/etc/hosts' or use ':8006' instead.

> +}
> +
>  1;
> -- 
> 2.47.3
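For reference, a quick standalone check of where the thresholds kick in with
the fallback values the patch uses (token 3000 ms, token_coefficient 650 ms,
consensus left unset so it becomes 1.2 * the computed token timeout). It only
copies the two helpers from the patch; the loop over node counts is just for
illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copies of the two helpers from the patch, so this runs on its own.
sub calculate_membership_recovery_timeout {
    my ($totemcfg, $node_count) = @_;

    my $token_timeout = $totemcfg->{token} // 3000;
    my $token_coefficient = $totemcfg->{token_coefficient} // 650;

    # Each node beyond the second adds token_coefficient to the token timeout.
    my $expected_token_timeout = $token_timeout;
    if ($node_count > 2) {
        $expected_token_timeout += ($node_count - 2) * $token_coefficient;
    }

    # '//' binds looser than '*', so the fallback is (token * 1.2).
    my $expected_consensus_timeout = $totemcfg->{consensus} // $expected_token_timeout * 1.2;
    return ($expected_token_timeout + $expected_consensus_timeout) / 1000.0;
}

sub get_timeout_warning_level {
    my ($total_timeout_secs) = @_;

    if ($total_timeout_secs > 45) {
        return 'change-strongly-recommended';
    } elsif ($total_timeout_secs > 40) {
        return 'change-recommended';
    } elsif ($total_timeout_secs > 30) {
        return 'optimize';
    }

    return undef;
}

# Empty totem section: all defaults apply. Node counts chosen around the
# 30s / 40s / 45s thresholds.
for my $nodes (5, 18, 19, 25, 26, 28, 29) {
    my $secs = calculate_membership_recovery_timeout({}, $nodes);
    my $level = get_timeout_warning_level($secs) // 'none';
    printf("%2d nodes: %6.2fs -> %s\n", $nodes, $secs, $level);
}
```

With these defaults the total is 2.2 * (3000 + (n - 2) * 650) ms, so
'optimize' is first reported at 19 nodes (30.91s), 'change-recommended' at
26 nodes (40.92s) and 'change-strongly-recommended' at 29 nodes (45.21s).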
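To illustrate both suggestions together, a rough sketch of how the message
could look with "Lowering" and a docs link pointing at the node itself. The
helper get_token_coefficient_docs_url() is hypothetical, and the sketch
assumes the documentation is reachable under https://<host>:8006/pve-docs/
and that the node's hostname (e.g. as listed in /etc/hosts) resolves for
whoever reads the message. Sys::Hostname is a core Perl module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Sys::Hostname qw(hostname);

# Unchanged copy of the level helper from the patch.
sub get_timeout_warning_level {
    my ($total_timeout_secs) = @_;

    if ($total_timeout_secs > 45) {
        return 'change-strongly-recommended';
    } elsif ($total_timeout_secs > 40) {
        return 'change-recommended';
    } elsif ($total_timeout_secs > 30) {
        return 'optimize';
    }

    return undef;
}

# Hypothetical helper: link to the docs served by the local node on port
# 8006 instead of hardcoding pve.proxmox.com. Falls back to the local
# hostname when no host is given (assumed to resolve, e.g. via /etc/hosts).
sub get_token_coefficient_docs_url {
    my ($host) = @_;
    $host //= hostname();
    return "https://${host}:8006/pve-docs/chapter-pvecm.html#_changing_the_token_coefficient";
}

# Variant of get_timeout_warning() from the patch: "Lowering" instead of
# "Changing", plus the derived URL. The optional $host is for illustration.
sub get_timeout_warning {
    my ($total_timeout_secs, $host) = @_;

    my $level = get_timeout_warning_level($total_timeout_secs);
    return undef if !defined($level);

    my $level_msg;
    if ($level eq 'change-strongly-recommended') {
        $level_msg = "Lowering the token coefficient is strongly recommended";
    } elsif ($level eq 'change-recommended') {
        $level_msg = "Lowering the token coefficient is recommended";
    } elsif ($level eq 'optimize') {
        $level_msg = "Token coefficient can be optimized";
    }

    my $url = get_token_coefficient_docs_url($host);
    return
        "Sum of Corosync token and consensus timeout is ${total_timeout_secs}s. "
        . "$level_msg. "
        . "See $url for details.";
}

print get_timeout_warning(45.21, 'pve1'), "\n";
```

Passing the host explicitly keeps the helper easy to test; when it is
omitted, the sketch falls back to Sys::Hostname's hostname().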