From: Michael Köppl <m.koeppl@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH cluster v4 3/8] add functions to determine warning level for high token timeouts
Date: Fri, 5 Jun 2026 17:38:14 +0200
Message-ID: <20260605153819.310048-4-m.koeppl@proxmox.com>
In-Reply-To: <20260605153819.310048-1-m.koeppl@proxmox.com>
References: <20260605153819.310048-1-m.koeppl@proxmox.com>

High token timeouts can lead to stability problems in clusters.
To inform users about the timeout in their current setup and give
recommendations regarding the token coefficient setting, introduce
functions to calculate the timeout as well as determine the warning
level.

The current token and consensus timeouts are parsed directly from the
output of corosync-cmapctl to avoid future drift between this
implementation and Corosync's implementation. The values are parsed
from the output assuming the following format:

    runtime.config.totem.consensus (u32) = 3750
    runtime.config.totem.token (u32) = 3125

Signed-off-by: Michael Köppl <m.koeppl@proxmox.com>
---
 src/PVE/Corosync.pm | 59 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/src/PVE/Corosync.pm b/src/PVE/Corosync.pm
index aef0d31..d379710 100644
--- a/src/PVE/Corosync.pm
+++ b/src/PVE/Corosync.pm
@@ -534,4 +534,63 @@ sub resolve_hostname_like_corosync {
     return $match_ip_and_version->($resolved_ip);
 }
 
+sub read_cmap {
+    my $cmap = {};
+    my $rc = eval {
+        PVE::Tools::run_command(
+            ['corosync-cmapctl'],
+            outfunc => sub {
+                my $line = shift;
+                if ($line =~ /^(\S+)\s+\(\w+\)\s+=\s+(.*)$/) {
+                    $cmap->{$1} = $2;
+                }
+            },
+        );
+        1;
+    };
+    return undef if !$rc;
+    return $cmap;
+}
+
+sub calculate_membership_recovery_timeout {
+    my $cmap = read_cmap();
+    return undef if !$cmap;
+
+    my $token = $cmap->{'runtime.config.totem.token'};
+    my $consensus = $cmap->{'runtime.config.totem.consensus'};
+    return undef if !defined($token) || !defined($consensus);
+
+    return ($token + $consensus) / 1000.0;
+}
+
+sub get_membership_recovery_timeout_warning_level {
+    my ($total_timeout_secs) = @_;
+
+    if ($total_timeout_secs > 45) {
+        return 'critical';
+    } elsif ($total_timeout_secs > 40) {
+        return 'warning';
+    } elsif ($total_timeout_secs > 30) {
+        return 'info';
+    }
+
+    return undef;
+}
+
+sub get_membership_recovery_timeout_warning {
+    my ($total_timeout_secs) = @_;
+
+    my $level = get_membership_recovery_timeout_warning_level($total_timeout_secs);
+    return if !defined($level);
+
+    my $level_msg = {
+        critical => "Lowering the token coefficient is strongly recommended",
+        warning => "Lowering the token coefficient is recommended",
+        info => "The token coefficient can be optimized",
+    }->{$level};
+
+    my $msg = "Sum of Corosync token and consensus timeout is ${total_timeout_secs}s. $level_msg.";
+    return ($level, $msg);
+}
+
 1;
-- 
2.47.3
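
For illustration only, and not part of the patch: the parsing and
threshold logic can be tried without a running cluster. The standalone
sketch below feeds the two sample lines from the commit message through
the same regular expression used in read_cmap() and sums the values as
calculate_membership_recovery_timeout() does. The names @sample_output
and warning_level are hypothetical; the thresholds are copied from
get_membership_recovery_timeout_warning_level(), and the assumption is
that both cmap values are in milliseconds, as the division by 1000.0 in
the patch implies.

```perl
use strict;
use warnings;

# Sample lines in the format quoted in the commit message (not live output).
my @sample_output = (
    'runtime.config.totem.consensus (u32) = 3750',
    'runtime.config.totem.token (u32) = 3125',
);

# Same pattern as read_cmap(): key, type in parentheses, '=', value.
my %cmap;
for my $line (@sample_output) {
    if ($line =~ /^(\S+)\s+\(\w+\)\s+=\s+(.*)$/) {
        $cmap{$1} = $2;
    }
}

# Sum of token and consensus timeout, converted to seconds.
my $total_secs =
    ($cmap{'runtime.config.totem.token'} + $cmap{'runtime.config.totem.consensus'}) / 1000.0;

# Hypothetical helper mirroring the thresholds from the patch.
sub warning_level {
    my ($secs) = @_;
    return 'critical' if $secs > 45;
    return 'warning' if $secs > 40;
    return 'info' if $secs > 30;
    return undef;
}

my $level = warning_level($total_secs);
printf "total: %.3fs, level: %s\n", $total_secs, $level // 'none';
```

With the sample values the sum is 6.875s, which is below the 30s
threshold, so no warning level is returned. Note that the comparisons
are strict: a sum of exactly 30s, 40s or 45s still falls into the lower
bucket.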