Date: Mon, 20 Apr 2026 10:28:14 +0200
Subject: Re: [PATCH cluster 1/5] add functions to determine warning level for high token timeouts
From: Michael Köppl
To: Friedrich Weber, Michael Köppl
References: <20260330144321.321072-1-m.koeppl@proxmox.com> <20260330144321.321072-2-m.koeppl@proxmox.com>
List-Id: Proxmox VE development discussion

On Fri Apr 17, 2026 at 10:33 AM CEST, Friedrich Weber wrote:

[snip]

>>
>> +sub calculate_total_timeout {
>
> I think "total timeout" is a little too vague, especially because it's
> also user-facing. I don't think totem/corosync have a term for "sum of
> token and consensus timeout", and "sum of token and consensus timeout"
> is too long. Perhaps something like "recovery timeout" -- though not
> perfect because "Recovery" is a specific state in the totem state
> machine. Maybe "membership convergence timeout" (though that's a bit
> long and obscure)?

Thanks for having a look at this and for your feedback. I agree that the
naming is a bit too vague, though I really wasn't sure what else to name
it. I think calculate_membership_convergence_timeout could work. As an
alternative suggestion, what do you think of
calculate_cluster_reformation_timeout? Of course that's also a more
"verbose" name, but I think it describes the effects of the timeout
quite well.

>
>> +    my ($totemcfg, $node_count) = @_;
>> +
>> +    my $token_timeout = $totemcfg->{token} // 3000;
>> +    my $token_coefficient = $totemcfg->{token_coefficient} // 650;
>> +
>> +    my $expected_token_timeout = $token_timeout;
>> +    if ($node_count > 2) {
>> +        $expected_token_timeout += ($node_count - 2) * $token_coefficient;
>> +    }
>> +
>> +    my $expected_consensus_timeout = $totemcfg->{consensus} // $expected_token_timeout * 1.2;
>> +    return ($expected_token_timeout + $expected_consensus_timeout) / 1000.0;
>> +}
>> +
>> +sub get_timeout_warning_level {
>> +    my ($total_timeout_secs) = @_;
>> +
>> +    if ($total_timeout_secs > 50) {
>> +        return 'change-strongly-recommended';
>
> I realize I'm the source of these numbers :) But since >50 is actually
> pretty bad already, if we phrase it as "strongly recommended" we can
> probably go for a slightly lower number:
>
> - > 45: change-strongly-recommended
> - > 40: change recommended
> - > 30: optimize

Will lower them for a v2, thanks! :) I'd already wondered about the
thresholds when I implemented this, but thought it'd make sense to start
with the values from the bugzilla.

>
>> +    } elsif ($total_timeout_secs > 40) {
>> +        return 'change-recommended';

[snip]
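
For illustration only, here is a rough sketch of how the two helpers might look with the rename and the lowered thresholds discussed above. It is not the actual v2. The function name is just one of the candidates from this thread, the 'optimize' level is taken from the proposed list, and the fall-through value 'none' is an assumption, since the quoted hunk is snipped before the final branch.

```perl
use strict;
use warnings;

# Sum of the expected token and consensus timeouts, in seconds.
# The defaults (3000 ms token, 650 ms coefficient, consensus = 1.2 * token)
# are the ones used in the quoted patch.
# NOTE: the function name is only a candidate from this thread.
sub calculate_membership_convergence_timeout {
    my ($totemcfg, $node_count) = @_;

    my $token_timeout = $totemcfg->{token} // 3000;
    my $token_coefficient = $totemcfg->{token_coefficient} // 650;

    # The coefficient is added once per node beyond the second.
    my $expected_token_timeout = $token_timeout;
    if ($node_count > 2) {
        $expected_token_timeout += ($node_count - 2) * $token_coefficient;
    }

    # An explicitly configured consensus timeout takes precedence.
    my $expected_consensus_timeout =
        $totemcfg->{consensus} // $expected_token_timeout * 1.2;

    return ($expected_token_timeout + $expected_consensus_timeout) / 1000.0;
}

# Map the timeout (in seconds) to a warning level, using the lowered
# thresholds proposed in the review (45 / 40 / 30).
sub get_timeout_warning_level {
    my ($timeout_secs) = @_;

    return 'change-strongly-recommended' if $timeout_secs > 45;
    return 'change-recommended' if $timeout_secs > 40;
    return 'optimize' if $timeout_secs > 30;
    return 'none'; # assumption: the default branch is not visible in the quoted hunk
}
```

With the default totem settings, a 20-node cluster comes out at roughly 32.3 seconds (14700 ms token plus 17640 ms consensus), which would land in 'optimize' under these thresholds, while 27 nodes (about 42.4 seconds) would reach 'change-recommended'.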