From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 07EFB1FF138 for ; Wed, 18 Feb 2026 19:22:50 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 34AC13870; Wed, 18 Feb 2026 19:23:48 +0100 (CET) Message-ID: Date: Wed, 18 Feb 2026 19:23:12 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Beta Subject: Re: [PATCH manager] fix #4911: external metric: better handle failed connections To: Lukas Sichert , pve-devel@lists.proxmox.com References: <20260218174411.391815-1-l.sichert@proxmox.com> Content-Language: en-US From: Thomas Lamprecht In-Reply-To: <20260218174411.391815-1-l.sichert@proxmox.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1771438984685 X-SPAM-LEVEL: Spam detection results: 0 AWL -0.022 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: ZZISDNIVMD2NESB6TDMFNN6OO6XWR3OK X-Message-ID-Hash: ZZISDNIVMD2NESB6TDMFNN6OO6XWR3OK X-MailFrom: t.lamprecht@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Am 18.02.26 um 18:44 schrieb Lukas Sichert: > When an external metric server configured to use TCP is unreachable, the > storage and VM indicators of the cluster nodes in the UI turn gray. This > is because, currently, a failed connection attempt raises an unhandled > exception, which aborts the status update flow. As the connection > attempts happen at the beginning of the update process, status > information is then not broadcasted within the system or across the > cluster. After five minutes without updates, the frontend marks the > indicators as gray. > > To catch connection errors, wrap connection establishment in an eval > block. The implementation ensures that other connections to external > metric servers are still established, even if one fails. > > Signed-off-by: Lukas Sichert > --- > > Notes: > Although this commit correctly catches the errors, it produces a lot of > warnings in the system log. Every 10 seconds, 4 connection attempts > (LXC, QEMU, Node and Storage) are made and therefore also 4 warnings are > printed to the system log. > > Better implementations might be: > 1. Establish connection once and then reuse it for all 4 data package > types. > This might be the most elegant solution, but if the connection is > lost in between the messages, it could lead to other errors. Would share some more state, which might make the code slightly uglier, but one would need to try to see if that's actually true. And yeah, trying to re-connect at least once in that case might be warranted. > 2. Use a Perl-hash that is propagated through the function call > layers. Check for uniqueness of the error, and if unique, > add error message to dictionary. Then at the end print the messages > in the dictionary to the system log. > This solution is less elegant, but it is robust and does not > require bigger changes in the code. > 3. Use a timer and only print a warning if sufficient time has > elapsed from the last print. > This option might be slightly more elegant than option 2, however > it could lead to nondeterministic behavior, because it depends on > which function the timer runs out in. > > If anyone has any suggestions regarding this, please let me know. > Thanks in advance! I mean, the systemd journal is good at compression, so size, or rather log-retention wise, the frequent logging won't really matter that much. So would maybe ignore that completely. > > PVE/ExtMetric.pm | 26 +++++++++++++++----------- > 1 file changed, 15 insertions(+), 11 deletions(-) > > diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm > index ebc2817b..bb36930e 100644 > --- a/PVE/ExtMetric.pm > +++ b/PVE/ExtMetric.pm > @@ -52,17 +52,21 @@ sub transactions_start { > $cfg, > sub { > my ($plugin, $id, $plugin_config) = @_; > - > - my $connection = $plugin->_connect($plugin_config, $id); > - > - push @$transactions, > - { > - connection => $connection, > - cfg => $plugin_config, > - id => $id, > - data => '', > - }; > - }, > + eval { > + my $connection = $plugin->_connect($plugin_config, $id); You only care about catching the error here, below's push cannot really cause one, so might be nicer written as: my $connection = eval { $plugin->_connect($plugin_config, $id) }; if (my $err = $@) { syslog( "warning", "connection for plugin '$id' failed: $err"); return; } And keep below as is. Another variant might be to catch these at an higher, more central level in the statd code path. But not sure if that's really better, it has been a while since I looked at all parts involved here. > + > + push @$transactions, > + { > + connection => $connection, > + cfg => $plugin_config, > + id => $id, > + data => '', > + }; > + }; > + if (my $err = $@) { > + syslog( "warning", "connection for plugin '$id' failed: $err"); > + } > + }, > ); > > return $transactions;