From: Lukas Sichert
To: pve-devel@lists.proxmox.com
Subject: [PATCH manager v3] fix #4130: external metric: better handle failed connections
Date: Thu, 28 May 2026 13:28:50 +0200
Message-ID: <20260528112904.66865-1-l.sichert@proxmox.com>
X-Mailer: git-send-email 2.47.3
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
CC: Lukas Sichert
List-Id: Proxmox VE development discussion

When an external metric server configured to use TCP is unreachable, the
storage and VM indicators of the cluster nodes in the UI turn gray.

This happens because a failed connection attempt currently raises an
unhandled exception, which aborts the status update flow. Since the
connection attempts happen at the beginning of the update process, status
information is then not broadcast within the system or across the
cluster. After five minutes without updates, the frontend marks the
indicators as gray.

To catch connection errors, wrap connection establishment in an eval
block. This ensures that connections to the other external metric
servers are still established, even if one fails.

Signed-off-by: Lukas Sichert
---

Notes:
    changes from v2 to v3 (thanks @Thomas):
    - run make tidy

    changes from v1 to v2:
    - add the SafeSyslog import required for syslog()
    - correct bug ID: #4911 -> #4130
    - move the push operation outside the eval block as suggested by
      Thomas

    Regarding catching the errors at a higher level: since this function
    is called for each plugin in turn, not catching the error here would
    mean that not all plugins are checked.
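
For illustration only, the per-plugin error handling described above can
be sketched as a standalone script. This is a minimal sketch, not the
code in PVE/ExtMetric.pm: fake_connect(), the plugin IDs and the
@warnings array are hypothetical stand-ins for $plugin->_connect(), the
configured metric servers and syslog(), and the sketch uses a plain loop
with `next` where the patch returns from a per-plugin callback.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $plugin->_connect(): dies if the server is unreachable.
sub fake_connect {
    my ($id, $reachable) = @_;
    die "connect to '$id' failed: Connection refused\n" if !$reachable;
    return "socket-for-$id";
}

# Hypothetical plugin list: [ id, reachable? ]
my @plugins = (['graphite1', 1], ['influx1', 0], ['otel1', 1]);

my $transactions = [];
my @warnings; # collected here instead of being sent to syslog()

for my $entry (@plugins) {
    my ($id, $reachable) = @$entry;

    # A die inside eval only aborts the eval block, not the surrounding loop.
    my $connection = eval { fake_connect($id, $reachable) };
    if (my $err = $@) {
        push @warnings, "connection for plugin '$id' failed: $err";
        next; # skip this plugin, keep going with the others
    }

    # The push stays outside the eval, so only connection errors are caught.
    push @$transactions, { id => $id, connection => $connection };
}

print "established: ", join(', ', map { $_->{id} } @$transactions), "\n";
print "warning: $_" for @warnings;
```

With one unreachable entry in the list, the two remaining connections
are still collected in $transactions, which is the behavior the patch
aims for.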
 PVE/ExtMetric.pm | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
index ebc2817b..3695a6d4 100644
--- a/PVE/ExtMetric.pm
+++ b/PVE/ExtMetric.pm
@@ -7,6 +7,7 @@ use PVE::Status::Plugin;
 use PVE::Status::Graphite;
 use PVE::Status::InfluxDB;
 use PVE::Status::OpenTelemetry;
+use PVE::SafeSyslog;
 
 PVE::Status::Graphite->register();
 PVE::Status::InfluxDB->register();
@@ -53,8 +54,11 @@ sub transactions_start {
         sub {
             my ($plugin, $id, $plugin_config) = @_;
 
-            my $connection = $plugin->_connect($plugin_config, $id);
-
+            my $connection = eval { $plugin->_connect($plugin_config, $id); };
+            if (my $err = $@) {
+                syslog("warning", "connection for plugin '$id' failed: $err");
+                return;
+            }
             push @$transactions,
                 {
                     connection => $connection,
-- 
2.47.3