From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 5C6831FF138 for ; Wed, 18 Feb 2026 18:43:50 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 40607308B; Wed, 18 Feb 2026 18:44:49 +0100 (CET) From: Lukas Sichert To: pve-devel@lists.proxmox.com Subject: [PATCH manager] fix #4911: external metric: better handle failed connections Date: Wed, 18 Feb 2026 18:44:11 +0100 Message-ID: <20260218174411.391815-1-l.sichert@proxmox.com> X-Mailer: git-send-email 2.47.3 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment KAM_LAZY_DOMAIN_SECURITY 1 Sending domain does not have any anti-forgery methods RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RDNS_NONE 0.793 Delivered to internal network by a host with no rDNS SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_NONE 0.001 SPF: sender does not publish an SPF Record Message-ID-Hash: ZLPN43BDRBL6HRKP2DE3UOA5QVP5AQEL X-Message-ID-Hash: ZLPN43BDRBL6HRKP2DE3UOA5QVP5AQEL X-MailFrom: lsichert@pve.proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Lukas Sichert X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: When an external metric server configured to use TCP is unreachable, the storage and VM indicators of the cluster nodes in the UI turn gray. This is because, currently, a failed connection attempt raises an unhandled exception, which aborts the status update flow. As the connection attempts happen at the beginning of the update process, status information is then not broadcasted within the system or across the cluster. After five minutes without updates, the frontend marks the indicators as gray. To catch connection errors, wrap connection establishment in an eval block. The implementation ensures that other connections to external metric servers are still established, even if one fails. Signed-off-by: Lukas Sichert --- Notes: Although this commit correctly catches the errors, it produces a lot of warnings in the system log. Every 10 seconds, 4 connection attempts (LXC, QEMU, Node and Storage) are made and therefore also 4 warnings are printed to the system log. Better implementations might be: 1. Establish connection once and then reuse it for all 4 data package types. This might be the most elegant solution, but if the connection is lost in between the messages, it could lead to other errors. 2. Use a Perl-hash that is propagated through the function call layers. Check for uniqueness of the error, and if unique, add error message to dictionary. Then at the end print the messages in the dictionary to the system log. This solution is less elegant, but it is robust and does not require bigger changes in the code. 3. Use a timer and only print a warning if sufficient time has elapsed from the last print. This option might be slightly more elegant than option 2, however it could lead to nondeterministic behavior, because it depends on which function the timer runs out in. If anyone has any suggestions regarding this, please let me know. Thanks in advance! PVE/ExtMetric.pm | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm index ebc2817b..bb36930e 100644 --- a/PVE/ExtMetric.pm +++ b/PVE/ExtMetric.pm @@ -52,17 +52,21 @@ sub transactions_start { $cfg, sub { my ($plugin, $id, $plugin_config) = @_; - - my $connection = $plugin->_connect($plugin_config, $id); - - push @$transactions, - { - connection => $connection, - cfg => $plugin_config, - id => $id, - data => '', - }; - }, + eval { + my $connection = $plugin->_connect($plugin_config, $id); + + push @$transactions, + { + connection => $connection, + cfg => $plugin_config, + id => $id, + data => '', + }; + }; + if (my $err = $@) { + syslog( "warning", "connection for plugin '$id' failed: $err"); + } + }, ); return $transactions; -- 2.47.3