From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 702051FF13B for ; Wed, 06 May 2026 19:01:32 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 46BCD2741; Wed, 6 May 2026 19:01:32 +0200 (CEST) Message-ID: Date: Wed, 6 May 2026 19:00:53 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Beta Subject: Re: [PATCH manager v2] fix #4130: external metric: better handle failed connections To: Lukas Sichert , pve-devel@lists.proxmox.com References: <20260219123417.597950-1-l.sichert@proxmox.com> Content-Language: en-US From: Thomas Lamprecht In-Reply-To: <20260219123417.597950-1-l.sichert@proxmox.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1778086746341 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.003 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: L4BTOWRKAHJH366XRZDNWGEO4OSDVS2M X-Message-ID-Hash: L4BTOWRKAHJH366XRZDNWGEO4OSDVS2M X-MailFrom: t.lamprecht@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Am 19.02.26 um 13:34 schrieb Lukas Sichert: > When an external metric server configured to use TCP is unreachable, the > storage and VM indicators of the cluster nodes in the UI turn gray. This > is because, currently, a failed connection attempt raises an unhandled > exception, which aborts the status update flow. As the connection > attempts happen at the beginning of the update process, status > information is then not broadcasted within the system or across the > cluster. After five minutes without updates, the frontend marks the > indicators as gray. > > To catch connection errors, wrap connection establishment in an eval > block. The implementation ensures that other connections to external > metric servers are still established, even if one fails. > > Signed-off-by: Lukas Sichert > --- > > Notes: > changes from v1 to v2: > -add the SafeSyslog import required for syslog() > -correct bug ID: #4911 -> #4130 > -move the push operation outside the eval block as suggested by Thomas > > Regarding catching the errors at a higher level: Since this function > is iterated through the plugins, not catching the error here would mean, > that not all the plugins are checked. > > PVE/ExtMetric.pm | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm > index ebc2817b..18815efd 100644 > --- a/PVE/ExtMetric.pm > +++ b/PVE/ExtMetric.pm > @@ -7,6 +7,7 @@ use PVE::Status::Plugin; > use PVE::Status::Graphite; > use PVE::Status::InfluxDB; > use PVE::Status::OpenTelemetry; > +use PVE::SafeSyslog; > > PVE::Status::Graphite->register(); > PVE::Status::InfluxDB->register(); > @@ -52,8 +53,12 @@ sub transactions_start { > $cfg, > sub { > my ($plugin, $id, $plugin_config) = @_; > - > - my $connection = $plugin->_connect($plugin_config, $id); > + > + my $connection = eval { $plugin->_connect($plugin_config, $id);}; there are various whitespace/code format issues here, please run the top-level "make tidy" target or call promxox-perltidy manually on the files to fix this. > + if (my $err = $@) { > + syslog( "warning", "connection for plugin '$id' failed: $err"); > + return; This now returns an undef for transation, so the call sides in pvestatd probably need to be adapted too to: if (defined(my $transactions = PVE::ExtMetric::transactions_start($status_cfg))) { # do something with $transaction } As otherwise this could cause warnings about accessing an undef value. > + } > > push @$transactions, > {