From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 7972F1FF13F for ; Thu, 07 May 2026 15:08:08 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 0F47B1C09B; Thu, 7 May 2026 15:08:08 +0200 (CEST) Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 Date: Thu, 07 May 2026 15:08:00 +0200 Message-Id: To: "Thomas Lamprecht" , Subject: Re: [PATCH manager v2] fix #4130: external metric: better handle failed connections From: "Lukas Sichert" References: <20260219123417.597950-1-l.sichert@proxmox.com> In-Reply-To: X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1778159172826 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.567 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [extmetric.pm] Message-ID-Hash: ZATP7C34AQXMVZQKFGGOCNQFZLJRGVAI X-Message-ID-Hash: ZATP7C34AQXMVZQKFGGOCNQFZLJRGVAI X-MailFrom: l.sichert@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Thanks for looking into it. Comments are inline. On 2026-05-06 19:00, Thomas Lamprecht wrote: > Am 19.02.26 um 13:34 schrieb Lukas Sichert: >> When an external metric server configured to use TCP is unreachable, the >> storage and VM indicators of the cluster nodes in the UI turn gray. This >> is because, currently, a failed connection attempt raises an unhandled >> exception, which aborts the status update flow. As the connection >> attempts happen at the beginning of the update process, status >> information is then not broadcasted within the system or across the >> cluster. After five minutes without updates, the frontend marks the >> indicators as gray. >>=20 >> To catch connection errors, wrap connection establishment in an eval >> block. The implementation ensures that other connections to external >> metric servers are still established, even if one fails. >>=20 >> Signed-off-by: Lukas Sichert >> --- >>=20 >> Notes: >> changes from v1 to v2: >> -add the SafeSyslog import required for syslog() >> -correct bug ID: #4911 -> #4130 >> -move the push operation outside the eval block as suggested by Thom= as >> =20 >> Regarding catching the errors at a higher level: Since this function >> is iterated through the plugins, not catching the error here would m= ean, >> that not all the plugins are checked. >>=20 >> PVE/ExtMetric.pm | 9 +++++++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >>=20 >> diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm >> index ebc2817b..18815efd 100644 >> --- a/PVE/ExtMetric.pm >> +++ b/PVE/ExtMetric.pm >> @@ -7,6 +7,7 @@ use PVE::Status::Plugin; >> use PVE::Status::Graphite; >> use PVE::Status::InfluxDB; >> use PVE::Status::OpenTelemetry; >> +use PVE::SafeSyslog; >> =20 >> PVE::Status::Graphite->register(); >> PVE::Status::InfluxDB->register(); >> @@ -52,8 +53,12 @@ sub transactions_start { >> $cfg, >> sub { >> my ($plugin, $id, $plugin_config) =3D @_; >> - >> - my $connection =3D $plugin->_connect($plugin_config, $id); >> + =20 >> + my $connection =3D eval { $plugin->_connect($plugin_config,= $id);};=20 > > there are various whitespace/code format issues here, please run the top-= level > "make tidy" target or call promxox-perltidy manually on the files to fix = this. Thank you for pointing it out. I will fix it in a v3. > >> + if (my $err =3D $@) { >> + syslog( "warning", "connection for plugin '$id' failed:= $err"); >> + return; > > This now returns an undef for transation, so the call sides in pvestatd p= robably > need to be adapted too to: > > if (defined(my $transactions =3D PVE::ExtMetric::transactions_start($stat= us_cfg))) { > # do something with $transaction > } > > As otherwise this could cause warnings about accessing an undef value. The '$transactions' variable is initialized as an empty array earlier. The 'return;' is inside the closure passed to 'foreach_plug', so it only returns from that closure and prevents the 'push @$transactions, ...' below from being executed for the current plugin. The 'foreach_plug' loop then continues with the next plugin in '$cfg'. After 'foreach_plug' finishes, '$transactions' is returned from 'transactions_start'. Please tell me if I am misunderstanding some Perl closure intricacy here. > >> + } >> =20 >> push @$transactions, >> {