public inbox for pve-devel@lists.proxmox.com
* [PATCH manager v2] fix #4130: external metric: better handle failed connections
From: Lukas Sichert @ 2026-02-19 12:34 UTC
  To: pve-devel; +Cc: Lukas Sichert

When an external metric server configured to use TCP is unreachable, the
storage and VM indicators of the cluster nodes in the UI turn gray. This
is because, currently, a failed connection attempt raises an unhandled
exception, which aborts the status update flow. As the connection
attempts happen at the beginning of the update process, status
information is then not broadcast within the system or across the
cluster. After five minutes without updates, the frontend marks the
indicators as gray.

To catch connection errors, wrap connection establishment in an eval
block. The implementation ensures that other connections to external
metric servers are still established, even if one fails.

Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
---

Notes:
    changes from v1 to v2:
    -add the SafeSyslog import required for syslog()
    -correct bug ID: #4911 -> #4130
    -move the push operation outside the eval block as suggested by Thomas
    
    Regarding catching the errors at a higher level: since this code runs
    once per plugin, catching the error further up would abort the
    iteration, and the remaining plugins would not be checked.
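
A minimal sketch of that trade-off, not taken from the patch: `connect_to()`
is a hypothetical stand-in for a plugin's connect routine, and the id `bad`
simulates an unreachable metric server. One eval around the whole loop stops
at the first failure, while one eval per item only skips the failing entry.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a plugin's connect routine; 'bad' simulates an
# unreachable metric server.
sub connect_to {
    my ($id) = @_;
    die "connection refused\n" if $id eq 'bad';
    return "conn-$id";
}

my @ids = ('a', 'bad', 'b');

# Variant 1: a single eval around the whole loop. The first failure aborts
# the loop, so 'b' is never tried.
my @outer;
eval {
    push @outer, connect_to($_) for @ids;
};

# Variant 2: one eval per item. A failure only skips that item.
my @inner;
for my $id (@ids) {
    my $conn = eval { connect_to($id) };
    if (my $err = $@) {
        warn "connection for '$id' failed: $err";
        next;
    }
    push @inner, $conn;
}

print "outer: @outer\n"; # outer: conn-a
print "inner: @inner\n"; # inner: conn-a conn-b
```
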

 PVE/ExtMetric.pm | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
index ebc2817b..18815efd 100644
--- a/PVE/ExtMetric.pm
+++ b/PVE/ExtMetric.pm
@@ -7,6 +7,7 @@ use PVE::Status::Plugin;
 use PVE::Status::Graphite;
 use PVE::Status::InfluxDB;
 use PVE::Status::OpenTelemetry;
+use PVE::SafeSyslog;
 
 PVE::Status::Graphite->register();
 PVE::Status::InfluxDB->register();
@@ -52,8 +53,12 @@ sub transactions_start {
         $cfg,
         sub {
             my ($plugin, $id, $plugin_config) = @_;
-
-            my $connection = $plugin->_connect($plugin_config, $id);
+            
+            my $connection = eval { $plugin->_connect($plugin_config, $id);}; 
+            if (my $err = $@) {
+                syslog( "warning", "connection for plugin '$id' failed: $err");
+                return;
+            }
 
             push @$transactions,
                 {
-- 
2.47.3





* Re: [PATCH manager v2] fix #4130: external metric: better handle failed connections
From: Thomas Lamprecht @ 2026-05-06 17:00 UTC
  To: Lukas Sichert, pve-devel

On 19.02.26 at 13:34, Lukas Sichert wrote:
> When an external metric server configured to use TCP is unreachable, the
> storage and VM indicators of the cluster nodes in the UI turn gray. This
> is because, currently, a failed connection attempt raises an unhandled
> exception, which aborts the status update flow. As the connection
> attempts happen at the beginning of the update process, status
> information is then not broadcast within the system or across the
> cluster. After five minutes without updates, the frontend marks the
> indicators as gray.
> 
> To catch connection errors, wrap connection establishment in an eval
> block. The implementation ensures that other connections to external
> metric servers are still established, even if one fails.
> 
> Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
> ---
> 
> Notes:
>     changes from v1 to v2:
>     -add the SafeSyslog import required for syslog()
>     -correct bug ID: #4911 -> #4130
>     -move the push operation outside the eval block as suggested by Thomas
>     
>     Regarding catching the errors at a higher level: since this code runs
>     once per plugin, catching the error further up would abort the
>     iteration, and the remaining plugins would not be checked.
> 
>  PVE/ExtMetric.pm | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
> index ebc2817b..18815efd 100644
> --- a/PVE/ExtMetric.pm
> +++ b/PVE/ExtMetric.pm
> @@ -7,6 +7,7 @@ use PVE::Status::Plugin;
>  use PVE::Status::Graphite;
>  use PVE::Status::InfluxDB;
>  use PVE::Status::OpenTelemetry;
> +use PVE::SafeSyslog;
>  
>  PVE::Status::Graphite->register();
>  PVE::Status::InfluxDB->register();
> @@ -52,8 +53,12 @@ sub transactions_start {
>          $cfg,
>          sub {
>              my ($plugin, $id, $plugin_config) = @_;
> -
> -            my $connection = $plugin->_connect($plugin_config, $id);
> +            
> +            my $connection = eval { $plugin->_connect($plugin_config, $id);}; 

There are various whitespace/code format issues here; please run the top-level
"make tidy" target or call proxmox-perltidy manually on the files to fix this.

> +            if (my $err = $@) {
> +                syslog( "warning", "connection for plugin '$id' failed: $err");
> +                return;

This now returns undef for the transactions, so the call sites in pvestatd
probably need to be adapted too, to:

if (defined(my $transactions = PVE::ExtMetric::transactions_start($status_cfg))) {
    # do something with $transactions
}

Otherwise this could cause warnings about accessing an undef value.

> +            }
>  
>              push @$transactions,
>                  {






* Re: [PATCH manager v2] fix #4130: external metric: better handle failed connections
From: Lukas Sichert @ 2026-05-07 13:08 UTC
  To: Thomas Lamprecht, pve-devel

Thanks for looking into it. Comments are inline.

On 2026-05-06 19:00, Thomas Lamprecht <t.lamprecht@proxmox.com> wrote:

> On 19.02.26 at 13:34, Lukas Sichert wrote:
>> When an external metric server configured to use TCP is unreachable, the
>> storage and VM indicators of the cluster nodes in the UI turn gray. This
>> is because, currently, a failed connection attempt raises an unhandled
>> exception, which aborts the status update flow. As the connection
>> attempts happen at the beginning of the update process, status
>> information is then not broadcast within the system or across the
>> cluster. After five minutes without updates, the frontend marks the
>> indicators as gray.
>> 
>> To catch connection errors, wrap connection establishment in an eval
>> block. The implementation ensures that other connections to external
>> metric servers are still established, even if one fails.
>> 
>> Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
>> ---
>> 
>> Notes:
>>     changes from v1 to v2:
>>     -add the SafeSyslog import required for syslog()
>>     -correct bug ID: #4911 -> #4130
>>     -move the push operation outside the eval block as suggested by Thomas
>>     
>>     Regarding catching the errors at a higher level: since this code runs
>>     once per plugin, catching the error further up would abort the
>>     iteration, and the remaining plugins would not be checked.
>> 
>>  PVE/ExtMetric.pm | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
>> index ebc2817b..18815efd 100644
>> --- a/PVE/ExtMetric.pm
>> +++ b/PVE/ExtMetric.pm
>> @@ -7,6 +7,7 @@ use PVE::Status::Plugin;
>>  use PVE::Status::Graphite;
>>  use PVE::Status::InfluxDB;
>>  use PVE::Status::OpenTelemetry;
>> +use PVE::SafeSyslog;
>>  
>>  PVE::Status::Graphite->register();
>>  PVE::Status::InfluxDB->register();
>> @@ -52,8 +53,12 @@ sub transactions_start {
>>          $cfg,
>>          sub {
>>              my ($plugin, $id, $plugin_config) = @_;
>> -
>> -            my $connection = $plugin->_connect($plugin_config, $id);
>> +            
>> +            my $connection = eval { $plugin->_connect($plugin_config, $id);}; 
>
> There are various whitespace/code format issues here; please run the top-level
> "make tidy" target or call proxmox-perltidy manually on the files to fix this.

Thank you for pointing it out. I will fix it in a v3.

>
>> +            if (my $err = $@) {
>> +                syslog( "warning", "connection for plugin '$id' failed: $err");
>> +                return;
>
> This now returns undef for the transactions, so the call sites in pvestatd
> probably need to be adapted too, to:
>
> if (defined(my $transactions = PVE::ExtMetric::transactions_start($status_cfg))) {
>     # do something with $transactions
> }
>
> Otherwise this could cause warnings about accessing an undef value.

The '$transactions' variable is initialized as an empty array earlier.
The 'return;' is inside the closure passed to 'foreach_plug', so it only
returns from that closure and prevents the 'push @$transactions, ...'
below from being executed for the current plugin. The 'foreach_plug'
loop then continues with the next plugin in '$cfg'. After 'foreach_plug'
finishes, '$transactions' is returned from 'transactions_start'.
Please tell me if I am misunderstanding some Perl closure intricacy
here.
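
A minimal, self-contained sketch of the behaviour described above. It does
not use the real PVE::ExtMetric code: `for_each_entry()` and
`collect_connections()` are hypothetical stand-ins for the iterator and for
`transactions_start`, and the `reachable` flag simulates whether a
connection attempt succeeds.

```perl
use strict;
use warnings;

# Hypothetical iterator: calls the given code reference once per entry.
sub for_each_entry {
    my ($cfg, $code) = @_;
    for my $id (sort keys %$cfg) {
        $code->($id, $cfg->{$id});
    }
}

# Collects one record per reachable entry and always returns an array
# reference, even if every connection attempt fails.
sub collect_connections {
    my ($cfg) = @_;
    my $transactions = [];

    for_each_entry($cfg, sub {
        my ($id, $entry) = @_;

        my $connection = eval {
            die "unreachable\n" if !$entry->{reachable};
            "conn-$id";
        };
        if (my $err = $@) {
            warn "connection for '$id' failed: $err";
            return; # leaves only this anonymous sub, not collect_connections()
        }

        push @$transactions, { id => $id, connection => $connection };
    });

    return $transactions;
}

my $res = collect_connections({
    a => { reachable => 1 },
    b => { reachable => 0 },
    c => { reachable => 1 },
});

print join(',', map { $_->{id} } @$res), "\n"; # a,c
```

Under these assumptions the caller always receives an array reference, so a
defined() check on the return value would not change anything.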

>
>> +            }
>>  
>>              push @$transactions,
>>                  {




