From: Thomas Lamprecht
Date: Tue, 15 Jul 2025 08:52:50 +0200
To: Proxmox VE development discussion, "nansen.su"
Subject: Re: [pve-devel] [PATCH pve-manager] metrics add OpenTelemetry support
References: <20250715035525.2012744-1-nansen.su@sianit.com>
In-Reply-To: <20250715035525.2012744-1-nansen.su@sianit.com>
List-Id: Proxmox VE development discussion

Hello,

On 15.07.25 at 05:55, nansen.su wrote:
> This patch adds OpenTelemetry metrics collection to the PVE manager
> to improve observability and monitoring capabilities.
>
> The implementation includes:
> - OTLP/HTTP JSON protocol support for OpenTelemetry Collector
> - Comprehensive metrics collection for nodes, VMs, containers, and storage
> - Batching with configurable size limits and compression
> - Full compliance with OpenTelemetry v1 specification
>
> Technical features:
> - Server/port configuration with HTTP/HTTPS protocol support
> - Gzip compression with configurable body size limits (default 10MB)
> - Custom HTTP headers (Bearer tokens, API keys)
> - Resource attributes support (Unicode support)
> - Timeout control and SSL certificate verification options
> - Recursive metrics conversion supporting all PVE data types
>
> Signed-off-by: Nansen Su

Thanks for your contribution. Can you please send a signed CLA [0] to office at, this is a requirement for us to take in such a patch.

[0]: https://pve.proxmox.com/wiki/Developer_Documentation#Software_License_and_Copyright

I tried to give the code a good look but ran a bit out of time towards the end. All in all it doesn't look bad at all, nice!
>
> ---
>  PVE/ExtMetric.pm | 2 +
>  PVE/Status/Makefile | 1 +
>  PVE/Status/OpenTelemetry.pm | 628 ++++++++++++++++++++++++++++
>  www/manager6/dc/MetricServerView.js | 259 ++++++++++++
>  4 files changed, 890 insertions(+)
>  create mode 100644 PVE/Status/OpenTelemetry.pm
>
> diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
> index 02e7c327..ebc2817b 100644
> --- a/PVE/ExtMetric.pm
> +++ b/PVE/ExtMetric.pm
> @@ -6,9 +6,11 @@ use warnings;
>  use PVE::Status::Plugin;
>  use PVE::Status::Graphite;
>  use PVE::Status::InfluxDB;
> +use PVE::Status::OpenTelemetry;
>
>  PVE::Status::Graphite->register();
>  PVE::Status::InfluxDB->register();
> +PVE::Status::OpenTelemetry->register();
>  PVE::Status::Plugin->init();
>
>  sub foreach_plug($&) {
> diff --git a/PVE/Status/Makefile b/PVE/Status/Makefile
> index c2f2edbc..eebce6b7 100644
> --- a/PVE/Status/Makefile
> +++ b/PVE/Status/Makefile
> @@ -3,6 +3,7 @@ include ../../defines.mk
>  PERLSOURCE = \
>      Graphite.pm \
>      InfluxDB.pm \
> +    OpenTelemetry.pm \
>      Plugin.pm
>
>  all:
> diff --git a/PVE/Status/OpenTelemetry.pm b/PVE/Status/OpenTelemetry.pm
> new file mode 100644
> index 00000000..3de6ac51
> --- /dev/null
> +++ b/PVE/Status/OpenTelemetry.pm
> @@ -0,0 +1,628 @@
> +package PVE::Status::OpenTelemetry;
> +
> +use strict;
> +use warnings;
> +
> +use PVE::Status::Plugin;
> +use base qw(PVE::Status::Plugin);
> +
> +use JSON;
> +use LWP::UserAgent;
> +use HTTP::Request;

This comes from libhttp-message-perl and we already depend on it in pve-common and the pve-http-server, but still, for the sake of completeness we should now also add libhttp-message-perl as a dependency in the debian/control file of the pve-manager package.

> +use IO::Compress::Gzip qw(gzip $GzipError);

IIRC this uses the slower Perl-based implementation. In our pve-http-server repo we use the `Compress::Zlib::memGzip` function, which is also shipped with the core Perl modules but uses the system zlib.
This is probably not really performance critical here, but it might make sense to use the same thing as the pve-http-server.

See https://perldoc.perl.org/5.40.1/Compress::Zlib

> +use PVE::Tools qw(extract_param);

nit: please group imports for Proxmox dependencies separately and sort imports alphabetically in each group.

> +use Encode;
> +use MIME::Base64;

FYI: this imports decode_base64 by default, so you can either use that or avoid the default import by adding an empty list like qw() at the end:

    use MIME::Base64 qw(); # no default imports
    # or explicitly import decode_base64 for import hygiene and clarity
    use MIME::Base64 qw(decode_base64);

> +
> +sub type {
> +    return 'opentelemetry';
> +}
> +
> +sub properties {
> +    return {
> +        'otel-protocol' => {

I'm fine with the plugin-specific otel prefix, but @Dominik: these here might be a good fit for the property separation? There are not many plugins and each of them does not have that many properties. Or do you know anything that would speak against this?
Anyhow, nothing that needs to block this for real and nothing you @Nansen Su need to worry about.

> +            type => 'string',
> +            enum => ['http', 'https'],
> +            description => 'HTTP protocol',
> +            default => 'https',
> +        },
> +        'otel-path' => {
> +            type => 'string',
> +            description => 'OTLP endpoint path',
> +            default => '/v1/metrics',
> +            optional => 1,
> +        },
> +        'otel-timeout' => {
> +            type => 'integer',
> +            description => 'HTTP request timeout in seconds',
> +            default => 30,

That's a rather high default timeout given that pvestatd produces stats every 10s, can we lower this to 5s or less?
> +            minimum => 1,
> +            maximum => 300,
> +        },
> +        'otel-headers' => {
> +            type => 'string',
> +            description => 'Custom HTTP headers (JSON format, base64 encoded)',
> +            optional => 1,
> +        },
> +        'otel-verify-ssl' => {
> +            type => 'boolean',
> +            description => 'Verify SSL certificates',
> +            default => 1,
> +        },
> +        'otel-max-body-size' => {
> +            type => 'integer',
> +            description => 'Maximum request body size in bytes',
> +            default => 10_000_000,
> +            minimum => 1024,
> +        },
> +        'otel-resource-attributes' => {
> +            type => 'string',
> +            description => 'Additional resource attributes as JSON, base64 encoded',

Can you provide an example of what one might put in here? Mostly asking to derive a reasonable maximum length, as we would like to always have some limit for such free-form strings. pmxcfs, the FUSE filesystem backing /etc/pve, imposes a relatively low max file size, so one entry being able to use up all of that is not really ideal.
Something between 1 and 10 KiB is often a good starting limit. We can increase this relatively easily if anybody runs into it, but lowering it in the future is hard. E.g.:

    maxLength => 1024,

> +            optional => 1,
> +        },
> +        'otel-compression' => {
> +            type => 'boolean',
> +            description => 'Enable gzip compression for requests',

The property is already named quite generically, maybe make this an enum here to more easily allow adding other compression algorithms in the future?

Something like:

    type => 'string',
    enum => ['none', 'gzip'],
    default => 'none',
    optional => 1,

> +            default => 1,
> +            optional => 1,
> +        },
> +    };
> +}
> +
> +sub options {
> +    return {
> +        server => { optional => 0 },
> +        port => { optional => 1 },
> +        disable => { optional => 1 },
> +        'otel-protocol' => { optional => 1 },
> +        'otel-path' => { optional => 1 },
> +        'otel-timeout' => { optional => 1 },
> +        'otel-headers' => { optional => 1 },
> +        'otel-verify-ssl' => { optional => 1 },
> +        'otel-max-body-size' => { optional => 1 },
> +        'otel-resource-attributes' => { optional => 1 },
> +        'otel-compression' => { optional => 1 },
> +    };
> +}
> +
> +sub _connect {
> +    my ($class, $cfg, $id) = @_;
> +
> +    my $connection = {
> +        id => $id,
> +        cfg => $cfg,
> +        metrics => [],
> +        retry_count => 0,
> +        last_flush => time(),

Above seems unused?

> +        stats => {
> +            total_metrics => 0,
> +            successful_batches => 0,
> +            failed_batches => 0,
> +        }
> +    };
> +
> +    return $connection;
> +}
> +
> +sub _disconnect {
> +    my ($class, $connection) = @_;
> +    # No persistent connection to cleanup
> +}
> +
> +sub _get_otlp_url {
> +    my ($class, $cfg) = @_;
> +    my $proto = $cfg->{'otel-protocol'} || 'https';
> +    my $port = $cfg->{port} || ($proto eq 'https' ? 4318 : 4317);
> +    my $path = $cfg->{'otel-path'} || '/v1/metrics';
> +
> +    return "${proto}://$cfg->{server}:${port}${path}";
> +}
> +
> +sub _decode_base64_json {
> +    my ($class, $encoded_str) = @_;
> +    return $encoded_str unless defined $encoded_str && $encoded_str ne '';
> +
> +    # Always attempt base64 decode, fallback to original on any issue
> +    my $decoded_str = MIME::Base64::decode_base64($encoded_str);
> +
> +    # If decode result is empty or doesn't look right, use original

That seems a bit odd, and such encoding "downgrade" things can be prone to bugs. If we always expect it to be base64, I'd rather enforce that here.
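To illustrate what I mean by enforcing it, here is a rough, untested-in-tree sketch. The helper name `decode_base64_strict` is made up for this mail and not part of the patch; the point is only that decode_base64() itself silently skips invalid characters, so the input needs an explicit check before decoding:

```perl
use strict;
use warnings;

use MIME::Base64 qw(decode_base64 encode_base64);

# Hypothetical helper (name not from the patch): decode a base64 string and
# die on anything that is not well-formed base64, instead of falling back
# to the raw input.
sub decode_base64_strict {
    my ($encoded) = @_;

    die "missing base64 value\n" if !defined($encoded) || $encoded eq '';

    # decode_base64() ignores invalid characters, so validate first: only
    # the base64 alphabet, at most two trailing '=', length multiple of 4.
    die "value is not valid base64\n"
        if length($encoded) % 4 != 0
        || $encoded !~ m{\A[A-Za-z0-9+/]*={0,2}\z};

    return decode_base64($encoded);
}

my $encoded = encode_base64('{"environment":"production"}', '');
print decode_base64_strict($encoded), "\n";

# Plain JSON that was never base64 encoded is rejected, not passed through.
eval { decode_base64_strict('{"not":"base64"}') };
print "rejected: $@" if $@;
```

Whether the check belongs in the plugin or in a schema format/verify hook is open, this is just to show the direction.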
> +    if (!defined $decoded_str || length($decoded_str) == 0) {
> +        return $encoded_str;
> +    }
> +
> +    return $decoded_str;
> +}
> +
> +sub _parse_headers {
> +    my ($class, $headers_str) = @_;
> +    return {} unless defined $headers_str && $headers_str ne '';
> +
> +    my $decoded_str = $class->_decode_base64_json($headers_str);
> +
> +    my $headers = {};
> +    eval {
> +        my $json = JSON->new->decode($decoded_str);
> +        $headers = $json if ref($json) eq 'HASH';

It might be good to die here if ref($json) isn't a hash, to notify the user about a possible misconfiguration?

> +    };
> +    if ($@) {
> +        warn "Failed to parse headers: $@";
> +        warn "Headers string was: $headers_str";

I would slightly prefer having a single warning here, as there is no guarantee that the two warnings are output close to each other on busy systems generating lots of logs. E.g.:

    warn "Failed to parse headers '$headers_str' - $@" if $@;

> +    }
> +    return $headers;
> +}
> +
> +sub _parse_resource_attributes {
> +    my ($class, $json_str) = @_;
> +    return [] unless defined $json_str && $json_str ne '';
> +
> +    my $decoded_str = $class->_decode_base64_json($json_str);
> +
> +    my $attributes = [];
> +    eval {
> +        # Ensure the JSON string is properly decoded as UTF-8
> +        my $utf8_json = utf8::is_utf8($decoded_str) ? $decoded_str
> +            : Encode::decode('utf-8', $decoded_str);
> +        my $parsed = JSON->new->utf8(0)->decode($utf8_json);
> +        for my $key (keys %$parsed) {
> +            push @$attributes, {
> +                key => $key,
> +                value => { stringValue => $parsed->{$key} }
> +            };
> +        }
> +    };
> +    if ($@) {
> +        warn "Failed to parse resource_attributes: $@";
> +        warn "Resource attributes string was: $json_str";

Same as above w.r.t. a single warning.
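Putting the two remarks on the header parsing together, roughly like the following sketch. `parse_headers_json` is a hypothetical stand-in for `_parse_headers` that takes the already decoded JSON text, and I used the core JSON::PP here only so that the snippet runs standalone, the patch itself uses JSON:

```perl
use strict;
use warnings;

use JSON::PP;

# Hypothetical stand-in for _parse_headers (name and signature are not from
# the patch). A non-object JSON value dies inside the eval, so it ends up in
# the same single warning as a syntax error.
sub parse_headers_json {
    my ($json_str) = @_;

    return {} if !defined($json_str) || $json_str eq '';

    my $headers = eval {
        my $parsed = JSON::PP->new->decode($json_str);
        die "expected a JSON object\n" if ref($parsed) ne 'HASH';
        $parsed;
    };
    # One combined warning keeps the input and the reason together in busy logs.
    warn "Failed to parse headers '$json_str' - $@" if $@;

    return $headers // {};
}

my $ok = parse_headers_json('{"Authorization":"Bearer token"}');
print "Authorization: $ok->{Authorization}\n";

# Valid JSON but not an object: warns once and falls back to no headers.
my $bad = parse_headers_json('["not", "an", "object"]');
print "fallback has ", scalar(keys %$bad), " headers\n";
```

The same shape would work for the resource attributes further down.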
> +    }
> +    return $attributes;
> +}
> +
> +sub _compress_json {
> +    my ($class, $data) = @_;
> +
> +    my $json_str = JSON->new->utf8->encode($data);
> +    my $compressed;
> +
> +    gzip \$json_str => \$compressed
> +        or die "gzip failed: $GzipError";
> +
> +    return $compressed;
> +}
> +
> +sub _build_otlp_metrics {
> +    my ($class, $metrics_data, $cfg) = @_;
> +
> +    my $cluster_name = 'proxmox-cluster';

Wouldn't something like 'single-node' make more sense for the fallback name here?

> +    eval {
> +        my $corosync_conf = PVE::Tools::file_get_contents(
> +            '/etc/pve/corosync.conf', 1);
> +        if ($corosync_conf && $corosync_conf =~ /cluster_name:\s*(\S+)/) {
> +            $cluster_name = $1;
> +        }

no, please do not parse the corosync config here, especially not in such a hacky way! Rather use our methods and the in memory cached info to get this, e.g.:

    my $clinfo = PVE::Cluster::get_clinfo();
    $clinfo->{cluster}->{name};

> +    };
> +    # If reading fails, use default cluster name
> +
> +    my $node_name = PVE::INotify::nodename();
> +    my $pve_version = PVE::pvecfg::version_text();
> +
> +    return {
> +        resourceMetrics => [{
> +            resource => {
> +                attributes => [
> +                    { key => 'service.name',
> +                      value => { stringValue => 'proxmox-ve' } },
> +                    { key => 'service.version',
> +                      value => { stringValue => $pve_version } },
> +                    { key => 'proxmox.cluster',
> +                      value => { stringValue => $cluster_name } },
> +                    { key => 'proxmox.node',
> +                      value => { stringValue => $node_name } },
> +                    @{$class->_parse_resource_attributes(
> +                        $cfg->{'otel-resource-attributes'})}
> +                ]
> +            },
> +            scopeMetrics => [{
> +                scope => {},
> +                metrics => $metrics_data
> +            }]
> +        }]
> +    };
> +}
> +
> +
> +sub _convert_node_metrics_recursive {
> +    my ($class, $data, $ctime, $metric_prefix, $attributes) = @_;
> +
> +    my @metrics = ();
> +
> +    # Skip non-metric fields
> +    my $skip_fields = {
> +        name => 1,
> +        tags => 1,
> +        vmid => 1,
> +        type => 1,
> +        status => 1,
> +        template => 1,
> +        pid => 1,
> +        agent => 1,
> +        serial => 1,
> +        ctime => 1,
> +        nics => 1, # Skip nics - handled separately with device labels
> +        storages => 1, # Skip storages - handled separately with storage labels
> +    };
> +
> +    # Unit mapping for common metrics
> +    my $unit_mapping = {
> +        # Memory and storage (bytes)
> +        mem => 'bytes',
> +        memory => 'bytes',
> +        swap => 'bytes',
> +        disk => 'bytes',
> +        size => 'bytes',
> +        used => 'bytes',
> +        free => 'bytes',
> +        total => 'bytes',
> +        avail => 'bytes',
> +        available => 'bytes',
> +        arcsize => 'bytes',
> +        blocks => 'bytes',
> +        bavail => 'bytes',
> +        bfree => 'bytes',
> +
> +        # Network (bytes)
> +        net => 'bytes',
> +        receive => 'bytes',
> +        transmit => 'bytes',
> +
> +        # CPU and time (seconds or percentage)
> +        cpu => 'percent',
> +        wait => 'seconds',
> +        iowait => 'seconds',
> +        user => 'seconds',
> +        system => 'seconds',
> +        idle => 'seconds',
> +        nice => 'seconds',
> +        steal => 'seconds',
> +        guest => 'seconds',
> +        irq => 'seconds',
> +        softirq => 'seconds',
> +
> +        # Load average
> +        avg => '1',
> +
> +        # Counters
> +        cpus => '1',
> +        uptime => 'seconds',
> +
> +        # File system
> +        files => '1',
> +        ffree => '1',
> +        fused => '1',
> +        favail => '1',
> +        per => 'percent',
> +        fper => 'percent',
> +    };
> +
> +    for my $key (sort keys %$data) {
> +        next if $skip_fields->{$key};
> +        my $value = $data->{$key};
> +        next if !defined($value);
> +
> +        my $metric_name = "${metric_prefix}_${key}";
> +
> +        if (ref($value) eq 'HASH') {
> +            # Recursive call for nested hashes
> +            push @metrics, $class->_convert_node_metrics_recursive(
> +                $value, $ctime, $metric_name, $attributes);
> +        } elsif (!ref($value) && $value ne '' && $value =~ /^[+-]?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)?$/) {
> +            # Numeric value - create metric
> +            my $unit = '1'; # default unit
> +
> +            # Try to determine unit based on key name
> +            for my $pattern (keys %$unit_mapping) {
> +                if ($key =~ /\Q$pattern\E/) {
> +                    $unit = $unit_mapping->{$pattern};
> +                    last;
> +                }
> +            }
> +
> +            # Determine if it's an integer or double
> +            my $data_point = {
> +                timeUnixNano => $ctime * 1_000_000_000,
> +                attributes => $attributes,
> +            };
> +
> +            if ($value =~ /\./ || $value =~ /[eE]/) {
> +                $data_point->{asDouble} = $value + 0; # Convert to number
> +            } else {
> +                $data_point->{asInt} = int($value);
> +            }
> +
> +            push @metrics, {
> +                name => $metric_name,
> +                unit => $unit,
> +                gauge => { dataPoints => [$data_point] }
> +            };
> +        }
> +    }
> +
> +    return @metrics;
> +}
> +
> +sub update_node_status {
> +    my ($class, $txn, $node, $data, $ctime) = @_;
> +
> +    my @metrics = ();
> +    my $base_attributes = [
> +        { key => 'node', value => { stringValue => $node } }
> +    ];
> +
> +    # Convert all node metrics recursively
> +    push @metrics, $class->_convert_node_metrics_recursive($data, $ctime, 'proxmox_node', $base_attributes);
> +
> +    # Handle special cases that need different attributes
> +    # Network metrics with device labels
> +    if (defined $data->{nics}) {
> +        for my $iface (keys %{$data->{nics}}) {
> +            my $nic_attributes = [
> +                { key => 'node', value => { stringValue => $node } },
> +                { key => 'device', value => { stringValue => $iface } }
> +            ];
> +
> +            # Use recursive processing for network metrics with device-specific attributes
> +            push @metrics, $class->_convert_node_metrics_recursive($data->{nics}->{$iface}, $ctime, 'proxmox_node_network', $nic_attributes);
> +        }
> +    }
> +
> +    # Storage metrics with storage labels
> +    if (defined $data->{storages}) {
> +        for my $storage (keys %{$data->{storages}}) {
> +            my $storage_attributes = [
> +                { key => 'node', value => { stringValue => $node } },
> +                { key => 'storage', value => { stringValue => $storage } }
> +            ];
> +
> +            # Use recursive processing for storage metrics with storage-specific attributes
> +            push @metrics, $class->_convert_node_metrics_recursive($data->{storages}->{$storage}, $ctime, 'proxmox_node_storage', $storage_attributes);
> +        }
> +    }
> +
> +    push @{$txn->{metrics}}, @metrics;
> +}
> +
> +sub update_qemu_status {
> +    my ($class, $txn, $vmid, $data, $ctime, $nodename) = @_;
> +
> +    my @metrics = ();
> +    my $vm_attributes = [
> +        { key => 'vmid', value => { stringValue => $vmid } },
> +        { key => 'node', value => { stringValue => $nodename } },
> +        { key => 'name', value => { stringValue => $data->{name} || '' } },
> +        { key => 'type', value => { stringValue => 'qemu' } }
> +    ];
> +
> +    # Use recursive processing for all VM metrics
> +    push @metrics, $class->_convert_node_metrics_recursive($data, $ctime, 'proxmox_vm', $vm_attributes);
> +
> +    push @{$txn->{metrics}}, @metrics;
> +}
> +
> +sub update_lxc_status {
> +    my ($class, $txn, $vmid, $data, $ctime, $nodename) = @_;
> +
> +    my @metrics = ();
> +    my $vm_attributes = [
> +        { key => 'vmid', value => { stringValue => $vmid } },
> +        { key => 'node', value => { stringValue => $nodename } },
> +        { key => 'name', value => { stringValue => $data->{name} || '' } },
> +        { key => 'type', value => { stringValue => 'lxc' } }
> +    ];
> +
> +    # Use recursive processing for all LXC metrics
> +    push @metrics, $class->_convert_node_metrics_recursive($data, $ctime, 'proxmox_vm', $vm_attributes);
> +
> +    push @{$txn->{metrics}}, @metrics;
> +}
> +
> +sub update_storage_status {
> +    my ($class, $txn, $nodename, $storeid, $data, $ctime) = @_;
> +
> +    my @metrics = ();
> +    my $storage_attributes = [
> +        { key => 'node', value => { stringValue => $nodename } },
> +        { key => 'storage', value => { stringValue => $storeid } }
> +    ];
> +
> +    # Use recursive processing for all storage metrics
> +    push @metrics, $class->_convert_node_metrics_recursive($data, $ctime, 'proxmox_storage', $storage_attributes);
> +
> +    push @{$txn->{metrics}}, @metrics;
> +}
> +
> +sub flush_data {
> +    my ($class, $txn) = @_;
> +
> +    return if !$txn->{connection};
> +    return if !$txn->{metrics} || !@{$txn->{metrics}};
> +
> +    my $metrics = delete $txn->{metrics};
> +    $txn->{metrics} = [];
> +
> +    eval {
> +        $class->_send_metrics_batched($txn->{connection}, $metrics, $txn->{cfg});
> +        $txn->{stats}->{successful_batches}++;
> +    };
> +
> +    if (my $err = $@) {
> +        $txn->{stats}->{failed_batches}++;
> +        die "OpenTelemetry export failed '$txn->{id}': $err";
> +    }
> +}
> +
> +sub _send_metrics_batched {
> +    my ($class, $connection, $metrics, $cfg) = @_;
> +
> +    my $max_body_size = $cfg->{'otel-max-body-size'} || 10_000_000;
> +    my $total_metrics = @$metrics;
> +
> +    # Estimate metrics per batch based on size heuristics
> +    my $estimated_batch_size = $class->_estimate_batch_size($metrics, $max_body_size, $cfg);
> +
> +    # If estimated batch size covers all metrics, try sending everything at once
> +    if ($estimated_batch_size >= $total_metrics) {
> +        my $otlp_data = $class->_build_otlp_metrics($metrics, $cfg);
> +        my $serialized_size = $class->_get_serialized_size($otlp_data, $cfg);
> +
> +        if ($serialized_size <= $max_body_size) {
> +            $class->send($connection, $otlp_data, $cfg);
> +            return;
> +        }
> +        # If estimation was wrong, fall through to batching
> +    }
> +
> +    # Send in batches
> +    for (my $i = 0; $i < $total_metrics; $i += $estimated_batch_size) {
> +        my $end_idx = $i + $estimated_batch_size - 1;
> +        $end_idx = $total_metrics - 1 if $end_idx >= $total_metrics;
> +
> +        my @batch_metrics = @$metrics[$i..$end_idx];
> +        my $batch_otlp = $class->_build_otlp_metrics(\@batch_metrics, $cfg);
> +
> +        # Verify batch size is within limits
> +        my $batch_size_bytes = $class->_get_serialized_size($batch_otlp, $cfg);
> +        if ($batch_size_bytes > $max_body_size) {
> +            # Fallback: send metrics one by one
> +            for my $single_metric (@batch_metrics) {
> +                my $single_otlp = $class->_build_otlp_metrics([$single_metric], $cfg);
> +                $class->send($connection, $single_otlp, $cfg);
> +            }
> +        } else {
> +            $class->send($connection, $batch_otlp, $cfg);
> +        }
> +    }
> +}
> +
> +sub _estimate_batch_size {
> +    my ($class, $metrics, $max_body_size, $cfg) = @_;
> +
> +    return 1 if @$metrics == 0;
> +
> +    # Sample first few metrics to estimate size per metric
> +    my $sample_size = @$metrics > 10 ? 10 : @$metrics;
> +    my @sample_metrics = @$metrics[0..$sample_size-1];
> +
> +    my $sample_otlp = $class->_build_otlp_metrics(\@sample_metrics, $cfg);
> +    my $sample_bytes = $class->_get_serialized_size($sample_otlp, $cfg);
> +
> +    # Calculate average bytes per metric with overhead
> +    my $bytes_per_metric = $sample_bytes / $sample_size;
> +
> +    # Add 20% safety margin for OTLP structure overhead
> +    $bytes_per_metric *= 1.2;
> +
> +    # Calculate how many metrics fit in max_body_size
> +    my $estimated_count = int($max_body_size / $bytes_per_metric);
> +
> +    # Ensure at least 1 metric per batch, and cap at total metrics
> +    $estimated_count = 1 if $estimated_count < 1;
> +    $estimated_count = @$metrics if $estimated_count > @$metrics;
> +
> +    return $estimated_count;
> +}
> +
> +
> +sub _get_serialized_size {
> +    my ($class, $data, $cfg) = @_;
> +
> +    my $serialized;
> +    if ($cfg->{'otel-compression'} // 1) {
> +        $serialized = $class->_compress_json($data);
> +    } else {
> +        $serialized = JSON->new->utf8->encode($data);
> +    }
> +
> +    return length($serialized);
> +}
> +
> +sub send {
> +    my ($class, $connection, $data, $cfg) = @_;
> +
> +    my $ua = LWP::UserAgent->new(
> +        timeout => $cfg->{'otel-timeout'} || 5,
> +        ssl_opts => { verify_hostname => $cfg->{'otel-verify-ssl'} // 1 }
> +    );
> +
> +    my $url = $class->_get_otlp_url($cfg);
> +
> +    my $request_data;
> +    my %headers = (
> +        'Content-Type' => 'application/json',
> +    );
> +
> +    # Safely add parsed headers
> +    my $parsed_headers = $class->_parse_headers($cfg->{'otel-headers'});
> +    if ($parsed_headers && ref($parsed_headers) eq 'HASH') {
> +        %headers = (%headers, %$parsed_headers);
> +    }
> +
> +    if ($cfg->{'otel-compression'} // 1) {
> +        $request_data = $class->_compress_json($data);
> +        $headers{'Content-Encoding'} = 'gzip';
> +    } else {
> +        $request_data = JSON->new->utf8->encode($data);
> +    }
> +
> +    my $req = HTTP::Request->new('POST', $url, [%headers], $request_data);
> +
> +    my $response = $ua->request($req);
> +    die "OTLP request failed: " . $response->status_line unless $response->is_success;
> +}
> +
> +sub test_connection {
> +    my ($class, $cfg) = @_;
> +
> +    my $ua = LWP::UserAgent->new(
> +        timeout => $cfg->{'otel-timeout'} || 5,
> +        ssl_opts => { verify_hostname => $cfg->{'otel-verify-ssl'} // 1 }
> +    );
> +
> +    my $url = $class->_get_otlp_url($cfg);
> +
> +    # Send empty metrics payload for testing
> +    my $test_data = {
> +        resourceMetrics => [{
> +            resource => { attributes => [] },
> +            scopeMetrics => [{
> +                scope => {},
> +                metrics => []
> +            }]
> +        }]
> +    };
> +
> +    my $request_data;
> +    my %headers = (
> +        'Content-Type' => 'application/json',
> +    );
> +
> +    # Safely add parsed headers
> +    my $parsed_headers = $class->_parse_headers($cfg->{'otel-headers'});
> +    if ($parsed_headers && ref($parsed_headers) eq 'HASH') {
> +        %headers = (%headers, %$parsed_headers);
> +    }
> +
> +    if ($cfg->{'otel-compression'} // 1) {
> +        $request_data = $class->_compress_json($test_data);
> +        $headers{'Content-Encoding'} = 'gzip';
> +    } else {
> +        $request_data = JSON->new->utf8->encode($test_data);
> +    }
> +
> +    my $req = HTTP::Request->new('POST', $url, [%headers], $request_data);
> +
> +    my $response = $ua->request($req);
> +    die "Connection test failed: " . $response->status_line unless $response->is_success;
> +
> +    return 1;
> +}
> +
> +1;
> \ No newline at end of file

please add a trailing new line at the end of file.
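Coming back to the gzip remark at the imports: a minimal sketch of how `_compress_json` could look with `Compress::Zlib::memGzip`. The function name `compress_json` and the sample payload are only for illustration, and I used the core JSON::PP so that the snippet runs on its own:

```perl
use strict;
use warnings;

use Compress::Zlib ();
use JSON::PP;

# Hypothetical variant of _compress_json(): encode to JSON, then gzip the
# resulting bytes in memory. memGzip returns undef on failure.
sub compress_json {
    my ($data) = @_;

    my $json = JSON::PP->new->utf8->canonical->encode($data);
    my $compressed = Compress::Zlib::memGzip($json)
        // die "gzip failed: $Compress::Zlib::gzerrno\n";

    return $compressed;
}

my $payload = { resourceMetrics => [ { scopeMetrics => [ { metrics => [] } ] } ] };
my $gz = compress_json($payload);
my $gz_len = length($gz);

# Round trip to show that the body is an ordinary gzip stream. memGunzip may
# consume its input buffer, so work on a copy.
my $copy = $gz;
my $json = Compress::Zlib::memGunzip($copy);
print "compressed ", length($json), " bytes of JSON into $gz_len bytes\n";
```

For payloads as small as this one the gzip header overhead dominates, the gain only shows with real metric batches.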
> diff --git a/www/manager6/dc/MetricServerView.js b/www/manager6/dc/MetricServerView.js
> index baae7d71..8f7920ee 100644
> --- a/www/manager6/dc/MetricServerView.js
> +++ b/www/manager6/dc/MetricServerView.js
> @@ -14,6 +14,8 @@ Ext.define('PVE.dc.MetricServerView', {
>                      return 'InfluxDB';
>                  case 'graphite':
>                      return 'Graphite';
> +                case 'opentelemetry':
> +                    return 'OpenTelemetry';
>                  default:
>                      return Proxmox.Utils.unknownText;
>              }
> @@ -106,6 +108,11 @@ Ext.define('PVE.dc.MetricServerView', {
>                          iconCls: 'fa fa-fw fa-bar-chart',
>                          handler: 'addServer',
>                      },
> +                    {
> +                        text: 'OpenTelemetry',
> +                        iconCls: 'fa fa-fw fa-bar-chart',
> +                        handler: 'addServer',
> +                    },
>                  ],
>              },
>              {
> @@ -164,6 +171,29 @@ Ext.define('PVE.dc.MetricServerBaseEdit', {
>                  success: function (response, options) {
>                      let values = response.result.data;
>                      values.enable = !values.disable;
> +
> +                    // Handle OpenTelemetry advanced fields conversion
> +                    if (values.type === 'opentelemetry') {
> +                        if (values['otel-headers']) {
> +                            try {
> +                                // Use Proxmox standard base64 decode
> +                                values.headers_advanced = Ext.util.Base64.decode(values['otel-headers']);
> +                            } catch (_e) {
> +                                // Fallback for non-base64 encoded values
> +                                values.headers_advanced = values['otel-headers'];

Also here, would prefer sticking to always expect and enforce base64.
> +                            }
> +                        }
> +                        if (values['otel-resource-attributes']) {
> +                            try {
> +                                // Use Proxmox standard base64 decode
> +                                values.resource_attributes_advanced = Ext.util.Base64.decode(values['otel-resource-attributes']);
> +                            } catch (_e) {
> +                                // Fallback for non-base64 encoded values
> +                                values.resource_attributes_advanced = values['otel-resource-attributes'];
> +                            }
> +                        }
> +                    }
> +
>                      me.down('inputpanel').setValues(values);
>                  },
>              });
> @@ -499,3 +529,232 @@ Ext.define('PVE.dc.GraphiteEdit', {
>          },
>      ],
>  });
> +
> +Ext.define('PVE.dc.OpenTelemetryEdit', {
> +    extend: 'PVE.dc.MetricServerBaseEdit',
> +    xtype: 'pveOpenTelemetryEdit',
> +
> +    subject: gettext('OpenTelemetry Server'),
> +
> +    items: [
> +        {
> +            xtype: 'inputpanel',
> +            cbind: {
> +                isCreate: '{isCreate}',
> +            },
> +            onGetValues: function(values) {
> +                values.disable = values.enable ? 0 : 1;
> +                delete values.enable;
> +
> +                // Rename advanced fields to their final names and encode as base64 (same as webhook)
> +                if (values.headers_advanced && values.headers_advanced.trim()) {
> +                    values['otel-headers'] = Ext.util.Base64.encode(values.headers_advanced);
> +                } else {
> +                    values['otel-headers'] = '';
> +                }
> +                delete values.headers_advanced;
> +
> +                if (values.resource_attributes_advanced && values.resource_attributes_advanced.trim()) {
> +                    values['otel-resource-attributes'] = Ext.util.Base64.encode(values.resource_attributes_advanced);
> +                } else {
> +                    values['otel-resource-attributes'] = '';
> +                }
> +                delete values.resource_attributes_advanced;
> +
> +                return values;
> +            },
> +
> +            column1: [
> +                {
> +                    xtype: 'hidden',
> +                    name: 'type',
> +                    value: 'opentelemetry',
> +                    cbind: {
> +                        submitValue: '{isCreate}',
> +                    },
> +                },
> +                {
> +                    xtype: 'pmxDisplayEditField',
> +                    name: 'id',
> +                    fieldLabel: gettext('Name'),
> +                    allowBlank: false,
> +                    cbind: {
> +                        editable: '{isCreate}',
> +                        value: '{serverid}',
> +                    },
> +                },
> +                {
> +                    xtype: 'proxmoxtextfield',
> +                    name: 'server',
> +                    fieldLabel: gettext('Server'),
> +                    allowBlank: false,
> +                    emptyText: gettext('otel-collector.example.com'),
> +                },
> +                {
> +                    xtype: 'proxmoxintegerfield',
> +                    name: 'port',
> +                    fieldLabel: gettext('Port'),
> +                    value: 4318,
> +                    minValue: 1,
> +                    maxValue: 65535,
> +                    allowBlank: false,
> +                },
> +                {
> +                    xtype: 'proxmoxKVComboBox',
> +                    name: 'otel-protocol',
> +                    fieldLabel: gettext('Protocol'),
> +                    value: 'https',
> +                    comboItems: [
> +                        ['http', 'HTTP'],
> +                        ['https', 'HTTPS'],
> +                    ],
> +                    allowBlank: false,
> +                },
> +                {
> +                    xtype: 'proxmoxtextfield',
> +                    name: 'otel-path',
> +                    fieldLabel: gettext('Path'),
> +                    value: '/v1/metrics',
> +                    allowBlank: false,
> +                },
> +            ],
> +
> +            column2: [
> +                {
> +                    xtype: 'checkbox',
> +                    name: 'enable',
> +                    fieldLabel: gettext('Enabled'),
> +                    inputValue: 1,
> +                    uncheckedValue: 0,
> +                    checked: true,
> +                },
> +                {
> +                    xtype: 'proxmoxintegerfield',
> +                    name: 'otel-timeout',
> +                    fieldLabel: gettext('Timeout (s)'),
> +                    value: 5,
> +                    minValue: 1,
> +                    maxValue: 300,
> +                    allowBlank: false,
> +                },
> +                {
> +                    xtype: 'proxmoxcheckbox',
> +                    name: 'otel-verify-ssl',
> +                    fieldLabel: gettext('Verify SSL'),
> +                    inputValue: 1,
> +                    uncheckedValue: 0,
> +                    defaultValue: 1,
> +                    cbind: {
> +                        value: function(get) {
> +                            return get('isCreate') ? 1 : undefined;
> +                        }
> +                    },
> +                },
> +                {
> +                    xtype: 'proxmoxintegerfield',
> +                    name: 'otel-max-body-size',
> +                    fieldLabel: gettext('Max Body Size (bytes)'),
> +                    value: 10000000,
> +                    minValue: 1024,
> +                    allowBlank: false,
> +                },
> +                {
> +                    xtype: 'proxmoxcheckbox',
> +                    name: 'otel-compression',
> +                    fieldLabel: gettext('Enable Compression'),
> +                    inputValue: 1,
> +                    uncheckedValue: 0,
> +                    defaultValue: 1,
> +                    cbind: {
> +                        value: function(get) {
> +                            return get('isCreate') ? 1 : undefined;
> +                        }
> +                    },
> +                },
> +            ],
> +
> +
> +            columnB: [
> +                {
> +                    xtype: 'fieldset',
> +                    title: gettext('Advanced JSON Configuration'),
> +                    collapsible: true,
> +                    collapsed: true,
> +                    items: [
> +                        {
> +                            xtype: 'textarea',
> +                            name: 'headers_advanced',
> +                            fieldLabel: gettext('HTTP Headers (JSON)'),
> +                            labelAlign: 'top',
> +                            emptyText: gettext('{\n "Authorization": "Bearer token",\n "X-Custom-Header": "value"\n}'),
> +                            rows: 4,
> +                            validator: function(value) {
> +                                if (!value || value.trim() === '') {
> +                                    return true;
> +                                }
> +                                try {
> +                                    JSON.parse(value);
> +                                    return true;
> +                                } catch (_e) {
> +                                    return gettext('Invalid JSON format');
> +                                }
> +                            },
> +                        },
> +                        {
> +                            xtype: 'textarea',
> +                            name: 'resource_attributes_advanced',
> +                            fieldLabel: gettext('Resource Attributes (JSON)'),
> +                            labelAlign: 'top',
> +                            emptyText: gettext('{\n "environment": "production",\n "datacenter": "dc1",\n "region": "us-east-1"\n}'),
> +                            rows: 4,
> +                            validator: function(value) {
> +                                if (!value || value.trim() === '') {
> +                                    return true;
> +                                }
> +                                try {
> +                                    JSON.parse(value);
> +                                    return true;
> +                                } catch (_e) {
> +                                    return gettext('Invalid JSON format');
> +                                }
> +                            },
> +                        },
> +                    ],
> +                },
> +            ],
> +        },
> +    ],
> +
> +    initComponent: function() {
> +        var me = this;
> +        var initialLoad = true;
> +
> +        me.callParent();
> +
> +        // Auto-adjust port when protocol changes (only for user interaction)
> +        me.on('afterrender', function() {
> +            var protocolField = me.down('[name=otel-protocol]');
> +            var portField = me.down('[name=port]');
> +
> +            if (protocolField && portField) {
> +                // Set flag to false after initial load
> +                me.on('loadrecord', function() {
> +                    setTimeout(function() {
> +                        initialLoad = false;
> +                    }, 100);
> +                });
> +
> +                protocolField.on('change', function(field, newValue) {
> +                    // Only auto-adjust port if this is user interaction, not initial load
> +                    if (!initialLoad) {
> +                        if (newValue === 'https') {
> +                            portField.setValue(4318);
> +                        } else {
> +                            portField.setValue(4317);
> +
} > + } > + }); > + } > + }); > + }, > +}); > \ No newline at end of file _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel