From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id BE1AC6D0DE for ; Wed, 31 Mar 2021 14:06:45 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id ACF35F362 for ; Wed, 31 Mar 2021 14:06:45 +0200 (CEST) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [212.186.127.180]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS id B277DF354 for ; Wed, 31 Mar 2021 14:06:44 +0200 (CEST) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 7591A41DE8 for ; Wed, 31 Mar 2021 14:06:44 +0200 (CEST) To: Proxmox VE development discussion , Oguz Bektas References: <20210330122310.1117492-1-o.bektas@proxmox.com> From: Stefan Reiter Message-ID: Date: Wed, 31 Mar 2021 14:06:42 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: <20210330122310.1117492-1-o.bektas@proxmox.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-SPAM-LEVEL: Spam detection results: 0 AWL -0.017 Adjusted score from AWL reputation of From: address KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment NICE_REPLY_A -0.001 Looks like a legit reply (A) RCVD_IN_DNSWL_MED -2.3 Sender listed at https://www.dnswl.org/, medium trust SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [disks.pm, diskmanage.pm] Subject: Re: [pve-devel] [RFC v2 storage] smartctl: use json parsing X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Mar 2021 12:06:45 -0000 Not familiar with smartctl, but I can comment on the code at least :) First off, quick smoke test shows me this in the GUI on an NVMe device: available_spare : 100 available_spare_threshold : 10 controller_busy_time : 4089 critical_comp_time : 0 critical_warning : 0 data_units_read : 10402470 data_units_written : 142318653 host_reads : 341998913 host_writes : 1135808482 media_errors : 0 num_err_log_entries : 345 percentage_used : 11 power_cycles : 301 power_on_hours : 1033 temperature : 44 temperature_sensors : ARRAY(0x55c92a53a868) unsafe_shutdowns : 27 warning_temp_time : 0 ...so the "temperate_sensors" are certainly wrong, and also the format is much less readable then before. ATA output was the same. On 3/30/21 2:23 PM, Oguz Bektas wrote: > adapt the smartctl endpoint to run smartctl with the --json or -j flag > to parse it more reasonably. > > additionally add the $format parameter, one can choose between 'text' > and 'json' outputs in the API (only relevant for nvme devices, since for > API backwards compatibility we have to keep the 'text' field) > > for the unit tests we need to collect the smartctl json outputs from > now. the current tests cover ssd_smart and nvme_smart cases. > > Signed-off-by: Oguz Bektas > --- > > > NOTE: i deleted the rest of the tests so this builds properly with > the unit tests passing. but i could still keep the old outputs and place > the new ones with a new naming scheme, and adapt the > disklist_test.pm to read those instead. i'm open for ideas. > though applying this should result in make'able and working package > Please put at least the deletion (or moving/renaming if you prefer that) of old test cases into a separate patch, if it applies first there still won't be any breakage, but this one is a bit much to take in with additions and deletions all mixed together. > > [...snip...] > > diff --git a/PVE/API2/Disks.pm b/PVE/API2/Disks.pm > index 33bca76..a453efc 100644 > --- a/PVE/API2/Disks.pm > +++ b/PVE/API2/Disks.pm > @@ -195,6 +195,12 @@ __PACKAGE__->register_method ({ > description => "If true returns only the health status", > optional => 1, > }, > + format => { > + description => "Return json or text", > + type => 'string', > + enum => ['text', 'json'], > + optional => 1, > + }, As mentioned off-list, we already have 'pve-output-format', though for pvesh... not sure how this would help or apply here, but certainly worth giving a quick thought about. For instance, "pvesh get /nodes/.../disk/smart --disk /dev/nvme0n1 --output-format text --format json" or "--output-format json-pretty --format text" give some interesting results ;) > }, > }, > returns => { > @@ -204,6 +210,7 @@ __PACKAGE__->register_method ({ > type => { type => 'string', optional => 1 }, > attributes => { type => 'array', optional => 1}, > text => { type => 'string', optional => 1 }, > + json => { type => 'string', optional => 1 }, > }, > }, > code => sub { > @@ -211,7 +218,7 @@ __PACKAGE__->register_method ({ > > my $disk = PVE::Diskmanage::verify_blockdev_path($param->{disk}); > > - my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly}); > + my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly}, $param->{format}); > > $result->{health} = 'UNKNOWN' if !defined $result->{health}; > $result = { health => $result->{health} } if $param->{healthonly}; > diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm > index 64bb813..5ed140f 100644 > --- a/PVE/Diskmanage.pm > +++ b/PVE/Diskmanage.pm > @@ -81,11 +81,12 @@ sub disk_is_used { > } > > sub get_smart_data { > - my ($disk, $healthonly) = @_; > + my ($disk, $healthonly, $format) = @_; > > assert_blockdev($disk); > my $smartdata = {}; > my $type; > + $format = 'text' if !defined($format); $format //= 'text'; > > my $returncode = 0; > > @@ -95,69 +96,69 @@ sub get_smart_data { > or die "failed to get nvme controller device for $disk\n"); > } > > - my $cmd = [$SMARTCTL, '-H']; > - push @$cmd, '-A', '-f', 'brief' if !$healthonly; > + my $cmd = [$SMARTCTL, '-j', '-H']; > + if (!$healthonly) { > + push @$cmd, '-Afbrief'; > + } why change the 'push' here? > push @$cmd, $disk; > > + my $smart_result = ''; > eval { > - $returncode = run_command($cmd, noerr => 1, outfunc => sub{ > - my ($line) = @_; > - > -# ATA SMART attributes, e.g.: > -# ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE > -# 1 Raw_Read_Error_Rate POSR-K 100 100 000 - 0 > -# > -# SAS and NVME disks, e.g.: > -# Data Units Written: 5,584,952 [2.85 TB] > -# Accumulated start-stop cycles: 34 > - > - if (defined($type) && $type eq 'ata' && $line =~ m/^([ \d]{2}\d)\s+(\S+)\s+(\S{6})\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.*)$/) { > - my $entry = {}; > - > - $entry->{name} = $2 if defined $2; > - $entry->{flags} = $3 if defined $3; > - # the +0 makes a number out of the strings > - $entry->{value} = $4+0 if defined $4; > - $entry->{worst} = $5+0 if defined $5; > - # some disks report the default threshold as --- instead of 000 > - if (defined($6) && $6 eq '---') { > - $entry->{threshold} = 0; > - } else { > - $entry->{threshold} = $6+0 if defined $6; > - } > - $entry->{fail} = $7 if defined $7; > - $entry->{raw} = $8 if defined $8; > - $entry->{id} = $1 if defined $1; > - push @{$smartdata->{attributes}}, $entry; > - } elsif ($line =~ m/(?:Health Status|self\-assessment test result): (.*)$/ ) { > - $smartdata->{health} = $1; > - } elsif ($line =~ m/Vendor Specific SMART Attributes with Thresholds:/) { > - $type = 'ata'; > - delete $smartdata->{text}; > - } elsif ($line =~ m/=== START OF (READ )?SMART DATA SECTION ===/) { > - $type = 'text'; > - } elsif (defined($type) && $type eq 'text') { > - $smartdata->{text} = '' if !defined $smartdata->{text}; > - $smartdata->{text} .= "$line\n"; > - # extract wearout from nvme/sas text, allow for decimal values > - if ($line =~ m/Percentage Used(?: endurance indicator)?:\s*(\d+(?:\.\d+)?)\%/i) { > - $smartdata->{wearout} = 100 - $1; > - } > - } elsif ($line =~ m/SMART Disabled/) { > - $smartdata->{health} = "SMART Disabled"; > - } > - }); > + run_command($cmd, noerr => 1, outfunc => sub { $smart_result .= shift }); > }; > my $err = $@; > > - # bit 0 and 1 mark an severe smartctl error > - # all others are for disk status, so ignore them > - # see smartctl(8) > - if ((defined($returncode) && ($returncode & 0b00000011)) || $err) { > - die "Error getting S.M.A.R.T. data: Exit code: $returncode\n"; > + my $json_result = decode_json($smart_result); > + > + my $smart_health = $json_result->{smart_status}->{passed}; > + if (JSON::is_bool($smart_health)) { > + $smart_health = ($smart_health)? "PASSED" : "FAILED"; unnecessary brackets? > + } else { > + $smart_health = 'UNKNOWN'; > } > > - $smartdata->{type} = $type; > + $smartdata->{health} = $smart_health; > + > + $type = $json_result->{device}->{type}; > + if ($type eq 'nvme') { > + $smartdata->{type} = $format; # text or json > + > + my $wearout = 100.0 - $json_result->{nvme_smart_health_information_log}->{percentage_used} if !$healthonly; too long even for my liking, needs a line break somewhere :) > + > + foreach my $k (sort keys %{$json_result->{nvme_smart_health_information_log}}) { > + my $v = $json_result->{nvme_smart_health_information_log}->{$k}; > + if ($format eq 'text') { > + $smartdata->{text} .= "$k : $v\n"; Previously, the "key : value" format would display human-readable info, i.e. capitalized key names, spacing around ':' and units for the value. With this, it simply prints out the json key/value. I suppose this could also be fixed by adapting the GUI, but on the CLI it would still look different. > + } else { > + $smartdata->{properties}->{$k} = $v; > + } > + if ($k eq 'percentage_used') { > + $smartdata->{wearout} = $wearout; > + } > + } > + delete $smartdata->{attributes}; # this is for ata disks only > + } else { > + $smartdata->{type} = 'ata'; > + foreach my $attribute (@{$json_result->{ata_smart_attributes}->{table}}) { > + # we need to override or delete some fields to fit expected json schema of unit tests while true, I feel like the reason isn't really the unit tests but rather consumers (i.e. the GUI) depending on it ;) > + my $flags_string = $attribute->{flags}->{string}; > + $flags_string =~ s/\s+$//; > + $attribute->{flags} = $flags_string; > + $attribute->{raw} = $attribute->{raw}->{string}; > + $attribute->{threshold} = delete $attribute->{thresh}; > + $attribute->{fail} = ${attribute}->{when_failed} eq "" ? "-" : ${attribute}->{when_failed}; > + delete ${attribute}->{when_failed}; > + > + push @{$smartdata->{attributes}}, $attribute; > + } > + my @sorted_attributes = sort { $a->{id} <=> $b->{id} } @{$smartdata->{attributes}}; > + @{$smartdata->{attributes}} = @sorted_attributes; > + } > + > + my $exit_status = $json_result->{smartctl}->{exit_status}; > + if ($exit_status ne 0) { > + die "Error getting S.M.A.R.T. data: Exit code: $exit_status\n"; > + } maybe do this before parsing? > > return $smartdata; > }