From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 9265A6D0EE for ; Wed, 31 Mar 2021 14:20:32 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 87766F613 for ; Wed, 31 Mar 2021 14:20:32 +0200 (CEST) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [212.186.127.180]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS id C78BAF607 for ; Wed, 31 Mar 2021 14:20:30 +0200 (CEST) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 9435A42F7C for ; Wed, 31 Mar 2021 14:20:30 +0200 (CEST) Date: Wed, 31 Mar 2021 14:17:40 +0200 From: Oguz Bektas To: Stefan Reiter Cc: Proxmox VE development discussion Message-ID: <20210331121740.GA12845@gaia.proxmox.com> Mail-Followup-To: Oguz Bektas , Stefan Reiter , Proxmox VE development discussion References: <20210330122310.1117492-1-o.bektas@proxmox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-SPAM-LEVEL: Spam detection results: 0 AWL 1.486 Adjusted score from AWL reputation of From: address KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_DNSWL_MED -2.3 Sender listed at https://www.dnswl.org/, medium trust SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [disks.pm, diskmanage.pm] Subject: Re: [pve-devel] [RFC v2 storage] smartctl: use json parsing X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Mar 2021 12:20:32 -0000 hi, thank you for the comments! On Wed, Mar 31, 2021 at 02:06:42PM +0200, Stefan Reiter wrote: > Not familiar with smartctl, but I can comment on the code at least :) > > First off, quick smoke test shows me this in the GUI on an NVMe device: > > available_spare : 100 > available_spare_threshold : 10 > controller_busy_time : 4089 > critical_comp_time : 0 > critical_warning : 0 > data_units_read : 10402470 > data_units_written : 142318653 > host_reads : 341998913 > host_writes : 1135808482 > media_errors : 0 > num_err_log_entries : 345 > percentage_used : 11 > power_cycles : 301 > power_on_hours : 1033 > temperature : 44 > temperature_sensors : ARRAY(0x55c92a53a868) > unsafe_shutdowns : 27 > warning_temp_time : 0 > > ...so the "temperate_sensors" are certainly wrong, and also the format is > much less readable then before. ATA output was the same. this field never came up on my machine so i missed it, thanks for pointing out > > On 3/30/21 2:23 PM, Oguz Bektas wrote: > > adapt the smartctl endpoint to run smartctl with the --json or -j flag > > to parse it more reasonably. > > > > additionally add the $format parameter, one can choose between 'text' > > and 'json' outputs in the API (only relevant for nvme devices, since for > > API backwards compatibility we have to keep the 'text' field) > > > > for the unit tests we need to collect the smartctl json outputs from > > now. the current tests cover ssd_smart and nvme_smart cases. > > > > Signed-off-by: Oguz Bektas > > --- > > > > > > NOTE: i deleted the rest of the tests so this builds properly with > > the unit tests passing. but i could still keep the old outputs and place > > the new ones with a new naming scheme, and adapt the > > disklist_test.pm to read those instead. i'm open for ideas. > > though applying this should result in make'able and working package > > > > Please put at least the deletion (or moving/renaming if you prefer that) of > old test cases into a separate patch, if it applies first there still won't > be any breakage, but this one is a bit much to take in with additions and > deletions all mixed together. will do > > > > > [...snip...] > > > > diff --git a/PVE/API2/Disks.pm b/PVE/API2/Disks.pm > > index 33bca76..a453efc 100644 > > --- a/PVE/API2/Disks.pm > > +++ b/PVE/API2/Disks.pm > > @@ -195,6 +195,12 @@ __PACKAGE__->register_method ({ > > description => "If true returns only the health status", > > optional => 1, > > }, > > + format => { > > + description => "Return json or text", > > + type => 'string', > > + enum => ['text', 'json'], > > + optional => 1, > > + }, > > As mentioned off-list, we already have 'pve-output-format', though for > pvesh... not sure how this would help or apply here, but certainly worth > giving a quick thought about. > > > For instance, "pvesh get /nodes/.../disk/smart --disk /dev/nvme0n1 > --output-format text --format json" or "--output-format json-pretty --format > text" give some interesting results ;) that is intended. the pvesh output format is separate from this '$format'. the reason to add this parameter was to keep it API backwards compatible (for nvme devices, we send the 'text' field in the json with the raw text output obtained from smartctl). changing this default would break things for API clients/scripts that depend on this field existing, thats why it was made optional to get the json key pairs in the json for nvme devices. > > > }, > > }, > > returns => { > > @@ -204,6 +210,7 @@ __PACKAGE__->register_method ({ > > type => { type => 'string', optional => 1 }, > > attributes => { type => 'array', optional => 1}, > > text => { type => 'string', optional => 1 }, > > + json => { type => 'string', optional => 1 }, > > }, > > }, > > code => sub { > > @@ -211,7 +218,7 @@ __PACKAGE__->register_method ({ > > my $disk = PVE::Diskmanage::verify_blockdev_path($param->{disk}); > > - my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly}); > > + my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly}, $param->{format}); > > $result->{health} = 'UNKNOWN' if !defined $result->{health}; > > $result = { health => $result->{health} } if $param->{healthonly}; > > diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm > > index 64bb813..5ed140f 100644 > > --- a/PVE/Diskmanage.pm > > +++ b/PVE/Diskmanage.pm > > @@ -81,11 +81,12 @@ sub disk_is_used { > > } > > sub get_smart_data { > > - my ($disk, $healthonly) = @_; > > + my ($disk, $healthonly, $format) = @_; > > assert_blockdev($disk); > > my $smartdata = {}; > > my $type; > > + $format = 'text' if !defined($format); > > $format //= 'text'; ok > > > my $returncode = 0; > > @@ -95,69 +96,69 @@ sub get_smart_data { > > or die "failed to get nvme controller device for $disk\n"); > > } > > - my $cmd = [$SMARTCTL, '-H']; > > - push @$cmd, '-A', '-f', 'brief' if !$healthonly; > > + my $cmd = [$SMARTCTL, '-j', '-H']; > > + if (!$healthonly) { > > + push @$cmd, '-Afbrief'; > > + } > > why change the 'push' here? ah, i was experimenting a bit and i guess this was leftover. i'd still write it as one array item in the v3 (unit test takes the disk arg by array element index): push @$cmd, '-Afbrief' if !$healthonly; > > > push @$cmd, $disk; > > + my $smart_result = ''; > > eval { > > - $returncode = run_command($cmd, noerr => 1, outfunc => sub{ > > - my ($line) = @_; > > - > > -# ATA SMART attributes, e.g.: > > -# ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE > > -# 1 Raw_Read_Error_Rate POSR-K 100 100 000 - 0 > > -# > > -# SAS and NVME disks, e.g.: > > -# Data Units Written: 5,584,952 [2.85 TB] > > -# Accumulated start-stop cycles: 34 > > - > > - if (defined($type) && $type eq 'ata' && $line =~ m/^([ \d]{2}\d)\s+(\S+)\s+(\S{6})\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.*)$/) { > > - my $entry = {}; > > - > > - $entry->{name} = $2 if defined $2; > > - $entry->{flags} = $3 if defined $3; > > - # the +0 makes a number out of the strings > > - $entry->{value} = $4+0 if defined $4; > > - $entry->{worst} = $5+0 if defined $5; > > - # some disks report the default threshold as --- instead of 000 > > - if (defined($6) && $6 eq '---') { > > - $entry->{threshold} = 0; > > - } else { > > - $entry->{threshold} = $6+0 if defined $6; > > - } > > - $entry->{fail} = $7 if defined $7; > > - $entry->{raw} = $8 if defined $8; > > - $entry->{id} = $1 if defined $1; > > - push @{$smartdata->{attributes}}, $entry; > > - } elsif ($line =~ m/(?:Health Status|self\-assessment test result): (.*)$/ ) { > > - $smartdata->{health} = $1; > > - } elsif ($line =~ m/Vendor Specific SMART Attributes with Thresholds:/) { > > - $type = 'ata'; > > - delete $smartdata->{text}; > > - } elsif ($line =~ m/=== START OF (READ )?SMART DATA SECTION ===/) { > > - $type = 'text'; > > - } elsif (defined($type) && $type eq 'text') { > > - $smartdata->{text} = '' if !defined $smartdata->{text}; > > - $smartdata->{text} .= "$line\n"; > > - # extract wearout from nvme/sas text, allow for decimal values > > - if ($line =~ m/Percentage Used(?: endurance indicator)?:\s*(\d+(?:\.\d+)?)\%/i) { > > - $smartdata->{wearout} = 100 - $1; > > - } > > - } elsif ($line =~ m/SMART Disabled/) { > > - $smartdata->{health} = "SMART Disabled"; > > - } > > - }); > > + run_command($cmd, noerr => 1, outfunc => sub { $smart_result .= shift }); > > }; > > my $err = $@; > > - # bit 0 and 1 mark an severe smartctl error > > - # all others are for disk status, so ignore them > > - # see smartctl(8) > > - if ((defined($returncode) && ($returncode & 0b00000011)) || $err) { > > - die "Error getting S.M.A.R.T. data: Exit code: $returncode\n"; > > + my $json_result = decode_json($smart_result); > > + > > + my $smart_health = $json_result->{smart_status}->{passed}; > > + if (JSON::is_bool($smart_health)) { > > + $smart_health = ($smart_health)? "PASSED" : "FAILED"; > > unnecessary brackets? i guess > > > + } else { > > + $smart_health = 'UNKNOWN'; > > } > > - $smartdata->{type} = $type; > > + $smartdata->{health} = $smart_health; > > + > > + $type = $json_result->{device}->{type}; > > + if ($type eq 'nvme') { > > + $smartdata->{type} = $format; # text or json > > + > > + my $wearout = 100.0 - $json_result->{nvme_smart_health_information_log}->{percentage_used} if !$healthonly; > > too long even for my liking, needs a line break somewhere :) hmm yeah, probably good idea to make my $nvme_info = $json_result->{nvme_smart_health_information_log}; and then $nvme_info->{percentage_used} > > > + > > + foreach my $k (sort keys %{$json_result->{nvme_smart_health_information_log}}) { > > + my $v = $json_result->{nvme_smart_health_information_log}->{$k}; > > + if ($format eq 'text') { > > + $smartdata->{text} .= "$k : $v\n"; > > Previously, the "key : value" format would display human-readable info, i.e. > capitalized key names, spacing around ':' and units for the value. > With this, it simply prints out the json key/value. > > I suppose this could also be fixed by adapting the GUI, but on the CLI it > would still look different. true, the data is the same though. my original plan was to make the GUI part for nvme devices the same as ATA devices, e.g. with a table for the attributes/properties not sure what's the best way to go about it > > > + } else { > > + $smartdata->{properties}->{$k} = $v; > > + } > > + if ($k eq 'percentage_used') { > > + $smartdata->{wearout} = $wearout; > > + } > > + } > > + delete $smartdata->{attributes}; # this is for ata disks only > > + } else { > > + $smartdata->{type} = 'ata'; > > + foreach my $attribute (@{$json_result->{ata_smart_attributes}->{table}}) { > > + # we need to override or delete some fields to fit expected json schema of unit tests > > while true, I feel like the reason isn't really the unit tests but rather > consumers (i.e. the GUI) depending on it ;) > > > + my $flags_string = $attribute->{flags}->{string}; > > + $flags_string =~ s/\s+$//; > > + $attribute->{flags} = $flags_string; > > + $attribute->{raw} = $attribute->{raw}->{string}; > > + $attribute->{threshold} = delete $attribute->{thresh}; > > + $attribute->{fail} = ${attribute}->{when_failed} eq "" ? "-" : ${attribute}->{when_failed}; > > + delete ${attribute}->{when_failed}; > > + > > + push @{$smartdata->{attributes}}, $attribute; > > + } > > + my @sorted_attributes = sort { $a->{id} <=> $b->{id} } @{$smartdata->{attributes}}; > > + @{$smartdata->{attributes}} = @sorted_attributes; > > + } > > + > > + my $exit_status = $json_result->{smartctl}->{exit_status}; > > + if ($exit_status ne 0) { > > + die "Error getting S.M.A.R.T. data: Exit code: $exit_status\n"; > > + } > > maybe do this before parsing? hmm. maybe getting the exit_status from json wasn't a good idea. > > > return $smartdata; > > }