From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <s.reiter@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id BE1AC6D0DE
 for <pve-devel@lists.proxmox.com>; Wed, 31 Mar 2021 14:06:45 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id ACF35F362
 for <pve-devel@lists.proxmox.com>; Wed, 31 Mar 2021 14:06:45 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id B277DF354
 for <pve-devel@lists.proxmox.com>; Wed, 31 Mar 2021 14:06:44 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 7591A41DE8
 for <pve-devel@lists.proxmox.com>; Wed, 31 Mar 2021 14:06:44 +0200 (CEST)
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Oguz Bektas <o.bektas@proxmox.com>
References: <20210330122310.1117492-1-o.bektas@proxmox.com>
From: Stefan Reiter <s.reiter@proxmox.com>
Message-ID: <f52d5092-5b67-ad0c-b260-719829bcb9ec@proxmox.com>
Date: Wed, 31 Mar 2021 14:06:42 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.9.0
MIME-Version: 1.0
In-Reply-To: <20210330122310.1117492-1-o.bektas@proxmox.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.017 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [disks.pm, diskmanage.pm]
Subject: Re: [pve-devel] [RFC v2 storage] smartctl: use json parsing
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Wed, 31 Mar 2021 12:06:45 -0000

Not familiar with smartctl, but I can comment on the code at least :)

First off, quick smoke test shows me this in the GUI on an NVMe device:

available_spare : 100
available_spare_threshold : 10
controller_busy_time : 4089
critical_comp_time : 0
critical_warning : 0
data_units_read : 10402470
data_units_written : 142318653
host_reads : 341998913
host_writes : 1135808482
media_errors : 0
num_err_log_entries : 345
percentage_used : 11
power_cycles : 301
power_on_hours : 1033
temperature : 44
temperature_sensors : ARRAY(0x55c92a53a868)
unsafe_shutdowns : 27
warning_temp_time : 0

...so the "temperate_sensors" are certainly wrong, and also the format 
is much less readable then before. ATA output was the same.

On 3/30/21 2:23 PM, Oguz Bektas wrote:
> adapt the smartctl endpoint to run smartctl with the --json or -j flag
> to parse it more reasonably.
> 
> additionally add the $format parameter, one can choose between 'text'
> and 'json' outputs in the API (only relevant for nvme devices, since for
> API backwards compatibility we have to keep the 'text' field)
> 
> for the unit tests we need to collect the smartctl json outputs from
> now. the current tests cover ssd_smart and nvme_smart cases.
> 
> Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
> ---
> 
> 
> NOTE: i deleted the rest of the tests so this builds properly with
> the unit tests passing. but i could still keep the old outputs and place
> the new ones with a new naming scheme, and adapt the
> disklist_test.pm to read those instead. i'm open for ideas.
> though applying this should result in make'able and working package
> 

Please put at least the deletion (or moving/renaming if you prefer that) 
of old test cases into a separate patch, if it applies first there still 
won't be any breakage, but this one is a bit much to take in with 
additions and deletions all mixed together.

> 
> [...snip...]
> 
> diff --git a/PVE/API2/Disks.pm b/PVE/API2/Disks.pm
> index 33bca76..a453efc 100644
> --- a/PVE/API2/Disks.pm
> +++ b/PVE/API2/Disks.pm
> @@ -195,6 +195,12 @@ __PACKAGE__->register_method ({
>   		description => "If true returns only the health status",
>   		optional => 1,
>   	    },
> +	    format => {
> +		description => "Return json or text",
> +		type => 'string',
> +		enum => ['text', 'json'],
> +		optional => 1,
> +	    },

As mentioned off-list, we already have 'pve-output-format', though for 
pvesh... not sure how this would help or apply here, but certainly worth 
giving a quick thought about.


For instance, "pvesh get /nodes/.../disk/smart --disk /dev/nvme0n1 
--output-format text --format json" or "--output-format json-pretty 
--format text" give some interesting results ;)

>   	},
>       },
>       returns => {
> @@ -204,6 +210,7 @@ __PACKAGE__->register_method ({
>   	    type => { type => 'string', optional => 1 },
>   	    attributes => { type => 'array', optional => 1},
>   	    text => { type => 'string', optional => 1 },
> +	    json => { type => 'string', optional => 1 },
>   	},
>       },
>       code => sub {
> @@ -211,7 +218,7 @@ __PACKAGE__->register_method ({
>   
>   	my $disk = PVE::Diskmanage::verify_blockdev_path($param->{disk});
>   
> -	my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly});
> +	my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly}, $param->{format});
>   
>   	$result->{health} = 'UNKNOWN' if !defined $result->{health};
>   	$result = { health => $result->{health} } if $param->{healthonly};
> diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm
> index 64bb813..5ed140f 100644
> --- a/PVE/Diskmanage.pm
> +++ b/PVE/Diskmanage.pm
> @@ -81,11 +81,12 @@ sub disk_is_used {
>   }
>   
>   sub get_smart_data {
> -    my ($disk, $healthonly) = @_;
> +    my ($disk, $healthonly, $format) = @_;
>   
>       assert_blockdev($disk);
>       my $smartdata = {};
>       my $type;
> +    $format = 'text' if !defined($format);

$format //= 'text';

>   
>       my $returncode = 0;
>   
> @@ -95,69 +96,69 @@ sub get_smart_data {
>   	    or die "failed to get nvme controller device for $disk\n");
>       }
>   
> -    my $cmd = [$SMARTCTL, '-H'];
> -    push @$cmd, '-A', '-f', 'brief' if !$healthonly;
> +    my $cmd = [$SMARTCTL, '-j', '-H'];
> +    if (!$healthonly) {
> +	push @$cmd, '-Afbrief';
> +    }

why change the 'push' here?

>       push @$cmd, $disk;
>   
> +    my $smart_result = '';
>       eval {
> -	$returncode = run_command($cmd, noerr => 1, outfunc => sub{
> -	    my ($line) = @_;
> -
> -# ATA SMART attributes, e.g.:
> -# ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
> -#   1 Raw_Read_Error_Rate     POSR-K   100   100   000    -    0
> -#
> -# SAS and NVME disks, e.g.:
> -# Data Units Written:                 5,584,952 [2.85 TB]
> -# Accumulated start-stop cycles:  34
> -
> -	    if (defined($type) && $type eq 'ata' && $line =~ m/^([ \d]{2}\d)\s+(\S+)\s+(\S{6})\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.*)$/) {
> -		my $entry = {};
> -
> -		$entry->{name} = $2 if defined $2;
> -		$entry->{flags} = $3 if defined $3;
> -		# the +0 makes a number out of the strings
> -		$entry->{value} = $4+0 if defined $4;
> -		$entry->{worst} = $5+0 if defined $5;
> -		# some disks report the default threshold as --- instead of 000
> -		if (defined($6) && $6 eq '---') {
> -		    $entry->{threshold} = 0;
> -		} else {
> -		    $entry->{threshold} = $6+0 if defined $6;
> -		}
> -		$entry->{fail} = $7 if defined $7;
> -		$entry->{raw} = $8 if defined $8;
> -		$entry->{id} = $1 if defined $1;
> -		push @{$smartdata->{attributes}}, $entry;
> -	    } elsif ($line =~ m/(?:Health Status|self\-assessment test result): (.*)$/ ) {
> -		$smartdata->{health} = $1;
> -	    } elsif ($line =~ m/Vendor Specific SMART Attributes with Thresholds:/) {
> -		$type = 'ata';
> -		delete $smartdata->{text};
> -	    } elsif ($line =~ m/=== START OF (READ )?SMART DATA SECTION ===/) {
> -		$type = 'text';
> -	    } elsif (defined($type) && $type eq 'text') {
> -		$smartdata->{text} = '' if !defined $smartdata->{text};
> -		$smartdata->{text} .= "$line\n";
> -		# extract wearout from nvme/sas text, allow for decimal values
> -		if ($line =~ m/Percentage Used(?: endurance indicator)?:\s*(\d+(?:\.\d+)?)\%/i) {
> -		    $smartdata->{wearout} = 100 - $1;
> -		}
> -	    } elsif ($line =~ m/SMART Disabled/) {
> -		$smartdata->{health} = "SMART Disabled";
> -	    }
> -	});
> +	run_command($cmd, noerr => 1, outfunc => sub { $smart_result .= shift });
>       };
>       my $err = $@;
>   
> -    # bit 0 and 1 mark an severe smartctl error
> -    # all others are for disk status, so ignore them
> -    # see smartctl(8)
> -    if ((defined($returncode) && ($returncode & 0b00000011)) || $err) {
> -	die "Error getting S.M.A.R.T. data: Exit code: $returncode\n";
> +    my $json_result = decode_json($smart_result);
> +
> +    my $smart_health = $json_result->{smart_status}->{passed};
> +    if (JSON::is_bool($smart_health)) {
> +	$smart_health = ($smart_health)? "PASSED" : "FAILED";

unnecessary brackets?

> +    } else {
> +	$smart_health = 'UNKNOWN';
>       }
>   
> -    $smartdata->{type} = $type;
> +    $smartdata->{health} = $smart_health;
> +
> +    $type = $json_result->{device}->{type};
> +    if ($type eq 'nvme') {
> +	$smartdata->{type} = $format; # text or json
> +
> +	my $wearout = 100.0 - $json_result->{nvme_smart_health_information_log}->{percentage_used} if !$healthonly;

too long even for my liking, needs a line break somewhere :)

> +
> +	foreach my $k (sort keys %{$json_result->{nvme_smart_health_information_log}}) {
> +	    my $v = $json_result->{nvme_smart_health_information_log}->{$k};
> +	    if ($format eq 'text') {
> +		$smartdata->{text} .= "$k : $v\n";

Previously, the "key : value" format would display human-readable info, 
i.e. capitalized key names, spacing around ':' and units for the value.
With this, it simply prints out the json key/value.

I suppose this could also be fixed by adapting the GUI, but on the CLI 
it would still look different.

> +	    } else {
> +		$smartdata->{properties}->{$k} = $v;
> +	    }
> +	    if ($k eq 'percentage_used') {
> +		$smartdata->{wearout} = $wearout;
> +	    }
> +	}
> +	delete $smartdata->{attributes}; # this is for ata disks only
> +    } else {
> +	$smartdata->{type} = 'ata';
> +	foreach my $attribute (@{$json_result->{ata_smart_attributes}->{table}}) {
> +	    # we need to override or delete some fields to fit expected json schema of unit tests

while true, I feel like the reason isn't really the unit tests but 
rather consumers (i.e. the GUI) depending on it ;)

> +	    my $flags_string = $attribute->{flags}->{string};
> +	    $flags_string =~ s/\s+$//;
> +	    $attribute->{flags} = $flags_string;
> +	    $attribute->{raw} = $attribute->{raw}->{string};
> +	    $attribute->{threshold} = delete $attribute->{thresh};
> +	    $attribute->{fail} = ${attribute}->{when_failed} eq "" ? "-" : ${attribute}->{when_failed};
> +	    delete ${attribute}->{when_failed};
> +
> +	    push @{$smartdata->{attributes}}, $attribute;
> +	}
> +	my @sorted_attributes = sort { $a->{id} <=> $b->{id} } @{$smartdata->{attributes}};
> +	@{$smartdata->{attributes}} = @sorted_attributes;
> +    }
> +
> +    my $exit_status = $json_result->{smartctl}->{exit_status};
> +    if ($exit_status ne 0) {
> +	die "Error getting S.M.A.R.T. data: Exit code: $exit_status\n";
> +    }

maybe do this before parsing?

>   
>       return $smartdata;
>   }