Re: [pve-devel] [RFC v2 storage] smartctl: use json parsing

all lists on lists.proxmox.com
 help / color / mirror / Atom feed

From: Stefan Reiter <s.reiter@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Oguz Bektas <o.bektas@proxmox.com>
Subject: Re: [pve-devel] [RFC v2 storage] smartctl: use json parsing
Date: Wed, 31 Mar 2021 14:06:42 +0200	[thread overview]
Message-ID: <f52d5092-5b67-ad0c-b260-719829bcb9ec@proxmox.com> (raw)
In-Reply-To: <20210330122310.1117492-1-o.bektas@proxmox.com>

Not familiar with smartctl, but I can comment on the code at least :)

First off, quick smoke test shows me this in the GUI on an NVMe device:

available_spare : 100
available_spare_threshold : 10
controller_busy_time : 4089
critical_comp_time : 0
critical_warning : 0
data_units_read : 10402470
data_units_written : 142318653
host_reads : 341998913
host_writes : 1135808482
media_errors : 0
num_err_log_entries : 345
percentage_used : 11
power_cycles : 301
power_on_hours : 1033
temperature : 44
temperature_sensors : ARRAY(0x55c92a53a868)
unsafe_shutdowns : 27
warning_temp_time : 0

...so the "temperate_sensors" are certainly wrong, and also the format 
is much less readable then before. ATA output was the same.

On 3/30/21 2:23 PM, Oguz Bektas wrote:
> adapt the smartctl endpoint to run smartctl with the --json or -j flag
> to parse it more reasonably.
> 
> additionally add the $format parameter, one can choose between 'text'
> and 'json' outputs in the API (only relevant for nvme devices, since for
> API backwards compatibility we have to keep the 'text' field)
> 
> for the unit tests we need to collect the smartctl json outputs from
> now. the current tests cover ssd_smart and nvme_smart cases.
> 
> Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
> ---
> 
> 
> NOTE: i deleted the rest of the tests so this builds properly with
> the unit tests passing. but i could still keep the old outputs and place
> the new ones with a new naming scheme, and adapt the
> disklist_test.pm to read those instead. i'm open for ideas.
> though applying this should result in make'able and working package
> 

Please put at least the deletion (or moving/renaming if you prefer that) 
of old test cases into a separate patch, if it applies first there still 
won't be any breakage, but this one is a bit much to take in with 
additions and deletions all mixed together.

> 
> [...snip...]
> 
> diff --git a/PVE/API2/Disks.pm b/PVE/API2/Disks.pm
> index 33bca76..a453efc 100644
> --- a/PVE/API2/Disks.pm
> +++ b/PVE/API2/Disks.pm
> @@ -195,6 +195,12 @@ __PACKAGE__->register_method ({
>   		description => "If true returns only the health status",
>   		optional => 1,
>   	    },
> +	    format => {
> +		description => "Return json or text",
> +		type => 'string',
> +		enum => ['text', 'json'],
> +		optional => 1,
> +	    },

As mentioned off-list, we already have 'pve-output-format', though for 
pvesh... not sure how this would help or apply here, but certainly worth 
giving a quick thought about.


For instance, "pvesh get /nodes/.../disk/smart --disk /dev/nvme0n1 
--output-format text --format json" or "--output-format json-pretty 
--format text" give some interesting results ;)

>   	},
>       },
>       returns => {
> @@ -204,6 +210,7 @@ __PACKAGE__->register_method ({
>   	    type => { type => 'string', optional => 1 },
>   	    attributes => { type => 'array', optional => 1},
>   	    text => { type => 'string', optional => 1 },
> +	    json => { type => 'string', optional => 1 },
>   	},
>       },
>       code => sub {
> @@ -211,7 +218,7 @@ __PACKAGE__->register_method ({
>   
>   	my $disk = PVE::Diskmanage::verify_blockdev_path($param->{disk});
>   
> -	my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly});
> +	my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly}, $param->{format});
>   
>   	$result->{health} = 'UNKNOWN' if !defined $result->{health};
>   	$result = { health => $result->{health} } if $param->{healthonly};
> diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm
> index 64bb813..5ed140f 100644
> --- a/PVE/Diskmanage.pm
> +++ b/PVE/Diskmanage.pm
> @@ -81,11 +81,12 @@ sub disk_is_used {
>   }
>   
>   sub get_smart_data {
> -    my ($disk, $healthonly) = @_;
> +    my ($disk, $healthonly, $format) = @_;
>   
>       assert_blockdev($disk);
>       my $smartdata = {};
>       my $type;
> +    $format = 'text' if !defined($format);

$format //= 'text';

>   
>       my $returncode = 0;
>   
> @@ -95,69 +96,69 @@ sub get_smart_data {
>   	    or die "failed to get nvme controller device for $disk\n");
>       }
>   
> -    my $cmd = [$SMARTCTL, '-H'];
> -    push @$cmd, '-A', '-f', 'brief' if !$healthonly;
> +    my $cmd = [$SMARTCTL, '-j', '-H'];
> +    if (!$healthonly) {
> +	push @$cmd, '-Afbrief';
> +    }

why change the 'push' here?

>       push @$cmd, $disk;
>   
> +    my $smart_result = '';
>       eval {
> -	$returncode = run_command($cmd, noerr => 1, outfunc => sub{
> -	    my ($line) = @_;
> -
> -# ATA SMART attributes, e.g.:
> -# ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
> -#   1 Raw_Read_Error_Rate     POSR-K   100   100   000    -    0
> -#
> -# SAS and NVME disks, e.g.:
> -# Data Units Written:                 5,584,952 [2.85 TB]
> -# Accumulated start-stop cycles:  34
> -
> -	    if (defined($type) && $type eq 'ata' && $line =~ m/^([ \d]{2}\d)\s+(\S+)\s+(\S{6})\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.*)$/) {
> -		my $entry = {};
> -
> -		$entry->{name} = $2 if defined $2;
> -		$entry->{flags} = $3 if defined $3;
> -		# the +0 makes a number out of the strings
> -		$entry->{value} = $4+0 if defined $4;
> -		$entry->{worst} = $5+0 if defined $5;
> -		# some disks report the default threshold as --- instead of 000
> -		if (defined($6) && $6 eq '---') {
> -		    $entry->{threshold} = 0;
> -		} else {
> -		    $entry->{threshold} = $6+0 if defined $6;
> -		}
> -		$entry->{fail} = $7 if defined $7;
> -		$entry->{raw} = $8 if defined $8;
> -		$entry->{id} = $1 if defined $1;
> -		push @{$smartdata->{attributes}}, $entry;
> -	    } elsif ($line =~ m/(?:Health Status|self\-assessment test result): (.*)$/ ) {
> -		$smartdata->{health} = $1;
> -	    } elsif ($line =~ m/Vendor Specific SMART Attributes with Thresholds:/) {
> -		$type = 'ata';
> -		delete $smartdata->{text};
> -	    } elsif ($line =~ m/=== START OF (READ )?SMART DATA SECTION ===/) {
> -		$type = 'text';
> -	    } elsif (defined($type) && $type eq 'text') {
> -		$smartdata->{text} = '' if !defined $smartdata->{text};
> -		$smartdata->{text} .= "$line\n";
> -		# extract wearout from nvme/sas text, allow for decimal values
> -		if ($line =~ m/Percentage Used(?: endurance indicator)?:\s*(\d+(?:\.\d+)?)\%/i) {
> -		    $smartdata->{wearout} = 100 - $1;
> -		}
> -	    } elsif ($line =~ m/SMART Disabled/) {
> -		$smartdata->{health} = "SMART Disabled";
> -	    }
> -	});
> +	run_command($cmd, noerr => 1, outfunc => sub { $smart_result .= shift });
>       };
>       my $err = $@;
>   
> -    # bit 0 and 1 mark an severe smartctl error
> -    # all others are for disk status, so ignore them
> -    # see smartctl(8)
> -    if ((defined($returncode) && ($returncode & 0b00000011)) || $err) {
> -	die "Error getting S.M.A.R.T. data: Exit code: $returncode\n";
> +    my $json_result = decode_json($smart_result);
> +
> +    my $smart_health = $json_result->{smart_status}->{passed};
> +    if (JSON::is_bool($smart_health)) {
> +	$smart_health = ($smart_health)? "PASSED" : "FAILED";

unnecessary brackets?

> +    } else {
> +	$smart_health = 'UNKNOWN';
>       }
>   
> -    $smartdata->{type} = $type;
> +    $smartdata->{health} = $smart_health;
> +
> +    $type = $json_result->{device}->{type};
> +    if ($type eq 'nvme') {
> +	$smartdata->{type} = $format; # text or json
> +
> +	my $wearout = 100.0 - $json_result->{nvme_smart_health_information_log}->{percentage_used} if !$healthonly;

too long even for my liking, needs a line break somewhere :)

> +
> +	foreach my $k (sort keys %{$json_result->{nvme_smart_health_information_log}}) {
> +	    my $v = $json_result->{nvme_smart_health_information_log}->{$k};
> +	    if ($format eq 'text') {
> +		$smartdata->{text} .= "$k : $v\n";

Previously, the "key : value" format would display human-readable info, 
i.e. capitalized key names, spacing around ':' and units for the value.
With this, it simply prints out the json key/value.

I suppose this could also be fixed by adapting the GUI, but on the CLI 
it would still look different.

> +	    } else {
> +		$smartdata->{properties}->{$k} = $v;
> +	    }
> +	    if ($k eq 'percentage_used') {
> +		$smartdata->{wearout} = $wearout;
> +	    }
> +	}
> +	delete $smartdata->{attributes}; # this is for ata disks only
> +    } else {
> +	$smartdata->{type} = 'ata';
> +	foreach my $attribute (@{$json_result->{ata_smart_attributes}->{table}}) {
> +	    # we need to override or delete some fields to fit expected json schema of unit tests

while true, I feel like the reason isn't really the unit tests but 
rather consumers (i.e. the GUI) depending on it ;)

> +	    my $flags_string = $attribute->{flags}->{string};
> +	    $flags_string =~ s/\s+$//;
> +	    $attribute->{flags} = $flags_string;
> +	    $attribute->{raw} = $attribute->{raw}->{string};
> +	    $attribute->{threshold} = delete $attribute->{thresh};
> +	    $attribute->{fail} = ${attribute}->{when_failed} eq "" ? "-" : ${attribute}->{when_failed};
> +	    delete ${attribute}->{when_failed};
> +
> +	    push @{$smartdata->{attributes}}, $attribute;
> +	}
> +	my @sorted_attributes = sort { $a->{id} <=> $b->{id} } @{$smartdata->{attributes}};
> +	@{$smartdata->{attributes}} = @sorted_attributes;
> +    }
> +
> +    my $exit_status = $json_result->{smartctl}->{exit_status};
> +    if ($exit_status ne 0) {
> +	die "Error getting S.M.A.R.T. data: Exit code: $exit_status\n";
> +    }

maybe do this before parsing?

>   
>       return $smartdata;
>   }

next prev parent reply	other threads:[~2021-03-31 12:06 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-30 12:23 Oguz Bektas
2021-03-31 12:06 ` Stefan Reiter [this message]
2021-03-31 12:17   ` Oguz Bektas
2021-03-31 12:33   ` Thomas Lamprecht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f52d5092-5b67-ad0c-b260-719829bcb9ec@proxmox.com \
    --to=s.reiter@proxmox.com \
    --cc=o.bektas@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.

Service provided by Proxmox Server Solutions GmbH | Privacy | Legal