public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH] disk management: Add support for additional Crucial SSDs
@ 2020-10-19 22:53 Jan-Jonas Sämann
  2020-10-22 13:30 ` Dominik Csapak
  0 siblings, 1 reply; 15+ messages in thread
From: Jan-Jonas Sämann @ 2020-10-19 22:53 UTC (permalink / raw)
  To: pve-devel; +Cc: Jan-Jonas Sämann

Crucial SSDs do not necessarily contain their vendor name in the model
string. Hence, some of them are not recognized by get_wear_leveling_info().

This patch adds support for some common consumer-grade Crucial disks,
such as the CT500MX500SSD1.

Signed-off-by: Jan-Jonas Sämann <sprinterfreak@binary-kitchen.de>
---
 PVE/Diskmanage.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm
index 79aafcc..37dc3bc 100644
--- a/PVE/Diskmanage.pm
+++ b/PVE/Diskmanage.pm
@@ -410,7 +410,7 @@ sub get_wear_leveling_info {
 	'samsung' => 177,
 	'intel' => 233,
 	'sandisk' => 233,
-	'crucial' => 202,
+	'(crucial|ct[35]00[bm]x)' => 202,
 	'default' => 233,
     };
 
-- 
2.25.1





* Re: [pve-devel] [PATCH] disk management: Add support for additional Crucial SSDs
  2020-10-19 22:53 [pve-devel] [PATCH] disk management: Add support for additional Crucial SSDs Jan-Jonas Sämann
@ 2020-10-22 13:30 ` Dominik Csapak
  2020-10-24 19:27   ` [pve-devel] New routine get_wear_leveling_info() Jan-Jonas Sämann
  0 siblings, 1 reply; 15+ messages in thread
From: Dominik Csapak @ 2020-10-22 13:30 UTC (permalink / raw)
  To: Proxmox VE development discussion, Jan-Jonas Sämann

Hi,

sorry for the late answer and thanks for your contribution :)

first, if you want to contribute, please sign the Harmony CLA and send it
to us (see https://pve.proxmox.com/wiki/Developer_Documentation for details)

secondly, we generally do not want to start an exhaustive list of
vendors/models, but since we already support crucial and those ssds
are their current models, it probably makes sense to include them.
i would prefer to have an anchor at the beginning though, since we do
not anchor that match ourselves

iow, i would rather use

'(crucial|^ct[35]00[bm]x)' => 202,
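
A small sketch of what the anchor changes, assuming the pattern is applied
case-insensitively as in the existing vendormap loop. CT500MX500SSD1 is the
model named in the patch; "ACME-XCT500MX" is a made-up model string used only
to show a mid-string match.

```perl
use strict;
use warnings;

# Both patterns from this thread, compiled case-insensitively.
my $unanchored = qr/(crucial|ct[35]00[bm]x)/i;
my $anchored   = qr/(crucial|^ct[35]00[bm]x)/i;

# The second model string is hypothetical: it only contains the same
# characters in the middle of the string.
for my $model ('CT500MX500SSD1', 'ACME-XCT500MX') {
    printf "%-16s unanchored=%s anchored=%s\n", $model,
        ($model =~ $unanchored ? 'yes' : 'no'),
        ($model =~ $anchored   ? 'yes' : 'no');
}
```

Only the unanchored pattern accepts the made-up string, which is the kind of
accidental match the anchor is meant to rule out.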

if you do not want to sign the cla and are ok with it, one of us
can also send the (updated) patch ourselves

kind regards
Dominik







* [pve-devel] New routine get_wear_leveling_info()
  2020-10-22 13:30 ` Dominik Csapak
@ 2020-10-24 19:27   ` Jan-Jonas Sämann
  2020-10-24 19:27     ` [pve-devel] [PATCH v2 storage] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup Jan-Jonas Sämann
  0 siblings, 1 reply; 15+ messages in thread
From: Jan-Jonas Sämann @ 2020-10-24 19:27 UTC (permalink / raw)
  To: pve-devel

Hi,

upon closer inspection and more testing on different systems, I
discovered more issues with the current implementation itself. Since
registers can vary per model for some vendors, mapping registers by
vendor is clearly not the correct approach. The current implementation
even interprets entirely unrelated registers on some drives.
For instance, on a Corsair SSD of mine register 233 (the default) is used,
which smartctl labels as "Sandforce_Internal".

I am currently working on a different approach which uses attribute names
from smartmontools' drivedb.h to search for the correct register. This
way we can build on existing knowledge. In theory, and after more
testing, the new method could entirely replace the current lookup
procedure. Sadly, even smartmontools does not provide a generic label
for the value that best predicts wearout. So we still need to
look up the next best label and let smartctl translate it down to the
register address.
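
To illustrate the difference, here is a minimal sketch that is not the
Diskmanage code. The two attribute rows are shaped like the sde_smart test
data that appears later in this thread; the function names wearout_by_id and
wearout_by_name are made up for this example.

```perl
use strict;
use warnings;

# Two rows shaped like the sde_smart test data from this thread.
my $attributes = [
    { id => 231, name => 'SSD_Life_Left',      value => 91 },
    { id => 233, name => 'SandForce_Internal', value => 0 },
];

# Old idea: pick a fixed attribute id (233 is the vendormap default).
sub wearout_by_id {
    my ($attrs, $id) = @_;
    for my $attr (@$attrs) {
        return $attr->{value} if $attr->{id} == $id;
    }
    return undef;
}

# New idea: pick by smartctl attribute name, in order of preference.
sub wearout_by_name {
    my ($attrs, @names) = @_;
    for my $name (@names) {
        for my $attr (@$attrs) {
            return $attr->{value} if $attr->{name} eq $name;
        }
    }
    return undef;
}

print "by id 233: ", wearout_by_id($attributes, 233), "\n";
print "by name:   ",
    wearout_by_name($attributes, 'SSD_Life_Left', 'Wear_Leveling_Count'), "\n";
```

The id lookup returns the value of an unrelated internal register, while the
name lookup returns the SSD_Life_Left value.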

It doesn't make sense to me anymore to maintain this kind of quick-and-dirty
vendormap, as Dominik already said.

kind regards
Jan






* [pve-devel] [PATCH v2 storage] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup
  2020-10-24 19:27   ` [pve-devel] New routine get_wear_leveling_info() Jan-Jonas Sämann
@ 2020-10-24 19:27     ` Jan-Jonas Sämann
  2020-10-27  8:08       ` Thomas Lamprecht
  2020-10-29 18:21       ` Thomas Lamprecht
  0 siblings, 2 replies; 15+ messages in thread
From: Jan-Jonas Sämann @ 2020-10-24 19:27 UTC (permalink / raw)
  To: pve-devel

This replaces the locally maintained hardware map in
get_wear_leveling_info() with commonly used register names from
smartmontools. smartmontools maintains a labeled register database that
covers a majority of drives (including versions). The current lookup
produces false estimates; this approach hopefully provides more reliable
data.

Signed-off-by: Jan-Jonas Sämann <sprinterfreak@binary-kitchen.de>
---
 PVE/Diskmanage.pm | 58 +++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm
index 79aafcc..20dbeeb 100644
--- a/PVE/Diskmanage.pm
+++ b/PVE/Diskmanage.pm
@@ -396,7 +396,7 @@ sub get_sysdir_info {
 }
 
 sub get_wear_leveling_info {
-    my ($smartdata, $model) = @_;
+    my ($smartdata) = @_;
     my $attributes = $smartdata->{attributes};
 
     if (defined($smartdata->{wearout})) {
@@ -405,37 +405,35 @@ sub get_wear_leveling_info {
 
     my $wearout;
 
-    my $vendormap = {
-	'kingston' => 231,
-	'samsung' => 177,
-	'intel' => 233,
-	'sandisk' => 233,
-	'crucial' => 202,
-	'default' => 233,
-    };
-
-    # find target attr id
-
-    my $attrid;
-
-    foreach my $vendor (keys %$vendormap) {
-	if ($model =~ m/$vendor/i) {
-	    $attrid = $vendormap->{$vendor};
-	    # found the attribute
-	    last;
+    # Common register names that represent percentage values of potential
+    # failure indicators used in drivedb.h of smartmontool's. Order matters,
+    # as some drives may have multiple definitions
+    my @wearoutregisters = (
+	"SSD_Life_Left",
+	"Wear_Leveling_Count",
+	"Perc_Write\/Erase_Ct_BC",
+	"Perc_Rated_Life_Remain",
+	"Remaining_Lifetime_Perc",
+	"Percent_Lifetime_Remain",
+	"Lifetime_Left",
+	"PCT_Life_Remaining",
+	"Lifetime_Remaining",
+	"Percent_Life_Remaining",
+	"Percent_Lifetime_Used",
+	"Perc_Rated_Life_Used"
+    );
+
+    # Search for S.M.A.R.T. attributes for known register
+    foreach my $register (@wearoutregisters) {
+	last if defined $wearout;
+	foreach my $attr (@$attributes) {
+	   next if $attr->{name} !~ m/$register/;
+	   # Store wearout value, invert value if register matches "Used"
+	   $wearout = ($attr->{name} =~ /Used/) ? 100 - $attr->{value} : $attr->{value};
+	   last;
 	}
     }
 
-    if (!$attrid) {
-	$attrid = $vendormap->{default};
-    }
-
-    foreach my $attr (@$attributes) {
-	next if $attr->{id} != $attrid;
-	$wearout = $attr->{value};
-	last;
-    }
-
     return $wearout;
 }
 
@@ -559,7 +557,7 @@ sub get_disks {
 
 		if (is_ssdlike($type)) {
 		    # if we have an ssd we try to get the wearout indicator
-		    my $wearval = get_wear_leveling_info($smartdata, $data->{model} || $sysdata->{model});
+		    my $wearval = get_wear_leveling_info($smartdata);
 		    $wearout = $wearval if defined($wearval);
 		}
 	    };
-- 
2.25.1





* Re: [pve-devel] [PATCH v2 storage] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup
  2020-10-24 19:27     ` [pve-devel] [PATCH v2 storage] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup Jan-Jonas Sämann
@ 2020-10-27  8:08       ` Thomas Lamprecht
  2020-10-27 19:06         ` Jan-Jonas Sämann
  2020-10-29 18:21       ` Thomas Lamprecht
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Lamprecht @ 2020-10-27  8:08 UTC (permalink / raw)
  To: Proxmox VE development discussion, Jan-Jonas Sämann

On 24.10.20 21:27, Jan-Jonas Sämann wrote:
> This replaces the locally maintained hardware map in
> get_wear_leveling_info() with commonly used register names from
> smartmontools. smartmontools maintains a labeled register database that
> covers a majority of drives (including versions). The current lookup
> produces false estimates; this approach hopefully provides more reliable
> data.
> 
> Signed-off-by: Jan-Jonas Sämann <sprinterfreak@binary-kitchen.de>
> ---
>  PVE/Diskmanage.pm | 58 +++++++++++++++++++++++------------------------
>  1 file changed, 28 insertions(+), 30 deletions(-)
> 

seems like a nicer approach in general, could you please point to some code/reference
from where you took those register names below?

> diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm
> index 79aafcc..20dbeeb 100644
> --- a/PVE/Diskmanage.pm
> +++ b/PVE/Diskmanage.pm
> @@ -396,7 +396,7 @@ sub get_sysdir_info {
>  }
>  
>  sub get_wear_leveling_info {
> -    my ($smartdata, $model) = @_;
> +    my ($smartdata) = @_;
>      my $attributes = $smartdata->{attributes};
>  
>      if (defined($smartdata->{wearout})) {
> @@ -405,37 +405,35 @@ sub get_wear_leveling_info {
>  
>      my $wearout;
>  
> -    my $vendormap = {
> -	'kingston' => 231,
> -	'samsung' => 177,
> -	'intel' => 233,
> -	'sandisk' => 233,
> -	'crucial' => 202,
> -	'default' => 233,
> -    };
> -
> -    # find target attr id
> -
> -    my $attrid;
> -
> -    foreach my $vendor (keys %$vendormap) {
> -	if ($model =~ m/$vendor/i) {
> -	    $attrid = $vendormap->{$vendor};
> -	    # found the attribute
> -	    last;
> +    # Common register names that represent percentage values of potential
> +    # failure indicators used in drivedb.h of smartmontool's. Order matters,
> +    # as some drives may have multiple definitions
> +    my @wearoutregisters = (
> +	"SSD_Life_Left",
> +	"Wear_Leveling_Count",
> +	"Perc_Write\/Erase_Ct_BC",
> +	"Perc_Rated_Life_Remain",
> +	"Remaining_Lifetime_Perc",
> +	"Percent_Lifetime_Remain",
> +	"Lifetime_Left",
> +	"PCT_Life_Remaining",
> +	"Lifetime_Remaining",
> +	"Percent_Life_Remaining",
> +	"Percent_Lifetime_Used",
> +	"Perc_Rated_Life_Used"
> +    );
> +
> +    # Search for S.M.A.R.T. attributes for known register
> +    foreach my $register (@wearoutregisters) {
> +	last if defined $wearout;
> +	foreach my $attr (@$attributes) {
> +	   next if $attr->{name} !~ m/$register/;
> +	   # Store wearout value, invert value if register matches "Used"
> +	   $wearout = ($attr->{name} =~ /Used/) ? 100 - $attr->{value} : $attr->{value};
> +	   last;
>  	}
>      }
>  
> -    if (!$attrid) {
> -	$attrid = $vendormap->{default};
> -    }
> -
> -    foreach my $attr (@$attributes) {
> -	next if $attr->{id} != $attrid;
> -	$wearout = $attr->{value};
> -	last;
> -    }
> -
>      return $wearout;
>  }
>  
> @@ -559,7 +557,7 @@ sub get_disks {
>  
>  		if (is_ssdlike($type)) {
>  		    # if we have an ssd we try to get the wearout indicator
> -		    my $wearval = get_wear_leveling_info($smartdata, $data->{model} || $sysdata->{model});
> +		    my $wearval = get_wear_leveling_info($smartdata);
>  		    $wearout = $wearval if defined($wearval);
>  		}
>  	    };
> 







* Re: [pve-devel] [PATCH v2 storage] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup
  2020-10-27  8:08       ` Thomas Lamprecht
@ 2020-10-27 19:06         ` Jan-Jonas Sämann
  0 siblings, 0 replies; 15+ messages in thread
From: Jan-Jonas Sämann @ 2020-10-27 19:06 UTC (permalink / raw)
  To: Proxmox VE development discussion; +Cc: Thomas Lamprecht



On 27.10.20 09:08, Thomas Lamprecht wrote:
> On 24.10.20 21:27, Jan-Jonas Sämann wrote:
>> This replaces the locally maintained hardware map in
>> get_wear_leveling_info() with commonly used register names from
>> smartmontools. smartmontools maintains a labeled register database that
>> covers a majority of drives (including versions). The current lookup
>> produces false estimates; this approach hopefully provides more reliable
>> data.
>>
>> Signed-off-by: Jan-Jonas Sämann <sprinterfreak@binary-kitchen.de>
>> ---
>>  PVE/Diskmanage.pm | 58 +++++++++++++++++++++++------------------------
>>  1 file changed, 28 insertions(+), 30 deletions(-)
>>
> 
> seems like a nicer approach in general, could you please point to some code/reference
> from where you took those register names below?

It's directly from the sources of smartmontools:
 https://www.smartmontools.org/browser/trunk/smartmontools/drivedb.h

Most of them correspond to the old values from the vendormap. There are a couple of exceptions
where some vendors/drives do not have a direct equivalent. There I picked the closest possible
match. It's still a mess. It would be really cool if smartmontools at least provided a common fixed
attribute for each drive. So maybe I missed some, hard to say.
The register names ensure we don't accidentally interpret unrelated data in some weird circumstances.
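
As a rough sketch of that last point (not the patch itself): with a name list
and no default register, a drive without any known name yields undef instead
of a guess. The function name wearout_from_names, the shortened name list and
both sample drives are assumptions for illustration, and the inversion for
"Used" names follows the v2 patch.

```perl
use strict;
use warnings;

# Subset of the names from the v2 patch, in priority order.
my @wearout_names = qw(SSD_Life_Left Wear_Leveling_Count Percent_Lifetime_Used);

sub wearout_from_names {
    my ($attributes) = @_;
    # Index the attributes by name so every lookup is an exact match.
    my %by_name = map { $_->{name} => $_ } @$attributes;
    for my $name (@wearout_names) {
        my $attr = $by_name{$name} or next;
        # As in v2: invert values whose name says "Used".
        return $name =~ /Used/ ? 100 - $attr->{value} : $attr->{value};
    }
    return undef; # no known name: report nothing rather than guess
}

# Hypothetical drives with made-up values.
my $drive_a = [
    { name => 'Temperature_Celsius', value => 27 },
    { name => 'SandForce_Internal',  value => 0 },
];
my $drive_b = [ { name => 'Percent_Lifetime_Used', value => 3 } ];

print "drive A: ", wearout_from_names($drive_a) // 'undef', "\n";
print "drive B: ", wearout_from_names($drive_b) // 'undef', "\n";
```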





* Re: [pve-devel] [PATCH v2 storage] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup
  2020-10-24 19:27     ` [pve-devel] [PATCH v2 storage] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup Jan-Jonas Sämann
  2020-10-27  8:08       ` Thomas Lamprecht
@ 2020-10-29 18:21       ` Thomas Lamprecht
  2020-10-30  3:31         ` [pve-devel] Updated patch and test data Jan-Jonas Sämann
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Lamprecht @ 2020-10-29 18:21 UTC (permalink / raw)
  To: Proxmox VE development discussion, Jan-Jonas Sämann

On 24.10.20 21:27, Jan-Jonas Sämann wrote:
> This replaces the locally maintained hardware map in
> get_wear_leveling_info() with commonly used register names from
> smartmontools. smartmontools maintains a labeled register database that
> covers a majority of drives (including versions). The current lookup
> produces false estimates; this approach hopefully provides more reliable
> data.
> 
> Signed-off-by: Jan-Jonas Sämann <sprinterfreak@binary-kitchen.de>
> ---
>  PVE/Diskmanage.pm | 58 +++++++++++++++++++++++------------------------
>  1 file changed, 28 insertions(+), 30 deletions(-)

so, I was about to apply this, but it fails the disk tests now. Could you please
adapt them so that the new checks are covered?

The respective test can be run standalone with:

# cd test
# ./run_disk_tests.pl






* [pve-devel] Updated patch and test data
  2020-10-29 18:21       ` Thomas Lamprecht
@ 2020-10-30  3:31         ` Jan-Jonas Sämann
  2020-10-30  3:31           ` [pve-devel] [PATCH storage v3 1/2] Update disk_tests/ssd_smart/sde data Jan-Jonas Sämann
  2020-10-30  3:31           ` [pve-devel] [PATCH storage v3 " Jan-Jonas Sämann
  0 siblings, 2 replies; 15+ messages in thread
From: Jan-Jonas Sämann @ 2020-10-30  3:31 UTC (permalink / raw)
  To: pve-devel

Two things:
* The test environment for sde had outdated SMART data
* Added the attribute name "Media_Wearout_Indicator"

Fortunately, I had the exact same drive model at hand as Dominik had
for the original sde data.

All tests successful now. At least for me. :)
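
For readers unfamiliar with the fixtures, the following sketch shows how one
row of the updated sde_smart file relates to its entry in
sde_smart_expected.json. The parser parse_attribute_line is a hypothetical
stand-in and may differ from what PVE::Diskmanage and run_disk_tests.pl
actually do; the row and the JSON entry are copied from the patch in this
thread.

```perl
use strict;
use warnings;
use JSON::PP;
use Test::More tests => 1;

# One row copied from the updated sde_smart file.
my $line = '231 SSD_Life_Left           ------   091   091   011    -    4294967296';

# The matching entry from sde_smart_expected.json.
my $expected = decode_json(<<'JSON');
{ "id": "231", "fail": "-", "worst": 91, "value": 91,
  "name": "SSD_Life_Left", "threshold": 11,
  "flags": "------", "raw": "4294967296" }
JSON

# Hypothetical parser for one attribute row; returns undef on no match.
sub parse_attribute_line {
    my ($row) = @_;
    return undef
        if $row !~ m/^\s*(\d+)\s+(\S+)\s+(\S{6})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/;
    return {
        id => $1, name => $2, flags => $3,
        value => int($4), worst => int($5), threshold => int($6),
        fail => $7, raw => $8,
    };
}

is_deeply(parse_attribute_line($line), $expected,
    'parsed row matches the expected JSON entry');
```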






* [pve-devel] [PATCH storage v3 1/2] Update disk_tests/ssd_smart/sde data
  2020-10-30  3:31         ` [pve-devel] Updated patch and test data Jan-Jonas Sämann
@ 2020-10-30  3:31           ` Jan-Jonas Sämann
  2020-10-30  3:57             ` [pve-devel] Commit fixup Jan-Jonas Sämann
  2020-10-30  3:31           ` [pve-devel] [PATCH storage v3 " Jan-Jonas Sämann
  1 sibling, 1 reply; 15+ messages in thread
From: Jan-Jonas Sämann @ 2020-10-30  3:31 UTC (permalink / raw)
  To: pve-devel; +Cc: Jan-Jonas Sämann

From: Jan-Jonas Sämann <sprintefreak@binary-kitchen.de>

Provides recent test data for disk_tests/ssd_smart/sde_smart. The
previous data was created using an older smartmontools version, which
did not yet support the drive and therefore had bogus attribute mapping.

Signed-off-by: Jan-Jonas Sämann <sprintefreak@binary-kitchen.de>
---
 test/disk_tests/ssd_smart/sde_smart           |  33 +-
 .../ssd_smart/sde_smart_expected.json         | 286 +++++++++++++++---
 2 files changed, 270 insertions(+), 49 deletions(-)

diff --git a/test/disk_tests/ssd_smart/sde_smart b/test/disk_tests/ssd_smart/sde_smart
index 147790b..f6f01d6 100644
--- a/test/disk_tests/ssd_smart/sde_smart
+++ b/test/disk_tests/ssd_smart/sde_smart
@@ -1,5 +1,5 @@
-smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.4.19-1-pve] (local build)
-Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
+smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.65-1-pve] (local build)
+Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
 
 === START OF READ SMART DATA SECTION ===
 SMART overall-health self-assessment test result: PASSED
@@ -7,13 +7,34 @@ SMART overall-health self-assessment test result: PASSED
 SMART Attributes Data Structure revision number: 10
 Vendor Specific SMART Attributes with Thresholds:
 ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
-177 Wear_Leveling_Count     ------   000   000   000    -    0
-230 Unknown_SSD_Attribute   PO--C-   100   100   000    -    100
-231 Temperature_Celsius     ------   091   091   011    -    4294967296
-233 Media_Wearout_Indicator -O--CK   000   000   000    -    43362
+  1 Raw_Read_Error_Rate     -O--CK   120   120   050    -    0/0
+  5 Retired_Block_Count     PO--CK   100   100   003    -    0
+  9 Power_On_Hours_and_Msec -O--CK   091   091   000    -    7963h+07m+54.620s
+ 12 Power_Cycle_Count       -O--CK   099   099   000    -    1153
+171 Program_Fail_Count      -O-R--   100   100   000    -    0
+172 Erase_Fail_Count        -O--CK   100   100   000    -    0
+174 Unexpect_Power_Loss_Ct  ----CK   000   000   000    -    113
+177 Wear_Range_Delta        ------   000   000   000    -    1
+181 Program_Fail_Count      -O-R--   100   100   000    -    0
+182 Erase_Fail_Count        -O--CK   100   100   000    -    0
+187 Reported_Uncorrect      -O--C-   100   100   000    -    0
+189 Airflow_Temperature_Cel ------   027   049   000    -    27 (Min/Max 2/49)
+194 Temperature_Celsius     -O---K   027   049   000    -    27 (Min/Max 2/49)
+195 ECC_Uncorr_Error_Count  --SRC-   120   120   000    -    0/0
+196 Reallocated_Event_Count PO--CK   100   100   003    -    0
+201 Unc_Soft_Read_Err_Rate  --SRC-   120   120   000    -    0/0
+204 Soft_ECC_Correct_Rate   --SRC-   120   120   000    -    0/0
+230 Life_Curve_Status       PO--C-   100   100   000    -    100
+231 SSD_Life_Left           ------   091   091   011    -    4294967296
+233 SandForce_Internal      -O--CK   000   000   000    -    6317
+234 SandForce_Internal      -O--CK   000   000   000    -    4252
+241 Lifetime_Writes_GiB     -O--CK   000   000   000    -    4252
+242 Lifetime_Reads_GiB      -O--CK   000   000   000    -    34599
+244 Unknown_Attribute       ------   099   099   010    -    4063273
                             ||||||_ K auto-keep
                             |||||__ C event count
                             ||||___ R error rate
                             |||____ S speed/performance
                             ||_____ O updated online
                             |______ P prefailure warning
+
diff --git a/test/disk_tests/ssd_smart/sde_smart_expected.json b/test/disk_tests/ssd_smart/sde_smart_expected.json
index f4e4bdf..1d45c1d 100644
--- a/test/disk_tests/ssd_smart/sde_smart_expected.json
+++ b/test/disk_tests/ssd_smart/sde_smart_expected.json
@@ -1,46 +1,246 @@
 {
-    "attributes" : [
-	{
-	    "worst" : 0,
-	    "threshold" : 0,
-	    "name" : "Wear_Leveling_Count",
-	    "value" : 0,
-	    "id" : "177",
-	    "raw" : "0",
-	    "flags" : "------",
-	    "fail" : "-"
-	},
-	{
-	    "worst" : 100,
-	    "name" : "Unknown_SSD_Attribute",
-	    "threshold" : 0,
-	    "id" : "230",
-	    "fail" : "-",
-	    "flags" : "PO--C-",
-	    "raw" : "100",
-	    "value" : 100
-	},
-	{
-	    "worst" : 91,
-	    "threshold" : 11,
-	    "name" : "Temperature_Celsius",
-	    "id" : "231",
-	    "flags" : "------",
-	    "raw" : "4294967296",
-	    "fail" : "-",
-	    "value" : 91
-	},
-	{
-	    "worst" : 0,
-	    "threshold" : 0,
-	    "name" : "Media_Wearout_Indicator",
-	    "id" : "233",
-	    "flags" : "-O--CK",
-	    "raw" : "43362",
-	    "fail" : "-",
-	    "value" : 0
+    "health": "PASSED",
+    "type": "ata",
+    "attributes": [
+	{
+	    "fail": "-",
+	    "id": "  1",
+	    "raw": "0/0",
+	    "flags": "-O--CK",
+	    "name": "Raw_Read_Error_Rate",
+	    "threshold": 50,
+	    "value": 120,
+	    "worst": 120
+	},
+	{
+	    "id": "  5",
+	    "fail": "-",
+	    "value": 100,
+	    "worst": 100,
+	    "threshold": 3,
+	    "name": "Retired_Block_Count",
+	    "flags": "PO--CK",
+	    "raw": "0"
+	},
+	{
+	    "fail": "-",
+	    "id": "  9",
+	    "raw": "7963h+07m+54.620s",
+	    "flags": "-O--CK",
+	    "worst": 91,
+	    "value": 91,
+	    "name": "Power_On_Hours_and_Msec",
+	    "threshold": 0
+	},
+	{
+	    "id": " 12",
+	    "fail": "-",
+	    "threshold": 0,
+	    "name": "Power_Cycle_Count",
+	    "worst": 99,
+	    "value": 99,
+	    "flags": "-O--CK",
+	    "raw": "1153"
+	},
+	{
+	    "flags": "-O-R--",
+	    "raw": "0",
+	    "worst": 100,
+	    "value": 100,
+	    "threshold": 0,
+	    "name": "Program_Fail_Count",
+	    "fail": "-",
+	    "id": "171"
+	},
+	{
+	    "fail": "-",
+	    "id": "172",
+	    "flags": "-O--CK",
+	    "raw": "0",
+	    "name": "Erase_Fail_Count",
+	    "threshold": 0,
+	    "worst": 100,
+	    "value": 100
+	},
+	{
+	    "fail": "-",
+	    "id": "174",
+	    "raw": "113",
+	    "flags": "----CK",
+	    "value": 0,
+	    "worst": 0,
+	    "threshold": 0,
+	    "name": "Unexpect_Power_Loss_Ct"
+	},
+	{
+	    "id": "177",
+	    "fail": "-",
+	    "value": 0,
+	    "worst": 0,
+	    "name": "Wear_Range_Delta",
+	    "threshold": 0,
+	    "flags": "------",
+	    "raw": "1"
+	},
+	{
+	    "flags": "-O-R--",
+	    "raw": "0",
+	    "threshold": 0,
+	    "name": "Program_Fail_Count",
+	    "worst": 100,
+	    "value": 100,
+	    "fail": "-",
+	    "id": "181"
+	},
+	{
+	    "threshold": 0,
+	    "name": "Erase_Fail_Count",
+	    "value": 100,
+	    "worst": 100,
+	    "flags": "-O--CK",
+	    "raw": "0",
+	    "id": "182",
+	    "fail": "-"
+	},
+	{
+	    "flags": "-O--C-",
+	    "raw": "0",
+	    "value": 100,
+	    "worst": 100,
+	    "threshold": 0,
+	    "name": "Reported_Uncorrect",
+	    "fail": "-",
+	    "id": "187"
+	},
+	{
+	    "value": 27,
+	    "worst": 49,
+	    "name": "Airflow_Temperature_Cel",
+	    "threshold": 0,
+	    "flags": "------",
+	    "raw": "27 (Min/Max 2/49)",
+	    "id": "189",
+	    "fail": "-"
+	},
+	{
+	    "threshold": 0,
+	    "name": "Temperature_Celsius",
+	    "worst": 49,
+	    "value": 27,
+	    "flags": "-O---K",
+	    "raw": "27 (Min/Max 2/49)",
+	    "id": "194",
+	    "fail": "-"
+	},
+	{
+	    "id": "195",
+	    "fail": "-",
+	    "worst": 120,
+	    "value": 120,
+	    "threshold": 0,
+	    "name": "ECC_Uncorr_Error_Count",
+	    "raw": "0/0",
+	    "flags": "--SRC-"
+	},
+	{
+	    "fail": "-",
+	    "id": "196",
+	    "raw": "0",
+	    "flags": "PO--CK",
+	    "threshold": 3,
+	    "name": "Reallocated_Event_Count",
+	    "value": 100,
+	    "worst": 100
+	},
+	{
+	    "value": 120,
+	    "worst": 120,
+	    "threshold": 0,
+	    "name": "Unc_Soft_Read_Err_Rate",
+	    "flags": "--SRC-",
+	    "raw": "0/0",
+	    "id": "201",
+	    "fail": "-"
+	},
+	{
+	    "raw": "0/0",
+	    "flags": "--SRC-",
+	    "value": 120,
+	    "worst": 120,
+	    "threshold": 0,
+	    "name": "Soft_ECC_Correct_Rate",
+	    "fail": "-",
+	    "id": "204"
+	},
+	{
+	    "value": 100,
+	    "worst": 100,
+	    "threshold": 0,
+	    "name": "Life_Curve_Status",
+	    "raw": "100",
+	    "flags": "PO--C-",
+	    "id": "230",
+	    "fail": "-"
+	},
+	{
+	    "id": "231",
+	    "fail": "-",
+	    "worst": 91,
+	    "value": 91,
+	    "name": "SSD_Life_Left",
+	    "threshold": 11,
+	    "flags": "------",
+	    "raw": "4294967296"
+	},
+	{
+	    "raw": "6317",
+	    "flags": "-O--CK",
+	    "name": "SandForce_Internal",
+	    "threshold": 0,
+	    "value": 0,
+	    "worst": 0,
+	    "fail": "-",
+	    "id": "233"
+	},
+	{
+	    "value": 0,
+	    "worst": 0,
+	    "name": "SandForce_Internal",
+	    "threshold": 0,
+	    "flags": "-O--CK",
+	    "raw": "4252",
+	    "id": "234",
+	    "fail": "-"
+	},
+	{
+	    "worst": 0,
+	    "value": 0,
+	    "name": "Lifetime_Writes_GiB",
+	    "threshold": 0,
+	    "flags": "-O--CK",
+	    "raw": "4252",
+	    "id": "241",
+	    "fail": "-"
+	},
+	{
+	    "flags": "-O--CK",
+	    "raw": "34599",
+	    "value": 0,
+	    "worst": 0,
+	    "threshold": 0,
+	    "name": "Lifetime_Reads_GiB",
+	    "fail": "-",
+	    "id": "242"
+	},
+	{
+	    "threshold": 10,
+	    "name": "Unknown_Attribute",
+	    "worst": 99,
+	    "value": 99,
+	    "flags": "------",
+	    "raw": "4063273",
+	    "id": "244",
+	    "fail": "-"
 	}
-    ],
-    "type" : "ata",
-    "health" : "PASSED"
+    ]
 }
-- 
2.25.1





* [pve-devel] [PATCH storage v3 2/2] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup
  2020-10-30  3:31         ` [pve-devel] Updated patch and test data Jan-Jonas Sämann
  2020-10-30  3:31           ` [pve-devel] [PATCH storage v3 1/2] Update disk_tests/ssd_smart/sde data Jan-Jonas Sämann
@ 2020-10-30  3:31           ` Jan-Jonas Sämann
  1 sibling, 0 replies; 15+ messages in thread
From: Jan-Jonas Sämann @ 2020-10-30  3:31 UTC (permalink / raw)
  To: pve-devel

This replaces the locally maintained hardware map in
get_wear_leveling_info() with commonly used register names from
smartmontools. smartmontools maintains a labeled register database that
covers a majority of drives (including versions). The current lookup
produces false estimates; this approach hopefully provides more reliable
data.

Signed-off-by: Jan-Jonas Sämann <sprinterfreak@binary-kitchen.de>
---
 PVE/Diskmanage.pm | 58 +++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm
index 79aafcc..2552add 100644
--- a/PVE/Diskmanage.pm
+++ b/PVE/Diskmanage.pm
@@ -396,7 +396,7 @@ sub get_sysdir_info {
 }
 
 sub get_wear_leveling_info {
-    my ($smartdata, $model) = @_;
+    my ($smartdata) = @_;
     my $attributes = $smartdata->{attributes};
 
     if (defined($smartdata->{wearout})) {
@@ -405,37 +405,35 @@ sub get_wear_leveling_info {
 
     my $wearout;
 
-    my $vendormap = {
-	'kingston' => 231,
-	'samsung' => 177,
-	'intel' => 233,
-	'sandisk' => 233,
-	'crucial' => 202,
-	'default' => 233,
-    };
-
-    # find target attr id
-
-    my $attrid;
-
-    foreach my $vendor (keys %$vendormap) {
-	if ($model =~ m/$vendor/i) {
-	    $attrid = $vendormap->{$vendor};
-	    # found the attribute
-	    last;
+    # Common register names that represent percentage values of potential
+    # failure indicators used in drivedb.h of smartmontool's. Order matters,
+    # as some drives may have multiple definitions
+    my @wearoutregisters = (
+	"Media_Wearout_Indicator",
+	"SSD_Life_Left",
+	"Wear_Leveling_Count",
+	"Perc_Write\/Erase_Ct_BC",
+	"Perc_Rated_Life_Remain",
+	"Remaining_Lifetime_Perc",
+	"Percent_Lifetime_Remain",
+	"Lifetime_Left",
+	"PCT_Life_Remaining",
+	"Lifetime_Remaining",
+	"Percent_Life_Remaining",
+	"Percent_Lifetime_Used",
+	"Perc_Rated_Life_Used"
+    );
+
+    # Search for S.M.A.R.T. attributes for known register
+    foreach my $register (@wearoutregisters) {
+	last if defined $wearout;
+	foreach my $attr (@$attributes) {
+	   next if $attr->{name} !~ m/$register/;
+	   $wearout = $attr->{value};
+	   last;
 	}
     }
 
-    if (!$attrid) {
-	$attrid = $vendormap->{default};
-    }
-
-    foreach my $attr (@$attributes) {
-	next if $attr->{id} != $attrid;
-	$wearout = $attr->{value};
-	last;
-    }
-
     return $wearout;
 }
 
@@ -559,7 +557,7 @@ sub get_disks {
 
 		if (is_ssdlike($type)) {
 		    # if we have an ssd we try to get the wearout indicator
-		    my $wearval = get_wear_leveling_info($smartdata, $data->{model} || $sysdata->{model});
+		    my $wearval = get_wear_leveling_info($smartdata);
 		    $wearout = $wearval if defined($wearval);
 		}
 	    };
-- 
2.25.1





* [pve-devel] Commit fixup
  2020-10-30  3:31           ` [pve-devel] [PATCH storage v3 1/2] Update disk_tests/ssd_smart/sde data Jan-Jonas Sämann
@ 2020-10-30  3:57             ` Jan-Jonas Sämann
  2020-10-30  3:57               ` [pve-devel] [PATCH storage v4 1/2] Update disk_tests/ssd_smart/sde data Jan-Jonas Sämann
  2020-10-30  3:57               ` [pve-devel] [PATCH storage v4 2/2] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup Jan-Jonas Sämann
  0 siblings, 2 replies; 15+ messages in thread
From: Jan-Jonas Sämann @ 2020-10-30  3:57 UTC (permalink / raw)
  To: pve-devel

v3 had a typo in one email address. Sorry.






* [pve-devel] [PATCH storage v4 1/2] Update disk_tests/ssd_smart/sde data
  2020-10-30  3:57             ` [pve-devel] Commit fixup Jan-Jonas Sämann
@ 2020-10-30  3:57               ` Jan-Jonas Sämann
  2020-10-30 14:32                 ` [pve-devel] applied: " Thomas Lamprecht
  2020-10-30  3:57               ` [pve-devel] [PATCH storage v4 2/2] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup Jan-Jonas Sämann
  1 sibling, 1 reply; 15+ messages in thread
From: Jan-Jonas Sämann @ 2020-10-30  3:57 UTC (permalink / raw)
  To: pve-devel

Provide up-to-date test data for disk_tests/ssd_smart/sde_smart. The
previous data was created with an older smartmontools version, which
did not yet support the drive and therefore had a bogus attribute mapping.

Signed-off-by: Jan-Jonas Sämann <sprinterfreak@binary-kitchen.de>
---
 test/disk_tests/ssd_smart/sde_smart           |  33 +-
 .../ssd_smart/sde_smart_expected.json         | 286 +++++++++++++++---
 2 files changed, 270 insertions(+), 49 deletions(-)

diff --git a/test/disk_tests/ssd_smart/sde_smart b/test/disk_tests/ssd_smart/sde_smart
index 147790b..f6f01d6 100644
--- a/test/disk_tests/ssd_smart/sde_smart
+++ b/test/disk_tests/ssd_smart/sde_smart
@@ -1,5 +1,5 @@
-smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.4.19-1-pve] (local build)
-Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
+smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.65-1-pve] (local build)
+Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
 
 === START OF READ SMART DATA SECTION ===
 SMART overall-health self-assessment test result: PASSED
@@ -7,13 +7,34 @@ SMART overall-health self-assessment test result: PASSED
 SMART Attributes Data Structure revision number: 10
 Vendor Specific SMART Attributes with Thresholds:
 ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
-177 Wear_Leveling_Count     ------   000   000   000    -    0
-230 Unknown_SSD_Attribute   PO--C-   100   100   000    -    100
-231 Temperature_Celsius     ------   091   091   011    -    4294967296
-233 Media_Wearout_Indicator -O--CK   000   000   000    -    43362
+  1 Raw_Read_Error_Rate     -O--CK   120   120   050    -    0/0
+  5 Retired_Block_Count     PO--CK   100   100   003    -    0
+  9 Power_On_Hours_and_Msec -O--CK   091   091   000    -    7963h+07m+54.620s
+ 12 Power_Cycle_Count       -O--CK   099   099   000    -    1153
+171 Program_Fail_Count      -O-R--   100   100   000    -    0
+172 Erase_Fail_Count        -O--CK   100   100   000    -    0
+174 Unexpect_Power_Loss_Ct  ----CK   000   000   000    -    113
+177 Wear_Range_Delta        ------   000   000   000    -    1
+181 Program_Fail_Count      -O-R--   100   100   000    -    0
+182 Erase_Fail_Count        -O--CK   100   100   000    -    0
+187 Reported_Uncorrect      -O--C-   100   100   000    -    0
+189 Airflow_Temperature_Cel ------   027   049   000    -    27 (Min/Max 2/49)
+194 Temperature_Celsius     -O---K   027   049   000    -    27 (Min/Max 2/49)
+195 ECC_Uncorr_Error_Count  --SRC-   120   120   000    -    0/0
+196 Reallocated_Event_Count PO--CK   100   100   003    -    0
+201 Unc_Soft_Read_Err_Rate  --SRC-   120   120   000    -    0/0
+204 Soft_ECC_Correct_Rate   --SRC-   120   120   000    -    0/0
+230 Life_Curve_Status       PO--C-   100   100   000    -    100
+231 SSD_Life_Left           ------   091   091   011    -    4294967296
+233 SandForce_Internal      -O--CK   000   000   000    -    6317
+234 SandForce_Internal      -O--CK   000   000   000    -    4252
+241 Lifetime_Writes_GiB     -O--CK   000   000   000    -    4252
+242 Lifetime_Reads_GiB      -O--CK   000   000   000    -    34599
+244 Unknown_Attribute       ------   099   099   010    -    4063273
                             ||||||_ K auto-keep
                             |||||__ C event count
                             ||||___ R error rate
                             |||____ S speed/performance
                             ||_____ O updated online
                             |______ P prefailure warning
+
diff --git a/test/disk_tests/ssd_smart/sde_smart_expected.json b/test/disk_tests/ssd_smart/sde_smart_expected.json
index f4e4bdf..1d45c1d 100644
--- a/test/disk_tests/ssd_smart/sde_smart_expected.json
+++ b/test/disk_tests/ssd_smart/sde_smart_expected.json
@@ -1,46 +1,246 @@
 {
-    "attributes" : [
-	{
-	    "worst" : 0,
-	    "threshold" : 0,
-	    "name" : "Wear_Leveling_Count",
-	    "value" : 0,
-	    "id" : "177",
-	    "raw" : "0",
-	    "flags" : "------",
-	    "fail" : "-"
-	},
-	{
-	    "worst" : 100,
-	    "name" : "Unknown_SSD_Attribute",
-	    "threshold" : 0,
-	    "id" : "230",
-	    "fail" : "-",
-	    "flags" : "PO--C-",
-	    "raw" : "100",
-	    "value" : 100
-	},
-	{
-	    "worst" : 91,
-	    "threshold" : 11,
-	    "name" : "Temperature_Celsius",
-	    "id" : "231",
-	    "flags" : "------",
-	    "raw" : "4294967296",
-	    "fail" : "-",
-	    "value" : 91
-	},
-	{
-	    "worst" : 0,
-	    "threshold" : 0,
-	    "name" : "Media_Wearout_Indicator",
-	    "id" : "233",
-	    "flags" : "-O--CK",
-	    "raw" : "43362",
-	    "fail" : "-",
-	    "value" : 0
+    "health": "PASSED",
+    "type": "ata",
+    "attributes": [
+	{
+	    "fail": "-",
+	    "id": "  1",
+	    "raw": "0/0",
+	    "flags": "-O--CK",
+	    "name": "Raw_Read_Error_Rate",
+	    "threshold": 50,
+	    "value": 120,
+	    "worst": 120
+	},
+	{
+	    "id": "  5",
+	    "fail": "-",
+	    "value": 100,
+	    "worst": 100,
+	    "threshold": 3,
+	    "name": "Retired_Block_Count",
+	    "flags": "PO--CK",
+	    "raw": "0"
+	},
+	{
+	    "fail": "-",
+	    "id": "  9",
+	    "raw": "7963h+07m+54.620s",
+	    "flags": "-O--CK",
+	    "worst": 91,
+	    "value": 91,
+	    "name": "Power_On_Hours_and_Msec",
+	    "threshold": 0
+	},
+	{
+	    "id": " 12",
+	    "fail": "-",
+	    "threshold": 0,
+	    "name": "Power_Cycle_Count",
+	    "worst": 99,
+	    "value": 99,
+	    "flags": "-O--CK",
+	    "raw": "1153"
+	},
+	{
+	    "flags": "-O-R--",
+	    "raw": "0",
+	    "worst": 100,
+	    "value": 100,
+	    "threshold": 0,
+	    "name": "Program_Fail_Count",
+	    "fail": "-",
+	    "id": "171"
+	},
+	{
+	    "fail": "-",
+	    "id": "172",
+	    "flags": "-O--CK",
+	    "raw": "0",
+	    "name": "Erase_Fail_Count",
+	    "threshold": 0,
+	    "worst": 100,
+	    "value": 100
+	},
+	{
+	    "fail": "-",
+	    "id": "174",
+	    "raw": "113",
+	    "flags": "----CK",
+	    "value": 0,
+	    "worst": 0,
+	    "threshold": 0,
+	    "name": "Unexpect_Power_Loss_Ct"
+	},
+	{
+	    "id": "177",
+	    "fail": "-",
+	    "value": 0,
+	    "worst": 0,
+	    "name": "Wear_Range_Delta",
+	    "threshold": 0,
+	    "flags": "------",
+	    "raw": "1"
+	},
+	{
+	    "flags": "-O-R--",
+	    "raw": "0",
+	    "threshold": 0,
+	    "name": "Program_Fail_Count",
+	    "worst": 100,
+	    "value": 100,
+	    "fail": "-",
+	    "id": "181"
+	},
+	{
+	    "threshold": 0,
+	    "name": "Erase_Fail_Count",
+	    "value": 100,
+	    "worst": 100,
+	    "flags": "-O--CK",
+	    "raw": "0",
+	    "id": "182",
+	    "fail": "-"
+	},
+	{
+	    "flags": "-O--C-",
+	    "raw": "0",
+	    "value": 100,
+	    "worst": 100,
+	    "threshold": 0,
+	    "name": "Reported_Uncorrect",
+	    "fail": "-",
+	    "id": "187"
+	},
+	{
+	    "value": 27,
+	    "worst": 49,
+	    "name": "Airflow_Temperature_Cel",
+	    "threshold": 0,
+	    "flags": "------",
+	    "raw": "27 (Min/Max 2/49)",
+	    "id": "189",
+	    "fail": "-"
+	},
+	{
+	    "threshold": 0,
+	    "name": "Temperature_Celsius",
+	    "worst": 49,
+	    "value": 27,
+	    "flags": "-O---K",
+	    "raw": "27 (Min/Max 2/49)",
+	    "id": "194",
+	    "fail": "-"
+	},
+	{
+	    "id": "195",
+	    "fail": "-",
+	    "worst": 120,
+	    "value": 120,
+	    "threshold": 0,
+	    "name": "ECC_Uncorr_Error_Count",
+	    "raw": "0/0",
+	    "flags": "--SRC-"
+	},
+	{
+	    "fail": "-",
+	    "id": "196",
+	    "raw": "0",
+	    "flags": "PO--CK",
+	    "threshold": 3,
+	    "name": "Reallocated_Event_Count",
+	    "value": 100,
+	    "worst": 100
+	},
+	{
+	    "value": 120,
+	    "worst": 120,
+	    "threshold": 0,
+	    "name": "Unc_Soft_Read_Err_Rate",
+	    "flags": "--SRC-",
+	    "raw": "0/0",
+	    "id": "201",
+	    "fail": "-"
+	},
+	{
+	    "raw": "0/0",
+	    "flags": "--SRC-",
+	    "value": 120,
+	    "worst": 120,
+	    "threshold": 0,
+	    "name": "Soft_ECC_Correct_Rate",
+	    "fail": "-",
+	    "id": "204"
+	},
+	{
+	    "value": 100,
+	    "worst": 100,
+	    "threshold": 0,
+	    "name": "Life_Curve_Status",
+	    "raw": "100",
+	    "flags": "PO--C-",
+	    "id": "230",
+	    "fail": "-"
+	},
+	{
+	    "id": "231",
+	    "fail": "-",
+	    "worst": 91,
+	    "value": 91,
+	    "name": "SSD_Life_Left",
+	    "threshold": 11,
+	    "flags": "------",
+	    "raw": "4294967296"
+	},
+	{
+	    "raw": "6317",
+	    "flags": "-O--CK",
+	    "name": "SandForce_Internal",
+	    "threshold": 0,
+	    "value": 0,
+	    "worst": 0,
+	    "fail": "-",
+	    "id": "233"
+	},
+	{
+	    "value": 0,
+	    "worst": 0,
+	    "name": "SandForce_Internal",
+	    "threshold": 0,
+	    "flags": "-O--CK",
+	    "raw": "4252",
+	    "id": "234",
+	    "fail": "-"
+	},
+	{
+	    "worst": 0,
+	    "value": 0,
+	    "name": "Lifetime_Writes_GiB",
+	    "threshold": 0,
+	    "flags": "-O--CK",
+	    "raw": "4252",
+	    "id": "241",
+	    "fail": "-"
+	},
+	{
+	    "flags": "-O--CK",
+	    "raw": "34599",
+	    "value": 0,
+	    "worst": 0,
+	    "threshold": 0,
+	    "name": "Lifetime_Reads_GiB",
+	    "fail": "-",
+	    "id": "242"
+	},
+	{
+	    "threshold": 10,
+	    "name": "Unknown_Attribute",
+	    "worst": 99,
+	    "value": 99,
+	    "flags": "------",
+	    "raw": "4063273",
+	    "id": "244",
+	    "fail": "-"
 	}
-    ],
-    "type" : "ata",
-    "health" : "PASSED"
+    ]
 }
-- 
2.25.1
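
For readers comparing the two test files above: each row of the smartctl attribute table corresponds to one object in sde_smart_expected.json. The Perl sketch below shows one way such a row could be split into those fields. It is an illustration under assumptions, not the parser used in PVE/Diskmanage.pm: `parse_attribute_line` is a hypothetical helper, it assumes a numeric THRESH column as in the data shown, and it drops the padding of the ID column, whereas the expected JSON keeps padded ids such as "  1".

```perl
use strict;
use warnings;

# Hypothetical helper: split one row of the smartctl attribute table
# into the fields that appear in sde_smart_expected.json.
sub parse_attribute_line {
    my ($line) = @_;

    # Columns: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    if ($line =~ m/^\s*(\d+)\s+(\S+)\s+(\S{6})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.*?)\s*$/) {
        return {
            id        => $1,
            name      => $2,
            flags     => $3,
            value     => int($4),    # "091" becomes 91
            worst     => int($5),
            threshold => int($6),
            fail      => $7,
            raw       => $8,         # may contain spaces
        };
    }

    # Header, legend and other non-attribute lines do not match.
    return;
}

my $attr = parse_attribute_line(
    "231 SSD_Life_Left           ------   091   091   011    -    4294967296");

printf "%s %s value=%d threshold=%d raw=%s\n",
    $attr->{id}, $attr->{name}, $attr->{value}, $attr->{threshold}, $attr->{raw};
```

Rows whose raw value contains spaces, such as `27 (Min/Max 2/49)`, are kept whole in `raw` because the last capture group takes the rest of the line.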





* [pve-devel] [PATCH storage v4 2/2] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup
  2020-10-30  3:57             ` [pve-devel] Commit fixup Jan-Jonas Sämann
  2020-10-30  3:57               ` [pve-devel] [PATCH storage v4 1/2] Update disk_tests/ssd_smart/sde data Jan-Jonas Sämann
@ 2020-10-30  3:57               ` Jan-Jonas Sämann
  2020-10-30 14:32                 ` [pve-devel] applied: " Thomas Lamprecht
  1 sibling, 1 reply; 15+ messages in thread
From: Jan-Jonas Sämann @ 2020-10-30  3:57 UTC (permalink / raw)
  To: pve-devel

This replaces the locally maintained hardware map in
get_wear_leveling_info() with commonly used register names from
smartmontools. Smartmontools maintains a labeled register database that
covers the majority of drives (including versions). The current lookup
produces false estimates; this approach hopefully provides more reliable
data.

Signed-off-by: Jan-Jonas Sämann <sprinterfreak@binary-kitchen.de>
---
 PVE/Diskmanage.pm | 58 +++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm
index 79aafcc..2552add 100644
--- a/PVE/Diskmanage.pm
+++ b/PVE/Diskmanage.pm
@@ -396,7 +396,7 @@ sub get_sysdir_info {
 }
 
 sub get_wear_leveling_info {
-    my ($smartdata, $model) = @_;
+    my ($smartdata) = @_;
     my $attributes = $smartdata->{attributes};
 
     if (defined($smartdata->{wearout})) {
@@ -405,37 +405,35 @@ sub get_wear_leveling_info {
 
     my $wearout;
 
-    my $vendormap = {
-	'kingston' => 231,
-	'samsung' => 177,
-	'intel' => 233,
-	'sandisk' => 233,
-	'crucial' => 202,
-	'default' => 233,
-    };
-
-    # find target attr id
-
-    my $attrid;
-
-    foreach my $vendor (keys %$vendormap) {
-	if ($model =~ m/$vendor/i) {
-	    $attrid = $vendormap->{$vendor};
-	    # found the attribute
-	    last;
+    # Common register names, as used in smartmontools' drivedb.h, that
+    # represent percentage values of potential failure indicators. Order
+    # matters, as some drives may have multiple definitions.
+    my @wearoutregisters = (
+	"Media_Wearout_Indicator",
+	"SSD_Life_Left",
+	"Wear_Leveling_Count",
+	"Perc_Write\/Erase_Ct_BC",
+	"Perc_Rated_Life_Remain",
+	"Remaining_Lifetime_Perc",
+	"Percent_Lifetime_Remain",
+	"Lifetime_Left",
+	"PCT_Life_Remaining",
+	"Lifetime_Remaining",
+	"Percent_Life_Remaining",
+	"Percent_Lifetime_Used",
+	"Perc_Rated_Life_Used"
+    );
+
+    # Search the S.M.A.R.T. attributes for the first known register
+    foreach my $register (@wearoutregisters) {
+	last if defined $wearout;
+	foreach my $attr (@$attributes) {
+	    next if $attr->{name} !~ m/$register/;
+	    $wearout = $attr->{value};
+	    last;
 	}
     }
 
-    if (!$attrid) {
-	$attrid = $vendormap->{default};
-    }
-
-    foreach my $attr (@$attributes) {
-	next if $attr->{id} != $attrid;
-	$wearout = $attr->{value};
-	last;
-    }
-
     return $wearout;
 }
 
@@ -559,7 +557,7 @@ sub get_disks {
 
 		if (is_ssdlike($type)) {
 		    # if we have an ssd we try to get the wearout indicator
-		    my $wearval = get_wear_leveling_info($smartdata, $data->{model} || $sysdata->{model});
+		    my $wearval = get_wear_leveling_info($smartdata);
 		    $wearout = $wearval if defined($wearval);
 		}
 	    };
-- 
2.25.1
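
To illustrate the effect of the change on the sde test data from patch 1/2, the following Perl sketch runs a name-based lookup and an ID-based lookup over three of its attributes. It is a simplified stand-in and not the code from PVE/Diskmanage.pm: the helper names `lookup_wearout_by_name` and `lookup_wearout_by_id` are hypothetical, the register list is shortened, and the ID-based call assumes the model string matched none of the old vendor patterns, so that the former default ID 233 applies.

```perl
use strict;
use warnings;

# Shortened copy of the register list from the patch.
my @wearoutregisters = (
    "Media_Wearout_Indicator",
    "SSD_Life_Left",
    "Wear_Leveling_Count",
);

# Hypothetical stand-alone variant of the new lookup loop: the first
# register name (in list order) found among the attributes wins.
sub lookup_wearout_by_name {
    my ($attributes) = @_;
    foreach my $register (@wearoutregisters) {
        foreach my $attr (@$attributes) {
            return $attr->{value} if $attr->{name} =~ m/$register/;
        }
    }
    return;
}

# Old behaviour for comparison: pick the attribute by numeric ID.
sub lookup_wearout_by_id {
    my ($attributes, $attrid) = @_;
    foreach my $attr (@$attributes) {
        return $attr->{value} if $attr->{id} == $attrid;
    }
    return;
}

# Three attributes taken from the updated sde_smart test data.
my $sde = [
    { id => 177, name => "Wear_Range_Delta",   value => 0 },
    { id => 231, name => "SSD_Life_Left",      value => 91 },
    { id => 233, name => "SandForce_Internal", value => 0 },
];

print lookup_wearout_by_name($sde), "\n";     # finds SSD_Life_Left
print lookup_wearout_by_id($sde, 233), "\n";  # finds SandForce_Internal
```

With this data the name-based lookup returns the normalized value of SSD_Life_Left (91), while ID 233 points at SandForce_Internal, whose normalized value is 0.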





* [pve-devel] applied: [PATCH storage v4 1/2] Update disk_tests/ssd_smart/sde data
  2020-10-30  3:57               ` [pve-devel] [PATCH storage v4 1/2] Update disk_tests/ssd_smart/sde data Jan-Jonas Sämann
@ 2020-10-30 14:32                 ` Thomas Lamprecht
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Lamprecht @ 2020-10-30 14:32 UTC (permalink / raw)
  To: Proxmox VE development discussion, Jan-Jonas Sämann

On 30.10.20 04:57, Jan-Jonas Sämann wrote:
> Provide up-to-date test data for disk_tests/ssd_smart/sde_smart. The
> previous data was created with an older smartmontools version, which
> did not yet support the drive and therefore had a bogus attribute mapping.
> 
> Signed-off-by: Jan-Jonas Sämann <sprinterfreak@binary-kitchen.de>
> ---
>  test/disk_tests/ssd_smart/sde_smart           |  33 +-
>  .../ssd_smart/sde_smart_expected.json         | 286 +++++++++++++++---
>  2 files changed, 270 insertions(+), 49 deletions(-)
> 
>

applied, thanks!






* [pve-devel] applied: [PATCH storage v4 2/2] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup
  2020-10-30  3:57               ` [pve-devel] [PATCH storage v4 2/2] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup Jan-Jonas Sämann
@ 2020-10-30 14:32                 ` Thomas Lamprecht
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Lamprecht @ 2020-10-30 14:32 UTC (permalink / raw)
  To: Proxmox VE development discussion, Jan-Jonas Sämann

On 30.10.20 04:57, Jan-Jonas Sämann wrote:
> This replaces the locally maintained hardware map in
> get_wear_leveling_info() with commonly used register names from
> smartmontools. Smartmontools maintains a labeled register database that
> covers the majority of drives (including versions). The current lookup
> produces false estimates; this approach hopefully provides more reliable
> data.
> 
> Signed-off-by: Jan-Jonas Sämann <sprinterfreak@binary-kitchen.de>
> ---
>  PVE/Diskmanage.pm | 58 +++++++++++++++++++++++------------------------
>  1 file changed, 28 insertions(+), 30 deletions(-)
> 
>

applied, thanks! 






end of thread, other threads:[~2020-10-30 14:33 UTC | newest]

Thread overview: 15+ messages
2020-10-19 22:53 [pve-devel] [PATCH] disk management: Add support for additional Crucial SSDs Jan-Jonas Sämann
2020-10-22 13:30 ` Dominik Csapak
2020-10-24 19:27   ` [pve-devel] New routine get_wear_leveling_info() Jan-Jonas Sämann
2020-10-24 19:27     ` [pve-devel] [PATCH v2 storage] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup Jan-Jonas Sämann
2020-10-27  8:08       ` Thomas Lamprecht
2020-10-27 19:06         ` Jan-Jonas Sämann
2020-10-29 18:21       ` Thomas Lamprecht
2020-10-30  3:31         ` [pve-devel] Updated patch and test data Jan-Jonas Sämann
2020-10-30  3:31           ` [pve-devel] [PATCH storage v3 1/2] Update disk_tests/ssd_smart/sde data Jan-Jonas Sämann
2020-10-30  3:57             ` [pve-devel] Commit fixup Jan-Jonas Sämann
2020-10-30  3:57               ` [pve-devel] [PATCH storage v4 1/2] Update disk_tests/ssd_smart/sde data Jan-Jonas Sämann
2020-10-30 14:32                 ` [pve-devel] applied: " Thomas Lamprecht
2020-10-30  3:57               ` [pve-devel] [PATCH storage v4 2/2] Diskmanage: Use S.M.A.R.T. attributes for SSDs wearout lookup Jan-Jonas Sämann
2020-10-30 14:32                 ` [pve-devel] applied: " Thomas Lamprecht
2020-10-30  3:31           ` [pve-devel] [PATCH storage v3 " Jan-Jonas Sämann
