* [PVE-User] OfflineUncorrectableSector, and now?!
@ 2022-11-23 11:02 Marco Gaiarin
[not found] ` <mailman.213.1669277536.284.pve-user@lists.proxmox.com>
2022-11-27 15:44 ` Yannick Palanque
0 siblings, 2 replies; 9+ messages in thread
From: Marco Gaiarin @ 2022-11-23 11:02 UTC (permalink / raw)
To: pve-user
On a DELL PowerEdge T440 we got:
Subject: [alerts.veneto] SMART error (OfflineUncorrectableSector) detected on host: pspve2
This message was generated by the smartd daemon running on:
host name: pspve2
DNS domain: ps.lnf.it
The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 8 Offline uncorrectable sectors
Device info:
HFS480G32FEH-BA10A, S/N:ESABN5131I080BA2O, WWN:5-ace42e-025406ebe, FW:DD02, 480 GB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Wed Nov 16 15:00:46 2022 CET
Another message will be sent in 24 hours if the problem persists.
The disk is an SSD, connected to the server's PERC H750 Adapter controller,
which is set to 'NonRAID' mode (some sort of JBOD).
The RAID controller firmware (via iDRAC) says that the disk is fully
operational.
I've tried running a few smartctl short tests, but nothing happened (so no
errors were found, I suppose).
Can I safely ignore this? Is there some way to tell smartctl that '8 Offline
uncorrectable sectors' is fine, and move on?
Thanks.
--
One of the biggest problems of this country is that the majority of
imports come from abroad. (George W. Bush)
* Re: [PVE-User] OfflineUncorrectableSector, and now?!
[not found] ` <mailman.213.1669277536.284.pve-user@lists.proxmox.com>
@ 2022-11-24 9:52 ` Alain Péan
2022-11-25 11:11 ` Marco Gaiarin
0 siblings, 1 reply; 9+ messages in thread
From: Alain Péan @ 2022-11-24 9:52 UTC (permalink / raw)
To: pve-user
On 24/11/2022 at 09:02, Daniel Berteaud via pve-user wrote:
>> Can i safely ignore this? There's some way to say to smartctl that '8 Offline
>> uncorrectable sectors' are good, and go over?
> The SSD is most likely about to die, and I'd change it ASAP
Yes, this is something you can verify in iDRAC: Storage, Physical Disks,
wear leveling.
Alain
--
System/Network Administrator
C2N Centre de Nanosciences et Nanotechnologies (UMR 9001)
Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau
Tel : 01-70-27-06-88 Bureau A255
* Re: [PVE-User] OfflineUncorrectableSector, and now?!
2022-11-24 9:52 ` Alain Péan
@ 2022-11-25 11:11 ` Marco Gaiarin
0 siblings, 0 replies; 9+ messages in thread
From: Marco Gaiarin @ 2022-11-25 11:11 UTC (permalink / raw)
To: Alain Péan; +Cc: pve-user
Hello, Alain Péan!
On that day it was said...
>> The SSD is most likely about to die, and I'd change it ASAP
> Yes, this is something you can verify in IDRAC, storage, physical disks,
> wear leveling.
Exactly; the strange thing is what I see when I look at iDRAC:
Drive Details
  Device Description               Disk 0 in Backplane 1 of RAID Controller in Slot 4
  Operational State                Not Applicable
  Block Size                       512 bytes
  Failure Predicted                No
  Remaining Rated Write Endurance  100%
  Power Status                     On
  Form Factor                      2.5 inch
  Certified                        Yes
  T10 PI Capability                Not Capable
  Controller                       PERC H750 Adapter
  Enclosure                        BP14G+ 0:1
RAID Information
  Progress                         Not Applicable
  Used RAID Disk Space             446.63 GB
  Available RAID Disk Space        0 GB
  Non RAID Disk Cache Policy       Not Applicable
SAS/SATA/PCIe/NVMe Drive Information
  Negotiated Speed                 6 Gbps
  Capable Speed                    6 Gbps
  SAS Address                      3F4EE08014CE2508
Security
  Encryption Capable               Not Capable
  Encryption Protocol              None
  System Erase Capability          CryptographicErasePD
  Cryptographic Erase Capable      Capable
Manufacturing Information
  Part Number                      KR0J1TYJSK5001BA01Y9A02
  Manufacturer                     SKhynix
  Non RAID Disk Cache Policy       Not Applicable
  Product ID                       HFS480G32FEH BA1
  Revision                         DD02
  Serial Number                    ESABN5131I080BA2O
  Manufactured Day                 Not Applicable
  Manufactured Week                Not Applicable
  Manufactured Year                Not Applicable
The disk appears perfectly healthy.
--
Military justice is to justice what military music is to
music. (Georges Clemenceau)
* Re: [PVE-User] OfflineUncorrectableSector, and now?!
2022-11-23 11:02 [PVE-User] OfflineUncorrectableSector, and now?! Marco Gaiarin
[not found] ` <mailman.213.1669277536.284.pve-user@lists.proxmox.com>
@ 2022-11-27 15:44 ` Yannick Palanque
2022-11-28 12:16 ` Marco Gaiarin
1 sibling, 1 reply; 9+ messages in thread
From: Yannick Palanque @ 2022-11-27 15:44 UTC (permalink / raw)
To: Marco Gaiarin; +Cc: Proxmox VE user list
On 2022-11-23T12:02:10+0100,
Marco Gaiarin <gaio@lilliput.linux.it> wrote:
> The following warning/error was logged by the smartd daemon:
>
> Device: /dev/sda [SAT], 8 Offline uncorrectable sectors
What is the full output of `smartctl -a /dev/sda`?
* Re: [PVE-User] OfflineUncorrectableSector, and now?!
2022-11-27 15:44 ` Yannick Palanque
@ 2022-11-28 12:16 ` Marco Gaiarin
2022-11-28 22:36 ` Yannick Palanque
0 siblings, 1 reply; 9+ messages in thread
From: Marco Gaiarin @ 2022-11-28 12:16 UTC (permalink / raw)
To: pve-user
Hello, Yannick Palanque!
On that day it was said...
> What is the full output of `smartctl -a /dev/sda`?
root@pspve2:~# smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.4.178-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: HFS480G32FEH-BA10A
Serial Number: ESABN5131I080BA2O
LU WWN Device Id: 5 ace42e 025406ebe
Add. Product Id: DELL(tm)
Firmware Version: DD02
User Capacity: 480,103,981,056 bytes [480 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-4 (minor revision not indicated)
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Nov 28 13:14:10 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 110) seconds.
Offline data collection
capabilities: (0x19) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 30) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000e   100   100   006    Old_age   Always   -           0
  5 Reallocated_Sector_Ct   0x0033   100   100   002    Pre-fail  Always   -           0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always   -           5763
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           22
 13 Read_Soft_Error_Rate    0x002e   100   100   000    Old_age   Always   -           0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always   -           34
175 Program_Fail_Count_Chip 0x0032   100   100   000    Old_age   Always   -           13
179 Used_Rsvd_Blk_Cnt_Tot   0x0033   100   100   002    Pre-fail  Always   -           0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0010   100   100   000    Old_age   Offline  -           1517
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always   -           0
182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always   -           0
194 Temperature_Celsius     0x0002   067   057   000    Old_age   Always   -           33 (Min/Max 16/43)
195 Hardware_ECC_Recovered  0x0032   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           8
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always   -           0
201 Unknown_SSD_Attribute   0x0033   100   100   050    Pre-fail  Always   -           0
202 Unknown_SSD_Attribute   0x0033   100   100   050    Pre-fail  Always   -           0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always   -           7361
235 Unknown_Attribute       0x0032   100   100   000    Old_age   Always   -           7361
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always   -           8936
245 Unknown_Attribute       0x0033   100   100   001    Pre-fail  Always   -           100
SMART Error Log not supported
SMART Self-test Log not supported
Selective Self-tests/Logging not supported
* Re: [PVE-User] OfflineUncorrectableSector, and now?!
2022-11-28 12:16 ` Marco Gaiarin
@ 2022-11-28 22:36 ` Yannick Palanque
2022-12-12 16:07 ` Marco Gaiarin
0 siblings, 1 reply; 9+ messages in thread
From: Yannick Palanque @ 2022-11-28 22:36 UTC (permalink / raw)
To: Proxmox VE user list
On 28/11/2022 13:16, Marco Gaiarin wrote:
> > What is the full output of `smartctl -a /dev/sda`?
[...]
> root@pspve2:~# smartctl -a /dev/sda
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.4.178-1-pve] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Device Model: HFS480G32FEH-BA10A
> Serial Number: ESABN5131I080BA2O
> Add. Product Id: DELL(tm)
> Firmware Version: DD02
> Rotation Rate: Solid State Device
[...]
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
[...]
> 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8
Well, it seems the HFS480G32FEH-BA10A is a Hynix SE5031. I could not find its
specs, but for other models the Hynix specs describe ID 198 as:
Offline Scan Uncorrectable Sector Count / The number of uncorrected
errors
It may be a bit worrisome, but the normalized value of this attribute, like
those of the other error-related attributes, is still at 100. This SSD does
not seem to be dying at the moment. Do you have ZFS on top of it? Have
another look at its SMART attributes in a couple of months.
I see that it is a Dell SSD. Do you have a support contract? You
could ask their support what they think of it.
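To illustrate the distinction made above between the normalized VALUE column and RAW_VALUE, here is a minimal Python sketch that parses rows of the `smartctl -a` attribute table and applies the usual rule that an attribute counts as failed once VALUE is at or below THRESH. The names SmartAttr, parse_attr_line and is_failing are invented for this example and are not part of smartmontools; the two sample rows are copied from the output posted earlier in the thread.

```python
from typing import NamedTuple, Optional


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column
    worst: int   # WORST column
    thresh: int  # THRESH column
    raw: str     # RAW_VALUE column, kept as text because its format is vendor specific


def parse_attr_line(line: str) -> Optional[SmartAttr]:
    """Parse one row of the attribute table; return None for any other line."""
    # RAW_VALUE may contain spaces (e.g. the temperature row), so cap the split.
    fields = line.strip().split(None, 9)
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    try:
        return SmartAttr(int(fields[0]), fields[1], int(fields[3]),
                         int(fields[4]), int(fields[5]), fields[9])
    except ValueError:
        return None


def is_failing(attr: SmartAttr) -> bool:
    """Rule assumed here: failed when normalized VALUE <= THRESH."""
    return attr.value <= attr.thresh


rows = [
    "5 Reallocated_Sector_Ct 0x0033 100 100 002 Pre-fail Always - 0",
    "198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8",
]
for row in rows:
    attr = parse_attr_line(row)
    if attr is not None:
        print(attr.attr_id, attr.name, "raw:", attr.raw, "failing:", is_failing(attr))
```

For attribute 198 the raw count is 8 while the normalized value (100) is still far above its threshold (000), which is why the drive itself does not flag a failure.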
* Re: [PVE-User] OfflineUncorrectableSector, and now?!
2022-11-28 22:36 ` Yannick Palanque
@ 2022-12-12 16:07 ` Marco Gaiarin
2022-12-13 10:51 ` Stefan Hanreich
0 siblings, 1 reply; 9+ messages in thread
From: Marco Gaiarin @ 2022-12-12 16:07 UTC (permalink / raw)
To: Yannick Palanque; +Cc: pve-user
Hello, Yannick Palanque!
On that day it was said...
> I see that it is a Dell SSD. Do you have any contract support? You
> could ask to their support what they think of it.
DELL support says that the disk is good.
Is there some way to disable the daily SMART email, e.g. by defining that '8 bad
sectors is good'? Clearly without disabling SMART altogether...
If I've understood correctly, 'smartd' sends notifications using the scripts in
'/etc/smartmontools/run.d/', and in particular:
/etc/smartmontools/run.d/10mail
Do I have to write a custom script?
I was used to smartd signalling disk trouble only once, and from reading the
man pages it seems that this is still the default behaviour...
--
La CIA ha scoperto chi porta il carbonchio... la befanchia!!!
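On the custom-script question: smartd passes details of each warning to programs run via `-M exec` in environment variables such as SMARTD_FAILTYPE and SMARTD_MESSAGE. The Python sketch below shows how a hypothetical filter could accept a known count and still notify on anything new. It rests on assumptions: the baseline of 8, the function names, and the idea of calling this from a script under /etc/smartmontools/run.d/ are illustrative only, and the message format is taken from the alert quoted at the start of the thread.

```python
import re
from typing import Mapping, Optional

# Hypothetical baseline: the count that was already reviewed and accepted.
ACCEPTED_OFFLINE_UNCORRECTABLE = 8

_COUNT_RE = re.compile(r"(\d+) Offline uncorrectable sectors")


def sector_count(message: str) -> Optional[int]:
    """Extract the sector count from a smartd message, or None if absent."""
    match = _COUNT_RE.search(message)
    return int(match.group(1)) if match else None


def should_notify(env: Mapping[str, str],
                  accepted: int = ACCEPTED_OFFLINE_UNCORRECTABLE) -> bool:
    """Decide whether a warning described by the SMARTD_* variables is worth a mail."""
    if env.get("SMARTD_FAILTYPE") != "OfflineUncorrectableSector":
        return True  # every other kind of warning passes through unchanged
    count = sector_count(env.get("SMARTD_MESSAGE", ""))
    # If the count cannot be parsed, err on the side of notifying.
    return count is None or count > accepted


example = {
    "SMARTD_FAILTYPE": "OfflineUncorrectableSector",
    "SMARTD_MESSAGE": "Device: /dev/sda [SAT], 8 Offline uncorrectable sectors",
}
print(should_notify(example))
```

In a real script the mapping would be `os.environ`. A configuration-only approach, where smartd itself can express the same policy, avoids maintaining such a filter.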
* Re: [PVE-User] OfflineUncorrectableSector, and now?!
2022-12-12 16:07 ` Marco Gaiarin
@ 2022-12-13 10:51 ` Stefan Hanreich
[not found] ` <ad8d7d3c-0bcc-8a53-588d-d0e75875f925@proxmox.com>
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hanreich @ 2022-12-13 10:51 UTC (permalink / raw)
To: pve-user
These warnings are governed by the configuration in /etc/smartd.conf.
The only line in the default configuration looks like this:
DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
You can change this to the following line to only get email
notifications when the value of SMART attribute 198 increases:
DEVICESCAN -U 198+ -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
You can find the documentation for this file in the respective man page [1].
Kind Regards
Stefan
[1] https://linux.die.net/man/5/smartd.conf
Message-ID: <ad8d7d3c-0bcc-8a53-588d-d0e75875f925@proxmox.com>
Date: Tue, 13 Dec 2022 11:56:24 +0100
To: pve-user@lists.proxmox.com
From: Stefan Hanreich <s.hanreich@proxmox.com>
Subject: Re: [PVE-User] OfflineUncorrectableSector, and now?!
Seems like there were some issues with the formatting of my last mail,
so I am writing again:
The default config looks like this:
DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
This would need to be adapted like this:
DEVICESCAN -U 198+ -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
Kind Regards
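The difference between the default behaviour and `-U 198+` can be pictured with a toy model. The Python sketch below only illustrates the rule as the smartd.conf man page describes it (report whenever the raw count is non-zero, or, with '+', only when it has grown since the previous check). It is not smartd's actual code; should_report, simulate and the sample readings are invented for this example.

```python
from typing import List


def should_report(previous_raw: int, current_raw: int, increase_only: bool) -> bool:
    """Toy model of the Offline_Uncorrectable check for one polling cycle."""
    if current_raw == 0:
        return False                       # nothing to report
    if increase_only:                      # models '-U 198+'
        return current_raw > previous_raw
    return True                            # models plain '-U 198' (the default)


def simulate(raw_values: List[int], increase_only: bool) -> List[bool]:
    """Apply should_report to consecutive readings; the first one is the baseline."""
    return [should_report(prev, cur, increase_only)
            for prev, cur in zip(raw_values, raw_values[1:])]


readings = [8, 8, 8, 9]  # hypothetical raw values of attribute 198 over four checks
print(simulate(readings, increase_only=False))
print(simulate(readings, increase_only=True))
```

With the plain setting every check after the baseline is reported; with '+' only the step from 8 to 9 is. How often a report turns into an email is additionally controlled by the -M directive, which this sketch ignores.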
* Re: [PVE-User] OfflineUncorrectableSector, and now?!
[not found] ` <ad8d7d3c-0bcc-8a53-588d-d0e75875f925@proxmox.com>
@ 2022-12-17 12:09 ` Marco Gaiarin
0 siblings, 0 replies; 9+ messages in thread
From: Marco Gaiarin @ 2022-12-17 12:09 UTC (permalink / raw)
To: Stefan Hanreich; +Cc: pve-user
Hello, Stefan Hanreich!
On that day it was said...
> DEVICESCAN -U 198+ -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
It works! Thanks!!!
--
Worrying about case in a Windows (AD) context is one of the quickest paths to
insanity. (Patrick Goetz)