public inbox for pve-user@lists.proxmox.com
From: Jan Vlach <janus@volny.cz>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] ZFS-8000-8A on a non-system disk. How to do?
Date: Sun, 12 Nov 2023 20:47:01 +0100
Message-ID: <5B3D932D-1AC8-4842-A14E-9B24D23FE67E@volny.cz>
In-Reply-To: <2E12D3CC-1D34-43F7-94B8-078C08078E8C@qwertz1.com>

Hi,

Having the same number of checksum errors on all drives really points to bad cabling or bad RAM.
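A quick way to spot that pattern is to pull the CKSUM column out of the zpool status output. This is only a minimal sketch: cksum_pattern is a made-up helper name, it assumes the usual five-column NAME/STATE/READ/WRITE/CKSUM table, and it skips the pool row and any raidz/mirror/draid vdev rows by name.

```shell
# Hypothetical helper (not part of ZFS): reads `zpool status` output on stdin,
# prints the CKSUM count of each leaf device, then a one-line verdict.
cksum_pattern() {
  awk '
    # The config table starts at the NAME ... CKSUM header row.
    $1 == "NAME" && $NF == "CKSUM" { in_cfg = 1; seen_pool = 0; next }
    # A blank line ends the config table.
    in_cfg && NF == 0              { in_cfg = 0; next }
    in_cfg && NF >= 5 {
      if (!seen_pool) { seen_pool = 1; next }      # first row is the pool itself
      if ($1 ~ /^(raidz|mirror|draid)/) next        # assumption: vdev rows by name
      print $1 ": " $5
      if (n++ == 0) first = $5
      else if ($5 != first) differ = 1
    }
    END {
      if (n > 1 && !differ && first != 0)
        print "pattern: identical non-zero CKSUM on all " n " devices"
      else
        print "pattern: no shared CKSUM count"
    }
  '
}

# Usage on the affected host:
#   zpool status rpool-hdd | cksum_pattern
```

An identical non-zero count on every device is only a hint that something shared (cable, controller, RAM) is at fault, not proof.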

- If you have ECC RAM, check the IPMI system event log for memory errors, e.g.:
  ipmitool sel elist
- You may see something in dmesg too.
- If you don't have ECC RAM, get memtest in UEFI mode from https://www.memtest86.com/, take the host offline and let it run for a day or two.

- I've seen this on a Supermicro server where the cable for the last two slots out of 10 was bent and touching the case lid. Those two slots kept resetting the bus, which showed up as increasing errors on all drives. Scrubs just changed the affected files and metadata, so I no longer trusted the host or the consistency of its data. I restored everything from a good backup to a different host and only then debugged.

- If at this point you want to back up and restore but you don't have backups, it's game over for you.
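The log checks above can be narrowed with a small filter. This is a rough sketch only: hw_error_lines is a made-up helper name, and the patterns are my own illustrative guesses at typical ECC/EDAC and SATA link messages, not an exhaustive or authoritative list.

```shell
# Hypothetical helper: keeps only log lines that look like memory or
# SATA link trouble. The patterns are assumptions; extend them for your hardware.
hw_error_lines() {
  grep -Ei 'correctable|ecc|edac|machine check|hard resetting link|exception emask|serror|i/o error' || true
}

# Usage on the affected host (as root):
#   ipmitool sel elist | hw_error_lines
#   dmesg | hw_error_lines
```

An empty result does not clear the hardware; it only means none of these particular patterns showed up.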

JV

> On 12. 11. 2023, at 19:32, Stefan <proxmox@qwertz1.com> wrote:
> 
> I assume you have already ruled out flaky hardware (bad cable, RAM)? If so, repairing is not possible. You can theoretically bypass the backup/destroy/restore route, but why?
> You have three faulty drives that need to be replaced anyway. That operation plus identifying the failed file(s) takes much longer than just copying back from backup.
> 
> 
> 
> On 12 November 2023 16:21:17 CET, Marco Gaiarin <gaio@lilliput.linux.it> wrote:
>> 
>> I've got:
>> 
>> root@lisei:~# zpool status -xv
>> pool: rpool-hdd
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>> 	corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>> 	entire pool from backup.
>>  see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
>> scan: scrub repaired 0B in 00:37:50 with 1 errors on Sun Nov 12 01:01:53 2023
>> config:
>> 
>> 	NAME                                            STATE     READ WRITE CKSUM
>> 	rpool-hdd                                       ONLINE       0     0     0
>> 	  raidz1-0                                      ONLINE       0     0     0
>> 	    ata-WDC_WD2003FZEX-00SRLA0_WD-WMC6N0D6J2LN  ONLINE       0     0     2
>> 	    ata-WDC_WD2003FZEX-00SRLA0_WD-WMC6N0D7Z60F  ONLINE       0     0     2
>> 	    ata-WDC_WD2003FZEX-00SRLA0_WD-WMC6N0D2JSHZ  ONLINE       0     0     2
>> 
>> errors: Permanent errors have been detected in the following files:
>> 
>>       rpool-hdd/vm-401-disk-0:<0x1>
>> 
>> The disk belongs to a VM used as a mere repository for rsnapshot backups, so it contains
>> many copies of the same files, with different and abundant retention.
>> It is an add-on disk for the VM, i.e. I can safely unmount it if needed, or even
>> detach it.
>> 
>> 
>> Is there something I can do to repair the volume, possibly online? Do I really
>> have to back it up, destroy it and restore from backup?!
>> 
>> 
>> Thanks.
>> 
>> -- 
>> Anyone who does not vote YES in the referendum cannot feel worthy of being Italian
>> 				(Silvio Berlusconi, 21 June 2006)
>> 
>> 
>> 
>> _______________________________________________
>> pve-user mailing list
>> pve-user@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>> 




Thread overview: 5+ messages
2023-11-12 15:21 Marco Gaiarin
2023-11-12 18:32 ` Stefan
2023-11-12 19:47   ` Jan Vlach [this message]
2023-11-13 20:39     ` Marco Gaiarin
2023-11-13 20:28   ` Marco Gaiarin
