[PVE-User] ZFS corruption and recovery...

From: Marco Gaiarin @ 2024-03-19 16:31 UTC
  To: pve-user

In a small PVE cluster I have a 'backup server', i.e. an old/reconditioned
server that simply provides backup storage for the other nodes: apart from
the rpool, there is another pool, on slow HDDs, used as a data repository,
mainly for rsnapshot.

Being a backup server, it can be powered down without much trouble; last
week I needed an (unused) controller that was inside it, so I powered it
off, removed the controller, and powered it back on.

On Saturday the backup pool started to report errors, and the disks/kernel
complained too, for all four disks in the pool. :-(
Looking at the errors, they did not seem to be media errors, so I powered
off the server and looked carefully at the cabling, finding that last week,
while removing the controller, I had probably inadvertently loosened a power
connection on the disk backplane, damn me.

Fixing the cabling worked as expected: the server started, SMART says the
disks are good, and everything works as expected.
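
A minimal sketch of such a SMART check, assuming smartmontools is installed
and that the disks appear under /dev/disk/by-id with the names shown in the
'zpool status' output further down. The function name check_smart is
hypothetical, and the 'PASSED' string is what smartctl prints for ATA drives:

```shell
#!/bin/sh
# Sketch: print the overall SMART health verdict for each disk ID given.
# check_smart is a hypothetical helper name, not part of any tool.
check_smart() {
    for id in "$@"; do
        # smartctl -H prints the drive's overall health self-assessment
        if smartctl -H "/dev/disk/by-id/$id" | grep -q 'PASSED'; then
            echo "$id: PASSED"
        else
            echo "$id: CHECK"
        fi
    done
}

# Usage (run as root):
#   check_smart ata-ST8000VN004-3CP101_WWZ1MBA8 ata-ST8000VN004-3CP101_WWZ1Q7F1
```

A 'CHECK' result only means the PASSED line was not found; the full
'smartctl -a' output would be needed to see why.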

After the server started, the disks began to resilver, but some errors
remained: a dozen files and directories in the 'Permanent error list'.

Because it is a backup server, I simply removed most of the files in error,
running a few rounds of 'zpool scrub' and 'zpool clear -F', which led to this
situation:

 root@svpve3:~# zpool status -v rpool-backup
   pool: rpool-backup
  state: ONLINE
 status: One or more devices has experienced an error resulting in data
 	corruption.  Applications may be affected.
 action: Restore the file in question if possible.  Otherwise restore the
 	entire pool from backup.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
   scan: scrub in progress since Tue Mar 19 16:58:52 2024
 	3.89T scanned at 2.97G/s, 745G issued at 569M/s, 13.5T total
 	0B repaired, 5.39% done, 06:32:11 to go
 config:

	NAME                                 STATE     READ WRITE CKSUM
	rpool-backup                         ONLINE       0     0     0
	  raidz1-0                           ONLINE       0     0     0
	    ata-ST8000VN004-3CP101_WWZ1MBA8  ONLINE       0     0     0
	    ata-ST8000VN004-3CP101_WWZ1Q7F1  ONLINE       0     0     0
	    ata-ST8000VN004-3CP101_WRQ0WQ44  ONLINE       0     0     0
	    ata-ST8000VN004-3CP101_WWZ1RFL5  ONLINE       0     0     0
	cache
	  scsi-33001438037cd8921             ONLINE       0     0     0

 errors: Permanent errors have been detected in the following files:

        rpool-backup:<0x63f218>
        rpool-backup:<0x108d421>
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/DO/FS/P/26-02-19
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/DO/2012/mc
        /rpool-backup/rsnapshot/daily.bad/FVG_PP/vdmpp2/srv/media/CD/mg2014/100HP507

Apart from the first two, the other three are directories that it seems I
cannot delete anymore; the error is 'dir not empty' or 'Invalid exchange'.
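
To keep track of which entries are still listed, the error list above can be
split into entries shown only as object numbers and entries shown as paths.
This is a rough sketch that parses the text of 'zpool status -v' on stdin;
the function name list_permanent_errors is hypothetical, and it assumes the
list is the last section of the output, as it is here:

```shell
#!/bin/sh
# Sketch: classify the "Permanent errors" entries of `zpool status -v`.
# Prints one line per entry: "object<TAB>entry" or "path<TAB>entry".
list_permanent_errors() {
    awk '
        # Start collecting after the header line of the error list
        /^[ \t]*errors: Permanent errors/ { in_list = 1; next }
        in_list && NF {
            sub(/^[ \t]+/, "")              # drop the leading indentation
            if ($0 ~ /:<0x[0-9a-fA-F]+>$/)  # e.g. rpool-backup:<0x63f218>
                print "object\t" $0
            else
                print "path\t" $0
        }
    '
}

# Usage:
#   zpool status -v rpool-backup | list_permanent_errors
```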

How can I fix these errors?! As I said, this is a backup server, so
losing some files (knowing which files, of course!) is not a problem...
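
For reference, one round of the scrub-and-clear procedure described earlier
looks roughly like the sketch below. It is what produced the status shown,
not a fix for the remaining entries. The function name scrub_round is
hypothetical, plain 'zpool clear' is used here instead of 'zpool clear -F',
and the -w option of 'zpool scrub' assumes OpenZFS 2.0 or later:

```shell
#!/bin/sh
# Sketch: one scrub-and-clear round on the given pool.
scrub_round() {
    pool=$1
    zpool scrub -w "$pool" || return 1  # -w: wait until the scrub completes
    zpool clear "$pool" || return 1     # reset the device error counters
    zpool status -v "$pool"             # show what is still listed
}

# Usage (run as root):
#   scrub_round rpool-backup
```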

Thanks.

-- 
  Who knows why, when you dial the wrong number, the phone is never busy.
							(Beppe Grillo)
