public inbox for pve-user@lists.proxmox.com
* [PVE-User] Severe disk corruption: PBS, SATA
@ 2022-05-18  8:04 Marco Gaiarin
  2022-05-18 16:20 ` nada
       [not found] ` <mailman.74.1652864024.356.pve-user@lists.proxmox.com>
  0 siblings, 2 replies; 4+ messages in thread
From: Marco Gaiarin @ 2022-05-18  8:04 UTC (permalink / raw)
  To: pve-user


We are seeing some very severe disk corruption on one of our
installations, which is admittedly a bit 'niche', but...

PVE 6.4 host on a Dell PowerEdge T340:
	root@sdpve1:~# uname -a
	Linux sdpve1 5.4.106-1-pve #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) x86_64 GNU/Linux

Debian squeeze i386 in the guest:
	sdinny:~# uname -a
	Linux sdinny 2.6.32-5-686 #1 SMP Mon Feb 29 00:51:35 UTC 2016 i686 GNU/Linux

boot disk defined as:
	sata0: local-zfs:vm-120-disk-0,discard=on,size=100G
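
For reference, the way the disk is attached can be double-checked on the host with something along these lines (the grep pattern is only illustrative):

	root@sdpve1:~# qm config 120 | grep -E '^(sata|scsi|virtio|ide)'	# lists the emulated disk buses in use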


After enabling PBS, every time the backup of the VM starts:

 root@sdpve1:~# grep vzdump /var/log/syslog.1 
 May 17 20:27:17 sdpve1 pvedaemon[24825]: <root@pam> starting task UPID:sdpve1:00005132:36BE6E40:6283E905:vzdump:120:root@pam:
 May 17 20:27:17 sdpve1 pvedaemon[20786]: INFO: starting new backup job: vzdump 120 --node sdpve1 --storage nfs-scratch --compress zstd --remove 0 --mode snapshot
 May 17 20:36:50 sdpve1 pvedaemon[24825]: <root@pam> end task UPID:sdpve1:00005132:36BE6E40:6283E905:vzdump:120:root@pam: OK
 May 17 22:00:01 sdpve1 CRON[1734]: (root) CMD (vzdump 100 101 120 --mode snapshot --mailto sys@admin --quiet 1 --mailnotification failure --storage pbs-BP)
 May 17 22:00:02 sdpve1 vzdump[1738]: <root@pam> starting task UPID:sdpve1:00000AE6:36C6F7D7:6283FEC2:vzdump::root@pam:
 May 17 22:00:02 sdpve1 vzdump[2790]: INFO: starting new backup job: vzdump 100 101 120 --mailnotification failure --quiet 1 --mode snapshot --storage pbs-BP --mailto sys@admin
 May 17 22:00:02 sdpve1 vzdump[2790]: INFO: Starting Backup of VM 100 (qemu)
 May 17 22:00:52 sdpve1 vzdump[2790]: INFO: Finished Backup of VM 100 (00:00:50)
 May 17 22:00:52 sdpve1 vzdump[2790]: INFO: Starting Backup of VM 101 (qemu)
 May 17 22:02:09 sdpve1 vzdump[2790]: INFO: Finished Backup of VM 101 (00:01:17)
 May 17 22:02:10 sdpve1 vzdump[2790]: INFO: Starting Backup of VM 120 (qemu)
 May 17 23:31:02 sdpve1 vzdump[2790]: INFO: Finished Backup of VM 120 (01:28:52)
 May 17 23:31:02 sdpve1 vzdump[2790]: INFO: Backup job finished successfully
 May 17 23:31:02 sdpve1 vzdump[1738]: <root@pam> end task UPID:sdpve1:00000AE6:36C6F7D7:6283FEC2:vzdump::root@pam: OK
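
If it helps to narrow things down, the problematic backup can also be re-run by hand for this VM alone, optionally with a bandwidth cap to reduce I/O pressure on the guest during the copy (a sketch; the value is only an example, in KiB/s):

	root@sdpve1:~# vzdump 120 --storage pbs-BP --mode snapshot --bwlimit 51200	# ~50 MB/s cap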

The VM shows massive and severe I/O trouble:

 May 17 22:40:48 sdinny kernel: [124793.000045] ata3.00: exception Emask 0x0 SAct 0xf43d2c SErr 0x0 action 0x6 frozen
 May 17 22:40:48 sdinny kernel: [124793.000493] ata3.00: failed command: WRITE FPDMA QUEUED
 May 17 22:40:48 sdinny kernel: [124793.000749] ata3.00: cmd 61/10:10:58:e3:01/00:00:05:00:00/40 tag 2 ncq 8192 out
 May 17 22:40:48 sdinny kernel: [124793.000749]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
 May 17 22:40:48 sdinny kernel: [124793.001628] ata3.00: status: { DRDY }
 May 17 22:40:48 sdinny kernel: [124793.001850] ata3.00: failed command: WRITE FPDMA QUEUED
 May 17 22:40:48 sdinny kernel: [124793.002175] ata3.00: cmd 61/10:18:70:79:09/00:00:05:00:00/40 tag 3 ncq 8192 out
 May 17 22:40:48 sdinny kernel: [124793.002175]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
 May 17 22:40:48 sdinny kernel: [124793.003052] ata3.00: status: { DRDY }
 May 17 22:40:48 sdinny kernel: [124793.003273] ata3.00: failed command: WRITE FPDMA QUEUED
 May 17 22:40:48 sdinny kernel: [124793.003527] ata3.00: cmd 61/10:28:98:31:11/00:00:05:00:00/40 tag 5 ncq 8192 out
 May 17 22:40:48 sdinny kernel: [124793.003559]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
 May 17 22:40:48 sdinny kernel: [124793.004420] ata3.00: status: { DRDY }
 May 17 22:40:48 sdinny kernel: [124793.004640] ata3.00: failed command: WRITE FPDMA QUEUED
 May 17 22:40:48 sdinny kernel: [124793.004893] ata3.00: cmd 61/10:40:d8:4a:20/00:00:05:00:00/40 tag 8 ncq 8192 out
 May 17 22:40:48 sdinny kernel: [124793.004894]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
 May 17 22:40:48 sdinny kernel: [124793.005769] ata3.00: status: { DRDY }
 [...]
 May 17 22:40:48 sdinny kernel: [124793.020296] ata3: hard resetting link
 May 17 22:41:12 sdinny kernel: [124817.132126] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 May 17 22:41:12 sdinny kernel: [124817.132275] ata3.00: configured for UDMA/100
 May 17 22:41:12 sdinny kernel: [124817.132277] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132279] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132280] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132281] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132281] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132282] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132283] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132284] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132285] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132286] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132287] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132288] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132289] ata3.00: device reported invalid CHS sector 0
 May 17 22:41:12 sdinny kernel: [124817.132295] ata3: EH complete

The VM is still 'alive' and works.
But we were forced to reboot it (power outage), and after that all the
partitions of the disk had disappeared; we had to restore them with
tools like 'testdisk'.
The partitions in the backups were likewise gone.
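
Having been bitten once, it may be worth keeping a dump of the guest's partition table around, so that it can be put back without resorting to 'testdisk' next time (a sketch; the device name inside the guest is an assumption):

	sdinny:~# sfdisk -d /dev/sda > /root/sda-ptable.dump	# save the MBR partition layout
	sdinny:~# sfdisk /dev/sda < /root/sda-ptable.dump	# put it back later, if ever needed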


Note that there is also a 'plain' local backup that runs on Sunday, and that
backup task does not seem to cause trouble (although its image also has the
partitions missing, since it was taken after an I/O error).


Have we hit a kernel/QEMU bug?
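
For completeness, the exact kernel and QEMU builds involved can be listed on the host (the grep pattern is only illustrative):

	root@sdpve1:~# pveversion -v | grep -E 'pve-kernel|pve-qemu'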

-- 
  And we must always stay cheerful, for our weeping hurts the King,
  hurts the rich man, the Cardinal;
  they grow sad if we weep...			(Fo, Jannacci)






* Re: [PVE-User] Severe disk corruption: PBS, SATA
  2022-05-18  8:04 [PVE-User] Severe disk corruption: PBS, SATA Marco Gaiarin
@ 2022-05-18 16:20 ` nada
  2022-05-19  4:07   ` Wolf Noble
       [not found] ` <mailman.74.1652864024.356.pve-user@lists.proxmox.com>
  1 sibling, 1 reply; 4+ messages in thread
From: nada @ 2022-05-18 16:20 UTC (permalink / raw)
  To: Proxmox VE user list

hi Marco
according to your info you are using a local ZFS filesystem, so you may
try:

zfs list
zpool list -v
zpool history
zpool import ...
zpool replace ...
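
for example, to look for pool-level read/write/checksum errors and to verify all data on the pool (the pool name 'rpool' is only a guess, use yours):

zpool status -v rpool
zpool scrub rpool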

all the best
Nada

On 2022-05-18 10:04, Marco Gaiarin wrote:
> We are seeing some very severe disk corruption on one of our
> installations, which is admittedly a bit 'niche', but...
> 
> PVE 6.4 host on a Dell PowerEdge T340:
> [...]
> boot disk defined as:
> 	sata0: local-zfs:vm-120-disk-0,discard=on,size=100G
> 
> After enabling PBS, every time the backup of the VM starts:
> [...]
> Have we hit a kernel/QEMU bug?




* Re: [PVE-User] Severe disk corruption: PBS, SATA
  2022-05-18 16:20 ` nada
@ 2022-05-19  4:07   ` Wolf Noble
  0 siblings, 0 replies; 4+ messages in thread
From: Wolf Noble @ 2022-05-19  4:07 UTC (permalink / raw)
  To: Proxmox VE user list

from over here in the cheap seats, another potential strangeness injector:

ZFS + any sort of RAID controller which plays the abstraction game between raw disk and the OS can cause any number of weird and painful scenarios.

ZFS believes it has an accurate idea of the underlying disks.

it does its voodoo wholly believing that it’s solely responsible for dealing with data durability.

with a RAID controller in between playing the shell game with IO, things USUALLY work… RIGHT UNTIL THEY DON'T.

I'm sure you’re well aware of this, and have probably already mitigated this concern with a JBOD controller, or something that isn’t preventing the OS (and thus ZFS) from talking directly to the disks… but it felt worth pointing out on the off chance that this got overlooked.
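
a quick sanity check (just a sketch, the device name is an example) is whether the OS sees the raw disks with their real model/serial and SMART data:

lsblk -o NAME,MODEL,SERIAL,SIZE,TYPE
smartctl -i /dev/sda	# SMART info visible usually means direct attach / JBOD passthrough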

hope you are well and the gremlins are promptly discovered and put back into their comfortable chairs so they can resume their harmless heckling.


🐺W


[= The contents of this message have been written, read, processed, erased, sorted, sniffed, compressed, rewritten, misspelled, overcompensated, lost, found, and most importantly delivered entirely with recycled electrons =]

> On May 18, 2022, at 11:21, nada <nada@verdnatura.es> wrote:
> 
> hi Marco
> according to your info you are using a local ZFS filesystem, so you may try:
> 
> zfs list
> zpool list -v
> zpool history
> zpool import ...
> zpool replace ...
> 
> all the best
> Nada
> 
>> On 2022-05-18 10:04, Marco Gaiarin wrote:
>> [...]



* Re: [PVE-User] Severe disk corruption: PBS, SATA
       [not found] ` <mailman.74.1652864024.356.pve-user@lists.proxmox.com>
@ 2022-05-20 11:24   ` Marco Gaiarin
  0 siblings, 0 replies; 4+ messages in thread
From: Marco Gaiarin @ 2022-05-20 11:24 UTC (permalink / raw)
  To: Eneko Lacunza via pve-user; +Cc: pve-user

Greetings! Eneko Lacunza via pve-user
  In that message you were saying...

> I would try changing that sata0 disk to virtio-blk (maybe in a clone VM 
> first). I think squeeze will support it; then try PBS backup again.

Disks migrated to 'VirtIO Block'; we are now running some tests, but it seems
to work well. Thanks.
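
For anyone following along, one way such a switch can be done from the CLI is roughly the following (a sketch: VM and volume names are taken from the config above, and the boot-order syntax may differ between PVE versions):

	root@sdpve1:~# qm set 120 --delete sata0				# detach; the volume shows up as unusedN
	root@sdpve1:~# qm set 120 --virtio0 local-zfs:vm-120-disk-0,discard=on	# re-attach it as VirtIO Block
	root@sdpve1:~# qm set 120 --boot order=virtio0				# make it the boot disk again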


To the others: it does not seem to be a ZFS issue; the same cluster runs other
VMs without any fuss... anyway, thanks.

-- 
  Once someone asked Mahatma Gandhi what he thought of civilization
  in the West. "I think it would be a good idea," he replied.




