* [PVE-User] Unresponsive VM(s) during VZdump
From: Iztok Gregori @ 2024-05-09  8:35 UTC
  To: Proxmox VE user list

Hi all!

We are in the process of upgrading our hyper-converged (Ceph-based)
cluster from PVE 6 to PVE 8, and yesterday we finished upgrading all
nodes to PVE 7.4-1 without issues. Last night, during our usual VZdump
backup (vzdump to an NFS share), our monitoring system notified us
that 2 VMs (of 107) were unresponsive. The VM logs contained a lot
of lines like these:

kernel: hda: irq timeout: status=0xd0 { Busy }
kernel: sd 2:0:0:0: [sda] abort
kernel: sd 2:0:0:1: [sdb] abort
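
For anyone who wants to quantify this on their own guests, here is a
minimal Python sketch that counts such lines in a saved copy of the guest
log. The regular expressions are derived only from the three sample lines
above, and the names PATTERNS and count_io_errors are hypothetical, so
treat it as a starting point and not as a tested tool:

```python
import re
from collections import Counter

# Patterns inferred from the sample lines only (an assumption): other
# kernels or drivers may word these messages differently.
PATTERNS = {
    "irq_timeout": re.compile(r"kernel: \w+: irq timeout: status=0x[0-9a-fA-F]+"),
    "scsi_abort": re.compile(r"kernel: sd [\d:]+: \[\w+\] abort"),
}


def count_io_errors(lines):
    """Count log lines matching each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "kernel: hda: irq timeout: status=0xd0 { Busy }",
        "kernel: sd 2:0:0:0: [sda] abort",
        "kernel: sd 2:0:0:1: [sdb] abort",
    ]
    print(dict(count_io_errors(sample)))
```

Feeding it the lines of the guest log restricted to the backup window
should show whether the errors stop once the backup completes.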

After the backup finished (successfully), the VMs started to function
correctly again.

On PVE 6 everything was ok.

The affected machines are running old kernels, "2.6.18" and "2.6.32". One
has the QEMU guest agent enabled, the other does not. Both use kvm64 as
the processor type; one uses "VirtIO SCSI", the other "LSI 53C895A". All
the disks are on Ceph RBD.

Nothing related was logged on the host machine, and the Ceph cluster was
working as expected. Both VMs are "biggish" (100-200GB) and it takes 1/2
hours to complete the backup.

Do you have any idea what could be causing the problem? I suspect
something in qemu-kvm, but I haven't found any useful hints yet.

I'm still planning to upgrade everything to PVE 8, maybe the "problem" 
was fixed in later releases of qemu-kvm...

I can provide more information if needed; any help is appreciated.

Thanks
   Iztok

P.S. This is the software stack on our cluster (16 nodes):
# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.149-1-pve)
pve-manager: 7.4-17 (running version: 7.4-17/513c62be)
pve-kernel-5.15: 7.4-12
pve-kernel-5.4: 6.4-20
pve-kernel-5.15.149-1-pve: 5.15.149-1
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph: 15.2.17-pve1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.3
libpve-apiclient-perl: 3.2-2
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.6-1
proxmox-backup-file-restore: 2.4.6-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+3
pve-firewall: 4.3-5
pve-firmware: 3.6-6
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.10-1
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-5
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.15-pve1
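
With 16 nodes it may be worth confirming that every node reports the same
package versions after the upgrade. A minimal Python sketch, assuming the
`pveversion -v` output of each node has been saved as text; the function
names parse_pveversion and diff_versions are hypothetical, and the parsing
relies only on the "name: version" layout visible above:

```python
def parse_pveversion(text):
    """Map package name -> version string from `pveversion -v` output."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the echoed shell prompt line.
        if not line or line.startswith("#"):
            continue
        name, sep, rest = line.partition(": ")
        if not sep:
            continue
        # Drop the trailing "(running ...)" annotation, if present.
        versions[name] = rest.split(" (")[0].strip()
    return versions


def diff_versions(a, b):
    """Return packages whose versions differ between two parsed outputs."""
    return {
        pkg: (a.get(pkg), b.get(pkg))
        for pkg in sorted(set(a) | set(b))
        if a.get(pkg) != b.get(pkg)
    }


if __name__ == "__main__":
    node1 = parse_pveversion("pve-qemu-kvm: 7.2.10-1\nqemu-server: 7.4-5\n")
    node2 = parse_pveversion("pve-qemu-kvm: 7.2.10-1\nqemu-server: 7.4-4\n")
    print(diff_versions(node1, node2))
```

An empty result for every pair of nodes would indicate that the installed
versions match across the cluster.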

-- 
Iztok Gregori
ICT Systems and Services
Elettra - Sincrotrone Trieste S.C.p.A.
http://www.elettra.eu
