From: Iztok Gregori
Date: Thu, 9 May 2024 10:35:22 +0200
To: Proxmox VE user list
Subject: [PVE-User] Unresponsive VM(s) during VZdump
Message-ID: <738ba899-5a52-4000-ba61-83dd0e360df4@elettra.eu>

Hi to all!

We are in the process of upgrading our hyper-converged (Ceph-based) cluster from PVE 6 to PVE 8, and yesterday we finished upgrading all nodes to PVE 7.4-1 without issues.

Last night, during our usual VZdump backup (vzdump to an NFS share), our monitoring system notified us that 2 VMs (out of 107) were unresponsive. The VM logs contained many lines like these:

kernel: hda: irq timeout: status=0xd0 { Busy }
kernel: sd 2:0:0:0: [sda] abort
kernel: sd 2:0:0:1: [sdb] abort

Once the backup finished (successfully), the VMs started working correctly again. On PVE 6 everything was OK.

The affected machines run old kernels ("2.6.18" and "2.6.32"); one has the QEMU guest agent enabled, the other does not. Both use kvm64 as the processor type; one uses "VirtIO SCSI" as the controller, the other "LSI 53C895A". All the disks are on Ceph RBD. Nothing related was logged on the host machine, and the Ceph cluster was working as expected. Both VMs are "biggish" (100-200 GB), and the backup takes 1 to 2 hours to complete.

Do you have any idea what could be causing the problem? I suspect something in qemu-kvm, but I haven't found any useful hints yet.

I'm still planning to upgrade everything to PVE 8; maybe the "problem" was fixed in later releases of qemu-kvm...

I can give you more information if needed; any help is appreciated.

Thanks,
Iztok

P.S. This is the software stack on our cluster (16 nodes):

# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.149-1-pve)
pve-manager: 7.4-17 (running version: 7.4-17/513c62be)
pve-kernel-5.15: 7.4-12
pve-kernel-5.4: 6.4-20
pve-kernel-5.15.149-1-pve: 5.15.149-1
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph: 15.2.17-pve1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.3
libpve-apiclient-perl: 3.2-2
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.6-1
proxmox-backup-file-restore: 2.4.6-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+3
pve-firewall: 4.3-5
pve-firmware: 3.6-6
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.10-1
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-5
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.15-pve1

--
Iztok Gregori
ICT Systems and Services
Elettra - Sincrotrone Trieste S.C.p.A.
http://www.elettra.eu
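
Since the upgrade is being done in stages across 16 nodes, it may help to confirm that every node reports the same package versions (pve-qemu-kvm in particular). What follows is only a minimal sketch, assuming the "name: version" line format of the `pveversion -v` output above. The helpers `parse_pveversion` and `diff_versions` are hypothetical names, not part of any Proxmox tool, and the second node's version string is a made-up placeholder.

```python
def parse_pveversion(text):
    """Map each package name to its version from `pveversion -v` style text."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the shell prompt line ("# pveversion -v").
        if not line or line.startswith("#"):
            continue
        name, sep, rest = line.partition(": ")
        if not sep:
            continue
        # Drop a trailing note such as "(running kernel: 5.15.149-1-pve)".
        versions[name] = rest.split(" (", 1)[0].strip()
    return versions


def diff_versions(a, b):
    """Return {package: (version_a, version_b)} for packages that differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# A few lines copied from the listing above.
sample = """\
# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.149-1-pve)
pve-manager: 7.4-17 (running version: 7.4-17/513c62be)
pve-qemu-kvm: 7.2.10-1
qemu-server: 7.4-5
"""

node_a = parse_pveversion(sample)

# Hypothetical second node with a different QEMU package (placeholder value).
node_b = dict(node_a, **{"pve-qemu-kvm": "PLACEHOLDER-VERSION"})

for package, (va, vb) in diff_versions(node_a, node_b).items():
    print(f"{package}: node_a={va} node_b={vb}")
```

In practice the text for each node would come from running `pveversion -v` on that node and saving the output; how it is collected is left open here.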