From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: <gbr@majentis.com> Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id BDA7761ED5 for <pve-user@lists.proxmox.com>; Thu, 3 Dec 2020 23:17:20 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id A42BD2F64A for <pve-user@lists.proxmox.com>; Thu, 3 Dec 2020 23:16:50 +0100 (CET) Received: from sogo.majentis.com (mail.majentis.com [192.95.2.141]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS id 084842F63E for <pve-user@lists.proxmox.com>; Thu, 3 Dec 2020 23:16:46 +0100 (CET) Received: from [172.23.4.22] (S01061056118fd6d9.wp.shawcable.net [50.72.158.148]) (Authenticated sender: gbr) by sogo.majentis.com (Postfix) with ESMTPSA id 1EE4D56165E for <pve-user@lists.proxmox.com>; Thu, 3 Dec 2020 16:10:37 -0600 (CST) To: pve-user@lists.proxmox.com References: <6f8b35b3-bd74-93f1-5298-eb9980c70d77@dkfz-heidelberg.de> From: Gerald Brandt <gbr@majentis.com> Message-ID: <2d44472e-c0ad-5796-f257-46f51418a582@majentis.com> Date: Thu, 3 Dec 2020 16:10:36 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.0 MIME-Version: 1.0 In-Reply-To: <6f8b35b3-bd74-93f1-5298-eb9980c70d77@dkfz-heidelberg.de> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-SPAM-LEVEL: Spam detection results: 0 KAM_ASCII_DIVIDERS 0.8 Spam that uses ascii formatting tricks KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment NICE_REPLY_A -0.001 Looks like a legit reply (A) SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [proxmox.com] Subject: Re: [PVE-User] Backup of one VM always fails X-BeenThere: pve-user@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE user list <pve-user.lists.proxmox.com> List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-user>, <mailto:pve-user-request@lists.proxmox.com?subject=unsubscribe> List-Archive: <http://lists.proxmox.com/pipermail/pve-user/> List-Post: <mailto:pve-user@lists.proxmox.com> List-Help: <mailto:pve-user-request@lists.proxmox.com?subject=help> List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user>, <mailto:pve-user-request@lists.proxmox.com?subject=subscribe> X-List-Received-Date: Thu, 03 Dec 2020 22:17:20 -0000 Sounds like a disk space issue. You're running out of it during the backup process. Gerald On 2020-12-03 3:16 p.m., Frank Thommen wrote: > > Dear all, > > on our PVE cluster, the backup of a specific VM always fails (which > makes us worry, as it is our GitLab instance). The general backup > plan is "back up all VMs at 00:30". In the confirmation email we see, > that the backup of this specific VM takes six to seven hours and then > fails. The error message in the overview table used to be: > > vma_queue_write: write error - Broken pipe > > With detailed log > --------------------- > 123: 2020-12-01 02:53:08 INFO: Starting Backup of VM 123 (qemu) > 123: 2020-12-01 02:53:08 INFO: status = running > 123: 2020-12-01 02:53:09 INFO: update VM 123: -lock backup > 123: 2020-12-01 02:53:09 INFO: VM Name: odcf-vm123 > 123: 2020-12-01 02:53:09 INFO: include disk 'virtio0' > 'ceph-rbd:vm-123-disk-0' 20G > 123: 2020-12-01 02:53:09 INFO: include disk 'virtio1' > 'ceph-rbd:vm-123-disk-2' 1000G > 123: 2020-12-01 02:53:09 INFO: include disk 'virtio2' > 'ceph-rbd:vm-123-disk-3' 2T > 123: 2020-12-01 02:53:09 INFO: backup mode: snapshot > 123: 2020-12-01 02:53:09 INFO: ionice priority: 7 > 123: 2020-12-01 02:53:09 INFO: creating archive > '/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_01-02_53_08.vma.lzo' > 123: 2020-12-01 02:53:09 INFO: started backup task > 'a38ff50a-f474-4b0a-a052-01a835d5c5c7' > 123: 2020-12-01 02:53:12 INFO: status: 0% (167772160/3294239916032), > sparse 0% (31563776), duration 3, read/write 55/45 MB/s > [... ecc. ecc. ...] > 123: 2020-12-01 09:42:14 INFO: status: 35% > (1170252365824/3294239916032), sparse 0% (26845003776), duration > 24545, read/write 59/56 MB/s > 123: 2020-12-01 09:42:14 ERROR: vma_queue_write: write error - Broken > pipe > 123: 2020-12-01 09:42:14 INFO: aborting backup job > 123: 2020-12-01 09:42:15 ERROR: Backup of VM 123 failed - > vma_queue_write: write error - Broken pipe > --------------------- > > Since lately (upgrade to the newest PVE release) it's > > VM 123 qmp command 'query-backup' failed - got timeout > > with log > --------------------- > 123: 2020-12-03 03:29:00 INFO: Starting Backup of VM 123 (qemu) > 123: 2020-12-03 03:29:00 INFO: status = running > 123: 2020-12-03 03:29:00 INFO: VM Name: odcf-vm123 > 123: 2020-12-03 03:29:00 INFO: include disk 'virtio0' > 'ceph-rbd:vm-123-disk-0' 20G > 123: 2020-12-03 03:29:00 INFO: include disk 'virtio1' > 'ceph-rbd:vm-123-disk-2' 1000G > 123: 2020-12-03 03:29:00 INFO: include disk 'virtio2' > 'ceph-rbd:vm-123-disk-3' 2T > 123: 2020-12-03 03:29:01 INFO: backup mode: snapshot > 123: 2020-12-03 03:29:01 INFO: ionice priority: 7 > 123: 2020-12-03 03:29:01 INFO: creating vzdump archive > '/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_03-03_29_00.vma.lzo' > 123: 2020-12-03 03:29:01 INFO: started backup task > 'cc7cde4e-20e8-4e26-a89a-f6f1aa9e9612' > 123: 2020-12-03 03:29:01 INFO: resuming VM again > 123: 2020-12-03 03:29:04 INFO: 0% (284.0 MiB of 3.0 TiB) in 3s, > read: 94.7 MiB/s, write: 51.7 MiB/s > [... ecc. ecc. ...] > 123: 2020-12-03 09:05:08 INFO: 36% (1.1 TiB of 3.0 TiB) in 5h 36m > 7s, read: 57.3 MiB/s, write: 53.6 MiB/s > 123: 2020-12-03 09:22:57 ERROR: VM 123 qmp command 'query-backup' > failed - got timeout > 123: 2020-12-03 09:22:57 INFO: aborting backup job > 123: 2020-12-03 09:32:57 ERROR: VM 123 qmp command 'backup-cancel' > failed - unable to connect to VM 123 qmp socket - timeout after 5981 > retries > 123: 2020-12-03 09:32:57 ERROR: Backup of VM 123 failed - VM 123 qmp > command 'query-backup' failed - got timeout > --------------------- > > The VM has some quite big vdisks (20G, 1T and 2T). All stored in > Ceph. There is still plenty of space in Ceph. > > Can anyone give us some hint on how to investigate and debug this > further? > > Thanks in advance > Frank > > _______________________________________________ > pve-user mailing list > pve-user@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > >