From: Frank Thommen <f.thommen@dkfz-heidelberg.de>
To: pve-user@lists.proxmox.com
Subject: Re: [PVE-User] Backup of one VM always fails
Date: Fri, 4 Dec 2020 11:22:12 +0100
Message-ID: <e93c3508-d164-4f6b-bfa1-e36975e36778@dkfz-heidelberg.de>
In-Reply-To: <c1c069d7-af43-ed63-176d-43a9d5fd11b2@dkfz-heidelberg.de>
On 04/12/2020 09:30, Frank Thommen wrote:
>> On Thursday, December 3, 2020 10:16 PM, Frank Thommen
>> <f.thommen@dkfz-heidelberg.de> wrote:
>>
>>>
>>>
>>> Dear all,
>>>
>>> on our PVE cluster, the backup of a specific VM always fails (which
>>> makes us worry, as it is our GitLab instance). The general backup plan
>>> is "back up all VMs at 00:30". In the confirmation email we see that
>>> the backup of this specific VM takes six to seven hours and then fails.
>>> The error message in the overview table used to be:
>>>
>>> vma_queue_write: write error - Broken pipe
>>>
>>> With detailed log
>>>
>>> ------------
>>>
>>>
>>> 123: 2020-12-01 02:53:08 INFO: Starting Backup of VM 123 (qemu)
>>> 123: 2020-12-01 02:53:08 INFO: status = running
>>> 123: 2020-12-01 02:53:09 INFO: update VM 123: -lock backup
>>> 123: 2020-12-01 02:53:09 INFO: VM Name: odcf-vm123
>>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio0'
>>> 'ceph-rbd:vm-123-disk-0' 20G
>>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio1'
>>> 'ceph-rbd:vm-123-disk-2' 1000G
>>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio2'
>>> 'ceph-rbd:vm-123-disk-3' 2T
>>> 123: 2020-12-01 02:53:09 INFO: backup mode: snapshot
>>> 123: 2020-12-01 02:53:09 INFO: ionice priority: 7
>>> 123: 2020-12-01 02:53:09 INFO: creating archive
>>> '/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_01-02_53_08.vma.lzo'
>>> 123: 2020-12-01 02:53:09 INFO: started backup task
>>> 'a38ff50a-f474-4b0a-a052-01a835d5c5c7'
>>> 123: 2020-12-01 02:53:12 INFO: status: 0% (167772160/3294239916032),
>>> sparse 0% (31563776), duration 3, read/write 55/45 MB/s
>>> [... etc. ...]
>>> 123: 2020-12-01 09:42:14 INFO: status: 35%
>>> (1170252365824/3294239916032), sparse 0% (26845003776), duration 24545,
>>> read/write 59/56 MB/s
>>> 123: 2020-12-01 09:42:14 ERROR: vma_queue_write: write error - Broken
>>> pipe
>>> 123: 2020-12-01 09:42:14 INFO: aborting backup job
>>> 123: 2020-12-01 09:42:15 ERROR: Backup of VM 123 failed -
>>> vma_queue_write: write error - Broken pipe
>>>
>>> ------------
>>>
>>> Lately (since the upgrade to the newest PVE release) it's
>>>
>>> VM 123 qmp command 'query-backup' failed - got timeout
>>>
>>> with log
>>>
>>> ------------
>>>
>>>
>>> 123: 2020-12-03 03:29:00 INFO: Starting Backup of VM 123 (qemu)
>>> 123: 2020-12-03 03:29:00 INFO: status = running
>>> 123: 2020-12-03 03:29:00 INFO: VM Name: odcf-vm123
>>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio0'
>>> 'ceph-rbd:vm-123-disk-0' 20G
>>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio1'
>>> 'ceph-rbd:vm-123-disk-2' 1000G
>>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio2'
>>> 'ceph-rbd:vm-123-disk-3' 2T
>>> 123: 2020-12-03 03:29:01 INFO: backup mode: snapshot
>>> 123: 2020-12-03 03:29:01 INFO: ionice priority: 7
>>> 123: 2020-12-03 03:29:01 INFO: creating vzdump archive
>>> '/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_03-03_29_00.vma.lzo'
>>> 123: 2020-12-03 03:29:01 INFO: started backup task
>>> 'cc7cde4e-20e8-4e26-a89a-f6f1aa9e9612'
>>> 123: 2020-12-03 03:29:01 INFO: resuming VM again
>>> 123: 2020-12-03 03:29:04 INFO: 0% (284.0 MiB of 3.0 TiB) in 3s, read:
>>> 94.7 MiB/s, write: 51.7 MiB/s
>>> [... etc. ...]
>>> 123: 2020-12-03 09:05:08 INFO: 36% (1.1 TiB of 3.0 TiB) in 5h 36m 7s,
>>> read: 57.3 MiB/s, write: 53.6 MiB/s
>>> 123: 2020-12-03 09:22:57 ERROR: VM 123 qmp command 'query-backup' failed
>>> - got timeout
>>> 123: 2020-12-03 09:22:57 INFO: aborting backup job
>>> 123: 2020-12-03 09:32:57 ERROR: VM 123 qmp command 'backup-cancel'
>>> failed - unable to connect to VM 123 qmp socket - timeout after
>>> 5981 retries
>>> 123: 2020-12-03 09:32:57 ERROR: Backup of VM 123 failed - VM 123 qmp
>>> command 'query-backup' failed - got timeout
>>>
>>>
>>> The VM has some quite big vdisks (20G, 1T and 2T). All stored in Ceph.
>>> There is still plenty of space in Ceph.
>>>
>>> Can anyone give us some hint on how to investigate and debug this
>>> further?
>>
>> Because it is a write error, maybe we should look at the backup
>> destination.
>> Maybe it is a network connection issue? Maybe something wrong with the
>> host? Maybe the disk is full?
>> Which storage are you using for backup? Can you show us the
>> corresponding entry in /etc/pve/storage.cfg?
>
>
> We are backing up to cephfs with still 8 TB or so free.
>
> /etc/pve/storage.cfg is
> ------------
> dir: local
> path /var/lib/vz
> content vztmpl,backup,iso
>
> dir: data
> path /data
> content snippets,images,backup,iso,rootdir,vztmpl
>
> cephfs: cephfs
> path /mnt/pve/cephfs
> content backup,vztmpl,iso
> maxfiles 5
>
> rbd: ceph-rbd
> content images,rootdir
> krbd 0
> pool pve-pool1
> ------------
>
The problem has become more urgent: for the past two days, the VM has
become inaccessible after each failed backup and has to be stopped and
started manually from the PVE UI.
Frank