From: Frank Thommen <f.thommen@dkfz-heidelberg.de>
To: PVE User List <pve-user@pve.proxmox.com>
Subject: [PVE-User] Backup of one VM always fails
Date: Thu, 3 Dec 2020 22:16:59 +0100
Message-ID: <6f8b35b3-bd74-93f1-5298-eb9980c70d77@dkfz-heidelberg.de>


Dear all,

on our PVE cluster, the backup of one specific VM always fails (which
worries us, as it is our GitLab instance).  The general backup plan
is "back up all VMs at 00:30".  In the confirmation email we see that
the backup of this specific VM takes six to seven hours and then fails.
The error message in the overview table used to be:

   vma_queue_write: write error - Broken pipe

With detailed log
---------------------
123: 2020-12-01 02:53:08 INFO: Starting Backup of VM 123 (qemu)
123: 2020-12-01 02:53:08 INFO: status = running
123: 2020-12-01 02:53:09 INFO: update VM 123: -lock backup
123: 2020-12-01 02:53:09 INFO: VM Name: odcf-vm123
123: 2020-12-01 02:53:09 INFO: include disk 'virtio0' 
'ceph-rbd:vm-123-disk-0' 20G
123: 2020-12-01 02:53:09 INFO: include disk 'virtio1' 
'ceph-rbd:vm-123-disk-2' 1000G
123: 2020-12-01 02:53:09 INFO: include disk 'virtio2' 
'ceph-rbd:vm-123-disk-3' 2T
123: 2020-12-01 02:53:09 INFO: backup mode: snapshot
123: 2020-12-01 02:53:09 INFO: ionice priority: 7
123: 2020-12-01 02:53:09 INFO: creating archive 
'/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_01-02_53_08.vma.lzo'
123: 2020-12-01 02:53:09 INFO: started backup task 
'a38ff50a-f474-4b0a-a052-01a835d5c5c7'
123: 2020-12-01 02:53:12 INFO: status: 0% (167772160/3294239916032), 
sparse 0% (31563776), duration 3, read/write 55/45 MB/s
[... etc. etc. ...]
123: 2020-12-01 09:42:14 INFO: status: 35% 
(1170252365824/3294239916032), sparse 0% (26845003776), duration 24545, 
read/write 59/56 MB/s
123: 2020-12-01 09:42:14 ERROR: vma_queue_write: write error - Broken pipe
123: 2020-12-01 09:42:14 INFO: aborting backup job
123: 2020-12-01 09:42:15 ERROR: Backup of VM 123 failed - 
vma_queue_write: write error - Broken pipe
---------------------

Recently (since the upgrade to the newest PVE release) it has been:

   VM 123 qmp command 'query-backup' failed - got timeout

with log
---------------------
123: 2020-12-03 03:29:00 INFO: Starting Backup of VM 123 (qemu)
123: 2020-12-03 03:29:00 INFO: status = running
123: 2020-12-03 03:29:00 INFO: VM Name: odcf-vm123
123: 2020-12-03 03:29:00 INFO: include disk 'virtio0' 
'ceph-rbd:vm-123-disk-0' 20G
123: 2020-12-03 03:29:00 INFO: include disk 'virtio1' 
'ceph-rbd:vm-123-disk-2' 1000G
123: 2020-12-03 03:29:00 INFO: include disk 'virtio2' 
'ceph-rbd:vm-123-disk-3' 2T
123: 2020-12-03 03:29:01 INFO: backup mode: snapshot
123: 2020-12-03 03:29:01 INFO: ionice priority: 7
123: 2020-12-03 03:29:01 INFO: creating vzdump archive 
'/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_03-03_29_00.vma.lzo'
123: 2020-12-03 03:29:01 INFO: started backup task 
'cc7cde4e-20e8-4e26-a89a-f6f1aa9e9612'
123: 2020-12-03 03:29:01 INFO: resuming VM again
123: 2020-12-03 03:29:04 INFO:   0% (284.0 MiB of 3.0 TiB) in  3s, read: 
94.7 MiB/s, write: 51.7 MiB/s
[... etc. etc. ...]
123: 2020-12-03 09:05:08 INFO:  36% (1.1 TiB of 3.0 TiB) in  5h 36m  7s, 
read: 57.3 MiB/s, write: 53.6 MiB/s
123: 2020-12-03 09:22:57 ERROR: VM 123 qmp command 'query-backup' failed 
- got timeout
123: 2020-12-03 09:22:57 INFO: aborting backup job
123: 2020-12-03 09:32:57 ERROR: VM 123 qmp command 'backup-cancel' 
failed - unable to connect to VM 123 qmp socket - timeout after 5981 retries
123: 2020-12-03 09:32:57 ERROR: Backup of VM 123 failed - VM 123 qmp 
command 'query-backup' failed - got timeout
---------------------

The VM has some fairly large vdisks (20G, 1T and 2T), all stored in Ceph.
There is still plenty of free space in Ceph.
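
For a rough sense of scale, here is a minimal sketch in Python that extrapolates the total runtime from the last progress line of the first log above (2020-12-01 09:42:14).  It assumes the throughput stays constant over the whole run, which may not hold in practice, and the function name is made up for illustration only.

```python
def estimate_total_seconds(done_bytes: int, total_bytes: int, elapsed_s: int) -> float:
    """Linearly extrapolate the total runtime from the progress so far."""
    return elapsed_s * total_bytes / done_bytes


# Figures copied from the status line at 2020-12-01 09:42:14.
done_bytes = 1170252365824    # bytes processed so far
total_bytes = 3294239916032   # total size of all three disks
elapsed_s = 24545             # "duration" field, in seconds

# Average rate over the whole run so far, in MiB/s.
rate_mib_s = done_bytes / elapsed_s / 2**20

# Projected duration of a complete run, in hours.
total_h = estimate_total_seconds(done_bytes, total_bytes, elapsed_s) / 3600

print(f"average rate: {rate_mib_s:.1f} MiB/s, projected total: {total_h:.1f} h")
```

Under that assumption a complete run would take roughly 19 hours, so the failure after six to seven hours happens at about one third of the way through.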

Can anyone give us some hint on how to investigate and debug this further?

Thanks in advance
Frank
