public inbox for pve-user@lists.proxmox.com
From: Frank Thommen <f.thommen@dkfz-heidelberg.de>
To: pve-user@lists.proxmox.com
Subject: Re: [PVE-User] Backup of one VM always fails
Date: Fri, 4 Dec 2020 12:09:44 +0100	[thread overview]
Message-ID: <9d09aa69-95aa-0d96-e119-57b724f29080@dkfz-heidelberg.de> (raw)
In-Reply-To: <mailman.2.1607078234.376.pve-user@lists.proxmox.com>

On 04/12/2020 11:36, Arjen via pve-user wrote:
> On Fri, 2020-12-04 at 11:22 +0100, Frank Thommen wrote:
>>
>> On 04/12/2020 09:30, Frank Thommen wrote:
>>>> On Thursday, December 3, 2020 10:16 PM, Frank Thommen
>>>> <f.thommen@dkfz-heidelberg.de> wrote:
>>>>
>>>>>
>>>>> Dear all,
>>>>>
>>>>> on our PVE cluster, the backup of a specific VM always fails (which
>>>>> worries us, as it is our GitLab instance). The general backup plan
>>>>> is "back up all VMs at 00:30". In the confirmation email we see that
>>>>> the backup of this specific VM takes six to seven hours and then fails.
>>>>> The error message in the overview table used to be:
>>>>>
>>>>> vma_queue_write: write error - Broken pipe
>>>>>
>>>>> With detailed log
>>>>>
>>>>> -------------------------------------------------------------
>>>>>
>>>>> 123: 2020-12-01 02:53:08 INFO: Starting Backup of VM 123 (qemu)
>>>>> 123: 2020-12-01 02:53:08 INFO: status = running
>>>>> 123: 2020-12-01 02:53:09 INFO: update VM 123: -lock backup
>>>>> 123: 2020-12-01 02:53:09 INFO: VM Name: odcf-vm123
>>>>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio0' 'ceph-rbd:vm-123-disk-0' 20G
>>>>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio1' 'ceph-rbd:vm-123-disk-2' 1000G
>>>>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio2' 'ceph-rbd:vm-123-disk-3' 2T
>>>>> 123: 2020-12-01 02:53:09 INFO: backup mode: snapshot
>>>>> 123: 2020-12-01 02:53:09 INFO: ionice priority: 7
>>>>> 123: 2020-12-01 02:53:09 INFO: creating archive '/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_01-02_53_08.vma.lzo'
>>>>> 123: 2020-12-01 02:53:09 INFO: started backup task 'a38ff50a-f474-4b0a-a052-01a835d5c5c7'
>>>>> 123: 2020-12-01 02:53:12 INFO: status: 0% (167772160/3294239916032), sparse 0% (31563776), duration 3, read/write 55/45 MB/s
>>>>> [... etc. etc. ...]
>>>>> 123: 2020-12-01 09:42:14 INFO: status: 35% (1170252365824/3294239916032), sparse 0% (26845003776), duration 24545, read/write 59/56 MB/s
>>>>> 123: 2020-12-01 09:42:14 ERROR: vma_queue_write: write error - Broken pipe
>>>>> 123: 2020-12-01 09:42:14 INFO: aborting backup job
>>>>> 123: 2020-12-01 09:42:15 ERROR: Backup of VM 123 failed - vma_queue_write: write error - Broken pipe
>>>>>
>>>>> -------------------------------------------------------------
>>>>> Since the recent upgrade to the newest PVE release, it's
>>>>>
>>>>> VM 123 qmp command 'query-backup' failed - got timeout
>>>>>
>>>>> with log
>>>>>
>>>>> -------------------------------------------------------------
>>>>>
>>>>>
>>>>> 123: 2020-12-03 03:29:00 INFO: Starting Backup of VM 123 (qemu)
>>>>> 123: 2020-12-03 03:29:00 INFO: status = running
>>>>> 123: 2020-12-03 03:29:00 INFO: VM Name: odcf-vm123
>>>>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio0' 'ceph-rbd:vm-123-disk-0' 20G
>>>>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio1' 'ceph-rbd:vm-123-disk-2' 1000G
>>>>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio2' 'ceph-rbd:vm-123-disk-3' 2T
>>>>> 123: 2020-12-03 03:29:01 INFO: backup mode: snapshot
>>>>> 123: 2020-12-03 03:29:01 INFO: ionice priority: 7
>>>>> 123: 2020-12-03 03:29:01 INFO: creating vzdump archive '/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_03-03_29_00.vma.lzo'
>>>>> 123: 2020-12-03 03:29:01 INFO: started backup task 'cc7cde4e-20e8-4e26-a89a-f6f1aa9e9612'
>>>>> 123: 2020-12-03 03:29:01 INFO: resuming VM again
>>>>> 123: 2020-12-03 03:29:04 INFO: 0% (284.0 MiB of 3.0 TiB) in 3s, read: 94.7 MiB/s, write: 51.7 MiB/s
>>>>> [... etc. etc. ...]
>>>>> 123: 2020-12-03 09:05:08 INFO: 36% (1.1 TiB of 3.0 TiB) in 5h 36m 7s, read: 57.3 MiB/s, write: 53.6 MiB/s
>>>>> 123: 2020-12-03 09:22:57 ERROR: VM 123 qmp command 'query-backup' failed - got timeout
>>>>> 123: 2020-12-03 09:22:57 INFO: aborting backup job
>>>>> 123: 2020-12-03 09:32:57 ERROR: VM 123 qmp command 'backup-cancel' failed - unable to connect to VM 123 qmp socket - timeout after 5981 retries
>>>>> 123: 2020-12-03 09:32:57 ERROR: Backup of VM 123 failed - VM 123 qmp command 'query-backup' failed - got timeout
>>>>>
>>>>> The VM has some quite big vdisks (20G, 1T and 2T), all stored in
>>>>> Ceph. There is still plenty of space in Ceph.
>>>>>
>>>>> Can anyone give us a hint on how to investigate and debug this
>>>>> further?
>>>>
>>>> Because it is a write error, maybe we should look at the backup
>>>> destination. Maybe it is a network connection issue? Maybe something
>>>> is wrong with the host? Maybe the disk is full?
>>>> Which storage are you using for backup? Can you show us the
>>>> corresponding entry in /etc/pve/storage.cfg?
>>>
>>> We are backing up to cephfs, which still has about 8 TB free.
>>>
>>> /etc/pve/storage.cfg is
>>> ------------
>>> dir: local
>>>          path /var/lib/vz
>>>          content vztmpl,backup,iso
>>>
>>> dir: data
>>>          path /data
>>>          content snippets,images,backup,iso,rootdir,vztmpl
>>>
>>> cephfs: cephfs
>>>          path /mnt/pve/cephfs
>>>          content backup,vztmpl,iso
>>>          maxfiles 5
>>>
>>> rbd: ceph-rbd
>>>          content images,rootdir
>>>          krbd 0
>>>          pool pve-pool1
>>> ------------
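
Since the first error is a write error, one thing worth checking is the
backup target itself. Below is only a sketch (the dump path is inferred
from the storage.cfg above, and the 64 MiB probe size is arbitrary); a
flaky cephfs mount tends to surface exactly this kind of broken-pipe
failure under a sustained streaming write:

```shell
# Sketch: sanity-check the cephfs backup target before the next run.
# DUMP_DIR is the dump path implied by the storage.cfg above; override
# it when experimenting on another machine.
DUMP_DIR="${DUMP_DIR:-/mnt/pve/cephfs/dump}"
if [ -d "$DUMP_DIR" ]; then
    df -h "$DUMP_DIR"      # confirm free space on the mount itself
    # stream 64 MiB through the mount with an fsync at the end,
    # mimicking vzdump's sequential write pattern on a small scale
    dd if=/dev/zero of="$DUMP_DIR/.write-probe" bs=1M count=64 conv=fsync \
        && rm -f "$DUMP_DIR/.write-probe"
else
    echo "backup target $DUMP_DIR is not mounted here"
fi
```

A probe this small will not reproduce a failure that only shows up hours
into a 3 TB backup, but it rules out the obvious cases (mount gone,
filesystem read-only, quota hit).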
>>>
>>
>> The problem has reached a new level of urgency: for the past two days,
>> each time a backup fails the VM becomes inaccessible and has to be
>> stopped and started manually from the PVE UI.
> 
> I don't see anything wrong with the configuration that you shared.
> Was anything changed in the last few days since the last successful
> backup? Any updates from Proxmox? Changes to the network?
> I know very little about Ceph and clusters, sorry.
> What makes this VM different, except for the size of the disks?

On December 1st the hypervisor was updated to PVE 6.3-2 (I think from 
6.1-3).  After that the error message changed slightly and - in 
hindsight - since then the VM has stopped being accessible after each 
failed backup.
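
Since the symptom changed right after the upgrade, it may also be worth
confirming which QEMU binary the VM is actually still running: a VM that
was started before the upgrade keeps running the old binary until it is
stopped and started again. A rough sketch, to be run on the node (the
exact fields shown by `qm status --verbose` can differ between
versions):

```shell
# Sketch: compare installed vs. running QEMU after the PVE upgrade.
ON_PVE=0
command -v pveversion >/dev/null 2>&1 && ON_PVE=1
if [ "$ON_PVE" -eq 1 ]; then
    pveversion -v | grep -E 'pve-manager|pve-qemu'   # installed versions
    qm status 123 --verbose | grep -i qemu           # what VM 123 still runs
else
    echo "not a PVE node; nothing to check here"
fi
```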

However: the VM has never backed up successfully, not even before the 
PVE upgrade.  It's just that no one really took notice of it.

The VM is not really special.  It's our only Debian VM (but I hope 
that's not an issue :-)), and it was migrated 1:1 from oVirt by moving 
and importing the disk images.  But we have a few other such VMs and 
they run and back up just fine.

No network changes. Basically nothing changed that I could think of.

But to be clear: Our current main problem is the failing backup, not the 
crash.
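
For debugging the backup itself, it might help to trigger a single run
by hand instead of waiting for the 00:30 schedule, so the task log and
the node's load can be watched live. A sketch using the VM ID and
storage name from this thread (the flags are standard vzdump options;
--remove 0 leaves existing backups untouched):

```shell
# Sketch: reproduce the failing backup interactively for VM 123.
VMID=123
CMD="vzdump $VMID --storage cephfs --mode snapshot --compress lzo --remove 0"
if command -v vzdump >/dev/null 2>&1; then
    # keep a copy of the full task log for a bug report
    $CMD 2>&1 | tee "/tmp/vzdump-$VMID-manual.log"
else
    echo "vzdump not available here; on the node this would run: $CMD"
fi
```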


Cheers, Frank

Thread overview: 11+ messages
2020-12-03 21:16 Frank Thommen
2020-12-03 22:10 ` Gerald Brandt
2020-12-04  8:26   ` Frank Thommen
     [not found] ` <mailman.131.1607062291.440.pve-user@lists.proxmox.com>
2020-12-04  8:30   ` Frank Thommen
2020-12-04 10:22     ` Frank Thommen
2020-12-04 10:26       ` Fabrizio Cuseo
     [not found]       ` <mailman.2.1607078234.376.pve-user@lists.proxmox.com>
2020-12-04 11:09         ` Frank Thommen [this message]
2020-12-04 14:00           ` Yannis Milios
2020-12-04 14:20             ` Frank Thommen
2020-12-04 14:39               ` [PVE-User] PBS WAS : " Ronny Aasen
2020-12-16 18:30               ` [PVE-User] " Frank Thommen
