References: <6f8b35b3-bd74-93f1-5298-eb9980c70d77@dkfz-heidelberg.de>
 <mailman.131.1607062291.440.pve-user@lists.proxmox.com>
 <c1c069d7-af43-ed63-176d-43a9d5fd11b2@dkfz-heidelberg.de>
 <e93c3508-d164-4f6b-bfa1-e36975e36778@dkfz-heidelberg.de>
 <mailman.2.1607078234.376.pve-user@lists.proxmox.com>
 <9d09aa69-95aa-0d96-e119-57b724f29080@dkfz-heidelberg.de>
In-Reply-To: <9d09aa69-95aa-0d96-e119-57b724f29080@dkfz-heidelberg.de>
From: Yannis Milios <yannis.milios@gmail.com>
Date: Fri, 4 Dec 2020 14:00:01 +0000
Message-ID: <CAFiF2Or3cXrb1zRQ4neCyQs6wqy6xDTppJxBzkUc-zL=2kYrDQ@mail.gmail.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Backup of one VM always fails

Can you try removing this specific VM from the normal backup schedule and
then creating a new test schedule for it, if possible with a different
backup target (NFS, local, etc.)?
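
For a quick one-off test from the node's shell, something along these lines
should work (a sketch, assuming a storage named 'local' that is configured
to hold backups; substitute whatever test target you end up using):

  # manual snapshot-mode backup of VM 123 to a different storage
  vzdump 123 --storage local --mode snapshot --compress lzo

If no second backup target exists yet, a temporary directory storage could
be added first, e.g. (name and path here are hypothetical, adjust to taste):

  # register a local directory as a backup target
  pvesm add dir testbackup --path /data/testbackup --content backup

If the backup completes cleanly on the new target, that points at the
cephfs destination (or the path to it) rather than at the VM itself.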



On Fri, 4 Dec 2020 at 11:10, Frank Thommen <f.thommen@dkfz-heidelberg.de>
wrote:

> On 04/12/2020 11:36, Arjen via pve-user wrote:
> > On Fri, 2020-12-04 at 11:22 +0100, Frank Thommen wrote:
> >>
> >> On 04/12/2020 09:30, Frank Thommen wrote:
> >>>> On Thursday, December 3, 2020 10:16 PM, Frank Thommen
> >>>> <f.thommen@dkfz-heidelberg.de> wrote:
> >>>>
> >>>>>
> >>>>> Dear all,
> >>>>>
> >>>>> on our PVE cluster, the backup of a specific VM always fails
> >>>>> (which makes us worry, as it is our GitLab instance). The general
> >>>>> backup plan is "back up all VMs at 00:30". In the confirmation
> >>>>> email we see that the backup of this specific VM takes six to
> >>>>> seven hours and then fails. The error message in the overview
> >>>>> table used to be:
> >>>>>
> >>>>> vma_queue_write: write error - Broken pipe
> >>>>>
> >>>>> with the detailed log:
> >>>>>
> >>>>> ------------------------------------------------------------
> >>>>>
> >>>>> 123: 2020-12-01 02:53:08 INFO: Starting Backup of VM 123 (qemu)
> >>>>> 123: 2020-12-01 02:53:08 INFO: status = running
> >>>>> 123: 2020-12-01 02:53:09 INFO: update VM 123: -lock backup
> >>>>> 123: 2020-12-01 02:53:09 INFO: VM Name: odcf-vm123
> >>>>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio0' 'ceph-rbd:vm-123-disk-0' 20G
> >>>>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio1' 'ceph-rbd:vm-123-disk-2' 1000G
> >>>>> 123: 2020-12-01 02:53:09 INFO: include disk 'virtio2' 'ceph-rbd:vm-123-disk-3' 2T
> >>>>> 123: 2020-12-01 02:53:09 INFO: backup mode: snapshot
> >>>>> 123: 2020-12-01 02:53:09 INFO: ionice priority: 7
> >>>>> 123: 2020-12-01 02:53:09 INFO: creating archive '/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_01-02_53_08.vma.lzo'
> >>>>> 123: 2020-12-01 02:53:09 INFO: started backup task 'a38ff50a-f474-4b0a-a052-01a835d5c5c7'
> >>>>> 123: 2020-12-01 02:53:12 INFO: status: 0% (167772160/3294239916032), sparse 0% (31563776), duration 3, read/write 55/45 MB/s
> >>>>> [... etc. etc. ...]
> >>>>> 123: 2020-12-01 09:42:14 INFO: status: 35% (1170252365824/3294239916032), sparse 0% (26845003776), duration 24545, read/write 59/56 MB/s
> >>>>> 123: 2020-12-01 09:42:14 ERROR: vma_queue_write: write error - Broken pipe
> >>>>> 123: 2020-12-01 09:42:14 INFO: aborting backup job
> >>>>> 123: 2020-12-01 09:42:15 ERROR: Backup of VM 123 failed - vma_queue_write: write error - Broken pipe
> >>>>>
> >>>>> ------------------------------------------------------------
> >>>>>
> >>>>> Since the upgrade to the newest PVE release it's
> >>>>>
> >>>>> VM 123 qmp command 'query-backup' failed - got timeout
> >>>>>
> >>>>> with the log
> >>>>>
> >>>>> ------------------------------------------------------------
> >>>>>
> >>>>> 123: 2020-12-03 03:29:00 INFO: Starting Backup of VM 123 (qemu)
> >>>>> 123: 2020-12-03 03:29:00 INFO: status = running
> >>>>> 123: 2020-12-03 03:29:00 INFO: VM Name: odcf-vm123
> >>>>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio0' 'ceph-rbd:vm-123-disk-0' 20G
> >>>>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio1' 'ceph-rbd:vm-123-disk-2' 1000G
> >>>>> 123: 2020-12-03 03:29:00 INFO: include disk 'virtio2' 'ceph-rbd:vm-123-disk-3' 2T
> >>>>> 123: 2020-12-03 03:29:01 INFO: backup mode: snapshot
> >>>>> 123: 2020-12-03 03:29:01 INFO: ionice priority: 7
> >>>>> 123: 2020-12-03 03:29:01 INFO: creating vzdump archive '/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_03-03_29_00.vma.lzo'
> >>>>> 123: 2020-12-03 03:29:01 INFO: started backup task 'cc7cde4e-20e8-4e26-a89a-f6f1aa9e9612'
> >>>>> 123: 2020-12-03 03:29:01 INFO: resuming VM again
> >>>>> 123: 2020-12-03 03:29:04 INFO: 0% (284.0 MiB of 3.0 TiB) in 3s, read: 94.7 MiB/s, write: 51.7 MiB/s
> >>>>> [... etc. etc. ...]
> >>>>> 123: 2020-12-03 09:05:08 INFO: 36% (1.1 TiB of 3.0 TiB) in 5h 36m 7s, read: 57.3 MiB/s, write: 53.6 MiB/s
> >>>>> 123: 2020-12-03 09:22:57 ERROR: VM 123 qmp command 'query-backup' failed - got timeout
> >>>>> 123: 2020-12-03 09:22:57 INFO: aborting backup job
> >>>>> 123: 2020-12-03 09:32:57 ERROR: VM 123 qmp command 'backup-cancel' failed - unable to connect to VM 123 qmp socket - timeout after 5981 retries
> >>>>> 123: 2020-12-03 09:32:57 ERROR: Backup of VM 123 failed - VM 123 qmp command 'query-backup' failed - got timeout
> >>>>>
> >>>>>
> >>>>> The VM has some quite big vdisks (20G, 1T, and 2T), all stored in
> >>>>> Ceph. There is still plenty of space in Ceph.
> >>>>>
> >>>>> Can anyone give us a hint on how to investigate and debug this
> >>>>> further?
> >>>>
> >>>> Because it is a write error, maybe we should look at the backup
> >>>> destination. Maybe it is a network connection issue? Maybe something
> >>>> is wrong with the host? Maybe the disk is full?
> >>>> Which storage are you using for backup? Can you show us the
> >>>> corresponding entry in /etc/pve/storage.cfg?
> >>>
> >>> We are backing up to cephfs, which still has about 8 TB free.
> >>>
> >>> /etc/pve/storage.cfg is
> >>> ------------
> >>> dir: local
> >>>          path /var/lib/vz
> >>>          content vztmpl,backup,iso
> >>>
> >>> dir: data
> >>>          path /data
> >>>          content snippets,images,backup,iso,rootdir,vztmpl
> >>>
> >>> cephfs: cephfs
> >>>          path /mnt/pve/cephfs
> >>>          content backup,vztmpl,iso
> >>>          maxfiles 5
> >>>
> >>> rbd: ceph-rbd
> >>>          content images,rootdir
> >>>          krbd 0
> >>>          pool pve-pool1
> >>> ------------
> >>>
> >>
> >> The problem has reached a new level of urgency: for the past two days,
> >> each time after a failed backup the VM becomes inaccessible and has to
> >> be stopped and started manually from the PVE UI.
> >
> > I don't see anything wrong with the configuration that you shared.
> > Was anything changed in the last few days since the last successful
> > backup? Any updates from Proxmox? Changes to the network?
> > I know very little about Ceph and clusters, sorry.
> > What makes this VM different, except for the size of the disks?
>
> On December 1st the hypervisor was updated to PVE 6.3-2 (I think from
> 6.1-3).  After that the error message changed slightly and - in
> hindsight - since then the VM has stopped being accessible after each
> failed backup.
>
> However: the VM has never backed up successfully, not even before the
> PVE upgrade.  It's just that no one really took notice of it.
>
> The VM is not really special.  It's our only Debian VM (but I hope
> that's not an issue :-) and it was migrated 1:1 from oVirt by exporting
> and importing the disk images.  But we have a few other such VMs and
> they run and back up just fine.
>
> No network changes.  Basically nothing else changed that I can think of.
>
> But to be clear: Our current main problem is the failing backup, not the
> crash.
>
>
> Cheers, Frank
>
>
>
>
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>
> --
Sent from Gmail Mobile