From: Roland Kammerer
To: pve-devel@lists.proxmox.com
Date: Fri, 26 Nov 2021 14:03:57 +0100
Message-ID: <20211126130357.GS1745@rck.sh>
Subject: [pve-devel] migrate local -> drbd fails with vanished job

Dear PVE devs,

While most of our users start with fresh VMs on DRBD storage, from time to time people try to migrate a VM from local storage to DRBD storage. This currently fails, while migrating VMs from DRBD to DRBD works.

I added some debug output to PVE/QemuServer.pm, in the place where things seem to go wrong (or at least where I saw them go wrong):

root@pve:/usr/share/perl5/PVE# diff -Nur QemuServer.pm{.orig,}
--- QemuServer.pm.orig  2021-11-26 11:27:28.879989894 +0100
+++ QemuServer.pm       2021-11-26 11:26:30.490988789 +0100
@@ -7390,6 +7390,8 @@
     $completion //= 'complete';
     $op //= "mirror";
 
+    print "$vmid, $vmiddst, $jobs, $completion, $qga, $op \n";
+    { use Data::Dumper; print Dumper($jobs); };
     eval {
         my $err_complete = 0;
 
@@ -7419,6 +7421,7 @@
                 next;
             }
 
+            print "vanished: $vanished\n"; # same as !defined($jobs)
             die "$job_id: '$op' has been cancelled\n" if !defined($job);
 
             my $busy = $job->{busy};

With that in place, I tried to live-migrate the running VM from node "pve" to "pvf":

2021-11-26 11:29:10 starting migration of VM 100 to node 'pvf' (xx.xx.xx.xx)
2021-11-26 11:29:10 found local disk 'local-lvm:vm-100-disk-0' (in current VM config)
2021-11-26 11:29:10 starting VM 100 on remote node 'pvf'
2021-11-26 11:29:18 volume 'local-lvm:vm-100-disk-0' is 'drbdstorage:vm-100-disk-1' on the target
2021-11-26 11:29:18 start remote tunnel
2021-11-26 11:29:19 ssh tunnel ver 1
2021-11-26 11:29:19 starting storage migration
2021-11-26 11:29:19 scsi0: start migration to nbd:unix:/run/qemu-server/100_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
Use of uninitialized value $qga in concatenation (.) or string at /usr/share/perl5/PVE/QemuServer.pm line 7393.
100, 100, HASH(0x557b44474a80), skip, , mirror
$VAR1 = {
          'drive-scsi0' => {}
        };
vanished: 1
drive-scsi0: Cancelling block job
drive-scsi0: Done.
2021-11-26 11:29:19 ERROR: online migrate failure - block job (mirror) error: drive-scsi0: 'mirror' has been cancelled
2021-11-26 11:29:19 aborting phase 2 - cleanup resources
2021-11-26 11:29:19 migrate_cancel
2021-11-26 11:29:22 ERROR: migration finished with problems (duration 00:00:12)
TASK ERROR: migration problems

On "pvf" I can also see that the plugin actually creates the DRBD block device, and that "something" even tries to write data to it, as the DRBD device auto-promotes to Primary.

Any hints on how I can debug this further? The block device should be ready at that point, so what is going on in the background here?

FWIW, the plugin can be found here: https://github.com/linbit/linstor-proxmox

Regards,
rck
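For reference, here is a simplified, hypothetical model of the per-job check that produces the error in the log above. It is not the actual PVE::QemuServer code: the helper name check_jobs and its argument list are invented for illustration, and it assumes the monitor indexes the QMP query-block-jobs reply by its "device" field and treats a job ID from $jobs that is missing there as vanished. With completion 'skip' and no 'complete' flag on the job (the $VAR1 dump shows an empty hash), a vanished job is reported as cancelled, which matches the log. Under that assumption, the mirror job for drive-scsi0 is apparently already gone from QEMU's job list the first time the monitor polls it.

```perl
use strict;
use warnings;

# Hypothetical, simplified model of one iteration of the mirror monitor
# loop; NOT the actual PVE::QemuServer implementation.
# $jobs:  hashref of job IDs that were started, e.g. { 'drive-scsi0' => {} }
# $stats: arrayref shaped like the QMP "query-block-jobs" reply
sub check_jobs {
    my ($jobs, $stats, $op, $completion) = @_;

    # Index the block jobs QEMU still reports, keyed by device/job ID.
    my %running;
    for my $stat (@$stats) {
        next if $stat->{type} ne $op;
        $running{ $stat->{device} } = $stat;
    }

    for my $job_id (sort keys %$jobs) {
        my $job      = $running{$job_id};
        my $vanished = !defined($job);

        # A vanished job is only fine if completion was already requested
        # for it, or if the caller accepts auto-completion.
        if ($vanished && (defined($jobs->{$job_id}->{complete}) || $completion eq 'auto')) {
            delete $jobs->{$job_id};
            next;
        }
        die "$job_id: '$op' has been cancelled\n" if $vanished;
    }

    # Number of jobs still being tracked.
    return scalar(keys %$jobs);
}

# The state from the debug output: one job, none reported by QEMU.
my $jobs = { 'drive-scsi0' => {} };
eval { check_jobs($jobs, [], 'mirror', 'skip') };
print "error: $@";
```

Running it prints the same "'mirror' has been cancelled" message for drive-scsi0 that appears in the task log.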