From: Fabian Ebner
To: pve-devel@lists.proxmox.com, Fabian Grünbichler, roland.kammerer@linbit.com
Date: Mon, 29 Nov 2021 09:20:04 +0100
Subject: Re: [pve-devel] migrate local -> drbd fails with vanished job

On 26.11.21 at 14:03, Roland Kammerer wrote:
> Dear PVE devs,
>
> While most of our users start with fresh VMs on DRBD storage, from time
> to time people try to migrate a local VM to DRBD storage. This currently
> fails. Migrating VMs from DRBD to DRBD works.

Hi,
there also is a forum thread about this [0]. It seems like the newly
allocated disks are slightly bigger on the target DRBD storage. Just a
hunch, but it does sound similar to [1].

[0]: https://forum.proxmox.com/threads/online-migration-disk-move-problem.100171
[1]: https://bugzilla.proxmox.com/show_bug.cgi?id=3227#c8
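Not part of the original report, but a quick way to check that hunch could be
to compare what the storage layer reports for the source and the target
volume. A rough sketch, assuming PVE::Storage::volume_size_info() behaves as
in current PVE and the script is run on a node that can access the respective
storage (the volume IDs are just the ones from the migration log below):

#!/usr/bin/perl
# Sketch: print the size PVE sees for the source and the target volume.
# Example volume IDs taken from the migration log; adjust as needed.
use strict;
use warnings;
use PVE::Storage;

my $cfg = PVE::Storage::config();

for my $volid ('local-lvm:vm-100-disk-0', 'drbdstorage:vm-100-disk-1') {
    # volume_size_info() returns the allocated size in bytes
    my $size = eval { PVE::Storage::volume_size_info($cfg, $volid, 10) };
    if (defined($size)) {
        printf "%s: %d bytes\n", $volid, $size;
    } else {
        print "$volid: not accessible from this node ($@)\n";
    }
}

If the DRBD volume really comes out a bit larger than the LVM one, that would
match the situation described in [1].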
> I added some debug code to PVE/QemuServer.pm, which looks like the location
> things go wrong, or at least where I saw them going wrong:
>
> root@pve:/usr/share/perl5/PVE# diff -Nur QemuServer.pm{.orig,}
> --- QemuServer.pm.orig  2021-11-26 11:27:28.879989894 +0100
> +++ QemuServer.pm       2021-11-26 11:26:30.490988789 +0100
> @@ -7390,6 +7390,8 @@
>      $completion //= 'complete';
>      $op //= "mirror";
>
> +    print "$vmid, $vmiddst, $jobs, $completion, $qga, $op \n";
> +    { use Data::Dumper; print Dumper($jobs); };
>      eval {
>          my $err_complete = 0;
>
> @@ -7419,6 +7421,7 @@
>                  next;
>              }
>
> +            print "vanished: $vanished\n"; # same as !defined($job)
>              die "$job_id: '$op' has been cancelled\n" if !defined($job);
>
>              my $busy = $job->{busy};
>
> With that in place, I try to live migrate the running VM from node "pve" to
> "pvf":
>
> 2021-11-26 11:29:10 starting migration of VM 100 to node 'pvf' (xx.xx.xx.xx)
> 2021-11-26 11:29:10 found local disk 'local-lvm:vm-100-disk-0' (in current VM config)
> 2021-11-26 11:29:10 starting VM 100 on remote node 'pvf'
> 2021-11-26 11:29:18 volume 'local-lvm:vm-100-disk-0' is 'drbdstorage:vm-100-disk-1' on the target
> 2021-11-26 11:29:18 start remote tunnel
> 2021-11-26 11:29:19 ssh tunnel ver 1
> 2021-11-26 11:29:19 starting storage migration
> 2021-11-26 11:29:19 scsi0: start migration to nbd:unix:/run/qemu-server/100_nbd.migrate:exportname=drive-scsi0
> drive mirror is starting for drive-scsi0
> Use of uninitialized value $qga in concatenation (.) or string at /usr/share/perl5/PVE/QemuServer.pm line 7393.
> 100, 100, HASH(0x557b44474a80), skip, , mirror
> $VAR1 = {
>           'drive-scsi0' => {}
>         };
> vanished: 1
> drive-scsi0: Cancelling block job
> drive-scsi0: Done.
> 2021-11-26 11:29:19 ERROR: online migrate failure - block job (mirror) error: drive-scsi0: 'mirror' has been cancelled
> 2021-11-26 11:29:19 aborting phase 2 - cleanup resources
> 2021-11-26 11:29:19 migrate_cancel
> 2021-11-26 11:29:22 ERROR: migration finished with problems (duration 00:00:12)
> TASK ERROR: migration problems
>
> What I also see on "pvf" is that the plugin actually creates the DRBD block
> device, and "something" even tries to write data to it, as the DRBD device
> auto-promotes to Primary.
>
> Any hints on how I can debug that further? The block device should be ready
> at that point. What is going on in the background here?
>
> FWIW, the plugin can be found here:
> https://github.com/linbit/linstor-proxmox
>
> Regards, rck
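For context on where that "has been cancelled" error comes from, here is a
simplified sketch (not the actual QemuServer.pm code, names shortened) of
what the monitoring loop around the quoted hunk roughly does: it polls QEMU
for the currently running block jobs and treats a started job that no longer
shows up in the answer as vanished; unless the job was already marked
complete (or the completion mode allows vanishing), that turns into the
error seen in the task log.

# Simplified sketch of the block-job monitoring logic, assuming
# PVE::QemuServer::Monitor::mon_cmd() and QEMU's query-block-jobs command.
use strict;
use warnings;
use PVE::QemuServer::Monitor;

sub poll_mirror_jobs {
    my ($vmid, $jobs, $completion) = @_; # $jobs: hash of started job ids, e.g. 'drive-scsi0'

    # ask QEMU which block jobs it currently knows about
    my $stats = PVE::QemuServer::Monitor::mon_cmd($vmid, 'query-block-jobs');
    my %running = map { $_->{device} => $_ } @$stats;

    for my $job_id (sort keys %$jobs) {
        my $job = $running{$job_id};
        my $vanished = !defined($job);

        # a job we already asked to complete may legitimately be gone
        if ($vanished && ($jobs->{$job_id}->{complete} || $completion eq 'auto')) {
            delete $jobs->{$job_id};
            next;
        }

        # In the log above, "vanished: 1" shows up on the very first poll with
        # completion mode 'skip', so this is the path that aborts the migration.
        die "$job_id: 'mirror' has been cancelled\n" if $vanished;
    }
}

If the mirror job already failed inside QEMU (for example because source and
target sizes do not match) before the first poll, it is simply missing from
query-block-jobs, which would explain why only the rather generic "cancelled"
error surfaces in the migration log.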