* [pve-devel] migrate local -> drbd fails with vanished job
@ 2021-11-26 13:03 Roland Kammerer
From: Roland Kammerer @ 2021-11-26 13:03 UTC
  To: pve-devel

Dear PVE devs,

While most of our users start with fresh VMs on DRBD storage, from time
to time people try to migrate a local VM to DRBD storage. This currently fails.
Migrating VMs from DRBD to DRBD works.

I added some debug code to PVE/QemuServer.pm, which looks like the place where
things go wrong, or at least where I saw them go wrong:

root@pve:/usr/share/perl5/PVE# diff -Nur QemuServer.pm{.orig,}
--- QemuServer.pm.orig  2021-11-26 11:27:28.879989894 +0100
+++ QemuServer.pm       2021-11-26 11:26:30.490988789 +0100
@@ -7390,6 +7390,8 @@
     $completion //= 'complete';
     $op //= "mirror";

+    print "$vmid, $vmiddst, $jobs, $completion, $qga, $op \n";
+    { use Data::Dumper; print Dumper($jobs); };
     eval {
        my $err_complete = 0;

@@ -7419,6 +7421,7 @@
                    next;
                }

+               print "vanished: $vanished\n"; # same as !defined($job)
                die "$job_id: '$op' has been cancelled\n" if !defined($job);

                my $busy = $job->{busy};
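
The first debug print interpolates $qga directly, so Perl warns whenever that
argument is undefined. A minimal sketch of a warning-free variant follows. The
helper name format_debug_args is hypothetical and not part of PVE; it only
illustrates mapping undefined values to a visible placeholder before printing.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper, not part of PVE: render a list of arguments for a
# debug line without triggering "uninitialized value" warnings.
sub format_debug_args {
    my (@args) = @_;
    return join(', ', map {
        !defined($_) ? '<undef>'          # make missing values visible
      : ref($_)      ? ref($_) . ' ref'   # e.g. "HASH ref" instead of an address
      :                $_
    } @args);
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump

# Values modelled on the call seen in this report; $qga is left undefined.
my $jobs = { 'drive-scsi0' => {} };
print format_debug_args(100, 100, $jobs, 'skip', undef, 'mirror'), "\n";
print Dumper($jobs);
```

This keeps the debug output readable and avoids mixing a Perl warning into the
task log.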


With that in place, I try to live migrate the running VM from node "pve" to
"pvf":

2021-11-26 11:29:10 starting migration of VM 100 to node 'pvf' (xx.xx.xx.xx)
2021-11-26 11:29:10 found local disk 'local-lvm:vm-100-disk-0' (in current VM config)
2021-11-26 11:29:10 starting VM 100 on remote node 'pvf'
2021-11-26 11:29:18 volume 'local-lvm:vm-100-disk-0' is 'drbdstorage:vm-100-disk-1' on the target
2021-11-26 11:29:18 start remote tunnel
2021-11-26 11:29:19 ssh tunnel ver 1
2021-11-26 11:29:19 starting storage migration
2021-11-26 11:29:19 scsi0: start migration to nbd:unix:/run/qemu-server/100_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
Use of uninitialized value $qga in concatenation (.) or string at /usr/share/perl5/PVE/QemuServer.pm line 7393.
100, 100, HASH(0x557b44474a80), skip, , mirror
$VAR1 = {
          'drive-scsi0' => {}
        };
vanished: 1
drive-scsi0: Cancelling block job
drive-scsi0: Done.
2021-11-26 11:29:19 ERROR: online migrate failure - block job (mirror) error: drive-scsi0: 'mirror' has been cancelled
2021-11-26 11:29:19 aborting phase 2 - cleanup resources
2021-11-26 11:29:19 migrate_cancel
2021-11-26 11:29:22 ERROR: migration finished with problems (duration 00:00:12)
TASK ERROR: migration problems
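
The sequence in the log (a job entry in $jobs, "vanished: 1", then the
"cancelled" error) can be modelled in isolation. The sketch below is a
simplified stand-in, not the PVE code: check_jobs and the $running hash are
hypothetical names, and the branch that precedes the die in the real function
(the "next;" visible in the diff context) is deliberately not modelled. It only
shows that an expected job which is missing from the reported jobs ends in the
same error message.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one pass of the monitor loop; NOT the PVE code.
# $jobs:    jobs the caller is waiting on, keyed by job id
# $running: jobs currently reported as active, keyed by job id (assumption)
# $op:      operation name used in the error message
sub check_jobs {
    my ($jobs, $running, $op) = @_;
    for my $job_id (sort keys %$jobs) {
        my $job = $running->{$job_id};
        my $vanished = !defined($job);
        die "$job_id: '$op' has been cancelled\n" if $vanished;
    }
    return 1;
}

my $jobs = { 'drive-scsi0' => {} };

# Case from the log: the job is expected, but nothing is reported for it.
my $ok = eval { check_jobs($jobs, {}, 'mirror') };
print "error: $@" if !$ok;

# Contrast: the job is still reported, so the check passes.
print "still running\n"
    if check_jobs($jobs, { 'drive-scsi0' => { busy => 1 } }, 'mirror');
```

If this model matches what happens in the real function, the open question is
why the mirror job is already gone when the monitor first looks for it.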

What I also see on "pvf" is that the plugin actually creates the DRBD block
device, and "something" even tries to write data to it, as the DRBD device
auto-promotes to Primary.

Any hints on how I can debug this further? The block device should be ready at
that point. What is going on in the background here?

FWIW the plugin can be found here:
https://github.com/linbit/linstor-proxmox

Regards, rck


