public inbox for pve-devel@lists.proxmox.com
* [pve-devel] qemu live migration: bigger downtime recently
@ 2021-01-22 14:34 aderumier
  2021-01-22 15:06 ` aderumier
  0 siblings, 1 reply; 6+ messages in thread
From: aderumier @ 2021-01-22 14:34 UTC (permalink / raw)
  To: pve-devel

Hi,

I have noticed bigger downtime on qemu live migration recently.
(I'm not sure whether it started after a qemu update or a qemu-server update.)

migration: type=insecure

 qemu-server                          6.3-2  
 pve-qemu-kvm                         5.1.0-7   

(I'm not sure which qemu version the running machine was started with.)



Here is a sample:



2021-01-22 15:28:38 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
2021-01-22 15:28:42 starting VM 226 on remote node 'kvm13'
2021-01-22 15:28:44 start remote tunnel
2021-01-22 15:28:45 ssh tunnel ver 1
2021-01-22 15:28:45 starting online/live migration on tcp:10.3.94.70:60000
2021-01-22 15:28:45 set migration_caps
2021-01-22 15:28:45 migration speed limit: 8589934592 B/s
2021-01-22 15:28:45 migration downtime limit: 100 ms
2021-01-22 15:28:45 migration cachesize: 268435456 B
2021-01-22 15:28:45 set migration parameters
2021-01-22 15:28:45 start migrate command to tcp:10.3.94.70:60000
2021-01-22 15:28:47 migration speed: 1024.00 MB/s - downtime 2117 ms
2021-01-22 15:28:47 migration status: completed
2021-01-22 15:28:51 migration finished successfully (duration 00:00:13)
TASK OK

That's strange, because I don't see the memory transfer loop logs.



Migrating back to the original host works:

2021-01-22 15:29:34 starting migration of VM 226 to node 'kvm2' (::ffff:10.3.94.50)
2021-01-22 15:29:36 starting VM 226 on remote node 'kvm2'
2021-01-22 15:29:39 start remote tunnel
2021-01-22 15:29:40 ssh tunnel ver 1
2021-01-22 15:29:40 starting online/live migration on tcp:[::ffff:10.3.94.50]:60000
2021-01-22 15:29:40 set migration_caps
2021-01-22 15:29:40 migration speed limit: 8589934592 B/s
2021-01-22 15:29:40 migration downtime limit: 100 ms
2021-01-22 15:29:40 migration cachesize: 268435456 B
2021-01-22 15:29:40 set migration parameters
2021-01-22 15:29:40 start migrate command to tcp:[::ffff:10.3.94.50]:60000
2021-01-22 15:29:41 migration status: active (transferred 396107554, remaining 1732018176), total 2165383168)
2021-01-22 15:29:41 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:29:42 migration status: active (transferred 973010921, remaining 1089216512), total 2165383168)
2021-01-22 15:29:42 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:29:43 migration status: active (transferred 1511925476, remaining 483463168), total 2165383168)
2021-01-22 15:29:43 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:29:44 migration speed: 512.00 MB/s - downtime 148 ms
2021-01-22 15:29:44 migration status: completed
2021-01-22 15:29:47 migration finished successfully (duration 00:00:13)
TASK OK


Then migrating it again, in the same direction as the first migration, works too:


2021-01-22 15:31:07 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
2021-01-22 15:31:10 starting VM 226 on remote node 'kvm13'
2021-01-22 15:31:12 start remote tunnel
2021-01-22 15:31:13 ssh tunnel ver 1
2021-01-22 15:31:13 starting online/live migration on tcp:10.3.94.70:60000
2021-01-22 15:31:13 set migration_caps
2021-01-22 15:31:13 migration speed limit: 8589934592 B/s
2021-01-22 15:31:13 migration downtime limit: 100 ms
2021-01-22 15:31:13 migration cachesize: 268435456 B
2021-01-22 15:31:13 set migration parameters
2021-01-22 15:31:13 start migrate command to tcp:10.3.94.70:60000
2021-01-22 15:31:14 migration status: active (transferred 1092088188, remaining 944365568), total 2165383168)
2021-01-22 15:31:14 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:31:15 migration speed: 1024.00 MB/s - downtime 55 ms
2021-01-22 15:31:15 migration status: completed
2021-01-22 15:31:19 migration finished successfully (duration 00:00:12)
TASK OK


Any idea? Maybe a bug in a specific qemu version?








* Re: [pve-devel] qemu live migration: bigger downtime recently
  2021-01-22 14:34 [pve-devel] qemu live migration: bigger downtime recently aderumier
@ 2021-01-22 15:06 ` aderumier
  2021-01-22 18:55   ` aderumier
  0 siblings, 1 reply; 6+ messages in thread
From: aderumier @ 2021-01-22 15:06 UTC (permalink / raw)
  To: pve-devel

I have tried to add a log line to display the current status of the
migration, and it doesn't catch any "active" state, only "completed"
directly.

Here is another sample with a bigger downtime of 14 s (real downtime, I
checked with a ping to be sure):



2021-01-22 16:02:53 starting migration of VM 391 to node 'kvm13' (10.3.94.70)
2021-01-22 16:02:53 starting VM 391 on remote node 'kvm13'
2021-01-22 16:02:55 start remote tunnel
2021-01-22 16:02:56 ssh tunnel ver 1
2021-01-22 16:02:56 starting online/live migration on tcp:10.3.94.70:60000
2021-01-22 16:02:56 set migration_caps
2021-01-22 16:02:56 migration speed limit: 8589934592 B/s
2021-01-22 16:02:56 migration downtime limit: 100 ms
2021-01-22 16:02:56 migration cachesize: 2147483648 B
2021-01-22 16:02:56 set migration parameters
2021-01-22 16:02:56 start migrate command to tcp:10.3.94.70:60000

2021-01-22 16:03:11 status: completed ---> added log
2021-01-22 16:03:11 migration speed: 1092.27 MB/s - downtime 14424 ms
2021-01-22 16:03:11 migration status: completed
2021-01-22 16:03:14 migration finished successfully (duration 00:00:21)
TASK OK



    # (excerpt from the qemu-server migration code: the polling loop that
    #  reads the migration status once per second via QMP)
    my $merr = $@;
    $self->log('info', "migrate uri => $ruri failed: $merr") if $merr;

    my $lstat = 0;
    my $usleep = 1000000;
    my $i = 0;
    my $err_count = 0;
    my $lastrem = undef;
    my $downtimecounter = 0;
    while (1) {
        $i++;
        my $avglstat = $lstat ? $lstat / $i : 0;

        usleep($usleep);
        my $stat;
        eval {
            $stat = mon_cmd($vmid, "query-migrate");
        };
        if (my $err = $@) {
            $err_count++;
            warn "query migrate failed: $err\n";
            $self->log('info', "query migrate failed: $err");
            if ($err_count <= 5) {
                usleep(1000000);
                next;
            }
            die "too many query migrate failures - aborting\n";
        }

        $self->log('info', "status: $stat->{status}");   # ---> added log
        # ... rest of the status handling loop unchanged ...


On Friday, 22 January 2021 at 15:34 +0100, aderumier@odiso.com wrote:
> [...]





* Re: [pve-devel] qemu live migration: bigger downtime recently
  2021-01-22 15:06 ` aderumier
@ 2021-01-22 18:55   ` aderumier
  2021-01-23  8:38     ` aderumier
  0 siblings, 1 reply; 6+ messages in thread
From: aderumier @ 2021-01-22 18:55 UTC (permalink / raw)
  To: pve-devel

After some debugging, it seems that it's hanging on

$stat = mon_cmd($vmid, "query-migrate");
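
A rough way to confirm this is to time the call itself; a sketch (only the
timer and the extra log line are additions to the loop quoted in my
previous mail, and the log wording is just illustrative):

    use Time::HiRes qw(gettimeofday tv_interval);

    # inside the polling loop, around the existing query-migrate call
    my $t0 = [gettimeofday];
    my $stat;
    eval {
        $stat = mon_cmd($vmid, "query-migrate");
    };
    # if the very first iteration already reports ~14s here, the QMP call
    # itself blocks for the whole migration instead of returning a series
    # of "active" states
    $self->log('info', sprintf("query-migrate returned after %.3fs", tv_interval($t0)));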




Result of info migrate after the end of a migration:

# info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 9671 ms
downtime: 9595 ms
setup: 74 ms
transferred ram: 10445790 kbytes
throughput: 8916.93 mbps
remaining ram: 0 kbytes
total ram: 12600392 kbytes
duplicate: 544936 pages
skipped: 0 pages
normal: 2605162 pages
normal bytes: 10420648 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 296540
cache size: 2147483648 bytes
xbzrle transferred: 0 kbytes
xbzrle pages: 0 pages
xbzrle cache miss: 0 pages
xbzrle cache miss rate: 0.00
xbzrle encoding rate: 0.00
xbzrle overflow: 0




On Friday, 22 January 2021 at 16:06 +0100, aderumier@odiso.com wrote:
> [...]





* Re: [pve-devel] qemu live migration: bigger downtime recently
  2021-01-22 18:55   ` aderumier
@ 2021-01-23  8:38     ` aderumier
  2021-01-25  8:47       ` Fabian Grünbichler
  0 siblings, 1 reply; 6+ messages in thread
From: aderumier @ 2021-01-23  8:38 UTC (permalink / raw)
  To: pve-devel

About the qemu version:

These VMs were started around 6 November, after an update of the qemu
package on 4 November.

Looking at the Proxmox repo, I think it should be 5.1.0-4 or -5.


pve-qemu-kvm-dbg_5.1.0-4_amd64.deb                 29-Oct-2020 17:28   75705544
pve-qemu-kvm-dbg_5.1.0-5_amd64.deb                 04-Nov-2020 17:41   75737556
pve-qemu-kvm-dbg_5.1.0-6_amd64.deb                 05-Nov-2020 18:08   75693264
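
To double-check which build a running VM actually uses, one could also ask
the process itself over QMP; a rough sketch (assuming qemu-server's monitor
helper lives under this module path, and that query-version's "package"
field carries the pve-qemu-kvm build string on our builds):

    use strict;
    use warnings;
    use PVE::QemuServer::Monitor qw(mon_cmd);

    my $vmid = 391;    # one of the affected VMs from the logs above

    # query-version reports the qemu version the process was started with,
    # independent of whatever package is installed on the node by now
    my $v = mon_cmd($vmid, 'query-version');
    printf "qemu %d.%d.%d (%s)\n",
        $v->{qemu}->{major}, $v->{qemu}->{minor}, $v->{qemu}->{micro},
        $v->{package} // '';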


Could it be a known bug introduced by the new backup dirty-bitmap patches,
and fixed later? (I see a -6 version one day later.)



On Friday, 22 January 2021 at 19:55 +0100, aderumier@odiso.com wrote:
> [...]





* Re: [pve-devel] qemu live migration: bigger downtime recently
  2021-01-23  8:38     ` aderumier
@ 2021-01-25  8:47       ` Fabian Grünbichler
  2021-01-25  9:26         ` aderumier
  0 siblings, 1 reply; 6+ messages in thread
From: Fabian Grünbichler @ 2021-01-25  8:47 UTC (permalink / raw)
  To: pve-devel

On January 23, 2021 9:38 am, aderumier@odiso.com wrote:
> About the qemu version:
> 
> These VMs were started around 6 November, after an update of the qemu
> package on 4 November.
> 
> Looking at the Proxmox repo, I think it should be 5.1.0-4 or -5.
> 
> pve-qemu-kvm-dbg_5.1.0-4_amd64.deb                 29-Oct-2020 17:28   75705544
> pve-qemu-kvm-dbg_5.1.0-5_amd64.deb                 04-Nov-2020 17:41   75737556
> pve-qemu-kvm-dbg_5.1.0-6_amd64.deb                 05-Nov-2020 18:08   75693264
> 
> Could it be a known bug introduced by the new backup dirty-bitmap patches,
> and fixed later? (I see a -6 version one day later.)
> 

pve-qemu-kvm (5.1.0-6) pve; urgency=medium

  * migration/block-dirty-bitmap: avoid telling QEMU that the bitmap migration
    is active longer than required

 -- Proxmox Support Team <support@proxmox.com>  Thu, 05 Nov 2020 18:59:40 +0100

Sounds like that could be the case? ;)





* Re: [pve-devel] qemu live migration: bigger downtime recently
  2021-01-25  8:47       ` Fabian Grünbichler
@ 2021-01-25  9:26         ` aderumier
  0 siblings, 0 replies; 6+ messages in thread
From: aderumier @ 2021-01-25  9:26 UTC (permalink / raw)
  To: Proxmox VE development discussion

> 
> pve-qemu-kvm (5.1.0-6) pve; urgency=medium
> 
>   * migration/block-dirty-bitmap: avoid telling QEMU that the bitmap migration
>     is active longer than required
> 
>  -- Proxmox Support Team <support@proxmox.com>  Thu, 05 Nov 2020 18:59:40 +0100
> 
> Sounds like that could be the case? ;)


Yes, I was not sure about this.
So I was just out of luck when I upgraded ^_^


I have tried to change dirty-bitmaps=0 in set_migration_caps, but it
doesn't fix it.
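
For reference, on the QMP level that change should amount to disabling just
this one capability on the source VM before the migrate starts; a rough
sketch (assuming qemu-server's monitor helper, with "dirty-bitmaps" being
the standard QMP migration capability name):

    use PVE::QemuServer::Monitor qw(mon_cmd);
    use JSON;

    my $vmid = 391;    # example VMID

    # turn off only the dirty-bitmaps migration capability; qemu-server's
    # set_migration_caps normally sends the complete capability list at once
    mon_cmd($vmid, "migrate-set-capabilities", capabilities => [
        { capability => 'dirty-bitmaps', state => JSON::false },
    ]);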

So I guess I'll just have to plan some migrations with downtime.

Thanks for your response!

Alexandre








end of thread

Thread overview: 6+ messages
2021-01-22 14:34 [pve-devel] qemu live migration: bigger downtime recently aderumier
2021-01-22 15:06 ` aderumier
2021-01-22 18:55   ` aderumier
2021-01-23  8:38     ` aderumier
2021-01-25  8:47       ` Fabian Grünbichler
2021-01-25  9:26         ` aderumier
