* [pve-devel] qemu live migration: bigger downtime recently
@ 2021-01-22 14:34 aderumier
  2021-01-22 15:06 ` aderumier
  0 siblings, 1 reply; 6+ messages in thread
From: aderumier @ 2021-01-22 14:34 UTC (permalink / raw)
  To: pve-devel

Hi,

I have noticed bigger downtime recently on qemu live migrations.
(I'm not sure if it's due to a qemu update or a qemu-server update.)

migration: type=insecure

 qemu-server                          6.3-2  
 pve-qemu-kvm                         5.1.0-7   

(I'm not sure which qemu version the machine is actually running.)



Here is a sample:



2021-01-22 15:28:38 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
2021-01-22 15:28:42 starting VM 226 on remote node 'kvm13'
2021-01-22 15:28:44 start remote tunnel
2021-01-22 15:28:45 ssh tunnel ver 1
2021-01-22 15:28:45 starting online/live migration on tcp:10.3.94.70:60000
2021-01-22 15:28:45 set migration_caps
2021-01-22 15:28:45 migration speed limit: 8589934592 B/s
2021-01-22 15:28:45 migration downtime limit: 100 ms
2021-01-22 15:28:45 migration cachesize: 268435456 B
2021-01-22 15:28:45 set migration parameters
2021-01-22 15:28:45 start migrate command to tcp:10.3.94.70:60000
2021-01-22 15:28:47 migration speed: 1024.00 MB/s - downtime 2117 ms
2021-01-22 15:28:47 migration status: completed
2021-01-22 15:28:51 migration finished successfully (duration 00:00:13)
TASK OK

That's strange, because I don't see the memory transfer loop logs.



Migrating back to the original host works:

2021-01-22 15:29:34 starting migration of VM 226 to node 'kvm2' (::ffff:10.3.94.50)
2021-01-22 15:29:36 starting VM 226 on remote node 'kvm2'
2021-01-22 15:29:39 start remote tunnel
2021-01-22 15:29:40 ssh tunnel ver 1
2021-01-22 15:29:40 starting online/live migration on tcp:[::ffff:10.3.94.50]:60000
2021-01-22 15:29:40 set migration_caps
2021-01-22 15:29:40 migration speed limit: 8589934592 B/s
2021-01-22 15:29:40 migration downtime limit: 100 ms
2021-01-22 15:29:40 migration cachesize: 268435456 B
2021-01-22 15:29:40 set migration parameters
2021-01-22 15:29:40 start migrate command to tcp:[::ffff:10.3.94.50]:60000
2021-01-22 15:29:41 migration status: active (transferred 396107554, remaining 1732018176), total 2165383168)
2021-01-22 15:29:41 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:29:42 migration status: active (transferred 973010921, remaining 1089216512), total 2165383168)
2021-01-22 15:29:42 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:29:43 migration status: active (transferred 1511925476, remaining 483463168), total 2165383168)
2021-01-22 15:29:43 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:29:44 migration speed: 512.00 MB/s - downtime 148 ms
2021-01-22 15:29:44 migration status: completed
2021-01-22 15:29:47 migration finished successfully (duration 00:00:13)
TASK OK


Then migrating it again, like the first migration, works too:


2021-01-22 15:31:07 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
2021-01-22 15:31:10 starting VM 226 on remote node 'kvm13'
2021-01-22 15:31:12 start remote tunnel
2021-01-22 15:31:13 ssh tunnel ver 1
2021-01-22 15:31:13 starting online/live migration on tcp:10.3.94.70:60000
2021-01-22 15:31:13 set migration_caps
2021-01-22 15:31:13 migration speed limit: 8589934592 B/s
2021-01-22 15:31:13 migration downtime limit: 100 ms
2021-01-22 15:31:13 migration cachesize: 268435456 B
2021-01-22 15:31:13 set migration parameters
2021-01-22 15:31:13 start migrate command to tcp:10.3.94.70:60000
2021-01-22 15:31:14 migration status: active (transferred 1092088188, remaining 944365568), total 2165383168)
2021-01-22 15:31:14 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:31:15 migration speed: 1024.00 MB/s - downtime 55 ms
2021-01-22 15:31:15 migration status: completed
2021-01-22 15:31:19 migration finished successfully (duration 00:00:12)
TASK OK


Any idea? Maybe a bug in a specific qemu version?







^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pve-devel] qemu live migration: bigger downtime recently
  2021-01-22 14:34 [pve-devel] qemu live migration: bigger downtime recently aderumier
@ 2021-01-22 15:06 ` aderumier
  2021-01-22 18:55   ` aderumier
  0 siblings, 1 reply; 6+ messages in thread
From: aderumier @ 2021-01-22 15:06 UTC (permalink / raw)
  To: pve-devel

I have tried adding a log line to display the current migration status,
and it never catches any "active" state, only "completed" directly.

Here is another sample with a bigger downtime of 14 s (real downtime, I
checked with a ping to be sure):



2021-01-22 16:02:53 starting migration of VM 391 to node 'kvm13' (10.3.94.70)
2021-01-22 16:02:53 starting VM 391 on remote node 'kvm13'
2021-01-22 16:02:55 start remote tunnel
2021-01-22 16:02:56 ssh tunnel ver 1
2021-01-22 16:02:56 starting online/live migration on tcp:10.3.94.70:60000
2021-01-22 16:02:56 set migration_caps
2021-01-22 16:02:56 migration speed limit: 8589934592 B/s
2021-01-22 16:02:56 migration downtime limit: 100 ms
2021-01-22 16:02:56 migration cachesize: 2147483648 B
2021-01-22 16:02:56 set migration parameters
2021-01-22 16:02:56 start migrate command to tcp:10.3.94.70:60000



2021-01-22 16:03:11 status: completed ---> added log
2021-01-22 16:03:11 migration speed: 1092.27 MB/s - downtime 14424 ms
2021-01-22 16:03:11 migration status: completed
2021-01-22 16:03:14 migration finished successfully (duration 00:00:21)
TASK OK



    my $merr = $@;
    $self->log('info', "migrate uri => $ruri failed: $merr") if $merr;

    my $lstat = 0;
    my $usleep = 1000000;
    my $i = 0;
    my $err_count = 0;
    my $lastrem = undef;
    my $downtimecounter = 0;
    while (1) {
        $i++;
        my $avglstat = $lstat ? $lstat / $i : 0;

        usleep($usleep);
        my $stat;
        eval {
            $stat = mon_cmd($vmid, "query-migrate");
        };
        if (my $err = $@) {
            $err_count++;
            warn "query migrate failed: $err\n";
            $self->log('info', "query migrate failed: $err");
            if ($err_count <= 5) {
                usleep(1000000);
                next;
            }
            die "too many query migrate failures - aborting\n";
        }

        $self->log('info', "status: $stat->{status}");   # <-- added log
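
For what it's worth, a slightly more verbose variant of that added log would dump
the whole query-migrate reply, which makes it easier to see whether the "active"
phase is really skipped or just never observed by the one-second poll (sketch
only; it assumes the JSON module is loadable in this context):

        # sketch only: log the full query-migrate reply, not just the status
        use JSON;
        $self->log('info', "query-migrate reply: " . encode_json($stat));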


On Friday 22 January 2021 at 15:34 +0100, aderumier@odiso.com wrote:
> [...]




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pve-devel] qemu live migration: bigger downtime recently
  2021-01-22 15:06 ` aderumier
@ 2021-01-22 18:55   ` aderumier
  2021-01-23  8:38     ` aderumier
  0 siblings, 1 reply; 6+ messages in thread
From: aderumier @ 2021-01-22 18:55 UTC (permalink / raw)
  To: pve-devel

After some debugging, it seems that it's hanging on

$stat = mon_cmd($vmid, "query-migrate");
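
To double-check that the monitor call itself is what blocks (and not the
surrounding loop), here is a rough standalone sketch that talks to the VM's QMP
socket directly and times a raw query-migrate. It assumes the usual Proxmox
socket path /var/run/qemu-server/<vmid>.qmp and skips any asynchronous events
QEMU may emit in between:

#!/usr/bin/perl
# rough sketch (not qemu-server code): time a raw query-migrate over QMP
use strict;
use warnings;
use IO::Socket::UNIX;
use JSON;
use Time::HiRes qw(time);

my $vmid = shift // die "usage: $0 <vmid>\n";
my $sock = IO::Socket::UNIX->new(Peer => "/var/run/qemu-server/$vmid.qmp")
    or die "cannot connect to QMP socket: $!\n";

<$sock>;                                                  # greeting banner
print $sock encode_json({ execute => 'qmp_capabilities' }) . "\n";
<$sock>;                                                  # capabilities ack

my $t0 = time();
print $sock encode_json({ execute => 'query-migrate' }) . "\n";
my $reply;
do {
    $reply = decode_json(scalar <$sock>);
} until exists $reply->{'return'};                        # skip async events
printf "query-migrate took %.2f s, status=%s\n",
    time() - $t0, $reply->{'return'}->{status} // 'unknown';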




Result of "info migrate" after the end of a migration:

# info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 9671 ms
downtime: 9595 ms
setup: 74 ms
transferred ram: 10445790 kbytes
throughput: 8916.93 mbps
remaining ram: 0 kbytes
total ram: 12600392 kbytes
duplicate: 544936 pages
skipped: 0 pages
normal: 2605162 pages
normal bytes: 10420648 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 296540
cache size: 2147483648 bytes
xbzrle transferred: 0 kbytes
xbzrle pages: 0 pages
xbzrle cache miss: 0 pages
xbzrle cache miss rate: 0.00
xbzrle encoding rate: 0.00
xbzrle overflow: 0




On Friday 22 January 2021 at 16:06 +0100, aderumier@odiso.com wrote:
> [...]




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pve-devel] qemu live migration: bigger downtime recently
  2021-01-22 18:55   ` aderumier
@ 2021-01-23  8:38     ` aderumier
  2021-01-25  8:47       ` Fabian Grünbichler
  0 siblings, 1 reply; 6+ messages in thread
From: aderumier @ 2021-01-23  8:38 UTC (permalink / raw)
  To: pve-devel

About the qemu version:

These VMs were started around 6 November, after an update of the qemu
package on 4 November.


Looking at the proxmox repo, I think it should be 5.1.0-4 or -5.


pve-qemu-kvm-dbg_5.1.0-4_amd64.deb                 29-Oct-2020 17:28   75705544
pve-qemu-kvm-dbg_5.1.0-5_amd64.deb                 04-Nov-2020 17:41   75737556
pve-qemu-kvm-dbg_5.1.0-6_amd64.deb                 05-Nov-2020 18:08   75693264


Could it be a known bug introduced by the new backup dirty-bitmap patches,
and fixed later? (I see a -6 version one day later.)
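
(Side note, and only a sketch reusing the QMP handshake from the query-migrate
timing snippet in my earlier message: asking the running binary itself for its
version avoids guessing from package dates, since a VM started in November is
still running whatever binary it was started with. The "package" field is only
meaningful if the build sets a package version string.)

# after the qmp_capabilities handshake from the earlier sketch:
print $sock encode_json({ execute => 'query-version' }) . "\n";
my $v = decode_json(scalar <$sock>)->{'return'};
printf "running qemu %d.%d.%d (package: %s)\n",
    $v->{qemu}->{major}, $v->{qemu}->{minor}, $v->{qemu}->{micro},
    $v->{package} // 'n/a';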



On Friday 22 January 2021 at 19:55 +0100, aderumier@odiso.com wrote:
> [...]




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pve-devel] qemu live migration: bigger downtime recently
  2021-01-23  8:38     ` aderumier
@ 2021-01-25  8:47       ` Fabian Grünbichler
  2021-01-25  9:26         ` aderumier
  0 siblings, 1 reply; 6+ messages in thread
From: Fabian Grünbichler @ 2021-01-25  8:47 UTC (permalink / raw)
  To: pve-devel

On January 23, 2021 9:38 am, aderumier@odiso.com wrote:
> About the qemu version:
> 
> These VMs were started around 6 November, after an update of the qemu
> package on 4 November.
> 
> 
> Looking at the proxmox repo, I think it should be 5.1.0-4 or -5.
> 
> 
> pve-qemu-kvm-dbg_5.1.0-4_amd64.deb                 29-Oct-2020 17:28   75705544
> pve-qemu-kvm-dbg_5.1.0-5_amd64.deb                 04-Nov-2020 17:41   75737556
> pve-qemu-kvm-dbg_5.1.0-6_amd64.deb                 05-Nov-2020 18:08   75693264
> 
> 
> Could it be a known bug introduced by the new backup dirty-bitmap patches,
> and fixed later? (I see a -6 version one day later.)
> 

pve-qemu-kvm (5.1.0-6) pve; urgency=medium

  * migration/block-dirty-bitmap: avoid telling QEMU that the bitmap migration
    is active longer than required

 -- Proxmox Support Team <support@proxmox.com>  Thu, 05 Nov 2020 18:59:40 +0100

sound like that could be the case? ;)




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pve-devel] qemu live migration: bigger downtime recently
  2021-01-25  8:47       ` Fabian Grünbichler
@ 2021-01-25  9:26         ` aderumier
  0 siblings, 0 replies; 6+ messages in thread
From: aderumier @ 2021-01-25  9:26 UTC (permalink / raw)
  To: Proxmox VE development discussion

> 
> pve-qemu-kvm (5.1.0-6) pve; urgency=medium
> 
>   * migration/block-dirty-bitmap: avoid telling QEMU that the bitmap migration
>     is active longer than required
> 
>  -- Proxmox Support Team <support@proxmox.com>  Thu, 05 Nov 2020 18:59:40 +0100
> 
> Sounds like that could be the case? ;)


Yes, I was not sure about this.
So I was just out of luck when I upgraded ^_^


I have tried setting dirty-bitmaps=0 in set_migration_caps, but it
doesn't fix it.
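
For reference, that toggle boils down to a QMP migrate-set-capabilities call
roughly like the sketch below (layout per the QMP schema; not copied from
qemu-server's actual set_migration_caps):

# sketch: the QMP call a dirty-bitmaps=0 toggle amounts to;
# QMP booleans need JSON::true / JSON::false here
mon_cmd($vmid, "migrate-set-capabilities",
    capabilities => [
        { capability => 'dirty-bitmaps', state => JSON::false },
    ]);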

So, I think I'll just have to plan for some migrations with downtime.

Thanks for your response!

Alexandre







^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-01-25  9:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-22 14:34 [pve-devel] qemu live migration: bigger downtime recently aderumier
2021-01-22 15:06 ` aderumier
2021-01-22 18:55   ` aderumier
2021-01-23  8:38     ` aderumier
2021-01-25  8:47       ` Fabian Grünbichler
2021-01-25  9:26         ` aderumier
