From: aderumier@odiso.com
To: pve-devel <pve-devel@pve.proxmox.com>
Subject: Re: [pve-devel] qemu live migration: bigger downtime recently
Date: Sat, 23 Jan 2021 09:38:16 +0100	[thread overview]
Message-ID: <8cdf4d6536899de1c6a6a43ff7fa21e28ac87331.camel@odiso.com> (raw)
In-Reply-To: <50ed1cad64907f845b0b545fdebf3af8ede41c7b.camel@odiso.com>

About the qemu version:

These VMs were started around 6 November, after an update of the qemu
package on 4 November.

Looking at the Proxmox repo, I think it should be 5.1.0-4 or -5.


pve-qemu-kvm-dbg_5.1.0-4_amd64.deb    29-Oct-2020 17:28    75705544
pve-qemu-kvm-dbg_5.1.0-5_amd64.deb    04-Nov-2020 17:41    75737556
pve-qemu-kvm-dbg_5.1.0-6_amd64.deb    05-Nov-2020 18:08    75693264


Could it be a known bug introduced by the new backup dirty-bitmap
patches, and fixed later? (I see a -6 version released one day later.)
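
To rule out a package/binary mismatch (a VM keeps running the binary it
was started with), the running version can be read over QMP. A minimal
sketch, using the same mon_cmd helper qemu-server already uses (the VM
id is an assumption, pick any affected guest):

    use PVE::QemuServer::Monitor qw(mon_cmd);

    # query-version reports the version of the *running* binary,
    # not the installed package, so it survives package upgrades
    my $vmid = 391;    # assumption: one of the affected VMs
    my $v = mon_cmd($vmid, 'query-version');
    printf "VM %d runs qemu %d.%d.%d (%s)\n", $vmid,
        $v->{qemu}->{major}, $v->{qemu}->{minor}, $v->{qemu}->{micro},
        $v->{package} // 'unknown package';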



On Friday 22 January 2021 at 19:55 +0100, aderumier@odiso.com wrote:
> After some debugging, it seems that it's hanging on
> 
> $stat = mon_cmd($vmid, "query-migrate");
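
For reference, the same query can be reproduced by hand against the
VM's QMP socket while a migration runs, to see whether the QMP call
itself is what blocks. A rough sketch (assumes nothing else holds the
monitor socket open, since QMP accepts only one client at a time):

    use IO::Socket::UNIX;
    use JSON;

    my $vmid = 391;    # assumption: the VM from the logs below
    my $sock = IO::Socket::UNIX->new(
        Peer => "/var/run/qemu-server/${vmid}.qmp") or die "connect: $!\n";

    <$sock>;    # consume the QMP greeting
    print $sock encode_json({ execute => 'qmp_capabilities' }), "\n";
    <$sock>;    # capability negotiation reply
    print $sock encode_json({ execute => 'query-migrate' }), "\n";
    print scalar <$sock>;    # raw JSON reply, one object per line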
> 
> 
> 
> 
> The result of "info migrate" after the end of the migration:
> 
> # info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> clear-bitmap-shift: 18
> Migration status: completed
> total time: 9671 ms
> downtime: 9595 ms
> setup: 74 ms
> transferred ram: 10445790 kbytes
> throughput: 8916.93 mbps
> remaining ram: 0 kbytes
> total ram: 12600392 kbytes
> duplicate: 544936 pages
> skipped: 0 pages
> normal: 2605162 pages
> normal bytes: 10420648 kbytes
> dirty sync count: 2
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> pages-per-second: 296540
> cache size: 2147483648 bytes
> xbzrle transferred: 0 kbytes
> xbzrle pages: 0 pages
> xbzrle cache miss: 0 pages
> xbzrle cache miss rate: 0.00
> xbzrle encoding rate: 0.00
> xbzrle overflow: 0
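
What stands out above: nearly the whole migration is accounted as
downtime (9595 ms out of 9671 ms total). The same numbers are exposed
by query-migrate, so the ratio could be logged directly once the status
flips to completed. Just a sketch, reusing mon_cmd:

    my $stat = mon_cmd($vmid, 'query-migrate');
    if (($stat->{status} // '') eq 'completed') {
        # downtime, total-time and setup-time are all in milliseconds
        printf "downtime %d ms of %d ms total (setup %d ms), %.0f%%\n",
            $stat->{downtime}, $stat->{'total-time'}, $stat->{'setup-time'},
            100 * $stat->{downtime} / $stat->{'total-time'};
    }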
> 
> 
> 
> 
> On Friday 22 January 2021 at 16:06 +0100, aderumier@odiso.com wrote:
> > I tried to add a log line to display the current status of the
> > migration, and it never catches any "active" state, only
> > "completed" directly.
> > 
> > Here is another sample with a bigger downtime of 14 s (real
> > downtime, I checked with a ping to be sure):
> > 
> > 
> > 
> > 2021-01-22 16:02:53 starting migration of VM 391 to node 'kvm13' (10.3.94.70)
> > 2021-01-22 16:02:53 starting VM 391 on remote node 'kvm13'
> > 2021-01-22 16:02:55 start remote tunnel
> > 2021-01-22 16:02:56 ssh tunnel ver 1
> > 2021-01-22 16:02:56 starting online/live migration on tcp:10.3.94.70:60000
> > 2021-01-22 16:02:56 set migration_caps
> > 2021-01-22 16:02:56 migration speed limit: 8589934592 B/s
> > 2021-01-22 16:02:56 migration downtime limit: 100 ms
> > 2021-01-22 16:02:56 migration cachesize: 2147483648 B
> > 2021-01-22 16:02:56 set migration parameters
> > 2021-01-22 16:02:56 start migrate command to tcp:10.3.94.70:60000
> > 
> > 2021-01-22 16:03:11 status: completed  ---> added log
> > 2021-01-22 16:03:11 migration speed: 1092.27 MB/s - downtime 14424 ms
> > 2021-01-22 16:03:11 migration status: completed
> > 2021-01-22 16:03:14 migration finished successfully (duration 00:00:21)
> > TASK OK
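
(Side note on the limits logged above: the speed and downtime limits
map to plain QMP migration parameters, so what qemu actually has
configured can be read back; if the 100 ms downtime-limit were not
applied, this would show it. A sketch, parameter names as in qemu 5.1:)

    # read back what qemu has configured for this migration
    my $params = mon_cmd($vmid, 'query-migrate-parameters');
    printf "max-bandwidth %d B/s, downtime-limit %d ms\n",
        $params->{'max-bandwidth'}, $params->{'downtime-limit'};

The excerpt below is the polling loop (from qemu-server's
QemuMigrate.pm) around the added log line: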
> > 
> > 
> > 
> >     my $merr = $@;
> >     $self->log('info', "migrate uri => $ruri failed: $merr") if $merr;
> > 
> >     my $lstat = 0;
> >     my $usleep = 1000000;
> >     my $i = 0;
> >     my $err_count = 0;
> >     my $lastrem = undef;
> >     my $downtimecounter = 0;
> >     while (1) {
> >         $i++;
> >         my $avglstat = $lstat ? $lstat / $i : 0;
> > 
> >         usleep($usleep);
> >         my $stat;
> >         eval {
> >             $stat = mon_cmd($vmid, "query-migrate");
> >         };
> >         if (my $err = $@) {
> >             $err_count++;
> >             warn "query migrate failed: $err\n";
> >             $self->log('info', "query migrate failed: $err");
> >             if ($err_count <= 5) {
> >                 usleep(1000000);
> >                 next;
> >             }
> >             die "too many query migrate failures - aborting\n";
> >         }
> > 
> >         $self->log('info', "status: $stat->{status}");    # ---> added log
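
Since $usleep is a full second, any state shorter than ~1 s between
polls is invisible; but 14 s of downtime should span many polls, so
getting only "completed" points at the query (or the migrate command
before it) blocking, as above. A throwaway instrumentation sketch to
tell sleep time from query time (Time::HiRes is already used by this
module):

    use Time::HiRes qw(time usleep);

    my $t0 = time();
    usleep($usleep);
    my $t1 = time();
    my $stat = eval { mon_cmd($vmid, 'query-migrate') };
    my $t2 = time();
    # if the sleep stays ~1 s but the query takes seconds, the QMP
    # call is what blocks, not the migration state machine
    $self->log('info', sprintf("sleep %.2f s, query-migrate %.2f s, status %s",
        $t1 - $t0, $t2 - $t1, $stat ? $stat->{status} : 'n/a'));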
> > 
> > 
> > On Friday 22 January 2021 at 15:34 +0100, aderumier@odiso.com wrote:
> > > Hi,
> > > 
> > > I have recently noticed bigger downtime on qemu live migration.
> > > (I'm not sure whether it started after a qemu update or a
> > > qemu-server update.)
> > > 
> > > migration: type=insecure
> > > 
> > >  qemu-server                          6.3-2
> > >  pve-qemu-kvm                         5.1.0-7
> > > 
> > > (I'm not sure which qemu version the machines are actually
> > > running.)
> > > 
> > > 
> > > 
> > > Here is a sample:
> > > 
> > > 
> > > 
> > > 2021-01-22 15:28:38 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
> > > 2021-01-22 15:28:42 starting VM 226 on remote node 'kvm13'
> > > 2021-01-22 15:28:44 start remote tunnel
> > > 2021-01-22 15:28:45 ssh tunnel ver 1
> > > 2021-01-22 15:28:45 starting online/live migration on tcp:10.3.94.70:60000
> > > 2021-01-22 15:28:45 set migration_caps
> > > 2021-01-22 15:28:45 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:28:45 migration downtime limit: 100 ms
> > > 2021-01-22 15:28:45 migration cachesize: 268435456 B
> > > 2021-01-22 15:28:45 set migration parameters
> > > 2021-01-22 15:28:45 start migrate command to tcp:10.3.94.70:60000
> > > 2021-01-22 15:28:47 migration speed: 1024.00 MB/s - downtime 2117 ms
> > > 2021-01-22 15:28:47 migration status: completed
> > > 2021-01-22 15:28:51 migration finished successfully (duration 00:00:13)
> > > TASK OK
> > > 
> > > That's strange, because I don't see the memory transfer loop
> > > logs.
> > > 
> > > Migrating back to the original host works:
> > > 
> > > 2021-01-22 15:29:34 starting migration of VM 226 to node 'kvm2' (::ffff:10.3.94.50)
> > > 2021-01-22 15:29:36 starting VM 226 on remote node 'kvm2'
> > > 2021-01-22 15:29:39 start remote tunnel
> > > 2021-01-22 15:29:40 ssh tunnel ver 1
> > > 2021-01-22 15:29:40 starting online/live migration on tcp:[::ffff:10.3.94.50]:60000
> > > 2021-01-22 15:29:40 set migration_caps
> > > 2021-01-22 15:29:40 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:29:40 migration downtime limit: 100 ms
> > > 2021-01-22 15:29:40 migration cachesize: 268435456 B
> > > 2021-01-22 15:29:40 set migration parameters
> > > 2021-01-22 15:29:40 start migrate command to tcp:[::ffff:10.3.94.50]:60000
> > > 2021-01-22 15:29:41 migration status: active (transferred 396107554, remaining 1732018176), total 2165383168)
> > > 2021-01-22 15:29:41 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:42 migration status: active (transferred 973010921, remaining 1089216512), total 2165383168)
> > > 2021-01-22 15:29:42 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:43 migration status: active (transferred 1511925476, remaining 483463168), total 2165383168)
> > > 2021-01-22 15:29:43 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:44 migration speed: 512.00 MB/s - downtime 148 ms
> > > 2021-01-22 15:29:44 migration status: completed
> > > 2021-01-22 15:29:47 migration finished successfully (duration 00:00:13)
> > > TASK OK
> > > 
> > > 
> > > Then migrating it again, like the first migration, works too:
> > > 
> > > 
> > > 2021-01-22 15:31:07 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
> > > 2021-01-22 15:31:10 starting VM 226 on remote node 'kvm13'
> > > 2021-01-22 15:31:12 start remote tunnel
> > > 2021-01-22 15:31:13 ssh tunnel ver 1
> > > 2021-01-22 15:31:13 starting online/live migration on tcp:10.3.94.70:60000
> > > 2021-01-22 15:31:13 set migration_caps
> > > 2021-01-22 15:31:13 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:31:13 migration downtime limit: 100 ms
> > > 2021-01-22 15:31:13 migration cachesize: 268435456 B
> > > 2021-01-22 15:31:13 set migration parameters
> > > 2021-01-22 15:31:13 start migrate command to tcp:10.3.94.70:60000
> > > 2021-01-22 15:31:14 migration status: active (transferred 1092088188, remaining 944365568), total 2165383168)
> > > 2021-01-22 15:31:14 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:31:15 migration speed: 1024.00 MB/s - downtime 55 ms
> > > 2021-01-22 15:31:15 migration status: completed
> > > 2021-01-22 15:31:19 migration finished successfully (duration 00:00:12)
> > > TASK OK
> > > 
> > > 
> > > Any idea? Maybe a bug in a specific qemu version?
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> 
> 




