From: aderumier@odiso.com
To: pve-devel <pve-devel@pve.proxmox.com>
Subject: Re: [pve-devel] qemu live migration: bigger downtime recently
Date: Sat, 23 Jan 2021 09:38:16 +0100
Message-ID: <8cdf4d6536899de1c6a6a43ff7fa21e28ac87331.camel@odiso.com>
In-Reply-To: <50ed1cad64907f845b0b545fdebf3af8ede41c7b.camel@odiso.com>
About the qemu version: these VMs were started around 6 November, after
an update of the qemu package on 4 November.
Looking at the proxmox repo, I think it should be 5.1.0-4 or -5.
pve-qemu-kvm-dbg_5.1.0-4_amd64.deb   29-Oct-2020 17:28   75705544
pve-qemu-kvm-dbg_5.1.0-5_amd64.deb   04-Nov-2020 17:41   75737556
pve-qemu-kvm-dbg_5.1.0-6_amd64.deb   05-Nov-2020 18:08   75693264
Could it be a known bug introduced by the new backup dirty-bitmap
patches, and fixed later? (I see a -6 version one day later.)
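
To double-check which binary these VMs are actually running, something
like this should work (untested sketch; assuming mon_cmd from
PVE::QemuServer::Monitor and a defined $vmid, as in the migration loop
quoted below):

use PVE::QemuServer::Monitor qw(mon_cmd);

# "query-version" is a standard QMP command; the build/package string
# should show which pve-qemu-kvm package the process was started with
my $v = mon_cmd($vmid, 'query-version');
printf "running qemu: %s (%d.%d.%d)\n",
    $v->{package}, $v->{qemu}->{major}, $v->{qemu}->{minor}, $v->{qemu}->{micro};
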
On Friday 22 January 2021 at 19:55 +0100, aderumier@odiso.com wrote:
> after some debugging, it seems that it's hanging on
>
> $stat = mon_cmd($vmid, "query-migrate");
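>
> Maybe something like this could confirm whether that call itself blocks
> for the missing seconds (untested sketch; Time::HiRes is already loaded
> for usleep(), $self/$vmid as in the loop below):
>
> use Time::HiRes qw(gettimeofday tv_interval);
>
> my $t0 = [gettimeofday];
> # time a single query-migrate round-trip and log how long it took
> my $stat = eval { mon_cmd($vmid, "query-migrate") };
> $self->log('info', sprintf("query-migrate returned '%s' after %.3f s",
>     $stat ? $stat->{status} : 'error', tv_interval($t0)));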
>
>
>
>
> result of info migrate after the end of a migration:
>
> # info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> clear-bitmap-shift: 18
> Migration status: completed
> total time: 9671 ms
> downtime: 9595 ms
> setup: 74 ms
> transferred ram: 10445790 kbytes
> throughput: 8916.93 mbps
> remaining ram: 0 kbytes
> total ram: 12600392 kbytes
> duplicate: 544936 pages
> skipped: 0 pages
> normal: 2605162 pages
> normal bytes: 10420648 kbytes
> dirty sync count: 2
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> pages-per-second: 296540
> cache size: 2147483648 bytes
> xbzrle transferred: 0 kbytes
> xbzrle pages: 0 pages
> xbzrle cache miss: 0 pages
> xbzrle cache miss rate: 0.00
> xbzrle encoding rate: 0.00
> xbzrle overflow: 0
>
>
>
>
> On Friday 22 January 2021 at 16:06 +0100, aderumier@odiso.com wrote:
> > I have tried to add a log line to display the current status of the
> > migration, and it doesn't catch any "active" state, only "completed"
> > directly.
> >
> > Here is another sample, with a bigger downtime of 14s (real downtime,
> > I checked with a ping to be sure):
> >
> >
> >
> > 2021-01-22 16:02:53 starting migration of VM 391 to node 'kvm13' (10.3.94.70)
> > 2021-01-22 16:02:53 starting VM 391 on remote node 'kvm13'
> > 2021-01-22 16:02:55 start remote tunnel
> > 2021-01-22 16:02:56 ssh tunnel ver 1
> > 2021-01-22 16:02:56 starting online/live migration on tcp:10.3.94.70:60000
> > 2021-01-22 16:02:56 set migration_caps
> > 2021-01-22 16:02:56 migration speed limit: 8589934592 B/s
> > 2021-01-22 16:02:56 migration downtime limit: 100 ms
> > 2021-01-22 16:02:56 migration cachesize: 2147483648 B
> > 2021-01-22 16:02:56 set migration parameters
> > 2021-01-22 16:02:56 start migrate command to tcp:10.3.94.70:60000
> >
> >
> >
> > 2021-01-22 16:03:11 status: completed    ---> added log
> > 2021-01-22 16:03:11 migration speed: 1092.27 MB/s - downtime 14424 ms
> > 2021-01-22 16:03:11 migration status: completed
> > 2021-01-22 16:03:14 migration finished successfully (duration 00:00:21)
> > TASK OK
> >
> >
> >
> >     my $merr = $@;
> >     $self->log('info', "migrate uri => $ruri failed: $merr") if $merr;
> >
> >     my $lstat = 0;
> >     my $usleep = 1000000;
> >     my $i = 0;
> >     my $err_count = 0;
> >     my $lastrem = undef;
> >     my $downtimecounter = 0;
> >     while (1) {
> >         $i++;
> >         my $avglstat = $lstat ? $lstat / $i : 0;
> >
> >         usleep($usleep);
> >         my $stat;
> >         eval {
> >             $stat = mon_cmd($vmid, "query-migrate");
> >         };
> >         if (my $err = $@) {
> >             $err_count++;
> >             warn "query migrate failed: $err\n";
> >             $self->log('info', "query migrate failed: $err");
> >             if ($err_count <= 5) {
> >                 usleep(1000000);
> >                 next;
> >             }
> >             die "too many query migrate failures - aborting\n";
> >         }
> >
> >         $self->log('info', "status: $stat->{status}");    ---> added log
> >
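> > Another rough idea (untested sketch): during the same window, poll
> > "query-status" with a much shorter sleep, to see whether the monitor
> > stays responsive and when the source guest actually stops running:
> >
> > for (my $j = 0; $j < 200; $j++) {
> >     # "query-status" is a standard QMP command; it reports running/paused/postmigrate etc.
> >     my $s = eval { mon_cmd($vmid, "query-status") };
> >     $self->log('info', "vm status: $s->{status}") if $s;
> >     usleep(100_000);    # 100 ms
> > }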
> >
> > On Friday 22 January 2021 at 15:34 +0100, aderumier@odiso.com wrote:
> > > Hi,
> > >
> > > I have recently noticed bigger downtimes on qemu live migration.
> > > (I'm not sure if it started after a qemu update or a qemu-server update.)
> > >
> > > migration: type=insecure
> > >
> > > qemu-server 6.3-2
> > > pve-qemu-kvm 5.1.0-7
> > >
> > > (I'm not sure which qemu version the machine is running.)
> > >
> > >
> > >
> > > Here is a sample:
> > >
> > >
> > >
> > > 2021-01-22 15:28:38 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
> > > 2021-01-22 15:28:42 starting VM 226 on remote node 'kvm13'
> > > 2021-01-22 15:28:44 start remote tunnel
> > > 2021-01-22 15:28:45 ssh tunnel ver 1
> > > 2021-01-22 15:28:45 starting online/live migration on tcp:10.3.94.70:60000
> > > 2021-01-22 15:28:45 set migration_caps
> > > 2021-01-22 15:28:45 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:28:45 migration downtime limit: 100 ms
> > > 2021-01-22 15:28:45 migration cachesize: 268435456 B
> > > 2021-01-22 15:28:45 set migration parameters
> > > 2021-01-22 15:28:45 start migrate command to tcp:10.3.94.70:60000
> > > 2021-01-22 15:28:47 migration speed: 1024.00 MB/s - downtime 2117 ms
> > > 2021-01-22 15:28:47 migration status: completed
> > > 2021-01-22 15:28:51 migration finished successfully (duration 00:00:13)
> > > TASK OK
> > >
> > > That's strange, because I don't see the memory transfer loop logs.
> > >
> > >
> > >
> > > Migrating back to the original host works:
> > >
> > > 2021-01-22 15:29:34 starting migration of VM 226 to node 'kvm2' (::ffff:10.3.94.50)
> > > 2021-01-22 15:29:36 starting VM 226 on remote node 'kvm2'
> > > 2021-01-22 15:29:39 start remote tunnel
> > > 2021-01-22 15:29:40 ssh tunnel ver 1
> > > 2021-01-22 15:29:40 starting online/live migration on tcp:[::ffff:10.3.94.50]:60000
> > > 2021-01-22 15:29:40 set migration_caps
> > > 2021-01-22 15:29:40 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:29:40 migration downtime limit: 100 ms
> > > 2021-01-22 15:29:40 migration cachesize: 268435456 B
> > > 2021-01-22 15:29:40 set migration parameters
> > > 2021-01-22 15:29:40 start migrate command to tcp:[::ffff:10.3.94.50]:60000
> > > 2021-01-22 15:29:41 migration status: active (transferred 396107554, remaining 1732018176), total 2165383168)
> > > 2021-01-22 15:29:41 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:42 migration status: active (transferred 973010921, remaining 1089216512), total 2165383168)
> > > 2021-01-22 15:29:42 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:43 migration status: active (transferred 1511925476, remaining 483463168), total 2165383168)
> > > 2021-01-22 15:29:43 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:44 migration speed: 512.00 MB/s - downtime 148 ms
> > > 2021-01-22 15:29:44 migration status: completed
> > > 2021-01-22 15:29:47 migration finished successfully (duration 00:00:13)
> > > TASK OK
> > >
> > >
> > > Then migrating it again, in the same direction as the first migration, also works:
> > >
> > >
> > > 2021-01-22 15:31:07 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
> > > 2021-01-22 15:31:10 starting VM 226 on remote node 'kvm13'
> > > 2021-01-22 15:31:12 start remote tunnel
> > > 2021-01-22 15:31:13 ssh tunnel ver 1
> > > 2021-01-22 15:31:13 starting online/live migration on tcp:10.3.94.70:60000
> > > 2021-01-22 15:31:13 set migration_caps
> > > 2021-01-22 15:31:13 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:31:13 migration downtime limit: 100 ms
> > > 2021-01-22 15:31:13 migration cachesize: 268435456 B
> > > 2021-01-22 15:31:13 set migration parameters
> > > 2021-01-22 15:31:13 start migrate command to tcp:10.3.94.70:60000
> > > 2021-01-22 15:31:14 migration status: active (transferred 1092088188, remaining 944365568), total 2165383168)
> > > 2021-01-22 15:31:14 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:31:15 migration speed: 1024.00 MB/s - downtime 55 ms
> > > 2021-01-22 15:31:15 migration status: completed
> > > 2021-01-22 15:31:19 migration finished successfully (duration 00:00:12)
> > > TASK OK
> > >
> > >
> > > Any idea? Maybe a bug in a specific qemu version?
> > >
> > >
> > >
> > >
> >
> >
>
>