From: aderumier@odiso.com
To: pve-devel <pve-devel@pve.proxmox.com>
Subject: Re: [pve-devel] qemu live migration: bigger downtime recently
Date: Sat, 23 Jan 2021 09:38:16 +0100 [thread overview]
Message-ID: <8cdf4d6536899de1c6a6a43ff7fa21e28ac87331.camel@odiso.com> (raw)
In-Reply-To: <50ed1cad64907f845b0b545fdebf3af8ede41c7b.camel@odiso.com>
About the qemu version: these VMs were started around 6 November, after an
update of the qemu package on 4 November.
Looking at the Proxmox repo, I think it should be 5.1.0-4 or -5:
pve-qemu-kvm-dbg_5.1.0-4_amd64.deb    29-Oct-2020 17:28    75705544
pve-qemu-kvm-dbg_5.1.0-5_amd64.deb    04-Nov-2020 17:41    75737556
pve-qemu-kvm-dbg_5.1.0-6_amd64.deb    05-Nov-2020 18:08    75693264
Could it be a known bug introduced by the new backup dirty-bitmap patches,
and fixed later? (I see a -6 version one day later.)
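
To double-check which qemu binary these VMs are actually running (as opposed
to what is installed on the node today), something like the following should
work over QMP. This is only a sketch: it assumes it runs on the node hosting
the VM, that mon_cmd can be imported from PVE::QemuServer::Monitor the same
way qemu-server does it, and the VMID is just an example.

#!/usr/bin/perl
# Sketch: ask the running QEMU process for its version via QMP 'query-version'.
use strict;
use warnings;
use PVE::QemuServer::Monitor qw(mon_cmd);   # assumption: same import as in QemuMigrate.pm

my $vmid = 391;                             # example VMID, taken from the logs below
my $ver  = mon_cmd($vmid, 'query-version');
printf "VM %d runs qemu %d.%d.%d (package: %s)\n",
    $vmid,
    $ver->{qemu}->{major}, $ver->{qemu}->{minor}, $ver->{qemu}->{micro},
    $ver->{package} // 'unknown';

That would tell whether the affected VMs are still on the 5.1.0-4/-5 binary
or have already been restarted on -6 or later.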
On Friday, 22 January 2021 at 19:55 +0100, aderumier@odiso.com wrote:
> after some debugging, it seems that it's hanging on
>
> $stat = mon_cmd($vmid, "query-migrate");
>
>
>
>
> result of info migrate after the end of a migration:
>
> # info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> clear-bitmap-shift: 18
> Migration status: completed
> total time: 9671 ms
> downtime: 9595 ms
> setup: 74 ms
> transferred ram: 10445790 kbytes
> throughput: 8916.93 mbps
> remaining ram: 0 kbytes
> total ram: 12600392 kbytes
> duplicate: 544936 pages
> skipped: 0 pages
> normal: 2605162 pages
> normal bytes: 10420648 kbytes
> dirty sync count: 2
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> pages-per-second: 296540
> cache size: 2147483648 bytes
> xbzrle transferred: 0 kbytes
> xbzrle pages: 0 pages
> xbzrle cache miss: 0 pages
> xbzrle cache miss rate: 0.00
> xbzrle encoding rate: 0.00
> xbzrle overflow: 0
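
(For the record, the same numbers are also reachable over QMP instead of the
HMP monitor, while the source VM still exists. A minimal sketch, again
assuming mon_cmd is importable as in QemuMigrate.pm and using an example VMID:)

# Sketch: fetch the final migration stats via QMP 'query-migrate'
# right after the migration has completed.
use strict;
use warnings;
use PVE::QemuServer::Monitor qw(mon_cmd);   # assumption: importable as in qemu-server

my $vmid = 391;                             # example VMID
my $stat = mon_cmd($vmid, 'query-migrate');
printf "status=%s total-time=%s ms downtime=%s ms setup=%s ms\n",
    $stat->{status},
    $stat->{'total-time'} // '?',
    $stat->{downtime}     // '?',
    $stat->{'setup-time'} // '?';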
>
>
>
>
> On Friday, 22 January 2021 at 16:06 +0100, aderumier@odiso.com wrote:
> > I tried adding a log line to display the current status of the
> > migration, and it never catches any "active" state, only "completed"
> > directly.
> >
> > Here is another sample with a bigger downtime of 14 s (real downtime;
> > I checked with a ping to be sure):
> >
> >
> >
> > 2021-01-22 16:02:53 starting migration of VM 391 to node 'kvm13' (10.3.94.70)
> > 2021-01-22 16:02:53 starting VM 391 on remote node 'kvm13'
> > 2021-01-22 16:02:55 start remote tunnel
> > 2021-01-22 16:02:56 ssh tunnel ver 1
> > 2021-01-22 16:02:56 starting online/live migration on tcp:10.3.94.70:60000
> > 2021-01-22 16:02:56 set migration_caps
> > 2021-01-22 16:02:56 migration speed limit: 8589934592 B/s
> > 2021-01-22 16:02:56 migration downtime limit: 100 ms
> > 2021-01-22 16:02:56 migration cachesize: 2147483648 B
> > 2021-01-22 16:02:56 set migration parameters
> > 2021-01-22 16:02:56 start migrate command to tcp:10.3.94.70:60000
> >
> >
> >
> > 2021-01-22 16:03:11 status: completed ---> added log
> > 2021-01-22 16:03:11 migration speed: 1092.27 MB/s - downtime 14424 ms
> > 2021-01-22 16:03:11 migration status: completed
> > 2021-01-22 16:03:14 migration finished successfully (duration 00:00:21)
> > TASK OK
> >
> >
> >
> >     my $merr = $@;
> >     $self->log('info', "migrate uri => $ruri failed: $merr") if $merr;
> >
> >     my $lstat = 0;
> >     my $usleep = 1000000;
> >     my $i = 0;
> >     my $err_count = 0;
> >     my $lastrem = undef;
> >     my $downtimecounter = 0;
> >     while (1) {
> >         $i++;
> >         my $avglstat = $lstat ? $lstat / $i : 0;
> >
> >         usleep($usleep);
> >         my $stat;
> >         eval {
> >             $stat = mon_cmd($vmid, "query-migrate");
> >         };
> >         if (my $err = $@) {
> >             $err_count++;
> >             warn "query migrate failed: $err\n";
> >             $self->log('info', "query migrate failed: $err");
> >             if ($err_count <= 5) {
> >                 usleep(1000000);
> >                 next;
> >             }
> >             die "too many query migrate failures - aborting\n";
> >         }
> >
> >         $self->log('info', "status: $stat->{status}");    # ---> added log
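
To narrow this down, it might be worth polling query-migrate far more often,
with timestamps, to see whether the very first poll already returns
"completed" or whether the QMP call itself blocks for the whole downtime.
A rough standalone sketch (not the qemu-server code; the mon_cmd import and
the VMID are assumptions) that could run next to a test migration:

# Sketch: poll 'query-migrate' every 100 ms and log how long each call takes.
use strict;
use warnings;
use Time::HiRes qw(usleep gettimeofday tv_interval);
use PVE::QemuServer::Monitor qw(mon_cmd);   # assumption: importable as in qemu-server

my $vmid  = 391;                            # example VMID
my $start = [gettimeofday()];
for (1 .. 600) {
    my $t0   = [gettimeofday()];
    my $stat = eval { mon_cmd($vmid, 'query-migrate') };
    warn "query-migrate failed: $@\n" if $@;
    my $call = tv_interval($t0);            # duration of the QMP call itself
    printf "t=%8.3fs call=%6.3fs status=%s\n",
        tv_interval($start), $call, $stat->{status} // 'n/a';
    last if ($stat->{status} // '') =~ /^(completed|failed|cancelled)$/;
    usleep(100_000);
}

If the call duration jumps to several seconds right around the switchover, the
long downtime would mostly be the monitor being unresponsive while the guest
is paused, rather than the RAM transfer itself converging slowly.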
> >
> >
> > On Friday, 22 January 2021 at 15:34 +0100, aderumier@odiso.com wrote:
> > > Hi,
> > >
> > > I have recently noticed bigger downtimes on qemu live migration.
> > > (I'm not sure whether it started after a qemu or a qemu-server update.)
> > >
> > > migration: type=insecure
> > >
> > > qemu-server 6.3-2
> > > pve-qemu-kvm 5.1.0-7
> > >
> > > (I'm not sure which qemu version the running machine was started with.)
> > >
> > >
> > >
> > > Here is a sample:
> > >
> > >
> > >
> > > 2021-01-22 15:28:38 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
> > > 2021-01-22 15:28:42 starting VM 226 on remote node 'kvm13'
> > > 2021-01-22 15:28:44 start remote tunnel
> > > 2021-01-22 15:28:45 ssh tunnel ver 1
> > > 2021-01-22 15:28:45 starting online/live migration on tcp:10.3.94.70:60000
> > > 2021-01-22 15:28:45 set migration_caps
> > > 2021-01-22 15:28:45 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:28:45 migration downtime limit: 100 ms
> > > 2021-01-22 15:28:45 migration cachesize: 268435456 B
> > > 2021-01-22 15:28:45 set migration parameters
> > > 2021-01-22 15:28:45 start migrate command to tcp:10.3.94.70:60000
> > > 2021-01-22 15:28:47 migration speed: 1024.00 MB/s - downtime 2117 ms
> > > 2021-01-22 15:28:47 migration status: completed
> > > 2021-01-22 15:28:51 migration finished successfully (duration 00:00:13)
> > > TASK OK
> > >
> > > That's strange, because I don't see the memory transfer loop logs.
> > >
> > >
> > >
> > > Migrating back to the original host works:
> > >
> > > 2021-01-22 15:29:34 starting migration of VM 226 to node 'kvm2' (::ffff:10.3.94.50)
> > > 2021-01-22 15:29:36 starting VM 226 on remote node 'kvm2'
> > > 2021-01-22 15:29:39 start remote tunnel
> > > 2021-01-22 15:29:40 ssh tunnel ver 1
> > > 2021-01-22 15:29:40 starting online/live migration on tcp:[::ffff:10.3.94.50]:60000
> > > 2021-01-22 15:29:40 set migration_caps
> > > 2021-01-22 15:29:40 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:29:40 migration downtime limit: 100 ms
> > > 2021-01-22 15:29:40 migration cachesize: 268435456 B
> > > 2021-01-22 15:29:40 set migration parameters
> > > 2021-01-22 15:29:40 start migrate command to tcp:[::ffff:10.3.94.50]:60000
> > > 2021-01-22 15:29:41 migration status: active (transferred 396107554, remaining 1732018176), total 2165383168)
> > > 2021-01-22 15:29:41 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:42 migration status: active (transferred 973010921, remaining 1089216512), total 2165383168)
> > > 2021-01-22 15:29:42 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:43 migration status: active (transferred 1511925476, remaining 483463168), total 2165383168)
> > > 2021-01-22 15:29:43 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:44 migration speed: 512.00 MB/s - downtime 148 ms
> > > 2021-01-22 15:29:44 migration status: completed
> > > 2021-01-22 15:29:47 migration finished successfully (duration 00:00:13)
> > > TASK OK
> > >
> > >
> > > Then migrating it again, like the first migration, works too:
> > >
> > >
> > > 2021-01-22 15:31:07 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
> > > 2021-01-22 15:31:10 starting VM 226 on remote node 'kvm13'
> > > 2021-01-22 15:31:12 start remote tunnel
> > > 2021-01-22 15:31:13 ssh tunnel ver 1
> > > 2021-01-22 15:31:13 starting online/live migration on tcp:10.3.94.70:60000
> > > 2021-01-22 15:31:13 set migration_caps
> > > 2021-01-22 15:31:13 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:31:13 migration downtime limit: 100 ms
> > > 2021-01-22 15:31:13 migration cachesize: 268435456 B
> > > 2021-01-22 15:31:13 set migration parameters
> > > 2021-01-22 15:31:13 start migrate command to tcp:10.3.94.70:60000
> > > 2021-01-22 15:31:14 migration status: active (transferred 1092088188, remaining 944365568), total 2165383168)
> > > 2021-01-22 15:31:14 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:31:15 migration speed: 1024.00 MB/s - downtime 55 ms
> > > 2021-01-22 15:31:15 migration status: completed
> > > 2021-01-22 15:31:19 migration finished successfully (duration 00:00:12)
> > > TASK OK
> > >
> > >
> > > Any idea? Maybe a bug in a specific qemu version?
> > >
> > >
> > >
> > >
> >
> >
>
>