* [pve-devel] qemu live migration: bigger downtime recently
From: aderumier @ 2021-01-22 14:34 UTC
To: pve-devel

Hi,

I have noticed bigger downtimes recently on qemu live migration.
(I'm not sure whether it started with a qemu update or a qemu-server
update.)

migration: type=insecure

qemu-server 6.3-2
pve-qemu-kvm 5.1.0-7

(I'm not sure which qemu version the migrated machines are actually
running.)
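For what it's worth, one way to check the version a running VM was
started with is to ask it over QMP; a minimal sketch, assuming
qemu-server's mon_cmd helper ("query-version" is a standard QMP
command):

    use PVE::QemuServer::Monitor qw(mon_cmd);

    # ask the running QEMU process (not the installed package) for its version
    my $v = mon_cmd($vmid, "query-version");
    printf "VM %s runs QEMU %d.%d.%d\n",
        $vmid, $v->{qemu}->{major}, $v->{qemu}->{minor}, $v->{qemu}->{micro};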
Here is a sample:

2021-01-22 15:28:38 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
2021-01-22 15:28:42 starting VM 226 on remote node 'kvm13'
2021-01-22 15:28:44 start remote tunnel
2021-01-22 15:28:45 ssh tunnel ver 1
2021-01-22 15:28:45 starting online/live migration on tcp:10.3.94.70:60000
2021-01-22 15:28:45 set migration_caps
2021-01-22 15:28:45 migration speed limit: 8589934592 B/s
2021-01-22 15:28:45 migration downtime limit: 100 ms
2021-01-22 15:28:45 migration cachesize: 268435456 B
2021-01-22 15:28:45 set migration parameters
2021-01-22 15:28:45 start migrate command to tcp:10.3.94.70:60000
2021-01-22 15:28:47 migration speed: 1024.00 MB/s - downtime 2117 ms
2021-01-22 15:28:47 migration status: completed
2021-01-22 15:28:51 migration finished successfully (duration 00:00:13)
TASK OK

That's strange, because I don't see the memory transfer loop logs.

Migrating back to the original host works:

2021-01-22 15:29:34 starting migration of VM 226 to node 'kvm2' (::ffff:10.3.94.50)
2021-01-22 15:29:36 starting VM 226 on remote node 'kvm2'
2021-01-22 15:29:39 start remote tunnel
2021-01-22 15:29:40 ssh tunnel ver 1
2021-01-22 15:29:40 starting online/live migration on tcp:[::ffff:10.3.94.50]:60000
2021-01-22 15:29:40 set migration_caps
2021-01-22 15:29:40 migration speed limit: 8589934592 B/s
2021-01-22 15:29:40 migration downtime limit: 100 ms
2021-01-22 15:29:40 migration cachesize: 268435456 B
2021-01-22 15:29:40 set migration parameters
2021-01-22 15:29:40 start migrate command to tcp:[::ffff:10.3.94.50]:60000
2021-01-22 15:29:41 migration status: active (transferred 396107554, remaining 1732018176), total 2165383168)
2021-01-22 15:29:41 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:29:42 migration status: active (transferred 973010921, remaining 1089216512), total 2165383168)
2021-01-22 15:29:42 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:29:43 migration status: active (transferred 1511925476, remaining 483463168), total 2165383168)
2021-01-22 15:29:43 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:29:44 migration speed: 512.00 MB/s - downtime 148 ms
2021-01-22 15:29:44 migration status: completed
2021-01-22 15:29:47 migration finished successfully (duration 00:00:13)
TASK OK

Then migrating it again (the same direction as the first migration)
works too:

2021-01-22 15:31:07 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
2021-01-22 15:31:10 starting VM 226 on remote node 'kvm13'
2021-01-22 15:31:12 start remote tunnel
2021-01-22 15:31:13 ssh tunnel ver 1
2021-01-22 15:31:13 starting online/live migration on tcp:10.3.94.70:60000
2021-01-22 15:31:13 set migration_caps
2021-01-22 15:31:13 migration speed limit: 8589934592 B/s
2021-01-22 15:31:13 migration downtime limit: 100 ms
2021-01-22 15:31:13 migration cachesize: 268435456 B
2021-01-22 15:31:13 set migration parameters
2021-01-22 15:31:13 start migrate command to tcp:10.3.94.70:60000
2021-01-22 15:31:14 migration status: active (transferred 1092088188, remaining 944365568), total 2165383168)
2021-01-22 15:31:14 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2021-01-22 15:31:15 migration speed: 1024.00 MB/s - downtime 55 ms
2021-01-22 15:31:15 migration status: completed
2021-01-22 15:31:19 migration finished successfully (duration 00:00:12)
TASK OK

Any idea? Maybe a bug in a specific qemu version?
* Re: [pve-devel] qemu live migration: bigger downtime recently
From: aderumier @ 2021-01-22 15:06 UTC
To: pve-devel

I have tried adding a log line to display the current status of the
migration, and it never catches any "active" state, only "completed"
directly.

Here is another sample, with a bigger downtime of 14s (real downtime, I
checked with a ping to be sure):

2021-01-22 16:02:53 starting migration of VM 391 to node 'kvm13' (10.3.94.70)
2021-01-22 16:02:53 starting VM 391 on remote node 'kvm13'
2021-01-22 16:02:55 start remote tunnel
2021-01-22 16:02:56 ssh tunnel ver 1
2021-01-22 16:02:56 starting online/live migration on tcp:10.3.94.70:60000
2021-01-22 16:02:56 set migration_caps
2021-01-22 16:02:56 migration speed limit: 8589934592 B/s
2021-01-22 16:02:56 migration downtime limit: 100 ms
2021-01-22 16:02:56 migration cachesize: 2147483648 B
2021-01-22 16:02:56 set migration parameters
2021-01-22 16:02:56 start migrate command to tcp:10.3.94.70:60000
2021-01-22 16:03:11 status: completed                      ---> added log
2021-01-22 16:03:11 migration speed: 1092.27 MB/s - downtime 14424 ms
2021-01-22 16:03:11 migration status: completed
2021-01-22 16:03:14 migration finished successfully (duration 00:00:21)
TASK OK

The polling loop in question (from qemu-server), with my added log line
marked:

    my $merr = $@;
    $self->log('info', "migrate uri => $ruri failed: $merr") if $merr;

    my $lstat = 0;
    my $usleep = 1000000;    # poll query-migrate once per second
    my $i = 0;
    my $err_count = 0;
    my $lastrem = undef;
    my $downtimecounter = 0;
    while (1) {
        $i++;
        my $avglstat = $lstat ? $lstat / $i : 0;

        usleep($usleep);
        my $stat;
        eval {
            $stat = mon_cmd($vmid, "query-migrate");
        };
        if (my $err = $@) {
            $err_count++;
            warn "query migrate failed: $err\n";
            $self->log('info', "query migrate failed: $err");
            if ($err_count <= 5) {
                usleep(1000000);
                next;
            }
            die "too many query migrate failures - aborting\n";
        }

        $self->log('info', "status: $stat->{status}");    ---> added log
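A quick way to see where that time goes would be to time the monitor
call itself; a minimal sketch, assuming the same mon_cmd helper and
Time::HiRes:

    use Time::HiRes qw(time);

    my $t0 = time();
    my $stat = eval { mon_cmd($vmid, "query-migrate") };
    # if this reports ~14s instead of ~1s, the QMP call itself is blocking
    $self->log('info', sprintf("query-migrate returned after %.2fs", time() - $t0));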
On Friday, January 22, 2021 at 15:34 +0100, aderumier@odiso.com wrote:
> Hi,
>
> I have noticed bigger downtimes recently on qemu live migration.
> [rest of quoted message trimmed]
* Re: [pve-devel] qemu live migration: bigger downtime recently
From: aderumier @ 2021-01-22 18:55 UTC
To: pve-devel

After some debugging, it seems that it hangs on:

    $stat = mon_cmd($vmid, "query-migrate");

Result of "info migrate" after the end of a migration (note that the
downtime, 9595 ms, accounts for almost the whole migration time):

# info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 9671 ms
downtime: 9595 ms
setup: 74 ms
transferred ram: 10445790 kbytes
throughput: 8916.93 mbps
remaining ram: 0 kbytes
total ram: 12600392 kbytes
duplicate: 544936 pages
skipped: 0 pages
normal: 2605162 pages
normal bytes: 10420648 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 296540
cache size: 2147483648 bytes
xbzrle transferred: 0 kbytes
xbzrle pages: 0 pages
xbzrle cache miss: 0 pages
xbzrle cache miss rate: 0.00
xbzrle encoding rate: 0.00
xbzrle overflow: 0
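For reference, the same output can be fetched programmatically through
QMP's HMP passthrough; a sketch, again assuming qemu-server's mon_cmd
("human-monitor-command" is a standard QMP command):

    # run the HMP command via QMP and log its text output
    my $out = mon_cmd($vmid, "human-monitor-command",
                      'command-line' => "info migrate");
    $self->log('info', $out);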
On Friday, January 22, 2021 at 16:06 +0100, aderumier@odiso.com wrote:
> I have tried adding a log line to display the current status of the
> migration, and it never catches any "active" state, only "completed"
> directly.
> [rest of quoted message trimmed]
* Re: [pve-devel] qemu live migration: bigger downtime recently
From: aderumier @ 2021-01-23 8:38 UTC
To: pve-devel

About the qemu version: these VMs were started around November 6, after
an update of the qemu package on November 4.

Looking at the proxmox repo, I think it should be 5.1.0-4 or -5:

pve-qemu-kvm-dbg_5.1.0-4_amd64.deb    29-Oct-2020 17:28    75705544
pve-qemu-kvm-dbg_5.1.0-5_amd64.deb    04-Nov-2020 17:41    75737556
pve-qemu-kvm-dbg_5.1.0-6_amd64.deb    05-Nov-2020 18:08    75693264

Could it be a known bug introduced by the new backup dirty-bitmap
patches, and fixed later? (I see a -6 version one day later.)

On Friday, January 22, 2021 at 19:55 +0100, aderumier@odiso.com wrote:
> After some debugging, it seems that it hangs on:
>
>     $stat = mon_cmd($vmid, "query-migrate");
> [rest of quoted message trimmed]
* Re: [pve-devel] qemu live migration: bigger downtime recently
From: Fabian Grünbichler @ 2021-01-25 8:47 UTC
To: pve-devel

On January 23, 2021 9:38 am, aderumier@odiso.com wrote:
> Could it be a known bug introduced by the new backup dirty-bitmap
> patches, and fixed later? (I see a -6 version one day later.)

pve-qemu-kvm (5.1.0-6) pve; urgency=medium

  * migration/block-dirty-bitmap: avoid telling QEMU that the bitmap
    migration is active longer than required

 -- Proxmox Support Team <support@proxmox.com>  Thu, 05 Nov 2020 18:59:40 +0100

Sounds like that could be the case? ;)
* Re: [pve-devel] qemu live migration: bigger downtime recently
From: aderumier @ 2021-01-25 9:26 UTC
To: Proxmox VE development discussion

> pve-qemu-kvm (5.1.0-6) pve; urgency=medium
>
>   * migration/block-dirty-bitmap: avoid telling QEMU that the bitmap
>     migration is active longer than required
>
> Sounds like that could be the case? ;)

Yes, I was not sure about this. So I was just out of luck when I
upgraded ^_^

I have tried setting dirty-bitmaps=0 in set_migration_caps (see the
sketch below), but it doesn't fix it.

So, I think I'm good to plan some migrations with downtime.
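A minimal sketch of what that capability toggle looks like at the QMP
level, assuming qemu-server's mon_cmd and the JSON module it already
uses ("dirty-bitmaps" is the QEMU capability name):

    use JSON;

    # explicitly disable dirty-bitmap migration before issuing "migrate"
    mon_cmd($vmid, "migrate-set-capabilities", capabilities => [
        { capability => "dirty-bitmaps", state => JSON::false },
    ]);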
Thanks for your response!

Alexandre

Thread overview: 6+ messages
2021-01-22 14:34 [pve-devel] qemu live migration: bigger downtime recently aderumier
2021-01-22 15:06 ` aderumier
2021-01-22 18:55 ` aderumier
2021-01-23  8:38 ` aderumier
2021-01-25  8:47 ` Fabian Grünbichler
2021-01-25  9:26 ` aderumier