From: aderumier@odiso.com
To: pve-devel
Date: Sat, 23 Jan 2021 09:38:16 +0100
Subject: Re: [pve-devel] qemu live migration: bigger downtime recently

About the qemu version: these VMs were started around 6 November, after an
update of the qemu package on 4 November.

Looking at the Proxmox repo, I think it should be 5.1.0-4 or -5.

pve-qemu-kvm-dbg_5.1.0-4_amd64.deb    29-Oct-2020 17:28    75705544
pve-qemu-kvm-dbg_5.1.0-5_amd64.deb    04-Nov-2020 17:41    75737556
pve-qemu-kvm-dbg_5.1.0-6_amd64.deb    05-Nov-2020 18:08    75693264

Could it be a known bug introduced by the new backup dirty-bitmap patches
and fixed later? (I see a -6 version one day later.)

On Friday, 22 January 2021 at 19:55 +0100, aderumier@odiso.com wrote:
> After some debugging, it seems that it's hanging on
>
>     $stat = mon_cmd($vmid, "query-migrate");
>
>
> Result of "info migrate" after the end of a migration:
>
> # info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> clear-bitmap-shift: 18
> Migration status: completed
> total time: 9671 ms
> downtime: 9595 ms
> setup: 74 ms
> transferred ram: 10445790 kbytes
> throughput: 8916.93 mbps
> remaining ram: 0 kbytes
> total ram: 12600392 kbytes
> duplicate: 544936 pages
> skipped: 0 pages
> normal: 2605162 pages
> normal bytes: 10420648 kbytes
> dirty sync count: 2
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> pages-per-second: 296540
> cache size: 2147483648 bytes
> xbzrle transferred: 0 kbytes
> xbzrle pages: 0 pages
> xbzrle cache miss: 0 pages
> xbzrle cache miss rate: 0.00
> xbzrle encoding rate: 0.00
> xbzrle overflow: 0
>
> On Friday, 22 January 2021 at 16:06 +0100, aderumier@odiso.com wrote:
> > I have tried to add a log to display the current status of the
> > migration, and it doesn't catch any "active" state, only "completed"
> > directly.
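
[Inline note: to see whether the time is really spent inside the migrate
command or already before the first query-migrate poll, the duration of each
poll could be logged as well. A rough sketch, reusing the mon_cmd and
$self->log helpers from the loop quoted below; Time::HiRes and the exact log
format are only illustrative.]

    use Time::HiRes qw(gettimeofday tv_interval);

    my $t0 = [gettimeofday];
    my $stat;
    eval {
        $stat = mon_cmd($vmid, "query-migrate");
    };
    # log how long the QMP call itself took and what it returned, to see
    # whether the very first poll already reports "completed"
    my $elapsed = tv_interval($t0);
    $self->log('info', sprintf("query-migrate took %.3fs, status: %s",
        $elapsed, $stat->{status} // 'unknown'));
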
> >
> > Here is another sample, with a bigger downtime of 14s (real downtime, I
> > have checked with a ping to be sure):
> >
> > 2021-01-22 16:02:53 starting migration of VM 391 to node 'kvm13' (10.3.94.70)
> > 2021-01-22 16:02:53 starting VM 391 on remote node 'kvm13'
> > 2021-01-22 16:02:55 start remote tunnel
> > 2021-01-22 16:02:56 ssh tunnel ver 1
> > 2021-01-22 16:02:56 starting online/live migration on tcp:10.3.94.70:60000
> > 2021-01-22 16:02:56 set migration_caps
> > 2021-01-22 16:02:56 migration speed limit: 8589934592 B/s
> > 2021-01-22 16:02:56 migration downtime limit: 100 ms
> > 2021-01-22 16:02:56 migration cachesize: 2147483648 B
> > 2021-01-22 16:02:56 set migration parameters
> > 2021-01-22 16:02:56 start migrate command to tcp:10.3.94.70:60000
> >
> > 2021-01-22 16:03:11 status: completed        ---> added log
> > 2021-01-22 16:03:11 migration speed: 1092.27 MB/s - downtime 14424 ms
> > 2021-01-22 16:03:11 migration status: completed
> > 2021-01-22 16:03:14 migration finished successfully (duration 00:00:21)
> > TASK OK
> >
> > The relevant code with the added log:
> >
> >     my $merr = $@;
> >     $self->log('info', "migrate uri => $ruri failed: $merr") if $merr;
> >
> >     my $lstat = 0;
> >     my $usleep = 1000000;
> >     my $i = 0;
> >     my $err_count = 0;
> >     my $lastrem = undef;
> >     my $downtimecounter = 0;
> >     while (1) {
> >         $i++;
> >         my $avglstat = $lstat ? $lstat / $i : 0;
> >
> >         usleep($usleep);
> >         my $stat;
> >         eval {
> >             $stat = mon_cmd($vmid, "query-migrate");
> >         };
> >         if (my $err = $@) {
> >             $err_count++;
> >             warn "query migrate failed: $err\n";
> >             $self->log('info', "query migrate failed: $err");
> >             if ($err_count <= 5) {
> >                 usleep(1000000);
> >                 next;
> >             }
> >             die "too many query migrate failures - aborting\n";
> >         }
> >
> >         $self->log('info', "status: $stat->{status}");   ---> added log
> >
> > On Friday, 22 January 2021 at 15:34 +0100, aderumier@odiso.com wrote:
> > > Hi,
> > >
> > > I have noticed bigger downtime on qemu live migration recently.
> > > (I'm not sure whether it's after a qemu update or a qemu-server update.)
> > >
> > > migration: type=insecure
> > >
> > > qemu-server     6.3-2
> > > pve-qemu-kvm    5.1.0-7
> > >
> > > (I'm not sure which qemu version the running machine actually uses.)
> > >
> > > Here is a sample:
> > >
> > > 2021-01-22 15:28:38 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
> > > 2021-01-22 15:28:42 starting VM 226 on remote node 'kvm13'
> > > 2021-01-22 15:28:44 start remote tunnel
> > > 2021-01-22 15:28:45 ssh tunnel ver 1
> > > 2021-01-22 15:28:45 starting online/live migration on tcp:10.3.94.70:60000
> > > 2021-01-22 15:28:45 set migration_caps
> > > 2021-01-22 15:28:45 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:28:45 migration downtime limit: 100 ms
> > > 2021-01-22 15:28:45 migration cachesize: 268435456 B
> > > 2021-01-22 15:28:45 set migration parameters
> > > 2021-01-22 15:28:45 start migrate command to tcp:10.3.94.70:60000
> > > 2021-01-22 15:28:47 migration speed: 1024.00 MB/s - downtime 2117 ms
> > > 2021-01-22 15:28:47 migration status: completed
> > > 2021-01-22 15:28:51 migration finished successfully (duration 00:00:13)
> > > TASK OK
> > >
> > > That's strange, because I don't see the memory transfer loop logs.
> > >
> > > Migrating back to the original host works:
> > >
> > > 2021-01-22 15:29:34 starting migration of VM 226 to node 'kvm2' (::ffff:10.3.94.50)
> > > 2021-01-22 15:29:36 starting VM 226 on remote node 'kvm2'
> > > 2021-01-22 15:29:39 start remote tunnel
> > > 2021-01-22 15:29:40 ssh tunnel ver 1
> > > 2021-01-22 15:29:40 starting online/live migration on tcp:[::ffff:10.3.94.50]:60000
> > > 2021-01-22 15:29:40 set migration_caps
> > > 2021-01-22 15:29:40 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:29:40 migration downtime limit: 100 ms
> > > 2021-01-22 15:29:40 migration cachesize: 268435456 B
> > > 2021-01-22 15:29:40 set migration parameters
> > > 2021-01-22 15:29:40 start migrate command to tcp:[::ffff:10.3.94.50]:60000
> > > 2021-01-22 15:29:41 migration status: active (transferred 396107554, remaining 1732018176), total 2165383168)
> > > 2021-01-22 15:29:41 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:42 migration status: active (transferred 973010921, remaining 1089216512), total 2165383168)
> > > 2021-01-22 15:29:42 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:43 migration status: active (transferred 1511925476, remaining 483463168), total 2165383168)
> > > 2021-01-22 15:29:43 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:29:44 migration speed: 512.00 MB/s - downtime 148 ms
> > > 2021-01-22 15:29:44 migration status: completed
> > > 2021-01-22 15:29:47 migration finished successfully (duration 00:00:13)
> > > TASK OK
> > >
> > > Then migrating it again, in the same direction as the first migration, also works:
> > >
> > > 2021-01-22 15:31:07 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
> > > 2021-01-22 15:31:10 starting VM 226 on remote node 'kvm13'
> > > 2021-01-22 15:31:12 start remote tunnel
> > > 2021-01-22 15:31:13 ssh tunnel ver 1
> > > 2021-01-22 15:31:13 starting online/live migration on tcp:10.3.94.70:60000
> > > 2021-01-22 15:31:13 set migration_caps
> > > 2021-01-22 15:31:13 migration speed limit: 8589934592 B/s
> > > 2021-01-22 15:31:13 migration downtime limit: 100 ms
> > > 2021-01-22 15:31:13 migration cachesize: 268435456 B
> > > 2021-01-22 15:31:13 set migration parameters
> > > 2021-01-22 15:31:13 start migrate command to tcp:10.3.94.70:60000
> > > 2021-01-22 15:31:14 migration status: active (transferred 1092088188, remaining 944365568), total 2165383168)
> > > 2021-01-22 15:31:14 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > > 2021-01-22 15:31:15 migration speed: 1024.00 MB/s - downtime 55 ms
> > > 2021-01-22 15:31:15 migration status: completed
> > > 2021-01-22 15:31:19 migration finished successfully (duration 00:00:12)
> > > TASK OK
> > >
> > > Any ideas? Maybe a bug in a specific qemu version?
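
[For reference, one more check that might help narrow this down: dumping what
the source qemu actually has configured right before the migrate command is
issued, to rule out the downtime limit or capabilities not being applied. A
rough sketch reusing the mon_cmd helper from the code above; the field names
assume the standard QMP query-migrate-parameters / query-migrate-capabilities
replies.]

    # sketch: log the effective migration parameters and capabilities
    my $params = mon_cmd($vmid, "query-migrate-parameters");
    $self->log('info', "effective downtime-limit: $params->{'downtime-limit'} ms");

    my $caps = mon_cmd($vmid, "query-migrate-capabilities");
    for my $cap (@$caps) {
        $self->log('info', "cap $cap->{capability}: " . ($cap->{state} ? 'on' : 'off'));
    }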