From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <aderumier@odiso.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 5CCD56A5B5
 for <pve-devel@pve.proxmox.com>; Fri, 22 Jan 2021 19:55:31 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 54B821F153
 for <pve-devel@pve.proxmox.com>; Fri, 22 Jan 2021 19:55:31 +0100 (CET)
Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com
 [IPv6:2a00:1450:4864:20::32e])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id CC9031F148
 for <pve-devel@pve.proxmox.com>; Fri, 22 Jan 2021 19:55:29 +0100 (CET)
Received: by mail-wm1-x32e.google.com with SMTP id j18so5182474wmi.3
 for <pve-devel@pve.proxmox.com>; Fri, 22 Jan 2021 10:55:29 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=odiso-com.20150623.gappssmtp.com; s=20150623;
 h=message-id:subject:from:to:date:in-reply-to:references:user-agent
 :mime-version:content-transfer-encoding;
 bh=tAKLzNuGeXl2C2X6mYi+68Q6e51EGD1SCl1BY7RGp3U=;
 b=13+PUrj5mcZ7wfAga/lybL4m97iPnNkSTHq97df/snlsJWHfNW7bGjS/ebledcIGxK
 KbXRFfwy0zqCI7eOD1D6vMGHgGxs3UeAkLDEnPpooH9Aue1SsjSve+rPtvUMg/y15z1p
 cqmPmQQI+OUDXo9N+y8Si2vuG0CXwWIh1B39gv5HCI6bi69tPK36oWITwXT0a+MYm3SL
 deZieEIEQVz9L90rG2Hizqi18P/UCxMoz9tSo1KPdOqtaqImjfn7/0BHkdLjKvNiUsiX
 zHso6Px+L+31rRs5ww3WkZqtabJ/e+KO3gLLKyekTOCS+P5XUK4TWJMx3Wlm+Sa8CpIL
 1Eqg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:message-id:subject:from:to:date:in-reply-to
 :references:user-agent:mime-version:content-transfer-encoding;
 bh=tAKLzNuGeXl2C2X6mYi+68Q6e51EGD1SCl1BY7RGp3U=;
 b=jwA7SDP1VDbiDc/QQ/uIHIcqqua/Xw+8KKiFCsLlWQQrCTCI+G5dlgxWelR4jDg5VV
 7mDFmcPMxb19k0wVcD1KyL8ksp5jA7dj+3MeHhkUrqPcWwHB/8i+F/3OLy7GUOiCRPsJ
 y5OXqBBp2H1M4Z4CJ1/q+6i4VE6zU/Ijr7YkiMI/BK2IgRumYndwa/uLyLjONVCyVEjS
 FcM1aHRQ4DvGGQ/F4WhK/5LH8zIaeWBthetxifKshEAih9ylPKG3gXt8xbXrWjB3z9n4
 arVik6SX/VFq/FHuKZ3GN64H2p9AZfivjFRi/lh8xKcuNJ6vibjIpo0bjjt4GZ8fr2Qp
 4CEg==
X-Gm-Message-State: AOAM531kig/EKMwmwC0zAMIxzhHo2M8AZox9wN4XNOYT1oyq+0hdEl5I
 3volO7J20BizB/Kgm13r17ngh7bqLPU9yFHn
X-Google-Smtp-Source: ABdhPJzFxWX7wYgUnrnkiBPQClAKgGzE3X+CllsdTBl3KQaTdIkuVDhAJDrBursc72re7XJRI81TQQ==
X-Received: by 2002:a1c:4908:: with SMTP id w8mr5126914wma.160.1611341723122; 
 Fri, 22 Jan 2021 10:55:23 -0800 (PST)
Received: from ?IPv6:2a0a:1580:0:1::100c? (ovpn1.odiso.net.
 [2a0a:1580:2000::3f])
 by smtp.gmail.com with ESMTPSA id s1sm13054335wrv.97.2021.01.22.10.55.21
 for <pve-devel@pve.proxmox.com>
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Fri, 22 Jan 2021 10:55:22 -0800 (PST)
Message-ID: <50ed1cad64907f845b0b545fdebf3af8ede41c7b.camel@odiso.com>
From: aderumier@odiso.com
To: pve-devel <pve-devel@pve.proxmox.com>
Date: Fri, 22 Jan 2021 19:55:19 +0100
In-Reply-To: <ca2fa17ffd670a60e72071ff401139c104d1e854.camel@odiso.com>
References: <b08a69e25cdfe7615bd192ab07b169a0100fadeb.camel@odiso.com>
 <ca2fa17ffd670a60e72071ff401139c104d1e854.camel@odiso.com>
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.38.3 
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.190 Adjusted score from AWL reputation of From: address
 DKIM_SIGNED               0.1 Message has a DKIM or DK signature,
 not necessarily valid
 DKIM_VALID -0.1 Message has at least one valid DKIM or DK signature
 RCVD_IN_DNSWL_NONE     -0.0001 Sender listed at https://www.dnswl.org/,
 no trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] qemu live migration: bigger downtime recently
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Fri, 22 Jan 2021 18:55:31 -0000

after some debug, it seem that it's hanging on 

$stat = mon_cmd($vmid, "query-migrate");




result of info migrate after the end of a migration:

# info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 9671 ms
downtime: 9595 ms
setup: 74 ms
transferred ram: 10445790 kbytes
throughput: 8916.93 mbps
remaining ram: 0 kbytes
total ram: 12600392 kbytes
duplicate: 544936 pages
skipped: 0 pages
normal: 2605162 pages
normal bytes: 10420648 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 296540
cache size: 2147483648 bytes
xbzrle transferred: 0 kbytes
xbzrle pages: 0 pages
xbzrle cache miss: 0 pages
xbzrle cache miss rate: 0.00
xbzrle encoding rate: 0.00
xbzrle overflow: 0




Le vendredi 22 janvier 2021 à 16:06 +0100, aderumier@odiso.com a
écrit :
> I have tried to add a log to display the current status state of the
> migration,
> and It don't catch any "active" state, but "completed" directly.
> 
> Here another sample with a bigger downtime of 14s (real downtime, I
> have checked with a ping to be sure)
> 
> 
> 
> 2021-01-22 16:02:53 starting migration of VM 391 to node 'kvm13'
> (10.3.94.70)
> 2021-01-22 16:02:53 starting VM 391 on remote node 'kvm13'
> 2021-01-22 16:02:55 start remote tunnel
> 2021-01-22 16:02:56 ssh tunnel ver 1
> 2021-01-22 16:02:56 starting online/live migration on
> tcp:10.3.94.70:60000
> 2021-01-22 16:02:56 set migration_caps
> 2021-01-22 16:02:56 migration speed limit: 8589934592 B/s
> 2021-01-22 16:02:56 migration downtime limit: 100 ms
> 2021-01-22 16:02:56 migration cachesize: 2147483648 B
> 2021-01-22 16:02:56 set migration parameters
> 2021-01-22 16:02:56 start migrate command to tcp:10.3.94.70:60000
> 
> 
> 
> 2021-01-22 16:03:11 status: completed ---> added log
> 2021-01-22 16:03:11 migration speed: 1092.27 MB/s - downtime 14424 ms
> 2021-01-22 16:03:11 migration status: completed
> 2021-01-22 16:03:14 migration finished successfully (duration
> 00:00:21)
> TASK OK
> 
> 
> 
>     my $merr = $@;
>     $self->log('info', "migrate uri => $ruri failed: $merr") if
> $merr;
> 
>     my $lstat = 0;
>     my $usleep = 1000000;
>     my $i = 0;
>     my $err_count = 0;
>     my $lastrem = undef;
>     my $downtimecounter = 0;
>     while (1) {
>         $i++;
>         my $avglstat = $lstat ? $lstat / $i : 0;
> 
>         usleep($usleep);
>         my $stat;
>         eval {
>             $stat = mon_cmd($vmid, "query-migrate");
>         };
>         if (my $err = $@) {
>             $err_count++;
>             warn "query migrate failed: $err\n";
>             $self->log('info', "query migrate failed: $err");
>             if ($err_count <= 5) {
>                 usleep(1000000);
>                 next;
>             }
>             die "too many query migrate failures - aborting\n";
>         }
> 
>         $self->log('info', "status: $stat->{status}");   ---> added
> log
> 
> 
> Le vendredi 22 janvier 2021 à 15:34 +0100, aderumier@odiso.com a
> écrit :
> > Hi,
> > 
> > I have notice recently bigger downtime on qemu live migration.
> > (I'm not sure if it's after qemu update or qemu-server update)
> > 
> > migration: type=insecure
> > 
> >  qemu-server                          6.3-2  
> >  pve-qemu-kvm                         5.1.0-7   
> > 
> > (I'm not sure about the machine running qemu version)
> > 
> > 
> > 
> > Here a sample:
> > 
> > 
> > 
> > 2021-01-22 15:28:38 starting migration of VM 226 to node 'kvm13'
> > (10.3.94.70)
> > 2021-01-22 15:28:42 starting VM 226 on remote node 'kvm13'
> > 2021-01-22 15:28:44 start remote tunnel
> > 2021-01-22 15:28:45 ssh tunnel ver 1
> > 2021-01-22 15:28:45 starting online/live migration on
> > tcp:10.3.94.70:60000
> > 2021-01-22 15:28:45 set migration_caps
> > 2021-01-22 15:28:45 migration speed limit: 8589934592 B/s
> > 2021-01-22 15:28:45 migration downtime limit: 100 ms
> > 2021-01-22 15:28:45 migration cachesize: 268435456 B
> > 2021-01-22 15:28:45 set migration parameters
> > 2021-01-22 15:28:45 start migrate command to tcp:10.3.94.70:60000
> > 2021-01-22 15:28:47 migration speed: 1024.00 MB/s - downtime 2117
> > ms
> > 2021-01-22 15:28:47 migration status: completed
> > 2021-01-22 15:28:51 migration finished successfully (duration
> > 00:00:13)
> > TASK OK
> > 
> > That's strange because I don't see the memory transfert loop logs
> > 
> > 
> > 
> > Migrate back to original host is working
> > 
> > 2021-01-22 15:29:34 starting migration of VM 226 to node 'kvm2'
> > (::ffff:10.3.94.50)
> > 2021-01-22 15:29:36 starting VM 226 on remote node 'kvm2'
> > 2021-01-22 15:29:39 start remote tunnel
> > 2021-01-22 15:29:40 ssh tunnel ver 1
> > 2021-01-22 15:29:40 starting online/live migration on
> > tcp:[::ffff:10.3.94.50]:60000
> > 2021-01-22 15:29:40 set migration_caps
> > 2021-01-22 15:29:40 migration speed limit: 8589934592 B/s
> > 2021-01-22 15:29:40 migration downtime limit: 100 ms
> > 2021-01-22 15:29:40 migration cachesize: 268435456 B
> > 2021-01-22 15:29:40 set migration parameters
> > 2021-01-22 15:29:40 start migrate command to
> > tcp:[::ffff:10.3.94.50]:60000
> > 2021-01-22 15:29:41 migration status: active (transferred
> > 396107554,
> > remaining 1732018176), total 2165383168)
> > 2021-01-22 15:29:41 migration xbzrle cachesize: 268435456
> > transferred
> > 0
> > pages 0 cachemiss 0 overflow 0
> > 2021-01-22 15:29:42 migration status: active (transferred
> > 973010921,
> > remaining 1089216512), total 2165383168)
> > 2021-01-22 15:29:42 migration xbzrle cachesize: 268435456
> > transferred
> > 0
> > pages 0 cachemiss 0 overflow 0
> > 2021-01-22 15:29:43 migration status: active (transferred
> > 1511925476,
> > remaining 483463168), total 2165383168)
> > 2021-01-22 15:29:43 migration xbzrle cachesize: 268435456
> > transferred
> > 0
> > pages 0 cachemiss 0 overflow 0
> > 2021-01-22 15:29:44 migration speed: 512.00 MB/s - downtime 148 ms
> > 2021-01-22 15:29:44 migration status: completed
> > 2021-01-22 15:29:47 migration finished successfully (duration
> > 00:00:13)
> > TASK OK
> > 
> > 
> > Then migrate it again like the first migration is working too
> > 
> > 
> > 2021-01-22 15:31:07 starting migration of VM 226 to node 'kvm13'
> > (10.3.94.70)
> > 2021-01-22 15:31:10 starting VM 226 on remote node 'kvm13'
> > 2021-01-22 15:31:12 start remote tunnel
> > 2021-01-22 15:31:13 ssh tunnel ver 1
> > 2021-01-22 15:31:13 starting online/live migration on
> > tcp:10.3.94.70:60000
> > 2021-01-22 15:31:13 set migration_caps
> > 2021-01-22 15:31:13 migration speed limit: 8589934592 B/s
> > 2021-01-22 15:31:13 migration downtime limit: 100 ms
> > 2021-01-22 15:31:13 migration cachesize: 268435456 B
> > 2021-01-22 15:31:13 set migration parameters
> > 2021-01-22 15:31:13 start migrate command to tcp:10.3.94.70:60000
> > 2021-01-22 15:31:14 migration status: active (transferred
> > 1092088188,
> > remaining 944365568), total 2165383168)
> > 2021-01-22 15:31:14 migration xbzrle cachesize: 268435456
> > transferred
> > 0
> > pages 0 cachemiss 0 overflow 0
> > 2021-01-22 15:31:15 migration speed: 1024.00 MB/s - downtime 55 ms
> > 2021-01-22 15:31:15 migration status: completed
> > 2021-01-22 15:31:19 migration finished successfully (duration
> > 00:00:12)
> > TASK OK
> > 
> > 
> > Any idea ? Maybe a specific qemu version bug ?
> > 
> > 
> > 
> > 
> 
>