From: Kyle Schmitt
Date: Tue, 8 Nov 2022 16:51:12 -0600
To: Proxmox VE user list
Subject: Re: [PVE-User] VMs hung after live migration - Intel CPU

It's been quite a long time since I've done it, but for what it's worth, I
never had problems live migrating KVM machines to hosts with other
processors, **as long as the VM wasn't launched using a processor-specific
extension**.

Get the exact options KVM is running with on both hosts, and compare.

In OpenStack there's a tendency to auto-detect processor features and
launch with all of them available, so when I had a cluster of mixed EPYC
generations I had to declare the features explicitly instead of letting it
autodetect (previous job, over a year ago, so the details are sketchy). My
guess is some auto-detection gone wrong here too.
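On Proxmox, the quickest way I know to get at the exact command line is qm
showcmd -- a sketch, with VMID 100 as a stand-in for one of the affected
VMs and nodeA/nodeB for whatever your hosts are called:

    # Dump the full QEMU/KVM command line PVE would start VM 100 with,
    # one option per line (--pretty), and stash a copy per host:
    qm showcmd 100 --pretty > /tmp/kvm-cmdline-$(hostname -s).txt

    # With both dumps copied to one place, compare them:
    diff /tmp/kvm-cmdline-nodeA.txt /tmp/kvm-cmdline-nodeB.txt

    # For a VM that's already running, check the live arguments too:
    tr '\0' '\n' < /proc/$(cat /var/run/qemu-server/100.pid)/cmdline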
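The OpenStack-side fix, as best I remember, was pinning a custom CPU model
in nova.conf on each compute node instead of host passthrough -- roughly
the following, but I'm going from memory and the option names have moved
around between releases:

    [libvirt]
    # Emulate one fixed model that every node in the mixed-EPYC
    # cluster can provide, instead of passing through host features:
    cpu_mode = custom
    cpu_model = EPYC
    # Individual flags can be re-added on top if guests need them:
    # cpu_model_extra_flags = topoext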
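Jan: on the Proxmox side, "declare instead of autodetect" would mean
pinning an explicit vCPU model rather than using CPU type host. Again just
a sketch (VMID 100 hypothetical; check that your qemu-server version
accepts the flags syntax), and note a guest has to be restarted, not just
migrated, to pick up a new CPU model:

    # Present the guest with a fixed first-gen EPYC model, i.e. the
    # lowest common denominator of an EPYC gen1 + gen3 cluster:
    qm set 100 --cpu EPYC

    # Or keep kvm64 and just re-enable the AES instructions:
    qm set 100 --cpu kvm64,flags=+aes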
My home cluster is homogeneous cast-off R610s, otherwise I'd test this
myself. Sorry.

--Kyle

On Tue, Nov 8, 2022 at 2:57 PM Jan Vlach wrote:
>
> Hi Eneko,
>
> thank you a million for taking the time to re-test this! It really helps
> me understand what to expect to work and what doesn't. I had a glimpse of
> an idea to create a cluster with mixed CPUs of EPYC gen1 and EPYC gen3,
> but this really seems like a road to hell(tm). So I'll keep the clusters
> homogeneous, with the same generation of CPU. I have two sites, but
> fortunately I can keep the clusters homogeneous (with one having "more
> power").
>
> Honestly, up until now I thought I could abstract away the version of the
> Linux kernel I'm running. Because, hey, it's all KVM. I'm setting my VMs
> with CPU type host to get the benefit of accelerated AES and other
> instructions, but I have yet to see whether EPYC gen1 is compatible with
> EPYC gen3. Thanks for teaching me a new trick, or at least a thing to be
> aware of! (I remember this being an issue with heterogeneous VMware
> clusters with CPUs of different generations, but I really thought KVM64
> would abstract away all of this, KVM64 being a Pentium4-era CPU.)
>
> Do you use virtio drivers for storage and the network card at all? Can
> you see a pattern where the 3 Debian/Windows machines were not affected?
> Did they use virtio or not?
>
> I really don't see a reason why the migration back from 5.13 -> 5.19
> should bring that 50/100% CPU load and hanging. I've had some phantom
> load before with "Use tablet for pointer: Yes", but that was in the 5%
> ballpark per VM.
>
> I'm just a fellow Proxmox admin/user. Hope this rings a bell or sparks
> interest in the core Proxmox team. I've had struggles with 5.15 before,
> with GPU passthrough (wasn't able to do it) and with OpenBSD VMs taking
> minutes to boot on 5.15 compared to tens of seconds before.
>
> All in all, thanks for all the hints I can test before production, so it
> won't hurt "down the road" …
>
> JV
> P.S. I'm trying to push my boss towards a commercial subscription for our
> clusters, but at this point I'm really not sure it would help ...
>
>
> > On 8. 11. 2022, at 18:18, Eneko Lacunza via pve-user wrote:
> >
> > From: Eneko Lacunza
> > Subject: Re: [PVE-User] VMs hung after live migration - Intel CPU
> > Date: 8 November 2022 18:18:44 CET
> > To: pve-user@lists.proxmox.com
> >
> > Hi Jan,
> >
> > I had some time to re-test this.
> >
> > I tried live migration with KVM64 CPU between 2 nodes:
> >
> > node-ryzen1700 - kernel 5.19.7-1-pve
> > node-ryzen5900x - kernel 5.19.7-1-pve
> >
> > I bulk-migrated 9 VMs (8 Debian 9/10/11 and 1 Windows 2008r2).
> > This works OK in both directions.
> >
> > Then I downgraded a node to 5.13:
> >
> > node-ryzen1700 - kernel 5.19.7-1-pve
> > node-ryzen5900x - kernel 5.13.19-6-pve
> >
> > Migration of those 9 VMs worked well from node-ryzen1700 ->
> > node-ryzen5900x.
> >
> > But migration of those 9 VMs back from node-ryzen5900x ->
> > node-ryzen1700 was a disaster: all 8 Debian VMs hung with 50/100% CPU
> > use. Windows 2008r2 seems not affected by the issue at all.
> >
> > 3 other Debian/Windows VMs on node-ryzen1700 were not affected.
> >
> > After moving both nodes to kernel 5.13:
> >
> > node-ryzen1700 - kernel 5.13.19-6-pve
> > node-ryzen5900x - kernel 5.13.19-6-pve
> >
> > Migration of those 9 VMs node-ryzen5900x -> node-ryzen1700 works as
> > intended :)
> >
> > Cheers
> >
> >
> > On 8/11/22 at 9:40, Eneko Lacunza via pve-user wrote:
> >> Hi Jan,
> >>
> >> Yes, there's no issue if the CPUs are the same.
> >>
> >> VMs hang when the CPUs are of a different enough generation, even
> >> being of the same brand and using the KVM64 vCPU.
> >>
> >> On 7/11/22 at 22:59, Jan Vlach wrote:
> >>> Hi,
> >>>
> >>> For what it's worth, live VM migration with Linux VMs with various
> >>> Debian versions works here just fine. I'm using virtio for networking
> >>> and virtio-scsi for disks. (The only version where I had problems was
> >>> Debian 6, where the kernel does not support virtio-scsi and a
> >>> MegaRAID SAS 8708EM2 needs to be used. I get a kernel panic in
> >>> mpt_sas on thaw after migration.)
> >>>
> >>> We're running 5.15.60-1-pve on a three-node cluster with AMD EPYC
> >>> 7551P 32-core processors. These are Supermicros with the latest BIOS
> >>> (latest microcode?) and BMC.
> >>>
> >>> Storage is a local ZFS pool, backed by SSDs in striped mirrors (4
> >>> devices on each node). Migration has a dedicated 2x 10GigE LACP bond
> >>> and a dedicated VLAN on the switch stack.
> >>>
> >>> I have more nodes with EPYC3/Milan on the way, so I'll test those
> >>> later as well.
> >>>
> >>> What does your cluster look like hardware-wise? What are the problems
> >>> you experienced with VM migration on 5.13 -> 5.19?
> >>>
> >>> Thanks,
> >>> JV
> >
> > Eneko Lacunza
> > Zuzendari teknikoa | Director técnico
> > Binovo IT Human Project
> >
> > Tel. +34 943 569 206 | https://www.binovo.es
> > Astigarragako Bidea, 2 - 2º izda.
> > Oficina 10-11, 20180 Oiartzun
> >
> > https://www.youtube.com/user/CANALBINOVO
> > https://www.linkedin.com/company/37269706/
> >
> > _______________________________________________
> > pve-user mailing list
> > pve-user@lists.proxmox.com
> > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user