From: Fiona Ebner
To: "DERUMIER, Alexandre", "pve-devel@lists.proxmox.com", "aderumier@odiso.com"
Date: Fri, 2 Jun 2023 09:28:39 +0200
Subject: Re: [pve-devel] [PATCH-SERIES v3 qemu-server/manager/common] add and set x86-64-v2 as default model for new vms and detect best cpumodel

On 01.06.23 at 23:15, DERUMIER, Alexandre wrote:
> Hi,
> I found an interesting thread on the forum about kvm_pv_unhalt
>
> https://forum.proxmox.com/threads/live-migration-between-intel-xeon-and-amd-epyc2-linux-guests.68663/
>
> Sounds good. Please also take a look at the default flag
> "kvm_pv_unhalt". As I mentioned, it would cause a kernel crash in
> paravirtualized unhalt code sooner or later in a migrated VM (started
> on Intel, migrated to AMD).
>
> Please note that according to our tests, simply leaving the CPU type
> empty in the GUI (leading to the QEMU command line argument of -cpu
> kvm64,+sep,+lahf_lm,+kvm_pv_unhalt,+kvm_pv_eoi,enforce), while
> seemingly working at first, will after some load and idle time in the
> VM result in a crash involving the kvm_kick_cpu function somewhere
> inside the paravirtualized halt/unhalt code. Linux kernels tested
> ranged from Debian's 4.9.210-1 to Ubuntu's 5.3.0-46 (and some in
> between). Therefore the Proxmox default seems to be unsafe, and
> apparently the very minimum working command line probably would be
> args: -cpu kvm64,+sep,+lahf_lm,+kvm_pv_eoi.
>
> So, it sounds like it crashes if it's defined with a CPU vendor not
> matching the real hardware?
>
> As it breaks migration between Intel && AMD, maybe we shouldn't add
> it to the new x86-64-vx model?
>
> A discussion on the qemu-devel mailing list talks about performance
> with/without it:
> https://lists.nongnu.org/archive/html/qemu-devel/2017-10/msg01816.html
>
> It seems to help when you have a lot of cores/NUMA nodes in the
> guest, but can slow down small VMs.

Note that migration between CPUs of different vendors is not a
supported use case (it will always depend on specific models, kernel
versions, etc.), so we can only justify not adding the flag to the new
default model if it doesn't make life worse for everybody else. And I'd
be a bit careful about jumping to general conclusions from just one
forum post.

It seems like you were the one adding the flag ;)
https://git.proxmox.com/?p=qemu-server.git;a=commitdiff;h=117a041466b3af8368506ae3ab7b8d26fc07d9b7

and the LWN-archived mail linked in the commit message says

> Ticket locks have an inherent problem in a virtualized case, because
> the vCPUs are scheduled rather than running concurrently (ignoring
> gang scheduled vCPUs). This can result in catastrophic performance
> collapses when the vCPU scheduler doesn't schedule the correct "next"
> vCPU, and ends up scheduling a vCPU which burns its entire timeslice
> spinning. (Note that this is not the same problem as lock-holder
> preemption, which this series also addresses; that's also a problem,
> but not catastrophic).

"catastrophic performance collapses" doesn't sound very promising :/

But if we find that

kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+sep,+lahf_lm,+popcnt,+sse4.1,+sse4.2,+ssse3

causes issues with the +kvm_pv_unhalt flag (even without cross-vendor
live migration), but not without it, that would be a much more
convincing reason against adding the flag to the new default.
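
To make such an A/B test concrete, here is an untested sketch of how
the flag could be toggled per VM via the "args:" override, i.e. the
same mechanism the forum post uses. VMID 100 is just a placeholder,
and this assumes the appended -cpu option takes precedence over the
generated one:

    # with the flag, i.e. the proposed new default
    qm set 100 --args '-cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+sep,+lahf_lm,+popcnt,+sse4.1,+sse4.2,+ssse3'

    # without kvm_pv_unhalt, everything else unchanged
    qm set 100 --args '-cpu kvm64,enforce,+kvm_pv_eoi,+sep,+lahf_lm,+popcnt,+sse4.1,+sse4.2,+ssse3'

Running the same load/idle cycle as in the forum report against both
configurations should show whether the flag alone makes the difference.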
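
Whether the guest actually picked up the paravirtualized spinlock path
can be checked in its kernel log. The exact wording differs between
kernel versions (older kernels print something like "KVM setup
paravirtual spinlock", newer ones "kvm-guest: PV spinlocks enabled"),
so a loose grep like this is only a heuristic:

    dmesg | grep -iE 'paravirtual spinlock|pv spinlocks'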