From: Fiona Ebner
To: pve-devel@lists.proxmox.com
Date: Mon, 4 Sep 2023 11:39:20 +0200
Message-Id: <20230904093920.920781-1-f.ebner@proxmox.com>
Subject: [pve-devel] [PATCH kernel] cherry-pick fix for KVM vCPU page fault loop
List-Id: Proxmox VE development discussion

The mailing list thread [0] (found by Friedrich, many thanks!) leading
up to this patch sounds very similar to issues users reported in the
community forum [1] and enterprise support channel, where a VM would
be stuck for no discernible reason with all vCPU threads spinning.

[0]: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com/T/#u
[1]: https://forum.proxmox.com/threads/127459/

Suggested-by: Friedrich Weber
Signed-off-by: Fiona Ebner
---
 ...an-sign-extension-bug-with-mmu_seq-t.patch | 75 +++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 patches/kernel/0013-KVM-x86-mmu-Fix-an-sign-extension-bug-with-mmu_seq-t.patch

diff --git a/patches/kernel/0013-KVM-x86-mmu-Fix-an-sign-extension-bug-with-mmu_seq-t.patch b/patches/kernel/0013-KVM-x86-mmu-Fix-an-sign-extension-bug-with-mmu_seq-t.patch
new file mode 100644
index 0000000..18c268e
--- /dev/null
+++ b/patches/kernel/0013-KVM-x86-mmu-Fix-an-sign-extension-bug-with-mmu_seq-t.patch
@@ -0,0 +1,75 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Sean Christopherson
+Date: Wed, 23 Aug 2023 18:01:04 -0700
+Subject: [PATCH] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that
+ hangs vCPUs
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in
+kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
+how KVM tracks the sequence counter snapshot.
+
+Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
+when checking to see if a page fault is stale, as the sequence count is
+stored as an "unsigned long" everywhere else in KVM. This fixes a bug
+where KVM will effectively hang vCPUs due to always thinking page faults
+are stale, which results in KVM refusing to "fix" faults.
+
+mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
+KVM is handling page faults to detect if userspace mappings relevant to
+the guest were invalidated between snapshotting the counter and acquiring
+mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
+resolve the page fault is fresh. If KVM sees that the counter has
+changed, KVM simply resumes the guest without fixing the fault.
+
+What _should_ happen is that the source of the mmu_notifier invalidations
+eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
+again fix guest page fault(s).
+
+But for a long-lived VM and/or a VM that the host just doesn't particularly
+like, it's possible for a VM to be on the receiving end of 2 billion (with
+a B) mmu_notifier invalidations. When that happens, bit 31 will be set in
+mmu_invalidate_seq. This causes the value to be turned into a 32-bit
+negative value when implicitly cast to an "int" by is_page_fault_stale(),
+and then sign-extended into a 64-bit unsigned when the signed "int" is
+implicitly cast back to an "unsigned long" on the call to
+mmu_invalidate_retry_hva().
+
+As a result of the casting and sign-extension, given a sequence counter of
+e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
+
+	if (0x8002dc25 != 0xffffffff8002dc25)
+
+and signals that the page fault is stale and needs to be retried even
+though the sequence counter is stable, and KVM effectively hangs any vCPU
+that takes a page fault (EPT violation or #NPF when TDP is enabled).
+
+Reported-by: Brian Rak
+Reported-by: Amaan Cheval
+Reported-by: Eric Wheeler
+Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
+Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
+Signed-off-by: Sean Christopherson
+Signed-off-by: Greg Kroah-Hartman
+(cherry-picked from commit 82d811ff566594de3676f35808e8a9e19c5c864c in stable v6.1.51)
+Signed-off-by: Fiona Ebner
+---
+ arch/x86/kvm/mmu/mmu.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
+index 3220c1285984..c42ba5cde7a4 100644
+--- a/arch/x86/kvm/mmu/mmu.c
++++ b/arch/x86/kvm/mmu/mmu.c
+@@ -4261,7 +4261,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+  * root was invalidated by a memslot update or a relevant mmu_notifier fired.
+  */
+ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
+-				struct kvm_page_fault *fault, int mmu_seq)
++				struct kvm_page_fault *fault,
++				unsigned long mmu_seq)
+ {
+ 	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
+ 
-- 
2.39.2
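
For anyone who wants to reproduce the cast chain from the commit message
outside the kernel, here is a minimal user-space sketch. It is not kernel
code: the helper names (roundtrip_through_int, looks_stale_before_fix,
looks_stale_after_fix) are hypothetical and only model the "unsigned long"
-> "int" -> "unsigned long" conversions described above, using fixed-width
types so the widths match x86-64. Narrowing an out-of-range value to a
signed type is implementation-defined in C; the sketch assumes the
wrap-around behavior that GCC and Clang document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the pre-fix data flow: the 64-bit counter is narrowed to a
 * 32-bit signed snapshot and then widened back to 64 bits for the compare.
 * The narrowing is implementation-defined; GCC/Clang wrap modulo 2^32.
 */
static uint64_t roundtrip_through_int(uint64_t seq)
{
	int32_t narrowed = (int32_t)seq;	/* bit 31 set -> negative value */

	return (uint64_t)narrowed;		/* sign-extended to 64 bits */
}

/* Pre-fix: compare the live counter against the round-tripped snapshot. */
static bool looks_stale_before_fix(uint64_t seq)
{
	return seq != roundtrip_through_int(seq);
}

/* Post-fix: the snapshot keeps the counter's full width. */
static bool looks_stale_after_fix(uint64_t seq)
{
	uint64_t snapshot = seq;

	return seq != snapshot;
}
```

With a stable counter of 0x8002dc25, looks_stale_before_fix() reports the
fault as stale on every call, while looks_stale_after_fix() does not; any
counter value below 0x80000000 behaves the same in both variants, which is
why the hang only shows up after roughly 2 billion invalidations.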