From: Stoiko Ivanov
To: pve-devel@lists.proxmox.com
Date: Thu, 24 Aug 2023 16:30:21 +0200
Message-Id: <20230824143021.2440581-3-s.ivanov@proxmox.com>
In-Reply-To: <20230824143021.2440581-1-s.ivanov@proxmox.com>
Subject: [pve-devel] [PATCH pve-kernel 2/2] cherry-pick fix for uefi guests hanging upon guest-initialized reboot

This was identified as a potential fix for an issue we analyzed in our
Enterprise support, where guests hung before reaching the boot loader
after being rebooted from within the guest (after applying updates for
RHEL 8).

https://lore.kernel.org/lkml/20230608090348.414990-1-gshan@redhat.com/

Suggested-by: Stefan Hanreich
Signed-off-by: Stoiko Ivanov
---
 ...l-stage2-mapping-on-invalid-memory-s.patch | 122 ++++++++++++++++++
 1 file changed, 122 insertions(+)
 create mode 100644 patches/kernel/0025-KVM-Avoid-illegal-stage2-mapping-on-invalid-memory-s.patch

diff --git a/patches/kernel/0025-KVM-Avoid-illegal-stage2-mapping-on-invalid-memory-s.patch b/patches/kernel/0025-KVM-Avoid-illegal-stage2-mapping-on-invalid-memory-s.patch
new file mode 100644
index 000000000000..d50aab8e4d7c
--- /dev/null
+++ b/patches/kernel/0025-KVM-Avoid-illegal-stage2-mapping-on-invalid-memory-s.patch
@@ -0,0 +1,122 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Gavin Shan
+Date: Thu, 15 Jun 2023 15:42:59 +1000
+Subject: [PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot
+
+commit 2230f9e1171a2e9731422a14d1bbc313c0b719d1 upstream.
+
+We ran into a guest hang in the edk2 firmware when KSM is kept running on
+the host. The edk2 firmware waits for status 0x80 from QEMU's pflash
+device (TYPE_PFLASH_CFI01) during a sector erase or buffered write
+operation. The status is returned by reading the memory region of
+the pflash device, and the read request should have been forwarded to
+QEMU and emulated by it. Unfortunately, when the guest hang occurs, the
+read request is covered by an illegal stage2 mapping, so it completes
+with QEMU bypassed and the wrong status is fetched. The edk2 firmware
+then spins in an infinite loop on the wrong status.
+
+The illegal stage2 mapping is populated by KSM same-page sharing at (C),
+even though the associated memory slot has already been marked invalid at
+(B) when deletion of the memory slot is requested. Note that the active and
+inactive memory slots can't be swapped while we're in the middle of
+kvm_mmu_notifier_change_pte(), because kvm->mn_active_invalidate_count
+is elevated and kvm_swap_active_memslots() busy-loops until it drops
+to zero again. In addition, swapping from the active to the inactive memory
+slots is also prevented by holding &kvm->srcu in __kvm_handle_hva_range(),
+which pairs with synchronize_srcu_expedited() in kvm_swap_active_memslots().
+
+  CPU-A                    CPU-B
+  -----                    -----
+                           ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
+                           kvm_vm_ioctl_set_memory_region
+                           kvm_set_memory_region
+                           __kvm_set_memory_region
+                           kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
+                             kvm_invalidate_memslot
+                               kvm_copy_memslot
+                               kvm_replace_memslot
+                               kvm_swap_active_memslots        (A)
+                               kvm_arch_flush_shadow_memslot   (B)
+  same page sharing by KSM
+  kvm_mmu_notifier_invalidate_range_start
+        :
+  kvm_mmu_notifier_change_pte
+    kvm_handle_hva_range
+    __kvm_handle_hva_range
+    kvm_set_spte_gfn            (C)
+        :
+  kvm_mmu_notifier_invalidate_range_end
+
+Fix the issue by skipping the invalid memory slot at (C) to avoid the
+illegal stage2 mapping, so that the read request for the pflash status
+is forwarded to QEMU and emulated by it. This way, the correct pflash
+status is returned from QEMU and the infinite loop in the edk2
+firmware is broken.
+
+We ran a git bisect, and the first problematic commit is cd4c71835228
+("KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With it,
+clean_dcache_guest_page() is called after the memory slots are iterated
+in kvm_mmu_notifier_change_pte(). Before that commit, it was called
+before iterating over the memory slots.
 This change
+widens the race window between kvm_mmu_notifier_change_pte() and memory
+slot removal enough that we're able to reproduce the issue in a
+practical test case. However, the issue has existed since commit
+d5d8184d35c9 ("KVM: ARM: Memory virtualization setup").
+
+Cc: stable@vger.kernel.org # v3.9+
+Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
+Reported-by: Shuai Hu
+Reported-by: Zhenyu Zhang
+Signed-off-by: Gavin Shan
+Reviewed-by: David Hildenbrand
+Reviewed-by: Oliver Upton
+Reviewed-by: Peter Xu
+Reviewed-by: Sean Christopherson
+Reviewed-by: Shaoqin Huang
+Message-Id: <20230615054259.14911-1-gshan@redhat.com>
+Signed-off-by: Paolo Bonzini
+Signed-off-by: Greg Kroah-Hartman
+(cherry picked from commit 953dd7e2df8181d5ce4117fca347992d616f0621)
+Signed-off-by: Stoiko Ivanov
+---
+ virt/kvm/kvm_main.c | 20 +++++++++++++++++++-
+ 1 file changed, 19 insertions(+), 1 deletion(-)
+
+diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
+index db159be9d5b8..6deb43c2d091 100644
+--- a/virt/kvm/kvm_main.c
++++ b/virt/kvm/kvm_main.c
+@@ -636,6 +636,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
+
+ 	return __kvm_handle_hva_range(kvm, &range);
+ }
++
++static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
++{
++	/*
++	 * Skipping invalid memslots is correct if and only change_pte() is
++	 * surrounded by invalidate_range_{start,end}(), which is currently
++	 * guaranteed by the primary MMU. If that ever changes, KVM needs to
++	 * unmap the memslot instead of skipping the memslot to ensure that KVM
++	 * doesn't hold references to the old PFN.
++	 */
++	WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
++
++	if (range->slot->flags & KVM_MEMSLOT_INVALID)
++		return false;
++
++	return kvm_set_spte_gfn(kvm, range);
++}
++
+ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
+ 					struct mm_struct *mm,
+ 					unsigned long address,
+@@ -656,7 +674,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
+ 	if (!READ_ONCE(kvm->mmu_notifier_count))
+ 		return;
+
+-	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
++	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
+ }
+
+ void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
-- 
2.39.2