public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH pve-kernel 2/2] cherry-pick fix for uefi guests hanging upon guest-initialized reboot
Date: Thu, 24 Aug 2023 16:30:21 +0200	[thread overview]
Message-ID: <20230824143021.2440581-3-s.ivanov@proxmox.com> (raw)
In-Reply-To: <20230824143021.2440581-1-s.ivanov@proxmox.com>

This was identified as a potential fix for an issue we analyzed in our
Enterprise support, where guests would hang before the boot-loader
after being rebooted from within the guest (after applying updates for
RHEL 8).

https://lore.kernel.org/lkml/20230608090348.414990-1-gshan@redhat.com/

Suggested-by: Stefan Hanreich <s.hanreich@proxmox.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
 ...l-stage2-mapping-on-invalid-memory-s.patch | 122 ++++++++++++++++++
 1 file changed, 122 insertions(+)
 create mode 100644 patches/kernel/0025-KVM-Avoid-illegal-stage2-mapping-on-invalid-memory-s.patch

diff --git a/patches/kernel/0025-KVM-Avoid-illegal-stage2-mapping-on-invalid-memory-s.patch b/patches/kernel/0025-KVM-Avoid-illegal-stage2-mapping-on-invalid-memory-s.patch
new file mode 100644
index 000000000000..d50aab8e4d7c
--- /dev/null
+++ b/patches/kernel/0025-KVM-Avoid-illegal-stage2-mapping-on-invalid-memory-s.patch
@@ -0,0 +1,122 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Gavin Shan <gshan@redhat.com>
+Date: Thu, 15 Jun 2023 15:42:59 +1000
+Subject: [PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot
+
+commit 2230f9e1171a2e9731422a14d1bbc313c0b719d1 upstream.
+
+We run into guest hang in edk2 firmware when KSM is kept as running on
+the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
+device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
+buffered write. The status is returned by reading the memory region of
+the pflash device and the read request should have been forwarded to QEMU
+and emulated by it. Unfortunately, the read request is covered by an
+illegal stage2 mapping when the guest hang issue occurs. The read request
+is completed with QEMU bypassed and wrong status is fetched. The edk2
+firmware runs into an infinite loop with the wrong status.
+
+The illegal stage2 mapping is populated due to same page sharing by KSM
+at (C) even the associated memory slot has been marked as invalid at (B)
+when the memory slot is requested to be deleted. It's notable that the
+active and inactive memory slots can't be swapped when we're in the middle
+of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
+is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
+to zero again. Besides, the swapping from the active to the inactive memory
+slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
+corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
+
+  CPU-A                    CPU-B
+  -----                    -----
+                           ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
+                           kvm_vm_ioctl_set_memory_region
+                           kvm_set_memory_region
+                           __kvm_set_memory_region
+                           kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
+                             kvm_invalidate_memslot
+                               kvm_copy_memslot
+                               kvm_replace_memslot
+                               kvm_swap_active_memslots        (A)
+                               kvm_arch_flush_shadow_memslot   (B)
+  same page sharing by KSM
+  kvm_mmu_notifier_invalidate_range_start
+        :
+  kvm_mmu_notifier_change_pte
+    kvm_handle_hva_range
+    __kvm_handle_hva_range
+    kvm_set_spte_gfn            (C)
+        :
+  kvm_mmu_notifier_invalidate_range_end
+
+Fix the issue by skipping the invalid memory slot at (C) to avoid the
+illegal stage2 mapping so that the read request for the pflash's status
+is forwarded to QEMU and emulated by it. In this way, the correct pflash's
+status can be returned from QEMU to break the infinite loop in the edk2
+firmware.
+
+We tried a git-bisect and the first problematic commit is cd4c71835228 ("
+KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
+clean_dcache_guest_page() is called after the memory slots are iterated
+in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
+before the iteration on the memory slots before this commit. This change
+literally enlarges the racy window between kvm_mmu_notifier_change_pte()
+and memory slot removal so that we're able to reproduce the issue in a
+practical test case. However, the issue exists since commit d5d8184d35c9
+("KVM: ARM: Memory virtualization setup").
+
+Cc: stable@vger.kernel.org # v3.9+
+Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
+Reported-by: Shuai Hu <hshuai@redhat.com>
+Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
+Signed-off-by: Gavin Shan <gshan@redhat.com>
+Reviewed-by: David Hildenbrand <david@redhat.com>
+Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
+Reviewed-by: Peter Xu <peterx@redhat.com>
+Reviewed-by: Sean Christopherson <seanjc@google.com>
+Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
+Message-Id: <20230615054259.14911-1-gshan@redhat.com>
+Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+(cherry picked from commit 953dd7e2df8181d5ce4117fca347992d616f0621)
+Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
+---
+ virt/kvm/kvm_main.c | 20 +++++++++++++++++++-
+ 1 file changed, 19 insertions(+), 1 deletion(-)
+
+diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
+index db159be9d5b8..6deb43c2d091 100644
+--- a/virt/kvm/kvm_main.c
++++ b/virt/kvm/kvm_main.c
+@@ -636,6 +636,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
+ 
+ 	return __kvm_handle_hva_range(kvm, &range);
+ }
++
++static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
++{
++	/*
++	 * Skipping invalid memslots is correct if and only change_pte() is
++	 * surrounded by invalidate_range_{start,end}(), which is currently
++	 * guaranteed by the primary MMU.  If that ever changes, KVM needs to
++	 * unmap the memslot instead of skipping the memslot to ensure that KVM
++	 * doesn't hold references to the old PFN.
++	 */
++	WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
++
++	if (range->slot->flags & KVM_MEMSLOT_INVALID)
++		return false;
++
++	return kvm_set_spte_gfn(kvm, range);
++}
++
+ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
+ 					struct mm_struct *mm,
+ 					unsigned long address,
+@@ -656,7 +674,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
+ 	if (!READ_ONCE(kvm->mmu_notifier_count))
+ 		return;
+ 
+-	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
++	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
+ }
+ 
+ void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
-- 
2.39.2





  parent reply	other threads:[~2023-08-24 14:31 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-24 14:30 [pve-devel] [PATCH pve-kernel 0/2] cherry-pick a patch from kernel.org stable 5.15 for guests hanging during reboot Stoiko Ivanov
2023-08-24 14:30 ` [pve-devel] [PATCH pve-kernel 1/2] refresh patches after ./debian/scripts/export-patchqueue Stoiko Ivanov
2023-08-24 14:30 ` Stoiko Ivanov [this message]
2023-08-25  7:35   ` [pve-devel] [PATCH pve-kernel 2/2] cherry-pick fix for uefi guests hanging upon guest-initialized reboot Fiona Ebner
2023-08-25  7:40     ` Stefan Hanreich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230824143021.2440581-3-s.ivanov@proxmox.com \
    --to=s.ivanov@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal