From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Friedrich Weber <f.weber@proxmox.com>
Subject: [pve-devel] applied: [RFC kernel] cherry-pick scheduler fix to avoid temporary VM freezes on NUMA hosts
Date: Mon, 11 Mar 2024 13:51:13 +0100
Message-ID: <3f66820e-b240-4265-9ecf-68e253a5c7ff@proxmox.com>
In-Reply-To: <20240117144521.2960958-1-f.weber@proxmox.com>

On 17/01/2024 at 15:45, Friedrich Weber wrote:
> Users have been reporting [1] that VMs occasionally become
> unresponsive with high CPU usage for some time (varying between ~1 and
> more than 60 seconds). After that time, the guests come back and
> continue running. Windows VMs seem most affected (not responding to
> pings during the hang, RDP sessions time out), but we also got reports
> about Linux VMs (reporting soft lockups). The issue was not present on
> host kernel 5.15 and was first reported with kernel 6.2. Users
> reported that the issue becomes easier to trigger the more memory is
> assigned to the guests. Setting mitigations=off was reported to
> alleviate (but not eliminate) the issue. For most users the issue
> seems to disappear after (also) disabling KSM [2], but some users
> reported freezes even with KSM disabled [3].
> 
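For anyone following along on their own hosts: whether KSM is currently merging pages can be checked via the sysfs knobs documented in Documentation/admin-guide/mm/ksm.rst (on PVE hosts these are normally driven by the ksmtuned service). A minimal C sketch, not part of the original report, that just reads the two relevant files:

/*
 * Minimal sketch (not from the original report): print the host's KSM state
 * via sysfs. run: 0 = stopped, 1 = running, 2 = unmerge all pages and stop.
 */
#include <stdio.h>

static long read_long(const char *path)
{
	FILE *f = fopen(path, "r");
	long val = -1;

	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	printf("ksm run state: %ld\n", read_long("/sys/kernel/mm/ksm/run"));
	printf("pages_sharing: %ld\n", read_long("/sys/kernel/mm/ksm/pages_sharing"));
	return 0;
}

A run state of 1 together with a non-zero pages_sharing count means KSM is actively deduplicating guest memory.
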
> It turned out the reports concerned NUMA hosts only, and that the
> freezes correlated with runs of the NUMA balancer [4]. Users reported
> that disabling the NUMA balancer resolves the issue (even with KSM
> enabled).
> 
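As a side note for testing: the NUMA balancer referenced here is controlled through the kernel.numa_balancing sysctl described in [4], and writing 0 to it disables automatic balancing at runtime. A small illustrative sketch, not part of the original patch, roughly equivalent to "sysctl -w kernel.numa_balancing=0" (needs root):

/*
 * Illustrative sketch: print kernel.numa_balancing and, when invoked with
 * "off", write 0 to disable automatic NUMA balancing (see [4]).
 */
#include <stdio.h>
#include <string.h>

#define NUMA_BALANCING "/proc/sys/kernel/numa_balancing"

int main(int argc, char **argv)
{
	FILE *f = fopen(NUMA_BALANCING, "r");
	int val = -1;

	if (f) {
		if (fscanf(f, "%d", &val) != 1)
			val = -1;
		fclose(f);
	}
	printf("kernel.numa_balancing = %d\n", val);

	if (argc > 1 && strcmp(argv[1], "off") == 0) {
		f = fopen(NUMA_BALANCING, "w");
		if (!f || fputs("0", f) == EOF) {
			perror("disabling NUMA balancing");
			return 1;
		}
		fclose(f);
		printf("NUMA balancing disabled\n");
	}
	return 0;
}

Newer kernels may also report a value of 2 here (memory-tiering mode); the sketch simply prints the raw value.
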
> We put together a Linux VM reproducer, ran a git-bisect on the kernel
> to find the commit introducing the issue and asked upstream for help
> [5]. As it turned out, an upstream bug report was recently opened [6]
> and a preliminary fix to the KVM TDP MMU was proposed [7]. With that
> patch [7] on top of kernel 6.7, the reproducer does not trigger
> freezes anymore. As of now, the patch (or its v2 [8]) is not yet
> merged in the mainline kernel, and backporting it may be difficult due
> to dependencies on other KVM changes [9].
> 
> However, the bug report [6] also prompted an upstream developer to
> propose a patch to the kernel scheduler logic that decides whether a
> contended spinlock/rwlock should be dropped [10]. Without the patch,
> PREEMPT_DYNAMIC kernels (such as ours) would always drop contended
> locks. With the patch, the kernel only drops contended locks if the
> kernel is currently set to preempt=full. As noted in the commit
> message [10], this can (counter-intuitively) improve KVM performance.
> Our kernel defaults to preempt=voluntary (according to
> /sys/kernel/debug/sched/preempt), so with the patch it does not drop
> contended locks anymore, and the reproducer does not trigger freezes
> anymore. Hence, backport [10] to our kernel.
> 
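For completeness, the currently selected dynamic preemption mode can be read back from the debugfs file mentioned above; the kernel lists all available modes and puts the active one in parentheses, e.g. "none (voluntary) full" on our default configuration. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and root privileges:

/*
 * Minimal sketch (assumes debugfs mounted at /sys/kernel/debug, run as root):
 * print the dynamic preemption mode of a PREEMPT_DYNAMIC kernel. The active
 * mode is shown in parentheses.
 */
#include <stdio.h>

int main(void)
{
	char buf[128];
	FILE *f = fopen("/sys/kernel/debug/sched/preempt", "r");

	if (!f || !fgets(buf, sizeof(buf), f)) {
		perror("reading /sys/kernel/debug/sched/preempt");
		return 1;
	}
	fclose(f);
	printf("preempt mode: %s", buf);
	return 0;
}
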
> [1] https://forum.proxmox.com/threads/130727/
> [2] https://forum.proxmox.com/threads/130727/page-4#post-575886
> [3] https://forum.proxmox.com/threads/130727/page-8#post-617587
> [4] https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#numa-balancing
> [5] https://lore.kernel.org/kvm/832697b9-3652-422d-a019-8c0574a188ac@proxmox.com/
> [6] https://bugzilla.kernel.org/show_bug.cgi?id=218259
> [7] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com/
> [8] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com/
> [9] https://lore.kernel.org/kvm/Zaa654hwFKba_7pf@google.com/
> [10] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com/
> 
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
> 
> Notes:
>     This RFC is not meant to be applied immediately, but is intended to
>     sum up the current state of the issue and point out potential fixes.
>     
>     The patch [10] backported in this RFC hasn't been reviewed upstream
>     yet. And while it fixes the reproducer, it is not certain that it will
>     fix freezes seen by users on real-world workloads. Hence, it would be
>     desirable to also apply some variant of [7] [8] once it is merged
>     upstream; however, there may be difficulties backporting it, as noted
>     above.
>     
>     So, in any case, for now it might make sense to monitor how upstream
>     handles the situation, and then react accordingly. I'll continue to
>     participate upstream and send a v2 in due time.
> 
>  ...spinlocks-on-contention-iff-kernel-i.patch | 78 +++++++++++++++++++
>  1 file changed, 78 insertions(+)
>  create mode 100644 patches/kernel/0018-sched-core-Drop-spinlocks-on-contention-iff-kernel-i.patch
> 
>

This was actually already applied for 6.5.13-1, thanks!