From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Friedrich Weber <f.weber@proxmox.com>
Subject: [pve-devel] applied: [RFC kernel] cherry-pick scheduler fix to avoid temporary VM freezes on NUMA hosts
Date: Mon, 11 Mar 2024 13:51:13 +0100
Message-ID: <3f66820e-b240-4265-9ecf-68e253a5c7ff@proxmox.com>
In-Reply-To: <20240117144521.2960958-1-f.weber@proxmox.com>

On 17/01/2024 at 15:45, Friedrich Weber wrote:
> Users have been reporting [1] that VMs occasionally become
> unresponsive with high CPU usage for some time (varying between ~1 and
> more than 60 seconds). After that time, the guests come back and
> continue running. Windows VMs seem most affected (not responding to
> pings during the hang, RDP sessions time out), but we also got reports
> about Linux VMs (reporting soft lockups). The issue was not present on
> host kernel 5.15 and was first reported with kernel 6.2. Users
> reported that the issue becomes easier to trigger the more memory is
> assigned to the guests. Setting mitigations=off was reported to
> alleviate (but not eliminate) the issue. For most users the issue
> seems to disappear after (also) disabling KSM [2], but some users
> reported freezes even with KSM disabled [3].
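> 
> As a sketch (assuming a stock Proxmox VE host where the ksmtuned
> service drives KSM; adjust to your setup), disabling KSM amounts to:
> 
>     # stop the KSM tuning daemon so it does not restart page merging
>     systemctl disable --now ksmtuned
>     # 2 = stop KSM and unmerge already-shared pages
>     echo 2 > /sys/kernel/mm/ksm/run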
> 
> It turned out the reports concerned NUMA hosts only, and that the
> freezes correlated with runs of the NUMA balancer [4]. Users reported
> that disabling the NUMA balancer resolves the issue (even with KSM
> enabled).
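> 
> For reference, the NUMA balancer can be toggled at runtime via the
> sysctl documented in [4], e.g. (sketch only, not persisted across
> reboots):
> 
>     # disable automatic NUMA balancing on the running host
>     sysctl -w kernel.numa_balancing=0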
> 
> We put together a Linux VM reproducer, ran a git-bisect on the kernel
> to find the commit introducing the issue and asked upstream for help
> [5]. As it turned out, an upstream bug report was recently opened [6]
> and a preliminary fix to the KVM TDP MMU was proposed [7]. With that
> patch [7] on top of kernel 6.7, the reproducer does not trigger
> freezes anymore. As of now, the patch (or its v2 [8]) is not yet
> merged in the mainline kernel, and backporting it may be difficult due
> to dependencies on other KVM changes [9].
> 
> However, the bug report [6] also prompted an upstream developer to
> propose a patch to the kernel scheduler logic that decides whether a
> contended spinlock/rwlock should be dropped [10]. Without the patch,
> PREEMPT_DYNAMIC kernels (such as ours) always drop contended
> locks. With the patch, the kernel drops contended locks only if it
> is currently set to preempt=full. As noted in the commit
> message [10], this can (counter-intuitively) improve KVM performance.
> Our kernel defaults to preempt=voluntary (according to
> /sys/kernel/debug/sched/preempt), so with the patch it no longer
> drops contended locks, and the reproducer does not trigger freezes
> anymore. Hence, backport [10] to our kernel.
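> 
> For reference, the preemption model of the running kernel can be
> checked as follows (the active model is printed in parentheses,
> e.g. "none (voluntary) full"); with [10] applied, contended locks
> are only dropped when this reports full:
> 
>     # show the runtime preemption model of a PREEMPT_DYNAMIC kernel
>     cat /sys/kernel/debug/sched/preempt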
> 
> [1] https://forum.proxmox.com/threads/130727/
> [2] https://forum.proxmox.com/threads/130727/page-4#post-575886
> [3] https://forum.proxmox.com/threads/130727/page-8#post-617587
> [4] https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#numa-balancing
> [5] https://lore.kernel.org/kvm/832697b9-3652-422d-a019-8c0574a188ac@proxmox.com/
> [6] https://bugzilla.kernel.org/show_bug.cgi?id=218259
> [7] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com/
> [8] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com/
> [9] https://lore.kernel.org/kvm/Zaa654hwFKba_7pf@google.com/
> [10] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com/
> 
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
> 
> Notes:
>     This RFC is not meant to be applied immediately, but is intended to
>     sum up the current state of the issue and point out potential fixes.
>     
>     The patch [10] backported in this RFC hasn't been reviewed upstream
>     yet. And while it fixes the reproducer, it is not certain that it will
>     fix freezes seen by users on real-world workloads. Hence, it would be
>     desirable to also apply some variant of [7] [8] once it lands
>     upstream; however, there may be difficulties backporting it, as noted
>     above.
>     
>     So, in any case, for now it might make sense to monitor how upstream
>     handles the situation, and then react accordingly. I'll continue to
>     participate upstream and send a v2 in due time.
> 
>  ...spinlocks-on-contention-iff-kernel-i.patch | 78 +++++++++++++++++++
>  1 file changed, 78 insertions(+)
>  create mode 100644 patches/kernel/0018-sched-core-Drop-spinlocks-on-contention-iff-kernel-i.patch
> 
>

this was actually already applied for 6.5.13-1, thanks!




Thread overview: 2 messages
2024-01-17 14:45 [pve-devel] [RFC kernel] cherry-pick scheduler fix to avoid temporary VM freezes on NUMA hosts  Friedrich Weber
2024-03-11 12:51 ` Thomas Lamprecht  [this message]
