From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <t.lamprecht@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 673B8B8B25
 for <pve-devel@lists.proxmox.com>; Mon, 11 Mar 2024 13:51:45 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 3FD668CAC
 for <pve-devel@lists.proxmox.com>; Mon, 11 Mar 2024 13:51:15 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-devel@lists.proxmox.com>; Mon, 11 Mar 2024 13:51:14 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 4D894488FF
 for <pve-devel@lists.proxmox.com>; Mon, 11 Mar 2024 13:51:14 +0100 (CET)
Message-ID: <3f66820e-b240-4265-9ecf-68e253a5c7ff@proxmox.com>
Date: Mon, 11 Mar 2024 13:51:13 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird Beta
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Friedrich Weber <f.weber@proxmox.com>
References: <20240117144521.2960958-1-f.weber@proxmox.com>
Content-Language: en-GB, de-AT
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Autocrypt: addr=t.lamprecht@proxmox.com; keydata=
 xsFNBFsLjcYBEACsaQP6uTtw/xHTUCKF4VD4/Wfg7gGn47+OfCKJQAD+Oyb3HSBkjclopC5J
 uXsB1vVOfqVYE6PO8FlD2L5nxgT3SWkc6Ka634G/yGDU3ZC3C/7NcDVKhSBI5E0ww4Qj8s9w
 OQRloemb5LOBkJNEUshkWRTHHOmk6QqFB/qBPW2COpAx6oyxVUvBCgm/1S0dAZ9gfkvpqFSD
 90B5j3bL6i9FIv3YGUCgz6Ue3f7u+HsEAew6TMtlt90XV3vT4M2IOuECG/pXwTy7NtmHaBQ7
 UJBcwSOpDEweNob50+9B4KbnVn1ydx+K6UnEcGDvUWBkREccvuExvupYYYQ5dIhRFf3fkS4+
 wMlyAFh8PQUgauod+vqs45FJaSgTqIALSBsEHKEs6IoTXtnnpbhu3p6XBin4hunwoBFiyYt6
 YHLAM1yLfCyX510DFzX/Ze2hLqatqzY5Wa7NIXqYYelz7tXiuCLHP84+sV6JtEkeSUCuOiUY
 virj6nT/nJK8m0BzdR6FgGtNxp7RVXFRz/+mwijJVLpFsyG1i0Hmv2zTn3h2nyGK/I6yhFNt
 dX69y5hbo6LAsRjLUvZeHXpTU4TrpN/WiCjJblbj5um5eEr4yhcwhVmG102puTtuCECsDucZ
 jpKpUqzXlpLbzG/dp9dXFH3MivvfuaHrg3MtjXY1i+/Oxyp5iwARAQABzTNUaG9tYXMgTGFt
 cHJlY2h0IChBdXRoLTQpIDx0LmxhbXByZWNodEBwcm94bW94LmNvbT7CwY4EEwEIADgWIQQO
 R4qbEl/pah9K6VrTZCM6gDZWBgUCWwuNxgIbAwULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgAAK
 CRDTZCM6gDZWBm/jD/4+6JB2s67eaqoP6x9VGaXNGJPCscwzLuxDTCG90G9FYu29VcXtubH/
 bPwsyBbNUQpqTm/s4XboU2qpS5ykCuTjqavrcP33tdkYfGcItj2xMipJ1i3TWvpikQVsX42R
 G64wovLs/dvpTYphRZkg5DwhgTmy3mRkmofFCTa+//MOcNOORltemp984tWjpR3bUJETNWpF
 sKGZHa3N4kCNxb7A+VMsJZ/1gN3jbQbQG7GkJtnHlWkw9rKCYqBtWrnrHa4UAvSa9M/XCIAB
 FThFGqZI1ojdVlv5gd6b/nWxfOPrLlSxbUo5FZ1i/ycj7/24nznW1V4ykG9iUld4uYUY86bB
 UGSjew1KYp9FmvKiwEoB+zxNnuEQfS7/Bj1X9nxizgweiHIyFsRqgogTvLh403QMSGNSoArk
 tqkorf1U+VhEncIn4H3KksJF0njZKfilrieOO7Vuot1xKr9QnYrZzJ7m7ZxJ/JfKGaRHXkE1
 feMmrvZD1AtdUATZkoeQtTOpMu4r6IQRfSdwm/CkppZXfDe50DJxAMDWwfK2rr2bVkNg/yZI
 tKLBS0YgRTIynkvv0h8d9dIjiicw3RMeYXyqOnSWVva2r+tl+JBaenr8YTQw0zARrhC0mttu
 cIZGnVEvQuDwib57QLqMjQaC1gazKHvhA15H5MNxUhwm229UmdH3KM7BTQRbC43GARAAyTkR
 D6KRJ9Xa2fVMh+6f186q0M3ni+5tsaVhUiykxjsPgkuWXWW9MbLpYXkzX6h/RIEKlo2BGA95
 QwG5+Ya2Bo3g7FGJHAkXY6loq7DgMp5/TVQ8phsSv3WxPTJLCBq6vNBamp5hda4cfXFUymsy
 HsJy4dtgkrPQ/bnsdFDCRUuhJHopnAzKHN8APXpKU6xV5e3GE4LwFsDhNHfH/m9+2yO/trcD
 txSFpyftbK2gaMERHgA8SKkzRhiwRTt9w5idOfpJVkYRsgvuSGZ0pcD4kLCOIFrer5xXudk6
 NgJc36XkFRMnwqrL/bB4k6Pi2u5leyqcXSLyBgeHsZJxg6Lcr2LZ35+8RQGPOw9C0ItmRjtY
 ZpGKPlSxjxA1WHT2YlF9CEt3nx7c4C3thHHtqBra6BGPyW8rvtq4zRqZRLPmZ0kt/kiMPhTM
 8wZAlObbATVrUMcZ/uNjRv2vU9O5aTAD9E5r1B0dlqKgxyoImUWB0JgpILADaT3VybDd3C8X
 s6Jt8MytUP+1cEWt9VKo4vY4Jh5vwrJUDLJvzpN+TsYCZPNVj18+jf9uGRaoK6W++DdMAr5l
 gQiwsNgf9372dbMI7pt2gnT5/YdG+ZHnIIlXC6OUonA1Ro/Itg90Q7iQySnKKkqqnWVc+qO9
 GJbzcGykxD6EQtCSlurt3/5IXTA7t6sAEQEAAcLBdgQYAQgAIBYhBA5HipsSX+lqH0rpWtNk
 IzqANlYGBQJbC43GAhsMAAoJENNkIzqANlYGD1sP/ikKgHgcspEKqDED9gQrTBvipH85si0j
 /Jwu/tBtnYjLgKLh2cjv1JkgYYjb3DyZa1pLsIv6rGnPX9bH9IN03nqirC/Q1Y1lnbNTynPk
 IflgvsJjoTNZjgu1wUdQlBgL/JhUp1sIYID11jZphgzfDgp/E6ve/8xE2HMAnf4zAfJaKgD0
 F+fL1DlcdYUditAiYEuN40Ns/abKs8I1MYx7Yglu3RzJfBzV4t86DAR+OvuF9v188WrFwXCS
 RSf4DmJ8tntyNej+DVGUnmKHupLQJO7uqCKB/1HLlMKc5G3GLoGqJliHjUHUAXNzinlpE2Vj
 C78pxpwxRNg2ilE3AhPoAXrY5qED5PLE9sLnmQ9AzRcMMJUXjTNEDxEYbF55SdGBHHOAcZtA
 kEQKub86e+GHA+Z8oXQSGeSGOkqHi7zfgW1UexddTvaRwE6AyZ6FxTApm8wq8NT2cryWPWTF
 BDSGB3ujWHMM8ERRYJPcBSjTvt0GcEqnd+OSGgxTkGOdufn51oz82zfpVo1t+J/FNz6MRMcg
 8nEC+uKvgzH1nujxJ5pRCBOquFZaGn/p71Yr0oVitkttLKblFsqwa+10Lt6HBxm+2+VLp4Ja
 0WZNncZciz3V3cuArpan/ZhhyiWYV5FD0pOXPCJIx7WS9PTtxiv0AOS4ScWEUmBxyhFeOpYa DrEx
In-Reply-To: <20240117144521.2960958-1-f.weber@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.054 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 T_SCC_BODY_TEXT_LINE    -0.01 -
Subject: [pve-devel] applied: [RFC kernel] cherry-pick scheduler fix to
 avoid temporary VM freezes on NUMA hosts
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Mon, 11 Mar 2024 12:51:45 -0000

Am 17/01/2024 um 15:45 schrieb Friedrich Weber:
> Users have been reporting [1] that VMs occasionally become
> unresponsive with high CPU usage for some time (varying between ~1 and
> more than 60 seconds). After that time, the guests come back and
> continue running. Windows VMs seem most affected (not responding to
> pings during the hang, RDP sessions time out), but we also got reports
> about Linux VMs (reporting soft lockups). The issue was not present on
> host kernel 5.15 and was first reported with kernel 6.2. Users
> reported that the issue becomes easier to trigger the more memory is
> assigned to the guests. Setting mitigations=off was reported to
> alleviate (but not eliminate) the issue. For most users the issue
> seems to disappear after (also) disabling KSM [2], but some users
> reported freezes even with KSM disabled [3].
> 
> It turned out the reports concerned NUMA hosts only, and that the
> freezes correlated with runs of the NUMA balancer [4]. Users reported
> that disabling the NUMA balancer resolves the issue (even with KSM
> enabled).
> 
> We put together a Linux VM reproducer, ran a git-bisect on the kernel
> to find the commit introducing the issue and asked upstream for help
> [5]. As it turned out, an upstream bugreport was recently opened [6]
> and a preliminary fix to the KVM TDP MMU was proposed [7]. With that
> patch [7] on top of kernel 6.7, the reproducer does not trigger
> freezes anymore. As of now, the patch (or its v2 [8]) is not yet
> merged in the mainline kernel, and backporting it may be difficult due
> to dependencies on other KVM changes [9].
> 
> However, the bugreport [6] also prompted an upstream developer to
> propose a patch to the kernel scheduler logic that decides whether a
> contended spinlock/rwlock should be dropped [10]. Without the patch,
> PREEMPT_DYNAMIC kernels (such as ours) would always drop contended
> locks. With the patch, the kernel only drops contended locks if the
> kernel is currently set to preempt=full. As noted in the commit
> message [10], this can (counter-intuitively) improve KVM performance.
> Our kernel defaults to preempt=voluntary (according to
> /sys/kernel/debug/sched/preempt), so with the patch it does not drop
> contended locks anymore, and the reproducer does not trigger freezes
> anymore. Hence, backport [10] to our kernel.
> 
> [1] https://forum.proxmox.com/threads/130727/
> [2] https://forum.proxmox.com/threads/130727/page-4#post-575886
> [3] https://forum.proxmox.com/threads/130727/page-8#post-617587
> [4] https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#numa-balancing
> [5] https://lore.kernel.org/kvm/832697b9-3652-422d-a019-8c0574a188ac@proxmox.com/
> [6] https://bugzilla.kernel.org/show_bug.cgi?id=218259
> [7] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com/
> [8] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com/
> [9] https://lore.kernel.org/kvm/Zaa654hwFKba_7pf@google.com/
> [10] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com/
> 
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
> 
> Notes:
>     This RFC is not meant to be applied immediately, but is intended to
>     sum up the current state of the issue and point out potential fixes.
>     
>     The patch [10] backported in this RFC hasn't been reviewed upstream
>     yet. And while it fixes the reproducer, it is not certain that it will
>     fix freezes seen by users on real-world workloads. Hence, it would be
>     desirable to also apply some variant of [7] [8] once it is applied
>     upstream, however there may be difficulties backporting it, as noted
>     above.
>     
>     So, in any case, for now it might sense to monitor how upstream
>     handles the situation, and then react accordingly. I'll continue to
>     participate upstream and send a v2 in due time.
> 
>  ...spinlocks-on-contention-iff-kernel-i.patch | 78 +++++++++++++++++++
>  1 file changed, 78 insertions(+)
>  create mode 100644 patches/kernel/0018-sched-core-Drop-spinlocks-on-contention-iff-kernel-i.patch
> 
>

this was actually already applied for 6.5.13-1, thanks!