From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>, Friedrich Weber
Date: Mon, 11 Mar 2024 13:51:13 +0100
Message-ID: <3f66820e-b240-4265-9ecf-68e253a5c7ff@proxmox.com>
References: <20240117144521.2960958-1-f.weber@proxmox.com>
In-Reply-To: <20240117144521.2960958-1-f.weber@proxmox.com>
Subject: [pve-devel] applied: [RFC kernel] cherry-pick scheduler fix to avoid temporary VM freezes on NUMA hosts

On 17/01/2024 at 15:45, Friedrich Weber wrote:
> Users have been reporting [1] that VMs occasionally become
> unresponsive with high CPU usage for some time (varying between ~1 and
> more than 60 seconds). After that time, the guests come back and
> continue running. Windows VMs seem most affected (not responding to
> pings during the hang, RDP sessions time out), but we also got reports
> about Linux VMs (reporting soft lockups). The issue was not present on
> host kernel 5.15 and was first reported with kernel 6.2. Users
> reported that the issue becomes easier to trigger the more memory is
> assigned to the guests. Setting mitigations=off was reported to
> alleviate (but not eliminate) the issue. For most users the issue
> seems to disappear after (also) disabling KSM [2], but some users
> reported freezes even with KSM disabled [3].
>
> It turned out that the reports concerned NUMA hosts only, and that the
> freezes correlated with runs of the NUMA balancer [4]. Users reported
> that disabling the NUMA balancer resolves the issue (even with KSM
> enabled).
>
> We put together a Linux VM reproducer, ran a git-bisect on the kernel
> to find the commit introducing the issue, and asked upstream for help
> [5]. As it turned out, an upstream bug report had recently been opened
> [6] and a preliminary fix to the KVM TDP MMU was proposed [7]. With
> that patch [7] on top of kernel 6.7, the reproducer does not trigger
> freezes anymore. As of now, the patch (or its v2 [8]) is not yet
> merged in the mainline kernel, and backporting it may be difficult due
> to dependencies on other KVM changes [9].
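As a side note for anyone affected: the NUMA balancer workaround mentioned
above corresponds to the kernel.numa_balancing sysctl documented in [4]. A
minimal sketch of disabling it programmatically, assuming root privileges and
the standard procfs path /proc/sys/kernel/numa_balancing (the file name and
error handling below are purely illustrative), could look like this;
`sysctl -w kernel.numa_balancing=0` does the same from a shell:

    /* numa_balancing_off.c (illustrative name): disable automatic NUMA
     * balancing, equivalent to `sysctl -w kernel.numa_balancing=0`.
     * Needs root; the setting does not persist across reboots.
     */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char *path = "/proc/sys/kernel/numa_balancing";
        FILE *f = fopen(path, "w");

        if (!f) {
            perror("fopen");
            return EXIT_FAILURE;
        }
        if (fputs("0\n", f) == EOF) {   /* 0 = disabled, 1 = enabled */
            perror("fputs");
            fclose(f);
            return EXIT_FAILURE;
        }
        if (fclose(f) != 0) {
            perror("fclose");
            return EXIT_FAILURE;
        }
        printf("NUMA balancing disabled via %s\n", path);
        return EXIT_SUCCESS;
    }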
>
> However, the bug report [6] also prompted an upstream developer to
> propose a patch to the kernel scheduler logic that decides whether a
> contended spinlock/rwlock should be dropped [10]. Without the patch,
> PREEMPT_DYNAMIC kernels (such as ours) would always drop contended
> locks. With the patch, the kernel only drops contended locks if it is
> currently set to preempt=full. As noted in the commit message [10],
> this can (counter-intuitively) improve KVM performance. Our kernel
> defaults to preempt=voluntary (according to
> /sys/kernel/debug/sched/preempt), so with the patch it does not drop
> contended locks anymore, and the reproducer does not trigger freezes
> anymore. Hence, backport [10] to our kernel.
>
> [1] https://forum.proxmox.com/threads/130727/
> [2] https://forum.proxmox.com/threads/130727/page-4#post-575886
> [3] https://forum.proxmox.com/threads/130727/page-8#post-617587
> [4] https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#numa-balancing
> [5] https://lore.kernel.org/kvm/832697b9-3652-422d-a019-8c0574a188ac@proxmox.com/
> [6] https://bugzilla.kernel.org/show_bug.cgi?id=218259
> [7] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com/
> [8] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com/
> [9] https://lore.kernel.org/kvm/Zaa654hwFKba_7pf@google.com/
> [10] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com/
>
> Signed-off-by: Friedrich Weber
> ---
>
> Notes:
>     This RFC is not meant to be applied immediately, but is intended to
>     sum up the current state of the issue and point out potential fixes.
>
>     The patch [10] backported in this RFC hasn't been reviewed upstream
>     yet. And while it fixes the reproducer, it is not certain that it
>     will fix the freezes seen by users on real-world workloads. Hence,
>     it would be desirable to also apply some variant of [7] [8] once it
>     is applied upstream; however, there may be difficulties backporting
>     it, as noted above.
>
>     So, in any case, for now it might make sense to monitor how upstream
>     handles the situation, and then react accordingly. I'll continue to
>     participate upstream and send a v2 in due time.
>
>  ...spinlocks-on-contention-iff-kernel-i.patch | 78 +++++++++++++++++++
>  1 file changed, 78 insertions(+)
>  create mode 100644 patches/kernel/0018-sched-core-Drop-spinlocks-on-contention-iff-kernel-i.patch

This was actually already applied for 6.5.13-1, thanks!
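For reference, the idea behind [10], as described in the quoted message, can
be sketched roughly as follows. This is a paraphrase of the intent, not the
literal upstream hunk: it is modelled on the kernel's spin_needbreak()
helper, and the assumption that the patch gates exactly this helper (rather
than some other part of the lock-breaking path) is mine. On a PREEMPT_DYNAMIC
build, CONFIG_PREEMPTION is set at compile time, so the old logic always
reported contention; the patched logic instead asks the runtime preemption
model, which is what /sys/kernel/debug/sched/preempt shows.

    /* Rough paraphrase of [10] (kernel context, not a standalone program):
     * only signal "drop the contended lock" when the kernel is actually
     * running with preempt=full, instead of keying off CONFIG_PREEMPTION,
     * which is always enabled on PREEMPT_DYNAMIC builds.
     */
    static inline int spin_needbreak(spinlock_t *lock)
    {
            if (!preempt_model_preemptible())  /* preempt=none or voluntary */
                    return 0;                  /* keep holding the lock */

            return spin_is_contended(lock);    /* preempt=full: allow break */
    }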