From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 5AF2493A45 for ; Tue, 9 Apr 2024 14:49:04 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 410D71E2A2 for ; Tue, 9 Apr 2024 14:49:04 +0200 (CEST) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Tue, 9 Apr 2024 14:49:03 +0200 (CEST) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 616E941856 for ; Tue, 9 Apr 2024 14:49:03 +0200 (CEST) From: Friedrich Weber To: pve-devel@lists.proxmox.com Date: Tue, 9 Apr 2024 14:48:25 +0200 Message-Id: <20240409124825.1809359-1-f.weber@proxmox.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 AWL -0.069 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pve-devel] [RFC pve-kernel] split lock detection: warn more prominently about misery mode X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Apr 2024 12:49:04 -0000 On processors with the `split_lock_detect` feature, the kernel will detect user-space split locks and, by default, apply a "misery mode" to offending tasks [1]. When a split lock is detected, the misery mode artificially slows down the task by forcing a 10ms and serialization with other offending tasks. This can also apply to vCPU threads of VM guests that perform unaligned memory access. In this case, the misery mode can result in temporary vCPU freezes of 10ms or more. Currently, correlating observed vCPU freezes with split lock detection is hard for users: Split lock detection does log a warning, but the warning does not mention the slowdown, and is logged only once for each task, which makes troubleshooting hard in case of long-running VMs. Currently, vCPU threads of OVMF VMs seem to already produce one such warning on boot, which masks any split locks taken later by the guest OS. To ease troubleshooting, add a patch that warns more prominently about misery mode. With this patch, the kernel warns every time a task is artificially slowed down. Even though the warnings are rate-limited by the `printk_ratelimit` mechanism, there is a risk of large volume of warnings, but this seems preferable to the misery mode going unnoticed. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/cpu/intel.c?id=727209376f49 Signed-off-by: Friedrich Weber --- Notes: With this patch applied, I see a risk that some users will get a *lot* of warnings about misery mode. By default, these are rate-limited to 10 messages per 5 seconds, but depending on the workload, this can still mean a lot of messages. This is good for driving user attention towards the split locks, but still, I wonder whether this might spam the journal too much. What do you think? Also, currently there is no option to stop the warnings without disabling misery mode (or split lock detection in general). Should there be such an option? I could imagine making the `split_lock_mitigate` sysctl ternary: 2: Enable misery mode and warn every time (new behavior, new default) 1: Enable misery mode but warn only once (old behavior) 0: Disable misery mode and warn only once ... or even completely decouple misery mode and warning behavior? What do you think? ...lways-warn-when-applying-misery-mode.patch | 67 +++++++++++++++++++ 1 file changed, 67 insertions(+) create mode 100644 patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch diff --git a/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch b/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch new file mode 100644 index 0000000..658de5e --- /dev/null +++ b/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch @@ -0,0 +1,67 @@ +From f8db8223735c529a1dcaaf41217024ee4091a464 Mon Sep 17 00:00:00 2001 +From: Friedrich Weber +Date: Tue, 9 Apr 2024 10:57:43 +0200 +Subject: [PATCH] x86/split_lock: always warn when applying misery to task + +Since commits b041b525dab9 ("x86/split_lock: Make life miserable for +split lockers") and 727209376f49 ("x86/split_lock: Add sysctl to +control the misery mode"), split lock detection artificially slows +down the offending task ("misery mode") if the `split_lock_mitigate` +sysctl is enabled (which it is by default). The corresponding warning +does not mention the slowdown, and is logged only once for each task. +In case of long-running VMs, this makes it hard to correlate observed +vCPU freezes to the split lock detection misery mode. + +To make troubleshooting easier, change the warning behavior: + +- If the `split_lock_mitigate` sysctl is enabled, warn every time + misery is applied to a task, and rephrase the warning. Even though + the warnings are rate-limited by the `printk_ratelimit` mechanism, + this may result in a large volume of warnings for split-lock-heavy + workloads, but this seems preferable to the misery mode going + unnoticed. +- If the `split_lock_mitigate` sysctl is disabled, warn only once (as + before), but mention explicitly that no further action is taken and + subsequent traps will not be logged. + +Signed-off-by: Friedrich Weber +--- + arch/x86/kernel/cpu/intel.c | 13 +++++++++---- + 1 file changed, 9 insertions(+), 4 deletions(-) + +diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c +index 40dec9b56f87..dad937a438e6 100644 +--- a/arch/x86/kernel/cpu/intel.c ++++ b/arch/x86/kernel/cpu/intel.c +@@ -1164,12 +1164,11 @@ static void split_lock_warn(unsigned long ip) + struct delayed_work *work; + int cpu; + +- if (!current->reported_split_lock) +- pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n", ++ if (sysctl_sld_mitigate) { ++ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx. " ++ "Artificially slowing down task as mitigation because split_lock_mitigate=1\n", + current->comm, current->pid, ip); +- current->reported_split_lock = 1; + +- if (sysctl_sld_mitigate) { + /* + * misery factor #1: + * sleep 10ms before trying to execute split lock. +@@ -1184,6 +1183,12 @@ static void split_lock_warn(unsigned long ip) + return; + work = &sl_reenable_unlock; + } else { ++ if (!current->reported_split_lock) ++ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx. " ++ "Taking no further action. Subsequent split_lock traps will not be logged\n", ++ current->comm, current->pid, ip); ++ current->reported_split_lock = 1; ++ + work = &sl_reenable; + } + +-- +2.39.2 + -- 2.39.2