public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [RFC pve-kernel] split lock detection: warn more prominently about misery mode
@ 2024-04-09 12:48 Friedrich Weber
  2024-06-11 14:08 ` Fiona Ebner
  0 siblings, 1 reply; 2+ messages in thread
From: Friedrich Weber @ 2024-04-09 12:48 UTC (permalink / raw)
  To: pve-devel

On processors with the `split_lock_detect` feature, the kernel will
detect user-space split locks and, by default, apply a "misery mode"
to offending tasks [1]. When a split lock is detected, the misery mode
artificially slows down the task by forcing a 10ms and serialization
with other offending tasks. This can also apply to vCPU threads of VM
guests that perform unaligned memory access. In this case, the misery
mode can result in temporary vCPU freezes of 10ms or more.

Currently, correlating observed vCPU freezes with split lock detection
is hard for users: Split lock detection does log a warning, but the
warning does not mention the slowdown, and is logged only once for
each task, which makes troubleshooting hard in case of long-running
VMs. Currently, vCPU threads of OVMF VMs seem to already produce one
such warning on boot, which masks any split locks taken later by the
guest OS.

To ease troubleshooting, add a patch that warns more prominently about
misery mode. With this patch, the kernel warns every time a task is
artificially slowed down. Even though the warnings are rate-limited by
the `printk_ratelimit` mechanism, there is a risk of large volume of
warnings, but this seems preferable to the misery mode going
unnoticed.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/cpu/intel.c?id=727209376f49

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---

Notes:
    With this patch applied, I see a risk that some users will get a *lot*
    of warnings about misery mode. By default, these are rate-limited to
    10 messages per 5 seconds, but depending on the workload, this can
    still mean a lot of messages. This is good for driving user
    attention towards the split locks, but still, I wonder whether this
    might spam the journal too much. What do you think?
    
    Also, currently there is no option to stop the warnings without
    disabling misery mode (or split lock detection in general). Should
    there be such an option? I could imagine making the
    `split_lock_mitigate` sysctl ternary:
    
    2: Enable misery mode and warn every time (new behavior, new default)
    1: Enable misery mode but warn only once (old behavior)
    0: Disable misery mode and warn only once
    
    ... or even completely decouple misery mode and warning behavior?
    What do you think?

 ...lways-warn-when-applying-misery-mode.patch | 67 +++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch

diff --git a/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch b/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
new file mode 100644
index 0000000..658de5e
--- /dev/null
+++ b/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
@@ -0,0 +1,67 @@
+From f8db8223735c529a1dcaaf41217024ee4091a464 Mon Sep 17 00:00:00 2001
+From: Friedrich Weber <f.weber@proxmox.com>
+Date: Tue, 9 Apr 2024 10:57:43 +0200
+Subject: [PATCH] x86/split_lock: always warn when applying misery to task
+
+Since commits b041b525dab9 ("x86/split_lock: Make life miserable for
+split lockers") and 727209376f49 ("x86/split_lock: Add sysctl to
+control the misery mode"), split lock detection artificially slows
+down the offending task ("misery mode") if the `split_lock_mitigate`
+sysctl is enabled (which it is by default). The corresponding warning
+does not mention the slowdown, and is logged only once for each task.
+In case of long-running VMs, this makes it hard to correlate observed
+vCPU freezes to the split lock detection misery mode.
+
+To make troubleshooting easier, change the warning behavior:
+
+- If the `split_lock_mitigate` sysctl is enabled, warn every time
+  misery is applied to a task, and rephrase the warning. Even though
+  the warnings are rate-limited by the `printk_ratelimit` mechanism,
+  this may result in a large volume of warnings for split-lock-heavy
+  workloads, but this seems preferable to the misery mode going
+  unnoticed.
+- If the `split_lock_mitigate` sysctl is disabled, warn only once (as
+  before), but mention explicitly that no further action is taken and
+  subsequent traps will not be logged.
+
+Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ arch/x86/kernel/cpu/intel.c | 13 +++++++++----
+ 1 file changed, 9 insertions(+), 4 deletions(-)
+
+diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
+index 40dec9b56f87..dad937a438e6 100644
+--- a/arch/x86/kernel/cpu/intel.c
++++ b/arch/x86/kernel/cpu/intel.c
+@@ -1164,12 +1164,11 @@ static void split_lock_warn(unsigned long ip)
+ 	struct delayed_work *work;
+ 	int cpu;
+ 
+-	if (!current->reported_split_lock)
+-		pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
++	if (sysctl_sld_mitigate) {
++		pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx. "
++				    "Artificially slowing down task as mitigation because split_lock_mitigate=1\n",
+ 				    current->comm, current->pid, ip);
+-	current->reported_split_lock = 1;
+ 
+-	if (sysctl_sld_mitigate) {
+ 		/*
+ 		 * misery factor #1:
+ 		 * sleep 10ms before trying to execute split lock.
+@@ -1184,6 +1183,12 @@ static void split_lock_warn(unsigned long ip)
+ 			return;
+ 		work = &sl_reenable_unlock;
+ 	} else {
++		if (!current->reported_split_lock)
++			pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx. "
++					    "Taking no further action. Subsequent split_lock traps will not be logged\n",
++					    current->comm, current->pid, ip);
++		current->reported_split_lock = 1;
++
+ 		work = &sl_reenable;
+ 	}
+ 
+-- 
+2.39.2
+
-- 
2.39.2





^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [pve-devel] [RFC pve-kernel] split lock detection: warn more prominently about misery mode
  2024-04-09 12:48 [pve-devel] [RFC pve-kernel] split lock detection: warn more prominently about misery mode Friedrich Weber
@ 2024-06-11 14:08 ` Fiona Ebner
  0 siblings, 0 replies; 2+ messages in thread
From: Fiona Ebner @ 2024-06-11 14:08 UTC (permalink / raw)
  To: Proxmox VE development discussion, Friedrich Weber

Am 09.04.24 um 14:48 schrieb Friedrich Weber:
> On processors with the `split_lock_detect` feature, the kernel will
> detect user-space split locks and, by default, apply a "misery mode"
> to offending tasks [1]. When a split lock is detected, the misery mode
> artificially slows down the task by forcing a 10ms and serialization
> with other offending tasks. This can also apply to vCPU threads of VM
> guests that perform unaligned memory access. In this case, the misery
> mode can result in temporary vCPU freezes of 10ms or more.
> 
> Currently, correlating observed vCPU freezes with split lock detection
> is hard for users: Split lock detection does log a warning, but the
> warning does not mention the slowdown, and is logged only once for
> each task, which makes troubleshooting hard in case of long-running
> VMs. Currently, vCPU threads of OVMF VMs seem to already produce one
> such warning on boot, which masks any split locks taken later by the
> guest OS.
> 
> To ease troubleshooting, add a patch that warns more prominently about
> misery mode. With this patch, the kernel warns every time a task is
> artificially slowed down. Even though the warnings are rate-limited by
> the `printk_ratelimit` mechanism, there is a risk of large volume of
> warnings, but this seems preferable to the misery mode going
> unnoticed.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/cpu/intel.c?id=727209376f49
> 
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
> 
> Notes:
>     With this patch applied, I see a risk that some users will get a *lot*
>     of warnings about misery mode. By default, these are rate-limited to
>     10 messages per 5 seconds, but depending on the workload, this can
>     still mean a lot of messages. This is good for driving user
>     attention towards the split locks, but still, I wonder whether this
>     might spam the journal too much. What do you think?
>     

I know this is an upper limit, but 10 messages per 5 seconds is too much
IMHO. In particular, because certain guest OSes cannot be changed to
avoid taking split locks and users will get scared by these messages.
Unfortunately, reported_split_lock is just a flag and cannot be used as
a counter to log every N taken split locks, which for the right N might
be closer to what we/users actually want. Turning the flag into a
counter would mean wasting more space in the "current" struct, not sure
if that's worth it.

>     Also, currently there is no option to stop the warnings without
>     disabling misery mode (or split lock detection in general). Should
>     there be such an option? I could imagine making the
>     `split_lock_mitigate` sysctl ternary:
>     
>     2: Enable misery mode and warn every time (new behavior, new default)
>     1: Enable misery mode but warn only once (old behavior)
>     0: Disable misery mode and warn only once
>     
>     ... or even completely decouple misery mode and warning behavior?
>     What do you think?
> 

If we go for the current patch, I'm in favor of having a switch to turn
the frequent warnings off. I feel like it should be a separate switch then.

>  ...lways-warn-when-applying-misery-mode.patch | 67 +++++++++++++++++++
>  1 file changed, 67 insertions(+)
>  create mode 100644 patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
> 
> diff --git a/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch b/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
> new file mode 100644
> index 0000000..658de5e
> --- /dev/null
> +++ b/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
> @@ -0,0 +1,67 @@
> +From f8db8223735c529a1dcaaf41217024ee4091a464 Mon Sep 17 00:00:00 2001
> +From: Friedrich Weber <f.weber@proxmox.com>
> +Date: Tue, 9 Apr 2024 10:57:43 +0200
> +Subject: [PATCH] x86/split_lock: always warn when applying misery to task
> +
> +Since commits b041b525dab9 ("x86/split_lock: Make life miserable for
> +split lockers") and 727209376f49 ("x86/split_lock: Add sysctl to
> +control the misery mode"), split lock detection artificially slows
> +down the offending task ("misery mode") if the `split_lock_mitigate`
> +sysctl is enabled (which it is by default). The corresponding warning
> +does not mention the slowdown, and is logged only once for each task.
> +In case of long-running VMs, this makes it hard to correlate observed
> +vCPU freezes to the split lock detection misery mode.
> +

A start would be to extend the warning to mention that it won't be
logged again and there can be slowdowns.

Maybe you can also ask upstream for feedback and suggestions.

> +To make troubleshooting easier, change the warning behavior:
> +
> +- If the `split_lock_mitigate` sysctl is enabled, warn every time
> +  misery is applied to a task, and rephrase the warning. Even though
> +  the warnings are rate-limited by the `printk_ratelimit` mechanism,
> +  this may result in a large volume of warnings for split-lock-heavy
> +  workloads, but this seems preferable to the misery mode going
> +  unnoticed.
> +- If the `split_lock_mitigate` sysctl is disabled, warn only once (as
> +  before), but mention explicitly that no further action is taken and
> +  subsequent traps will not be logged.
> +
> +Signed-off-by: Friedrich Weber <f.weber@proxmox.com>

As for the patch itself:

Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-06-11 14:08 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-09 12:48 [pve-devel] [RFC pve-kernel] split lock detection: warn more prominently about misery mode Friedrich Weber
2024-06-11 14:08 ` Fiona Ebner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal