[pve-devel] [PATCH kernel] cherry-pick fix to suppress faulty segfault logging


From: Fiona Ebner @ 2023-08-25  9:26 UTC
  To: pve-devel

While there is no actual issue, users are still nervous about the
faulty logging [0]. It might take a while until the fix comes in via
upstream, so just pick it up manually.

[0]: https://forum.proxmox.com/threads/130628/post-583864

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 ...ault-logging-if-fatal-signal-already.patch | 67 +++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 patches/kernel/0037-mm-suppress-mm-fault-logging-if-fatal-signal-already.patch

diff --git a/patches/kernel/0037-mm-suppress-mm-fault-logging-if-fatal-signal-already.patch b/patches/kernel/0037-mm-suppress-mm-fault-logging-if-fatal-signal-already.patch
new file mode 100644
index 0000000..769811b
--- /dev/null
+++ b/patches/kernel/0037-mm-suppress-mm-fault-logging-if-fatal-signal-already.patch
@@ -0,0 +1,67 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Linus Torvalds <torvalds@linux-foundation.org>
+Date: Tue, 25 Jul 2023 09:38:32 -0700
+Subject: [PATCH] mm: suppress mm fault logging if fatal signal already pending
+
+Commit eda0047296a1 ("mm: make the page fault mmap locking killable")
+intentionally made it much easier to trigger the "page fault fails
+because a fatal signal is pending" situation, by having the mmap locking
+fail early in that case.
+
+We have long aborted page faults in other fatal cases when the actual IO
+for a page is interrupted by SIGKILL - which is particularly useful for
+the traditional case of NFS hanging due to network issues, but local
+filesystems could cause it too if you happened to get the SIGKILL while
+waiting for a page to be faulted in (eg lock_folio_maybe_drop_mmap()).
+
+So aborting the page fault wasn't a new condition - but it now triggers
+earlier, before we even get to 'handle_mm_fault()'.  And as a result the
+error doesn't go through our 'fault_signal_pending()' logic, and doesn't
+get filtered away there.
+
+Normally you'd never even notice, because if a fatal signal is pending,
+the new SIGSEGV we send ends up being ignored anyway.
+
+But it turns out that there is one very noticeable exception: if you
+enable 'show_unhandled_signals', the aborted page fault will be logged
+in the kernel messages, and you'll get a scary line looking something
+like this in your logs:
+
+  pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)
+
+which is rather misleading.  It's not really a segfault at all, it's
+just "the thread was killed before the page fault completed, so we
+aborted the page fault".
+
+Fix this by just making it clear that a pending fatal signal means that
+any new signal coming in after that is implicitly handled.  This will
+avoid the misleading logging, since now the signal isn't 'unhandled' any
+more.
+
+Reported-and-tested-by: Fiona Ebner <f.ebner@proxmox.com>
+Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
+Link: https://lore.kernel.org/lkml/8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com/
+Acked-by: Oleg Nesterov <oleg@redhat.com>
+Fixes: eda0047296a1 ("mm: make the page fault mmap locking killable")
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+(cherry-picked from commit 5f0bc0b042fc77ff70e14c790abdec960cde4ec1)
+Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
+---
+ kernel/signal.c | 4 ++++
+ 1 file changed, 4 insertions(+)
+
+diff --git a/kernel/signal.c b/kernel/signal.c
+index ae26da61c4d9..060f834e9c1a 100644
+--- a/kernel/signal.c
++++ b/kernel/signal.c
+@@ -561,6 +561,10 @@ bool unhandled_signal(struct task_struct *tsk, int sig)
+ 	if (handler != SIG_IGN && handler != SIG_DFL)
+ 		return false;
+ 
++	/* If dying, we handle all new signals by ignoring them */
++	if (fatal_signal_pending(tsk))
++		return false;
++
+ 	/* if ptraced, let the tracer determine */
+ 	return !tsk->ptrace;
+ }
-- 
2.39.2
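
For readers who want to see the effect of the four added lines in
isolation, here is a minimal userspace sketch of the decision made in
the hunk above. It is not kernel code: `struct model_task`,
`enum model_handler` and `model_unhandled_signal()` are hypothetical
stand-ins invented for illustration, and the sketch models only the
checks visible in the hunk (any earlier lines of `unhandled_signal()`
that the hunk does not show are left out).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a signal disposition; not a kernel type. */
enum model_handler { MODEL_SIG_DFL, MODEL_SIG_IGN, MODEL_SIG_USER };

/* Hypothetical stand-in for the parts of task_struct the hunk reads. */
struct model_task {
	enum model_handler handler; /* disposition of the signal in question */
	bool fatal_pending;         /* models fatal_signal_pending(tsk) */
	bool ptraced;               /* models tsk->ptrace being non-zero */
};

/* Mirrors the order of the checks shown in the patched hunk. */
static bool model_unhandled_signal(const struct model_task *t)
{
	/* A user-installed handler means the signal is handled. */
	if (t->handler != MODEL_SIG_IGN && t->handler != MODEL_SIG_DFL)
		return false;

	/* New check: a dying task "handles" new signals by ignoring them. */
	if (t->fatal_pending)
		return false;

	/* If ptraced, the tracer decides, so it is not reported as unhandled. */
	return !t->ptraced;
}

/* Small self-check covering the case the patch changes. */
static void model_self_check(void)
{
	struct model_task t = { MODEL_SIG_DFL, false, false };

	assert(model_unhandled_signal(&t));  /* before: reported as unhandled */
	t.fatal_pending = true;
	assert(!model_unhandled_signal(&t)); /* after: no longer unhandled */
}
```

In this model, a task with the default disposition, no tracer and no
pending fatal signal is reported as unhandled, which corresponds to the
case where the segfault line is logged when 'show_unhandled_signals' is
enabled. Setting `fatal_pending` flips the result to false, which is the
behavior the cherry-picked fix introduces.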






* [pve-devel] applied: [PATCH kernel] cherry-pick fix to suppress faulty segfault logging
From: Thomas Lamprecht @ 2023-08-25 13:31 UTC
  To: Proxmox VE development discussion, Fiona Ebner

On 25/08/2023 at 11:26, Fiona Ebner wrote:
> While there is no actual issue, users are still nervous about the
> faulty logging [0]. It might take a while until the fix comes in via
> upstream, so just pick it up manually.
> 
> [0]: https://forum.proxmox.com/threads/130628/post-583864
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>  ...ault-logging-if-fatal-signal-already.patch | 67 +++++++++++++++++++
>  1 file changed, 67 insertions(+)
>  create mode 100644 patches/kernel/0037-mm-suppress-mm-fault-logging-if-fatal-signal-already.patch
> 
>

applied, thanks!




