From: Friedrich Weber <f.weber@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH corosync] corosync.service: add patch to reduce log spam in broken network setups
Date: Fri, 4 Apr 2025 09:59:57 +0200
Message-ID: <20250404075957.80057-1-f.weber@proxmox.com>

Since c761053 ("Check packets come from the correct interface
https://github.com/corosync/corosync/issues/750") in kronosnet,
corosync logs a message whenever a packet is received on the wrong
interface, which happens in certain broken network setups. See inner
patch for details. Drawing attention to such setups is desirable
because they may experience whole-cluster fences if the watchdog is
active, see [1].

However, the log volume in such broken setups can be inconveniently
high. In such a setup, when running the following on node 1:

    # for i in $(seq 100); do dd if=/dev/urandom bs=1M of=/etc/pve/test.bin count=1; done

bursts of ~1300 messages per second are observed on node 2:

    # journalctl --since="1min ago" -u corosync.service \
        | cut -d' ' -f 1-3 | uniq -c | sort -n | tail -n 10
          8 Apr 04 09:51:20
          8 Apr 04 09:51:24
          8 Apr 04 09:51:30
          8 Apr 04 09:51:34
          8 Apr 04 09:51:40
         12 Apr 04 09:51:00
        196 Apr 04 09:51:46
       1283 Apr 04 09:51:44
       1329 Apr 04 09:51:43
       1370 Apr 04 09:51:45

To avoid cluttering the journal, rate-limit log messages to 200 per
second. See inner patch for details.
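
The volume estimate given in the inner patch (roughly 8 GiB/day at 1000 messages per second, 1.6 GiB/day at 200) can be reproduced with a short calculation. This is only a back-of-the-envelope sketch: the 100 bytes per message is the approximation used in the inner patch, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the raw log volume estimate.
# Assumption (taken from the inner patch): ~100 bytes per log message.
BYTES_PER_MESSAGE = 100
SECONDS_PER_DAY = 24 * 60 * 60
GIB = 2**30


def gib_per_day(messages_per_second: int) -> float:
    """Raw log volume in GiB/day at a sustained message rate."""
    return messages_per_second * BYTES_PER_MESSAGE * SECONDS_PER_DAY / GIB


if __name__ == "__main__":
    # 1000 msg/s is the observed burst rate, 200 msg/s the proposed limit.
    for rate in (1000, 200):
        print(f"{rate:>5} msg/s -> {gib_per_day(rate):.1f} GiB/day")
```

Both figures assume the rate is sustained for a whole day, so they are upper bounds for setups that only see occasional bursts.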
[1] https://github.com/corosync/corosync/issues/750
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---
Notes:
    I'm a little confused about the rate limit, as with this patch I do
    see that systemd suppresses messages:

        Apr 04 09:52:54 coro3 systemd-journald[303]: Suppressed 196 messages from corosync.service

    but I still see way more than 200 messages per second:

             11 Apr 04 09:52:00
             11 Apr 04 09:52:45
             13 Apr 04 09:52:13
             14 Apr 04 09:52:12
             19 Apr 04 09:52:07
             67 Apr 04 09:52:08
            400 Apr 04 09:52:54
            695 Apr 04 09:52:52
            715 Apr 04 09:52:53
            835 Apr 04 09:52:51

    Any idea why?
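
One possible explanation, not verified on this setup: journald.conf(5) notes that the effective rate limit is multiplied by a factor derived from the free disk space available to the journal, computed from its base-2 logarithm. The sketch below models that scaling. The exact formula and thresholds are an assumption based on a reading of the journald rate limiter and may differ between systemd versions; the function name is hypothetical.

```python
def effective_burst(burst: int, available_bytes: int) -> int:
    """Model of how journald might scale a configured burst limit.

    Assumption: the burst is scaled by (floor(log2(free bytes)) - 16) / 4
    once more than about 1 MiB is free. This is an illustrative model,
    not a verified copy of systemd's implementation.
    """
    k = available_bytes.bit_length() - 1  # floor(log2(available_bytes))
    if k <= 20:  # about 1 MiB or less free: no scaling
        return burst
    return burst * (k - 16) // 4


if __name__ == "__main__":
    for free_gib in (4, 16, 64):
        limit = effective_burst(200, free_gib * 2**30)
        print(f"{free_gib:>3} GiB free -> up to {limit} messages per interval")
```

Under this model, a configured LogRateLimitBurst=200 with a few GiB of free journal space would allow on the order of 800 messages per interval, which is in the range of the counts shown above.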
...-rate-limit-log-messages-to-200-per-.patch | 54 +++++++++++++++++++
debian/patches/series | 1 +
2 files changed, 55 insertions(+)
create mode 100644 debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
diff --git a/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch b/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
new file mode 100644
index 0000000..0f91b42
--- /dev/null
+++ b/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
@@ -0,0 +1,54 @@
+From 5470f01296a3bd8f47fd4bd97939b3a68f00d309 Mon Sep 17 00:00:00 2001
+From: Friedrich Weber <f.weber@proxmox.com>
+Date: Fri, 4 Apr 2025 09:14:21 +0200
+Subject: [PATCH] corosync.service: rate limit log messages to 200 per second
+
+Since c761053 ("Check packets come from the correct interface
+https://github.com/corosync/corosync/issues/750") in kronosnet,
+corosync will log a message like the following every time a packet is
+received at the wrong interface, i.e., not the interface on which the
+corresponding IP is configured:
+
+> [KNET ] udp: Received packet from 10.8.1.1 to 10.8.1.3 on i/f ens20 when expected ens19
+
+This is to draw attention to broken network setups that
+appear to work fine as long as all corosync links are online, but once
+a link goes down, may go into a state of "asymmetric connectivity"
+which is problematic for corosync. See [1] for more details.
+
+While it is desirable to draw attention to broken setups, the volume
+of log messages in such clusters can get very high and clutter the
+journal. In extreme scenarios, occasional bursts of more than 1000
+messages per second were observed. If we approximate each message with
+100 bytes, logging 1000 messages per second will produce ~8 GiB of raw
+logs per day. While this should be a worst case scenario and the
+logs probably compress well, the volume is still inconveniently high.
+
+Hence, use systemd log rate limiting to limit corosync log messages to
+200 per second, which brings the logs in the above scenario down to 1.6
+GiB/day and should still provide enough headroom to avoid suppressing
+benign log messages in non-broken setups.
+
+[1] https://github.com/corosync/corosync/issues/750
+
+Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ init/corosync.service.in | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/init/corosync.service.in b/init/corosync.service.in
+index bd2a48a9..3d7ea2db 100644
+--- a/init/corosync.service.in
++++ b/init/corosync.service.in
+@@ -10,6 +10,8 @@ EnvironmentFile=-@INITCONFIGDIR@/corosync
+ ExecStart=@SBINDIR@/corosync -f $COROSYNC_OPTIONS
+ ExecStop=@SBINDIR@/corosync-cfgtool -H --force
+ Type=notify
++LogRateLimitIntervalSec=1s
++LogRateLimitBurst=200
+
+ # In typical systemd deployments, both standard outputs are forwarded to
+ # journal (stderr is what's relevant in the pristine corosync configuration),
+--
+2.39.5
+
diff --git a/debian/patches/series b/debian/patches/series
index 147e793..7a796c4 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,3 +1,4 @@
0001-Enable-PrivateTmp-in-the-systemd-service-files.patch
0002-only-start-corosync.service-if-conf-exists.patch
0003-totemsrp-Check-size-of-orf_token-msg.patch
+0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
--
2.39.5