* [pve-devel] [PATCH corosync] corosync.service: add patch to reduce log spam in broken network setups
@ 2025-04-04  7:59 Friedrich Weber
From: Friedrich Weber @ 2025-04-04  7:59 UTC (permalink / raw)
  To: pve-devel

Since c761053 ("Check packets come from the correct interface
https://github.com/corosync/corosync/issues/750") in kronosnet,
corosync will produce log messages in certain broken network setups.
See inner patch for details. Drawing attention to such setups is
desirable because they may experience whole-cluster fences if the
watchdog is active; see [1].

However, the log volume in such broken setups can be inconveniently
high. In such a setup, when running the following on node 1:

  # for i in $(seq 100); do dd if=/dev/urandom bs=1M of=/etc/pve/test.bin count=1; done

On node 2, bursts of ~1300 messages per second are observed:

  # journalctl --since="1min ago" -u corosync.service \
    | cut -d' ' -f 1-3 | uniq -c | sort -n | tail -n 10
      8 Apr 04 09:51:20
      8 Apr 04 09:51:24
      8 Apr 04 09:51:30
      8 Apr 04 09:51:34
      8 Apr 04 09:51:40
     12 Apr 04 09:51:00
    196 Apr 04 09:51:46
   1283 Apr 04 09:51:44
   1329 Apr 04 09:51:43
   1370 Apr 04 09:51:45

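To avoid cluttering the journal, use systemd's per-unit log rate
limiting (LogRateLimitIntervalSec=1s, LogRateLimitBurst=200) to limit
corosync log messages to 200 per second. See inner patch for details.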
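
As a rough sanity check of the volume estimate given in the inner
patch, the arithmetic can be sketched as below. This assumes ~100 bytes
per message (the same approximation the inner commit message uses) and
a constant message rate over a whole day, which should be a worst case.
The function name daily_log_volume_gib is made up for this
illustration.

```python
# Back-of-the-envelope estimate of raw (uncompressed) log volume per day.
# Assumption: every message is ~100 bytes and the rate is sustained 24h.

SECONDS_PER_DAY = 24 * 60 * 60
GIB = 1024 ** 3


def daily_log_volume_gib(messages_per_second: float,
                         bytes_per_message: int = 100) -> float:
    """Return the raw log volume in GiB produced in one day."""
    bytes_per_day = messages_per_second * bytes_per_message * SECONDS_PER_DAY
    return bytes_per_day / GIB


if __name__ == "__main__":
    # 1000 msg/s is the observed burst rate, 200 msg/s the proposed limit.
    for rate in (1000, 200):
        print(f"{rate} msg/s -> {daily_log_volume_gib(rate):.1f} GiB/day")
    # prints:
    # 1000 msg/s -> 8.0 GiB/day
    # 200 msg/s -> 1.6 GiB/day
```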

[1] https://github.com/corosync/corosync/issues/750

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---

Notes:
    I'm a little confused about the rate limit, as with this patch I do
    see that systemd suppresses messages:
    
    Apr 04 09:52:54 coro3 systemd-journald[303]: Suppressed 196 messages from corosync.service
    
    but I still see way more than 200 messages per second:
    
         11 Apr 04 09:52:00
         11 Apr 04 09:52:45
         13 Apr 04 09:52:13
         14 Apr 04 09:52:12
         19 Apr 04 09:52:07
         67 Apr 04 09:52:08
        400 Apr 04 09:52:54
        695 Apr 04 09:52:52
        715 Apr 04 09:52:53
        835 Apr 04 09:52:51
    
    Any idea why?
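
For anyone who wants to reproduce the per-second counts above without
the cut/uniq/sort pipeline, here is a minimal sketch in Python. It
assumes the default journalctl "short" output, where the first three
space-separated fields are month, day and time. The function names
messages_per_second and busiest_seconds are made up for this
illustration.

```python
# Count journal lines per second, similar in spirit to:
#   cut -d' ' -f 1-3 | uniq -c | sort -n | tail -n 10
# Assumption: input is in journalctl's default "short" format,
# e.g. "Apr 04 09:51:44 host corosync[123]: ...".
import sys
from collections import Counter
from typing import Iterable, List, Tuple


def messages_per_second(lines: Iterable[str]) -> Counter:
    """Map each 'Mon DD HH:MM:SS' timestamp to its number of lines."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        if len(fields) < 3:
            continue  # skip empty or otherwise malformed lines
        counts[" ".join(fields[:3])] += 1
    return counts


def busiest_seconds(lines: Iterable[str],
                    limit: int = 10) -> List[Tuple[str, int]]:
    """Return the `limit` busiest seconds, sorted by ascending count."""
    ranked = sorted(messages_per_second(lines).items(), key=lambda kv: kv[1])
    return ranked[-limit:]


if __name__ == "__main__" and not sys.stdin.isatty():
    # Reads journal lines from standard input when piped in.
    for second, count in busiest_seconds(sys.stdin):
        print(f"{count:7d} {second}")
```

If saved as, say, count.py (a hypothetical file name), it could be fed
with the same journalctl invocation as above:

  # journalctl --since="1min ago" -u corosync.service | python3 count.py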

 ...-rate-limit-log-messages-to-200-per-.patch | 54 +++++++++++++++++++
 debian/patches/series                         |  1 +
 2 files changed, 55 insertions(+)
 create mode 100644 debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch

diff --git a/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch b/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
new file mode 100644
index 0000000..0f91b42
--- /dev/null
+++ b/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
@@ -0,0 +1,54 @@
+From 5470f01296a3bd8f47fd4bd97939b3a68f00d309 Mon Sep 17 00:00:00 2001
+From: Friedrich Weber <f.weber@proxmox.com>
+Date: Fri, 4 Apr 2025 09:14:21 +0200
+Subject: [PATCH] corosync.service: rate limit log messages to 200 per second
+
+Since c761053 ("Check packets come from the correct interface
+https://github.com/corosync/corosync/issues/750") in kronosnet,
+corosync will log a message like the following every time a packet is
+received at the wrong interface, i.e., not the interface on which the
+corresponding IP is configured:
+
+> [KNET  ] udp: Received packet from 10.8.1.1 to 10.8.1.3 on i/f ens20 when expected ens19
+
+This is to draw attention to broken network setups that
+appear to work fine as long as all corosync links are online, but once
+a link goes down, may go into a state of "asymmetric connectivity"
+which is problematic for corosync. See [1] for more details.
+
+While it is desirable to draw attention to broken setups, the volume
+of log messages in such clusters can get very high and clutter the
+journal. In extreme scenarios, occasional bursts of more than 1000
+messages per second were observed. If we approximate each message with
+100 bytes, logging 1000 messages per second will produce ~8 GiB of raw
+logs per day. While this should be a worst case scenario and the
+logs probably compress well, the volume is still inconveniently high.
+
+Hence, use systemd log rate limiting to limit corosync log messages to
+200 per second, which brings the logs in the above scenario down to
+1.6 GiB/day and should still provide enough headroom to avoid
+suppressing benign log messages in non-broken setups.
+
+[1] https://github.com/corosync/corosync/issues/750
+
+Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ init/corosync.service.in | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/init/corosync.service.in b/init/corosync.service.in
+index bd2a48a9..3d7ea2db 100644
+--- a/init/corosync.service.in
++++ b/init/corosync.service.in
+@@ -10,6 +10,8 @@ EnvironmentFile=-@INITCONFIGDIR@/corosync
+ ExecStart=@SBINDIR@/corosync -f $COROSYNC_OPTIONS
+ ExecStop=@SBINDIR@/corosync-cfgtool -H --force
+ Type=notify
++LogRateLimitIntervalSec=1s
++LogRateLimitBurst=200
+ 
+ # In typical systemd deployments, both standard outputs are forwarded to
+ # journal (stderr is what's relevant in the pristine corosync configuration),
+-- 
+2.39.5
+
diff --git a/debian/patches/series b/debian/patches/series
index 147e793..7a796c4 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,3 +1,4 @@
 0001-Enable-PrivateTmp-in-the-systemd-service-files.patch
 0002-only-start-corosync.service-if-conf-exists.patch
 0003-totemsrp-Check-size-of-orf_token-msg.patch
+0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
-- 
2.39.5



