public inbox for pve-devel@lists.proxmox.com
From: Friedrich Weber <f.weber@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH corosync] corosync.service: add patch to reduce log spam in broken network setups
Date: Fri,  4 Apr 2025 09:59:57 +0200
Message-ID: <20250404075957.80057-1-f.weber@proxmox.com>

Since c761053 ("Check packets come from the correct interface
https://github.com/corosync/corosync/issues/750") in kronosnet,
corosync logs a message for every packet received on an interface
other than the expected one, which happens in certain broken network
setups. See the inner patch for details. Drawing attention to such
setups is desirable because they may experience whole-cluster fences
if the watchdog is active, see [1].

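However, the log volume in such setups can be inconveniently high.
For example, run the following on node 1 of such a setup, which
overwrites /etc/pve/test.bin (replicated across the cluster via
corosync) 100 times with 1 MiB of random data: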

  # for i in $(seq 100); do dd if=/dev/urandom bs=1M of=/etc/pve/test.bin count=1; done

On node 2, counting the corosync journal lines per second then shows
bursts of ~1300 messages per second:

  # journalctl --since="1min ago" -u corosync.service \
    | cut -d' ' -f 1-3 | uniq -c | sort -n | tail -n 10
      8 Apr 04 09:51:20
      8 Apr 04 09:51:24
      8 Apr 04 09:51:30
      8 Apr 04 09:51:34
      8 Apr 04 09:51:40
     12 Apr 04 09:51:00
    196 Apr 04 09:51:46
   1283 Apr 04 09:51:44
   1329 Apr 04 09:51:43
   1370 Apr 04 09:51:45
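To spot only the problematic bursts in a longer time window, the same per-second counting can be wrapped in a small filter that prints just the seconds above a threshold. This is a minimal sketch; the function name count_bursts is hypothetical, and it assumes journalctl's default short output format, where the first three space-separated fields are the timestamp (e.g. "Apr 04 09:51:43"):

```shell
# Hypothetical helper: count journal lines per second and print only the
# seconds whose count exceeds a threshold (default: 200).
# Assumes journalctl "short" output on stdin, i.e. lines starting with
# "Mon DD HH:MM:SS". The extra sort is harmless for chronological input
# and makes the grouping independent of input order.
count_bursts() {
    threshold="${1:-200}"
    cut -d' ' -f 1-3 \
        | sort \
        | uniq -c \
        | awk -v t="$threshold" '$1 > t'
}

# Usage (on the node that receives the misrouted packets):
#   journalctl --since="1min ago" -u corosync.service | count_bursts 200
```

With the threshold set to the intended burst limit, any output line indicates a second in which more messages reached the journal than the limit should allow.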

To avoid cluttering the journal, rate-limit corosync.service log
messages to 200 per second using systemd's LogRateLimitIntervalSec=
and LogRateLimitBurst= settings. See the inner patch for details.
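To try the same limit on an existing node before a rebuilt package is available, the two settings could presumably also be applied via a systemd drop-in. A minimal sketch, not part of this patch: the function name write_ratelimit_dropin and the file name ratelimit.conf are hypothetical, and /etc/systemd/system/corosync.service.d is assumed to be the drop-in directory following the usual systemd convention:

```shell
# Hypothetical helper (not part of this patch): write a drop-in that sets
# the same log rate limit as the patched corosync.service.
#   $1: drop-in directory, e.g. /etc/systemd/system/corosync.service.d
#   $2: rate limit interval (default: 1s)
#   $3: rate limit burst    (default: 200)
write_ratelimit_dropin() {
    dir="$1"
    interval="${2:-1s}"
    burst="${3:-200}"
    mkdir -p "$dir" || return 1
    cat > "$dir/ratelimit.conf" <<EOF
[Service]
LogRateLimitIntervalSec=$interval
LogRateLimitBurst=$burst
EOF
}

# Usage sketch (as root):
#   write_ratelimit_dropin /etc/systemd/system/corosync.service.d
#   systemctl daemon-reload
```

The new limits presumably only apply to a fresh start of corosync.service; restarting corosync on a cluster node affects cluster membership and should be done with care.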

[1] https://github.com/corosync/corosync/issues/750

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
---

Notes:
    I'm a little confused about the rate limit: with this patch, I do
    see that systemd-journald suppresses messages:
    
    Apr 04 09:52:54 coro3 systemd-journald[303]: Suppressed 196 messages from corosync.service
    
    but I still see way more than 200 messages per second:
    
         11 Apr 04 09:52:00
         11 Apr 04 09:52:45
         13 Apr 04 09:52:13
         14 Apr 04 09:52:12
         19 Apr 04 09:52:07
         67 Apr 04 09:52:08
        400 Apr 04 09:52:54
        695 Apr 04 09:52:52
        715 Apr 04 09:52:53
        835 Apr 04 09:52:51
    
    Any idea why?
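When comparing the per-second counts with what journald dropped, it may help to total the "Suppressed N messages from corosync.service" lines over the same time window. A minimal sketch; the function name sum_suppressed is hypothetical, and it assumes the message format exactly as quoted in the notes above:

```shell
# Hypothetical helper: sum N over all lines of the form
#   "... systemd-journald[PID]: Suppressed N messages from UNIT"
# read on stdin, counting only lines for the given unit
# (default: corosync.service). Prints 0 if no such line is found.
sum_suppressed() {
    unit="${1:-corosync.service}"
    awk -v unit="$unit" '
        {
            for (i = 1; i + 4 <= NF; i++)
                if ($i == "Suppressed" && $(i + 2) == "messages" &&
                    $(i + 3) == "from" && $(i + 4) == unit)
                    total += $(i + 1)
        }
        END { print total + 0 }
    '
}

# Usage:
#   journalctl --since="1min ago" | sum_suppressed corosync.service
```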

 ...-rate-limit-log-messages-to-200-per-.patch | 54 +++++++++++++++++++
 debian/patches/series                         |  1 +
 2 files changed, 55 insertions(+)
 create mode 100644 debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch

diff --git a/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch b/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
new file mode 100644
index 0000000..0f91b42
--- /dev/null
+++ b/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
@@ -0,0 +1,54 @@
+From 5470f01296a3bd8f47fd4bd97939b3a68f00d309 Mon Sep 17 00:00:00 2001
+From: Friedrich Weber <f.weber@proxmox.com>
+Date: Fri, 4 Apr 2025 09:14:21 +0200
+Subject: [PATCH] corosync.service: rate limit log messages to 200 per second
+
+Since c761053 ("Check packets come from the correct interface
+https://github.com/corosync/corosync/issues/750") in kronosnet,
+corosync will log a message like the following every time a packet is
+received at the wrong interface, i.e., not the interface on which the
+corresponding IP is configured:
+
+> [KNET  ] udp: Received packet from 10.8.1.1 to 10.8.1.3 on i/f ens20 when expected ens19
+
+This is to draw attention to broken network setups that
+appear to work fine as long as all corosync links are online, but once
+a link goes down, may go into a state of "asymmetric connectivity"
+which is problematic for corosync. See [1] for more details.
+
+While it is desirable to draw attention to broken setups, the volume
+of log messages in such clusters can get very high and clutter the
+journal. In extreme scenarios, occasional bursts of more than 1000
+messages per second were observed. If we approximate each message with
+100 bytes, logging 1000 messages per second will produce ~8 GiB of raw
+logs per day. While this should be a worst-case scenario and the
+logs probably compress well, the volume is still inconveniently high.
+
+Hence, use systemd log rate limiting to limit corosync log messages to
+200 per second, which brings the logs in the above scenario down to 1.6
+GiB/day and should still provide enough headroom to avoid suppressing
+benign log messages in non-broken setups.
+
+[1] https://github.com/corosync/corosync/issues/750
+
+Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ init/corosync.service.in | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/init/corosync.service.in b/init/corosync.service.in
+index bd2a48a9..3d7ea2db 100644
+--- a/init/corosync.service.in
++++ b/init/corosync.service.in
+@@ -10,6 +10,8 @@ EnvironmentFile=-@INITCONFIGDIR@/corosync
+ ExecStart=@SBINDIR@/corosync -f $COROSYNC_OPTIONS
+ ExecStop=@SBINDIR@/corosync-cfgtool -H --force
+ Type=notify
++LogRateLimitIntervalSec=1s
++LogRateLimitBurst=200
+ 
+ # In typical systemd deployments, both standard outputs are forwarded to
+ # journal (stderr is what's relevant in the pristine corosync configuration),
+-- 
+2.39.5
+
diff --git a/debian/patches/series b/debian/patches/series
index 147e793..7a796c4 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,3 +1,4 @@
 0001-Enable-PrivateTmp-in-the-systemd-service-files.patch
 0002-only-start-corosync.service-if-conf-exists.patch
 0003-totemsrp-Check-size-of-orf_token-msg.patch
+0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
-- 
2.39.5
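The daily volume estimates in the inner patch's commit message (1000 and 200 messages per second, at roughly 100 bytes per message) can be reproduced with a small helper. A sketch only; the function name daily_gib is hypothetical, and the 100-byte message size is the same rough approximation the commit message uses:

```shell
# Hypothetical helper: raw log volume per day in GiB for a given message
# rate, assuming a fixed size per message (default: 100 bytes).
#   $1: messages per second
#   $2: bytes per message (default: 100)
daily_gib() {
    rate="$1"
    bytes="${2:-100}"
    # 86400 seconds per day, 1073741824 bytes per GiB
    awk -v r="$rate" -v b="$bytes" \
        'BEGIN { printf "%.2f\n", r * b * 86400 / 1073741824 }'
}

# daily_gib 1000   # prints 8.05
# daily_gib 200    # prints 1.61
```

Both values match the ~8 GiB/day and 1.6 GiB/day figures stated in the commit message.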

Thread overview: 7+ messages
2025-04-04  7:59 Friedrich Weber [this message]
2025-04-04  8:14 ` Maximiliano Sandoval
2025-04-04  8:55   ` Thomas Lamprecht
2025-04-04  9:18     ` Friedrich Weber
2025-04-04  9:28       ` Maximiliano Sandoval
2025-04-04  9:40         ` Thomas Lamprecht
2025-04-04 12:43           ` Friedrich Weber
