From: Maximiliano Sandoval <m.sandoval@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH corosync] corosync.service: add patch to reduce log spam in broken network setups
Date: Fri, 04 Apr 2025 10:14:17 +0200
Message-ID: <s8oa58w72hl.fsf@proxmox.com>
In-Reply-To: <20250404075957.80057-1-f.weber@proxmox.com>


Friedrich Weber <f.weber@proxmox.com> writes:

> Since c761053 ("Check packets come from the correct interface
> https://github.com/corosync/corosync/issues/750") in kronosnet,
> corosync will produce log messages in certain broken network setups.
> See inner patch for details. Drawing attention to such setups is
> desirable because such setups may experience whole-cluster fences if
> the watchdog is active, see [1].
>
> However, the log volume in such broken setups can be inconveniently
> high. In such a setup, when running the following on node 1:
>
>   # for i in $(seq 100); do dd if=/dev/urandom bs=1M of=/etc/pve/test.bin count=1; done
>
> On node 2, bursts of ~1300 messages per second are observed:
>
>   # journalctl --since="1min ago" -u corosync.service \
>     | cut -d' ' -f 1-3 | uniq -c | sort -n | tail -n 10
>       8 Apr 04 09:51:20
>       8 Apr 04 09:51:24
>       8 Apr 04 09:51:30
>       8 Apr 04 09:51:34
>       8 Apr 04 09:51:40
>      12 Apr 04 09:51:00
>     196 Apr 04 09:51:46
>    1283 Apr 04 09:51:44
>    1329 Apr 04 09:51:43
>    1370 Apr 04 09:51:45
>
> To avoid cluttering the journal, rate-limit log messages to 200 per
> second. See inner patch for details.
>
> [1] https://github.com/corosync/corosync/issues/750
>
> Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> ---
>
> Notes:
>     I'm a little confused about the rate limit, as with this patch I do
>     see that systemd suppresses messages:
>     
>     Apr 04 09:52:54 coro3 systemd-journald[303]: Suppressed 196 messages from corosync.service
>     
>     but I still see way more than 200 messages per second:
>     
>          11 Apr 04 09:52:00
>          11 Apr 04 09:52:45
>          13 Apr 04 09:52:13
>          14 Apr 04 09:52:12
>          19 Apr 04 09:52:07
>          67 Apr 04 09:52:08
>         400 Apr 04 09:52:54
>         695 Apr 04 09:52:52
>         715 Apr 04 09:52:53
>         835 Apr 04 09:52:51
>     
>     Any idea why?
>
>  ...-rate-limit-log-messages-to-200-per-.patch | 54 +++++++++++++++++++
>  debian/patches/series                         |  1 +
>  2 files changed, 55 insertions(+)
>  create mode 100644 debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
>
> diff --git a/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch b/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
> new file mode 100644
> index 0000000..0f91b42
> --- /dev/null
> +++ b/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
> @@ -0,0 +1,54 @@
> +From 5470f01296a3bd8f47fd4bd97939b3a68f00d309 Mon Sep 17 00:00:00 2001
> +From: Friedrich Weber <f.weber@proxmox.com>
> +Date: Fri, 4 Apr 2025 09:14:21 +0200
> +Subject: [PATCH] corosync.service: rate limit log messages to 200 per second
> +
> +Since c761053 ("Check packets come from the correct interface
> +https://github.com/corosync/corosync/issues/750") in kronosnet,
> +corosync will log a message like the following every time a packet is
> +received on the wrong interface, i.e., not the interface on which the
> +corresponding IP is configured:
> +
> +> [KNET  ] udp: Received packet from 10.8.1.1 to 10.8.1.3 on i/f ens20 when expected ens19
> +
> +This is to draw attention to broken network setups that
> +appear to work fine as long as all corosync links are online, but once
> +a link goes down, may go into a state of "asymmetric connectivity"
> +which is problematic for corosync. See [1] for more details.
> +
> +While it is desirable to draw attention to broken setups, the volume
> +of log messages in such clusters can get very high and clutter the
> +journal. In extreme scenarios, occasional bursts of more than 1000
> +messages per second were observed. If we approximate each message with
> +100 bytes, logging 1000 messages per second will produce ~8 GiB of raw
> +logs per day. While this should be a worst-case scenario and the
> +logs probably compress well, the volume is still inconveniently high.
> +
> +Hence, use systemd log rate limiting to limit corosync log messages to
> +200 per second, which brings the logs in the above scenario down to 1.6
> +GiB/day and should still provide enough headroom to avoid suppressing
> +benign log messages in non-broken setups.
> +
> +[1] https://github.com/corosync/corosync/issues/750
> +
> +Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
> +---
> + init/corosync.service.in | 2 ++

An option that might require less maintenance would be to ship a
systemd drop-in override, e.g. at
/lib/systemd/system/corosync.service.d/set-log-rate-limit.conf, with
the following contents:

```
[Service]
LogRateLimitIntervalSec=1s
LogRateLimitBurst=200
```

No strong feelings; it is just a matter of taste.
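For illustration, a minimal sketch of what shipping such a drop-in could look like as a shell helper. The function name write_corosync_dropin and its arguments are hypothetical; only the directory layout and the two LogRateLimit* options come from above. The burst defaulting to 100 is an assumption, not a settled value (the patch uses 200).

```shell
# Hypothetical helper: write a systemd drop-in that rate-limits corosync's
# journal output, instead of patching init/corosync.service.in.
# $1: unit directory, e.g. /lib/systemd/system (packaged) or
#     /etc/systemd/system (local administrator override)
# $2: burst, i.e. messages allowed per 1s interval (assumed default: 100)
write_corosync_dropin() {
    dir="$1/corosync.service.d"
    burst="${2:-100}"
    mkdir -p "$dir" || return 1
    # Same contents as the override above, with the burst parameterized.
    cat > "$dir/set-log-rate-limit.conf" <<EOF
[Service]
LogRateLimitIntervalSec=1s
LogRateLimitBurst=$burst
EOF
}
```

After writing to a live unit directory, `systemctl daemon-reload` is needed, and `systemctl cat corosync.service` should then list the drop-in below the main unit. Presumably the new limit only takes effect once corosync.service is (re)started, which on a cluster node deserves some care.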

> + 1 file changed, 2 insertions(+)
> +
> +diff --git a/init/corosync.service.in b/init/corosync.service.in
> +index bd2a48a9..3d7ea2db 100644
> +--- a/init/corosync.service.in
> ++++ b/init/corosync.service.in
> +@@ -10,6 +10,8 @@ EnvironmentFile=-@INITCONFIGDIR@/corosync
> + ExecStart=@SBINDIR@/corosync -f $COROSYNC_OPTIONS
> + ExecStop=@SBINDIR@/corosync-cfgtool -H --force
> + Type=notify
> ++LogRateLimitIntervalSec=1s
> ++LogRateLimitBurst=200

200 messages per second might be a bit too many. Since we are not
sure how many messages an unlucky user might see, I would suggest
lowering it a bit for the time being; 100 is a good round number.
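As a back-of-the-envelope check, the estimate from the commit message (roughly 100 bytes per message, sustained for a whole day) can be redone for different limits. The helper below is a hypothetical sketch under that same assumption; it ignores compression and journal metadata overhead, and as the Notes show, the observed rate is not necessarily capped at the configured burst.

```shell
# Hypothetical helper: rough raw log volume per day, assuming every
# message is about the same size and the rate is sustained all day.
# $1: messages per second
# $2: bytes per message (defaults to 100, as in the commit message)
# Prints the result in MiB per day, rounded down.
log_volume_mib_per_day() {
    rate="$1"
    size="${2:-100}"
    # 86400 seconds per day, 1024 * 1024 bytes per MiB
    echo $(( rate * size * 86400 / 1024 / 1024 ))
}
```

With the default message size, this gives 8239 MiB/day for 1000 messages/s, 1647 MiB/day for 200 and 823 MiB/day for 100, so a burst of 100 would roughly halve the worst case to about 0.8 GiB/day.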

> + 
> + # In typical systemd deployments, both standard outputs are forwarded to
> + # journal (stderr is what's relevant in the pristine corosync configuration),
> +-- 
> +2.39.5
> +
> diff --git a/debian/patches/series b/debian/patches/series
> index 147e793..7a796c4 100644
> --- a/debian/patches/series
> +++ b/debian/patches/series
> @@ -1,3 +1,4 @@
>  0001-Enable-PrivateTmp-in-the-systemd-service-files.patch
>  0002-only-start-corosync.service-if-conf-exists.patch
>  0003-totemsrp-Check-size-of-orf_token-msg.patch
> +0004-corosync.service-rate-limit-log-messages-to-200-per-.patch



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Thread overview: 7+ messages
2025-04-04  7:59 Friedrich Weber
2025-04-04  8:14 ` Maximiliano Sandoval [this message]
2025-04-04  8:55   ` Thomas Lamprecht
2025-04-04  9:18     ` Friedrich Weber
2025-04-04  9:28       ` Maximiliano Sandoval
2025-04-04  9:40         ` Thomas Lamprecht
2025-04-04 12:43           ` Friedrich Weber
